Power Law, Bell Curves and 100%

In 2014 all students could read and do math.  Fact.  That’s what the No Child Left Behind (NCLB) law promised.  100%!

No, in reality, we fell short.  We did about as well as law enforcement did with crime–our focus got better (standards!  Common Core!), we used new tools (statistics!) and the numbers moved in the right direction.  It brought a lot of needed focus to those groups normally ignored, especially those from lower socioeconomic backgrounds.  I don’t know if we’ll ever fix either problem 100% because, humans.

The Bell Curve

Prior to NCLB the “bell curve” was the norm.  Here’s a picture of it.

[Image: the stanine bell curve]

That is, a few kids got the “A”, a few kids failed, and a whole bunch took up the middle.  I remember that my sister, in her physics class, would purposely get a few problems wrong so she wouldn’t mess up the curve.  I had the same teacher, and getting half correct was good for a “B”.

Those days are (mostly) gone.  With the introduction of the rubric, educators began to realize that the goal of education should be 100% of students meeting the standard.  Parsing out a few “A” grades does not show rigor, but control–a great teacher should get all students up to their high standard.  The focus shifted to finishing the marathon, not so much worrying about who finished first (or last).  The bell curve, as far as measuring achievement goes, has (mostly) been phased out.

In the early days of NCLB our school became really interested in the “stanine”.*  The stanine divides that bell curve into 9 sections, based on standard deviations.  The “norm” falls in sections 4-6 and accounts for about half the students.  You’ll note that a few kids are a 9 and a few kids are a 1.  We used it a lot, and the stanine was in the data sheets we always got.
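For the curious, the cut points commonly published for stanines (by national percentile) make the conversion easy to see.  A minimal sketch–these are the textbook cuts, not any particular publisher’s tables:

```python
from bisect import bisect_left

# Upper percentile bound of stanines 1 through 8 (stanine 9 is everything above 96).
# These are the commonly published cut points, not Questar's own tables.
CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile):
    """Convert a national percentile (1-99) to a stanine (1-9)."""
    return bisect_left(CUTS, percentile) + 1

print(stanine(3), stanine(50), stanine(90), stanine(98))  # 1 5 8 9
```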

At some point, someone realized that while it told us where the student was in relation to his or her peers, it did not tell us much about where they were in relation to the content.  After all, “In the kingdom of the blind, the one-eyed man is king!”

That was when the age of the rubric and standards arrived to save us all.

The 100% of the Rubric

You can’t have no children left behind AND use a bell curve.  The bell curve assumes some kids are at the bottom.

It is important to understand the philosophical shift from one to the other.  There are many pieces to it.  With the bell curve you not only assume that there are kids who will fail but also that you can only have a few superstars.  A graph like this should report known information, but too many see it as destiny–only a few can be stars.

In fact, in the bell curve world, if there are too many stars it diminishes the value of that designation.  Our entire discussion on grade inflation is centered around this point of view–only a few should be getting “A” grades or grades have no value.

Use of the bell curve also slots kids.  I was a “B-” student.  Educators crammed kids into the curve instead of looking at the data they had.  The data should determine the graph!

At some point someone asked “What is an ‘A’ anyway”?  Thus began the movement towards rubrics.**   What helps me explain it to parents is the metaphor I hinted at above–the marathon.  Only one person can win a marathon, but nearly anyone can run and finish one.  Our school system is built to get everyone running, so why are we still using a stopwatch?  The rubric is the race’s distance.

Philosophically, having a rubric means all students can meet the standard.  Or, to use the metaphor, finish the marathon.  Some, with IEPs, might need support or a lot more time, but all can do it.  A plethora of “A” grades means that everyone is meeting the standard (and that the bar needs to be raised a bit, but that’s another discussion).

Of course, having a rubric means you have to define the standard.  And what “success” looks like.  The days of “I’ll know it when I see it” grading have faded.  Students and parents now ask for the rubric.  Teachers offer “exemplars” of good work.  Schools began “calibrating” and double scoring to make sure a “3” essay was a “3” essay no matter who graded it.  Grading became fair, or at least more objective.

The philosophical shift continued as letter grades began to be replaced with the 1-4 score.  Letters are a leftover of the old bell curve, with a history that parents intuitively and reflexively know and react to.  In fact, our grade level kept letters for that reason–a parent saw a “C” and they’d act, lean on the kid and come in for a conference.  A “2” never has the same effect.

Those words changed, too.  A “2” is for “approaching” while a “1” indicates “starting”.  Compare that to the “F” for “fail”.  There is little growth mindset in an “F”.***  Thanks to Carol Dweck educators now see students as pliable.  We talk about growth and fixed mindsets, even if not everyone (students, parents, some teachers) got the memo.  Anyone can do it.  No child SHOULD be left behind.

The philosophical shift of rubrics and no child being left behind has bloomed into other ideas, most notably the Proficiency Based Learning (PBL) movement.  Some notable changes include moving from a 100 point scale to using 1, 2, 3, 4 with targets, no longer averaging in a “0” for work not done, and the idea of practicing a skill with formative assessments before offering the summative.  These have spawned the “re-do” and no-homework movements.  Full disclosure, I agree.  I agree with whatever it takes for kids to leave my classroom with the knowledge.

Of course, upsetting the applecart of traditional “winners” and “losers” rankles a few (mostly, the traditional winners).  The bell curve is great for those on the right side of the curve.  In a fixed mindset world, even those throughout the bell curve accept their fate.  Besides grade inflation concerns, this whole movement has been lumped into the complaints that if “everybody gets a trophy” then they are worth little.  If you adhere to the original orthodoxy, that’s true.  At our 8th grade “graduation” we stopped giving specific academic awards.  Instead, we celebrated a high level of achievement with the “Presidential” award and then let each kid identify a way they excelled over the past year.  No one in the audience felt it cheap.

While the PBL movement has done a lot to push all kids to succeed using a growth mindset, how can schools look at data in a meaningful way?

Power Rule

Vilfredo Pareto was a 19th century Italian economist who first articulated the 80:20 rule.  He showed that 80% of the land in Italy was owned by 20% of Italians.  Joseph M. Juran later theorized that 80% of effects come from 20% of causes, and named it the Pareto Principle.

As I look at the data laid out before me, I wonder how often I see this.  80% of my time seems to be taken up by 20% of my students (this is true in both academics and behavior).  80% of my students seem to meet the standard while 20% struggle.  It certainly paints a nice picture.  My fear is that, like the bell curve, we begin to slot kids.  If 20% of the kids are destined to fail, I’m off the hook of expecting ALL kids to make it.  If we are speaking of equity, why do 20% of the kids get 80% of my attention (if you have your own kid, you know you want your kid’s share of the teacher’s attention)?  I try and remember: Fair does not always mean equal.  Still….

But the 80:20 rule is something to keep in mind.  For example, my wife covers the papers of her students in comments.  I make about 20% of the comments that she does, and I suspect (anecdotally) that it gets 80% of the effect.  Perhaps 80% of the learning is due to 20% of my teaching (probably the lesson on Tuesday).  But, just as there are 20% who might not be meeting the standard, it might also show that 20% are above the standard and need more.

Ah, I’ve just made a de facto bell curve.  That is the danger here.  Once the 80:20 rule becomes a rule rather than a way to state what you observe, there is a danger of fixed destiny.  As Sarah Connor says, “No fate.”

So how about “power law”?  A power law describes a relationship between two quantities: a change in one produces a change in the other, but as a power of it.  Like most things, it is best explained by Wikipedia.  To use their example, increase the side of a square and the area increases as the square of that side.

Here is a nice graph of it, and the 80:20 rule, too (green:yellow).
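To make the shape concrete, here is a minimal sketch in Python–the numbers are invented for illustration, not my class data:

```python
# A power law says y = k * x**a: change x, and y changes as a power of x.
def power_law(x, k=1.0, a=2.0):
    return k * x ** a

# The square example: a = 2, so doubling the side quadruples the area.
for side in (1, 2, 3, 4):
    print(side, power_law(side))          # 1.0, 4.0, 9.0, 16.0

# With a negative exponent you get the long-tail shape instead:
# a few big values up front and a long run of small ones trailing off.
tail = [round(power_law(rank, k=100, a=-1.0), 1) for rank in range(1, 11)]
print(tail)                               # [100.0, 50.0, 33.3, 25.0, ...]
```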

If you think in terms of power law, you begin to think of two things.  First, what variable can you change to get an outsized change on the other end?  Not assigning homework, for example, and having kids write in class made a huge difference in learning for my students.  Second, power law makes one look at the tail–that’s what the 20% is called (in the graph, you can see it to the right, and it’s not always 20% or even close to that).

That 20% will take 80% of the effort to move.  But you could also find the 20% change that moves results 80%.  Of course, I’m having a bit of fun here–if only it were that easy!  But playing with those numbers breaks the rut we educators put ourselves in.

Let’s look at using this graph in a very different way, though.  What if the 80% was a student being successful and the 20% falling short?  How to measure that?  Traditionally, that would be an average.  But averages are fickle.  We distinguish between the median and the mean in case Bill Gates walks in and someone thinks I’m a billionaire because that would be our average salary.  In our PBL world we are moving away from the average.
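A quick sketch of why the average is fickle, with invented salaries:

```python
from statistics import mean, median

# Nine teachers and one billionaire in the room (salaries invented for illustration).
salaries = [48_000] * 9 + [1_000_000_000]

print(round(mean(salaries)))   # 100043200 -- the outlier drags the mean way up
print(median(salaries))        # 48000.0 -- the median still describes the room
```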

Does that far right end of the tail tell the story?  Of course, in the grade book, that extreme tail score would be in there.  It might be a lesson learned.  But the story of competence is in the 80% as much as the 20%.

Our current grading program, Jumprope, allows for power law to be used in grading.  In short, the weight of older grades shrinks.  The most recent grades weigh the most, because we care more about what students can do today.  I thought it was the best compromise between a history of competence (the average) and what they can do now.

Except when the last attempt was a bomb.  Then a history of success gets marred by a stumble.  Fair?  No, but such snapshots rarely are.
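Jumprope’s actual power-law calculation is its own math, but the flavor–newer evidence counts more than older evidence–can be sketched with a simple decaying weight.  The scores and the decay rate below are invented for illustration:

```python
# Hypothetical 1-4 scores on a single target, oldest first.
scores = [2, 2, 3, 3, 4]

def recency_weighted(scores, decay=0.6):
    """Average the scores, weighting each attempt less the older it is."""
    weights = [decay ** (len(scores) - 1 - i) for i in range(len(scores))]
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(round(recency_weighted(scores), 2))        # 3.28 -- leans toward the recent 3s and 4
print(round(recency_weighted(scores + [1]), 2))  # 2.33 -- one final bomb drags it down fast
```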

Where Does That Put Us?

Changing.  Always.  That’s where it puts us.  Education is a forever war.  And why is that important?  Because anytime an administrator or consultant (or both!) comes at you with charts, be skeptical.  I love graphs and data and especially predicting (which is a lot of folly, but fun folly).

But I got human beings in front of me.  That’s who I gotta teach.  And believing in a growth mindset, each is a forever battle in our long, wonderful forever war.

Suit up.

*Some called it the “Stat 9”.  The story I heard was that, during World War II, the Army was looking for pilots.  They wanted to rank them on a ten-point scale, but with the limited, new IBM computers, a second column for the tens digit would have taken a lot of extra work–so they went 1-9.

**In our Proficiency Based Learning (PBL) paradigm our school is now using the term “target” instead of rubric.  The target is the next step towards proficiency, while the rubric is a static tool in a document that, with a little tweaking, becomes a target on professional development days.

***Our school’s solution to the “F = fail” issues was to use an “N” for “no evidence”.  Parents always asked, “What the heck is an ‘N’?”  

Not knowing an “F” was for “fail”, students often asked why it went from “D” to “F”–what about the “E”?  At one school where I worked, they had an “E” which was “failing with effort”–instead of blowing work off, the student’s work was marginal.  Because of how the final yearly grade was calculated, an “E” for the semester averaged in better than an “F” and allowed hapless kids to get credit if they rebounded with a “D” or better in the other semester and the final exam.

Evaluating a Program With Minimal Data Points: Step 3: Where in the Year is the Gain?

Now we are getting to the meat of the sandwich!

Why Precise Accountability Is Important

Just because a student is learning does not mean a teacher is teaching.

Can we take credit for success?  Or failure?  We need to know the effectiveness of our program if we hope to increase that effectiveness.  What, for example, if the school year gains only make up for a huge summer regression?  What if students, after a year of work, only gain slightly?

Knowing precisely where gains and losses occur is essential for change–or, for standing by a program that works!  Well-meaning initiatives are the norm, but they are often based on assumptions.  Two things then happen: 1. The reality of how learning occurs does not match the assumptions, so the results show no change, or 2. Those forces against change raise concerns that are as valid as (or more valid than) the evidence supporting the original initiative.  In the end, the initiative fails, fades or simply disappears and everyone feels just a tad more initiative fatigue.  Precise knowledge of where programs succeed and fail stops such failures and creates real change.

For example, our supervisory union (SU) had a push for something called “Calendar 2.0”, a re-imagining of the school year calendar that would break the year into six-week, unit-sized chunks with a week or two break between.  It added no more days to the year, but spread out school days.  The intent was good.  During the breaks teachers could plan their next unit in reaction to the previous unit’s results, and students could get help mastering skills they had yet to gain.  Schools might even be able to offer enrichment during that time!  The shorter summer break was designed to prevent summer regression.

For families, it shrank the summer break and created more week-long breaks.  It cut into summer camps and made child care difficult and somewhat random.  There was a vocal group, too, that argued for the unstructured time that summer afforded.  In reality, many families were going to plan vacations during that time–being told that their child was reading poorly and needed to stay would not trump the money already spent to go to Disney World.

Because Calendar 2.0 was not based on the clear need of students in our schools, it failed.  It sounded good–summer regression!  Planning time!  Time for shoring up skills!–but there was no local data supporting it.  Of all the areas teachers saw in need of shoring up, working in the summer did not rank highly.

Where in the Year is the Gain?

In my last post, we looked at year-to-year gains and regression.  When we look at the 12 kids who regressed, half did so over the summer.  But half did so over the school year.  So, over 180 days of instruction, student reading actually regressed for 6 students.  Our school made them go backwards.  Both summer and school year regression are results we should be concerned about, but the latter points to something we can control but are not controlling.

Below, I identify students who had a dramatic difference between their summer and school year gains.  Some students showed consistent gains, while others consistently stagnated.  My two periods of examination–summer and school year–are based on discussions people at my school were having about programs we could institute (Calendar 2.0 being the most prominent).  You should choose the time periods that feel important to you.  It could be the November-January holiday period (where family strife makes learning difficult), or May-June (end-of-year-itis), or days of the week (Mondays are about mentally coming back, Wednesday is the drag of “hump day” and Friday is checking out–so when DO kids focus?).  The important part is to use data to examine it.  All of my parenthetical asides are assumptions, many of which I have found untrue.  You will be surprised how often assumptions and old saws are wrong.

Students Who Gain Over School Year

Instead of always looking at the negative, let’s try and determine where kids succeed.  These students (scores highlighted in yellow) showed significantly more gains in reading over the school year than the summer.  In fact, relatively, this group lost ground over the summer.

Note how, because I gave the DRP in the spring, fall and again in the spring I was able to measure a) year-to-year growth, b) spring-to-fall growth (in short, gains or losses during the summer) and c) school year growth.
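If the scores live in a spreadsheet or a script, those three measures are just three subtractions.  A minimal sketch with invented I90 scores (the field names are mine, not the DRP report’s):

```python
# Invented I90 scores for one student: spring of 6th grade, fall of 7th, spring of 7th.
student = {"spring_6": 52, "fall_7": 49, "spring_7": 58}

year_gain   = student["spring_7"] - student["spring_6"]  # a) year-to-year growth
summer_gain = student["fall_7"]   - student["spring_6"]  # b) summer gain or loss
school_gain = student["spring_7"] - student["fall_7"]    # c) school year growth

print(year_gain, summer_gain, school_gain)  # 6 -3 9: lost ground over summer, gained it back in school
```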

Reading Data Gains

The difference between the school year and summer demonstrates the importance of being in a literate environment for reading growth to occur.  Being forced to spend time with text leads to reading success.

Note that growth occurred both in students who struggle with reading (expected) and those who are in the top group.  Even good readers benefit from the literate school environment.  If these students get more time in a literate environment–more reading time–these gains should continue and increase.

Summer Gains, School Year Loss

There is a population who sees a loss in reading progress over the school year (noted in that dark khaki color), yet sees gains over the summer.  They are a diverse group for whom this phenomenon could have many causes.

Reading Data Gains 2

Still, one factor carries through many of those identified: Time.  Many of these students lead busy lives, with responsibilities including sports, family, work and school.  Reading for fun, and leisure time in general, is at a premium.  Without practicing their reading, they show no growth–or a loss.

In the summer, these students enjoy a lighter schedule.  They fill it with reading.  In order for them to see year round growth they need time.  These same results have been observed in other studies and are especially prevalent in middle class students saddled with activities.

Will Successes Scale Up?

Okay, so we saw some success.  Often, when we do nothing, someone gains.  Was it me?  The conclusion I reached–more time spent reading will improve reading skills–makes instinctual sense.  And the research backs that up.  Good, right?

That said, the data I have at hand is thin.  I am relying on a certain knowledge of my students and that invites bias.  My sample sizes are small.  As a data point, the DRP is more like a machete than a scalpel. (Read The DRP as an Indicator Of….)  Will more Sustained Silent Reading (SSR) result in progress?  It will take some time for the data to prove me out or cause me to change course.  And, of course, more data, and more precise data.  But, I am aware of all of this as I move forward.  My program will react not to theory, but to what students experience in the classroom.

For example, of those 12 students who regressed, 3 are in the top stanines–they have nowhere to go.  Similarly, the glut of students with no or minimal gains are also in the top stanines nationally–they had nowhere to go.  But, I am wary of making excuses, so: data.

*

Restatement: Introduction to These Next Few Blog Posts (Backstory for those coming to this post first).

We get a lot of data. It may come in the form of test scores or grades or assessments, but it is a lot.  And we are asked to use it. Make sense of it. Plan using it.

Two quotes I stick to are:

  • Data Drives Instruction
  • No Data, No Meeting

They are great cards to play when a meeting gets out of hand.  Either can stop an initiative in its tracks!

But all of the data can be overwhelming.  There are those who dismiss data because they “feel” they know the kids.  Some are afraid of it.  Many use it, but stop short of doing anything beyond confirming what they know–current state or progress.  And they can dismiss it when it does not confirm their beliefs. (“It’s an off year”)  Understanding data takes a certain amount of creativity.  At the same time, it must remain valid.  Good data analysis is like a photograph, capturing a picture of something you might not have otherwise seen.

This series of blog posts will take readers through a series of steps I took in evaluating the effectiveness of my reading program.  I used the DRP (Degree of Reading Power), a basic reading comprehension assessment, as my measure because it was available.  I’m also a literacy teacher, so my discussion will be through that lens–but this all works for anything from math to behavior data.

* A stanine (STAndard NINE) is a nine-point scale with a mean of 5.  Imagine a bell curve with the x-axis divided into nine bands, each half a standard deviation wide (with open-ended bands at each end).  The head and tail hold very little of the area (about 4% each) while the belly is huge (20%).  Some good information can be found in this Wikipedia entry.

Evaluating a Program With Minimal Data Points: Step 2: Year-to-Year Gain

Do We See A Year-to-Year Gain?

Many schools measure a program in yearly gains.  In one year, students should show a year of growth.  What we mean, of course, is the school year; that from September to June students will gain a grade level.  And we hope there is little regression over the summer.  We will discuss the difference between yearly and school year gains in our next post.  For now, let’s focus on yearly gains.

Are students learning?  After identifying where your students are (the previous post), your next task is to measure growth over a year.  To start, I suggest spring to spring because that measures where they are after a year of instruction.  Again, here is my spreadsheet from Grace Haven Elementary.  Note Column E, which simply takes the DRP (Degree of Reading Power) score from 6th grade and subtracts it from the 7th grade score for a single number I call “gain”:

Reading Data 1

It is really important to compare apples to apples.  One of the strengths of the DRP is that the scores are comparable from year to year.  Whatever measure you use, please make sure the scores match up so that you are able to measure a year’s worth of growth.

What is not always clear is what a year’s worth of growth IS.  For example, the DRP offers an I90 score (the reading level a student is able to read independently with 90% understanding).  I would expect that every student gains by just being in the building.  By having a year of life under their belt.  But what does a year’s worth of gain look like?  At Grace Haven, they gained a bit over five (5) points on the I90.  That does not seem like a lot.

Except that, as we learned in Step 1, over half of the students were in the top three stanines*, or the top 23% nationally.  Where do they have to go?  If half of our students cannot gain much, it dampens the possible growth of the group.

So take them out.  When we look at those who are not in the top stanines, another picture emerges.  In the case of Grace Haven Elementary, the growth is mixed.  Some students gain a lot, some regress, and others stay put.  If you have the former, pat yourself on the back before moving into the tough analysis.  If the latter–stagnation–you might think about the same questions regression raises, because your program is not where you want it to be.
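“Taking them out” is just a filtered average.  A minimal sketch with invented (gain, stanine) pairs–the cut at stanine 7 is my choice here, not a rule:

```python
# Invented pairs: (gain in I90 points, national stanine from the fall report).
students = [(7, 4), (0, 8), (6, 5), (-2, 9), (1, 7), (9, 3), (0, 8), (5, 6)]

all_gains = [gain for gain, stanine in students]
not_top   = [gain for gain, stanine in students if stanine < 7]  # drop the top three stanines

print(sum(all_gains) / len(all_gains))  # 3.25 -- the ceiling kids flatten the average
print(sum(not_top) / len(not_top))      # 6.75 -- the growth of kids with room to grow
```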

Regardless, just because a student is learning does not mean a teacher is teaching.  Can we take credit for success?  Or failure?  We need to know the effectiveness of our program if we hope to increase that effectiveness.  What, for example, if the school year gains only make up for a huge summer regression?  What if students, after a year of work, only gain slightly?

When we look at the 12 kids who regressed, half did so over the summer.  But half did so over the school year.  So, over 180 days of instruction, student reading actually regressed for 6 students.  Our school made them go backwards.  Both summer and school year regression are results we should be concerned about, but the latter points to something we can control but are not controlling.

Of those 12 students, 3 are in the top stanines–they have nowhere to go.  Similarly, the glut of students with no or minimal gains are also in the top stanines nationally–they had nowhere to go.

Still, it raises a basic question: Does our program help kids raise their game, or does it rely on previously done work and simply maintain it?  If the latter, those who did not “get it” earlier are not getting what they need now.

Restatement: Introduction to These Next Few Blog Posts (Backstory for those coming to this post first).

We get a lot of data. It may come in the form of test scores or grades or assessments, but it is a lot.  And we are asked to use it. Make sense of it. Plan using it.

Two quotes I stick to are:

  • Data Drives Instruction
  • No Data, No Meeting

They are great cards to play when a meeting gets out of hand.  Either can stop an initiative in its tracks!

But all of the data can be overwhelming.  There are those who dismiss data because they “feel” they know the kids.  Some are afraid of it.  Many use it, but stop short of doing anything beyond confirming what they know–current state or progress.  And they can dismiss it when it does not confirm their beliefs. (“It’s an off year”)  Understanding data takes a certain amount of creativity.  At the same time, it must remain valid.  Good data analysis is like a photograph, capturing a picture of something you might not have otherwise seen.

This series of blog posts will take readers through a series of steps I took in evaluating the effectiveness of my reading program.  I used the DRP (Degree of Reading Power), a basic reading comprehension assessment, as my measure because it was available.  I’m also a literacy teacher, so my discussion will be through that lens–but this all works for anything from math to behavior data.

* A stanine (STAndard NINE) is a nine-point scale with a mean of 5.  Imagine a bell curve with the x-axis divided into nine bands, each half a standard deviation wide (with open-ended bands at each end).  The head and tail hold very little of the area (about 4% each) while the belly is huge (20%).  Some good information can be found in this Wikipedia entry.

Evaluating a Program With Minimal Data Points: Step 1: Sorting Proficiency

Introduction to These Next Few Blog Posts (Skip down to the meat)

We get a lot of data. It may come in the form of test scores or grades or assessments, but it is a lot.  And we are asked to use it. Make sense of it. Plan using it.

Two quotes I stick to are:

  • Data Drives Instruction
  • No Data, No Meeting

They are great cards to play when a meeting gets out of hand.  Either can stop an initiative in its tracks!

But all of the data can be overwhelming.  There are those who dismiss data because they “feel” they know the kids.  Some are afraid of it.  Many use it, but stop short of doing anything beyond confirming what they know–current state or progress.  And they can dismiss it when it does not confirm their beliefs. (“It’s an off year”)  Understanding data takes a certain amount of creativity.  At the same time, it must remain valid.  Good data analysis is like a photograph, capturing a picture of something you might not have otherwise seen.

This series of blog posts will take readers through a series of steps I took in evaluating the effectiveness of my reading program.  I used the DRP (Degree of Reading Power), a basic reading comprehension assessment, as my measure because it was available.  I’m also a literacy teacher, so my discussion will be through that lens–but this all works for anything from math to behavior data.

Step 1: Sorting Proficiency

The first, most basic step in analyzing a program is to find out how many of your students can do the skill you are interested in.  It seems basic, but so many teachers assume they know the answer.  Never assume.  A number of our students read a lot, but don’t really think about their reading–their ability is on the surface and their memory of what they read is weak.  Because we see them with their nose in a book, though, we tag them as a reader.  Others can read, but don’t.  They do, though, test well.  In a previous post I discussed whether the DRP was a measure of reading or stamina (or ability to focus).  That may be an issue for some.  It is certainly an excuse–they don’t test well, or they’re unable to focus.  You can do that analysis later, but you first have to see where your class stands before you begin asking why and proposing solutions.

Choose an assessment and give it.  You want to give one that you would consider valid–that is, one with few variables.  You can measure books and/or pages read, stamina in SSR, depth of reading using reading logs, or a good old standardized test.  What that means is up for debate, but I used the basic off-the-shelf DRP to measure reading comprehension.  You also want to administer an assessment that you can give multiple times.  We administer it in the fall and spring to allow for tracking progress (more on that later).

Here is a sample of a class from Grace Union Elementary:

Reading Data 1

Students highlighted in lavender are in the top three stanines* of achievement nationally, or the top 23% of readers in the same grade.  In this cohort there are 24 out of 47 in that group.  So, half of our students are top readers.  In addition, 3 other students met our local standard, but fell short of the top national stanines (highlighted in purple).  Twenty-seven out of 47 students scoring well is great news, right?

It depends.  Looking at your data, you do have to decide where the line between proficient and below is.  Our supervisory union does that by pegging the “local standard” to a certain national average point.  You might disagree with your local designation–I used the stanine to raise my bar above ours–but since the results change depending on that line, your choice is important.  What line will reveal the most about your program?

For example, an additional 15 students were in the 5th or 6th stanine, putting them at or above the mean nationally.  Not bad, when added to the 24 who were in the 7th, 8th and 9th stanines.  I could comfortably go to our admins and the school board and talk about 39 out of 47 being average or above.  If I wanted to, I could point out that a number of our struggling readers have IEPs or other plans.  Everyone would agree that my program is solid.

Except, locally, that’s not good enough.  When they get to high school they will struggle if they are merely average.  Six of my students were in the lowest stanines, or about 1 out of 8.  Not great numbers.  And 20 students don’t meet our local standard of proficiency.  They are leaving my classroom unprepared for what awaits them.
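For what it’s worth, the sorting itself is just a tally once you have picked your lines.  A minimal sketch with made-up stanines–the bands are the ones I used above, but where you draw them is your call:

```python
from collections import Counter

# Made-up national stanines for a class; the real numbers come off the DRP report.
stanines = [9, 8, 7, 7, 6, 6, 5, 5, 5, 4, 4, 3, 2, 2, 1]

def band(s):
    if s >= 7:
        return "top (stanines 7-9)"
    if s >= 4:
        return "middle (stanines 4-6)"
    return "bottom (stanines 1-3)"

counts = Counter(band(s) for s in stanines)
for name, n in counts.items():
    print(f"{name}: {n} of {len(stanines)}")   # top: 4, middle: 7, bottom: 4
```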

Rubrics are a start.  What is “proficiency” for you?  The NCLB data is a nice yardstick–what measures do you have that correlate with the data you are seeing there?  Think about whether they do, in fact, correlate.  Our old NCLB writing data (NECAP) seemed to inflate our ability, so we have created a local assessment that gives us a little guidance on what to work on.  I correlate that with what I see in my classroom assignments.  If we still gave the old NECAP, I’d take the data with a grain of salt.

For half of our students, reading is a natural activity and they do it well.  Twenty-seven students can claim to be proficient.  But, even for them, I have no idea if I can claim their success a result of my program.

That is the next question.

* A stanine (STAndard NINE) is a nine-point scale with a mean of 5.  Imagine a bell curve with the x-axis divided into nine bands, each half a standard deviation wide (with open-ended bands at each end).  The head and tail hold very little of the area (about 4% each) while the belly is huge (20%).  Some good information can be found in this Wikipedia entry.

DRP as an Indicator Of….

When I arrived at my current school a decade ago, there was no definitive measure of a student’s ability to read.  That may be hard to conceive in this data-drives-instruction landscape (although in education you can find plenty of instances where a lot of data will not tell you the most basic things about students), but the feeling was that teachers knew the students and the word was spread as they moved from grade to grade.

In November of that first year, I found that I had a student who did not know alphabetical order.  I had asked her to look something up in the dictionary; she faked it for a few minutes and then threw a fit that got us distracted from the whole enterprise.  Fortunately, my aide noticed the faking and followed up.  This student was a known non-reader, but no one knew she was reading at a second grade level.  How, after all, could someone be reading at so low a level?  And she carried around grade level appropriate books and sat quietly during SSR.  When pressed, she created a scene of distraction.  In trying to fit in, she slipped through the cracks.  She is exactly why we have and use data today.

What is the DRP? (Skip Down for Discussion on Validity)

At a previous job, we had used the Degree of Reading Power, or DRP, on 10th graders.  Created by Questar, the DRP measures reading comprehension.  In the assessment a passage is provided with certain words removed.  Students are asked to fill in the blank from a selection of five words.  The 7th grade test is 63 questions, while the 10th grade was 110 when I gave it years ago (I doubt it’s changed).  The questions start out easy, and get progressively harder as the student goes on.

We use bubble sheets–it’s that dull.  But, in adopting the DRP, we have a screening tool that lets us ask who has mastered the basics of reading.  From that baseline, we ask follow-up questions, plus have students write reading journals and answer prompts to measure understanding and deeper meaning throughout the year.  For the hour we put into it, we get what we need out of it.

The DRP also provides some good data.  From the raw score, Questar gives you an Independent Reading Score, or I90.  The I90 indicates the level of a book a student can independently read without problems or assistance.  So, Harry Potter and the Sorcerer’s Stone is ranked a 56, meaning a student who scores an I90 of 56 should be able to read it unassisted (this does not take into account cultural literacy or maturity, which is why caution should be used on Of Mice and Men’s “easier” score of 53).  It also offers I80 and I70 scores, which indicate increasing levels of support needed for understanding, plus a “frustration level”–the point where a student might throw the textbook across the room.  Questar also ranks a student’s score against the nation, providing national percentiles and stanines.  I’ve never asked what database they get this information from (is it against other users of the DRP, or larger pools?), or if it updates every year, but it’s a larger sample size nonetheless.

When I first did this, there was a booklet filled with tables that converted all of this for you, but they later came out with a database for the computer.  The company also provided a directory of popular classroom texts and their DRP, so you could match students with books.  None of these CD-ROMs ever really worked for our computers.  Questar seemed locked in the 1970s.  The online information today smells like a dying company or division being run out of habit, where each year someone has an idea to update stuff but never to revamp the entire test for the NCLB age.  Even the name Questar sounds like one of the lesser computers of the early 80’s competing with Tandy and Commodore.  I think they know what they have and keep plugging.

The neatest thing about the DRP, though, is that the I90 score measures across grades.  You can compare an I90 score taken in 2nd grade with the I90 taken with the 7th grade test.  So, if a student scores a 43 as a 2nd grader and a 45 as a 7th grader, you know they have not progressed over five years of schooling.  You can also give a poor-reading 7th grader a 5th grade test and they will not meet their frustration point until much deeper in, providing a more accurate result.  In the end, I like to measure growth.  The DRP is great for that.

What Does the DRP Really Measure?

Every September we give the DRP to our 7th grade, and every May we give it again.  Because of the design, we can measure I90 growth over the year.  We can also measure it against their 6th grade result.  If we use the 6th grade spring results against the 7th grade fall, we are able to measure gain (or loss) over the summer.  We can do the same thing when we measure the same kids in 8th grade.

But the DRP is dull.  And, remember, the questions start out easy, and get progressively harder as the student goes on.  For some students, the first hard question throws them.  Then, they just color in dots.  One way Questar makes money is that they sell their bubble sheets and then correct them for you (and put the results on a disk, ready for manipulation).  Instead of paying for that, we took our answer key and made an overlay (overhead transparency sent through the photocopier).  In correcting it ourselves we can see where kids give up from a series of wrong answers.

Which leads us to the question we’ve wondered about for a while: Does the DRP measure reading comprehension or stamina?

To answer that question, last September I broke up the 63-question DRP I usually give my 7th graders into three parts with 21 questions each.  Then, I measured growth (or not) against their 6th grade scores from the previous May.  In the end, nothing significant showed up except that one group got better at reading over the summer: over-scheduled kids.  I had read that high-achieving middle class kids who participate in a lot of activities–soccer, music, the school play–cannot find time to read during the school year.  My data showed that, but nothing about stamina.  In fact, the ups and downs over the summer made little sense.

But discoveries often happen by accident.  This May, I went back to the old administration of the test–63 questions in one sitting.  Our 8th graders were taking an NCLB-mandated science assessment, so I used that time to give the 7th graders the DRP.  Because the 8th grade were monopolizing our aides and classrooms, I set the 7th graders up in the cafeteria while the kitchen staff were whipping up lunch.  My hope was that the blowers and bacon smell would be white noise and calming as the students and DRP assessments spread out across the antiseptic tables in the grey room.  Some finished quickly, while others lingered over an hour.

The results were not inspiring.  I had been unhappy with my reading program–I’m unhappy every year with both my reading and writing programs, but this year I now had weak data to prove it.  I uploaded my scores into a spreadsheet, looked at growth, ranked and sorted.  The high kids stayed high, and the middle kids stayed in the middle, with a few growing or dropping a bit.  Even that assessment is a bit inflated, if I’m honest.  It was not a good year.

Then there were the kids at the bottom.  About ten students had dropped between ten and thirty points over the school year (on a scale topping out at 80).  This was significant.  Our entire Tier II intervention placement was based on these scores.  Several students who had moved out of Tier II were looking at returning in 8th grade.  Those receiving Tier II were seeing regression.  What, I wondered, was I doing wrong?  (I had ideas, and it started with sacrificing SSR time for any distraction that came down the pipe).

In looking at the names of the students, I realized that those students who either had a diagnosis of ADD or ADHD, or whom we suspected of having ADD or ADHD, had tanked.  Our literacy group had often wondered to what extent the DRP was a test of stamina as much as a test of reading.

In looking at their answer sheets, I noticed that around the 20th question these students began to get questions wrong.  Not just a few as the questions got harder, but a string wrong and then another string wrong.  The breaks between the strings, I suspected, came because even when guessing a student will get some correct–probability.  They had given up and were just filling in bubbles.  Bad data.
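The probability backs up the hunch.  With five answer choices, a pure guess is right about 20% of the time, so a long unbroken string of wrong answers is unlikely even from a guesser.  A quick sketch (the bubble-sheet pattern is invented):

```python
# With 5 choices, a pure guess is wrong 80% of the time.
p_wrong = 0.8
for run in (5, 10, 15):
    print(run, round(p_wrong ** run, 3))   # 0.328, 0.107, 0.035 -- long runs get unlikely fast

# A crude detector for a corrected sheet marked right (1) / wrong (0).
def longest_wrong_run(answers):
    longest = current = 0
    for a in answers:
        current = current + 1 if a == 0 else 0
        longest = max(longest, current)
    return longest

print(longest_wrong_run([1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # 8 -- probably gave up
```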

The next week, I had a few of these students redo questions 22 through 42.  They were placed in a quiet room in two groups of four.  I had explained my belief in them and personally appealed to their sense of pride and of control over the environment.  In short, I was trying to get them to focus on the task while setting up an environment that fostered focus.  Six of the eight did significantly better, from 5 to 12 questions better.  When I had them redo the last twenty questions, I saw the same results.  Five of the students went from the 4th or 5th stanine for reading nationally to the 8th.

Of the two students who showed little improvement, one is not ADD or ADHD.  The other is suspected of ADHD and was even more hyperactive than during the first administration, and openly hostile to the retake.  Either the six had learned to read in a week, or I had been measuring stamina before.

Why does it matter beyond the one assessment?  Our school uses the DRP data to decide who gets Tier II help and who has “graduated” to Tier I.  Tier II instruction happens opposite World Language, so it can be a reward or punishment depending on the family.  There is some pressure for students to be taking a World Language (often from their parents), or a desire by students to “flunk” into Tier II so they can a) avoid the hard work of learning French and b) be with their Tier II friends.  These numbers weigh heavily in the court of “what’s best for the child”.

It also matters for how we take other, higher-stakes assessments.  For their NCLB assessment, Vermont uses the SBAC.  Entirely online, students have a lot of control–if they choose to use it.  Those who click through quickly and take a long break find those answers locked when they return.  They can, though, slowly go through a small number and break.  Then return for a few more.  This is different from past assessments, which means we need to retrain and empower students.  These results tell us that we need to instruct some students in how to take a test–instruction that is tailored to individual students and different from just attacking the questions themselves.  The results also tell us we need to create a different environment–one in which students can move about without disturbing others, and are less tied to a clock.

All of this leads me to a more outlandish proposition that I am still thinking about: Our school uses the DRP to measure where students are, but I’d like assessment to be more predictive about potential.  Why?  Because when an assessment just measures, I find the school’s reaction is to address what they think it measures.  So, those who tank the DRP get put into the standard Tier II reading program.  But if we can measure elements that go into that measure–like stamina–it gives us a better idea of what to address.  The potential is there.  The fix, then, might be more around Habits of Mind than more phonics.  At present, we are not sure.

Of course, our support services respond by offering more assessments.  But that is often guesswork and time consuming.  If the cafeteria with bacon wafting through the room is not conducive to results, I cannot imagine the forty minutes a special educator can give me to do a “quick” BES is much better.  And the coordinator who battered kids with an AIMS-Web in a noisy hallway (the only space available) produced little that was useful.  And, if anything is found, the student is often dumped into a program with a promise that “we’ll work on that” when they have time after the reading instruction is done.  No one has time.  In identifying causes, we might find the solution can be had with greater efficiency.

My hope is assessments that can be more predictive, and can be done by empowering students.  By having the students value the assessment, and understanding the consequences of their choices, they own it.  When we give them the tools to do their best work, they use them.  In the end, the measure becomes about reading.

Then we’ll have to find another test for stamina.