Elo Balanced Grading

How do you fairly measure two students of different academic ability?

One way is to use a rubric. It’s cold and fair, scoring where each student falls on the assignment. Aligned with the Common Core, it objectively shows whether a student is on grade level. But if you have heterogeneous classes, how can you push that scoring to better reflect progress, discourage coasting, and add more options to your differentiated offerings?

Elo ratings.

Think sports. On a traditional scoreboard, one team earns a win and the other a loss, whether the teams are evenly matched or mismatched and however close the score. After the match, one team is 1-0 and the other 0-1. Those of us who have watched underdogs come close, or even triumph, know that the score does not always reflect what happened on the field. Elo weighs those factors, not just the result.

The Elo rating system is a method for calculating the relative skill levels of players in head-to-head games. Created by Arpad Elo, a Hungarian-born American physics professor, it was originally used for chess. It has since been adapted for other competitive sports, including football and soccer.

In short, Elo gives each competitor a rating before a match, reflecting their ability, uses the gap between the two ratings to predict the outcome, and then updates both ratings according to how the actual result compares with that prediction. Let’s say a poor team plays a very good one. Intuitively, we expect the poor team to lose. If it does, its Elo rating goes down, but not by much: there is no surprise in the result. The very good team rises in Elo, but also not by much, again because there is no surprise. We expected that outcome.

If, though, the poor team wins, it gains a lot of Elo points and the very good team loses the same amount. The poor team earns far more for this upset than the very good team would have earned for winning, because the victory required playing above its norm.
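To make the arithmetic concrete, here is a minimal Python sketch of the standard Elo update. The 400-point scale and the K-factor of 32 are conventional chess values, and the 1200 and 1600 ratings are made-up examples, so treat the numbers as illustrative. The expected score is a logistic function of the rating gap, and each side moves by K times the difference between what happened and what was expected.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B on the standard Elo logistic curve (0 to 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b).

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw. Whatever A gains,
    B loses, so the update is zero-sum.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: a weak team (1200) against a strong one (1600).
poor, good = 1200.0, 1600.0
print(elo_update(poor, good, 0.0))  # expected loss: both ratings barely move
print(elo_update(poor, good, 1.0))  # upset win: both ratings move a lot
```

With these numbers the weaker team’s expected score is about 0.09, so a loss costs it roughly 3 points while an upset win earns it roughly 29; the stronger team moves by the same amounts in the opposite direction.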

With a more advanced algorithm, Elo can also account for the closeness of the score. If the poor team should have lost by a huge margin but the game was close, it loses only a few points; the system recognizes that it played above its norm. Similarly, the very good team gains only a few points, even with a win, because the game should have been a blowout.
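One way to sketch the margin-aware version is to scale the K-factor by a multiplier that grows with the margin of victory. The multiplier used here, log(margin + 1), is a hypothetical choice, as are the function name update_with_margin and the K of 20; a one-point game barely moves the ratings, while a blowout moves them several times as much. Real sports rating systems that use margins typically add further corrections, which this sketch omits.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B on the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_with_margin(rating_a: float, rating_b: float,
                       points_a: int, points_b: int,
                       k: float = 20.0) -> tuple[float, float]:
    """Elo update whose step size grows with the margin (hypothetical multiplier)."""
    if points_a > points_b:
        score_a = 1.0
    elif points_a < points_b:
        score_a = 0.0
    else:
        score_a = 0.5
    # Treat a tie like a one-point game so draws still update the ratings.
    margin = max(abs(points_a - points_b), 1)
    # log(margin + 1): close games give a small step, blowouts a larger one,
    # but a huge blowout does not swamp the rating.
    multiplier = math.log(margin + 1)
    delta = k * multiplier * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical: a 1200 team loses narrowly, then badly, to a 1600 team.
print(update_with_margin(1200.0, 1600.0, 20, 21))  # close loss
print(update_with_margin(1200.0, 1600.0, 0, 35))   # blowout loss
```

Here a 20-21 loss costs the underdog only about a point, while a 0-35 loss costs it several.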

Student Scores, Assignments and Differentiation

On a rubric scaled 1 to 4, the struggling student will typically earn a 2. That score is accurate, but it does not take into account risk, progress, or the difficulty of the task. Similarly, a high-achieving student typically collects 3s and 4s. No surprise. It is the sports equivalent of being 0-1 or 1-0.

With Elo, the struggling student is scored against the assignment itself, earning more for tackling a more difficult one, even if they fall short. So if a student who typically earns a 2 does well on a particularly difficult assignment, they might earn a personal score of 2.5. Likewise, if the high flyer decides to coast, their scores will not be as stellar as if they had chosen a harder assignment: perhaps a 3.5 personal score instead of an automatic 4.
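A minimal sketch of scoring a student against an assignment treats each attempt as a match. Everything here is an assumption for illustration: assignments carry teacher-assigned ratings on the same scale as students, the K of 32 is borrowed from chess, and a 1-4 rubric score is mapped linearly onto Elo’s 0-1 outcome (1 becomes 0, 4 becomes 1), so partial credit counts as a partial win. The function names are hypothetical.

```python
def expected_score(student: float, assignment: float) -> float:
    """Expected outcome for a student against an assignment (Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((assignment - student) / 400.0))


def rubric_to_outcome(rubric_score: float) -> float:
    """Hypothetical linear mapping of a 1-4 rubric score onto Elo's 0-1 scale."""
    if not 1 <= rubric_score <= 4:
        raise ValueError("rubric score must be between 1 and 4")
    return (rubric_score - 1) / 3.0


def score_attempt(student: float, assignment: float, rubric_score: float,
                  k: float = 32.0) -> float:
    """Rating change for the student; the assignment would move by the opposite amount."""
    return k * (rubric_to_outcome(rubric_score) - expected_score(student, assignment))


# Hypothetical ratings: a student at 1300, a hard assignment at 1600, an easy one at 1200.
print(score_attempt(1300.0, 1600.0, 2))  # fell short on the hard one: still a gain
print(score_attempt(1300.0, 1200.0, 2))  # same 2 on the easy one: a loss
print(score_attempt(1700.0, 1200.0, 4))  # high flyer coasting: a tiny gain
```

For a student rated 1300, a 2 on the hard assignment still raises the rating because the expected outcome was low, while the same 2 on the easy assignment lowers it. A student rated 1700 who earns a 4 on the easy assignment gains under 2 points, which is the coasting case. Translating the rating back into a personal score on the 1-4 scale is a separate design choice not shown here.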

That is not to say the rubric should go out the window; it remains the objective measure. It would make sense to keep two scores: a cold rubric score as the norm, plus an Elo personal score. The former is the equivalent of the scoreboard, the latter the classic Elo. The personal score would show the struggle.

Many teachers already offer differentiated assignments. Now they can make each assignment’s difficulty rating clear. Students could choose based not only on interest, but also on challenge.

Going further, teachers could offer their normal assignments and use student results to rate each assignment’s difficulty: the scores of many students, each with their own rating, would place the assignment relative to the others. There is a bit of a bell curve to this, but that is why the Elo is reserved for the personal score.
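To let results rate the assignments, the same update can run over a log of submissions, moving the student and the assignment in opposite directions. This Python sketch assumes every student and assignment starts at a hypothetical 1500, uses the same illustrative 1-4 to 0-1 rubric mapping, and processes records in the order they arrive; the function name calibrate, the K of 24, and the sample records are all invented for illustration.

```python
from collections import defaultdict


def expected_score(student: float, assignment: float) -> float:
    """Expected outcome for a student against an assignment (Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((assignment - student) / 400.0))


def calibrate(submissions, k: float = 24.0, start: float = 1500.0):
    """Process (student, assignment, rubric 1-4) records in order, updating both sides.

    A low rubric score counts as a 'win' for the assignment, so assignments that
    students struggle with drift upward (harder) and easy ones drift downward.
    """
    students = defaultdict(lambda: start)
    assignments = defaultdict(lambda: start)
    for student, assignment, rubric in submissions:
        outcome = (rubric - 1) / 3.0  # hypothetical 1-4 -> 0-1 mapping
        delta = k * (outcome - expected_score(students[student], assignments[assignment]))
        students[student] += delta
        assignments[assignment] -= delta
    return dict(students), dict(assignments)


# Invented sample data: the same three students on an essay and a worksheet.
records = [
    ("ana", "essay", 2), ("ben", "essay", 1), ("cy", "essay", 2),
    ("ana", "worksheet", 4), ("ben", "worksheet", 3), ("cy", "worksheet", 4),
]
student_ratings, assignment_ratings = calibrate(records)
print(assignment_ratings)
```

In the sample, the essay, where students earned 1s and 2s, ends above 1500 (harder), and the worksheet ends below it. Because each update is zero-sum, the total of all ratings stays fixed, so an assignment can only become "harder" relative to the others.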

Elo Teacher Scoring

Another application is in balancing grades across different teachers.

When a local high school implemented Proficiency-Based Learning (PBL) and Proficiency-Based Grading (PBG, aka SBG), it did not go smoothly. Teachers who embraced the formative/summative model tended to produce higher grades, on average, than teachers who clung to the older grading system. At issue were the lead-up to assessments, the supports, and the allowance of retakes in the PBL classes: with the emphasis on mastery, all that mattered was getting it in the end. The old-school teachers tended to use one-and-done assessments, average everything into the grade, dock points for late assignments, and use a 100-point scale.

Whatever your position on PBL and PBG, the rollout exposed the age-old debate of “easy” teachers vs. the old-school hard -ss. As GPA becomes more and more important for college acceptance (even if that is only a teenager’s perception), many students are avoiding hard graders and hard classes because of real and perceived grading discrepancies. That’s a crime.

But what if teachers were rated using Elo? Each class’s grades could be averaged, and the teacher assigned an Elo rating based on that average. Then, when a student earns a grade in that course, a second Elo score could accompany it to indicate the teacher’s relative difficulty. A more advanced algorithm might weigh each student’s GPA against the grade earned in each class, and even adjust the teacher’s Elo based on that. Even if it never appears on a transcript, it might at least give administrators and teachers a sense of grade inflation or deflation. Correlated with other measures, such as the SAT, it might show which teachers make students earn their learning and which are just difficult.
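The more advanced teacher version can be sketched by treating each grade as a match between a student and a teacher. This is a hypothetical design, not an established method: grades and GPAs are on a 0-4 scale, each student is seeded from their GPA so that, against a teacher rated 1500, the expected outcome equals GPA / 4 (the inverse of the Elo curve), and student ratings are held fixed while only the teacher’s rating moves. The function names, the K of 16, and the clamping bounds are arbitrary choices.

```python
import math


def gpa_to_rating(gpa: float, anchor: float = 1500.0) -> float:
    """Hypothetical seeding: against a teacher rated `anchor`, expectation = gpa / 4."""
    p = min(max(gpa / 4.0, 0.025), 0.975)  # clamp so 0.0 and 4.0 GPAs stay finite
    return anchor + 400.0 * math.log10(p / (1.0 - p))


def expected_score(student: float, teacher: float) -> float:
    """Expected grade fraction for a student under a teacher (Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((teacher - student) / 400.0))


def rate_teacher(grades, start: float = 1500.0, k: float = 16.0) -> float:
    """grades: (student_gpa, grade_points_earned_in_this_class) pairs, 0-4 scale.

    Grades below what a student's GPA predicts push the teacher's rating up
    (harder grading); grades above the prediction push it down.
    """
    teacher = start
    for gpa, grade in grades:
        student = gpa_to_rating(gpa)
        teacher -= k * (grade / 4.0 - expected_score(student, teacher))
    return teacher


# Invented classes with the same three GPAs but different grading.
tough = rate_teacher([(3.6, 3.0), (3.0, 2.0), (2.4, 2.0)])
lenient = rate_teacher([(3.6, 4.0), (3.0, 3.7), (2.4, 3.3)])
print(round(tough), round(lenient))
```

Against a teacher still rated 1500, a student who earns exactly their GPA leaves the rating unchanged; a class that consistently earns less than its GPAs predict pushes the teacher above 1500, and a class that consistently earns more pushes the teacher below it.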



