0:00
[MUSIC]
>> Gavin, we've been talking a lot over the whole course, weeks one,
two, three, and four about assessment and about multiple choice.
And in these two weeks, you
talk a bit more about subjective assessment.
Well, that sounds pretty dodgy - "subjective
assessment", because we've had all this objective assessment.
So, what are the virtues, if you like,
and what are the cautions around this whole notion of subjective assessment?
1:42
And the tragedy of being a human being is that we're really bad at scoring.
We're really terrible at it.
So many studies have been done
giving university professors essays to mark and then
three weeks later giving them the same essays
to mark again, and you're lucky if people
get a mark within three marks out of
twenty compared to the first time they got marked.
>> Seriously worrying.
>> Yeah, higher education marking is, you know, a lottery in some cases.
2:13
So, the problem is, as humans we're easily distracted
by environmental things, like, well, in the west everyone will
tell you about how they're constantly interrupted by their cell
phones and their texts and their emails and their twitters.
And it's when you're not paying attention or
if you're paying attention to the wrong things, you just
make poor quality judgments.
That's why they ban answering your cell phone in your car while you're driving,
because next thing you know you're not paying attention.
>> So, when it comes to schooling, the workloads for marking, I remember being an
English teacher, and the workloads were often very large because we would mark
our own class's work and then if it was a mock exam
or an assignment that we count towards a final grade, we would then switch marking -
so, I have to mark another class's.
Now, that was fine if you're a math
teacher and you just go "Tick, seven, nine, seven."
But, when you actually have to read something the student's written and
make a weighted judgement as to its merits, that's a complicated thing.
3:21
It's not just a question of judging which
one's taller or shorter, or which one's heavier or lighter -
you're asking on a multi-dimensional scale, what are the
various virtues and qualities of a piece of student work.
And this is just difficult to do.
And, we have mechanisms that improve the quality of our marking, like
guided rubrics or exemplars, that show us what quality looks like.
But, even with those tools, it's easy
to get distracted when you're tired and
it's 11 p.m.
and you have to be up at whatever and you're still marking and
the boss is saying the marks have to be handed in tomorrow at 8 a.m.
and you're only halfway there - you can bet that
there's a huge error component as time goes on.
And the consistency of your marking is
rapidly diminishing.
>> Yes, you trigger a thought there,
if you know the research into judges giving sentences.
They give tougher sentences in the morning
and then I think in the afternoon when they're tired--
>> Or maybe they've had one too many during lunch!
>> Absolutely, that's part of it.
So, the answer here is don't
have one too many when you're marking.
>> Oh, definitely not.
4:56
the average marks that they gave were lower
than if they could make the teacher feel happy.
>> So, you know, having happy, smiling kittens
around you before you start marking, might
give your students more credit than they uh--
>> So, we have a proposal here.
>> Yeah.
>> Happy, smiling kittens
when you're doing the marking.
>> Yeah, I mean humans are-- we're soft-willed beings, we're easily led astray.
And then this is the tragedy, we're making
decisions about children's lives that matter to them,
both whether it's for an external qualification or
a grade that makes a difference for promotion or
for streaming, or simply to be able to
identify, "What feedback should I give this kid?"
5:48
One of the reasons journals invite authors
to review is because you've jumped through the hoops,
you know how to get published, so you must understand what it takes,
when you read someone else's manuscript, what it takes to get over the line.
Maybe part of the problem is that as teachers, we're not
doing these things that we're asking children to do, enough ourselves?
Can you really teach and mark student writing if you're not writing regularly?
Should you be teaching literature if you don't read regularly?
6:34
>> Yeah, there's some very reassuring kind of messages
in there and some kind of slightly worrying
messages as well.
I mean about this whole business of
assessment, which you've obviously devoted a great deal of
time in your life to exploring,
and I think the whole course really gets down to
some of those really key issues and then actually assesses people at the same time.
One of the things it does,
of course, is this three people marking an essay.
And, there you're looking for, I suppose, inter-rater reliability.
>> Yes.
>> I've just come from a Coursera conference in London,
actually, just two days ago, and what they were saying was that actually
students marking one another's essays, subjective marking, proved to
be just about as good as when the professors mark them.
I kind of sat back a bit and thought "That's... Hmm...
a bit surprising" - is that surprising?
7:34
>> The toughest marker in the university is the
newly-minted grad student, and the professor is usually much more relaxed.
So, if the students take seriously the content and
the standards that are being promulgated and they're writing an essay task
around that and they've engaged with it, they should
be able to judge each other around those criteria.
8:02
And if the content is reasonably - well, I
guess it depends on the design of the task -
if the task lends itself to a fairly clear description of
what it is you have to do in this essay,
then it's much easier to mark, than if it is very
open and broad, "Reflect on and give your personal
opinion on", whereas if it's more of an academic display of
knowledge or interpretation and understanding
based on evidence, then it's much
easier to mark in a similar way to the teacher.
The experts.
Because it's more constrained.
8:53
we might have a wide variety of opinions, as to whether it's good.
And you know, the same with judging movies and music, so much more
subjective, so much less concrete in terms of the standards we're going to use.
>> Yeah, I just watched The Wolf of Wall Street and wondered why
my judgment of that movie is so at odds with--
>> The critics?
>> --with the critics' judgement, for example, yeah.
>> Or, the other example that you give, which
is the nice one about subjective judgement, is figure skating.
>> Yes.
>> Elaborate a bit on that.
>> Sure, well, Olympic judging has to happen
in the moment and on the fly, so they only do each sub-skill in their
routine, whether it's diving or figure skating, in a very short period of time.
So, the people sitting in the panel have to very
quickly decide on an impression basis, how good was this?
And, because there's the tension and the probability of bias, they simply
cancel out some of the bias by removing the highest score and the lowest score,
so that if a judge is being bribed to give a
high score or a low score we just get rid of it.
So, that gets rid of some extreme responses.
And then, there's a panel of five or seven judges left over and we average that.
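The trimming step described here can be sketched in a few lines of Python. This is only an illustration of the idea as stated in the conversation (drop one highest and one lowest score, then average what is left); the function name `trimmed_panel_score` and the sample panel are hypothetical, and real Olympic scoring rules are more detailed than this.

```python
def trimmed_panel_score(scores):
    """Drop the single highest and single lowest score, then average the rest.

    Hypothetical helper for illustration; not an official scoring rule.
    """
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both extremes")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # remove one lowest and one highest score
    return sum(kept) / len(kept)


# Assumed example: nine judges, one of whom gives a suspiciously low score.
panel = [5.6, 5.8, 5.7, 5.9, 4.2, 5.8, 6.0, 5.7, 5.8]
print(round(trimmed_panel_score(panel), 2))
```

With nine judges, trimming leaves seven scores to average, so a single bribed or careless judge at either extreme no longer moves the result.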
But, we don't just rely on one set of scores -
in figure skating, there's technical
merit, and artistic, but it's also multiple performances.
There are compulsory figures as well as free, and so what
you get is multiple sets of data judged multiple times by
multiple experts and by the time it gets to the end,
you have a pretty good idea as to who is actually best
and the difference between first and second
is often in the hundredths of a point.
10:54
But, it's consistent enough across all of those instances
to lead to a reasonably robust judgment.
There have been cases of cheating and at
least the Olympic committees have found it
and disclosed it.
The Salt Lake City's figure skating comes to
mind when Canada was cheated out of the gold,
but we got it back after we found out about the cheating. But without those
multiple judges, and multiple instances, it's hard to really say very much
about the quality of work.
The generalizability theory research people have shown that
to get reliable scoring of student writing, for example,
you need anywhere between three and five pieces of
writing, judged by anywhere between three and seven judges.
11:51
And, in one medical school study, they found that you needed a minimum of
three to four hours testing, before you could
make a reliable judgment about a student's competence.
So, these are difficult skills to judge, that
if we're going to use them robustly to make big decisions,
we better be pretty confident of the quality of the judgments we're making.
>> Yeah.
>> And, we used to-- in my high
school teaching, we used to be satisfied with two teachers marking.
You mark your class and I mark your class, and vice versa.
And we used to mark our student essays out
of 20 marks and generally we had a rule of
thumb that said if we were within three marks of
each other out of 20, we just split the difference.
12:45
And, if we were within only one and a
half marks, we always took the higher of the two.
But if we were more than three marks apart, then we would have
to argue why it was higher or lower than the other person suggested.
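The rule of thumb just described can be written out as a small Python sketch. The function name `reconcile_marks` and its parameters are hypothetical, and treating "within" as inclusive of exactly 1.5 and exactly 3 marks is an assumption; the conversation does not say how the boundaries were handled.

```python
def reconcile_marks(first, second, close=1.5, tolerance=3.0):
    """Combine two markers' scores (out of 20) using the rule of thumb.

    Returns the agreed mark, or None when the two markers are too far
    apart and have to discuss the essay. Boundaries are assumed inclusive.
    """
    gap = abs(first - second)
    if gap <= close:
        return max(first, second)      # very close: take the higher mark
    if gap <= tolerance:
        return (first + second) / 2    # fairly close: split the difference
    return None                        # more than three apart: debate it


print(reconcile_marks(14, 15))    # one mark apart
print(reconcile_marks(12, 14.5))  # two and a half marks apart
print(reconcile_marks(10, 15))    # five marks apart, needs discussion
```

Returning `None` for the third case keeps the discussion step explicit: the code does not try to settle a disagreement that the markers resolved by arguing it through.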
And we would compromise after listening to the insights the other marker had
and we found that in general because
13:20
usually we were only discrepant on 25%.
Now, 25%, is still a lot of work to go over and debate
and you can get really passionate about it - "No
way is this anything higher than a bare pass",
and the other person thinks it's an amazing piece of work, and you go "No, no, no, no.
It hasn't done this, hasn't done this", but
she's going "It's done this, it's done this."
And this is the wonderful debate where we learn what to
value and how to value it and how to identify it.
14:00
>> You know, you go to a physician
and he says, "How long have you been in practice?"
You know, "Wait a minute, I don't want you to practice on me.
I want you to be good."
And then teachers are still in practice in a sense,
and that's a sensible position to take.
We're in practice.
That means, I'm better this year than I was last year and I'll
probably be better in two years than I am now, if I keep
paying attention professionally to what I'm doing.
>> Yeah, that's not really reassuring for me being in your class, however.
>> Well, fortunately my marking is cross-checked by someone else.
14:37
And then the final decision is given on the consensus of those two markers.
So, in that case, there is some protection for the individual.
But, yeah, it's the "What if I got the weakest teacher in the school?" effect.
>> Mm-hm.
>> And that is a real tension.
15:01
Johnny's parents don't want him taught by the weak teacher.
Johnny wouldn't want to be taught by the weak
teacher, and I don't want to be the weak teacher.
But somebody's the weak teacher
in my school, and so, it's up to
us to be professional, to say "Well, how
can we build systems that help compensate for
our competencies, or weaknesses in our competencies."
But that, the scale of our competencies are not so
severe that we shouldn't even be teachers in the first place.
15:32
Most countries have entry standards and entry requirements and if
you can't pass those, you don't get to be a teacher.
>> This is where in a commonwealth context,
of course, the whole thing becomes extremely fraught
because we have such a range, from expert teachers
from Canada and New Zealand,
who are at the very peak of their profession,
and at the other end you have teachers in
some of the African countries, who have no qualifications, who have
150 children in their class.
We're trying to do all of that, so that is our constituency.
These are the people these courses are for and they come online and they--
But the wonderful thing about the whole program is that people who
are in, say, Ghana or in another African country and who are struggling
16:19
don't necessarily get the best advice from the people in New Zealand or Canada,
but from people who are in similar situations, and are able to say to
them, "Now, have you thought of the small things that you can do in your context?"
But, I think everything that you've done
and talked about here, Gavin, on this course,
is going to be extremely provocative, I think, in the best
sense of the word, for the people who are taking this course
and I think it's an extremely valuable course to have and the insights
from these discussions that we've had,
I think you're going to contribute immensely.
So, thank you very much indeed - a) for agreeing to do
the whole course, but for allowing us to have these kind of conversations.
>> Sure, you're welcome.
It's been a pleasure.
[MUSIC]