[ Parent Question — How Can People Evaluate Complex Questions Consistently? ]

Can this model grade a test without knowing the answers?

by Elizabeth · 31st Aug 2019 · 1 min read · 3 comments

In the 2012 paper "How To Grade a Test Without Knowing the Answers — A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing" (PDF), Bachrach et al. describe a mathematical model that starts with

  • a set of questions without answers
  • a set of answerers (participants) of unknown quality

and produces assessments of:

  • correct answers for the questions
  • difficulty estimates for the questions
  • evaluations of the participants, including overall reliability and areas of expertise

Can the model actually do this? Is it the best model for doing this? What kinds of problems does it handle well and poorly?
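For concreteness, here is a minimal sketch of the kind of unsupervised inference involved. It is not the paper's full Bayesian graphical model (which also infers question difficulty); it is a simplified EM-style aggregator in the same spirit, with a single ability parameter per participant and errors assumed uniform over the wrong options. All function and variable names are illustrative, not from the paper.

```python
# A minimal sketch of unsupervised answer aggregation -- NOT the paper's full
# Bayesian graphical model, but a simplified EM scheme in the same family:
# one ability parameter per participant, errors uniform over wrong options.
import numpy as np

def infer_answers(responses, n_choices, n_iters=50):
    """responses: (n_participants, n_questions) array of chosen options 0..K-1.
    Returns (posterior over each question's answer, inferred participant ability)."""
    n_participants, n_questions = responses.shape
    ability = np.full(n_participants, 0.7)  # initial guess: everyone is decent
    for _ in range(n_iters):
        # E-step: posterior over each question's true answer, given abilities.
        log_post = np.zeros((n_questions, n_choices))
        for j in range(n_participants):
            log_right = np.log(ability[j])
            log_wrong = np.log((1.0 - ability[j]) / (n_choices - 1))
            for k in range(n_choices):
                log_post[:, k] += np.where(responses[j] == k, log_right, log_wrong)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: a participant's ability is how often they agree with the
        # current belief about the answers.
        for j in range(n_participants):
            ability[j] = post[np.arange(n_questions), responses[j]].mean()
        ability = ability.clip(1e-3, 1 - 1e-3)  # keep the logs finite
    return post, ability

# Toy demo: simulate participants of varying skill on 8-option questions,
# with all errors genuinely random.
rng = np.random.default_rng(0)
K, n_q, n_p = 8, 60, 120
truth = rng.integers(K, size=n_q)
skill = rng.uniform(0.2, 0.8, size=n_p)
answered_right = rng.random((n_p, n_q)) < skill[:, None]
responses = np.where(answered_right, truth, rng.integers(K, size=(n_p, n_q)))
post, ability = infer_answers(responses, K)
print("fraction of answers recovered:", (post.argmax(axis=1) == truth).mean())
```

Even this stripped-down version recovers most answers in simulation when errors are uncorrelated; the answer below pokes at what happens when they are not.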


1 Answer

One example of what the model can do: in the paper, it is run on 120 responses to a quiz of 60 Raven's Progressive Matrices questions, each with 8 possible answers. As it happens, no responder got more than 50 questions right, yet the model correctly inferred the answers to 46 of them.

A key assumption in the model is that errors are random. So in domains where you ask only a small number of questions, and where for most questions you have a priori reason to expect some wrong answer to be more common than the right one (e.g. "What's the capital of Canada/Australia/New Zealand?"), I think this model would not work, as the simulation below illustrates. (If there were enough other questions to give good estimates of responder ability, that could ameliorate the problem.) If I wanted to learn more, I would start with this 2016 review paper of the general field.
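To make that failure mode concrete, here is a toy simulation reusing the infer_answers sketch from the question above (same caveats apply: a simplified stand-in for the paper's model, with illustrative names). A popular decoy answer wins outright, and the participants who chose it get rated as the most able:

```python
# Correlated-error failure mode, reusing infer_answers from the sketch above.
# Option 0 is always correct, but option 1 is a tempting decoy that most
# participants choose (think "Sydney" for the capital of Australia).
import numpy as np

rng = np.random.default_rng(1)
K, n_q, n_p = 4, 5, 100
truth = np.zeros(n_q, dtype=int)   # the real answers
decoy = np.ones(n_q, dtype=int)    # the popular wrong answers
responses = np.empty((n_p, n_q), dtype=int)
for j in range(n_p):
    u = rng.random()
    if u < 0.6:                    # 60% consistently pick the decoy
        responses[j] = decoy
    elif u < 0.9:                  # 30% know the right answers
        responses[j] = truth
    else:                          # 10% guess at random
        responses[j] = rng.integers(K, size=n_q)

post, ability = infer_answers(responses, K)
print("inferred answers:", post.argmax(axis=1))  # all 1s: the decoy wins
print("true answers:    ", truth)                # all 0s
```

The aggregator cannot distinguish a reliable majority from a shared misconception; only enough questions where errors really are uncorrelated would let it estimate abilities well enough to overrule the crowd.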