I am looking for an algorithmic way to solve the following problem.
Problem
Say we are given a multiple-choice test with 100 questions and 4 answers per question, exactly one of which is correct. Each correct answer is worth one point; wrong answers are worth zero points. Suppose we have a database D of many answer sheets and their corresponding scores, e.g.
D:= { ('ABAA...', 80), ('ABAB...', 80), ('ABAC...', 80), ('ABAD...', 81), ... }
How can we find out which answers are correct? I am not looking for something probabilistic, but for answers which are definitely correct.
Some ideas
There are some obvious strategies like:
- look for someone who scored 100 points; that sheet gives you all the answers
- look for pairs of tests whose answers differ in exactly one question (in the example database given, we can deduce that the answer to question 4 is "D")
- look for someone who answered everything wrong; you can rule out all of those answers
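The three strategies above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the function name `deduce_simple`, the representation of D as a list of `(answers, score)` tuples, and the 4-question toy data are all hypothetical and not part of the actual database.

```python
from itertools import combinations

def deduce_simple(sheets, num_questions):
    """sheets: list of (answers, score) pairs (hypothetical layout of D).

    Returns (known, ruled_out):
      known     maps a question index to its definitely correct answer,
      ruled_out maps a question index to answers that are definitely wrong.
    """
    known = {}
    ruled_out = {q: set() for q in range(num_questions)}

    for answers, score in sheets:
        if score == num_questions:        # strategy 1: a perfect sheet
            known.update(enumerate(answers))
        elif score == 0:                  # strategy 3: everything wrong
            for q, a in enumerate(answers):
                ruled_out[q].add(a)

    # strategy 2: pairs of sheets that differ in exactly one question
    for (a1, s1), (a2, s2) in combinations(sheets, 2):
        diff = [q for q in range(num_questions) if a1[q] != a2[q]]
        if len(diff) != 1:
            continue
        q = diff[0]
        if s1 > s2:                       # the higher score has the right answer
            known[q] = a1[q]
        elif s2 > s1:
            known[q] = a2[q]
        else:                             # equal scores: both answers are wrong
            ruled_out[q].update((a1[q], a2[q]))
    return known, ruled_out

# Toy data with 4 questions instead of 100 (made up for illustration).
toy = [("ABAA", 2), ("ABAB", 2), ("ABAC", 2), ("ABAD", 3)]
known, ruled_out = deduce_simple(toy, 4)
print(known, ruled_out[3])
```

Note that a pair with equal scores still carries information: both of the differing answers must be wrong, so two of the four options are eliminated for that question.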
But what information can we get from other combinations of answers?
Viewing the answer sheets as a metric space, we get
100 - S(test) = d(test, correct)
where d(.,.) is the Hamming distance, S(.) is the score, and correct is the correct answer sheet.
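To make the identity concrete, here is a small check in Python. The helper names `hamming` and `score` and the 4-question key are assumptions chosen for illustration; with n questions the identity reads n - S(test) = d(test, correct).

```python
def hamming(a, b):
    """Number of positions in which two equally long answer sheets differ."""
    return sum(x != y for x, y in zip(a, b))

def score(sheet, key):
    """Number of positions in which the sheet agrees with the key."""
    return sum(x == y for x, y in zip(sheet, key))

key = "CBAD"      # hypothetical correct sheet for a 4-question test
sheet = "ABAA"
# Every position either matches or differs, so the two counts add up to n.
print(len(key) - score(sheet, key) == hamming(sheet, key))
```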
Maybe someone could give me a reformulation of the problem that yields a more obvious implementation. Any contribution is appreciated.
Edit:
Ignoring computational complexity, couldn't I achieve something by intersecting the balls $$ \bigcap_i B_{d(t_i,\textrm{correct})}(t_i), $$ where the $t_i$ are the tests and $B_r(x) := \{y: d(x,y)\leq r\}$ is the ball of radius $r$ around $x$?
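For a tiny instance this intersection can be computed by brute force, which at least lets one experiment with the idea. The sketch below is an assumption-laden illustration: the names `candidates` and `forced_answers`, the `exact` flag, and the 4-question toy data are all hypothetical. With `exact=False` it intersects the balls as written above; with `exact=True` it intersects the spheres $\{y: d(t_i,y) = d(t_i,\textrm{correct})\}$ instead, which is possible because the scores give each distance exactly. The enumeration is exponential in the number of questions, so it is hopeless for 100 questions.

```python
from itertools import product

def hamming(a, b):
    """Number of positions in which two equally long sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def candidates(sheets, num_questions, choices="ABCD", exact=False):
    """Enumerate all keys lying in every ball (or every sphere if exact=True)."""
    result = []
    for key in product(choices, repeat=num_questions):
        consistent = True
        for answers, score in sheets:
            radius = num_questions - score      # d(t_i, correct)
            d = hamming(key, answers)
            if (d != radius) if exact else (d > radius):
                consistent = False
                break
        if consistent:
            result.append("".join(key))
    return result

def forced_answers(cands):
    """Questions on which all remaining candidate keys agree."""
    if not cands:
        return {}
    first = cands[0]
    return {q: first[q] for q in range(len(first))
            if all(c[q] == first[q] for c in cands)}

# Toy data with 4 questions (made up for illustration).
toy = [("ABAA", 2), ("ABAB", 2), ("ABAC", 2), ("ABAD", 3)]
print(forced_answers(candidates(toy, 4, exact=False)))
print(forced_answers(candidates(toy, 4, exact=True)))
```

An answer is "definitely correct" exactly when every candidate key that survives the intersection agrees on it, which is what `forced_answers` reports. On this toy input the balls alone do not pin down any question, since each ball also contains its own centre $t_i$, whereas the spheres force question 4 to be "D".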