This question is inspired by Determining information in minimum trials (combinatorics problem). Here is the problem statement, with some modifications:

A student has to pass an exam with $k$ questions, each to be answered yes or no, on a subject he knows nothing about. Assume the answers are independent, each being yes or no with probability $1/2$. The student is allowed to take mock exams that have the same questions as the real exam. After each mock exam, the teacher tells him how many answers he got right, and when the student feels ready, he takes the real exam. How many mock exams must the student take on average (i.e., in expectation) to guarantee he answers every question correctly on the real exam, and what is his optimal strategy?
This problem happens to be the topic of my bachelor's thesis, but I am now completely lost, so instead of striving for the optimal strategy, I decided to take a smaller step first: calculating the performance of some "sensible" strategies.
One of the strategies uses information entropy as a heuristic. The idea is that at each stage, the student constructs his answers so that the number of right answers has maximum entropy; here I consider the conditional distribution of the score given the information extracted from previous mock exams. This is essentially a greedy algorithm and does not guarantee a global optimum, but as I said, it is just a first step.
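To make the greedy heuristic concrete, here is a minimal sketch of how I implement it for small $k$. The names (`score_entropy`, `greedy_guess`) and the representation of the state as the set of answer keys still consistent with past feedback are my own choices, not part of the original problem statement:

```python
from itertools import product
from math import log2
from collections import Counter

def score_entropy(guess, candidates):
    """Entropy (in bits) of the score (number of right answers) when
    `guess` is submitted and the true key is uniform over `candidates`."""
    counts = Counter(sum(g == t for g, t in zip(guess, truth))
                     for truth in candidates)
    n = len(candidates)
    return -sum(c / n * log2(c / n) for c in counts.values())

def greedy_guess(k, candidates):
    """Greedy step: the answer vector whose score distribution has
    maximum entropy, given the consistent candidate set."""
    return max(product([0, 1], repeat=k),
               key=lambda g: score_entropy(g, candidates))

# Before any mock exam, all 2^k answer keys are equally likely,
# so every guess yields a Binomial(k, 1/2) score distribution.
k = 3
candidates = list(product([0, 1], repeat=k))
g = greedy_guess(k, candidates)
print(g, score_entropy(g, candidates))
```

After each mock exam, one would filter `candidates` down to the keys matching the reported score and repeat until a single key remains.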
The next step could be to maximize the expected entropy at the next stage, after taking the current mock exam with some specific set of answers. To elaborate, the expectation is taken with respect to the score on the current mock exam, because the maximum entropy achievable on the next mock exam depends on that score. As a special case, if all answers are recovered after a mock exam, I take the expected entropy to be $0$, because there is no information left to extract. Whereas the "baby's first step" approach plans $1$ move ahead, this strategy considers $2$ moves.
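A sketch of this one-step lookahead, under the same (my own, illustrative) state representation: for each possible score of the current guess, condition the candidate set on that score, find the best achievable score entropy on the following mock exam, and average over the score distribution. Posteriors of size one contribute $0$, matching the special case above.

```python
from itertools import product
from math import log2
from collections import Counter

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def score(guess, truth):
    return sum(g == t for g, t in zip(guess, truth))

def expected_next_entropy(guess, candidates, k):
    """Expected (over the score of `guess`) value of the best score
    entropy achievable on the *next* mock exam."""
    n = len(candidates)
    by_score = {}
    for t in candidates:
        by_score.setdefault(score(guess, t), []).append(t)
    total = 0.0
    for posterior in by_score.values():
        if len(posterior) == 1:
            continue  # all answers recovered: remaining entropy is 0
        best = max(entropy(Counter(score(g, t) for t in posterior))
                   for g in product([0, 1], repeat=k))
        total += len(posterior) / n * best
    return total
```

The two-move strategy then picks the current guess maximizing this quantity (possibly combined with the current-stage entropy); only feasible for small $k$, since it scans all $2^k$ guesses at each stage.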
Next, we can analogously consider $3$ moves, $4$ moves, etc., until we find a strategy that recovers all answers at some point. Intuitively, this process should yield the optimal strategy, because maximizing entropy means extracting as much information as possible from each mock exam.
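As a sanity check for whether the entropy lookahead actually converges to the optimum, for very small $k$ one can sidestep the heuristic entirely and compute the true minimum expected number of mock exams by exhaustive recursion over candidate sets. This is my own brute-force benchmark, not part of the problem statement:

```python
from itertools import product
from functools import lru_cache
from math import inf

def optimal_expected_exams(k):
    """Exact minimum expected number of mock exams needed to determine
    all k answers, by recursion over candidate sets (tiny k only)."""
    guesses = list(product([0, 1], repeat=k))

    @lru_cache(maxsize=None)
    def value(candidates):  # frozenset of answer keys still possible
        if len(candidates) == 1:
            return 0.0  # answers fully determined; no more mocks needed
        best = inf
        for g in guesses:
            by_score = {}
            for t in candidates:
                s = sum(a == b for a, b in zip(g, t))
                by_score.setdefault(s, []).append(t)
            if len(by_score) == 1:
                continue  # this guess yields no information
            cost = 1 + sum(len(p) / len(candidates) * value(frozenset(p))
                           for p in by_score.values())
            best = min(best, cost)
        return best

    return value(frozenset(guesses))
```

The state space is doubly exponential in $k$, so this only serves to validate the heuristic strategies on $k \le 4$ or so.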
The problem is: does taking the expectation of the entropy make sense, and how do I write a rigorous proof to convince others? To reiterate, this will be my bachelor's thesis, so please just leave hints/pointers instead of spoiling too much :)