2

Our design-of-algorithms class requires all students to enroll in an online AI competition, where each team has to come up with a bot. Before the final lockdown, each team is allowed to challenge any other team in order to test its strategy, including the random bot provided by the course assistants.

For the first testing round, each team had to play $10$ matches against a random bot provided by the course staff. By random I mean a bot that chooses a move uniformly at random from the set of moves available to it in that game state. Against the random bot you get $0$ points for a draw, $+1$ for a win and $-1$ for a loss.

Unlike other teams, we chose to avoid hardcoding and went instead with a version of the minimax algorithm adapted to this game. Needless to say, our strategy is far from flawless, but it's a lot better than what most other teams came up with.

Relevant facts:


$\bullet$ We lost $5$ times and won $5$ times in the testing round. So we got $0$ points.

$\bullet$ During our practice matches we got an $80\%$ win rate against the random bot.

$\bullet$ Also during the practice matches we played against a lot of the other competing teams. One of the teams we played against had a very weak, hardcoded strategy; we won all $4$ of our matches against their bot. Another team (their bot was also hardcoded), which we played against $4$ times, managed to beat us once, while we beat them the other $3$ times. The former got a score of $10/10$ in the testing matches, while the latter got $7/10$ ($3$ of the $10$ were draws).

$\bullet$ Neither of the two teams I mentioned above updated their strategy between the time they played against us and the time of the testing.

$\bullet$ We had the worst score of all the teams tested in this round, even though more than half of them were much weaker than us (as we saw in the practice matches).

$\bullet$ The rules of the game in question can be found here (MSE link).


Not much can be done about our wasted time, but I would really love to see whether there is a mathematical way to show that their grading is flawed. I'm certain the randomness factor is quite relevant here, but I don't have any training in probability theory or chaos theory, so I can't model this situation.

How would you mathematically prove the grading system is wrong?

Victor
    Please rewrite your post to focus on the real issue. The rules of the competition do not matter at all. It sounds like your complaint is that the results in the run that counted do not match the practice runs. With short runs like this, statistical variation is large. – Ross Millikan Mar 31 '16 at 15:35
  • To me it seems reasonable that a simple program that just tries to get to a victory position does very well against any opponent that just lets it take its victory without putting up a fight (like the random bot), even if it does badly against a bot that actively blocks its winning chances (like your bot). So your classmates' results aren't very surprising. As for your own results: if we take your 80% for granted, there is a 3.3% chance of losing 5 or more times. Could be a mistake, could be bad luck. – Marc Mar 31 '16 at 17:04
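
For reference, the $3.3\%$ figure from the last comment can be checked directly, under the simplifying assumptions that draws are ignored and that each of the $10$ games is independently won with probability $0.8$ and lost with probability $0.2$; the number of losses $L$ is then binomial, so

$$P(L \ge 5) = \sum_{k=5}^{10} \binom{10}{k} (0.2)^k (0.8)^{10-k} \approx 0.033.$$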

2 Answers

2

The grading system proposed here is unfair in the sense that it is random. Suppose a win chance of $p$ and a loss chance of $q$ against the random bot, where $p+q\le1$. Then the probability of getting a high score in ten games increases as $p$ gets higher and/or $q$ gets lower.
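
To make this a bit more concrete (treating the ten games as independent, which is an assumption): if each game contributes $+1$ with probability $p$, $-1$ with probability $q$ and $0$ otherwise, the total score $S$ over $10$ games satisfies

$$\mathbb{E}[S] = 10(p-q), \qquad \operatorname{Var}(S) = 10\left[(p+q) - (p-q)^2\right],$$

so the score is centred on the program's true strength, but it can scatter quite far from that centre in only ten games.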

However, suppose you have a very high $p$ and a very low $q$. Then it is still possible to get a low score, resulting in an error between the actual level and the perceived level of the program. Since, I assume, every move in the game is calculated by a computer program, you could just run the programs against each other a large number of times, say $1000$ times. Note that the resulting score is still random. However, by the Law of Large Numbers, we know that in the limit the fraction of games won converges to $p$, and thus the error between the actual and the perceived level converges to zero. In other words, by playing more games the probability of a large error becomes smaller and smaller, resulting in a high probability of getting a fair score.
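
Here is a minimal simulation sketch of this argument; the probabilities $p = 0.8$ and $q = 0.15$ below are made-up placeholders, not measured values. It shows how the spread of the average score per game shrinks as the number of games grows.

```python
import random

def simulate_score(p_win, p_loss, n_games, rng):
    """Total score over n_games: +1 per win, -1 per loss, 0 per draw."""
    score = 0
    for _ in range(n_games):
        u = rng.random()
        if u < p_win:
            score += 1
        elif u < p_win + p_loss:
            score -= 1
    return score

def average_score_spread(p_win, p_loss, n_games, n_trials=10_000, seed=0):
    """Mean and standard deviation of the average score per game over many trials."""
    rng = random.Random(seed)
    averages = [simulate_score(p_win, p_loss, n_games, rng) / n_games
                for _ in range(n_trials)]
    mean = sum(averages) / n_trials
    std = (sum((a - mean) ** 2 for a in averages) / n_trials) ** 0.5
    return mean, std

# Placeholder win/loss probabilities -- assumptions, not data from the question.
for n in (10, 100, 1000):
    mean, std = average_score_spread(0.8, 0.15, n)
    print(f"{n:5d} games: average score per game ~ {mean:.3f} +/- {std:.3f}")
```

With these placeholder numbers the per-game average stays near $p - q = 0.65$ in every case, but the scatter over only $10$ games is roughly ten times larger than over $1000$ games, since the standard deviation shrinks like $1/\sqrt{n}$.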

A completely different argument is that your programs are scored only on their performance against a random bot. Note that a program could do badly against a random program but perform very well against all other strategies. In most eyes this program would be considered superior to a program which loses to everything that is not random but has a high winning chance against a random program.

Marc
0

You need to define what you mean by the grading system being wrong. With short runs like this it is reasonably likely that the results will not match the expected result over a long run of matches. It sounds like the grading system had an announced algorithm for scoring and you have not presented anything that would suggest the algorithm was not implemented correctly.

Ross Millikan
  • The goal of the competition is to build a functional AI from scratch. If I haven't provided a convincing enough argument, how about the fact that even those who implemented bots that were mostly random and only checked for a few conditions got the full score? What I mean by their grading system being broken is that even if I had uploaded a bot that simply chose random moves, I would have gotten a better score than the one I actually got. – Victor Mar 31 '16 at 15:43
  • It sounds like you are arguing for a grading system where the program is understood and rated, rather than one relying on the results of a competition. You could certainly make a mathematical argument that the short run makes it likely that the results will not reflect the expected result over the long run. You don't know that a random bot would have done better than you did; it could have had even worse luck. – Ross Millikan Mar 31 '16 at 15:51