
I'm currently reading about machine learning and wondered how to apply it to playing Connect Four.

My current attempt is a simple multiclass classifier using a sigmoid model and the one-vs-all method.

In my opinion, the input features have to be the state (disc of player 1, disc of player 2, or empty) of each of the 7x6=42 grid cells.

The output would be the number of the column to drop the disc into. Because that is a discrete number between 1 and 7, I guess this can be treated as a multiclass classification problem.
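To make the setup concrete, here is a minimal sketch of how such a board could be turned into a feature vector (the 0/1/2 cell encoding and all names are my own assumptions, not a fixed convention):

```python
import numpy as np

# Hypothetical encoding: board[r][c] is 0 (empty), 1 (player 1's disc),
# or 2 (player 2's disc). One-hot encoding each of the 42 cells into
# 3 features gives a 126-dimensional input vector; the chosen column
# (0..6) is the class label for the multiclass classifier.

ROWS, COLS = 6, 7

def board_to_features(board):
    """Flatten a 6x7 board into a 126-dim one-hot feature vector."""
    feats = np.zeros((ROWS, COLS, 3))
    for r in range(ROWS):
        for c in range(COLS):
            feats[r, c, board[r][c]] = 1.0
    return feats.reshape(-1)

empty_board = [[0] * COLS for _ in range(ROWS)]
x = board_to_features(empty_board)   # shape (126,), exactly 42 ones
```

With 7 classes, one-vs-all then means training 7 sigmoid classifiers on these vectors and picking the column whose classifier outputs the highest score.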

But how do I generate training examples usable in supervised learning?

The main goal is to win the game, but the outcome obviously isn't known for any turn except the last. If I just let two players who choose their moves randomly play against each other thousands of times, is it sufficient to simply take all moves made by the winner of each game as training examples? Or do I have to do this in a completely different way?

Edit: As suggested in the comments, I read a little about reinforcement learning. From what I now understand, Q-learning should do the trick, i.e. I have to approximate a function Q of the current state and the action to take, giving the maximum cumulative reward obtainable starting in that state. Each step would then be to choose the action that maximizes Q. However, this game has way too many states to do this with, e.g., a lookup table. So, what is an effective way to model this Q-function?
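For reference, what I mean by "approximate" is something like replacing the lookup table with a parameterized model. A minimal sketch with linear function approximation (all names and constants here are my own placeholders):

```python
import numpy as np

# Sketch of Q-learning with linear function approximation: instead of a
# table, Q(s, a) is modelled as w[a] . phi(s), with one weight vector per
# of the 7 columns. The TD update only adjusts the weights of the action
# actually taken. N_FEATURES matches a one-hot board encoding.

N_FEATURES, N_ACTIONS = 126, 7
ALPHA, GAMMA = 0.1, 0.95            # learning rate, discount factor

w = np.zeros((N_ACTIONS, N_FEATURES))

def q_values(phi):
    """One Q estimate per possible column, for feature vector phi."""
    return w @ phi

def td_update(phi, action, reward, phi_next, done):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = reward if done else reward + GAMMA * np.max(q_values(phi_next))
    error = target - q_values(phi)[action]
    w[action] += ALPHA * error * phi
```

A neural network would replace the linear model the same way: it takes phi as input and outputs 7 Q-values, trained against the same TD target.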

Tom

2 Answers


Just to offer a simpler alternative to reinforcement learning, you can use the basic minimax algorithm to search for good moves, and use machine learning to evaluate board positions.

To clarify: minimax builds a game tree where each node is labeled, from the leaves up, with the outcome (1 = player A wins, 0 = player B wins), assuming that A chooses the moves that maximize this number and B chooses the moves that minimize it.

Unless the game is very simple, you won't be able to construct the whole game tree down to the terminals. You'll instead need to stop at unfinished board positions and evaluate the leaves with some heuristic (essentially the probability that player A will win from the given position). You can let a machine learning algorithm like a neural network try to learn this probability from connect four positions with known outcomes.
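The two paragraphs above can be sketched as a depth-limited minimax. The game interface here (`moves`, `apply`, `terminal_value`, `evaluate`) is hypothetical; `evaluate` stands for the learned heuristic estimating the probability that player A wins from an unfinished position:

```python
def minimax(state, a_to_move, depth, game):
    """Depth-limited minimax.

    `game` is any object providing moves(s), apply(s, m),
    terminal_value(s) -> 1/0/None, and evaluate(s) -> win probability.
    """
    value = game.terminal_value(state)
    if value is not None:
        return value                 # game over: exact outcome (1=A, 0=B)
    if depth == 0:
        return game.evaluate(state)  # cutoff: fall back to the heuristic
    children = [minimax(game.apply(state, m), not a_to_move, depth - 1, game)
                for m in game.moves(state)]
    return max(children) if a_to_move else min(children)
```

Player A then plays the move whose child has the highest value; the better the learned `evaluate`, the shallower the search can afford to be.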

To generate training examples you could build your minimax player with a simple heuristic, let it play itself a thousand times, use those games to train your first neural network, then let that play itself a thousand games, and so on. With a little luck, your system will improve with each generation.
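That generational loop can be sketched as follows (all names are hypothetical: `play_game` runs one self-play game and returns its positions and winner, `train` fits a new evaluation function on labelled positions):

```python
def self_play_generations(play_game, train, evaluate,
                          n_generations=5, games_per_gen=1000):
    """Iteratively improve an evaluation function through self-play.

    play_game(evaluate) -> (positions, winner), winner in {1, 0}
    train(examples)     -> new evaluate function, examples = [(pos, winner)]
    """
    for _ in range(n_generations):
        examples = []
        for _ in range(games_per_gen):
            positions, winner = play_game(evaluate)
            # label every position from the game with its final outcome
            examples.extend((pos, winner) for pos in positions)
        evaluate = train(examples)   # next generation's heuristic
    return evaluate
```

The key design choice is labelling every position in a game with that game's final outcome; the noise this introduces averages out over thousands of games.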

Peter

I wrote a blog post about using minimax to play Connect Four a while ago. You can see the code in action here. If you need training data for your models, you could perhaps let them play a couple of thousand games against my minimax implementation.