AIs for Games, Part III, Pruning Min-Max for Pentago

Performance before any optimisations

I created a few boards and timed how long it took the algorithm to run a depth-2 search on these boards, starting with both Player 0 and Player 1. An image showing the boards used is at the end of the post. The results are:

Board: 0, Player: 0, Time taken: 1.59
Board: 0, Player: 1, Time taken: 1.58
Board: 1, Player: 0, Time taken: 0.00117
Board: 1, Player: 1, Time taken: 0.00103
Board: 2, Player: 0, Time taken: 5.04
Board: 2, Player: 1, Time taken: 4.94
Board: 3, Player: 0, Time taken: 29.2
Board: 3, Player: 1, Time taken: 32.0
Board: 4, Player: 0, Time taken: 9.55
Board: 4, Player: 1, Time taken: 9.25

I also tried running a depth-3 search on Board 4 for Player 1 (because Player 1 can find a winning move with this depth), but it did not finish running even after a couple of hours of running.

The optimisations

I made several optimisations. I describe them here chronologically, i.e., in the order I tried implementing them.

Alpha-beta pruning

There are situations where a board position does not need to be analysed, because based on the boards we have already analysed, this board position will definitely not be chosen in optimal player. This resulted in the following times:

Board: 0, Player: 0, Time taken: 1.34
Board: 0, Player: 1, Time taken: 1.39
Board: 1, Player: 0, Time taken: 0.00120
Board: 1, Player: 1, Time taken: 0.00104
Board: 2, Player: 0, Time taken: 5.28
Board: 2, Player: 1, Time taken: 4.61
Board: 3, Player: 0, Time taken: 26.1
Board: 3, Player: 1, Time taken: 29.4
Board: 4, Player: 0, Time taken: 8.62
Board: 4, Player: 1, Time taken: 8.28

This was not a significant improvement in the times. I was surprised by this. I decided to run cProfile to try to determine why there was not a significant time-save. It seemed to be that the process of creating new boards was taking up a lot of time - and I was only pruning a board after the board was created. I needed to prune before the board was created.

To achieve this, I had to significantly re-structure the whole program, removing the frontier and instead writing the main function find_move recursively. The resulting times were a significant improvement:

Board: 0, Player: 0, Time taken: 1.07
Board: 0, Player: 1, Time taken: 1.06
Board: 1, Player: 0, Time taken: 0.00118
Board: 1, Player: 1, Time taken: 0.00102
Board: 2, Player: 0, Time taken: 1.90
Board: 2, Player: 1, Time taken: 1.86
Board: 3, Player: 0, Time taken: 5.33
Board: 3, Player: 1, Time taken: 5.41
Board: 4, Player: 0, Time taken: 2.98
Board: 4, Player: 1, Time taken: 2.90

Phew! It was satisfying to see the times drop so much, and this motivated me to keep going.

Stop if winning move found

If a winning move was found in a current board position, there is no need to continue analysing this position, so can stop this early. Pruning does not detect this (at least, not how I implemented it. This could be a sign I did it wrong…), so I had to manually add this. This resulted in further improvements:

Board: 0, Player: 0, Time taken: 1.09
Board: 0, Player: 1, Time taken: 1.11
Board: 1, Player: 0, Time taken: 0.00104
Board: 1, Player: 1, Time taken: 0.00119
Board: 2, Player: 0, Time taken: 0.324
Board: 2, Player: 1, Time taken: 0.344
Board: 3, Player: 0, Time taken: 0.141
Board: 3, Player: 1, Time taken: 0.897
Board: 4, Player: 0, Time taken: 3.12
Board: 4, Player: 1, Time taken: 2.93

I also tried running the depth-3 search on board 4, and to my surprise it ended in under a minute!

Board: 4, Depth 3, Time taken: 59.1

Note this would not be representative of a generic depth-3 search, because the winning move is found about 1/6 of the way into the full search.

Lists within lists

Running cProfile revealed that having nested lists to encode the board slows things down considerably, so I re-wrote the program so that the board was represented by a single list. I was worried this would take a lot of effort, but fortunately it consisted of replacing board[i][j] with board[6*i + j], and other similar simple changes. This halved the times:

Board: 0, Player: 0, Time taken: 0.407
Board: 0, Player: 1, Time taken: 0.411
Board: 1, Player: 0, Time taken: 0.000520
Board: 1, Player: 1, Time taken: 0.000450
Board: 2, Player: 0, Time taken: 0.175
Board: 2, Player: 1, Time taken: 0.161
Board: 3, Player: 0, Time taken: 0.0645
Board: 3, Player: 1, Time taken: 0.424
Board: 4, Player: 0, Time taken: 1.41
Board: 4, Player: 1, Time taken: 1.41
Board: 4, Depth 3, Time taken: 30.7

Checking if the game is over

Running cProfile again showed that the new bottle-neck was checking if the game had ended. This involved looping through the set of all possible winning lines, and checking to see if Player 0 or Player 1 occupied all the positions in each line.

Originally, I had one for-loop to check if Player 1 won, then another to check if Player 0 won. I changed this to have a single loop, and for each line check if Player 1 or Player 0 won. This resulted in another big chunk of time-saving.

Board: 0, Player: 0, Time taken: 0.292
Board: 0, Player: 1, Time taken: 0.283
Board: 1, Player: 0, Time taken: 0.000499
Board: 1, Player: 1, Time taken: 0.000334
Board: 2, Player: 0, Time taken: 0.101
Board: 2, Player: 1, Time taken: 0.107
Board: 3, Player: 0, Time taken: 0.0359
Board: 3, Player: 1, Time taken: 0.261
Board: 4, Player: 0, Time taken: 0.943
Board: 4, Player: 1, Time taken: 0.915
Board: 4, Depth 3, Time taken: 21.6

Checking cProfile showed that I had halved the time to check if the game had ended, but it was still the biggest bottle neck.

I then tried to re-design the program to cut down further, but to no avail. For example, I tried to group the set of lines into groups that could be ruled out together, e.g. if I know that position (2,2) in the board is empty, that rules out 7 of the lines. It will be interesting to know if there is a more efficient way to check if the game has ended!

Tidying up and fixing a “bug”

My code was becoming untidy (I was not using version control properly, and instead was creating multiple versions of functions in the same file) so I tidied up all the code. While doing this, I discovered that I did not correctly update the prune function during the ‘Lists within lists’ step: I was only pruning boards of at least depth 2, when it could be pruning board of depth 1. I made the necessary tweak, resulting in the following times:

Board: 0, Player: 0, Time taken: 0.0896
Board: 0, Player: 1, Time taken: 0.0682
Board: 1, Player: 0, Time taken: 0.000272
Board: 1, Player: 1, Time taken: 0.000204
Board: 2, Player: 0, Time taken: 0.100
Board: 2, Player: 1, Time taken: 0.0539
Board: 3, Player: 0, Time taken: 0.313
Board: 3, Player: 1, Time taken: 1.13
Board: 4, Player: 0, Time taken: 0.204
Board: 4, Player: 1, Time taken: 0.310
Board: 4, Depth 3, Time taken: 3.18

Woohoo! What big progress. What used to take hours now only takes 3 seconds.

Duplicate board positions

The last thing I wanted to try was dealing with repeat positions. Previously I only skipped these if the same position occurred and they had same parent. But now I wanted to have a way of skipping board positions regardless of where they were in the game-tree. This took many hours to get correct, because my first attempt caused the algorithm to produce sub-optimal moves, and I had no idea why.

The error was that when I pruned a board, I would finalise the board’s value, though the board was not fully analysed. Then, when the board occurred somewhere else in the tree, I would use this incomplete value and miss out all the analysis that was pruned the first time around.

After fixing the bug, the new times are:

Board: 0, Player: 0, Time taken: 0.2888009548187256
Board: 0, Player: 1, Time taken: 0.15045881271362305
Board: 1, Player: 0, Time taken: 0.00028705596923828125
Board: 1, Player: 1, Time taken: 0.0002300739288330078
Board: 2, Player: 0, Time taken: 0.0822603702545166
Board: 2, Player: 1, Time taken: 0.03888416290283203
Board: 3, Player: 0, Time taken: 0.30017614364624023
Board: 3, Player: 1, Time taken: 0.7593698501586914
Board: 4, Player: 0, Time taken: 0.3503570556640625
Board: 4, Player: 1, Time taken: 0.3380570411682129
Board: 4, Depth 3, Time taken: 3.19

The times are not always better, and some are worse.

Next steps and final thoughts

The next step is to introduce neural networks. A brief google search reveals that min-max is not appropriate and that I should have been using reinforcement learning. Doh! In the back of my mind, I was curious as to how the neural network could learn the heuristic function; what would the loss/error be that it would minimise?

Though the optimisation of the min-max algorithm is incomplete (e.g. I do not understand why the latest version is not faster than the previous version), I will end it here. This is because I have already spent a couple of days on this, I have already learnt from this, and it is not necessary for the bigger goal of developing a neural network.

Some final takeaways:

I should have sketched out a plan of the whole project. Though I had basic knowledge of neural networks, I should have researched a bit more and found out that min-max is not appropriate for neural networks.
Be more thorough with testing. It makes spotting bugs easier and quicker.
Seed random number generators. I used used random heuristics (to see effects of pruning), but I did not seed them. This means the times above are not fair comparisons, as some random numbers could have lead to more pruning than others.
Use proper version control. My code got hideous at one point. At least now I have a better sense of the workflow of git.

The code

The code, at this stage of project, can be found on github.