October 06, 2017
Hex is a simple perfect information board game, which, while having rules arguably no more complicated than Tic-Tac-Toe, has deep strategy and tactics. It is mostly played online, with the most popular site being Little Golem.
I’ve written a web app which calculates what could be called the advantage of each board position occuring in hex games (roughly, the effective adjustment to a player’s Elo rating when starting from that board position), using the set of games played on Little Golem, while also showing which moves are most common in each position. This is mostly useful for analyzing different openings.
The Bradley-Terry model is a mathematical model describing the outcomes of games, which includes a concept of the strength of a player called their rating. Specifically, the model says that we can associate to each player a number r, called their rating, such that given two players A and B with ratings rA and rB, the probability that A will win in a game between them is given by 1 / (1 + exp(rA - rB)).
This can be extended to describe games which have a first move advantage, by assigning a rating value (call it rfirst) to that advantage. Then the probability that player A will win when going first becomes 1 / (1 + exp(rA + rfirst - rB)), and when going second it is 1 / (1 + exp(rA - rfirst - r_B)). Essentially, we assume that difference between playing as the first and second player amounts to a constant rating difference.
It is natural to extend this to any board position, defining the advantage of a board position as the effective change to a player’s rating from starting in that board position. This web app does that for all the board positions which have occurred in games on Little Golem, calculating a maximum likelihood value for the advantage of each move, and an error estimate.
Compared to, for example, simply using the ratio of wins to losses, move advantage should be a better measure of how good a move is. We would expect that win ratios bias towards moves which good players like to play, regardless of whether they are the best possible move they could play. With move advantage, even if a good player frequently wins when playing a certain opening move, if they win less often than expected based on their rating, that move will be calculated to have negative advantage based on their games.
Since normally hex would have a strong first player advantage, it is played with the swap rule, which incentivizes the first player to make a first move which is neither too good nor too bad, rather than playing the best possible move. This means that the first move played should have an advantage as close to 0 as possible (on subsequent moves, a higher advantage is better).
a3 is by far the most popular opening move overall, and has an advantage of 11±3 (5±8 when restricting to the latest 5 years of data). a7, a13, a10, k2, and c2 are more popular among top players. Most are balanced within the margin of error when filterering to 2000+ players over the last 5 years (though there is not enough data for particularly accurate calculation), except for a7, which while it is the most popular opening move among this set of players, has an advantage of 228±72 rating points, suggesting that it is not at all a balanced opening move. However, this advantage disappears when looking at a broader range of players. I’m not exactly sure of the reason for this.
While every single position has been played in the first move of some game, this is not the case when restricting only to high-ranked players. For higher ranked players, the opening move is generally limited to b2, c2, k2, and a3-a13.
We can not calculate an advantage when all games played from a particular board position have been won, or when all such games have been lost. In this case a maximum likelihood value does not exist. These moves are labelled as ”∞” and ”-∞”, respectively.
These estimates are only as good as the players who produced them. Some moves may have a high advantage simply because players frequently respond incorrectly to them.
In order to calculate the advantage of moves, we need an estimate of the ratings of players in each game. Using the ratings from Little Golem is probably not sufficient, since ratings change over time. Also, these ratings can frequently be inaccurate, since when people stop playing they may forfeit games, in which case their rating is no longer accurate. To account for this, I recalculated all of the ratings, using the standard Elo algorithm, and ignoring forfeited games and games that are “too short”. To obtain accurate initial ratings we run the algorithm in reverse as well.
The maximum likelihood of the advantage is found using ternary search, and the error is obtained from the square root of the inverse of the second derivative of log likelihood at the maximum likelihood.
In swapped games the second player is simply considered as the first player, and the swap move is ignored.