Solving the Dice Game Pig Using Approximation Methods
Today I'll explain how to solve the Hold or Continue? problem, which appeared in the ICPC Latin American Regional – 2019 competitive programming contest. This was one of the hardest problems in the set, which you can find here.
The problem can be summarized as follows:
Pig is a simple dice game for two or more players. Each turn, a player repeatedly rolls a die until either a 1 is rolled or the player decides to “hold”:
Hold or Continue (Problem statement). ICPC Latin American Regional – 2019
If the player rolls a 1, they score nothing in their turn and it becomes the next player’s turn.
If the player rolls any other number, it is added to their turn total and the player can decide between “hold” or “continue”.
If the player chooses to “hold”, their turn total is added to their score and it becomes the next player’s turn. Otherwise the player continues rolling the dice.
The first player to score exactly 75 wins the game. If a player’s score plus their turn total exceeds 75, they score nothing in their turn and it becomes the next player’s turn.
Based on the above game rules, your objective is then to determine, given a game state, which decision (either hold or continue) yields the greatest probability of winning. A game state is specified by a tuple containing three values: our total score, opponent's total score, and the current turn score. In this problem, there are only two players, and it's assumed that both play optimally.
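Before modeling any probabilities, it may help to see the turn rules as code. The sketch below is purely illustrative (the `playTurn` helper and its fixed hold-at-`target` policy are my own inventions, not part of the problem or its judge); it plays out a single turn from a predetermined sequence of die results:

```cpp
#include <vector>

// Plays one turn of Pig following the rules above. `rolls` supplies the die
// results, `score` is the player's banked score, and the (hypothetical)
// policy is to hold as soon as the turn total reaches `target`.
// Returns the points added to the player's score at the end of the turn.
int playTurn(const std::vector<int>& rolls, int score, int target) {
    int turn = 0;
    for (int d : rolls) {
        if (d == 1) return 0;            // rolling a 1 scores nothing
        turn += d;
        if (score + turn > 75) return 0; // exceeding 75 also scores nothing
        if (turn >= target) return turn; // the policy decides to hold
    }
    return 0; // ran out of supplied rolls; a real turn always ends by rule
}
```

For instance, rolls of 4, 5, 6 with a threshold of 15 bank 15 points, while a 6 rolled at a score of 70 pushes past 75 and banks nothing.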
This problem is very different from most of the Pig programs I've found on the internet, and from all the papers I've read on the subject. In most cases, a simpler-to-implement variation of the game was used.
As you'll see, this variation of Pig hides a rather hard subproblem: implementing the mathematical model naively leads to infinite recursive calls. We are going to solve this issue by approximating some probabilities numerically.
Mathematical Modeling
This problem can be modeled as you would expect: by following the game rules and computing the probabilities directly.
All of the following functions return a value between 0 and 1 inclusive, and they represent the probability of winning with a total score of \(i\), opponent score of \(j\), and a turn score of \(k\).
What is the probability of winning when we hold? The turn's accumulated score is added to our total, but it also becomes the opponent's turn, so we must swap the scores and calculate the probability of the opponent not winning.
$$P_{hold}(i, j, k) = 1 - P(j, i + k, 0) $$
Similarly, when we continue rolling the dice, there's a probability of \(\frac{1}{6}\) that the result will be 1 (we score nothing and it becomes the opponent's turn), and there's also a probability of \(\frac{1}{6}\) for every other number, which will be added to the current turn score.
$$P_{continue}(i, j, k) = \frac{1 - P(j, i, 0)}{6} + \sum_{d=2}^6 \frac{P(i, j, k + d)}{6}$$
Next, putting it all together, we get the actual \(P\) function which is the one we need. This function is implemented as a piecewise function containing the following conditions:
- If the opponent's score is 75, then the opponent wins, and by consequence we lose.
- If our total score plus the current turn score is greater than 75, then we score nothing and it becomes the opponent's turn.
- For every other case, take the decision (hold or continue) that yields the greater probability of winning. This is because both players are assumed to play optimally at every point in the game.
$$
P(i, j, k) = \left\{\begin{aligned}
&0 && , && j = 75\\
&1 - P(j, i, 0) && , && i + k > 75\\
&\max(P_{hold}(i, j, k), P_{continue}(i, j, k)) && , && \text{default case}
\end{aligned}
\right.$$
So far it seems to make sense, but there is one huge problem with this mathematical model: implemented as-is, the computation of \(P(i, j, 0)\) never terminates, since evaluating it requires \(P(j, i, 0)\), which in turn requires \(P(i, j, 0)\), so the recursion bounces between the two forever.
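The cycle is visible by expanding both options at \(k = 0\), using the definitions above:

$$
P_{hold}(i, j, 0) = 1 - P(j, i, 0), \qquad
P_{continue}(i, j, 0) = \frac{1 - P(j, i, 0)}{6} + \sum_{d=2}^6 \frac{P(i, j, d)}{6}
$$

Both options depend on \(P(j, i, 0)\), and the analogous expansion of \(P(j, i, 0)\) depends on \(P(i, j, 0)\), so neither value can be computed before the other.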
Using Iterative Approximation Methods
One way to solve this is to approximate the probability when \(k = 0\).
Intuitively, if both players have just started a new game and both have a score of 0, the probability of winning should be near 0.5, since both players are in nearly identical positions. The player who moves first, however, has a slightly greater probability of winning. According to my implementation, that value is \(P(0, 0, 0) = 0.52692\).
Similarly, if our total score is 70, and the opponent has 0, then we can assume that we have a greater probability of winning than the opponent does. Using my code, that probability is \(P(70, 0, 0) = 0.929041\).
It's also possible to tell intuitively that \(P(i, j, 4)\), \(P(i, j, 3)\), \(P(i, j, 2)\), \(P(i, j, 1)\) and \(P(i, j, 0)\) decrease gradually and lie close to each other. Here are some examples (again, calculated by my program):
$$
\begin{aligned}
P(45, 10, 4) &= 0.771326\\
P(45, 10, 3) &= 0.768724\\
P(45, 10, 2) &= 0.766302\\
P(45, 10, 1) &= 0.763970\\
P(45, 10, 0) &= 0.761767
\end{aligned}
$$
Note that the probability gradually decreases when going from \(k=4\) to \(k=0\). This suggests that the value for \(k=0\) could somehow be interpolated from the other values, if they were known (which they aren't).
As a side note, remember that a turn score of exactly 1 can never actually occur, since rolling a 1 ends the turn immediately, but the probability \(P(i, j, 1)\) can still be calculated nevertheless.
Let's refactor \(P\) to include a way to handle the problematic case:
$$
P(i, j, k) = \left\{\begin{aligned}
&0 && , && j = 75\\
&1 - P(j, i, 0) && , && i + k > 75\\
&\max(P_{hold}(i, j, k), P_{continue}(i, j, k)) && , && k > 0\\
&P_{approximate}(i, j) && , && k = 0
\end{aligned}
\right.$$
Now we must implement \(P_{approximate}\), which can be done iteratively using something like this:
```cpp
#define MAX_SCORE 75

// ...

double p_approximate(int i, int j) {
    double prob = 0.5;
    array<double, MAX_SCORE + 1> approx;
    for (int t = 0; t < 41; t++) {
        for (int k = MAX_SCORE; k >= 0; k--) {
            double hold = 0;
            double cont = 0;
            if (k > 0 && i + k <= MAX_SCORE) {
                hold = p_hold(i, j, k);
            }
            for (int dice = 1; dice <= 6; dice++) {
                if (dice == 1 || k + dice > MAX_SCORE) {
                    cont += 1 - prob;
                } else {
                    cont += approx[k + dice];
                }
            }
            approx[k] = max(hold, cont / 6);
        }
        prob = approx[0];
        swap(i, j);
    }
    return prob;
}
```
This code snippet will be explained in the following section.
Full source code in C++ can be found here.
How Does It Work?
This algorithm basically simulates the entire game, but using a slight variation.
It starts by assuming that the \(prob\) variable, which represents the value of \(P(i, j, 0)\), is equal to \(0.5\). Then we simulate an entire turn by starting from \(P(i, j, 75)\) and decreasing the turn score until we reach \(P(i, j, 0)\). The process is repeated several times to make \(prob\) converge to the correct value. Different initial values also work, but \(0.5\) requires fewer iterations for the approximation to converge.
The \(approx\) array is a temporary array representing \(approx[k] = P(i, j, k)\). In the end we are only interested in \(approx[0]\), which is equal to the value we are looking for, \(P(i, j, 0)\).
Since each probability being approximated depends on probabilities whose index in the \(approx\) array is higher, the loop runs backwards, so that the values are computed in dependency order:
```cpp
for (int k = MAX_SCORE; k >= 0; k--) {
    // ...
```
The logic for hold and continue is implemented following the game rules, and we choose the greatest value among the two, similar to the \(P\) function modeled above:
```cpp
double hold = 0;
double cont = 0;
if (k > 0 && i + k <= MAX_SCORE) {
    hold = p_hold(i, j, k);
}
for (int dice = 1; dice <= 6; dice++) {
    if (dice == 1 || k + dice > MAX_SCORE) {
        cont += 1 - prob;
    } else {
        cont += approx[k + dice];
    }
}
approx[k] = max(hold, cont / 6);
```
After one iteration of the approximation process is done, we swap the scores before doing the next iteration. This is the equivalent of simulating the opponent's turn.
```cpp
prob = approx[0];
swap(i, j);
```
Remember that \(prob\) represents \(P(i, j, 0)\), so after the scores are swapped, the \(prob\) variable being used in the continue logic now represents \(P(j, i, 0)\), which is exactly the same as the mathematical model defined earlier.
```cpp
for (int dice = 1; dice <= 6; dice++) {
    if (dice == 1 || k + dice > MAX_SCORE) {
        cont += 1 - prob;
    } else {
        cont += approx[k + dice];
    }
}
```
The code snippet above is exactly the same as:
$$\frac{1 - P(j, i, 0)}{6} + \sum_{d=2}^6 \frac{P(i, j, k + d)}{6}$$
The \(P(i, j, 0)\) calculated in one iteration becomes the \(P(j, i, 0)\) used in the next one. In other words, the only unknown in this approximation process is \(prob\), which equals \(P(i, j, 0)\); since \(P(i, j, 0)\) and \(P(j, i, 0)\) depend circularly on each other, we need to approximate both at the same time, which is why we swap the scores on each iteration.
It's also important to note that in order for this function to return \(P(i, j, 0)\), you must make sure to execute an odd number of iterations; otherwise it would return the swapped counterpart \(P(j, i, 0)\).
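The same two-unknown structure can be seen in a toy fixed-point iteration (an illustrative aside, not taken from any of the solutions): each of the two values below is defined in terms of the other, yet iterating from an initial guess of \(0.5\) makes both converge, just like \(prob\) does.

```cpp
// Toy illustration (not part of the original solution): two unknowns that
// depend on each other, a = 1 - b/2 and b = 1 - a/2, approximated by
// starting both at 0.5 and re-evaluating until they settle near the exact
// solution a = b = 2/3.
double mutualFixedPoint() {
    double a = 0.5, b = 0.5;
    for (int t = 0; t < 60; t++) {
        double nextA = 1 - b / 2; // new estimate of a from the current b
        double nextB = 1 - a / 2; // new estimate of b from the current a
        a = nextA;
        b = nextB;
    }
    return a;
}
```

Here the error is halved on every pass, so a few dozen iterations are plenty; \(prob\) behaves similarly, although its contraction rate is much harder to pin down.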
Alternative Method: Adding an Extra State for the Round Number
Another very simple way of approximating \(P(i, j, 0)\) is to add a new state variable: the number of rounds passed. The C++ solution is here. Let's examine the most important aspects of this alternative solution.
My implementation uses top-down dynamic programming using the following 4D array for memoization, which represents the state \((i, j, k, t)\), where \(t\) is the number of rounds passed:
```cpp
double memo[76][76][76][39]; // room for rounds t = 0 ... 38
```
For the first round, \(t = 0\), we simply assume the probability is \(0.5\) for any given score tuple. Just like in the previous method, other initial values can be used, but deviating too much from the \([0, 1]\) range requires more iterations, so \(0.5\) is a safe guess.
```cpp
double p(int i, int j, int k, int t) {
    if (t == 0) return 0.5;
    if (j == 75) return 0;
    if (i + k > 75) return 1 - p(j, i, 0, t);
    if (memo[i][j][k][t] > -1) return memo[i][j][k][t];
    return memo[i][j][k][t] = max(p_hold(i, j, k, t), p_continue(i, j, k, t));
}
```
Finally, in order to resolve the problematic cycles, we can simply refer to the values of the previous round, which are all known.
```cpp
double p_hold(int i, int j, int k, int t) {
    return 1 - p(j, i + k, 0, t - 1);
}

// ...

for (int dice = 1; dice <= 6; dice++) {
    if (dice == 1 || i + k + dice > 75)
        ret += 1 - p(j, i, 0, t - 1);
    // ...
```
In order to get the correct probabilities, you must call the \(P\) function using a relatively large number of rounds passed.
```cpp
p(C, H, X, 38);
```
This solution may look different from the previous one, but actually it does something very similar. The approximation mechanism may be slightly different, but it follows the same principle.
Doing Some Optimizations
As you may have already noticed, this solution is pretty inefficient. By adding an extra state, we are using approximately 40 times more memory than we should.
It's possible to optimize this by overwriting the values of one round with the values of the next round, instead of storing every round separately. This way we don't consume extra memory and still manage to simulate several rounds.
Full C++ solution for this version is here. I'd say this is the simplest and most compact version of all. It's also the one with the fastest CPU time.
```cpp
int iter = 35;
while (iter--) {
    for (int i = 0; i < 75; i++) {
        for (int j = 0; j < 75; j++) {
            for (int k = 75 - i; k >= 0; k--) {
                solution[i][j][k] = max(p_hold(i, j, k), p_continue(i, j, k));
            }
        }
    }
}
```
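The loop above can be fleshed out into a self-contained sketch (this is my own rewrite, not the linked solution: \(p_{hold}\) and \(p_{continue}\) are inlined, the terminal cases of \(P\) are folded into a small reader helper, and holding is only considered for \(k > 0\), since a player must roll at least once per turn):

```cpp
#include <algorithm>

const int MAX_SCORE = 75;

// solution[i][j][k] approximates P(i, j, k): the probability that the
// player to move wins with score i, opponent score j and turn total k.
double solution[MAX_SCORE + 1][MAX_SCORE + 1][MAX_SCORE + 1];

// Reads P(i, j, k), resolving the terminal cases of the piecewise model
// (opponent already at 75, or a turn total that busts past 75).
double p(int i, int j, int k) {
    if (j == MAX_SCORE) return 0;
    if (i + k > MAX_SCORE) return 1 - solution[j][i][0];
    return solution[i][j][k];
}

void solve() {
    // Initial guess of 0.5 for every state.
    for (int i = 0; i <= MAX_SCORE; i++)
        for (int j = 0; j <= MAX_SCORE; j++)
            for (int k = 0; k <= MAX_SCORE; k++)
                solution[i][j][k] = 0.5;

    // Repeated sweeps; each one refines every estimate in dependency order.
    for (int iter = 0; iter < 35; iter++)
        for (int i = 0; i < MAX_SCORE; i++)
            for (int j = 0; j < MAX_SCORE; j++)
                for (int k = MAX_SCORE - i; k >= 0; k--) {
                    // Holding is only available after at least one roll.
                    double hold = (k > 0) ? 1 - p(j, i + k, 0) : 0.0;
                    // Rolling a 1 passes the turn; any other face d adds
                    // to the turn total (busting is handled inside p).
                    double cont = 0;
                    for (int d = 1; d <= 6; d++)
                        cont += (d == 1) ? 1 - p(j, i, 0) : p(i, j, k + d);
                    solution[i][j][k] = std::max(hold, cont / 6);
                }
}
```

After `solve()` runs, `p(i, j, 0)` should land close to the values quoted earlier, such as \(P(0, 0, 0) \approx 0.52692\).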
So, Which Decision to Take? Hold or Continue?
Once all of the above is implemented, getting the correct decision is trivial: for every query state, compare \(P_{hold}(i, j, k)\) with \(P_{continue}(i, j, k)\) and pick the one with the greater value.
If \(P_{hold}\) is greater than \(P_{continue}\), then the player should hold. Otherwise, the optimal move is to continue.
Conclusion
Three different methods were explained for approximating the values of \(P(i, j, 0)\). They work slightly differently, but all of them achieve the same result in the end.
Proving why the value converges is beyond the scope of this article, since it seems to require some extra math knowledge, which sadly I don't have at the moment.
I'm not entirely sure why it converges so quickly, or why some implementations work well even when the initial value is an invalid probability like \(1.5\). I suppose an in-depth trace of the approximation algorithm would be necessary to understand what's going on at every step.
List of C++ solutions used in this article: