return to index1. History
In early 1996 Fredrik Dahl and Effect Software AS released JellyFish 2.0. One of its features was the ability to roll out positions with "variance reduction." This feature allowed it to make short rollouts as accurate as much longer rollouts. Using this feature a rollout of a hundred games might well be just as accurate as a rollout of 2500 games done without the feature.
This was a breakthrough in backgammon technology. With variance reduction it became feasible to roll positions out using a program�s higher levels, like JellyFish level 6 or Snowie 2- or 3-ply. On these higher levels the programs think longer about each move, playing slower and better. But the loss of speed is mostly compensated for by the increased accuracy of variance reduced rollouts.
Variance reduction also made it practical��for the first time��to get trustworthy results from human rollouts. JellyFish�s interactive rollout feature allowed a user to play either or both sides of a position in a rollout, with the program then analyzing the games to calculate an equity far more accurate than the average result.
Although players had long been rolling positions out by hand, the reliability of these rollouts was very poor. It simply takes too many games to get an accurate result. This was not fully appreciated. Throughout the backgammon literature writers describe rollouts and propositions of��gasp��a hundred games. Playing them, this is a lot of games. But a hundred games aren�t much if we are hoping that the results will tell us the true value of a position. When variance reduction makes those hundred games worth 2500��now we have something.
Now Snowie 3 is out, and for the first time we have variance reduction for cubeful rollouts. This is another breakthrough in backgammon technology. Our ability to do meaningful cubeful analysis skyrockets with this new tool.
But shockingly it appears that many users are still doing rollouts as though it were 1995. Because of a distrust of variance reduction, they stick with rollouts performed by weaker players (JellyFish level 5 and Snowie 1-ply) and don�t bother to do their manual rollouts with JellyFish�s interactive rollout feature. This distrust stems from a lack of understanding of how variance reduction works.
Fredrik Dahl deserves great credit for bringing variance reduction to backgammon. But Fredrik is also partly responsible for the mistrust people have of variance reduction. When JellyFish 2.0 was released, Fredrik wanted to keep the exact formulation of his algorithm secret, for competitive advantage. Thus he couldn�t tell us exactly what he was doing. On the other hand, he must certainly have wanted buyers to know what a great and wondrous new feature this was (it was great and wondrous), so he couldn�t just report what we are now used to calling the "equivalent" games as the number of games rolled out. (If he had done this, it would probably have prevented a lot of misunderstanding.)
Instead, the algorithm was described in general terms. The JellyFish 2.0 User Manual says that the algorithm "can be interpreted as compensating for the luck of the rolls." This hint was not enough to make the procedure clear, but just enough that people mistrusted the algorithm. Compensating for the luck must rely on programs� evaluations, and as Rob Maier points out, "aren�t we doing the rollouts because we don�t really trust their evaluations?"
Variance reduction is no longer a secret. Many people have independently worked out how the algorithm works. Descriptions and variations of the basic idea have been posted to the backgammon newsgroup. Snowie has had it from its first release. It�s time for variation reduction to be described plainly.
2. Variance reduction procedure| 6 4 | |
| Board image courtesy of GO-Figure | |
The position is simple so that we can easily walk through several complete rollouts and so that we can check our rollouts against the exact answer. Blue�s exact winning chances are 1/6 (when he rolls doubles) + 5/6 x 1/4 (when he doesn�t and White fails to bear off) = 9/24 = 37.5%.
We�ll roll this position out using variance reduction, doing all our calculations in terms of Blue�s winning percentage. Each trial the game is played out to the end, just as in a normal rollout, and we note the final result. In addition, for each roll of the dice we estimate how lucky Blue was, and we accumulate these estimates over the whole game. When the game is finished we�ll subtract this accumulated luck from the game result to "cancel out" Blue�s luck, and this will give our variance reduced result for the game.
We�ll discuss how a computer program estimates the luck later. But for now we�ll use the luck adjustments in the table below. The numbers in the "Quantified" luck column aren�t the "right" values��I just made them up. Later on we�ll see that the exact adjustments aren�t particularly important.
Turn | What Happened | Blue Lucky? | How Lucky/Unlucky? | Quantified |
First | Blue rolled doubles | Yes | Very Lucky | +50% |
First | Blue rolled non-doubles | No | Marginally Unlucky | �10% |
Second | White failed to bear off | Yes | Pretty Lucky | +45% |
Second | White bore off | No | Somewhat unlucky | �15% |
When we say that Blue has been lucky, we mean that his chances are higher after the roll than they were before. Conversely with unlucky rolls: Blue would prefer to roll over.
Each turn is considered independently for deciding whether a roll is lucky or not. Here, whenever White gets to roll she has already had a little good fortune (Blue didn�t roll doubles), but we don�t consider the past when estimating the luck for a particular turn.
With our luck estimates in hand we�re ready to do a variance reduced rollout.
Game 1These games and seven more are summarized in the table below. All figures are percentages from Blue�s perspective.
Game |
First Turn | Blue�s Luck |
Second Turn | Blue�s Luck | Game Result | Total Luck | Variance Reduced |
1 | Non-doubles | �10 | White fails | +45 | 100 | +35 | 65 |
2 | Non-doubles | �10 | White bears off | �15 | 0 | �25 | 25 |
3 | Doubles | +50 | 100 | +50 | 50 | ||
4 | Non-doubles | �10 | White bears off | �15 | 0 | �25 | 25 |
5 | Non-doubles | �10 | White bears off | �15 | 0 | �25 | 25 |
6 | Doubles | +50 | 100 | +50 | 50 | ||
7 | Non-doubles | �10 | White bears off | �15 | 0 | �25 | 25 |
8 | Non-doubles | �10 | White fails | +45 | 100 | +35 | 65 |
9 | Non-doubles | �10 | White bears off | �15 | 0 | �25 | 25 |
10 | Non-doubles | �10 | White fails | +45 | 100 | +35 | 65 |
Total | 500 | 420 | |||||
Average | 50 | 42 |
After ten games, Blue has won five, so that the result of the unadjusted rollout is that Blue wins 50%. The variance reduced results give Blue 4.2 games out of ten, for 42%. Blue�s true chances are 37.5%, so the variance reduced result is closer to the true value. This is generally the case with variance reduction.
How a program estimates luck
Program luck estimates for second roll | |||
What Happened | How Often | Evaluation | Blue�s Luck |
White bears off | 3/4 of the time | 0% | �25% |
White fails to bear off | 1/4 of the time | 100% | +75% |
Average | 25% | ||
The program actually goes through all 36 rolls and plays them out on the board. After 3/4 of the rolls the program sees that Blue has lost the game, so Blue has 0% chances. After the other 1/4 of the rolls the program sees that Blue has the game won, so Blue has 100% chances. Putting all the rolls together the program calculates that before the roll Blue has 25% chances. Thus, when Blue loses, the luck of the dice have cost him 25%, but when Blue wins the dice have given him 75% he didn�t have before.
Similarly, for the first roll:
Program luck estimates for first roll | |||
What Happened | How Often | Evaluation | Blue�s Luck |
Blue rolls doubles | 1/6 of the time | 100% | +62.5% |
Blue rolls non-doubs | 5/6 of the time | 25% | �12.5% |
Average | 37.5% | ||
The luck estimates in these two tables are the correct values, and any computer program with even a tiny racing database should use these numbers.
Procedure summaryBy doing many trials we reduce the effect of this luck. Sometimes one side gets lucky, and sometimes the other. Most of the luck will get canceled out if we do many trials because one side�s good luck in game 7 is offset by the other side�s good luck in game 13.
Variance reduction takes a more aggressive approach. Rather than relying on the luck to even out, variance reduction tries to remove the luck directly. Every turn the procedure estimates how lucky a player was, and this luck is subtracted out at the end. If the luck estimates are any good, they cancel out much of the luck.
Let�s look at our ten game rollout from before, but now using the correct luck values:
Game |
First Turn | Blue�s Luck |
Second Turn | Blue�s Luck | Game Result | Total Luck | Variance Reduced |
1 | Non-doubles | �12.5 | White fails | +75 | 100 | +62.5 | 37.5 |
2 | Non-doubles | �12.5 | White bears off | �25 | 0 | �37.5 | 37.5 |
3 | Doubles | +62.5 | 100 | +62.5 | 37.5 | ||
4 | Non-doubles | �12.5 | White bears off | �25 | 0 | �37.5 | 37.5 |
5 | Non-doubles | �12.5 | White bears off | �25 | 0 | �37.5 | 37.5 |
6 | Doubles | +62.5 | 100 | +62.5 | 37.5 | ||
7 | Non-doubles | �12.5 | White bears off | �25 | 0 | �37.5 | 37.5 |
8 | Non-doubles | �12.5 | White fails | +75 | 100 | +62.5 | 37.5 |
9 | Non-doubles | �12.5 | White bears off | �25 | 0 | �37.5 | 37.5 |
10 | Non-doubles | �12.5 | White fails | +75 | 100 | +62.5 | 37.5 |
Total | 500 | 375 | |||||
Average | 50 | 37.5 |
With the correct luck values, variance reduction cancels the luck perfectly, and every game gives a result of Blue winning 37.5%. When luck estimates are perfect, a single game will always give the correct result, regardless of what happened during that game.
But in general a program cannot estimate the luck perfectly. Fortunately, it doesn�t need to estimate the luck very well for variance reduction to be successful. All that is necessary is to have some idea of which rolls are good and which bad.
A program will know that in a race 66 is good and 21 is bad. A program knows that (in the great majority of cases) entering is good and dancing bad; that hitting is good and missing bad; that making points is good and stacking up bad; and so forth. As long as a program is able to meet this crude level of reliability, it will be able to cancel some of the luck in a rollout, and thus make the rollout more accurate.
Even if positions come up where a program can�t tell a good roll from a bad one, all is not lost. Typically for these positions a program thinks that all rolls are roughly equivalent, and so it won�t add a lot of luck one way or the other. But when a real joker comes along, a program is very likely to understand that one side was very lucky, and it will compensate for this.
In our example, our original luck values were not very good. But they did recognize that it was good for Blue to roll a double, and bad for Blue when White bears off. That�s sufficient. With these luck values, a variance reduced rollout of our example position is typically as accurate as a regular rollout done for eight times as many games.
Variance reduction makes rollouts more accurate by subtracting some of the luck that occurs each game. This works as long as a program usually has a rough idea of which rolls are good and which rolls are bad in the positions that arise during the rollout.
4. Why bad evaluations don�t introduce biasAdding and subtracting zeroes
Similarly, if we add a bunch of numbers, where each is zero on average, then on average the final result will be zero.
Here�s an example of a number that is zero on average. We roll a red die and a green die, and we subtract the red die from the green die. If we roll a green 3 and a red 1, we have 3 � 1 = 2. If we roll a green 1 and a red 5, we have 1 � 5 = �4. On any particular throw the dice might combine for anything from �5 to +5, but on average the result will be zero.
If we add the result of many tosses of these dice, the total will be zero on average. Probably we won�t get exactly zero, but if we do the experiment many times and Take the average, we are very likely to get a result close to zero.
If we Take a number and subtract zero, we get the original number back. 7 � 0 = 7.
Similarly, if we subtract from a number another number that is zero on average, then on average we get back the starting number.
If we take our starting number 7 and modify it by subtracting the result of a toss of the red and green dice, we could get a value as low as 2 (7 � (green 6 � red 1)) or as high as 12 (7 � (green 1 � red 6)), but on average we�ll just get back 7. Modifying the 7 in this way doesn�t change the average result at all; it just introduces some luck into what we get.
Because adding many numbers that average zero gives a total that also averages zero, we could roll the red and green dice fifty times, total the result, and subtract that from 7. The result will still be 7 on average. Each toss of the dice will be zero on average; thus the sum of fifty tosses will be zero on average; thus subtracting the total from 7 will leave us with 7, on average.
On average, no one gets luckyLet�s look again at a program�s luck estimates for our example position:
Program luck estimates for first roll | |||
What Happened | How Often | Evaluation | Blue�s Luck |
Blue rolls doubles | 1/6 of the time | 100% | +62.5% |
Blue rolls non-doubs | 5/6 of the time | 25% | �12.5% |
Average | 37.5% | ||
On the first roll, 1/6 of the time Blue�s luck is +62.5%, 5/6 of thetime it is �12.5%. The average luck is zero: 1/6 x 62.5 + 5/6 x �12.5 =0.
Program luck estimates for second roll | |||
What Happened | How Often | Evaluation | Blue�s Luck |
White bears off | 3/4 of the time | 0% | �25% |
White fails to bear off | 1/4 of the time | 100% | +75% |
Average | 25% | ||
On the second roll, 3/4 of the time Blue�s luck is �25%, 1/4 of the time it is +75%. The average luck is zero: 3/4 x �25 + 1/4 x 75 = 0.
This isn�t true just for our example. Because of the way a program estimates luck, the average luck is always zero. The luck associated with a roll is the difference between the evaluation after the roll and the average of this value over all 36 rolls. The difference between any value and its average averages zero. That's the way averages work.
The average total luck for a game is also zero. When we add a series of values, where each value is zero on average, the total will also be zero on average.
But what if the evaluations are really bad?
Let�s work through a variance reduced rollout using the following extremely poor luck estimates:
Turn |
What Happened |
Blue Lucky? |
How Lucky/Unlucky? | Quantified Very Badly |
First | Blue rolled doubles | Yes | Very Lucky | �62.5% |
First | Blue rolled non-doubles | No | Marginally Unlucky | +12.5% |
Second | White failed to bear off | Yes | Pretty Lucky | �75% |
Second | White bore off | No | Somewhat unlucky | +25% |
Here, when Blue rolls doubles, our quantified luck says that his chances have gone down 62.5%. When White bears off on the second turn, we�re saying that Blue�s chances have gone up 25%. In fact, these adjustments are exactly backwards. We are saying Blue is lucky when he was unlucky, and vice-versa.
But note that the luck values still average zero for each roll. On the first roll, 1/6 of the time Blue�s luck is �62.5%, 5/6 of the time it is +12.5%, for an average of zero. On the second roll, 3/4 of the time Blue�s luck is +25%, 1/4 of the time it is �75%, also averaging zero. Because the average luck is zero, a variance reduced rollout with these values will still give the right answer, despite the horrifically bad evaluations.
Let�s see how our previous ten game rollout would play out with these luck estimates:
Game |
First Turn | Blue�s Luck |
Second Turn | Blue�s Luck | Game Result | Total Luck | Variance Reduced |
1 | Non-doubles | +12.5 | White fails | �75 | 100 | �62.5 | 162.5 |
2 | Non-doubles | +12.5 | White bears off | +25 | 0 | +37.5 | �37.5 |
3 | Doubles | �62.5 | 100 | �62.5 | 162.5 | ||
4 | Non-doubles | +12.5 | White bears off | +25 | 0 | +37.5 | �37.5 |
5 | Non-doubles | +12.5 | White bears off | +25 | 0 | +37.5 | �37.5 |
6 | Doubles | �62.5 | 100 | �62.5 | 162.5 | ||
7 | Non-doubles | +12.5 | White bears off | +25 | 0 | +37.5 | �37.5 |
8 | Non-doubles | +12.5 | White fails | �75 | 100 | �62.5 | 162.5 |
9 | Non-doubles | +12.5 | White bears off | +25 | 0 | +37.5 | �37.5 |
10 | Non-doubles | +12.5 | White fails | �75 | 100 | �62.5 | 162.5 |
Total | 500 | 625 | |||||
Average | 50 | 62.5% |
After ten games, the variance "reduced" result is 62.5%, and is less accurate than the actual game result of 50%. Let�s continue the rollout for another twenty games:
Game |
First Turn |
Second Turn | Game Result | Cumulative Actual | Variance Reduced | Cumulative Var Redux |
1-10 | 500 | 625 | ||||
11 | Non-doubles | White bears off | 0 | �37.5 | 587.5 | |
12 | Non-doubles | White fails | 100 | 600 | 162.5 | 750 |
13 | Non-doubles | White fails | 100 | 700 | 162.5 | 912.5 |
14 | Doubles | 100 | 800 | 162.5 | 1075 | |
15 | Non-doubles | White bears off | 0 | �37.5 | 1037.5 | |
16 | Non-doubles | White bears off | 0 | �37.5 | 1000 | |
17 | Non-doubles | White bears off | 0 | �37.5 | 962.5 | |
18 | Non-doubles | White bears off | 0 | �37.5 | 925 | |
19 | Non-doubles | White bears off | 0 | �37.5 | 887.5 | |
20 | Non-doubles | White bears off | 0 | �37.5 | 850 | |
21 | Non-doubles | White bears off | 0 | �37.5 | 812.5 | |
22 | Non-doubles | White bears off | 0 | �37.5 | 775 | |
23 | Doubles | 100 | 900 | 162.5 | 937.5 | |
24 | Non-doubles | White bears off | 0 | �37.5 | 900 | |
25 | Non-doubles | White bears off | 0 | �37.5 | 862.5 | |
26 | Doubles | 100 | 1000 | 162.5 | 1025 | |
27 | Non-doubles | White bears off | 0 | �37.5 | 987.5 | |
28 | Non-doubles | White bears off | 0 | �37.5 | 950 | |
29 | Non-doubles | White bears off | 0 | �37.5 | 912.5 | |
30 | Non-doubles | White fails | 100 | 1100 | 162.5 | 1075 |
Total | 1100 | 1075 | ||||
Average | 36.67% | 35.83% |
Both rollouts are moving closer to the true value of 37.5%. Despite using evaluations that are the exact opposite of the correct values, the variance reduced result is not too far off. With enough trials, both procedures will tend to give answers closer and closer to the true value. The skeptical reader is encouraged to extend the rollout for a few hundred games. (They go fast.)
With such terrible evaluations variance reduction actually becomes variance increasing. This is the reason the regular rollout is closer to the true value than the variance reduced rollout. Using these luck values it takes about four variance reduced games to equal one game of a regular rollout. Here we would be better off to not use variance reduction��not because we will get a different answer, but because we will get the same answer much more slowly.
In practice, evaluations are almost never this bad. Programs have at least a vague idea of which rolls are good and which bad over the course of a game for almost every type of position. And should this problem actually occur it is easily diagnosable. A rollout where variance reduction has done poorly will have a large standard deviation (JellyFish) or confidence interval (Snowie). For any kind of rollout we should pay attention to these numbers. With JellyFish, we can be very confident that if we were to roll the position out forever, we would get a result no more than two standard deviations away from the result we have so far. With Snowie, we can be similarly confident that the result of an infinite rollout would be within the confidence interval. If we need a tighter bound on the equity then we need to do more trials, whether or not the rollout was done with variance reduction.
5. Summary
Correctly implemented, variance reduction will never change the expected value of a rollout, because all variance reduction does is subtract numbers that average zero.Variance reduced rollouts give the same answers as regular rollouts. They just get there faster.