Due to the probabilistic nature of backgammon, it is often difficult to determine the proper play. One cannot prove that play A is better than play B the way one can in chess, by analyzing the possible variations which follow from each play. Since there are 21 possible dice rolls at every turn, even looking ahead a couple of rolls is a difficult task. We have to depend on our intuition and experience.
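To put a rough number on that difficulty, the sketch below counts the distinct dice rolls and the number of roll sequences for a few turns of look-ahead. It counts dice only, which is a simplifying assumption: the several legal plays available for each roll are ignored, so the real tree is far larger. The function names are illustrative.

```python
from itertools import combinations_with_replacement

def distinct_rolls():
    """Return the 21 distinct rolls of two dice, ignoring the order of the dice."""
    return list(combinations_with_replacement(range(1, 7), 2))

def roll_sequences(turns):
    """Number of distinct roll sequences over `turns` turns (dice only, no plays)."""
    return len(distinct_rolls()) ** turns

if __name__ == "__main__":
    for turns in range(1, 5):
        print(turns, roll_sequences(turns))
```

Even with the plays left out, four turns of look-ahead already give 194,481 roll sequences.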
The one way we can test plays is by rolling out the position a number of times. This can give us a lot of insight into the position, but it can be quite laborious. Rolling out a position 100 times takes several hours, and such a small sample is far from conclusive anyway. The luck element is just too great. Even if we put controls on the rollout which attempt to even out the luck element, the number of trials one can do in a short amount of time will not be sufficient to give us satisfactory results.

The logical thing is to give the position to the computer. Our high-powered computers can roll out positions very quickly, and what would take human beings days can be done by the computers in a couple of minutes. The problem is that the computers may not play well enough for the rollouts to have any meaning. For simple positions such as bearoffs it is easy to program a computer to play pretty well, and the results of rollouts are likely to be accurate. For more complex positions, programming the computer to play well is much more difficult, and rollout results will be less trustworthy.

Back in 1991 I wrote an article for Inside Backgammon about rollouts. This was before the neural nets of today were invented. In that article I made the prediction that within five years we would have programs which played well enough so that we could trust their rollout results, and that our knowledge of backgammon would improve immensely. This turned out to be one of the most accurate predictions I have ever made.

Shortly after this article, the program Expert Backgammon was released. This program was not a neural net -- its rules for making evaluations were hand-carved. Still it played at a decent intermediate level, and had the capability of doing rollouts. The program wasn't as efficient as the ones we have today and the computers weren't nearly as fast, so it might take over an hour to do a rollout of 1296 trials.
Still, the ability was there, and for many normal positions the results were believable. Our backgammon knowledge was starting to change.

It soon became apparent to me that we needed many more trials than I had realized before our sample size was large enough, even assuming we could trust the computer to play well enough. For example, the first thing I did when I got Expert Backgammon was to roll out all the opening plays 1296 times each. Imagine my surprise when for an opening 4-2 the play 13/9, 13/11 came out slightly better than making the four point! This wasn't due to the way Expert Backgammon was playing the positions; it was simply because the sample size wasn't large enough and the results for this particular rollout happened to be on the far end of the bell curve. A larger rollout of over 5000 trials showed that the result was a fluke and that making the four point was considerably better.

The play of the program was another matter. For more complex positions, back games in particular, it was immediately clear that the program had no idea what it was doing and that the results were way off base. For simpler positions, it looked like most of the results were reasonable. Still there were problems. As a test, I compared rollouts from Expert Backgammon with rollouts from a backgammon-playing program I had written myself. My program didn't play as well as Expert Backgammon, but it wasn't too far behind. Still, the results from simple positions such as holding games sometimes differed by as much as .100 in equity between the two programs. This meant that one of the programs was badly misplaying either the side playing the holding game or the side coming in against the holding game, and it wasn't particularly obvious where the misplays were coming from. What this illustrated was the importance of accurate play by the program if a rollout was to be trusted.

It should be noted that the main difficulties were with position evaluation. When it came to play vs.
play decisions, the rollouts generally gave accurate results if the plays led to fairly similar positions. The reason is that if the program was making mistakes playing the position, it would make similar mistakes for each play rolled out. Consequently, the mistakes would tend to cancel out, and the real differences between two plays being rolled out would show up. However, if we were looking at the proper equity of the position for a cube decision, or if the two plays being rolled out led to wildly different types of positions, there could be trouble.

A little later, a different programming approach was taken. Instead of letting clumsy humans guess what weights to assign to various parameters, why not let the program decide for itself? The idea is to have the program start from scratch with nothing or very little known about how to play backgammon except the rules, have the program play many thousands of games against itself, and from the results of these games determine what weights should be assigned to various parameters and how they should be intermixed. This is the neural network approach. Essentially the program "learns" the same way a human being learns, by seeing what is successful and modifying his behavior based on what he learns. However, the program can do this much more objectively, and can play many more games than a human can in his lifetime and remember the results. Would this approach compensate for the human ability to think creatively?

The first neural network backgammon program was written by Gerald Tesauro. It was called TD-Gammon. Bill Robertie played it a series of games, which he later published. The program made several obvious errors. In technical positions such as bearoffs it made some clear mistakes, and in complex positions such as back games there were plenty of things it did wrong.
For the most part, however, TD-Gammon played quite competently. In normal positions it consistently chose good moves, and demonstrated a reasonable understanding of concepts such as timing and flexibility. The best human players were still clearly better, but the program played well enough that it looked like its rollout results could be trusted. In fact Tesauro did have TD-Gammon roll out several interesting positions from the Robertie match, and the results looked quite reasonable. There was no question that TD-Gammon was considerably better than any previously written backgammon-playing program.

Tesauro did not stop there. Making use of ever-increasing computer speed and power, he upgraded TD-Gammon. He quadrupled its brain size, and gave it a much longer training session. Also, and very important, the computer it ran on was fast enough that the program could look ahead one roll while playing (we call this 2-ply). In other words, when the program analyzed a candidate play it would look at all 21 possible dice rolls for the opponent, find the best play for each (on its 1-ply analysis), and evaluate all these resulting positions. It would then average them out to determine its equity estimate for the play under consideration. This is something a human would find impossible to do in a few seconds. However, our high-speed computers were able to perform this look-ahead and still play at a normal pace.

What were the results of these improvements? Once again, Robertie played a series of games against TD-Gammon. The difference was like night and day. TD-Gammon was no longer merely a competent advanced player. It was a top-flight expert, clearly competitive against the best players in the world. It still made some clear-cut errors, but these didn't cost much. However, in the vague positional judgment areas where humans have the most trouble, TD-Gammon would consistently find the best play. It was clear that our top experts would have to learn from the computer in order to keep up with it.
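The one-roll look-ahead described above can be sketched as follows. This is a minimal illustration, not TD-Gammon's actual code: `legal_replies` and `evaluate` are hypothetical stand-ins for a move generator and a 1-ply evaluator. Weighting the 21 rolls by their true frequencies (1/36 for a double, 2/36 otherwise) is an assumption about what "average them out" means.

```python
from itertools import combinations_with_replacement

def weighted_rolls():
    """The 21 distinct rolls with their probabilities (doubles 1/36, others 2/36)."""
    return [((a, b), (1 if a == b else 2) / 36)
            for a, b in combinations_with_replacement(range(1, 7), 2)]

def two_ply_equity(position, legal_replies, evaluate):
    """Estimate our equity in `position`, the position reached by a candidate play.

    legal_replies(position, roll) -> non-empty list of positions the opponent
        can reach with that roll (the unchanged position if no move is legal).
    evaluate(position) -> static (1-ply) equity from OUR point of view.
    Both callables are hypothetical placeholders supplied by the caller.
    """
    total = 0.0
    for roll, prob in weighted_rolls():
        # The opponent chooses the reply the static evaluator rates best for
        # him, which is the reply rated worst for us.
        value_after_best_reply = min(evaluate(p) for p in legal_replies(position, roll))
        total += prob * value_after_best_reply
    return total

def best_play(candidates, legal_replies, evaluate):
    """Pick the candidate position with the highest 2-ply estimate."""
    return max(candidates, key=lambda p: two_ply_equity(p, legal_replies, evaluate))
```

Each call evaluates every legal reply to each of the 21 rolls, so a 2-ply estimate costs many times as much as a single static evaluation.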
Unfortunately, the 2-ply analysis takes some time. This is fine for normal play, but for rollouts it takes too long. The main gain from using computers is that we can roll out a position several thousand times in a few minutes, but to do this we have to live with the 1-ply rollout and the weaker play of the 1-ply. Still, even at the 1-ply level the program plays decently.

TD-Gammon was never sold commercially. However, other programmers saw how successful it was and tried writing their own neural network programs. One of them was Jellyfish, written by Fredrik Dahl. This was commercialized, with rollout capabilities and tutorial analysis during the play. For the first time players had access to the powerful neural network analysis. Not surprisingly, the average skill level of the tournament player took a big jump. Also, some of the previously believed concepts about backgammon were overturned. The wild slotting style of the late 1970's and 1980's was, if the neural nets were to be believed, more costly than previously thought. The race was found to be very important, and many plays were based on racing potential. Purity was found to have been overrated, while ugly attacking plays proved to be stronger than expected. The style of the average good player drifted toward these new concepts. Of course, one does wonder if these results from the bots are somewhat self-fulfilling prophecies. Could it be that the bots prefer blitzes and races to priming games and back games because they play them better? The jury is still out on that topic, but we will be looking at the question.

How well do the neural nets play? Improved computer speed allowed a new version of Jellyfish which looks ahead two rolls (that is, both the opponent's roll and the program's next roll), which as might be expected leads to significant improvements. Would anybody be willing to put his money where his mouth is about the strength of the neural net? Malcolm Davis was.
He challenged Nack Ballard and Mike Senkiewicz (two players who in anybody's opinion are among the top players of the world) to play Jellyfish 300 games apiece for quite large stakes. The challenge was accepted, and in the summer of 1997 the match took place. I was there, helping to organize things. The players played on a regular board (so that they would be playing under conditions they were used to). Somebody sat opposite them playing the computer's pieces, making the moves the computer recommended. The dice were rolled by the players at the board (not by the program), and the dice rolls and the plays of Nack and Mike were input into the computer by a person operating the computer. The match went quite smoothly, and play moved along at a reasonably normal pace. At the end, Nack was +58 points and Mike was -58 points. This meant that over 600 games Jellyfish had broken even against two of the top players in the world playing at their best, since serious money was involved. While there is always a lot of luck in backgammon, this is quite a few games, and in my opinion demonstrated that Jellyfish is competitive against anybody. I have played through the games, and I don't think that Jellyfish had been particularly lucky overall -- it played just as well as its opponents.

In the last couple of years another commercial neural net has come out: Snowie, written by Olivier Egger. Snowie plays every bit as well as Jellyfish, perhaps a bit better. Its main feature in my opinion is the ability to play or store a match and then have the program analyze the entire match. This is very valuable for learning.

Now that we have programs which play well, and computers which are very fast, can we trust the rollouts of these computers to answer all our questions about backgammon? Yes and no. There are always potential pitfalls, and if we are not careful we may find ourselves believing plays to be correct which are, in fact, wrong. Let's look at some of the dangers.
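The first of these, sampling noise, can be put in rough numbers. The sketch below computes the standard error of a rollout's average result. It assumes independent trials and a per-trial standard deviation of 1.0 point, which is an illustrative assumption and not a measured value; the function names are likewise hypothetical.

```python
import math

def standard_error(per_trial_sd, trials):
    """Standard error of the mean of `trials` independent rollout results."""
    return per_trial_sd / math.sqrt(trials)

def trials_needed(per_trial_sd, target_error):
    """Smallest number of trials whose standard error is at most target_error."""
    return math.ceil((per_trial_sd / target_error) ** 2)

if __name__ == "__main__":
    ASSUMED_SD = 1.0  # illustrative assumption, not a measured value
    for n in (100, 1296, 5000, 10000):
        print(n, round(standard_error(ASSUMED_SD, n), 4))
```

Under that assumption, 1296 trials leave a standard error of about .028 on each play's result, so two plays whose true equities differ by only a couple of hundredths can easily come out in the wrong order.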
As we have seen, even 1296 trials may not be sufficient just due to the laws of chance. How many trials do we need? That is hard to say. Regardless of how many trials we run, there is always the possibility that we will get a freak result which gives us the wrong answer. In general, if we roll out a position 10,000 times or so, it is very unlikely that play A will come out ahead of play B when in fact play B is better, just due to the luck of the rollout. As we will see, there are other greater dangers involved in the rollout.

There are ways of controlling the dice to cut down on the luck factor, and the bots make use of these. For play vs. play decisions, duplicate dice are used -- i.e. for each trial of play A, play B is then rolled with the same dice. For many positions where certain rolls are likely to be critical, this is very helpful in cutting down the luck factor. For other more general positions, such as how to play the opening roll, duplicate dice make relatively little difference. The games diverge too quickly, and what is a good roll in one game may be a bad roll in another. Also if we are looking at two plays which lead to very different types of positions, duplicate dice are of dubious value.

Another way of controlling the luck element of the dice is to make sure that each number comes up the same number of times. For example, if you are rolling out a position 1296 times, the bot will arrange things so that every one of the combinations of 36 opening rolls and 36 responses occurs exactly once. This is helpful, since the first couple of rolls are often the most critical in determining the outcome of a position. Jellyfish takes this a step further. When doing a rollout, it arranges things so that each roll (Black's 10th roll, for example) will have each of the 36 possibilities occur an equal number of times. This does not affect the randomness of the rollout, but does ensure that if a certain roll is generally good that roll will come up its fair share of the time.
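The 1296-trial scheme described above can be sketched as below: every ordered pair of first roll and reply occurs exactly once, and only the order of the trials is shuffled. The function name and the seeded `random.Random` are illustrative choices, not a description of how Jellyfish or any other bot implements it, and the balancing of later rolls is not attempted here.

```python
import random
from itertools import product

# The 36 ordered outcomes of two dice, e.g. (6, 5) and (5, 6) are both listed.
ALL_ROLLS = list(product(range(1, 7), repeat=2))

def stratified_openings(seed=None):
    """Return 1296 (first_roll, reply_roll) pairs, each combination exactly once.

    Only the order of the trials is random; the rolls after the first two
    would still be generated by the rollout itself.
    """
    pairs = list(product(ALL_ROLLS, ALL_ROLLS))
    random.Random(seed).shuffle(pairs)
    return pairs
```

Because each of the 36 first rolls appears in exactly 36 trials, no rollout of this size can be skewed by one side getting more than its share of good opening numbers.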
I don't know whether Snowie balances the dice on later rolls the way Jellyfish does.

A further way to cut down on the luck element is to use truncated rollouts. This can be done when using hand rollouts also. The idea is to not roll the position out all the way, but just a certain number of rolls, and then evaluate. Not only is this a huge time-saver, but it cuts down on the luck element which can affect things later in the game. Thus, a smaller sample size is needed to get accurate results, again cutting down on the time of the rollout. The catch, of course, is the evaluation of the position after the truncated rollout. If that evaluation is accurate, then the truncated rollout is likely to give very good results. On the other hand, if the resulting position is one which the program has difficulty evaluating properly, then the results of a truncated rollout will be very suspect.

The big danger with a rollout, of course, is that the bot is simply misplaying the position. As we have seen, in order to get top speed out of our rollout it is necessary to have the rollout be done at 1-ply, where the program is at its weakest since it doesn't get to use its lookahead capability -- it has to make an assessment of the position as is. How bad are some of its plays at 1-ply? They can be pretty bad. Here are a couple of examples:
Obviously the correct play is 9/7(2). There is no need to make the anchor. White's board is almost sure to collapse since it will be very difficult for White to escape his back men, so the danger of being attacked is minimal. Yet, on its 1-ply assessment Jellyfish thinks that 23/22, 13/12, 8/7(2) is clearly the best play. Somehow Jellyfish doesn't appreciate the timing considerations involved in the position. Of course at its 3-ply Jellyfish "sees" what is going to happen and correctly determines that 9/7(2) is far superior. However a rollout is done at 1-ply, so if this position came up in a rollout Jellyfish would mangle it and so distort the results.

Snowie had no problem with the double-aces play even on its 1-ply -- it thought that 9/7(2) was clearly superior. However Snowie is not immune from big accidents. Consider the following position:
As any experienced player knows, making the ace point is wrong. There is too much danger that Blue will not escape next turn and that his board will start to crack. The correct play is 22/17, 5/1*, combining attack with escape. However on the 1-ply Snowie gets this wrong, thinking that making the ace point is clearly right. On its 3-ply it sees the danger, and properly concludes that 22/17, 5/1* is far better.

Interestingly enough, Jellyfish gets this right on its 1-ply analysis. These two examples would appear to indicate that Jellyfish has a better understanding of attacking positions, while Snowie has a better understanding of priming positions. This is consistent with my observations from seeing both of the programs play many games. I believe the difference comes from the initial inputs which are put in by the programmer. Of course, this is the same way two different humans are strong or weak in different areas of the game, depending on their experience and schooling in the position types. Computers also have their own personal style of play.

The good news is that while errors such as the above will be made in the rollouts, most of the time it won't matter. A specific position and dice roll has to come up for it to make a difference, and that won't happen often. Also in play vs. play problems the type of error involved may come up for both plays, so the errors will cancel out. In general, problems such as this will only really matter on the first roll for each side in the rollout. After that, things will diversify enough so it won't matter much. If you are really concerned about a rollout, it is worth checking to see how the program plays the initial rolls for the position on its 1-ply. If the plays are decent, the rollout results are likely to be accurate.

Another problem occurs if the program is systematically misplaying a type of position. This can happen in the most unusual situations.
For example, Kent Goulding in Inside Backgammon wrote an article which described how Jellyfish in its rollouts mangled several bearoff positions by not taking a checker off when it was supposed to (in fact, Expert Backgammon with its hand-carved parameters played these bearoffs much better). This problem with Jellyfish has been fixed in the latest versions -- it now refers to an accurate database. However the problem could cause very misleading results with positions which are likely to lead to a close race where one side has a crunched position (so cannot misplay it), while the other side has a smoother bearoff.

Back games are another problem for the nets. While the latest versions play the initial structure surprisingly well, in the end-game after a checker has been hit the programs tend to flounder. This can cause misleading results in rollouts. For example, consider how to play an opening 2-1. It seems logical that the slotting play will lead to the slotter being involved in more back games, since if the blot is hit he already has three men back. However this is a long way down the road, and there are a lot of other variations possible. How much effect this will have on a rollout is anybody's guess. However if the slotting and splitting plays are close (and rollouts indicate that they are, with a slight edge to the split), it is quite possible that the weakness in back game play is sufficient to swing the result and make slotting superior. For other positions which are likely to result in back games, the problem is greater, particularly when trying to analyze a cube decision. As usual the problem isn't as great for play vs. play problems, since the positions will probably be somewhat similar.

What about incorporating the cube into rollouts? The main danger is that a couple of big cubes might distort the results.
When rolling out a position by hand, the popular way to get the cube into play is as follows: When the person doing the rollouts judges that it is a close double but a clear take, he just rolls on. However when he judges that it is a clear double but a close take, he settles the position as a win for the doubling side. This isn't 100% accurate, of course, but is a good approximation and avoids the problem of big cubes.

The computer programs can settle positions in the same way. A settlement equity is determined. If the estimated equity of the position is below that equity the rollout is continued, but if it is above that equity for the side on roll and that side has cube access, then that side is deemed the winner. The question is, what should the settlement number be? In the average type of position, the break-even equity for a pass/take decision is usually about .570 (it is higher than .500 because of the recube potential). If the position has a lot of gammon potential the break-even point is higher. The reason is that if the losing side is getting gammoned a lot he must have a higher percentage of wins to compensate, and the more wins he has the more he is likely to make use of the recube. After a lot of trial and error, most players have determined that .550 appears to be a reasonable number to use for the settlement figure.

There are other ways to handle the cubeful rollouts. One, which is used by Snowie, is to let the cube get as high as 8, but then settle. This allows for the effect of proper doubles and takes, but avoids the swings of really high cubes. Another idea which I like (but which has not yet been adopted by the programmers) is as follows: If during the rollout the program believes it is double and take, then split off into two rollouts (with the cube on 2), but give each rollout half value in the final total.
If during one of these rollouts the program again thinks it is double and take, again split into two rollouts with the cube on 4, and give each of these rollouts 1/4 the overall value. This process can be continued as high as the cube gets. At the cost of the time spent on a few extra rollouts, this procedure takes in the full value of the cube without having to worry about any settlement, yet it avoids the distortions of high cubes (since if the cube is on 32, each rollout there will only count 1/32 of the total). I would like to see this approach implemented. This concept was originally thought of by Michael Zehr.

Whatever procedure is used for cubeful rollouts, the results can be very valuable. Each such rollout produces four equities: cubeless, center cube, side on roll owning cube, opponent owning cube. If the rollouts are meaningful, there is a lot of information to be learned about doubling theory from the results.

Is there a way to use the superior 2-ply and 3-ply playing ability of the programs in rollouts without taking too much time and still have sufficient sample size? The bots can use what is called variance reduction. I admit that I don't understand this very well -- hopefully someone with better understanding can explain it in a future article. From what I know, the bots roll out the position making the 2-ply or 3-ply analysis for each play, but estimate how good or bad the dice roll is so as to incorporate the luck element into the equation. Doing this, the claim is that it takes far fewer trials to get meaningful results. It would seem as though this approach would depend a lot on the accuracy of the program's evaluation of the luck of the dice rolls. However from what I have seen these rollouts do tend to get better results in complex positions.

So, what is in store for the future?
If computer speed continues to increase at the rate it has over the last 10 years, the next decade should produce computers available to the general public which are fast enough to let the bots do mini-rollouts and still play at normal speed. If this happens, I am confident that the best player in the world will be a computer program. Also, I expect to see improvements in the neural nets, mainly by using several different nets depending on the type of position. Both Jellyfish and Snowie do this to a limited extent (Snowie more so, I believe), but I think if this is done properly the overall evaluation function of the nets on their 1-ply can be improved considerably. With proper training, the nets might even learn to play end-games properly, which they now have difficulty with. When this happens, computer rollouts will be very trustworthy, and backgammon knowledge will reach new peaks.