
In Nim, the number of optimal moves for any given board configuration is limited. If you don’t play one of them, you essentially cede control to your opponent, who can go on to win by playing nothing but optimal moves. And those optimal moves can be identified by evaluating a mathematical parity function.
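For readers who want to see what "evaluating a parity function" means in practice, here is a minimal sketch. It relies on the standard result for Nim (Bouton's theorem): the relevant quantity is the nim-sum, the bitwise XOR of the row sizes, which is the parity of each binary digit column. A position with a nim-sum of zero is lost for the player to move, and an optimal move is one that leaves the nim-sum at zero. The function names `nim_sum` and `optimal_moves` are hypothetical illustrations, not code from the study.

```python
from functools import reduce
from operator import xor


def nim_sum(rows):
    """XOR of all row sizes; zero means the player to move is losing."""
    return reduce(xor, rows, 0)


def optimal_moves(rows):
    """Return (row_index, new_size) pairs that leave a nim-sum of zero."""
    total = nim_sum(rows)
    if total == 0:
        return []  # every move hands the opponent a winning position
    moves = []
    for i, size in enumerate(rows):
        target = size ^ total
        if target < size:  # legal only if it removes at least one item
            moves.append((i, target))
    return moves


print(nim_sum([3, 4, 5]))        # 2
print(optimal_moves([3, 4, 5]))  # [(0, 1)]
```

Here the only optimal move is to reduce the first row from 3 items to 1, leaving rows of 1, 4 and 5, whose XOR is zero.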
So there are reasons to think that the training procedure that works for chess might not be effective for Nim. The surprise is just how badly it actually did. Zhou and Riis found that on a Nim board with five rows, the AI got good fairly quickly and was still improving after 500 training iterations. Adding just one more row, however, slowed the rate of improvement dramatically. And, for a seven-row board, gains in performance had essentially leveled off by the time the AI had played itself 500 times.
To better illustrate the problem, the researchers replaced the subsystem that suggested potential moves with one that operated randomly. On a seven-row Nim board, the performance of the trained and randomized versions was indistinguishable over 500 training iterations. Essentially, once the board got large enough, the system was unable to learn from observing game outcomes. The starting position of the seven-row configuration has three potential moves that are all consistent with an eventual win. Yet when the system’s trained move evaluator was asked to check all potential moves, it rated every one of them as roughly equivalent.
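To make the "three winning moves" claim concrete, the sketch below enumerates every legal move from a seven-row start and keeps only those that leave a nim-sum of zero. The article does not give the exact starting layout, so the rows here are an assumption: 1, 3, 5, ..., 13 items (the hypothetical `START`). With that assumed layout, the brute-force check finds 3 winning moves among 49 legal ones, which is consistent with the article's count. These are the moves a well-trained evaluator would need to single out rather than rate as equal to the rest.

```python
from functools import reduce
from operator import xor

# ASSUMPTION: hypothetical seven-row layout of 1, 3, 5, ..., 13 items.
# The article does not state the exact starting position used in the study.
START = [1, 3, 5, 7, 9, 11, 13]


def legal_moves(rows):
    """Every (row_index, new_size) reachable by removing >= 1 item from one row."""
    return [(i, k) for i, size in enumerate(rows) for k in range(size)]


def apply_move(rows, move):
    """Return a new board with row i reduced to k items."""
    i, k = move
    return rows[:i] + [k] + rows[i + 1:]


def is_winning_move(rows, move):
    # With perfect follow-up play, a move wins iff it leaves a nim-sum of zero.
    return reduce(xor, apply_move(rows, move), 0) == 0


moves = legal_moves(START)
winners = [m for m in moves if is_winning_move(START, m)]
print(len(moves), winners)  # 49 [(4, 6), (5, 4), (6, 2)]
```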
The researchers concluded that Nim requires players to learn the parity function to play effectively, and the training procedure that works so well for chess and Go is unable to do so.
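One way to see why parity is a demanding function to learn is that the outcome of a position depends on every row at once. This is an illustration, not an analysis taken from the study. In the sketch below, the hypothetical position 1, 3, 5, 7 has a nim-sum of zero, so the player to move loses. Removing a single item from *any* row flips that verdict, so boards that look almost identical can have opposite outcomes. The helper `first_player_wins` is a hypothetical name.

```python
from functools import reduce
from operator import xor


def first_player_wins(rows):
    """True iff the player to move can force a win (nonzero nim-sum)."""
    return reduce(xor, rows, 0) != 0


position = [1, 3, 5, 7]  # hypothetical example: nim-sum is 0, so a loss
print(first_player_wins(position))  # False

flips = []
for i in range(len(position)):
    tweaked = position.copy()
    tweaked[i] -= 1  # remove a single item from row i only
    flips.append(first_player_wins(tweaked))
print(flips)  # [True, True, True, True]
```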
Not just Nim
One way to view the conclusion is that Nim (and, by extension, all impartial games) is just plain weird. But Zhou and Riis also found some signs that similar problems could crop up in chess-playing AIs trained in the same way. They identified several “wrong” chess moves (ones that missed a mating attack or threw away an endgame) that were initially rated highly by the AI’s board evaluator. The software avoided these mistakes only because it explored a number of additional branches several moves into the future.