How does AlphaZero learn chess? Why does it make certain moves? What values does it give to concepts such as king safety or mobility? How does it learn openings, and how is that different from how humans developed opening theory?

Questions like these are discussed in a fascinating new paper by DeepMind, titled Acquisition of Chess Knowledge in AlphaZero. It was written by Thomas McGrath, Andrei Kapishnikov, Nenad Tomasev, Adam Pearce, Demis Hassabis, Been Kim, and Ulrich Paquet together with GM Vladimir Kramnik. It is the second collaboration between DeepMind and Kramnik, after their research from last year in which they used AlphaZero to explore the design of different chess variants with different sets of rules.


In their latest paper, the researchers tried a method for encoding human conceptual knowledge, to determine the extent to which the AlphaZero network represents human chess concepts. Examples of such concepts are the bishop pair, material (im)balance, mobility, or king safety. These concepts have in common that they are pre-specified functions that encapsulate a particular piece of domain-specific knowledge.

Some of these concepts were taken from Stockfish 8's evaluation function, such as material, imbalance, mobility, king safety, threats, passed pawns, and space. Stockfish 8 uses these as sub-functions that give individual scores leading to a "total" evaluation that is exported as a continuous value, such as "0.25" (a slight advantage to White) or "-1.48" (a big advantage to Black). Note that more recent versions of Stockfish have adopted AlphaZero-like neural networks, but these were not used for this paper.

The third type of concepts encapsulates more specific lower-level features, such as the existence of forks, pins, or contested files, as well as a range of features regarding pawn structure.

Having established this wide array of human concepts, the next step for the researchers was to try to find them within the AlphaZero network, for which they used a sparse linear regression model. After that, they visualized the learning of human concepts with what they call what-when-where plots, which show what concept is learned, when in training it is learned, and where in the network it appears.
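The probing step described above can be illustrated with a toy example. The sketch below is not the paper's code: it fits an L1-regularised (sparse) linear regression by coordinate descent to synthetic "activations", where the hidden concept is, by construction, a linear function of just two of eight units. The data, the function names (`make_synthetic_data`, `fit_sparse_probe`), and the penalty value are all assumptions made for illustration; the actual study probes real AlphaZero activations.

```python
import random


def make_synthetic_data(n_positions=200, n_units=8, seed=0):
    """Hypothetical stand-in for network activations and a concept score."""
    rng = random.Random(seed)
    activations = [[rng.gauss(0.0, 1.0) for _ in range(n_units)]
                   for _ in range(n_positions)]
    # By construction, the "concept" depends only on units 1 and 5, plus noise.
    concept_values = [2.0 * row[1] - 1.5 * row[5] + rng.gauss(0.0, 0.05)
                      for row in activations]
    return activations, concept_values


def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the proximal step for an L1 penalty)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def fit_sparse_probe(activations, concept_values, l1_penalty=0.1, sweeps=100):
    """Lasso by coordinate descent (no intercept term, for brevity).

    Minimises (1 / 2n) * ||y - Xw||^2 + l1_penalty * ||w||_1.
    """
    n = len(activations)
    d = len(activations[0])
    weights = [0.0] * d
    residual = list(concept_values)  # equals y - Xw, with w = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in activations) / n for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of unit j with the residual, with unit j's own
            # current contribution added back in.
            rho = sum(activations[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * weights[j]
            new_w = soft_threshold(rho, l1_penalty) / col_sq[j]
            delta = new_w - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * activations[i][j]
                weights[j] = new_w
    return weights


if __name__ == "__main__":
    acts, concept = make_synthetic_data()
    for unit, w in enumerate(fit_sparse_probe(acts, concept)):
        print(f"unit {unit}: weight {w:+.3f}")
```

If a probe like this predicts a concept well from only a few units, that suggests the concept is linearly represented at the probed layer. Repeating such a fit for each layer and each training checkpoint is, in spirit, what would supply the "where" and "when" axes of a what-when-where plot.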

According to the researchers, AlphaZero indeed develops representations that are closely related to a number of human concepts over the course of training, including high-level evaluation of the position, potential moves and consequences, and specific positional features.

One interesting result was about material imbalance. As was demonstrated in Matthew Sadler and Natasha Regan's award-winning book Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI (New In Chess, 2019), AlphaZero seems to view material imbalance differently from Stockfish 8. The paper gives empirical evidence that this is the case at the representational level: AlphaZero initially "follows" Stockfish 8's evaluation of material more and more during its training, but at some point, it turns away from it again.


The next step for the researchers was to relate the human concepts to AlphaZero's value function. One of the first concepts they looked at was piece value, something a beginner learns first when starting to play chess. The classical values are nine for a queen, five for a rook, three for both the bishop and knight, and one for a pawn. The left figure below (taken from the paper) shows the evolution of piece weights during AlphaZero's training, with piece values converging towards commonly accepted values.


The image on the right shows that during AlphaZero's training, material becomes more and more important in the early stages of learning chess (consistent with human learning), but it then reaches a plateau. At some point, more subtle concepts such as mobility and king safety become more important, while material actually decreases in importance.
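As a point of reference for the classical piece values quoted above, here is a minimal sketch of the beginner's material count applied to a position in FEN notation. This is the human rule of thumb, not AlphaZero's evaluation or the regression used in the paper; the function name `material_balance` and the example position are illustrative assumptions.

```python
# Classical piece values quoted in the article; the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(fen):
    """Return White's material minus Black's, in pawns, from a FEN string."""
    board_field = fen.split()[0]  # piece placement is the first FEN field
    balance = 0
    for symbol in board_field:
        value = PIECE_VALUES.get(symbol.lower())
        if value is None:
            continue  # digits (empty squares), '/' separators, and kings
        # Uppercase letters are White's pieces, lowercase are Black's.
        balance += value if symbol.isupper() else -value
    return balance


if __name__ == "__main__":
    # Starting position with White's b1 knight and Black's a8 rook removed:
    # White is up a rook for a knight, i.e. 5 - 3 = 2 pawns.
    print(material_balance("1nbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR w KQk - 0 1"))  # 2
```

A fixed table like this treats every position the same way, which is exactly the kind of crude material assessment that, according to the article, gives way to subtler concepts later in AlphaZero's training.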

AlphaZero's Training vs. the Development of Human Chess Knowledge

Another part of the paper is dedicated to comparing AlphaZero's training to the progression of human knowledge over history. The researchers point out that there is a marked difference between AlphaZero’s progression of move preferences through its history of training steps, and what is known of the progression of human understanding of chess since the 15th century:

AlphaZero starts with a uniform opening book, allowing it to explore all options equally, and largely narrows down plausible options over time. Recorded human games over the last five centuries point to an opposite pattern: an initial overwhelming preference for 1.e4, with an expansion of plausible options over time.

The researchers compare the games AlphaZero is playing against itself with a large sample taken from the ChessBase Mega Database, starting with games from the year 1475 up till the 21st century.

Humans initially played 1.e4 almost exclusively, but 1.d4 was slightly more popular in the early 20th century, soon followed by the increasing popularity of more flexible systems like 1.c4 and 1.Nf3. AlphaZero, on the other hand, tries out a wide array of opening moves in the early stage of its training before starting to value the "main" moves higher.



A more specific example concerns the Berlin variation of the Ruy Lopez (the move 3...Nf6 after 1.e4 e5 2.Nf3 Nc6 3.Bb5), which only became popular at the top level in the early 21st century, after Kramnik successfully used it in his world championship match with GM Garry Kasparov in 2000. Before that, it was considered to be somewhat passive and slightly better for White, with the move 3...a6 being preferable.

The researchers write: "Looking back in time, it took a while for human chess opening theory to fully appreciate the benefits of Berlin defense and to establish effective ways of playing with Black in this position. On the other hand, AlphaZero develops a preference for this line of play quite rapidly, upon mastering the basic concepts of the game. This already highlights a notable difference in opening play evolution between humans and the machine."



Remarkably, when different versions of AlphaZero are trained from scratch, half of them strongly prefer 3...a6, while the other half strongly prefer 3...Nf6! This is interesting because it means that there is no "unique" good chess player. The following table shows the preferences of four different AlphaZero neural networks:

Preferences of four different AlphaZero neural networks (after 1. e4 e5 2. Nf3 Nc6 3. Bb5), each obtained after 1 million training steps. Sometimes AlphaZero prefers 3...a6, and sometimes 3...Nf6.

In a similar vein, AlphaZero develops its own opening "theory" for a much wider array of openings over the course of its training. At some point, 1.d4 and 1.e4 are discovered to be good opening moves and are rapidly adopted. Similarly, AlphaZero's preferred continuation after 1.e4 e5 is determined in another short temporal window. The figure below illustrates how both 2.d4 and 2.Nf3 are quickly learned as reasonable White moves, but 2.d4 is then dropped almost as quickly in favor of 2.Nf3 as a standard reply.

AlphaZero settling on its preferred move after 1.e4 e5.


Kramnik's contribution to the paper is a qualitative assessment, an attempt to identify themes and differences in AlphaZero's style of play at different stages of its training. The 14th world champion was provided with sample games from four different stages to look at.

According to Kramnik, in the early training stage, AlphaZero has "a crude understanding of material value and fails to accurately assess material in complex positions. This leads to potentially undesirable exchange sequences, and ultimately losing games on material." In the second stage, AlphaZero seemed to have "a solid grasp on material value, thereby being able to capitalize on the material assessment weakness" of the early version.

In the third stage, Kramnik feels that AlphaZero has a better understanding of king safety in imbalanced positions. This manifests in the second version "potentially underestimating the attacks and long-term material sacrifices of the third version, as well as the second version overestimating its own attacks, resulting in losing positions."

In the fourth stage of its training, AlphaZero has a "much deeper understanding" of which attacks will succeed and which will fail. Kramnik notices that it sometimes accepts sacrifices played by the "third version," defends well, keeps the material advantage, and ultimately converts to a win.

Another point Kramnik makes, which feels similar to how humans learn chess, is that tactical skills appear to precede positional skills as AlphaZero learns. By generating self-play games over separate opening sets (e.g. the Berlin or the Queen's Gambit Declined in the "positional" set and the Najdorf and King's Indian in the "tactical" set), the researchers manage to provide circumstantial evidence but note that further work is needed to understand the order in which skills are acquired.



For a long time, it was believed that machine-learning systems learn uninterpretable representations that have little in common with human understanding of the domain they are trained on. In other words, how and what AI teaches itself is mostly gibberish to humans.

With their latest paper, the researchers have provided strong evidence for the existence of human-understandable concepts in an AI system that wasn't exposed to human-generated data. AlphaZero's network shows the use of human concepts, even though AlphaZero has never seen a human game of chess.

This might have implications outside the chess world. The researchers conclude:

The fact that human concepts can be located even in a superhuman system trained by self-play broadens the range of systems in which we should expect to find human-understandable concepts. We believe that the ability to find human-understandable concepts in the AZ network indicates that a closer examination will reveal more.

Co-author Nenad Tomasev told Chess.com that he was personally curious whether there is such a thing as a "natural" progression of chess theory:

Even in the human context—if we were to 'restart' history, go back in time— would the theory of chess have developed in the same way? There were a number of prominent schools of thought in terms of the overall understanding of chess principles and middlegame positions: the importance of dynamism vs. structure, material vs. sacrificial attacks, material imbalance, the importance of space vs. the hypermodern school that invites overextension in order to counterattack, etc. This also informed the openings that were played. Looking at this progression, what remains unclear is whether it would have happened the same way again. Maybe some pieces of chess knowledge and some perspectives are simply easier and more natural for the human mind to grasp and formulate? Maybe the process of refining them and expanding them has a linear trajectory, or not? We can't really restart history, so we can only ever guess what the answer might be.

However, when it comes to AlphaZero, we can retrain it many times—and also compare the findings to what we have previously seen in human play. We can therefore use AlphaZero as a Petri dish for this question, as we look at how it acquires knowledge about the game. As it turns out, there are both similarities and dissimilarities in how it builds its understanding of the game compared to human history. Also, while there is some level of stability (results being in agreement across different training runs), it is by no means absolute (sometimes the training progression looks a little bit different, and different opening lines end up being preferred).

Now, this is by no means a definitive answer to what is, to me personally, a fascinating question. There is still plenty to think about here. Yet, we hope that our results provide an interesting perspective and make it possible for us to start thinking a bit deeper about how we learn, grow, improve—the very nature of intelligence and how it goes all the way from a blank slate to what is a deep understanding of a very complex domain like chess.


"There are two major things which we can try to find out with this work. One is: how does AlphaZero learn chess, how does it improve? That is actually quite important. If we manage one day to understand it fully, then maybe we can interpret it into the human learning process.

Secondly, I believe it is quite fascinating to discover that there are certain patterns that AlphaZero finds meaningful, which actually make little sense for humans. That is my impression. That actually is a subject for further research, in fact, I was thinking that it might easily be that we are missing some very important patterns in chess, because after all, AlphaZero is so strong that if it uses those patterns, I suspect they make sense. That is actually also a very interesting and fascinating subject to understand, if maybe our way of learning chess, of improving in chess, is actually quite limited. We can expand it a bit with the help of AlphaZero, of understanding how it sees chess."

