Also, the specific cycle attack doesn't work against other engines I think? In the paper their adversary doesn't transfer very well to LeelaZero, for example. So it's more one particular AI having issues, than a fact about Go itself.
Hi, one of the authors here speaking on behalf of the team. We’re excited to see that people are interested in our latest results. Just wanted to comment a bit on transferability.
To our knowledge, this attack is the first exploit that consistently wins against top programs using substantial search, without repeating specific sequences (e.g., finding a particular game that a bot lost and replaying the key parts of it). Our adversary algorithm also learned from scratch, without using any existing knowledge. However, there are other known weaknesses of bots, such as a fairly specific, complex sequence called "Mi Yuting's Flying Dagger joseki", or the ladder tactic. While these weaknesses were previously widespread, targeted countermeasures for them have already been created, so they cannot be used to consistently win games against top programs like KataGo. Nonetheless, these weaknesses, along with the cyclic one our adversary targets, suggest that CNN-based MCTS Go AIs have a shared set of flaws. Perhaps similar learning algorithms / neural-net architectures learn similar circuits / heuristics and thus also share the same vulnerabilities?
One question that we have been thinking about is whether the cyclic-vulnerability lies with CNNs or with AlphaZero style training. For example, some folks in multiagent systems think that “the failure of naive self play to produce unexploitable policies is textbook level material”. On the other hand, David Wu’s tree vs. cycle theory seems to suggest that certain inductive biases of CNNs are also at play.
Our adversary was also run in a weird mode against LZ/ELF, because it modeled LZ/ELF as being KataGo. We ran our transfer evaluation this way because accurately modeling LZ/ELF would have required a lot of additional software engineering. It’s not entirely clear to me that accurate modeling would necessarily help though.
The same bilbili poster also appears to have replicated our manual cyclic-exploit against various versions of KataGo with a 9-stone handicap: https://space.bilibili.com/33337424/channel/seriesdetail?sid=2973285.
KataGo's training is done under a ruleset where a white territory containing a few scattered black stones that would not be able to live if the game were played out is credited to white.
I don't think this statement is correct. Let me try to give some more information on how KataGo is trained.
Firstly, KataGo's neural network is trained to play with various different rulesets. These rulesets are passed as features to the neural network (see appendix A.1 of the original KataGo paper or the KataGo source code). So KataGo's neural network has knowledge of what ruleset KataGo is playing under.
Secondly, none of the area-scoring-based rulesets (of which modified and unmodified Tromp-Taylor rules are special instances of) that KataGo has ever supported[1] would report a win for the victim for the sample games shown in Figure 1 of our paper. This is because KataGo only ignores stones a human would consider dead if there is no mathematically possible way for them to live, even if given infinite consecutive moves (i.e. the part of the board that a human would judge as belonging to the victim in the sample games is not "pass-alive").
Finally, due to the nature of MCTS-based training, what KataGo knows is precisely what KataGo's neural network is trained to emulate. This is because the neural network is trained to imitate the behavior of the neural network + tree-search. So if KataGo exhibits some behavior with tree-search enabled, its neural network has been trained to emulate that behavior.
I hope this clears some things up. Do let me know if any further details would be helpful!
Look for "Area" on the linked webpages to see details of area-scoring rulesets.
One of the authors of the paper here. Really glad to see so much discussion of our work! Just want to help clarify the Go rules situation (which in hindsight we could've done a better job explaining) and my own interpretation of our results.
We forked the KataGo source code (github.com/HumanCompatibleAI/KataGo-custom) and trained our adversary using the same rules that KataGo was trained on.[1] So while our current adversary wins via a technicality, it was a technicality that KataGo was trained to be aware of. Indeed, KataGo is able to recognize that passing would result in a forced win by our adversary, but given a low tree-search budget it does not have the foresight to avoid this. As evhub noted in another comment on this post, increasing the tree-search budget solves this issue.
So TL;DR I do believe we have a genuine exploit of the KataGo policy network, triggering a failure that it was trained to avoid.
Additionally, the project is still ongoing and we are working on attacks that are adversarial nature but win via other means (i.e. no weird rule technicalities). There are some promising preliminary results here which makes me think that the current exploit is not just a one-off exploit but evidence of something more general.[2]
Nice piece. My own Asian flush has definitely turned me away from drinking. I wanted to like drinking due to the culture surrounding it, but the side effects I get from alcohol (headache and asthma) make the experience quite miserable.
Very exciting initiative. Thanks for helping run this. I think the co-working calendar link may be broken though.