Comments

Very exciting initiative. Thanks for helping run this. I think the co-working calendar link may be broken though.

Also, the specific cycle attack doesn't work against other engines I think? In the paper their adversary doesn't transfer very well to LeelaZero, for example. So it's more one particular AI having issues, than a fact about Go itself.

Hi, one of the authors here speaking on behalf of the team. We’re excited to see that people are interested in our latest results. Just wanted to comment a bit on transferability.

  1. The adversary trained in our paper has a 97% winrate against KataGo at superhuman strength, a 6.1% winrate against LeelaZero at superhuman strength, and a 3.5% winrate against ELF OpenGo at superhuman strength. Moreover, in the games that we do win, we win by carrying out the cyclic-exploit (see https://goattack.far.ai/transfer), which shows that LZ and ELF are definitely susceptible. In fact, Kellin was also able to beat LZ with 100k visits using the cyclic exploit.

    And while it is true that our adversary has a significantly reduced winrate against LZ/ELF compared to KataGo, even a 3.5% winrate clearly demonstrates the existence of a flaw.[1] For example, going by goratings.org, a 3.5% winrate against the world #1 (3828 Elo) corresponds to approximately 3245 Elo, which is still in the top 250 in the world. Considering that LZ/ELF are stronger than any human, the winrate we get against them should easily correspond to a top professional level of play, if not a superhuman level. But our adversary loses to a weak amateur (myself).
     
  2. We haven't confirmed this ourselves yet, but Golaxy and FineArt (two strong Chinese Go AIs) also seem to systematically misevaluate positions with cyclic groups. Our evidence is this bilibili video, which shows off various cyclic positions that KataGo, Golaxy, and FineArt all misevaluate. Golaxy and FineArt (绝艺) are shown at the end of the video.[2]

    Now these are only misevaluated test-positions, which don't necessarily imply that a from-scratch cyclic-exploit is possible to pull off. But given that a) LZ/ELF are both vulnerable; b) LZ was manually exploited by Kellin; and c) Golaxy and FineArt misevaluate cyclic-groups in the same way as KataGo; this does not bode well for Golaxy and FineArt. We are currently in the process of getting access to FineArt to test its robustness ourselves.
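
For anyone who wants to sanity-check the Elo estimate in point 1, here is a minimal sketch using the standard logistic Elo model. The function names are just illustrative, and goratings.org's own rating model may differ from this formula, so a small discrepancy with the quoted figure is expected. With a 3.5% winrate against a 3828-rated opponent it lands at roughly 3250, in the same range as the number above.

```python
import math


def elo_gap(win_prob: float) -> float:
    """Rating gap (opponent minus player) implied by a win probability.

    Assumes the standard logistic Elo model:
        win_prob = 1 / (1 + 10 ** (gap / 400))
    """
    if not 0.0 < win_prob < 1.0:
        raise ValueError("win_prob must be strictly between 0 and 1")
    return 400.0 * math.log10((1.0 - win_prob) / win_prob)


def implied_rating(opponent_rating: float, win_prob: float) -> float:
    """Rating a player would need to score win_prob against the opponent."""
    return opponent_rating - elo_gap(win_prob)


if __name__ == "__main__":
    # 3.5% winrate against a 3828-rated opponent (figures from the comment).
    print(round(implied_rating(3828, 0.035)))
```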

To our knowledge, this attack is the first exploit that consistently wins against top programs using substantial search, without repeating specific sequences (e.g., finding a particular game that a bot lost and replaying the key parts of it). Our adversary algorithm also learned from scratch, without using any existing knowledge. However, there are other known weaknesses of bots, such as a fairly specific, complex sequence called "Mi Yuting's Flying Dagger joseki", or the ladder tactic. While these weaknesses were previously widespread, targeted countermeasures for them have already been created, so they cannot be used to consistently win games against top programs like KataGo. Nonetheless, these weaknesses, along with the cyclic one our adversary targets, suggest that CNN-based MCTS Go AIs have a shared set of flaws. Perhaps similar learning algorithms / neural-net architectures learn similar circuits / heuristics and thus also share the same vulnerabilities?

One question that we have been thinking about is whether the cyclic-vulnerability lies with CNNs or with AlphaZero style training. For example, some folks in multiagent systems think that “the failure of naive self play to produce unexploitable policies is textbook level material”. On the other hand, David Wu’s tree vs. cycle theory seems to suggest that certain inductive biases of CNNs are also at play.

  1. ^

    Our adversary was also run in a weird mode against LZ/ELF, because it modeled LZ/ELF as being KataGo. We ran our transfer evaluation this way because accurately modeling LZ/ELF would have required a lot of additional software engineering. It’s not entirely clear to me that accurate modeling would necessarily help though.

  2. ^

    The same bilibili poster also appears to have replicated our manual cyclic-exploit against various versions of KataGo with a 9-stone handicap: https://space.bilibili.com/33337424/channel/seriesdetail?sid=2973285.

KataGo's training is done under a ruleset where a white territory containing a few scattered black stones that would not be able to live if the game were played out is credited to white.

I don't think this statement is correct. Let me try to give some more information on how KataGo is trained.

Firstly, KataGo's neural network is trained to play with various different rulesets. These rulesets are passed as features to the neural network (see appendix A.1 of the original KataGo paper or the KataGo source code). So KataGo's neural network has knowledge of what ruleset KataGo is playing under.

Secondly, none of the area-scoring-based rulesets (of which modified and unmodified Tromp-Taylor rules are special instances) that KataGo has ever supported[1] would report a win for the victim in the sample games shown in Figure 1 of our paper. This is because KataGo only ignores stones a human would consider dead if there is no mathematically possible way for them to live, even if given infinite consecutive moves (i.e. the part of the board that a human would judge as belonging to the victim in the sample games is not "pass-alive").

Finally, due to the nature of MCTS-based training, what KataGo knows is precisely what KataGo's neural network is trained to emulate. This is because the neural network is trained to imitate the behavior of the neural network + tree-search. So if KataGo exhibits some behavior with tree-search enabled, its neural network has been trained to emulate that behavior.
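
To make the "network imitates network + tree-search" point concrete, here is a rough sketch of the generic AlphaZero-style policy target. This is not KataGo's actual training code (KataGo trains on several additional targets), and the function names are hypothetical. The idea is that search visit counts at a position are normalized into a distribution, and the network's policy is pushed toward it with a cross-entropy loss.

```python
import math


def visits_to_policy_target(visit_counts):
    """Normalize MCTS visit counts into a target move distribution."""
    total = sum(visit_counts)
    if total <= 0:
        raise ValueError("need at least one visit")
    return [count / total for count in visit_counts]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between the search target and the network's policy.

    Moves the search never visited (target 0) contribute nothing.
    """
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


if __name__ == "__main__":
    # Hypothetical position with three legal moves and 40 search visits.
    target = visits_to_policy_target([30, 10, 0])
    raw_policy = [0.25, 0.75, 0.0]  # network disagrees with the search
    print(cross_entropy(target, raw_policy))  # higher loss
    print(cross_entropy(target, target))      # lowest achievable loss
```

The loss is smallest when the network's policy equals the search distribution, which is the sense in which behavior shown with tree-search enabled is what the network gets trained toward.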

I hope this clears some things up. Do let me know if any further details would be helpful!

  1. ^

    Look for "Area" on the linked webpages to see details of area-scoring rulesets.

One of the authors of the paper here. Really glad to see so much discussion of our work! Just want to help clarify the Go rules situation (which in hindsight we could've done a better job explaining) and my own interpretation of our results.

We forked the KataGo source code (github.com/HumanCompatibleAI/KataGo-custom) and trained our adversary using the same rules that KataGo was trained on.[1] So while our current adversary wins via a technicality, it was a technicality that KataGo was trained to be aware of. Indeed, KataGo is able to recognize that passing would result in a forced win by our adversary, but given a low tree-search budget it does not have the foresight to avoid this. As evhub noted in another comment on this post, increasing the tree-search budget solves this issue.

So TL;DR I do believe we have a genuine exploit of the KataGo policy network, triggering a failure that it was trained to avoid.

Additionally, the project is still ongoing and we are working on attacks that are adversarial in nature but win via other means (i.e. no weird rule technicalities). There are some promising preliminary results here, which make me think that the current exploit is not just a one-off but evidence of something more general.[2]

  1. ^

    To be more precise, KataGo is trained with various different rulesets, and the one we happen to attack with is just one of them.

  2. ^

    Indeed the main creator of KataGo pointed out to us that humans have actually figured out ways to exploit AZ-type agents (link).

Yeah I wish I didn't have it. I would like to be able to drink socially.

Nice piece. My own Asian flush has definitely turned me away from drinking. I wanted to like drinking due to the culture surrounding it, but the side effects I get from alcohol (headache and asthma) make the experience quite miserable.