Posts

Sorted by New

Wiki Contributions

Comments

I am also curious why the zero-shot transfer is so close to 0% but not 0%. Why do those agents differ so much, and what do the exploits for them look like?

The exploits for the other agents are pretty much the same exploit, they aren't really different. From what I can tell as an experienced Go player watching the adversary and other human players use the exploit, the zero shot transfer is not so high because the adversarial policy overfits to memorize specific sequences that let you set up the cyclic pattern and learns to do so in a relatively non-robust way. 

All the current neural-net-based Go bots share the same massive misevaluations in the same final positions. Where they differ is that they may have arbitrarily different preferences among almost equally winning moves, so during the long period that the adversary is in a game-theoretically-lost position, any different victim all the while still never realizing any danger, may nonetheless just so happen to choose different moves. If you consider a strategy A that might broadly minimize the number of plausible ways a general unsuspecting victim might mess up your plan by accident, and a strategy B that leaves more total ways open but those ways are not the ones that small set of victim networks you are trained to exploit would stumble into (because you've memorized their tendencies enough to know they won't), the adversary is incentivized more towards B than A.

This even happens after the adversary "should" win. Even after it it finally reaches a position that is game-theoretically winning, it often blunders several times and plays moves that cause the game to be game-theoretically lost again, before eventually finally winning again. I.e. it seems overfit to the fact that the particular victim net is unlikely to take advantage of its mistakes, so it never learns that they are in fact mistakes. In zero-shot transfer against a different opponent this unnecessarily may give the opponent, who shares the same weakness but may just so happen to play in different ways, chances to stumble on a refutation and win again. Sometimes even without the victim even realizing that it was a refutation of anything and that they were in trouble in the first place.

I've noticed human exploiters play very differently than that. Once they achieve a game-theoretic-winning position they almost always close all avenues for counterplay and stop giving chances to the opponent that would work if the opponent were to suddenly become aware.

Prior to that point, when setting up the cycle from a game-theoretically lost position, most human players I've seen also play slightly differently too. Most human players are far less good at reliably using the exploit, because they haven't practiced and memorized as much the ways to get any particular bot to not accidentally interfere with them as they do so. So the adversary does much better than them here. But as they learn to do better, they tend do so in ways that I think transfer better (i.e. from observation my feeling is they maintain a much stronger bias towards things like "strategy A" above).

(I'm the main KataGo dev/researcher)

Just some notes about KataGo - the degree to which KataGo has been trained to play well vs weaker players is relatively minor. The only notable thing KataGo does is in some self-play games to give up to an 8x advantage in how many playouts one side has over the other side, where each side knows this. (Also KataGo does initialize some games with handicap stones to make them in-distribution and/or adjust komi to make the game fair). So the strong side learns to prefer positions that elicit higher chance of mistakes by the weaker side, while the weak side learns to prefer simpler positions where shallower search doesn't harm things as much.

This method is cute because it adds pressure to only learn "general high-level strategies" for exploiting a compute advantage, instead of memorizing specific exploits (which one might hypothesize to be less likely to generalize to arbitrary opponents). Any specific winning exploit learned by the stronger side that works too well will be learned by the weaker side (it's the same neural net!) and subsequently will be avoided and stop working.

And it's interesting that "play for positions that a compute-limited yourself might mess up more" correlates with "play for positions that a weaker human player might mess up in".

But because this method doesn't adapt to exploit any particular other opponent, and is entirely ignorant of a lot of tendencies of play shared widely across all humans, I would still say it's pretty minor. I don't have hard data, but from firsthand subjective observation I'm decently confident that top human amateurs or pros do a better job playing high-handicap games (> 6 stones) against players that more than that many ranks weaker than them than KataGo would, despite KataGo being stronger in "normal" gameplay. KataGo definitely plays too "honestly", even with the above training method, and lacks knowledge of what weaker humans find hard.

If you really wanted to build a strong anti-human handicap game bot in Go, you'd absolutely start by learning to better model human play, using the millions of games available online.

(As for the direct gap with the very best pro players, without any specific anti-bot exploits, at tournament-like time controls I think it's more like 2 stones rather than 3-4. I could believe 3-4 for some weaker pros, or if you used ultra-blitz time controls, since shorter time controls tend to favor bots over humans).

There's (a pair of) binary channels that indicate whether the acting player is receiving komi or paying it. (You can also think of this as a "player is black" versus "player is white" indicator, but interpreting it as komi indicators is equivalent and is the natural way you would extend Leela Zero to operate on different komi without having to make any changes to the architecture or input encoding).

In fact, you can set the channels to fractional values strictly between 0 and 1 to see what the model thinks of a board state given reduced komi or no-komi conditions. Leela Zero is not trained on any value other than the 0 or 1 endpoints corresponding to komi +7.5 or komi -7.5 for the acting player, so there is no guarantee that the behavior for fractional values is reasonable, but I recall people found that many of Leela Zero's models do interpolate their output for the fractional values in a not totally unreasonable way! 

If I recall right, it tended to be the smaller models that behaved well, whereas some of the later and larger models behaved totally nonsensically for fractional values. If I'm not mistaken about that being the case, then as a total guess perhaps that's something to do with later and larger models having more degrees of freedom with which to fit/overfit arbitrarily to arbitrarily give rise to non-interpolating behavior in between, and/or having more extreme differences in activations at the end points that constrain the middle less and give it more room to wiggle and do bad non-monotone things.

This is very cool, thanks for this post.

Some remarks:

Playing at an intersection with no liberty is forbidden, unless the play results in capture

This is true, but the capture can be of your own stones. That is, Leela Zero is trained under Tromp-Taylor rules where self-capture is legal. So there isn't any forbidding of moves due to just liberties. Single stone suicide is still illegal, but only by virtue of the fact that self-capture of a single stone would repeat the board position, but you can suicide multiple stones.

However, there is still of course a question of how liberty counting works. See https://github.com/leela-zero/leela-zero/issues/877 for some past discoveries of positions where Leela Zero is unable to determine when a large group is in atari or would become in atari after certain move(s). This suggests that the neural nets do not learn a general and correct algorithm, instead they learn a bunch of heuristics on top of heuristics that work almost all of the time but have enough combinatorically many nooks and crannies that are not all forced to be correct due to rarity of some of the cases.

Note that if you are going to investigate liberty counting, you should expect that the neural net likely counts a much more fuzzy and intricate concept than literal liberties. As an expert Go player I would expect it almost certainly primarily focuses on concepts that are much more tactically relevant, like "fighting liberties", e.g. how many realistic moves the opponent would need to fill a group accounting for necessary approach moves, recaptures, etc. For example a group with literal 2 liberties but where one liberty was an eye and the other would be self-atari by the opponent unless they made a preparatory connection first would have 3 fighting liberties because actually capturing the group would take 3 moves under realistic play, not 2.

The fighting liberty count is also somewhat multidimensional, because it can vary depending on the particular tactical objective. You might have 4 fighting liberties against a particular group, but 6 against a different group because 2 of the liberties or approach moves required are only "effective" for one objective and not the other. Also purely integer values don't suffice because fighting liberty count can depend on things like a ko.

The fact that this not-entirely-rigorously-defined concept of "fighting liberties" is what expert play in Go cares about (at least, expert human players) perhaps makes it also less surprising why a net might not implement a "correct" and "general" algorithm for liberty counting but might instead end up with a pile of hacks and heuristics that don't always work.

I suspect will be easier to investigate the counting of eyes than liberties, and might suggest to focus on that first if you investigate further. The presence of an eye is much more discrete and less multidimensional than the fighting liberty count, and you only need to count up to 2, so there are fewer behavior patterns in the activations that will need to be distinguished by an analysis.

A particularly interesting application would be to understand Wang et al (2022)’s adversarial policies against KataGo. For example, one of the adversarial policies essentially amounts to confusing the model about the number of liberties that a large circular group has, and capturing it “without the model noticing”. This attack somewhat transfers to Leela Zero as well. Can we find a mechanism for liberty-counting that explains the success of this attack?

See https://www.lesswrong.com/posts/Es6cinTyuTq3YAcoK/there-are-probably-no-superhuman-go-ais-strong-human-players?commentId=gAEovdd5iGsfZ48H3 where I give a hypothesis for what the mechanism is, which I think is at a high-level more likely than not to be roughly what's happening.

I also currently believe based some playing around with the nets a long time ago and seeing the same misevaluations across all 4 different independently trained AlphaZero-based agents I tested that the misevaluation probably transfers very well, if not the attack. I.e. if the attack doesn't transfer perfectly, it's probably to do with the happenstance of the preferences of the agent earlier - for example it's probably easier to exploit agents that are trying to also sharply maximize score since they will tend to ignore your moves more and give you more moves in a row to do things - and not because any of these replications anticipate or evaluate the attack itself much better or worse.

However, here again I would strongly recommend investigating eyes, not liberties. All end-to-end observational experimentation with the models I've done suggests that exactly the same misevaluation is happening with respect to eyes. And the analysis of any miscounting of eyes on the predictions should be far crisper than for liberties.

In particular, if you overcount eyes by propagating a partial eye count multiple times around a cycle, you will predict a group is absolutely alive even when it isn't alive, because for groups with 2,3,4,5,... eyes, the data is nearly absolutely consistent that such groups are alive independent of anything else around them. 

By contrast, if you overcount liberties and think a group has a "very large" number of liberties, you might not always predict the group is alive. For example what if every opposing group has 2 eyes and is therefore immortally strong, whereas the large-liberty group is surrounded and has no eyes? The training data probably doesn't so sharply constrain how different nets will generalize to rare "over-large" liberty counts because generally how liberty counts affect statuses is contextual rather than absolute like eye count. So one would expect that when a net mistakenly overcounts liberties, the effect could also be harder to analyze due to being contextual and not always consistent.

Right before the game ends, the value of the board should come down to which player has more territory. Is the model calculating each player’s territory, and if so, how?

I would say almost certainly yes, but possibly not "exactly" territory, maybe something like certainty-adjusted or variance-adjusted territory, with hacks for different kinds of sources of uncertainty. But yes, almost certainly something that correlates very well with territory.

I did some visualizations of activations of a smaller net way back in the past that shows something pretty much like this. https://github.com/lightvector/GoNN#global-pooled-properties-dec-2017 

Although this architecture isn't quite a plain resnet (in particular, this is the visualization of the conv layer output prior to a global pooling layer) , it shows that the neural net learned a concept very recognizable to a Go player as having to do with predicted ownership or territory. And it learned this concept without ever being trained to predict the game outcome or score! In this case, the relevant net was trained solely to predict the next move a human player played in a position.

The fact that this kind of concept can arise automatically from pure move prediction also gives some additional intuitive force for why the original AlphaGoZero work found such a huge improvement from using the same neural net to predict both value and policy. Pure policy prediction is already capable of automatically generating the internal feature you would need to get basic value prediction working.

I think there are two "simple" abstractions possible here, where the amount of data to distinguish which one is right is minuscule under natural play in Go, and therefore can easily be overwhelmed by small constant factors in the "ease" of converging to either one due to inductive bias.

  • Abstraction 1: the size of the set of all empty spaces adjacent to a group
  • Abstraction 2: the sum of the number of empty spaces next to a given stone on the group, plus, recursively, this same sum for each of the neighboring stones of the same player to the north, south, east, west, where this recursion doesn't "backtrack", plus a penalty if a nearby empty space is bordered by the same group from multiple sides.

Abstraction 1 is mathematically more elegant, but abstraction 2 is algorithmically simpler to implement. If you modify abstraction 2's algorithm to include a globally remembered "set" such that traversed spaces and stones are added to this set and are not re-traversed, you can make it equivalent to abstraction 1, but that requires additional intermediate memory during the computation and so is more complex.

Indeed, when programmers write iterative or recursive algorithms that operate on graphs, it's easy and it's usually less lines of code and less local state if you forget to include proper checks on prohibiting return to the global set of locations previously visited during that instance of that algorithm, and then your algorithm may work fine on trees but fails on cycles. The "less memory or state required" part, plus the fact that in Go groups with large cycles that would distinguish these two possibilites are rare, is what leads me to hypothesize right now (without explicit evidence or confirmation) that convergence to abstraction 2 is morally what's going on here.

Keep in mind that the adversary was specifically trained against KataGo, whereas the performance against LeelaZero and ELF is basically zero-shot. It's likely the case that an adversary trained against LeelaZero and ELF would also win consistently.

I've run LeelaZero and ELF and MiniGo (yet another independent AlphaZero replication in Go) by hand in particular test positions to see what their policy and value predictions are, and they all very massively misevaluate cyclic group situations just like KataGo. Perhaps by pure happenstance different bots could "accidentally" prefer different move patterns that make it harder or easier to form the attack patterns (indeed this is almost certainly something that should vary between bots as they do have different styles and preferences to some degree), but probably the bigger contributor to the difference is explicit optimization vs zero shot.

So all signs point to this misgeneralization being general to AlphaZero with convnets, not one particular bot. In another post here https://www.lesswrong.com/posts/Es6cinTyuTq3YAcoK/there-are-probably-no-superhuman-go-ais-strong-human-players?commentId=gAEovdd5iGsfZ48H3 I explain why I think it's intuitive how and why a convnet would learn the an incorrect algorithm first and then get stuck on it given the data.

(As for whether, e.g. a transformer architecture would have less of an issue - I genuinely have no idea, I think it could go either way, nobody I know has tried it in Go. I think it's at least easier to see why a convnet could be susceptible to this specific failure mode, but that doesn't mean other architectures wouldn't be too)

In the paper, KataGo creator David Wu theorizes that KataGo learned a method for counting liberties that works on groups of stones with a tree structure, but fails when the stones form a cycle.  I feel like that can't be the whole story because KataGo can and does solve simpler life-or-death problems with interior liberties, but I don't have a more precise theory to put in its place.

KataGo author here: why do you feel this can't be the whole story? The very fact that it solves other life and death problems with interior liberties when no large cycle is present, and fails consistently and specifically on positions where a large cycle is present, is evidence that it is precisely the presence of a cycle that matters, not whether or not there are interior liberties.

And across many positions where the adversary is causing a misevaluation, the misevaluation goes away if you edit the position to break the cycle, even if you preserve other tactically-relevant features of the position, including preserving the fact that the interior group is surrounded (you can break the cycle without letting the interior group out via crosscut). See https://postimg.cc/bdTJQM6H for an illustration - consistently positions like case A are problematic, but positions like case B and C are completely fine, where in B and C there are surrounded groups just like in A, but the surrounding group doesn't link back to itself. And for each of A,B,C, the issue also appears to be consistent regardless of whether most of the liberties are external or shared with the internal group. In this illustration they happen to be external.

In retrospect, the fact that large-scale cycles may cause an issue is intuitive because it's a convolutional net. Consider: the kinds of algorithm a convolutional net based on 3x3 convolutions can learn are basically like a game of telephone. Every layer, every spot on the board gets to blindly send some information to all of its neighbors, but no further. The only way that information gets further than one space away is via repeated passing of information neighbor to neighbor, over dozens of steps, with every step of that information passing being greedily and locally optimized by gradient descent to improve the final loss function, and with every sequential step of that process being separately learned (there is no weight sharing between different layers).

So a very natural kind of algorithm you could imagine the optimization easily falling into would be for every layer to learn to devote some channels to simply passing liberty count "messages" in each direction along connected stones of the group, adding up contributions.

E.g. "My east neighbor so far reports 3 liberties coming from them, and I have an immediate north liberty right here, so I should report to my west neighbor a total of 4 liberties so far coming from their east. Whereas my west neighbor so far reports 1 liberty from them, so I should pass on to my east neighbor that there's 2 liberties so far coming from their west. South or diagonal from me (let's suppose) are stones of the opposing color, so I don't pass messages about liberties for this group in those directions, I just pass a message that there are no liberties for the opposing side coming from here".

If the group has a large scale tree topology, it's easy to see how this kind of message passing can implement a correct liberty counting algorithm. However, if the group has a large-scale cycle, then it's also intuitive how this message-passing-like counting can fail. Unless the messages are somehow also labeled with additional information about the origin of those liberties, or what parts of a cycle have and have not been traversed in producing a given partial count, you're liable to have messages circulate around a cycle endlessly accumulating more and more liberties by double and triple counting them. Behaving correctly for cycles would presumably require a different or nontrivially more complex algorithm that uses more of the net's capacity and is harder to learn, and so presumably it's not worth it, not while the frequency of large cycles in the data is so low.

Note that "small" cycles (such as the ring of stones around a single-point eye) should pose no problem because a small shape like this is "cheaply perceivable" just via the coordination of just a few layers, plus they are absurdly common and well-represented in the data, plus you already want to learn them for other reasons (counting whether you have 2 eyes).

But "large" cycles where the cyclic group is not already strong or alive for other reasons probably occur naturally in less than 0.1% of games. (as far as I can tell, it's only every few years in active continuous high-level-pro tournaments and league games that a notable case comes up, it's rare enough that when it does happen it's usually a news item in the Go game commentary world!). So the above kind of simple message-passing algorithm about liberty counts would work in >99.9% of cases, making it less surprising why a more complex and more correct algorithm wouldn't be learned.

Subjectively I'd say I'm > 70% confident that the above is qualitatively what's going on, that even if the above isn't right in the details or there are other subtle phenomena at play too, the above intuition about convnets + data scarcity captures the main "high-level-reason" for the issue.

In the last 2 months KataGo's community training run has started initializing a few tenths of a percent of self-play games to start in board positions where cyclic groups are already on the board, and in the 2 months so far there has been a dramatic shift in the policy and value predictions in a lot of these cases. Learning is still ongoing - predictions are still improving network-by-network with no sign that equilibrium has been reached yet. So that also supports the estimate that prior to explicit augmentation, the frequency of such positions was much less than tenths of a percent, too low to incentivize the net to learn an algorithm that works in general, but that a few tenths of a percent is sufficient to kickstart the learning here, at least for some of the possible cases (there are many distinct tactical cases, some cases seem to be harder to learn so far).

Couldn't you have also made the exact same argument for the word "vaccination" some number of generations ago, for almost exactly the same reason? It too derives from root words about a practice intended for protecting specifically against smallpox. (Namely, infecting someone with cowpox).

https://www.etymonline.com/word/vaccination

When words are so overly specific so as to almost completely fall out of usefulness for their original meaning (as in the case of both vaccination and variolation, since smallpox is not in circulation any more), it seems pretty natural to see people to repurpose them for other closely-related or more general meanings - that's certainly one common way language evolves.

If the original meaning is no longer even remotely relevant (so misunderstanding is vanishingly unlikely) and the new meaning is a natural-to-infer and useful extension for the topic being discussed, then this seems like good communication, which is what words are for.