Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

On a few different views, understanding the computation done by neural networks is crucial to building neural networks that constitute human-level artificial intelligence that doesn’t destroy all value in the universe. Given that many people are trying to build neural networks that constitute artificial general intelligence, it seems important to understand the computation in cutting-edge neural networks, and we basically do not.

So, how should we go from here to there? One way is to try hard to think about understanding, until you understand understanding well enough to reliably build understandable AGI. But that seems hard and abstract. A better path would be something more concrete.

Therefore, I set this challenge: know everything that the best go bot knows about go. At the moment, the best publicly available bot is KataGo; if you’re at DeepMind or OpenAI and have access to a better go bot, I guess you should use that instead. If you think those bots are too hard to understand, you’re allowed to make your own easier-to-understand bot, as long as it’s the best.

What constitutes success?

  • You have to be able to know literally everything that the best go bot that you have access to knows about go.
  • It has to be applicable to the current best go bot (or a bot that is essentially as good - e.g. you’re allowed to pick one of the versions of KataGo whose Elo is statistically hard to distinguish from the best version’s), not the best go bot as of one year ago.
    • That being said, I think you get a ‘silver medal’ if you understand any go bot that was the best at some point from today on.

Why do I think this is a good challenge?

  • To understand these bots, you need to understand planning behaviour, not just pick up on various visual detectors.
  • In order to solve this challenge, you need to actually understand what it means for models to know something.
  • There’s a time limit: your understanding has to keep up with the pace of AI development.
  • We already know some things about these bots based on how they play and evaluate positions, but obviously not everything.
  • We have some theory about go: e.g. we know that certain symmetries exist, we understand optimal play in the late endgame, we have some neat analysis techniques.
  • I would like to play go as well as the best go bot. Or at least to learn some things from it.

Corollaries of success (non-exhaustive):

  • You should be able to answer questions like “what will this bot do if someone plays mimic go against it?” without actually checking during play. More generally, you should know how the bot will respond to novel counter-strategies.
  • You should be able to write, from scratch, a computer program that plays go just like that go bot, without copying over all the numbers.

Drawbacks of success:

  • You might learn how to build a highly intelligent and capable AI in a way that does not require deep learning. In this case, please do not tell the wider world or do it yourself.
  • It becomes harder to check if professional human go players are cheating by using AI.

Related work:

A conversation with Nate Soares on a related topic probably helped inspire this post. Please don’t blame him if it’s dumb tho.


I think there's some communication failure where people are very skeptical of this for reasons that they think are obvious given what they're saying, but which are not obvious to me. Can people tell me which subset of the below claims they agree with, if any? Also if you come up with slight variants that you agree with that would be appreciated.

  1. It is approximately impossible to succeed at this challenge.
  2. It is possible to be confident that advanced AGI systems will not pose an existential threat without being able to succeed at this challenge.
  3. It is not obvious what it means to succeed at this challenge.
  4. It will probably not be obvious what it means to succeed at this challenge at any point in the next 10 years, even if a bunch of people try to work on it.
  5. We do not currently know what it means for a go bot to know something in operational terms.
  6. At no point in the next 10 years could one be confident that one knew everything a go bot knew, because we won't be confident about what it means for a go bot to know something.
  7. You couldn't know everything a go bot knows without essentially being that go bot.

[EDIT: 8. One should not issue a challenge to know everything a go bot knows without having a good definition of what it means for a go bot to know things.]

If your goal is to play as well as the best go bot and/or write a program that plays equally well from scratch, it seems like it's probably impossible. A lot of the go bot's 'knowledge' could well be things like "here's a linear combination of 20000 features of the board predictive of winning". There's no reason for the coefficients of that linear combination to be compressible in any way; it's just a mathematical fact that these particular coefficients happen to be the best at predicting winning. If you accepted "here the model is taking a giant linear combination of features" as "understanding" it might be more doable.
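To make the "giant linear combination of features" picture concrete, here is a toy stand-in (not KataGo's actual architecture - the feature extractor and weights below are invented for illustration):

```python
import math

# Toy stand-in for "a linear combination of N board features predictive
# of winning". A real bot's features are learned and far more numerous,
# and nothing forces the coefficients to be humanly compressible.

def extract_features(board):
    # board: 19x19 grid with 1 (black stone), -1 (white stone), 0 (empty)
    stone_balance = sum(sum(row) for row in board)
    center_control = sum(board[r][c] for r in range(6, 13) for c in range(6, 13))
    return [1.0, stone_balance, center_control]  # bias term + 2 features

WEIGHTS = [0.0, 0.05, 0.08]  # arbitrary made-up coefficients

def win_probability(board):
    score = sum(w * f for w, f in zip(WEIGHTS, extract_features(board)))
    return 1 / (1 + math.exp(-score))  # logistic squashing to (0, 1)
```

Even in this tiny case, the coefficients are just whatever numbers best predicted winning; "understanding" them beyond that may not be possible.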

An even more pointed example: chess endgame tables. What does it mean to 'fully understand' it beyond understanding the algorithms which construct them, and is it a reasonable goal to attempt to play chess endgames as well as the tables?
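For intuition about what "understanding the algorithms which construct them" means, here is a toy endgame table built by the same exhaustive style of evaluation, for single-pile Nim rather than chess (the game choice and code are mine, not from the comment):

```python
# Toy "endgame table" for single-pile Nim: players alternately remove
# 1-3 stones; whoever takes the last stone wins. For each position n,
# record whether the player to move wins with perfect play.

def build_table(max_stones):
    table = {}
    for n in range(max_stones + 1):
        # A position is winning iff some legal move reaches a losing position.
        table[n] = any(not table[n - k] for k in (1, 2, 3) if k <= n)
    return table

table = build_table(12)
```

Here a human-legible pattern happens to exist (multiples of 4 lose for the player to move); "fully understanding" a chess tablebase would require the analogous patterns to exist and be findable, which is far from guaranteed.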

If you have a "lazy" version of the goal, like "have a question-answerer that can tell you anything the model knows" or "produce a locally human-legible but potentially giant object capturing everything the model knows" then chess endgame tables are a reasonably straightforward case ("position X is a win for white").
(I am not one of the people who have expressed skepticism, but I find myself with what I take to be feelings somewhat similar to theirs.)

I agree with 1 if success is defined rather strictly (e.g., requiring that one human brain contain all the information in a form that actually enables the person whose brain it is to play like the bot does) but not necessarily if it is defined more laxly (e.g., it's enough if for any given decision the bot makes we have a procedure that pretty much always gives us a human-comprehensible explanation of why it made that decision, with explanations for different decisions always fitting into a reasonably consistent framework).

I have no idea about 2; I don't think I've seen any nontrivial but plausibly true propositions of the form "It is possible to be confident that advanced AGI systems will not pose an existential threat without X", but on the other hand I don't think this justifies much confidence that any specific X is a thing we should be working on if we care about being able to be confident that advanced AGI systems will not pose an existential threat.

I agree with 3, but I think this is because it hasn't been defined as explicitly as possible rather than because of some fundamental unclarity in the question. Accordingly, I think 4 is probably wrong.

I'm not sure whether 5 is true or not and suspect that the answer depends mostly on how you choose to define "know". (Maybe go bots don't know anything!) I'm pretty confident saying that today's best go bots know who's likely to win in many typical game positions, or whether a given move kills a particular group or not. Accordingly, I am inclined to disagree with 6, even though probably there are edge cases where it's not clear whether a given bot "knows" a given thing or not.

I don't know what "essentially being" means in 7; as written it looks wrong to me, but for some strong definitions of "know everything it knows" something close enough might be true. E.g., plausibly
I think I basically agree with all of this.
Taboo "know" and try to ask the question again, because I think you're engaging in a category error when you posit that, for example, a neural network actually knows anything at all.  That is, the concept of "knowledge" as it applies to a human being cannot be meaningfully compared to "knowledge" as it applies to a neural network; they aren't the same kind of thing.  A Go AI doesn't know how to play Go; it knows the current state of the board.  These are entirely different categories of things. The closest thing I think the human brain has to the kind of "knowledge" that a neural network uses is the kind of thing we represent in our cultural narrative as, for example, a spiritual guru slapping you for thinking about doing something instead of just doing it.  That is, we explicitly label this kind of thing, when it occurs in the human brain, as not-knowledge. ETA: You can move your arm, right?  You know how to move your arms and your legs and even how to do complicated things like throw balls and walk around.  But you don't actually know how to do any of those things; if you knew how to move your arm - much less something complicated like throwing balls! - it would be a relatively simple matter for you to build an arm and connect it to somebody who was missing one. Does this seem absurd?  It's the difference between knowing how to add and knowing how to use a calculator.  Knowing how to add is sufficient information to build a simple mechanical calculator, given some additional mechanical knowledge - knowing how to use a calculator gives you no such ability.
Why do you believe that?
Charlie Steiner, 3y:
To make my own point that may be distinct from ACP's: the point isn't that neural networks don't know anything. The point is that the level of description I'm operating on when I say that phrase is so imprecise that it doesn't allow you to make exact demands like knowing "everything the NN does" or "exactly what the NN does," for any system other than a copy of that same neural network. If I make the verbal chain of reasoning "the NN can know things, I can know things, therefore I can know what the NN knows," this chain of reasoning actually fails. Even though I'm using the same English word "know" both times, the logical consequences of the word are different each time I use it. If I want to make progress here, I'll need to taboo the word "know."
Because I think the word "know", as used by a human understanding a model, is standing in for a particular kind of mirror-modeling, in which we possess a reproductive model of a thing in our mind which we can use to simulate a behavior, whereas the word "know", as used by the referent AI, is standing in for "the set of information used to inform the development of a process". So an AI which has been trained on a game which it lost can behave "as if it has knowledge of that game", when in fact the only remnant of that game may be a slightly adjusted parameter; perhaps a connection weighting somewhere is 1% different than it would otherwise be. To "know" what the AI knows, in the sense that it knows it, requires a complete reproduction of the AI's state - that is, if you know everything the AI actually knows, as opposed to the information-state that informed the development of the AI, then all you actually know, in that case, is that this particular connection is weighted 1% different. In order to meaningfully apply this knowledge, you must simulate the AI (you must know how all the connections interact in a holistic sense), in which case you don't know anything - you're just asking the AI what it would do, which is not meaningfully knowing what it knows in any useful sense. Which is basically because it doesn't actually know anything. Its state is an algorithm, a process; this algorithm could perhaps be dissected, broken down, simplified, and turned into knowledge of how it operates - but this is just another way of simulating and querying a part of the AI. Critically, knowing how the AI operates is having knowledge that the AI itself does not actually have, because now we are mirror-modeling the AI, and turning what the AI is, which isn't knowledge, into something else, which is.
I guess it seems to me that you're claiming that the referent AI isn't doing any mirror-modelling, but I don't know why you'd strongly believe this. It seems false about algorithms that use Monte Carlo Tree Search as KataGo does (altho another thread indicates that smart people disagree with me about this), but even for pure neural network models, I'm not sure why one would be confident that it's false.
Because it's expensive, slow, and orthogonal to the purpose the AI is actually trying to accomplish. As a programmer, I take my complicated mirror models, try to figure out how to transform them into sets of numbers, try to figure out how to use one set of those numbers to create another set of those numbers.  The mirror modeling is a cognitive step I have to take before I ever start programming an algorithm; it's helpful for creating algorithms, but useless for actually running them. Programming languages are judged as helpful in part by how well they do at pretending to be a mirror model, and efficient by how well they completely ignore the mirror model when it comes time to compile/run.  There is no program which is made more efficient by representing data internally as the objects the programmers created; efficiency gains are made in compilers by figuring out how to reduce away the unnecessary complexity the programmers created for themselves so they could more easily map their messy intuitions to cold logic. Why would an AI introduce this step in the middle of its processing?
Josh Smith-Brennan, 3y:
I've studied Go using AI and have heard others discuss the use of AI in studying Go. Even for professional Go players, the AI's inability to explain why it gave a higher win rate to a particular move or sequence is a problem. Even if you could program a tertiary AI which could query the Go-playing AI, analyze the calculations the Go-playing AI uses to make its judgements, and then translate that into English (or another language) so that this tertiary AI could explain why the Go-playing AI made a move, I would still disagree that even this hybrid system 'knew' how to play Go. There is a definite difference between 'calculating' and 'reasoning', such that even a neural network with its training is, I think, really still just one big calculator, not a reasoner.
My take is:

  • I think making this post was a good idea. I'm personally interested in deconfusing the topic of universality (which basically should capture what "learning everything the model knows" means), and you brought up a good "simple" example to try to build intuition on.
  • What I would call your mistake is mostly 8, but a bit of the related ones (so 3 and 4?). Phrasing it as "can we do that" is a mistake in my opinion because the topic is very confused (as shown by the comments). On the other hand, I think asking the question of what it would mean is a very exciting problem. It also gives a more concrete form to the problem of deconfusing universality, which is important AFAIK to Paul's approaches to alignment.
One operationalization of "know" in this case is being able to accurately predict every move of the Go AI. This is a useful framing, because instead of a hard pass/fail criterion, we can focus on improving our calibration. Now the success criterion might be:

  • You have to be able to attain a Brier score of 0 in predicting the moves of the best go bot that you have access to.

What's missing are some necessary constraints. Most likely, you want to prohibit the following strategies:

  1. Running a second instance of the Go AI on the same position, and using as your prediction the move that instance #2 makes.
  2. Manually tracing through the source code to determine what the output would be if it was run.
  3. Memorizing the source code and tracing through it in your head.
  4. Constraining the input moves to ones where every Go program would make the same move, then using the output of a different Go program as your prediction.

Corollary: you can't use any automation whatsoever to determine what move to make. Any automated system that can allow you to make accurate predictions is effectively a Go program. Overall, then, you might just want to prohibit the use of Turing machines. However, my understanding is that this results in a ban on algorithms. I don't have enough CS to say what's left to us if we're denied algorithms.

Here's a second operationalization of "know." You're allowed to train up using all the computerized help you want. But then, to prove your ability, you have to perfectly predict the output of the Go program on a set of randomly generated board positions, using only the power of your own brain. A softer criterion is to organize a competition, where participants are ranked by Brier score on this challenge. However, this version of the success criterion is just a harder version of being an inhumanly good Go player. Not only do you have to play as well as the best Go program, you have to match its play. It's the difference between being a b
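A minimal sketch of the Brier-score criterion described above, treating each of the bot's moves as a multiclass outcome (move names and probabilities here are invented; 0 means perfect prediction):

```python
def brier_score(predictions, actual_moves):
    """Multiclass Brier score for move predictions.

    predictions: list of dicts mapping candidate moves to probabilities
    actual_moves: list of the moves the bot actually played
    """
    total = 0.0
    for probs, played in zip(predictions, actual_moves):
        for move, p in probs.items():
            outcome = 1.0 if move == played else 0.0
            total += (p - outcome) ** 2
    return total / len(actual_moves)
```

A predictor that always puts probability 1 on the bot's actual move scores exactly 0; hedging across candidates scores worse even when the top pick is right.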
I was thinking more of propositional knowledge (well, actually belief, but it doesn't seem like this was a sticking point with anybody). A corollary of this is that you would be able to do this second operationalization, but possibly with the aid of a computer program that you wrote yourself that wasn't just a copy of the original program. This constraint is slightly ambiguous but I think it shouldn't be too problematic in practice. The actual thing I had in mind was "come up with a satisfactory operationalization".
  I'm going to assume it's impossible for me, personally, to outplay the best Go AI I have access to. Given that, the requirement is for me to write a better Go AI than the one I currently have access to. Of course, that would mean that my new self-written program is now the best Go AI. So then I would be back to square one.
Rudi C, 3y:
There are weaker computational machines than Turing machines, like regexes. But you don't really care about that; you just want to ban automatic reasoning. I think it's impossible to succeed with that constraint: playing Go is hard, and people can't just read code that plays Go well and "learn from it."
Jacob Pfau, 3y:
One axis along which I'd like clarification is whether you want a form of explanation which is learner agnostic or learner specific? It seems to me that traditional transparency/interpretability tools try to be learner agnostic, but on the other hand the most efficient way to explain makes use of the learner's pre-existing knowledge, inductive biases, etc.  In the learner agnostic case, I think it will be approximately impossible to succeed at this challenge. In the learner specific case, I think it will require something more than an interpretability method. This latter task will benefit from better and better models of human learning -- in the limit I imagine something like a direct brain neuralink should do the trick... On the learner specific side, it seems to me Nisan is right when he said 'The question is if we can compress the bot's knowledge into, say, a 1-year training program for professionals.' To that end, it seems like a relevant method could be an improved version of influence functions. Something like find in the training phase when the go agent learned to make a better move than the pro and highlight the games (/moves) which taught it the improved play.
I don't know what you mean by "learner agnostic" or "learner specific". Could you explain?
Jacob Pfau, 3y:
Not sure what the best way to formalize this intuition is, but here's an idea. (To isolate this learner-agnostic/specific axis from the problem of defining explanation, let me assume that we have some metric for quantifying explanation quality, call it 'R' which is a function from <Model, learner, explanation> triples to real values.) Define learner-agnostic explanation as optimizing for aggregate R across some distribution of learners -- finding the one optimal explanation across this distribution. Learner-specific explanation optimizes for R taking the learner as an input -- finding multiple optimal explanations, one for each learner. The aggregation function in the learner-agnostic case could be the mean, or it could be a minimax function. The minimax case intuition would be formalizing the task of coming up with the most accessible explanation possible. Things like influence functions, input-sensitivity methods, automated concept discovery are all learner-agnostic. On the other hand, probing methods (e.g. as used in NLP) could maybe be called learner-specific. The variant of influence functions I suggested above is learner-specific. In general, it seems to me that as the models get more and more complex, we'll probably need explanations to be more learner-specific to achieve reasonable performance. Though perhaps learner-agnostic methods will suffice for answering general questions like 'Is my model optimizing for a mesa-objective'? 
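A toy sketch of that distinction, assuming some black-box quality function R over (model, learner, explanation) triples as described (all names here are hypothetical):

```python
# R(model, learner, explanation) -> real-valued explanation quality.

def learner_specific(R, model, learners, explanations):
    # One optimal explanation per learner.
    return {lr: max(explanations, key=lambda e: R(model, lr, e))
            for lr in learners}

def learner_agnostic(R, model, learners, explanations, agg=min):
    # One explanation optimizing an aggregate over the learner distribution.
    # agg=min is the minimax case: the "most accessible" single explanation.
    return max(explanations,
               key=lambda e: agg(R(model, lr, e) for lr in learners))
```

With agg=min, an explanation that is brilliant for one learner but opaque to another loses to a uniformly decent one, which is one way to formalize "most accessible explanation possible".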
I guess by 'learner' you mean the human, rather than the learned model? If so, then I guess your transparency/explanation/knowledge-extraction method could be learner-specific, and still succeed at the above challenge.
I’d say 1 and 7 (for humans). The way humans understand go is different from how bots understand go. We use heuristics. The bots may use heuristics too, but there’s no reason we could comprehend those heuristics. Considering the size of the state space, it seems that the bot has access to ways of thinking about go that we don’t, the same way a bot can look further ahead in a chess game than we could comprehend.

This sounds like a great goal, if you mean "know" in a lazy sense; I'm imagining a question-answering system that will correctly explain any game, move, position, or principle as the bot understands it. I don't believe I could know all at once everything that a good bot knows about go. That's too much knowledge.

That's basically what Paul's universality (my distillation post for another angle) is aiming for: having a question-answering overseer which can tell you everything you want to know about what the system knows and what it will do. You still probably need to be able to ask a relevant question, which I think is what you're pointing at.
Maybe it nearly suffices to get a go professional to know everything about go that the bot does? I bet they could.
What does that mean though? If you give the go professional a massive transcript of the bot knowledge, it's probably unusable. I think what the go professional gives you is the knowledge of where to look/what to ask for/what to search. 
Or maybe it means we train the professional in the principles and heuristics that the bot knows. The question is if we can compress the bot's knowledge into, say, a 1-year training program for professionals. There are reasons to be optimistic: We can discard information that isn't knowledge (lossy compression). And we can teach the professional in human concepts (lossless compression).
Josh Smith-Brennan, 3y:
Many professional Go players are already using AI to help them study, including understanding the underlying technology and algorithms, with mixed results. Humans have been playing Go for thousands of years, and there is already a long and respected tradition and canon of literature with commentaries and human reasoning to pull from. Most human players have used human-created rituals to study with, and see studying AI as just one tool among many. Some don't give it much credence at all. Another problem is that when you rely on Go to make a living, taking time to attempt to incorporate ML and AI concepts into your study and tournament schedule is a big risk. Because currently there's no way to query the AI to understand why it made moves, much of what AI provides is essentially meaningless to human players. The problem isn't even necessarily because it's an AI, either; if you were trying to learn how to play Go from a human teacher who won every game, but who couldn't communicate with you or anyone else because of a language barrier, you would be better served finding a teacher you could talk with, even if they didn't win as often as the 'unbeatable idiot.' Additionally, many Go players see Go as a human game, and feel offended at the encroachment of technology into their domain. Besides, most of the development of the AI associated with the AlphaGo series has branched off into the domain of protein folding, which I personally think is a much better use of the technology.
How comparable are Go bots to chess bots in this?  Chess GMs at the highest level have been using engines to prepare for decades; I think if they're similar enough, that would be a good sample to look at for viable approaches.
If you show a Chess AI an endgame position that it clearly wins, I would expect it to make moves that end the game similar to those a professional Chess player would make - moves that end the game in a straightforward fashion. On the other hand, the equivalent of AlphaGo's behavior would be if the Chess AI first sacrificed some pieces where the sacrifice doesn't matter for winning and then made a bunch of random moves before, after a while, making moves to mate - maybe because of some maximal-move rule, or because the clock suggests the game should be finished soon. This difference makes it much easier to learn about the Chess endgame from a chess program than it is to learn from AlphaGo.
It might be worth mentioning that the specific bot mentioned in the OP, David Wu's KataGo, doesn't make random-looking slack moves in the endgame because the figure of merit it's trying to maximize involves both win probability and (with a very small coefficient) final score. This doesn't entirely negate Christian's point; some other strong bot might still have that behaviour, and KataGo itself may well have other features with similar implications. On the other hand, there's at least one respect in which arguably chess bots are harder to learn from than go bots: more of their superiority to humans comes from tactical brilliance, whereas more of go bots' superiority (counterintuitive though it would be to anyone working on computer game-playing say 20 years ago) comes from better positional understanding. It may be easier to learn "this sort of shape is better than everyone thought" than "in this particular position, 10 moves into the tactics you can throw in the move a6 which prevents Nb7 three moves later and wins". (I am not very confident of this. It might turn out that some of the things go bots see are very hard for humans to learn in a kinda-similar way; what makes "this shape" better in a particular case may depend on quirks of the position in ways that look arbitrary and random to strong human go players. And maybe some of the ridiculous inhuman tactical shots that the machines find in chess could be made easier to see by systematically different heuristics for what moves are worth considering at all.)
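Concretely, the figure of merit gjm describes might be sketched like this (the coefficient value is invented for illustration; KataGo's actual weighting is configurable and more involved):

```python
# Sketch of a search utility combining win probability with a small
# score term, in the style described above. SCORE_WEIGHT is a made-up
# value, not KataGo's real parameter.
SCORE_WEIGHT = 0.01

def move_utility(win_prob, expected_score_lead):
    """Utility the search maximizes for a candidate move."""
    return win_prob + SCORE_WEIGHT * expected_score_lead
```

Once the winrate saturates near 1 in a won endgame, the small score term breaks ties toward clean, score-maximizing moves instead of slack ones.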
Even when that's true, it suggests that such a bot might also try, at the end, to maximize final score a bit and not be as unconcerned about it as AlphaGo; the fact that AlphaGo behaves the way it does suggests that what's important for playing very strongly isn't about local patterns. I have another argument that's a bit more Go-specific: even among humans, professional players follow patterns less than amateurs in the 1-5 kyu range. If you look at openings, the amateurs play out the same series of moves for a longer time than the professionals. The amateurs are often playing patterns they learned, while the professionals (and very strong amateur players) are less pattern-focused. According to the analysis of the pros, AlphaGo seemed not to play according to local patterns and to think globally on another level (here I have to trust the professionals, because that difference goes beyond my Go abilities, while the other arguments I made are about things I can reason about without trusting others). And in Go it's often not three moves later but 100 moves later: with the exception of life-and-death situations, the important consequences in the early/midgame are not a handful of moves in the future but much further out.
I have the same sense that strong go bots play more "globally" than strong humans. (Though I think what they do is in some useful sense a generalization of spotting local patterns; after all, in some sense that's what a convolutional neural network does. But as you add more layers the patterns that can be represented become more subtle and larger, and the networks of top bots are plenty deep enough that "larger" grows sufficiently to encompass the whole board.) I think what's going on with different joseki choices between amateurs and very strong humans isn't exactly more patterns versus less patterns. Weak human players may learn a bunch of joseki, but what they've learned is just "these are some good sequences of moves". Stronger players do better because (1) they have a better sense of the range of possible outcomes once those sequences of moves have been played out and (2) they have a better sense of how the state of the rest of the board affects the desirability of those various outcomes. So they will think things like "if I play this joseki then I get to choose between something like A or B; in case A the stones I play along the way will have a suboptimal relationship with that one near the middle of the left side, and in case B the shape I end up with in the corner fits well with what's going on in that adjacent corner and there's a nice ladder-breaker that makes the opposite corner a bit better for me, but in exchange for all that I end up in gote and don't get much territory in the corner; so maybe that joseki would be better because [etc., etc.]", whereas weak players like, er, me have a tiny repertoire of joseki lines (so would have to work out from first principles where things might end up) and are rubbish at visualizing the resulting positions (so wouldn't actually be able to do that, and would misevaluate the final positions even if we could) and don't have the quantitative sense of how the various advantages and disadvantages balance out (so even i
Josh Smith-Brennan, 3y:
Very much so. I have the same sense. From my understanding, professional players (and stronger amateurs) still rely heavily on joseki; it's just that the joseki become longer and more complicated. In a lot of ways, the stronger you get, the more reliant I think you become on patterns you know have succeeded for you or others in the past. It's the reason why professionals spend so much time studying, and why most, if not all, top-ranked professionals started studying and playing as children. It takes that kind of dedication and that amount of time to become a top player. It's possible to become a strong amateur Go player based on 'feeling' and positional judgement, but without being able to read your moves out to a decent degree - maybe 10-15 moves ahead methodically - it's not easy to get very strong.
In case it wasn't clear, that sentence beginning "Stronger players do better" was not purporting to describe all the things that make stronger go players stronger, but to describe specifically how I think they are stronger in joseki. I don't think joseki are the main reason why professional go players spend so much time studying, unless you define "studying" more narrowly than I would. But that's pure guesswork; I haven't actually talked to any go professionals and asked how much time they spend studying joseki. (Professional chess players spend a lot of time on openings, and good opening preparation is important in top-level chess matches where if you find a really juicy innovation you can practically win the game before it's started. I think that sort of thing is much less common in go, though again that's just a vague impression rather than anything I've got from actual top-level go players.)
This is also my understanding.
Josh Smith-Brennan, 3y
I didn't take it as if that was all they did. With (1) it seems like you're describing the skill of reading, though not necessarily reading with the understanding of how to play so that you get a good outcome, or reading and assessing the variations of a particular position; and with (2) you're describing reading how local play affects global play. I think truly strong players also (3) understand the importance of getting and maintaining sente, and (4) see joseki (or standard sequences) from both sides, as white and as black.

I was talking mostly about studying in preparation to become a professional (daily study for 8 hours a day, the path from, say, 1k to 9p), although joseki are usually an important part of study at any level. I think the term also applies more loosely to 'sequences with a good outcome'. Coming up with new, personal 'proprietary' joseki consumes a lot of professional study time, as does going over other people's or AI games and exploring the different variations. There are other things to study, but I still maintain that joseki make up a fair amount of professional knowledge. Some people study openings, others life-and-death problems or endgame scenarios, but they all rely on learning set patterns and how best to integrate them.
Josh Smith-Brennan, 3y
AlphaGo was partly trained using human games as input, which I believe KataGo was as well. But AlphaGo Zero didn't use any human games as input; it basically 'taught itself' to play go. Seeing as AlphaGo and KataGo used human games, which rely on integrating reasoning between local and global considerations, the development of their algorithms differs from that of AlphaGo Zero. Does AlphaGo rely on local patterns? Possibly. But AlphaGo Zero? Where humans see a 3-phase game of maybe 320 moves, broken down into opening, middle game and endgame, kos, threats, exchanges, and so on, it seems likely AlphaGo Zero sees the whole game as one 'thing' (and in fact sees that one game as just one variation among the unimaginably many games it has played against itself).

Even at AlphaGo Zero's level, though, I think considering local patterns is probably still a handy way to divide the computation over an effectively unbounded set of games into discrete groups when considering the branches of variations; sort of the way Wikipedia still uses individual headings and pages for entries, even though it could probably turn its entire contents into one long entry. It would be very difficult to navigate if it did, though.

I've heard stories of go professionals from the classical era claiming it's possible to tell who's going to win by the 2nd move.
If you know who is playing whom and how much handicap there is, you can sometimes tell who is going to win pretty reliably. There's not much more information than that in the first two moves of a professional game. In go you frequently change the area of the board on which you are playing: if you have played enough moves in a given area that the next move there is worth X points while elsewhere on the board there's a move worth X+2 points, you switch to where the bigger move is. This often means it takes a long time before the game continues with many more moves in a given place, and by that point other things on the board have changed.
Josh Smith-Brennan, 3y
The story I'm referring to was about games played between evenly matched players (without handicaps) and has to do with the nuances of different approaches to playing the opening. It was a hard concept to encapsulate at the time of the original comment (way before modern computing came on the scene) but has to do with what would now probably be called win-rate. The first 4 moves of a game between players familiar with the game are, most of the time, reliably played in the 4 corners of a 19x19 board, with usually about 3 standard choices of move in each corner, and the comment concerned literally the 2nd move of a game between 2 players of any skill level, sans handicap I'm sure. It came of course from a very well-respected professional player who had devoted his life to the study and play of go, at a time when the only ways to play were either face to face or a correspondence game played through the mail.

This is why the concept of shape building is so complex. What you are referring to is the concept of 'tenuki', which is what is usually meant by reading local versus global positioning.
KataGo was not trained on human games. I wonder whether we are interpreting "local patterns" in different ways. What I mean is the sort of thing whose crudest and most elementary versions are things like "it's good to make table shapes" and "empty triangles are bad".

The earlier layers of a CNN-based go-playing network are necessarily identifying local patterns in some sense. (Though KataGo's network isn't a pure CNN and does some global things too; I forget the details.)

If you can predict the winner of a go game after two moves then it's because (1) one of the players played something super-stupid or (2) you're paying attention to the way they look at the board, or the authority with which they plonk down their stones, or something of the sort. In normal games it is obviously never the case that one player has a decisive advantage by the second move.
The 'global' things seem to be pooling operations that compute channel-wise means and maxes. Paper link.
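Those pooling operations can be sketched very simply: for each channel of the spatial feature map, collapse the whole board to its mean and max, giving a per-channel global summary that later layers can condition on. A toy pure-Python version (real implementations work on batched tensors and, in KataGo's case, also scale some features by board size):

```python
def global_pool(feature_map):
    """feature_map: dict channel_name -> board grid (list of lists of floats).
    Returns per-channel (mean, max) pairs: the 'global' summary that a
    pooling layer would feed back into later, otherwise-local layers."""
    pooled = {}
    for channel, grid in feature_map.items():
        values = [v for row in grid for v in row]
        pooled[channel] = (sum(values) / len(values), max(values))
    return pooled

# Two toy channels on a 2x2 "board" for illustration
fmap = {"c0": [[0.0, 1.0], [2.0, 3.0]], "c1": [[5.0, 5.0], [5.0, 5.0]]}
print(global_pool(fmap))  # {'c0': (1.5, 3.0), 'c1': (5.0, 5.0)}
```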
Josh Smith-Brennan, 3y
I don't think this use of the term 'global' is how Go players use the term, which is what this part of the discussion is about. This is probably where some of the misunderstanding comes from. 
Josh Smith-Brennan, 3y
Here I don't think you're using the terms 'locally' and 'globally' in the standard sense that go players use them. Seeing as CNN-based processing underlies much of image processing, analyzing the shapes on a go board this way makes a lot of sense; it's also how humans understand the game. However, I don't understand what you mean by "KataGo's network isn't a pure CNN and does some global things too"; here the use of the word 'global' seems qualitatively different from how you use 'local'.
Perhaps you would like to clarify how you are intending to use the word "local"? My usage here is as follows: a "local pattern" is something whose presence or absence you can evaluate by looking at a small region of the board. (The smaller, the more local; locality comes in degrees. Presence or absence of a pattern might do, too.) So e.g. an empty triangle is an extremely local pattern; you can tell whether it is present by looking at a very small region of the board. A ponnuki is slightly less local, a table-shape slightly less local again, but these are all very local. A double-wing formation is substantially less local: to determine that one is present you need to look at (at least) a corner region and about half of one adjacent side. A ladder in one corner together with a ladder-breaker in the opposite corner is substantially less local again: to see that that's present you need to look all the way across the board.

(I should maybe reiterate that the networks aren't really computing simple binary "is there an empty triangle here?" values, at least not in later layers. But what they're doing in their earlier layers is at least a little bit like asking whether a given pattern is present at each board location.)

This seems to me to be the standard sense, but I might well be missing something. I had a quick look through some books but didn't spot any uses of the word "local" :-).

(One can talk about things other than patterns being local. E.g., you might say that a move is locally good, meaning something like "there is some, hopefully obvious, portion of the board within which this move is good for most plausible configurations of the rest of the board, but it's possible that in the actual global position it's not a good move". Or sometimes you might say the same thing meaning just "this is the best move among moves in this part of the board". Or you might say that a group is locally alive, meaning something similar: for the group to be dead there would need to
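A crude detector for the most elementary local pattern mentioned above: scan every 2x2 window of the board for three same-coloured stones with the fourth point empty, i.e. an empty triangle. This is only an illustration of how local such a judgement is (a real network learns soft versions of many such filters rather than hard-coded rules):

```python
EMPTY, BLACK, WHITE = 0, 1, 2

def empty_triangles(board, color):
    """Find 2x2 blocks holding exactly three stones of `color` with the
    fourth point empty: the classic 'empty triangle' local pattern.
    Returns the top-left coordinate of each matching block."""
    n = len(board)
    hits = []
    for r in range(n - 1):
        for c in range(n - 1):
            block = [board[r][c], board[r][c + 1],
                     board[r + 1][c], board[r + 1][c + 1]]
            # the fourth point must be empty; an enemy stone there
            # makes the shape something other than an empty triangle
            if block.count(color) == 3 and block.count(EMPTY) == 1:
                hits.append((r, c))
    return hits

# A black empty triangle in the top-left corner of a tiny 4x4 board
board = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(empty_triangles(board, BLACK))  # [(0, 0)]
```

Note how the judgement needs only a 2x2 window: exactly the sense of "local" used above, with larger patterns (ponnuki, double wings, ladders with breakers) needing progressively larger windows.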
Josh Smith-Brennan, 3y
That is a lot to consider. I'll try to take my time to parse it apart a bit more before I try to respond. 
Josh Smith-Brennan, 3y
I'm not a programmer, but I have been trying to fit learning more about AI into my day, sort of using go bots as an entry point into understanding how neural nets and algorithms work in a more concrete, less conceptual way. So was KataGo simply playing itself in order to train?

I've spent 20 years or so playing casually, mostly on 19x19 boards, and I think my concept of local play is less crude than the one you're talking about. I tend to think of local play as play that is still in some sort of relationship to other shapes and smaller parts of the board, whereas what you are describing seems to imbue the shapes with an 'entity' of sorts apart from the actual game, if that makes sense. I think it's hard to describe a tree to someone who's never heard of one without describing how it relates to its environment: (A) "one type of tree is a 30ft tall cylinder of wood with bark, and has roots, branches and leaves" versus (B) "many trees make up a forest; they use roots buried in the ground to stand up and pull nutrients from the soil, and they use their leaves to photosynthesize, turning carbon dioxide into oxygen as well as providing shade". (A) describes certain characteristics of a particular species of tree as it relates to itself and what it means to be a tree, whereas (B) describes what trees do in relation to a local environment. Taking that further, you could talk globally (literally) about how all the trees in the world contribute to clearing pollution from the air, protecting the integrity of soil and supporting all kinds of wildlife, as well as providing timber for the construction, fuel and paper industries. All the local situations around the world add up to the global situation.
Yes, KataGo trains entirely through self-play. It's not "100% pure Zero" in that it doesn't only play entire games from the start. E.g., it gets supplied with some starting positions in which some version of KataGo was known to have blindspots (in the hope that this helps it understand those positions better and lose the blindspots) or which occur in human games but not in KataGo self-play games (in the hope that this helps it play better against humans and makes it more useful for analysing human games). But I believe all its training is from self-play; e.g., it's never trying to learn to play the same moves as humans did.

(The blindspot-finding is actually pretty clever. What they do is take a lot of games and search through them automatically for places where KG doesn't like the move that was actually played but it leads to an outcome that KG thinks is better than what it would have got; they then make a small fraction of KG's training games use those starting positions, and also add some bias to the move-selection in those training games to make sure the possibly-better move gets explored enough for KG to learn that it's good, if it really is.)

I am not surprised that your concept of local play is less crude than something I explicitly described as the "crudest and most elementary versions". It's not clear to me that we have an actual disagreement here. Isn't there a part of you that winces a little when you have to play an empty triangle, just because it's an ugly very-local configuration of stones?

Here's my (very rough-and-ready; some bits are definitely inaccurate but I don't care because this is just for the sake of high-level intuition) mental model of how a CNN-based go program understands a board position. (This is just about the "static" evaluation and move-proposing; search is layered on top of that and is also very important.)

* There are many layers.
* "Layer zero" knows, for each location on the board, whether there's a
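The blindspot-mining step described above can be sketched as a filter over game records: flag positions where the engine's policy gave the played move very low probability, yet the evaluated outcome after playing it beat the engine's prior evaluation. The interface below (`policy_prob`, `evaluate`, the thresholds) is hypothetical, not KataGo's actual API, and sign conventions for evaluations are glossed over:

```python
def find_blindspots(games, policy_prob, evaluate,
                    prob_threshold=0.01, gain_threshold=0.05):
    """Return (position, move) pairs the engine disliked but that worked out
    better than expected. `games` is a list of games, each a list of
    (position, move_played, resulting_position) triples; `policy_prob` and
    `evaluate` stand in for engine queries."""
    blindspots = []
    for game in games:
        for position, move, next_position in game:
            disliked = policy_prob(position, move) < prob_threshold
            gain = evaluate(next_position) - evaluate(position)
            if disliked and gain > gain_threshold:
                blindspots.append((position, move))
    return blindspots
```

In the training pipeline, a small fraction of self-play games would then be started from the flagged positions, with extra exploration bias on the flagged move so the learner actually tries it.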
Josh Smith-Brennan, 3y
From the paper on KataGo: This says enough to help me understand that there were no human games involved as input in the initial training of the KataGo engine. The acknowledgement of the gains in efficiency from non-domain-specific improvements to the software and hardware architecture is somewhat insightful.

As I'm running a Windows machine with less than an i5, an integrated graphics card, and a somewhat respectable 12 GB of RAM, playing around with neural nets and attempting to train them is sort of out of the question at this point. I do have an interest in this area, although at this point my interests would probably be better served by working with someone already trained in it.

I suppose this gets back to OP's desire to program a go bot in the most efficient manner possible. I think the domain of go would still be too large for a human to 'know' go the way even the most efficient go bot would/will eventually 'know' go. Although I'm sure we agree about quite a bit in these regards, I wouldn't necessarily put an isolated instance of something like an empty triangle under the heading of local play. Although at lower levels of consideration, in circumstances of attempting to define the idea of 'shape' and the potential shapes have, it is closer to local play than to global play, especially if it's true that the earlier layers compute based on local play rather than global play. I kind of doubt that, though. Maybe the initial models did, but after some refinement it seems plausible that even the lowest levels take global position into account, and at the scale and speed AI neural nets compute at, it's difficult to compare human thinking about the difference between local and global play to AI thinking on the matter. It seems reasonable to assume a good engine like KataGo with the most up-to-date models might function as if it plays globally the entire game. This is what human players study to develop, a global underst
I think that at least some of the time you are using "local" and "global" temporally whereas I am using them spatially. (For short-term versus long-term I would tend to use distinctions like "tactics" versus "strategy" rather than "local" versus "global".) Aside from that, I cannot think of anything more local than wincing at an empty triangle. If by "lowest levels" you mean the earliest layers of the network in a KataGo-like bot, they literally cannot take global position into account (except to whatever extent the non-neural-network bits of the program feed the network with more-global information, like "how many liberties does the group containing this stone have?").
Josh Smith-Brennan, 3y
I am using the terms 'locally' and 'globally' both temporally and spatially. As with all board games, time and space affect the outcomes; I don't really think you can have one without the other. Can you give me an example of what you are referring to specifically?

I really don't know what you mean by this. Empty triangles get played all the time; it's just that they are not considered an efficient use of the stones. Placing one stone next to another is 'just as local' as creating an empty triangle; in terms of spatial consideration, local just refers to moves or shapes that are close to each other. Is there another meaning of local you are thinking of?

What I mean by globally in this instance has more to do with how the training the engine has already gone through has biased its algorithms to prefer 'good plays' over 'bad plays': by previously playing games and then analyzing those games from a global perspective in order to retrain the networks to maximize its chances of success in subsequent games. The 'global' consideration has already been done prior to the game; it is simply expressed in the next game.
As I said elsewhere in the thread, by "local" I mean "looking only at a smallish region of the board". A "local pattern" is one defined by reference to a small part of the board. A group is "locally alive" if there's nothing in the part of the board it occupies that would make it not-alive. A move is "the best move locally" if, when you look at a given smallish region of the board, it's the best move so far as you can judge from the configuration there. Etc. (There are uses of "local" that don't quite match that; e.g., a "local ko threat" is one that affects the stones that take part in the ko.)

What I mean about empty triangles is (of course) not that making an empty triangle is always bad. I mean that the judgement of whether something is an empty triangle or not is something you do by looking only at a very small region of the board; and that if some part of you winces just a little when you have to play one, that part of you is probably looking only at a small bit of the board at a time. That is: judging that something is an empty triangle (which your brain needs to do for that slight wincing reaction to occur) is a matter of recognizing what I have been calling a "local pattern". Yes, "placing one stone next to another" is also a local thing; the notion of "attachment", for instance, is a very local one.

Yes, the computations the network does even on its lowest layers have been optimized to try to win the whole game. But what that means (in my terms, at least) is that the early layers of the network are identifying local features of the position that may have global significance for the actual task of evaluating the position as a whole and choosing the next move.
Josh Smith-Brennan, 3y
Once again, I have to say I'm not sure where the disagreement between you and me stems from. Although I would say the idea of 'locally alive' is a little confusing: a group is either 'alive' because it has 2 real eyes or has implied shape such that it cannot be killed (barring potential kos which might force the player to sacrifice the group for a more important strategic play elsewhere), or it's 'possible to kill', at which point it would be considered 'not yet alive'. I think this is another way to describe 'locally alive', possibly?

Maybe I don't understand what you mean by this, but I think it does match the same concept: i.e. white starts a ko battle by capturing a stone in black's huge dragon, a stone which is necessary for black's shape to live. So black must respond by making a ko threat elsewhere that is of approximately equal value to the loss of black's dragon; otherwise white has no reason to continue the battle and can take the ko, thereby killing black's huge group. If black makes such a threat, so that white must respond with another ko threat, it would be to white's advantage to be able to make a 'local ko' threat, meaning that white's new ko threat would still affect the shape of concern (namely black's dragon), so that there are now 2 points of importance putting black's group at risk instead of just one. This is what I would consider a 'local ko' threat, because it builds directly on the first ko threat instead of forcing white to find another ko threat elsewhere, indirectly affecting black's play but not black's dragon, the place where the original ko started.
I too am not sure whence cometh our disagreement, but I know the point at which I first thought we had one. There was some discussion of CNN-based go programs looking at "local patterns" and you said: which seemed to me to be responding to "these programs look at local patterns" with "I don't believe AlphaGo Zero does, because it sees the whole game as one thing rather than looking at different phases of the game separately", and I think that in the previous discussion "local" was being used spatially (small region of the board) rather than temporally (one phase of the game, or part thereof) but your response seemed to assume otherwise.

On "locally alive": on reflection I think a more common usage is that you call a group "locally alive" when you can see two eyes for it (or an unstoppable way of getting them) locally; but it can be "not locally alive" without being "dead", because there might be a way to connect it to other useful things, or run it out into somewhere where it has space to make more eyes.

I think we are using "local ko threat" in pretty much the same way, which is reassuring :-). I think it's a bit different from other uses of "local" because the region of the board involved can potentially be very large, if e.g. the black dragon in your example stretches all the way across the board. But it's not very different; it's still about being concerned only with a subset of the board.
Josh Smith-Brennan, 3y
Sorry if this goes a bit funny in places; I've been up all night. We had 4 cop cars and a helicopter taking an interest in the apartment complex I live in last night, and I haven't been able to sleep since.

Ok. I think we are on the same page now, which is good. I've had to readjust the parameters of my thinking a bit in order to look at the similarities in our writing about our thinking. I consider myself a natural skeptic, so I tend to question things first before I find a way to agree with them. I blame my mom for this, so any complaints should be sent to her. :)

I'm a little familiar with CNNs, although I didn't know the exact name. I've previously done a little research into neural nets as they relate to machine vision, namely just trying to familiarize myself with toy models of what they are, how they function, and a little on training them. I am/am not surprised they are used for go-playing AI, but that is a slightly different topic for another time, hopefully.

As for the meaning of "local patterns", I think of them as a human concept, a 'shortcut' of sorts to help humans divide the board up into smaller subsets, as you mentioned. I think we naturally see the whole board when we play go, and it is through training that we begin to see 'local patterns'. Every move in a physical game uses the matter of a stone to create meaning for everyone watching. All observers as well as the players see the same matter, and so the meaning is shared, even though some people are trained to see more, and more accurate, information about the game. You cannot see the players' brains working unless you put them in fMRI machines or something of that nature, but you can see the judgement of their contemplation in the placement of a stone on the board. The meaning is a by-product of the matter, and vice versa; the meaning and the matter are entangled. In the case of a go-playing AI, we can actually 'see' or try to 'understand' what is going on inside
Robot arms and computer vision, at the level necessary for playing a game of go, are I think a sufficiently solved problem that there's no particular reason why AI researchers working on making a strong go-playing program would bother hooking them up. On its own I don't think doing that would add anything interesting; in particular, I don't think there's any sense in which it would make the program's thinking more human-like.

I don't know about the Alphas, but my laptop running KataGo uses an amount of power that's in the same ballpark as Ke Jie (more than just his brain, less than his whole body) and I'm pretty sure it would beat him very solidly. Go programs don't generally concern themselves with power consumption as such, but they do have to manage time (which is roughly proportional to total energy use); so far as I know, though, the time-management code is always hand-written rather than being learned somehow.

No one is claiming that a strong go-playing program is anything like an artificial general intelligence or that it makes humans obsolete or anything like that. (Though every surprising bit of progress in AI should arguably make us think we're closer to making human-level-or-better general intelligences than we'd previously thought we were.)

Programs like AlphaGo Zero or KataGo don't see the board in terms of local patterns as a result of learning from humans who do (and actually I very much doubt that non-Zero AlphaGo can usefully be said to do that either); they see the board in terms of local patterns because the design of their neural nets encourages them to do so. At least, it does in the earlier layers; I don't know how much has been done to look at later layers and see whether or not their various channels tend to reflect local features that are comprehensible to humans.

Of course every move is a whole-board move in the sense that it is made on a whole board, its consequences may affect the whole board, and most of the time the way you chose it inv
Josh Smith-Brennan, 3y
Warning: This is a long post, and there's some personal stuff about my living situation near the beginning. I figure if people on the forum can bring up their issues with living in expensive, culturally blessed parts of the country, I can bring up issues of living in the homeless shelter system. I also apologize in advance for the length, as I haven't addressed many of the more technical aspects yet. I partly blame the fascinating intersections of AI and human culture for my long post. I do sort of take these posts as opportunities to attempt to draw the different threads of my thinking closer together, with the sometimes unhelpful effect that though I can explain certain ideas more efficiently and concisely, I try to add in more ideas as a result.

First let me say this: I'll address the points you bring up as best I can given my approach and purposes in this discussion. I have some questions and some theories that I think are suitable for further development, and the fact that ML and AI have developed so far makes me think they would help me investigate my theories. The fact that DanielFilan created a post addressing these ideas in relation to go is a win-win for me, as it gives me a good entry point for discussion of the technical aspects of neural nets and such; becoming more familiar with the way they work makes sense if I am to have any chance of pursuing research of some sort with credibility. You have a zero% success rate for the opportunities you don't bother to try to make happen.

Apologies to you and to Daniel for my skepticism and some of the assumptions I made previously in this discussion; the clarity of the discussion is now challenging to maintain under my currently rather bad circumstances. As I've mentioned elsewhere, I'm attempting to work my way out of the homeless shelter system, where I've existed for the past 4 years and counting. This forum is a much needed lifeline of sanity, especially given the once-in-a-century pandemic we a
If by "OP" you mean me, that's not really my desire (altho that would be nice).
Josh Smith-Brennan, 3y
No offense meant, Daniel; generally "OP" stands for 'Original Poster'. I'm uncomfortable using first names until I know people on forums better, and Mr. Filan seems too formal, so I settled on OP, as is the norm on forums. I guess I'm unsure now of what your post is asking, as I was operating under the understanding that the above quote from your post was its main thrust.
OP is a fine way to refer to me, I was just confused since I didn't think my post indicated that my desire was to efficiently program a go bot.
Josh Smith-Brennan, 2y
Sorry Daniel, I really didn't mean any offense; in fact I was maybe a bit too eager to jump into an area I have interest in but don't really understand at a technical level. While I am pretty familiar with go, I'm not so much with ML or AI. In fact I really appreciated the discussion, even though I am conflicted about AI's impact on the go community. It's funny, though: I recently had the opportunity to talk a very little bit with a 9 dan professional about his experience with AI, and I was a bit surprised by his response. I value his opinion very much, and so have attempted to change my attitude about go-playing AI slightly.
I believe what DanielFilan is mostly interested in here is the general project of understanding what neural networks "know" or "understand" or "want". (Because one day we may have AIs that are much much smarter than we are, and being much smarter than us may make them much more powerful than us in various senses, and in that case it could be tremendously important that we be able to avoid having them use that power in ways that would be disastrous for us. At present, the most impressive and most human-intelligence-like AI systems are neural networks, so getting a deep understanding of neural networks might turn out to be not just very interesting for its own sake but vital for the survival of the human race.)
This is correct, altho I'm specifically interested in the case of go AI because I think it's important to understand neural networks that 'plan', as well as those that merely 'perceive' (the latter being the main focus of most interpretability work, with some notable exceptions).
Josh Smith-Brennan, 3y
If he used the concept of a go-playing AI to inspire discussion along those lines, then OK, I did get that. I guess I'm not sure where the misunderstanding came from, then.
Josh Smith-Brennan, 3y
So let me step back and try to approach this in a slightly different manner. I understand that overall what Daniel is "mostly interested in here is the general project of understanding what neural networks 'know' or 'understand' or 'want'", from a position of concern with existential threats from AGI (a concern of most people on this forum, one which I share as well). In this particular post, Daniel put forward a thought experiment which uses the concept of attempting to 'know' what a neural network/AI 'knows' via the idea of programming a go-playing AI: if you could program a go-playing AI and knew what the AI was doing because you programmed it, might that constitute understanding what an AI 'knew'? Seeing as how understanding everything that went into programming the go-playing AI would be a lot to 'know', it follows that a very efficient program would be easier to 'know', as there would be less to 'know' than with a very inefficient program.

Which brings me back to my point, which Daniel was responding to: I think my point still stands that even an efficient and compact go-playing AI would still be too much for a single person to 'know'; while they might understand the whole program they wrote, that would not allow them to play go at a professional level. Because this part of the thread isn't directly involved with the idea of existential threat from an out-of-control AGI, I'll leave my thoughts on how this relates for a different post.
Josh Smith-Brennan, 3y
Thing is, the way you build shape in go isn't a straightforward process; the 3 phases of a game (opening, middle game and endgame) usually involve different types of positional judgement, requiring different ratios of consideration between local position and global position. Shape building occurs as game play progresses simply through the aggregation of moves on the board over time, with the development of 'good shape' being desirable because it's easy to defend and useful, and 'bad shape' being difficult to defend and a hindrance. Through most of the opening and part of the middle game, shape is implied, and it is the potential of a shape to support a specific approach or tactic which develops into strategy over the game. It is the ability of the human player to correctly see the potential for shape, especially in the opening, and to read out how it is likely to grow over the course of play, which makes the difference between a good player and a mediocre one.

Since a great endgame can never make up for a bad opening, especially when you consider that many games between evenly matched players result in a win by only 0.5 points, a human has to be good at either the opening or the middle game in order to even have a chance of winning in the endgame. In human terms, go bots seem to contemplate all 3 phases (opening, middle, and end game) at the same time, from the beginning of the game, while the human player is only thinking about the opening. It seems this long view leads the bots to play moves which at times seem like bad moves. Sometimes a potential rationale becomes clear 10 or 15 moves later, but at times it is just plain impossible to understand why a go bot plays a different move than the one preferred by professionals. At times, yes. Trying to read out all the potential variations of a developing position, over time, from a seemingly arbitrary or random move a go bot makes results in diminishing returns for a human player. E
I'm not familiar with chess bots, but I would be surprised if one could be confident that chess GMs know everything that chess bots know.
I'm not clear on your usage of the word "know" here, but if it's in a context where knowing and level of play have a significant correlation, I think GMs not knowing would be evidence against it being possible for a human to know everything that game bots do.  GMs don't just spend most of their time and effort on it, they're also prodigies in the sport.
I think it's probably possible to develop better transparency tools than we currently have to extract knowledge from AIs, or make their cognition more understandable.
Josh Smith-Brennan (3y)
I don't believe they are that comparable. For starters, an average chess game lasts around 40 moves, whereas an average go game lasts closer to 200-300 moves. This is just one example of why a go-playing computer didn't reliably beat a professional go player until nearly 20 years after a chess-playing computer beat a GM.
That's evidence for it being harder to know what a go bot knows than to know what a chess bot knows, right? And if I'm understanding correctly, those years were due in significant part to computational constraints, which would imply that better transparency tools, or making the bots more human-understandable, still wouldn't come near letting a human know what they know, right?
Josh Smith-Brennan (3y)
Yes to your first question. Yes to the second question, but with the caveat that go-playing AIs are still useful for certain tasks in terms of helping develop a player's game, with limitations. Will a human player ever fully understand the game of go, period, much less the way an AI does? No, I don't think so.
Good point!

You have to be able to know literally everything that the best go bot that you have access to knows about go.

In your mind, is this well-defined? Or are you thinking of a major part of the challenge as being to operationalize what this means?

(I don't know what it means.)

I roughly know what it means, by virtue of knowing what it means to know stuff. But I do mention that part of the challenge is operationalizing better what it means for a model to know things.

I think that it isn't clear what constitutes "fully understanding" an algorithm. 

Say you pick something fairly simple, like a floating-point square-root algorithm. What does it take to fully understand that?

You have to know what a square root is. Do you have to understand the maths behind Newton-Raphson iteration if the algorithm uses that? All the mathematical derivations, or just taking it as a mathematical fact that it works? Do you have to understand all the proofs about convergence rates? Or can you just go "yeah, 5 iterations seems to be eno... (read more)
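For concreteness, here is a minimal sketch of the kind of algorithm being discussed: a Newton-Raphson square root with a fixed handful of iterations (the function name and iteration count are illustrative, not from any particular library):

```python
def nr_sqrt(x, iterations=5):
    """Approximate sqrt(x) by Newton-Raphson iteration on f(y) = y^2 - x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    y = x  # initial guess
    for _ in range(iterations):
        y = 0.5 * (y + x / y)  # Newton step: y - f(y)/f'(y)
    return y

print(nr_sqrt(2.0))  # converges to ~1.4142135623730951
```

Even for this tiny program, "full understanding" could mean anything from being able to trace the loop, to knowing why the Newton step converges quadratically, to proving bounds on the error after 5 iterations.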

One interesting feature of AlphaGo was that it generally did not play what a go professional would see as optimal play in the endgame. A go professional doesn't play moves that obviously lose points in the late game. AlphaGo, on the other hand, played many moves that lost points, likely because it judged them not to change the likelihood of winning given that it was ahead by enough points.

A good go player has a bunch of endgame patterns memorized that are optimal at maximizing points. When choosing between two moves that are both judged to win the game with probability 0.9999999, AlphaGo not choosing the move that maximizes points suggests that it does not use patterns about locally optimal moves to make its judgements. AlphaGo follows patterns about what's the optimal move in similar situations much less than human go players do. It plays the game more globally instead of focusing on local positions.
Daniel Kokotajlo (3y)
I nitpick/object to your use of "optimal moves" here. The move that maximizes points is NOT the optimal move; the optimal move is the move that maximizes win probability. In a situation where you are many points ahead, plausibly the way to maximize win probability is not to try to get more points, but rather to try to anticipate and defend against weird crazy high-variance strategies your opponent might try.
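The distinction can be made concrete with a toy example (the move names and numbers below are made up for illustration, not taken from any engine): a policy that selects by win probability can prefer a different move than one that selects by expected points.

```python
# Two hypothetical candidate moves in a won position.
candidate_moves = {
    "solid_defense": {"win_prob": 0.9999999, "expected_points": 8.5},
    "point_grab":    {"win_prob": 0.9999990, "expected_points": 12.0},
}

# Selecting by win probability vs. by expected points gives different answers.
best_by_win_prob = max(candidate_moves,
                       key=lambda m: candidate_moves[m]["win_prob"])
best_by_points = max(candidate_moves,
                     key=lambda m: candidate_moves[m]["expected_points"])

print(best_by_win_prob)  # solid_defense
print(best_by_points)    # point_grab
```

A win-probability maximizer happily gives up points whenever doing so shaves even a sliver off the chance of losing.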
Donald Hobson (3y)
This behaviour is consistent with local position based play that also considers "points ahead" as part of the situation.
I think human go players consider points ahead as part of the situation and still don't play a move that provides no benefit but costs points in the endgame. We are not talking about situations where there's any benefit to be gained from the behavior, since it happened in situations that can be fully read out. There are situations in go where you don't start a fight that you expect to win with 95% probability because you're already ahead on the board and the 5% might make you lose, but that's very far from the AlphaGo moves I was talking about. AlphaGo plays moves that are bad according to any pattern of what's good in go when it's ahead.
I feel like it's pretty relevant that AlphaGo is the worst super-human go bot, and I don't think better bots have this behaviour.
Last I heard, Leela Zero still tended to play slack moves in highly unbalanced late-game situations.
That seems right. I think there's reason to believe that SGD doesn't do exactly this (nets that memorize random data have different learning curves than normal nets iirc?), and better reason to think it's possible to train a top go bot that doesn't do this. Yes, but luckily you don't have to do this for all algorithms, just the best go bot. Also as mentioned, I think you probably get to use a computer program for help, as long as you've written that computer program.
Donald Hobson (3y)
I'm thinking of humans as having some fast, special-purpose, inbuilt pattern recognition, which is nondeterministic and an introspective black box, plus a slow general-purpose processor. Humans can mentally follow the steps of any algorithm, slowly. Thus if a human can quickly predict the results of program X, then either there is a program Y, based on however the human is thinking, that does the same thing as X in only a handful of basic algorithmic operations, or the human is using their special-purpose pattern-matching hardware. That hardware is nondeterministic, not introspectively accessible, and not really shaped to predict go bots. Either way, it also bears pointing out that if the human can predict the move a go bot would make, the human is at least as good as the machine. So you are going to need a computer program for "help" if you want to predict the exact moves. At that stage, you can ask whether you really understand how the code works, and aren't just repeating it by rote.
I'd also be happy with an inexact description of what the bot will do in response to specified strategies that captured all the relevant details.

I'm a bit confused. What's the difference between "knowing everything that the best go bot knows" and "being able to play an even game against a go bot"? I think they're basically the same. It seems to me that you can't know everything the go bot knows without being able to beat any professional go player.

Or am I missing something?

You could plausibly play an even game against a go bot without knowing everything it knows.
Sure. But the question is can you know everything it knows and not be as good as it? That is, does understanding the go bot in your sense imply that you could play an even game against it?
I imagine so. One complication is that it can do more computation than you.
But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play." I know more about go than that bot starts out knowing, but less than it will know after it does computation. I wonder if, when you use the word "know", you mean some kind of distilled, compressed, easily explained knowledge?
Perhaps the bot knows different things at different times and your job is to figure out (a) what it always knows and (b) a way to quickly find out everything it knows at a certain point in time.

I think at this point you've pushed the word "know" to a point where it's not very well-defined; I'd encourage you to try to restate the original post while tabooing that word.

This seems particularly valuable because there are some versions of "know" for which the goal of knowing everything a complex model knows seems wildly unmanageable (for example, trying to convert a human athlete's ingrained instincts into a set of propositions). So before people start trying to do what you suggested, it'd be good to explain why it's actually a realistic target.

Hmmm. It does seem like I should probably rewrite this post. But to clarify things in the meantime:

* It's not obvious to me that this is a realistic target, and I'd be surprised if it took fewer than 10 person-years to achieve.
* I do think the knowledge should 'cover' all the athlete's ingrained instincts in your example, but I think the propositions are allowed to look like "it's a good idea to do x in case y".
Perhaps I should instead have said: it'd be good to explain to people why this might be a useful/realistic target. Because if you need propositions that cover all the instincts, then it seems like you're basically asking for people to revive GOFAI. (I'm being unusually critical of your post because it seems that a number of safety research agendas lately have become very reliant on highly optimistic expectations about progress on interpretability, so I want to make sure that people are forced to defend that assumption rather than starting an information cascade.)
OK, the parenthetical helped me understand where you're coming from. I think a re-write of this post should (in part) make clear that I think a massive heroic effort would be necessary to make this happen, but sometimes massive heroic efforts work, and I have no special private info that makes it seem more plausible than it looks a priori.
Actually, hmm. My thoughts are not really in equilibrium here.
(Also: such a rewrite would be a combination of 'what I really meant' and 'what the comments made me realize I should have really meant')
I would say that bot knows what the trained AlphaZero-like model knows.
Also it certainly knows the rules of go and the win condition.
As an additional reason for the importance of tabooing "know", note that I disagree with all three of your claims about what the model "knows" in this comment and its parent. (The definition of "know" I'm using is something like "knowing X means possessing a mental model which corresponds fairly well to reality, from which X can be fairly easily extracted".)
In the parent, is your objection that the trained AlphaZero-like model plausibly knows nothing at all?
The trained AlphaZero model knows lots of things about Go, in a comparable way to how a dog knows lots of things about running. But the algorithm that gives rise to that model can know arbitrarily few things. (After all, the laws of physics gave rise to us, but they know nothing at all.)
Ah, understood. I think this is basically covered by talking about what the go bot knows at various points in time, a la this comment - it seems pretty sensible to me to talk about knowledge as a property of the actual computation rather than the algorithm as a whole. But from your response there it seems that you think that this sense isn't really well-defined.
I'm not sure what you mean by "actual computation rather than the algorithm as a whole". I thought that I was talking about the knowledge of the trained model which actually does the "computation" of which move to play, and you were talking about the knowledge of the algorithm as a whole (i.e. the trained model plus the optimising bot).
On that definition, how does one train an AlphaZero-like algorithm without knowing the rules of the game and win condition?
The human knows the rules and the win condition. The optimisation algorithm doesn't, for the same reason that evolution doesn't "know" what dying is: neither are the types of entities to which you should ascribe knowledge.
Suppose you have a computer program that gets two neural networks, simulates a game of go between them, determines the winner, and uses the outcome to modify the neural networks. It seems to me that this program has a model of the 'go world', i.e. a simulator, and from that model you can fairly easily extract the rules and winning condition. Do you think that this is a model but not a mental model, or that it's too exact to count as a model, or something else?
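The program described above can be sketched schematically (everything here is a deliberately trivial stand-in - the "networks" are dicts of move preferences and the simulator is a placeholder, not real go rules):

```python
import random

def simulate_game(net_a, net_b):
    """Placeholder simulator: stands in for playing a full game of go
    between the two networks and returning the winner. The simulator is
    the part of the program that encodes the rules and win condition."""
    score_a = sum(net_a.values()) + random.random()
    score_b = sum(net_b.values()) + random.random()
    return "a" if score_a > score_b else "b"

def training_step(net_a, net_b):
    """Use the game's outcome to modify the networks: nudge the loser's
    move preferences toward the winner's."""
    winner = simulate_game(net_a, net_b)
    winner_net, loser_net = (net_a, net_b) if winner == "a" else (net_b, net_a)
    for move in loser_net:
        loser_net[move] += 0.1 * (winner_net[move] - loser_net[move])

net_a = {"corner": 1.0, "center": 0.5}
net_b = {"corner": 0.2, "center": 0.3}
for _ in range(100):
    training_step(net_a, net_b)
```

The question at issue is whether `simulate_game` - from which the rules and win condition are mechanically extractable - counts as a model in the relevant sense.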
I'd say that this is too simple and programmatic to be usefully described as a mental model. The amount of structure encoded in the computer program you describe is very small compared with the amount of structure encoded in the neural networks themselves. (I agree that you can have arbitrarily simple models of very simple phenomena, but those aren't the types of models I'm interested in here. I care about models which have some level of flexibility and generality; otherwise you can come up with dumb counterexamples like rocks "knowing" the laws of physics.)

As another analogy: would you say that the quicksort algorithm "knows" how to sort lists? I wouldn't, because you can instead just say that the quicksort algorithm sorts lists, which conveys more information (because it avoids anthropomorphic implications). Similarly, the program you describe builds networks that are good at go, and does so by making use of the rules of go, but can't do the sort of additional processing with respect to those rules which would make me want to talk about its knowledge of go.

To me a good definition for this is:

Get to a stage where you can write a computer program which can match the best AI at Go, where the program does no training (or equivalent) and you do no training (or equivalent) in the process of writing the software.

I.e., write a classical computer program that uses the techniques of the neural-network-based program to match it at go.

Rudi C (3y)
This is the most intuitive answer to me as well. It's also extremely difficult, and it's unclear how it would be useful for doing alignment generally. Perhaps one idea is to train an AI to write legible code, then use human code review on it. This seems as safe as our current mode of software development, if the AI is not actively obfuscating (a big assumption).

I kind of do know everything the best go bot knows? For a given definition of "knows."

At the most simple: I know that the best move to make, given a board, is the one that leads to a victory board state, or, failing that, a board state with the best chance of leading to a victory board state. Which is all a go program is doing.

Now, the program is able to evaluate those conditions to a much greater search depth and breadth than I can, but that isn't a matter of knowledge, just the ability to implement knowledge.
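The principle being claimed here fits in a few lines (a deliberately toy sketch; positions, moves, and the evaluation function are all made up for illustration):

```python
def choose_move(position, legal_moves, apply_move, win_chance):
    """Pick the move whose resulting position has the best estimated
    chance of leading to a win."""
    return max(legal_moves, key=lambda m: win_chance(apply_move(position, m)))

# Tiny made-up example: positions are numbers, and the "win chance"
# estimator simply favors larger numbers.
position = 0
moves = [1, 2, 3]
best = choose_move(position, moves, lambda p, m: p + m, lambda p: p / 10)
print(best)  # 3
```

On this view, everything a strong program adds beyond the sketch lives inside `win_chance` and how deeply it searches - which is exactly where the disagreement about "knowledge" lies.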

I wouldn't count the database of prior games as part of the go program, since I (or a different program) could also have access to that same database.

This is an interesting direction to explore but as is I don't have any idea what you mean by understand the go bot and I fear figuring that out would itself require answering more than you want to ask.

For instance, what if I just memorize the source code? I can slowly apply each step on paper, and since the adversarial training process has no training data or human expert input, if I know the rules of go I can, Chinese-room style, fully replicate the best go bot using my knowledge, given enough time.

But if that doesn't count and you don't just mean be better th... (read more)