All of Rekrul's Comments + Replies

I think this is a simple but evocative post that does a solid job of illustrating why one should be worried about near-term transformative AGI. I think some of the conclusions aren't entirely accurate and more thorough analysis would paint a less dire (but still dire!) picture, but I still really like this post, and I think it's probably very effective at getting people into the "transformative AGI is close" camp.

Part of my frustration with you guys is that I'm a layperson. If you were to post your proof of alignment impossibility, I probably wouldn't have the relevant skillset to understand it. So the only thing I can do is rely on others whose expertise I trust to analyze it for me.

Paul is one of those people. He posted multiple times in the previous thread that he would engage with a proof, and I was disappointed that no one replied with one. I disagree with your assertion that he wouldn't handle it faithfully, based on all of his past writing that I have read. One thing Pa... (read more)

Really appreciate you sharing your honest thoughts here, Rekrul. From my side, I’d value actually discussing the reasoning forms and steps we already started to outline on the forum. For example, the relevance of intrinsic vs. extrinsic selection and correction, or the relevance of the organic vs. artificial substrate distinction. These distinctions are something I would love to openly chat about with you (not the formal reasoning – I’m the bridge-builder, Forrest is the theorist). That might feel unsatisfactory – in the sense of “why don’t you just give us the proof now?” As far as I can tell (Forrest can correct me later), there are at least two key reasons:

1. There is a tendency among AI Safety researchers to want to cut to the chase and judge the believability of the conclusion itself. For example, notice that I tried to clarify several argument parts in comment exchanges with Paul, with little or no response. People tend to believe that this would be the same as judging a maths proof over idealised deterministic and countable spaces. Yet formal reasoning here would have to reference and build up premises from physical theory in indeterministic settings. So we actually need to clarify how a different form of formal reasoning is required here, one that does not look like what would be required for P=NP. Patience is needed on the side of our interlocutors.

2. While Forrest does have most of the argument parts formalised, his use of precise analytical language and premises is not going to be clear to you. Mathematicians are not the only people who use formal language and reasoning steps to prove impossibilities by contradiction. Some analytical philosophers do too (as do formal verification researchers in industrial software engineering, using different notation for logic transformation, etc.). No amount of “just give the proof to us and leave it to us to judge” lends us confidence that the j

Being blunt here, you guys triggered a bunch of crank alarms and someone called you and your partner a crank. They called the quacking animal a duck.

The response to this is to let your work do the talking for you, to simply present the proof of your "outlandish" theory and nothing else. If you have a proof that P=NP or a slam-dunk theory of everything, then that should be able to stand on its own. You can't complain about people engaging with things other than your idea unless you make your idea the only thing they can even engage with.

Instead you have decided to m... (read more)

The tricky thing here is that a few people are reacting by misinterpreting the basic form of the formal reasoning at the outset, and judging the merit of the work by their subjective social heuristics. Which does not give me (or Forrest) confidence that those people would do a careful job of checking the term definitions and reasoning steps – particularly if written in precise analytic language that is unlike the mathematical notation they’re used to. The filter goes both ways. Actually, this post was written in 2015, and I planned last week to reformat it and post it. Rereading it, I’m just surprised how well it appears to line up with the reactions.

If there is an easy way of fixing it, sure, but I wouldn't devote more than a small amount of mental effort towards thinking up solutions, unless you don't see anywhere else you can plausibly help with alignment. Again, it's just a weird worst-case scenario that could only happen through a combination of incredible incompetence and astronomically bad luck; it's not even close to the default failure scenario.

I get you have anxiety and distress about this idea, but I don't think hyper-focusing on it will help you. Nobody else is going to hyper-focus on it, they'll... (read more)

We'll just have to agree to disagree here then. I just don't find this particular worst-case-scenario likely enough to worry about. Any AGI with a robust way of dealing with bugs will deal with this, and any AGI without that will far more likely just break or paperclip us.

Would you not agree that (assuming there's an easy way of doing it), separating the system from hyperexistential risk is a good thing for psychological reasons? Even if you think it's extremely unlikely, I'm not at all comfortable with the thought that our seed AI could screw up & design a successor that implements the opposite of our values; and I suspect there are at least some others who share that anxiety. For the record, I think that this is also a risk worth worrying about for non-psychological reasons.

You seem to have a somewhat general argument against any solution that involves adding onto the utility function, in "What if that added solution was bugged instead?". While maybe this can be resolved, I think it's better to move on from trying to directly target the sign-flip problem and instead deal with bugs/accidents in general. After all, the sign flip is just the very unlikely worst-case version of that, and any solution for dealing with bugs/accidents in an AGI will also deal with it.

I might've failed to make my argument clear: if we designed the utility function as U = V + W (where W is the thing being added on and V refers to human values), this would only stop the sign-flipping error if it was U that got flipped. If it were instead V that got flipped (so the AI optimises for U = -V + W), that'd be problematic.

I disagree here. Obviously we'd want to mitigate both, but a robust way of preventing sign-flipping errors specifically is absolutely crucial (if anything, so people stop worrying about it). It's much easier to prevent one specific bug from having an effect than to deal with all bugs in general.
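The U = V + W point can be made concrete with a toy sketch (all names here are made up for illustration, not anyone's proposed implementation): a sign flip of the whole of U is visibly catastrophic, but a flip of the inner V term still produces a "plausible-looking" utility that optimises against human values while W is intact:

```python
def human_values(state):
    # Toy stand-in for V: higher is better for humans.
    return state["v"]

def added_safeguard(state):
    # Toy stand-in for W, the term bolted on to mitigate sign flips.
    return state["w"]

def utility(state, v_sign=1, u_sign=1):
    """U = V + W, with optional sign-flip bugs injected for illustration."""
    v = v_sign * human_values(state)
    return u_sign * (v + added_safeguard(state))

state = {"v": 10.0, "w": 1.0}

# No bug: U = V + W = 11.
assert utility(state) == 11.0

# A flip of all of U is the case the added term could catch...
assert utility(state, u_sign=-1) == -11.0

# ...but a flip of V alone gives U = -V + W = -9: W survives unchanged,
# yet the agent now optimises against human values.
assert utility(state, v_sign=-1) == -9.0
```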

Another small silver lining is we don't have to worry about making sure our alignment tools and processes can generalize, just that they scale. So they can be as tailor made to GPT as we want. I don't think this buys us much, as making an effective scalable tool in the first place seems like the much harder part than generalizing it.

Agreed, GPT is very alien under the hood even though it's mimicking us, and that poses some problems. I'm curious, however, just how good its mimicry of us is/is going to be, more specifically its mo... (read more)

You've given me a lot to think about (and may have even lowered my confidence in some of my assertions). Kudos!

I do still have some thoughts to give in response though, but they don't really function as very in-depth responses to your points, as I'm still in the process of ruminating:

  • I agree with you that GPT-3 probably hasn't memorized the prompts given in the OP; they're too rare for that to be worth it. I just think it's so big and has access to so much data that it really doesn't need to solve prompts like that. Take the Navy Seal Copypasta prompts Gwern di

... (read more)

I recognize the points you are making, and I agree, I don't want to be a person who sets an unfeasibly high bar, but with how GPT-3 was developed it's really difficult to put one that isn't near that height. If GPT-3 was instead made with mostly algorithmic advances instead of mostly scaling, I'd be a lot more comfortable placing said bar and a lot less skeptical, but it wasn't, and the sheer size of all this is in a sense intimidating.

The source of a lot of my skepticism is GPT-3's inherent inconsistency. It can range wildly from its high-quality output t... (read more)

I think you were pretty clear on your thoughts, actually. So, the easy, low-level response to some of your skeptical points would be technical details; I'm going to do that and then follow it with a higher-level, more conceptual response.

The source of a lot of my skepticism is GPT-3's inherent inconsistency. It can range wildly from its high-quality output to gibberish, repetition, regurgitation, etc. If it did have some reasoning process, I wouldn't expect such inconsistency. Even when it is performing so well people call it &
... (read more)
Daniel Kokotajlo · 3y
One thing I find impressive about GPT-3 is that it's not even trying to generate text. Imagine that someone gave you a snippet of random internet text, and told you to predict the next word. You give a probability distribution over possible next words. The end. Then, your twin brother gets a snippet of random internet text, and is told to predict the next word. Etc. Unbeknownst to either of you, the text your brother gets is the text you got, with a new word added to it according to the probability distribution you predicted. Then we repeat with your triplet brother, then your quadruplet brother, and so on. Is it any wonder that sometimes the result doesn't make sense? All it takes for the chain of words to get derailed is for one unlucky word to be drawn from someone's distribution of next-word prediction. GPT-3 doesn't have the ability to "undo" words it has written; it can't even tell its future self what its past self had in mind when it "wrote" a word!
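The sampling loop described above can be sketched in a few lines (a toy hand-written next-word table, not GPT-3's actual interface; a GPT-style model conditions on the whole context, but only the shape of the loop matters here). Each step emits only a probability distribution, one word is drawn, and no later step can undo an unlucky draw:

```python
import random

random.seed(0)

# Toy "model": maps the previous word to a distribution over next words.
NEXT = {
    "the":   {"cat": 0.6, "dog": 0.3, "stock": 0.1},
    "cat":   {"sat": 0.7, "exploded": 0.3},
    "dog":   {"barked": 0.9, "sang": 0.1},
    "stock": {"crashed": 1.0},
}

def sample_next(word):
    # Each step only emits a distribution and draws one word from it;
    # nothing records what the previous step "had in mind".
    dist = NEXT.get(word, {"the": 1.0})
    words, weights = zip(*dist.items())
    return random.choices(words, weights=weights)[0]

text = ["the"]
for _ in range(5):
    text.append(sample_next(text[-1]))

# One low-probability draw (e.g. "exploded" after "cat") derails the
# whole chain, and there is no mechanism to go back and fix it.
print(" ".join(text))
```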

In a very loosely similar sense (though not in an architecturally accurate sense) to how AlphaGo knows which moves are relevant for playing Go. I wouldn't say it was reasoning; it was just recognizing and predicting.

To give an example: if I were to ask various levels of GPT (perhaps just 2 and 3, as I'm not very familiar with the capabilities of the first version off the top of my head) "What color is a bloody apple?", it would have a list of facts in its "head" about the words "bloody" and "apple" – like one can be red or green, one is depicted as various sh... (read more)

Great, but the terms you're operating with here are kind of vague. What problems could you give to GPT-3 that would tell you whether it was reasoning, versus "recognising and predicting", passive "pattern-matching", or presenting an "illusion of reasoning"? This was a position I subscribed to until recently, when I realised that every time I saw GPT-3 perform a reasoning-related task, I automatically went "oh, but that's not real reasoning, it could do that just by pattern-matching", and when I saw it do some... (read more)

GPT-3 was trained on an astronomical amount of data from the internet, and asking weird hypotheticals is one of the internet's favorite pastimes. I would find it surprising if it was trained on no data resembling your prompts.

There's also the fact that its representations are staggeringly complex. It knows an utterly absurd number of facts "off the top of its head", including the mentioned facts about muzzle velocity, gravity, etc., and its recognition abilities are great enough to recognize which of the facts it knows are the relevant ones based on the

So it has these facts. How does it know which facts are relevant? How does it "get to" the right answer from these base facts? Classically, these are both hard problems for GOFAI reasoners.

Yeah, this sampling stuff brings up arguments about "curating" or "If you rephrase the same question and get a different answer then there is no reasoning/understanding here" which I'm sympathetic to.

I also think categorizing GPT-3's evasiveness, tendency to take serious prompts as joke prompts, etc. as solely the fault of the human is unfair. GPT-3 also shares the blame for failing to interpret the prompt correctly. This is a hard task, obviously, but that just means we have further to go, despite the machine's impressiveness already.

About the first paragraph: there is an infinite number of wrong answers to "What is six plus eight"; only one is correct. If GPT-3 answers it correctly in 3 or 10 tries, that means it *has* some understanding/knowledge. Though that's moderated by the numbers being very small – if it replies with small numbers, it has a non-negligible chance of being correct solely by chance. But it's better than that. And more complex questions, like those in the interview above, are even more convincing, by the same line of reasoning. There might be (exact numbers pulled out of the air, purely for illustrative purposes), out of all sensible-English completions (so no "weoi123@!#*"), 0.01% correct ones, 0.09% partially correct, and 99% complete nonsense / off-topic etc.

Returning to arithmetic itself, GPT seems intent on giving me off-by-one answers for some reason. Or even less wrong [heh]. When I was playing with Gwern's prefix-confidence-rating prompt, I got this:

Q: What is half the result of the number 102?
A: [remote] 50.5

About confidence-rating prefixes, a neat thing might be to experiment with "requesting" a high (or low) confidence answer by making these tags part of the prompt. It worked when I tried it (for example, if it kept answering that it didn't know the answer, I eventually tried writing the question + "A: [highly likely] " – and it answered sensibly!). But I didn't play all that much, so it might've been a fluke. Here's more [ ] if anyone's interested.
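The "correct solely by chance" baseline above can be put in rough numbers with a toy simulation (the uniform-over-small-numbers guesser is an assumption made purely for illustration, not a model of GPT-3):

```python
import random

random.seed(1)

def random_small_number_guesser():
    # Assumed baseline: answer arithmetic prompts with a uniformly
    # random integer between 0 and 20 (inclusive), i.e. 21 options.
    return random.randint(0, 20)

trials = 100_000
correct = sum(random_small_number_guesser() == 14 for _ in range(trials))  # 6 + 8 = 14
chance_accuracy = correct / trials

# Uniform over 21 options gives roughly 1/21 ≈ 4.8% by pure chance,
# so reliably answering 14 within a few tries is well above this baseline.
print(f"chance accuracy ≈ {chance_accuracy:.3f}")
```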
Yeah. The way I'm thinking about it is: to discuss these questions we have to get clear on what we mean by "knowledge" in the context of GPT. In some sense Gwern is right; in a different sense, you're right. But no one has offered a clearer definition of "knowledge" to attempt to arbitrate these questions yet (afaik, that is).
Answer by Rekrul · Jul 20, 2020

I still haven't been convinced GPT-3 is capable of reasoning, but I'm also starting to wonder if that's even that important. Roughly, all GPT-3 does is examine text, try to find a pattern, and continue it. But it is so massive, and trained on so much data, that the patterns it can "see" and the connections it can make are far more expansive than we'd expect. What this means is that while it doesn't try to comprehend a logical question and then apply some kind of reasoning to answer it, its ability to see patterns combined with its staggeringly huge amount of dat... (read more)

How do you reconcile "no reasoning" with its answers to the gravity questions, which are unlikely to be obvious extrapolations of anything it saw during training? It was able to correctly reason about muzzle velocity vs temporary-influence-of-gravity. I don't see how that can be explained away as purely "pattern-matching".  Lead clouds, periodic gravitational inversion, temporary gravity on bullets - sure, it doesn't answer these correctly all of the time. I think it's remarkable that it can answer them correctly at all. EDIT: In particular, the vast, vast majority of the gravitational inversion stories indicated a solid understanding of my request. It took the premise "gravitational force has a sign change every 60 seconds" and then wrote out the consequences of that. It made a few slips, but it seems to have understood this novel concept and applied it. 

Yeah, that part was very impressive. Personally, I'm not sure I get why it requires reasoning; it seems more like just very advanced and incredible recognition and mimicry abilities to me. But I'm just a casual observer of this stuff, so I could easily be missing something. Hopefully, since we can all play around with GPT-3, people will continue to push it to its limits, and we can get an accurate picture of what's really going on under the hood and whether it's really developed some form of reasoning.

I'm not sure how exactly reasoning should be defined and whether that part really requires reasoning or not. But if it's just very advanced and incredible recognition and mimicry abilities, it still shifts my impression of what can be achieved using just advanced and incredible recognition and mimicry abilities. I would previously have assumed that you need something like reasoning for it, but if you don't, then maybe the capacity for reasoning is slightly less important than I had thought.

I'd love to know your reasoning here. I've been very impressed with GPT-3, but not to the extent I'd majorly update my timelines.

My largest update came from the bit where it figured out that it was expected to produce Harry Potter parodies in different styles. Previously GPT had felt cool, but basically a very advanced version of a Markov chain. But the HP thing felt like it would have required some kind of reasoning.
My own take (not meant to be strong evidence of anything, mostly just documenting my internal updating experience): I had already updated towards fairly short timelines (like, maybe 20% chance of AGI in 20 years?). I initially had a surge of AAAAUGH, maybe-the-end-times-are-right-around-the-corner, with GPT-3, but I think that was mostly unwarranted. (At least, GPT-3 didn't seem like new information; it seemed roughly what I'd have expected GPT-3 to be like, and insofar as I'm updating shorter, it seems like I just made a mistake last year when first evaluating GPT-2.) I'm also interested in more of Kaj's thoughts.

That's a fair point, but I don't think you need to have a livestreamed event to gain access to professional Starcraft players beyond consultation. I'm sure many would be willing to be flown to DeepMind HQ and test AlphaStar in private.

Well, apparently that's exactly what happened with TLO and MaNa, and then the DeepMind guys were (at least going by their own accounts) excited about the progress they were making and wanted to share it, since being able to beat human pros at all felt like a major achievement. Like they could just have tested it in private and continued working on it in secret, but why not give a cool event to the community while also letting them know what the current state of the art is. E.g. some comments from here []: And the top comment from that thread:

The "it" I was referring to was these showmatches – I worded that poorly, my bad. I just don't see the point in having these uneven, premature showmatches instead of doing the extra work (which might be a lot) and then having fair showmatches. All the current ones have brought is confusion, and people distrusting DeepMind (due to misleading comments/graphs), and whenever they do finally get it right, it won't get the same buzz as before, because the media and general public will think it's old news that's already been done. Having them now instead of later just seems like a bad idea all around.

As noted in e.g. the conversation between Wei Dai and me elsewhere in this thread, it's quite plausible that people thought beforehand that the current APM limits were fair (DeepMind apparently consulted pro players on them). Maybe AlphaStar needed to actually play a game against a human pro before it became obvious that it could be so overwhelmingly powerful with the current limits.

There were definitely some complaints around OAI5's capabilities. Besides criticism over its superhuman reaction speed, the restrictions placed upon the game to allow the AI to learn it essentially formed its own weird meta that the humans were unfamiliar with, so the AI was using tactics built for an entirely different game than the humans were.

Honestly, I haven't been very impressed with either AI through these showmatches, because it's so hard to tell what their "intelligence" level is, as it's influenced heavily by their unfair capabilities. They need to both

... (read more)

I'm a relative layperson, so I honestly don't know. Maybe no new tricks are needed. But if that's the case, why not just do it and not have all this confusion flying about?

These big uneven showmatches do a very poor job of highlighting the state of the art in AI tactics, as the tactics the AIs use seem to be heavily influenced by their "unfair" capabilities. I can't really tell if these agents are generally smart enough to play the full game evenly but use unfair strategies because they're allowed to, or if they're dumber agents who couldn't play the full game evenly, so they're propped up by their unfair capabilities and earn victories by exploiting them.

1) Their goal was a really good bot - and that hadn't been done before (apparently). To implement handicaps to begin with would have been... very optimistic. 2) They don't know what will work for sure until they try it. 3) Expense. (Training takes time and money.)

After both Dota and this, I'm starting to come to the conclusion that video games are a poorer testbed than I'd thought for examining the "thinking" capabilities of modern AI. Unless a substantial amount of effort is put in to restrict the AI to being roughly equal "physically" to humans, it becomes very difficult to determine how much of the AI's victories were due to "thinking" and how much to being so "physically" superior that it pulls off moves humans are simply incapable of executing even if they thought of them (or having other advantages, like being able to see all of the visible

... (read more)
Could we train AlphaZero on all games it could play at once, then find the rule set its learning curve looks worst on?
At 277, AlphaStar's APM was, on average, lower than both MaNa's and TLO's.
How long do handicaps take to overcome, though? I find it hard to imagine that the difference between, e.g., a 500 APM average and a 500 APM hard ceiling requires a whole new insight for the agent to be "clever" enough to win anyway – maybe just more training.

Even if something like an electron has some weird degraded form of consciousness, I don't see why I should worry about it suffering. It doesn't feel physical pain or emotional pain, because it lacks physical processes for both of them, and any talk of it having "goals" and it failing to reach them means it suffers just reeks of anthropomorphism. I just don't buy it.

I agree that researchers can take shortcuts and develop tricks, but I don't see how that shortens it to something as incredibly short as 1 year, especially since we will be starting with parts that are far worse than their equivalent in the human brain.

"We could assume—by analogy with human brain training in childhood—that to train one model of human mind, at least 1 year of training time is needed (if the computer is running on the same speed as human mind)."

Could you clarify here? I'm no expert, but I'm pretty sure human brains in childhood take a lot longer than a year to learn everything they need to survive and thrive in the real world. And they have a lot more going for them than anything we'll build for the foreseeable future (better learning algorithm, better architecture built by evolution, etc.)

I think that the experimenters will find ways to compress the training process, maybe by skipping part of the dreamless sleep and the periods of passivity, and they will use some algorithmic tricks.