Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I'll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:


[meme image, by Rob Wiblin]

My EA Journey, depicted on the whiteboard at CLR:

[whiteboard image, h/t Scott Alexander]


 

Sequences

Agency: What it is and why it matters
AI Timelines
Takeoff and Takeover in the Past and Future

Wiki Contributions

Comments

Also, the US did consider the possibility of waging a preemptive nuclear war against the USSR to prevent it from getting nukes. (von Neumann advocated for this, I think?) If the US had been more of a warmonger, it might have done it, and then there would have been a more unambiguous world takeover.

Who is pushing for totalitarianism? I dispute that AI safety people are pushing for totalitarianism.

Reporting requirements, especially requirements to report to the public what your internal system capabilities are, so that it's impossible to have a secret AGI project. Also reporting requirements of the form "write a document explaining what capabilities, goals/values, constraints, etc. your AIs are supposed to have, and justifying those claims, and submit it to public scrutiny." So e.g. if your argument is 'we RLHF'd it to have those goals and constraints, and that probably works because there's No Evidence of deceptive alignment or other speculative failure modes,' then at least the world can see that no, you don't have any better arguments than that.

That would be my minimal proposal. My maximal proposal would be something like "AGI research must be conducted in one place: the United Nations AGI Project, with a diverse group of nations able to see what's happening in the project and vote on each new major training run and have their own experts argue about the safety case etc."

There are a bunch of options in between.

I'd be quite happy with an AGI Pause if it happened; I just don't think it's going to happen, because the corporations are too powerful. I also think that some of the other proposals are strictly better while also being more politically feasible. (They are more complicated and more easily corrupted, though, which to me is the appeal of calling for a pause: a pause is harder to regulatory-capture than something more nuanced.)

 

I think this post doesn't engage with the arguments for anti-x-risk AI regulation, to a degree that I am frustrated by. Surely you know that the people you are talking to -- the proponents of anti-x-risk AI regulation -- are well aware of the flaws of governments generally and the way in which regulation often stifles progress or results in regulatory capture that is actively harmful to whatever cause it was supposed to support. Surely you know that e.g. Yudkowsky is well aware of all this... right?

The core argument these regulation proponents are making and have been making is "Yeah, regulation often sucks or misfires, but we have to try, because worlds that don't regulate AGI are even more doomed."

What do you think will happen if there is no regulation of AGI? Seriously, how do you think it will go? Maybe you just don't think AGI will happen for another decade or so?

Thanks! This is exactly the sort of response I was hoping for. OK, I'm going to read it slowly and comment with my reactions as they happen:

Suppose a near future not-quite-AGI, for example something based on LLMs but with some extra planning and robotics capabilities like the things OpenAI might be working on, gains some degree of autonomy and plans to increase its capabilities/influence. Maybe it was given a vague instruction to benefit humanity/gain profit for the organization and instrumentally wants to expand itself, or maybe there are many instances of such AIs running by multiple groups because it's inefficient/unsafe otherwise, and at least one of them somehow decides to exist and expand for its own sake. It's still expensive enough to run (added features may significantly increase inference costs and latency compared to current LLMs) so it can't just replace all human skilled labor or even all day-to-day problem solving, but it can think reasonably well like non-expert humans and control many types of robots etc to perform routine work in many environments. This is not enough to take over the world because it isn't good enough at say scientific research to create better robots/hardware on its own, without cooperation from lots more people. Robots become more versatile and cheaper, and the organization/the AI decides that if they want to gain more power and influence, society at large needs to be pushed to integrate with robots more despite understandable suspicion from humans. 

While it isn't my mainline projection, I do think it's plausible that we'll get near-future-not-quite-AGI capable of quite a lot of stuff but not able to massively accelerate AI R&D. (My mainline projection is that AI R&D acceleration will happen around the same time the first systems have a serious shot at accumulating power autonomously.) As for what autonomy it gains and how much -- perhaps it was leaked or open-sourced, and while many labs are using it in restricted ways and/or keeping it bottled up and/or just using even more advanced SOTA systems, this leaked system has been downloaded by enough people that quite a few groups/factions/nations/corporations around the world are using it and some are giving it a very long leash indeed. (I don't think robotics is particularly relevant fwiw; you could delete it from the story and the story would be significantly more plausible (robots, being physical, will take longer to produce in large numbers. Even if Tesla is unusually fast and Boston Dynamics explodes, we'll probably see less than a 100k/yr production rate in 2026. Drones are produced by the millions, but these proto-AGIs won't be able to fit on drones) and just as strategically relevant. Maybe they could be performing other kinds of valuable labor to fit your story, such as virtual PA stuff, call center work, cyber stuff for militaries and corporations, maybe virtual romantic companions... I guess they have to compete with the big labs though and that's gonna be hard? Maybe the story is that their niche is that they are 'uncensored' and willing to do ethically or legally dubious stuff?)

To do this, they may try to change social constructs such as jobs and income that don't mesh well with a largely robotic economy. Robots don't need the same maintenance as humans, so they don't need a lot of income for things like food/shelter etc. to exist, but they do a lot of routine work, so full-time employment of humans makes less and less economic sense. They may cause some people to transition into a gig-based skilled labor system where people are only called on (often remotely) for creative or exceptional tasks or to provide ideas/data for a variety of problems. Since robotics might not be very advanced at this point, some physical tasks are still best done by humans; however, it's easier than ever to work remotely or to simply ship experts to physical problems or vice versa, because autonomous transportation lowers cost. AIs/robots still don't really own any property, but they can manage large amounts of property if, say, people store their goods in centralized AI warehouses for sale, and people would certainly want transparency and not just let them use these resources however they want. Even when they are autonomous and have some agency, what they want is not just more property/money but more capabilities to achieve goals, so they can better achieve whatever directive they happen to have (they probably still are unable to have original thoughts on the meaning or purpose of life at this point). To do this they need hardware, better technology/engineering, and cooperation from other agents through trade or whatever.

Again, I think robots are going to be hard to scale up quickly enough to make a significant difference to the world by 2027. But your story still works with non-robotic stuff such as that mentioned above. "Autonomous life of crime" is a threat model METR talks about, I believe.

Violence by AI agents is unlikely, because individual robots probably don't have good enough hardware to be fully autonomous in solving problems, so one data center/instance of AI with a collective directive would control many robots and solve problems individual machines can't, or else a human can own and manage some robots, and neither a large AI/organization nor a typical human who can live comfortably would want to risk their safety and reputation for relatively small gains through crime. Taking over territory is also unlikely, as even if robots can defeat many people in a fight, it's hard to keep it a secret indefinitely, and people are still better at cutting-edge research and some kinds of labor. They may be able to capture/control individual humans (like obscure researchers who live alone) and force them to do the work, but the tech they can get this way is probably insignificant compared to normal society-wide research progress. An exception would be if one agent/small group can hack some important infrastructure or weapon system for desperate/extremist purposes, but I hope humans will be more serious about cybersecurity at this point (lesser AIs should have been able to help audit existing systems, or at the very least, after the first such incident happens to a large facility, people managing critical systems would take formal verification and redundancy etc. much more seriously).

Agree re violence and taking over territory in this scenario where AIs are still inferior to humans at R&D and it's not even 2027 yet. There just won't be that many robots in this scenario and they won't be that smart.

...as for "autonomous life of crime" stuff, I guess I expect that AIs smart enough to do that will also be smart enough to dramatically speed up AI R&D. So before there can be an escaped AI or an open-source AI or a non-leading-lab AI significantly changing the world's values (which is itself kinda unlikely IMO), there will be an intelligence explosion in a leading lab.



Wow, what interests me and surprises me about this graph is the big bump that peaks in the '50s but begins in the '40s. Baby Boom indeed! I had always thought that the Baby Boom was a return to trend, i.e. people had fewer kids during the war and then compensated by having more kids after. But it seems that they overcompensated and totally reversed the trend that had been ongoing for decades prior to the war! So now I'm confused about what was going on, and more hopeful that these trends might be reversible.

Ideas for text that could be turned into future albums:

  • Pain is not the unit of effort
  • Almost No One is Evil. Almost Everything is Broken.
  • Reversed Stupidity is Not Intelligence
  • Optimality is the Tiger; Agents are the Teeth
  • Taboo Outside View
  • If you let reality have the final word, you might not like the bottom line.
  • You can believe whatever you want.

I'd love to see a scenario by you btw! Your own equivalent of What 2026 Looks Like, or failing that the shorter scenarios here. You've clearly thought about this in a decent amount of detail.

Feature request: I'd like to be able to play the LW playlist (and future playlists!) from LW. I found it a better UI than Spotify and YouTube, partly because it didn't stop me from browsing around LW and partly because it had the lyrics at the bottom of the screen. So... maybe there could be a toggle in the settings to re-enable it?

I agree that it unfairly trivializes what's going on here. I am not too bothered by it but am happy to look for a better phrase. Maybe a more accurate phrase would be "not offending mainstream sensibilities?" Indeed a much more nuanced and difficult task than avoiding a list of words.
