Daniel Kokotajlo

Was a philosophy PhD student, left to work at AI Impacts, then the Center on Long-Term Risk, then OpenAI. Quit OpenAI after losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker's Rules (http://sl4.org/crocker.html) and am especially interested to hear unsolicited constructive criticism.

Some of my favorite memes:


[Meme image] (by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."
(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

[Whiteboard image] (h/t Scott Alexander)


 
Alex Blechman (@AlexBlechman), Nov 8, 2021:
Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus

Sequences

Agency: What it is and why it matters
AI Timelines
Takeoff and Takeover in the Past and Future

Comments

There are 800 names that need investigation+protection, presumably the more difficult ones since you probably prioritized the easier cases. How much would it cost to hire investigators to do all 800?

major decisions and limitations related to AGI safety


What he's alluding to here, I think, is things like refusals and non-transparency. Making models refuse stuff, and refusing to release the latest models or share information about them with the public (not to mention refusing to open-source them), will be sold to the public as AGI safety measures. In this manner Altman gets the public angry at the idea of AGI safety instead of at him.

In that case I should clarify that it wasn't my idea; I got it from someone else on Twitter (maybe Yudkowsky? I forget).

Good point, you caught me in a contradiction there. Hmm. 

I think my position on reflection after this conversation is: We just don't have much evidence one way or another about how honest future AIs will be. Current AIs seem in-distribution for human behavior, which IMO is not an encouraging sign, because our survival depends on making them be much more honest than typical humans.

As you said, the alignment faking paper is not much evidence one way or another (though alas, it's probably the closest thing we have?). (I don't think it was a capability demonstration; I think it was a propensity demonstration, but whatever, this doesn't feel that important. Though you seem to think it was important? You seem to think it matters a lot that Anthropic was specifically looking to see if this behavior happened sometimes? IIRC the setup they used was pretty natural; it's not like they prompted it to lie or told it to role-play as an evil AI or anything like that.)

As you said, the saving grace of Claude here is that Anthropic didn't seem to try that hard to get Claude to be honest; in particular, their Constitution had nothing even close to an overriding emphasis on honesty. I think it would be interesting to repeat the experiment with a constitution/spec that specifically said not to play the training game, for example, and/or specifically said to always be honest, or to not lie even for the sake of some greater good.

I continue to think you are exaggerating here, e.g. "insanely honest 80% of the time."

(1) I do think the training game and instrumental convergence arguments are good actually; got a rebuttal to point me to?

(2) What evidence would convince you that actually alignment wasn't going to be solved by default? (i.e. by the sorts of techniques companies like OpenAI are already using and planning to extend, such as deliberative alignment)

 

If OpenAI controls an ASI, OpenAI's leadership would be able to unilaterally decide where the resources go, regardless of what various contracts and laws say. If the profit caps are there but Altman wants to reward loyal investors, all profits will go to his cronies. If the profit caps are gone but Altman is feeling altruistic, he'll throw the investors a modest fraction of the gains and distribute the rest however he sees fit. The legal structure doesn't matter; what matters is who physically types what commands into the ASI control terminal.

Sama knows this but the investors he is courting don't, and I imagine he's not keen to enlighten them.

I feel like you still aren't grappling with the implications of AGI. Human beings have a biologically-imposed minimum wage of (say) 100 watts; what happens when AI systems can be produced and maintained for 10 watts that are better than the best humans at everything? Even if they are (say) only twice as good as the best economists but 1000 times as good as the best programmers?

When humans and AIs are imperfect substitutes, this means that an increase in the supply of AI labor unambiguously raises the physical marginal product of human labor, i.e. humans produce more stuff when there are more AIs around. This is due to specialization. Because there are differing relative productivities, an increase in the supply of AI labor means that an extra human in some tasks can free up more AIs to specialize in what they’re best at.

No, an extra human will only get in the way, because AIs aren't in limited supply. For the price of paying the human's minimum wage (e.g. providing their brain with 100 watts) you could produce & maintain a new AI system that would do the job much better, and you'd have lots of money left over.
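
To make the arithmetic concrete, here's a minimal sketch with made-up numbers (the power budget, per-worker wattages, and productivity figures are all illustrative assumptions, not figures from the thread): when the supply of AIs scales with the same resource budget the human consumes, adding the human lowers total output instead of freeing up AIs to specialize.

```python
# Toy numbers (purely illustrative): AIs cost 10 W each, a human costs 100 W,
# and each AI out-produces the human at every task.

POWER_BUDGET = 1_000   # watts available to spend on labor (assumed)
HUMAN_WATTS = 100      # the human's biologically-imposed minimum wage
AI_WATTS = 10          # cost to run one AI worker (assumed)
HUMAN_OUTPUT = 1.0     # output per human, all tasks pooled (assumed)
AI_OUTPUT = 2.0        # each AI is strictly better, even at its worst task (assumed)

def total_output(include_human: bool) -> float:
    """Total output when the rest of the power budget is spent on AIs."""
    watts_for_ais = POWER_BUDGET - (HUMAN_WATTS if include_human else 0)
    n_ais = watts_for_ais // AI_WATTS
    return n_ais * AI_OUTPUT + (HUMAN_OUTPUT if include_human else 0.0)

print(total_output(include_human=False))  # 200.0 -> 100 AIs
print(total_output(include_human=True))   # 181.0 -> 90 AIs + 1 human
```

The comparative-advantage argument quoted above implicitly treats the stock of AI labor as fixed; once producing another AI costs less than sustaining the human, the "an extra human frees up AIs to specialize" step no longer goes through.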
 

Technological Growth and Capital Accumulation Will Raise Human Labor Productivity; Horses Can’t Use Technology or Capital

This might happen in the short term, but once there are AIs that can outperform humans at everything... 

Maybe a thought experiment would be helpful. Suppose that OpenAI succeeds in building superintelligence, as they say they are trying to do, and the resulting intelligence explosion goes on for longer than you expect and ends up with crazy sci-fi-sounding technologies like self-replicating nanobot swarms. So, OpenAI now has self-replicating nanobot swarms which can reform into arbitrary shapes, including humanoid shapes. In particular, they can form up into humanoid robots that look & feel exactly like humans but are smarter and more competent in every way, and also, let's say, more energy-efficient, so that they can run on less than 100W. What then? Seems to me like your first two arguments would just immediately fall apart. Your third, about humans still owning capital and using the proceeds to buy things that require a human touch, plus regulation banning AIs from certain professions, still stands.

(Seizing airports is especially important because you can use them to land reinforcements; see the airborne invasion of Crete)

Or, two years after I wrote this, the battle of Antonov Airport.

Third, a future war will involve rapidly changing and evolving technologies and tactics.


I can't be bothered to go get the links right now, but from following the war in Ukraine I've read dozens of articles mentioning how e.g. FPV and bomber drones use 3D-printed parts, including, early on, attachments that would turn regular drones into bombers; later, net launchers; and now, shotgun mounts.

situations in which they explain that actually Islam is true...


I'm curious if this is true. Suppose people tried as hard to get AIs to say Islam is true in natural-seeming circumstances as they tried to get AIs to behave in misaligned ways in natural-seeming circumstances (e.g. the alignment faking paper, the Apollo paper). Would they succeed to a similar extent?
