Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, now OpenAI Futures/Governance team. Views are my own & do not represent those of my employer. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Two of my favorite memes:


(by Rob Wiblin)

My EA Journey, depicted on the whiteboard at CLR:

 

Sequences

Agency: What it is and why it matters
AI Timelines
Takeoff and Takeover in the Past and Future

Wiki Contributions

Comments

Thanks for the reply. I'm a bit over my head here but isn't this a problem for the practicality of this approach? We only get mutual cooperation because all of the agents have the very unusual property that they'll cooperative if they find a proof that there is no such argument. Seems like a selfless and self-destructive property to have in most contexts, why would an agent self-modify into creating and maintaining this property?

So... it's part of the setup that all of these agents will:
--Cooperate if they can prove that there is some argument compelling to everyone that everyone cooperates (because then they prove that everyone cooperates, and that includes them, and their proof system isn't mistaken?)
--Cooperate if they can prove that there is no such argument.
--Else defect.

Am I getting that right?

I'm glad I asked, that was helpful! I agree that instrumental convergence is a huge crux; if I were convinced that e.g. it wasn't going to happen until 15 years from now, and/or that the kinds of systems that might instrumentally converge were always going to be less economically/militarily/etc. competitive than other kinds of systems, that would indeed be a huge revolution in my thought and would completely change the way I think about AI and AI risks, and I'd become much more optimistic.

I'll go read the post you linked.

Well yeah, it depends on details and assumptions I didn't make explicit -- I wrote only four sentences!

If you have counterarguments to any of my claims I'd be interested to hear them, just in case they are new to me.

Even if you buy the dial theory, it still doesn't make sense to shout Yay Progress on the topic of AGI. Singularity is happening this decade, maybe next, whether we shout Yay or Boo. Shouting Boo just delays it a little and makes it more likely to be good instead of bad. (Currently is it quite likely to be bad).

How about this:
--Re the first grey area: We rule in your favor here.
--Re the second grey area: You decide, in 2027, based on your own best judgment, whether or not it would have happened absent regulation. I can disagree with your judgment, but I still have to agree that you won the bet (if you rule in your favor).

Isn't the college student example an example of 1 and 2? I'm thinking of e.g. students who become convinced of classical utilitarianism and then join some Effective Altruist club etc.

They say "And then the entire world gets transformed as superintelligent AIs + robots automate the economy." Does Tyler Cowen buy all of that? Is that not the part he disagrees with?

And then yeah for the AI kills you part there are models as well, albeit not economic growth models because economic growth is a different subject. But there are simple game theory models, for example -- expected utility maximizer with mature technology + misaligned utility function = and then it kills you. And then there are things like Carlsmith's six-step argument and Chalmers' and so forth. What sort of thing does Tyler want, that's different in kind from what we already have?

Has Tyler Cowen heard of the Bio Anchors by Ajeya Cotra model or the takeoffspeeds.com model by Tom Davidson or Roodman's model of the singularity, or for that matter the earlier automation models by Robin Hanson? All of them seem to be the sort of thing he wants, I'm surprised he hasn't heard of them. Or maybe he has and thinks they don't count for some reason? I would be curious to know why.

Given your lack of disposable money I think this would be a bad deal for you, and as for me, it is sorta borderline (my credence that the bet will resolve in your favor is something like 40%?) but sure, let's do it. As for what charity to donate to, how about Animal Welfare Fund | Effective Altruism Funds. Thanks for working out all these details!

Here are some grey area cases we should work out:
--What if there is a human programmer managing the whole setup, but they are basically a formality? Like, the company does technically have programmers on staff but the programmers basically just form an interface between the company and ChatGPT and theoretically if the managers of the company were willing to spend a month learning how to talk to ChatGPT effectively they could fire the human programmers?
--What if it's clear that the reason you are winning the bet is that the government has stepped in to ban the relevant sorts of AI?

Load More