Daniel Kokotajlo

Was a philosophy PhD student, left to work at AI Impacts, then the Center on Long-Term Risk, then OpenAI. Quit OpenAI after losing confidence that it would behave responsibly around the time of AGI. Not sure what I'll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism: http://sl4.org/crocker.html

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."
(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)


 
Alex Blechman (@AlexBlechman), Nov 8, 2021:
Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus

Sequences

Agency: What it is and why it matters
AI Timelines
Takeoff and Takeover in the Past and Future


Comments


(TBC I expect said better experiments to find nothing super scary, because I think current models are probably pretty nice especially in obvious situations. I'm more worried about future models in scarier situations during takeoff.)

That's my understanding too. I hope they get access to do better experiments with less hand-holdy prompts.

That's better, but the problem remains that I value pre-AGI money much more than I value post-AGI money, and you are offering to give me post-AGI money in exchange for my pre-AGI money (in expectation).

You could instead pay me $10k now, with the understanding that I'll pay you $20k later in 2028 unless AGI has been achieved, in which case I keep the money... but then why would I do that when I could just take out a loan for $10k at a low interest rate?
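
For concreteness, a rough back-of-the-envelope version of that comparison (just a sketch; the ~3-year horizon is an assumption, since it depends on when the bet is made):

    # In the no-AGI branch, taking $10k now and repaying $20k in 2028
    # is effectively a loan, at an implied annual rate of roughly:
    principal, repayment, years = 10_000, 20_000, 3  # ~3 years to 2028 (assumed)
    implied_rate = (repayment / principal) ** (1 / years) - 1
    print(f"{implied_rate:.0%}")  # ~26%/yr, well above an ordinary loan rate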

I have in fact made several bets like this, totalling around $1k, with 2030 and 2027 as the due dates iirc. I imagine people will come to collect from me when the time comes, if AGI hasn't happened yet.

But it wasn't rational for me to do that; I was just doing it to prove my seriousness.

I'm not sure I understand. You and I, as far as I know, have the same beliefs about world energy consumption in 2027, at least on our median timelines. I think it could be higher, but only if AGI timelines are a lot shorter than I think and takeoff is a lot faster than I think. And in those worlds we probably won't be around to resolve the bet in 2027, nor would I care much about winning that bet anyway. (Money post-singularity will be much less valuable to me than money before the singularity)

To be clear, my view is that we'll achieve AGI around 2027, ASI within a year of that, and then some sort of crazy robot-powered self-replicating economy within, say, three years of that. So 1000x energy consumption around then or shortly thereafter (depends on the doubling time of the crazy superintelligence-designed-and-managed robot economy).

So, the assumption of constant growth from 2023 to 2031 is very false, at least as a representation of my view. I think my median prediction for energy consumption in 2027 is the same as yours.
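
For a rough sense of why constant growth misrepresents a view like this, here's a toy sketch (the ~2%/yr baseline, the 2029 takeoff year, and the 1000x-by-2031 endpoint are illustrative assumptions, not forecasts):

    # Toy comparison: "constant growth 2023-2031" vs. "roughly flat until a
    # robot-economy takeoff, then rapid doubling", both ending at the same point.
    END_MULTIPLIER = 1000    # assumed ~1000x energy consumption by 2031
    BASELINE_GROWTH = 0.02   # assumed ~2%/yr growth before takeoff
    TAKEOFF_YEAR = 2029      # assumed start of the explosive robot economy

    def constant_growth(year, start=2023, end=2031):
        """Smooth exponential that reaches END_MULTIPLIER by `end`."""
        rate = END_MULTIPLIER ** (1 / (end - start))
        return rate ** (year - start)

    def flat_then_takeoff(year, start=2023, end=2031):
        """Baseline growth until TAKEOFF_YEAR, then doubling fast enough to
        reach the same END_MULTIPLIER by `end`."""
        pre = (1 + BASELINE_GROWTH) ** (min(year, TAKEOFF_YEAR) - start)
        if year <= TAKEOFF_YEAR:
            return pre
        at_takeoff = (1 + BASELINE_GROWTH) ** (TAKEOFF_YEAR - start)
        factor = (END_MULTIPLIER / at_takeoff) ** (1 / (end - TAKEOFF_YEAR))
        return pre * factor ** (year - TAKEOFF_YEAR)

    for year in (2025, 2027, 2029, 2031):
        print(year, f"constant: {constant_growth(year):7.1f}x",
              f"flat-then-takeoff: {flat_then_takeoff(year):7.1f}x")

With these toy numbers both curves end at 1000x in 2031, but the constant-growth curve already implies roughly 30x energy consumption by 2027, while the flat-then-takeoff curve leaves 2027 barely above today's level.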

 

One thing I'd really like labs to do is encourage their researchers to blog about their thoughts on the future, on alignment plans, etc.

Another related but distinct thing is to have safety cases and an anytime alignment plan, and to publish redacted versions of them.

Safety cases: Argument for why the current AI system isn't going to cause a catastrophe. (Right now, this is very easy to do: 'it's too dumb')

Anytime alignment plan: Detailed exploration of a hypothetical in which a system trained in the next year turns out to be AGI, with particular focus on what alignment techniques would be applied.

Why would the shift be bad? More politics, more fakery, less honest truth-seeking? Yeah that seems bad. There are benefits too though (e.g. makes people less afraid to link to LW articles). Not sure how it all shakes out.

 

Yep. Other important people (in government, in AGI research groups) do too.

Maybe instead of focusing on a number (10x vs. 1.1x) the focus should be on other factors, like "How large and diverse is the group of non-CoI'd people who thought carefully about this decision?" and "How much is it consensus among that group that this is better for humanity, vs. controversial?"

Consider the case where, e.g., the situation and safety cases have been made public; the public is aware that the US AGI project is currently stalled because it lacks a solution for deceptive alignment that we know will work, while China is proceeding because they just don't think deceptive alignment is a thing at all; and the academic ML community, not just in the USA but around the world, has looked at the safety case, the literature, model organisms, etc., and its general view is "yeah, probably deceptive alignment won't be an issue so long as we do X, Y, and Z, but we can't rule it out even then," with the tiny minority that thinks otherwise seeming pretty unreasonable. In that case I'd feel pretty happy with the decision to proceed with AGI capabilities advancements in the USA, subject to doing X, Y, and Z. (Though even then I'd also be like: let's at least try to come to some sort of deal with China.)

Whereas if, e.g., the safety case and situation haven't been made public, and the only technical alignment experts who've thought deeply about the situation and safety case are (a) corporate employees and (b) ~10 hand-picked advisors with security clearances brought in by the government... or if there's still tons of controversy, with large serious factions saying "X, Y, and Z are not enough; deceptive alignment is a likely outcome even so"... then if we proceed anyway I'd be thinking 'are we the baddies?'
