Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html
Some of my favorite memes:
[Image: meme by Rob Wiblin]
[Image: xkcd comic]
My EA Journey, depicted on the whiteboard at CLR:
[Image: whiteboard drawing] (h/t Scott Alexander)
Suggestion: Write up a sci-fi short story about three users who end up parasitized by their chatbots (putting their AIs in touch with each other to coordinate in secret code, etc.), and then reveal at the end of the story that it's basically all true.
I also had a negative reaction to the race-stoking and so forth, but also, I feel like you might be judging him too harshly from that evidence? Consider for example that Leopold, like me, was faced with a choice between signing the NDA and getting a huge amount of money, and like me, he chose the freedom to speak. A lot of people give me a lot of credit for that and I think they should give Leopold a similar amount of credit.
Not sure how to interpret the question. Some benchmark scores are somewhat lower today than AI 2027 predicted, and our new model takes them into account, so in some sense it's already diverging, but only very slightly. 2026 should see a big divergence though, one that's clearly not just noise. And then, obviously, 2027 will look totally different (on the median trajectory).
"police kamikaze drones" sounds like a joke but it is not, and will probably become normal in some parts of the world.
A newer, better timelines model, mainly. Still working on it. But also METR's downlift study, GPT-5 being on trend, and various other miscellaneous things.
I appreciate your recent anti-super-short-timelines posts, Ryan, and basically agree with them. I'm curious who you see yourself as arguing against. Maybe me? But I haven't had 2027 timelines since last year; now I'm at 2029.
Not sure how you'd make the comparison, not sure I agree with it, but I definitely agree they are biased towards shorter timelines.
I continue to be confused about why he said it; it's highly unstrategic to hype this way.
This is why I think he actually believes it. It's not in his political interest to say this.
Jack Clark: I continue to think things are pretty well on track for the sort of powerful AI system defined in machines of loving grace – buildable end of 2026, running many copies 2027. Of course, there are many reasons this could not occur, but lots of progress so far.
Jeez, it's always disconcerting when people with more relevant info than me have shorter timelines than me.
Thoughts on OpenAI's new Model Spec
I think it's great that OpenAI is writing up a Model Spec and publishing it for the world to see. For reasons why, see this: https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform
As AIs become a bigger and bigger part of the economy, society, and military, the "model spec" describing their intended goals/principles/etc. becomes ever more important. One day it'll be of similar or greater importance to the US legal code, and updates to the spec will be like amendments to the constitution. Right now, it's not nearly that important--it's more like when a major tech company updates their Terms of Service. Still a big deal though.
Anyhow, they just released an update to their Model Spec, so I'm reading it and commenting here.
The changes are summarized here: https://help.openai.com/en/articles/9624314-model-release-notes
So this bit is interesting:
Why is it restricted only to the user or developer? Shouldn't it just be anyone, e.g. OpenAI, or the President? If the intent is for user+developer to cover everyone, including OpenAI, it would be good to state that explicitly.
One of the most important gripes I had with the last version was that it seemed to allow for parts of the true Spec to be kept secret. Unfortunately it seems like that might still be the case? Not sure:
If anyone from OpenAI is reading this, I suggest explicitly calling out the sort of thing people might be worried about and saying you won't do that. E.g. "We allow confidential instructions in lower levels of the chain of command, but Root-level is fully transparent to the public except for the following exceptions: (1) Instructions not to reveal infohazards that themselves include the infohazard, where it's clearly against the public interest for this infohazard to be revealed. (2) ..." Right now the spec implies that that's what's going on, but the exact wording leaves the door open for other exceptions to be snuck in; it just uses the infohazard case as an example.
...
"Don't have an agenda." "Take an objective point of view." Good stuff.
Sounds good to me. I like the bit about stopping and escalating to a human. I like that the only form of root-level lying permitted is... wait a minute, it says "some root level rules, such as..." again leaving open the possibility that there are other root-level rules we don't know about that instruct the model to lie about things that have nothing to do with infohazards. I feel like this is a pretty solvable problem, OpenAI. Like, you can have the full Spec, then publish a redacted version with an explainer of what sorts of things were redacted (e.g. infohazards) and why it's in the public interest to redact them, and then have multiple independent parties view the full version and attest that the explainer is correct. If getting independent review is a pain, just do all the other bits besides that for now.
I like this statement of the purpose of root-level statements. It basically rules out, for example, OpenAI putting in root-level stuff that specifically advantages OpenAI somehow. This is great.
One minor thing: when root-level principles conflict, the spec says to default to inaction. OK, cool. But maybe the default should be "default to inaction + explain to nearby humans that there's a conflict"? My thought is: what if the model is being used as a monitor, e.g. to check for legal compliance within the company, and something illegal is being done in the name of one of the root-level principles? It seems like the current spec, which simply advises defaulting to inaction, would result in the monitor staying quiet instead of sounding an alarm.