Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html
Some of my favorite memes: [images not shown; one by Rob Wiblin, one from xkcd]

My EA Journey, depicted on the whiteboard at CLR: [image not shown; h/t Scott Alexander]
Thanks for playing & writing up your reflections!
China wasn't as aggressive or bold in our game as I think they could have been; I agree that the situation for them is pretty rough, but I'd like to try again someday and see if they can pull off a win by angling more aggressively for a deal early on.
Hmm, but can't the megacorporations involved in the H200 transaction also bribe Customs? Won't your $500 WeChat bribe to the mid-level bureaucrat be cancelled out and overwhelmed by the many $5,000 WeChat bribes flying at them from the corporations eagerly awaiting their shipment of H200s?
Yeah, that EA-prevalence assumption also caused me to doubt that the author actually worked at an AI company; it was very dissonant with my experience, at least.
Strong-upvoted.
Nit: I don't think it's that ambiguous. I think that in worlds where alignment is solved by an AI company, the epistemic culture of the AI company that solves it would look markedly better than this story depicts. Moreover, I think this is still true (though less true) in worlds where alignment turns out to be surprisingly easy.
I understand you don't like benchmark-based methodology. I think you should still answer my question, because if you did have a better benchmark, it would be valuable to me, and I asked nicely. ;) But it's OK; at this point I think it's clear you don't.
Thank you for explaining your model more. I disagree with some bits:
In my model gated by "breakthroughs", accelerating incremental algorithmic improvements or surrounding engineering doesn't particularly help, because it merely picks the low-hanging incremental algorithmic fruit faster (which is in limited supply for a given level of compute and with given methods, but currently takes human researchers years to pick). At the point where even "breakthroughs" are accelerated a lot, AI capable of full automation of civilization is probably already available.
The speedup from today's coding agents is not just a within-paradigm speedup. If someone is trying to figure out how to do continual learning or brain-like AGI or whatever, they need to run experiments on GPUs as part of their research, and they'll be able to do that faster with the help of Claude Code. Only slightly faster, of course, but the point stands: it's not just a within-paradigm speedup. And the effect will get stronger over the next year or three, as coding agents get massively better and able to succeed at longer-horizon tasks. Moreover, horizon lengths seem to be going up in most (all?) domains, not just coding; this suggests that the current paradigm will eventually automate even the parts of AI R&D involved in developing new paradigms.
So... no? You don't have any other milestones or benchmarks to point to that you think are better?
Separately, I take your point that "at some point the current architecture/method can be written off as insufficient... until a new breakthrough ... which takes an amount of time that's probably not predictable..." except that actually I think it IS predictable, if we think the current paradigm will massively accelerate AI R&D in general. My prediction would be: "Shortly after AI R&D is massively accelerated, the next paradigm (or three) will be discovered."
Also, I don't think much more is needed in the "new breakthrough" category. Like, maybe continual learning? And that's about it? Not even sure we need it, tbh.
I acknowledge that the METR time-horizon benchmark has loads of limitations; my position has just been that it's the least bad benchmark to extrapolate, i.e. the best single piece of evidence we have. Do you have a better suggestion or alternative?
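For concreteness, here's a minimal sketch of what I mean by extrapolating the time-horizon trend. The specific numbers (a ~1-hour horizon today, a ~7-month doubling time, a one-work-month target) are placeholder assumptions for illustration, not METR's figures or outputs of our model:

```python
# Illustrative sketch of a METR-style time-horizon extrapolation.
# All numbers below are placeholder assumptions, not METR's data.
import math

current_horizon_hours = 1.0   # assumed: ~1-hour task horizon at 50% success today
doubling_time_months = 7.0    # assumed: horizon doubles roughly every 7 months
target_horizon_hours = 160.0  # assumed: ~1 month of full-time work

# Number of doublings to get from the current horizon to the target horizon
doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)
months_until_target = doublings_needed * doubling_time_months

print(f"{doublings_needed:.1f} doublings -> ~{months_until_target / 12:.1f} years")
# With these assumptions: log2(160) ≈ 7.3 doublings, i.e. ~4.3 years.
```

The point of the exercise isn't the specific output; it's that with a benchmark you can at least be explicit about which assumptions are doing the work.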
Re: the 0%-code-reviewed milestone: it seems about as relevant to me as the 0%-code-written milestone. Would you agree? Do you have other milestones to point to that you think are more relevant than either?
That's why I said "possibly still a few years later."
I wish I had a more confident answer to give you! But my answer is:
You can read more about it here: https://www.aifuturesmodel.com/forecast/daniel-01-26-26
(OK, this is for Automated Coder, which is a somewhat weaker milestone than Superhuman Coder; we are switching to it because it seems more appropriate.)
The game in question was about as decentralized as you expect, I think? But, importantly, compute is very unevenly distributed. The giant army of AIs running on OpenAI's datacenters essentially all have the same system prompt (like, maybe there are a few variants, but they're all designed to work smoothly together towards OpenAI's goals), and that army constitutes 20% of the total population of AIs initially, and at one point in the game a bit more than 50%.
So while (in our game) there were thousands/millions of different AI factions/goals of similar capability level, the top 5 AI factions/goals by population size / compute level controlled something like 90% of the world's compute, money, access-to-powerful-humans, etc. So to a first approximation, it's reasonable to model the world as containing 1-4 AI factions, plus a bunch of miscellaneous minor AIs that can get up to trouble and shout warnings from the sidelines but don't wield significant power.
If you are interested in playing a game sometime, you'd be welcome to join! I'd encourage you to make your own variant scenario too if you like.