Previously "Lanrian" on here. Research analyst at Open Philanthropy. Views are my own.
Ok, gotcha.
> It's that she didn't accept the reasoning behind that number enough to really believe it. She added a discount factor based on fallacious reasoning around "if it were that easy, it'd be here already".
Just to clarify: There was no such discount factor that changed the median estimate of "human brain compute". Instead, this discount factor was applied to go from "human brain compute estimate" to "human-brain-compute-informed estimate of the compute-cost of training TAI with current algorithms" — adjusting for how our current algorithms seem to be worse than those used to run the human brain. (As you mention and agree with, although I infer that you expect algorithmic progress to be faster than Ajeya did at the time.) The most relevant section is here.
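To spell out the structure (a schematic sketch of my own, not the report's exact formula):

$$\text{TAI compute requirement (current algorithms)} \;\approx\; \underbrace{10^{15}\ \text{FLOP/s}}_{\text{brain-compute median, undiscounted}} \times \underbrace{\text{algorithmic-gap adjustment}}_{\text{the discount in question}} \times \text{other anchor-specific factors}$$

I.e., the adjustment changes the estimated compute-cost of getting TAI with today's algorithms, not the brain-compute median itself.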
I suspect there's a cleaner way to make this argument that doesn't talk much about the number of "token-equivalents", but instead contrasts "total FLOP spent on inference" with some combination of:
Having higher "FLOP until X" (for each of the X in the 3 bullet points) seems to increase danger. While increasing "total FLOP spent on inference" seems to have a much better ratio of increased usefulness : increased danger.
In this framing, I think:
It's possible that "many mediocre or specialized AIs" is, in practice, a bad summary of the regime with strong inference scaling. Maybe people's associations with "lots of mediocre thinking" ends up being misleading.
Thanks!
I agree that we've learned interesting new things about inference speeds. I don't think I would have anticipated that at the time.
Re:
> It seems that spending more inference compute can (sometimes) be used to qualitatively and quantitatively improve capabilities (e.g., o1, recent swe-bench results, arc-agi) rather than merely doing more work in parallel. Thus, it's not clear that the relevant regime will look like "lots of mediocre thinking".[1]
There are versions of this that I'd still describe as "lots of mediocre thinking", adding up to being about as useful as higher-quality thinking.
(Cf. above from the post: "the collective’s intelligence will largely come from [e.g.] Individual systems 'thinking' for a long time, churning through many more explicit thoughts than a skilled human would need to solve a problem" & "Assuming that much of this happens 'behind the scenes', a human interacting with this system might just perceive it as a single super-smart AI.")
The most relevant question is whether we'll still get the purported benefits of the lots-of-mediocre-thinking regime if there's strong inference scaling. I think we probably will.
Paraphrasing my argument in the "Implications" section:
I think o3 results might involve enough end-to-end training to mostly contradict the hopes of bullet points 1-2, but I'd guess they don't contradict 3-4.
(Another caveat that I didn't have in the post is that it's slightly trickier to supervise mediocre serial thinking than mediocre parallel thinking, because you may not be able to evaluate a random step in the middle without loading up on earlier context. But my guess is that you could train AIs to help you with this without adding too much extra risk.)
> One argument I have been making publicly is that I think Ajeya's Bioanchors report greatly overestimated human brain compute. I think a more careful reading of Joe Carlsmith's report that hers was based on supports my own estimates of around 1e15 FLOPs.
Am I getting things mixed up, or isn’t that just exactly Ajeya’s median estimate? Quote from the report: “Under this definition, my median estimate for human brain computation is ~1e15 FLOP/s.”
https://docs.google.com/document/d/1IJ6Sr-gPeXdSJugFulwIpvavc0atjHGM82QjIfUSBGQ/edit
> We did the 80% pledge thing, and that was like a thing that everybody was just like, "Yes, obviously we're gonna do this."
Does anyone know what this is referring to? (Maybe a pledge to donate 80%? If so, curious about 80% of what & under what conditions.)
Related: The monkey and the machine by Paul Christiano. (Bottom-up processes ~= monkey. Verbal planner ~= deliberator. Section IV talks about the deliberator building trust with the monkey.)
A difference between this essay and Paul's is that this one seems to lean further towards "a good state is one where the verbal planner ~only spends attention on things that the bottom-up processes care about", whereas Paul's essay suggests a compromise where the deliberator gets to spend a good chunk of attention on things that the monkey doesn't care about. (In Rand's metaphor, I guess this would be like using some of your investment returns for consumption, where consumption would presumably count as a type of dead money, although the connotations don't feel exactly right, so maybe it should be in a 3rd bucket.)
Here's the best explanation + study I've seen of Dunning-Kruger-ish graphs: https://www.clearerthinking.org/post/is-the-dunning-kruger-effect-real-or-are-unskilled-people-more-rational-than-it-seems
Their analysis suggests that their data is pretty well-explained by a combination of a "Closer-To-The-Average Effect" (which may or may not be rational — there are multiple possible rational reasons for it) and a "Better-Than-Average Effect" that appears ~uniformly across the board (but gets swamped by the "Closer-To-The-Average Effect" at the upper end).
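To illustrate how those two effects could combine (a toy model of my own, not something from their post): if $p$ is someone's actual percentile and $\hat{p}$ their self-assessed percentile, something like

$$\hat{p} \;\approx\; 50 + \beta\,(p - 50) + c, \qquad 0 < \beta < 1,\ c > 0$$

captures both effects: $\beta < 1$ is the closer-to-the-average shrinkage and $c$ is the uniform better-than-average offset. Low scorers then overestimate a lot (both terms push upward), while at the top the shrinkage outweighs $c$, reproducing the classic Dunning-Kruger-looking plot even though neither effect is specific to low performers.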
> probably research done outside of labs has produced more differential safety progress in total
To be clear: this statement is consistent with companies producing way more safety research than non-companies, so long as companies also produce an even larger share of capabilities progress than non-companies? (Which I would've thought is the case, though I'm not well-informed. Not sure if "total research outside of labs look competitive with research from labs" is meant to deny this possibility, or if you're only talking about safety research there.)
I'm not sure if the definition of takeover-capable-AI (abbreviated as "TCAI" for the rest of this comment) in footnote 2 quite makes sense. I'm worried that too much of the action is in "if no other actors had access to powerful AI systems", and not that much action is in the exact capabilities of the "TCAI". In particular: Maybe we already have TCAI (by that definition) because if a frontier AI company or a US adversary was blessed with the assumption "no other actor will have access to powerful AI systems", they'd have a huge advantage over the rest of the world (as soon as they develop more powerful AI), plausibly implying that it'd be right to forecast a >25% chance of them successfully taking over if they were motivated to try.
And this seems somewhat hard to disentangle from stuff that is supposed to count according to footnote 2, especially: "Takeover via the mechanism of an AI escaping, independently building more powerful AI that it controls, and then this more powerful AI taking over would" and "via assisting the developers in a power grab, or via partnering with a US adversary". (Or maybe the scenario in the 1st paragraph is supposed to be excluded because current AI isn't agentic enough to "assist"/"partner" with allies, as opposed to just being used as a tool?)
What could a competing definition be? Thinking about what we care most about... I think two events especially stand out to me:
Maybe a better definition would be to directly talk about these two events? So for example...
Where "significantly less probability of AI-assisted takeover" could be e.g. at least 2x less risk.
The motivation for assuming "future model weights secure" in both (1a) and (1b) is so that the downside of getting the model weights stolen imminently isn't nullified by the fact that they're very likely to get stolen a bit later, regardless. Because many interventions that would prevent model weight theft this month would also help prevent it future months. (And also, we can't contrast 1a'="model weights are permanently secure" with 1b'="model weights get stolen and are then default-level-secure", because that would already have a really big effect on takeover risk, purely via the effect on future model weights, even though current model weights probably aren't that important.)
The motivation for assuming "good future judgment about power-seeking-risk" is similar to the motivation for assuming "future model weights secure" above. The motivation for choosing "good judgment about when to deploy vs. not" rather than "good at aligning/controlling future models" is that a big threat model is "misaligned AIs outcompete us because we don't have any competitive aligned AIs, so we're stuck between deploying misaligned AIs and being outcompeted" and I don't want to assume away that threat model.