Background Context:
I'm interested in this debate mainly because my views on timelines have, over the past few years, been heavily influenced by whoever I have talked with most recently (while my timelines have definitely gotten shorter on average over that period). If I've been talking with Tsvi or with Sam Eisenstat, my median-time-to-superintelligence is measured in decades, while if I've been talking with Daniel Kokotajlo or Samuel Buteau, it's measured in years (single-digit).
More recently, I've been more inclined towards the short end of the scale. The release of o1 made me update towards frontier labs not being too locked into their specific paradigm to innovate when existing methods hit diminishing returns. The AI 2027 report solidified this short-timeline view, specifically by making the argument that LLMs don't need to show steady progress on all fronts in order to be on a trajectory for strong superintelligence; so long as LLMs continue to make improvements in the key capabilities related to an intelligence explosion, other capabilities that might seem to lag behind can catch up later.
I was talking about some of these things with Tsvi recently, and he said something like "argue or update" -- so, it seemed like a good opportunity to see whether I could defend my current views or whether they'll once again prove highly variable based on who I talk to.
A Naive Argument:
One of the arguments I made early on in the discussion was "it would seem like an odd coincidence if progress stopped right around human level."
Since Tsvi put some emphasis on trying to figure out what the carefully-spelled-out argument is, I'll unpack this further:
Argument 1
- GPT1 (June 2018) was roughly elementary-school level in its writing ability.
- GPT2 (February 2019) was roughly middle-school level.
- GPT3 (June 2020) was roughly high-school level.
- GPT4 (March 2023) was roughly undergrad-level (but in all the majors at once).
- Claude 3 Opus (March 2024) was roughly graduate-school level (but in all the majors at once).
Now, obviously, this comes with a lot of caveats. For example, while GPT4 scored very well on the math SAT, it still made elementary-school mistakes on basic arithmetic questions. Similarly, the ARC-AGI challenge highlights IQ-test-like visual analogy problems where humans perform well compared with LLMs. LLMs also lag behind in physical intuitions, as exemplified by EG the HellaSwag benchmark; although modern models basically ace this benchmark, I think performance lagged behind what the education-level heuristic would suggest.
Still, the above comparisons are far from meaningless, and a naive extrapolation suggests that if AI keeps getting better at a similar pace, it will soon surpass the best humans in every field, across a wide variety of tasks.
There's a lot to unpack here, but I worry about getting side-tracked... so, back to the discussion with Tsvi.
Tsvi's immediate reaction to my "it would seem like an odd coincidence if progress stopped right around the human level" was to point out AI's heavy reliance on data; the data we have is generally generated by humans (with the exception of data created by algorithms, such as chess AI and so on). As such, it makes a lot of sense that the progress indicated in my bullet-points above could grind to a halt at performance levels within the human range.
I think this is a good and important point. I think it invalidates Argument 1, at least as written.
Why continued progress seems probable to me anyway:
As I said near the beginning, a major point in my short-timeline intuitions is that OpenAI and others have shown the ability to pivot from "pure scaling" to more substantive training innovations. We saw the first such pivot with ChatGPT (aka GPT3.5) in November 2022; the paradigm shifted from pure generative pre-training ("GPT") to GPT + chat training (mainly, adding RLHF after the GPT training). Then, in September 2024, we saw the second such pivot with the rise of "reasoning models" via a type of training now called RL with Verifiable Feedback (RLVF).
GPT alone is clearly bottlenecked by the quality of the training data. Since it is mainly trained on human-generated data, human-level performance is a clear ceiling for this method. (Or, more accurately: its ceiling is (at best) whatever humans can generate a lot of data for, by any means.)
RLHF lifts this ceiling by training a reward model which can distinguish better and worse outputs. The new ceiling might (at best) be the human ability to discern better and worse answers. In practice, it'll be worse than this, since the reward model will only partially learn to mimic human quality-discernment (and since we still need a lot of data to train the reward model, OpenAI and others have to cut corners on data quality; in practice, the human feedback is often generated quickly and under circumstances which are not ideal for knowledge curation).
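To make the reward-model idea concrete, here is a minimal sketch of the pairwise preference objective commonly used to train reward models from human comparisons (a Bradley-Terry-style loss). The tiny scoring network and the random "features" standing in for response embeddings are my own placeholders; real reward models are typically initialized from the LLM itself.

```python
# Minimal sketch of pairwise reward-model training (illustrative placeholders).
import torch
import torch.nn as nn

# Placeholder scorer; a real reward model would be an LLM with a scalar head.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(chosen_features, rejected_features):
    # Score both responses; push the human-preferred one to score higher.
    r_chosen = reward_model(chosen_features)
    r_rejected = reward_model(rejected_features)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Fake "features" standing in for embeddings of preferred vs. rejected responses.
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

optimizer.zero_grad()
loss = preference_loss(chosen, rejected)
loss.backward()
optimizer.step()
print(float(loss))
```

The point of the sketch is just that the ceiling moves from "what humans can write" to "what the learned scorer can reliably distinguish as better or worse."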
RLVF lifts this ceiling further by leveraging artificially-generated data. Roughly: there are a lot of tasks for which we can grade answers precisely, rather than relying on human judgement. For these tasks, we can let models try to answer with long chain-of-thought reasoning (rather than asking them to answer right away). We can then keep only the samples of chain-of-thought reasoning which perform well on the given tasks, and fine-tune the model to get it to reason like that in general. This focuses the model on ways of reasoning which work well empirically. Although this only directly trains the model to perform well on these well-defined tasks, we can rely on some amount of generalization; the resulting models perform better on many tasks. (This is not too surprising, since asking models to "reason step-by-step" rather than answering right away was already known to increase performance on many tasks. RLVF boosts this effect by steering the step-by-step reasoning towards reasoning steps which actually work well in practice.)
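As a heavily simplified illustration of the filter-and-fine-tune version of this idea (my own toy example, not any lab's actual pipeline, and real systems typically also use RL-style policy updates rather than pure filtering): sample several reasoning attempts per task, keep only those whose final answer passes an exact check, and collect the survivors as fine-tuning data.

```python
# Toy sketch of verifiable-feedback filtering: sample chains of thought,
# keep only the ones whose final answer checks out exactly.
import random

def sample_cot(problem):
    """Stand-in for sampling a chain of thought + final answer from an LLM."""
    a, b = problem
    # A real model would produce reasoning text; this stub just sometimes errs.
    answer = a + b if random.random() < 0.7 else a + b + random.choice([-1, 1])
    reasoning = f"To add {a} and {b}, I combine them to get {answer}."
    return reasoning, answer

def verify(problem, answer):
    """The 'verifiable' part: an exact check, no human judgement needed."""
    a, b = problem
    return answer == a + b

problems = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(50)]
finetune_set = []
for problem in problems:
    for _ in range(8):                      # several attempts per task
        reasoning, answer = sample_cot(problem)
        if verify(problem, answer):         # keep only verified-correct traces
            finetune_set.append({"problem": problem, "cot": reasoning})
            break

print(f"kept {len(finetune_set)} verified reasoning traces for fine-tuning")
```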
So, as I said, that's two big pivots in LLM technology in the past four years. What might we expect in the next four years?
The Deductive Closure:
During the live debate Tsvi linked to, TJ (an attendee of the event) referred to the modern LLM paradigm as providing a way to take the deductive closure of human knowledge: LLMs can memorize all of existing human knowledge, and can leverage chain-of-thought reasoning to combine that knowledge iteratively, drawing new conclusions. RLVF might hit limits here, but more innovative techniques might push past those limits to achieve something like the "deductive closure of human knowledge": all conclusions which can be inferred by some combination of existing knowledge.
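As a toy picture of what a "deductive closure" is as a fixed point (not a claim about how LLMs would actually compute it), here is the classic forward-chaining loop: keep applying inference rules to the current stock of facts until nothing new can be derived. The facts and rules are made-up stand-ins for human knowledge.

```python
# Toy "deductive closure": apply rules to known facts until a fixed point.
facts = {"socrates_is_human", "humans_are_mortal"}
rules = [
    ({"socrates_is_human", "humans_are_mortal"}, "socrates_is_mortal"),
    ({"socrates_is_mortal"}, "socrates_will_die"),
]

changed = True
while changed:                      # iterate until nothing new can be derived
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # a new conclusion deduced from old ones
            changed = True

print(facts)  # the deductive closure of the starting facts under these rules
```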
What might this deductive closure look like? Certainly it would surpass the question-answering ability of all human experts, at least when it comes to expertise-centric questions which do not involve the kind of "creativity" which Tsvi ascribes to humans. Arguably this would be quite dangerous already.
The Inductive Closure:
Another point which came up in the live debate was the connect-the-dots paper by Johannes Treutlein et al, which shows that LLMs can generate new explicit knowledge which is not present in the training data, but which can be inductively inferred from existing data-points. For example, when trained only on the input-output behavior of some unspecified Python function f, LLMs can sometimes generate the Python code for f.
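To illustrate the kind of setup involved (this is my own reconstruction for illustration, not the paper's actual code or data): the fine-tuning data contains only input-output pairs of an unnamed function, never its source code, and the test is whether the model can afterwards state the function explicitly.

```python
# Toy reconstruction of a connect-the-dots-style dataset: only I/O pairs
# of a hidden function appear in the training data, never its definition.
def hidden_f(x):          # the "unspecified Python function f"
    return 3 * x + 2      # never shown to the model in source form

training_examples = [
    {"prompt": f"f({x}) = ?", "completion": str(hidden_f(x))}
    for x in range(-20, 21)
]

print(training_examples[:3])
# After fine-tuning on examples like these, one asks the model something like
# "Write Python code for f" and checks whether it can produce 3 * x + 2.
```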
The connect-the-dots result suggests an even higher ceiling than the deductive closure, which we might call the "inductive closure" of human knowledge; IE, rather than starting with just human knowledge and then deducing everything which follows from it, I think it is also reasonable to imagine a near-term LLM paradigm which takes the deductive closure and adds everything that can be surmised by induction (then takes the deductive closure of that, then induces from those further datapoints, etc).
Again, this provides further motivation for thinking that realistic innovations in training techniques could shoot past the human-performance maximum which would have been a ceiling for GPT, or the human-discernment maximum which would have been a ceiling for RLHF.
Fundamental Limits of LLMs?
I feel this reply would be quite incomplete without addressing Tsvi's argument that the things LLMs can do fundamentally fall short of specific crucial aspects of human intelligence.
As Tsvi indicated, I agree with many of his remarks about the shortcomings of LLMs.
Example 1:
- I can have a nice long discussion about category theory in which I treat an LLM like an interactive textbook. I can learn a lot, and although I double-check everything the LLM says (because I know that LLMs are prone to confabulate a lot), I find no flaw in its explanations.
- However, as soon as I ask it to apply its knowledge in a somewhat novel way, the illusion of mathematical expertise falls apart. When the question I ask isn't quite like the examples you'll find in a textbook, the LLM makes basic mistakes.
Example 2:
- Perhaps relatedly (or perhaps not), when I ask an LLM to try and prove a novel theorem, the LLM will typically come up with a proof which at first looks plausible, but upon closer examination, contains a step with a basic logical error, usually amounting to assuming what was to be proven. My experience is that these errors don't go away when the model increments version numbers; instead, they just get harder to spot!
This calls into question whether anything similar to current LLMs can reach the "deductive closure" ceiling. Notably, Example 1 and Example 2 sound a lot like the capabilities of students who have memorized everything in the textbooks but who haven't actually done any of the exercises. Such students will seem incredibly knowledgeable until you push them to apply the knowledge to new cases.
My intuition is that Example 2 is mainly an alignment problem: modern LLMs are trained with a huge bias towards doing what humans ask (EG answering the question as stated), rather than admitting that they have uncertainty or don't know how to do it, or making other conversational moves which are crucial for research-style conversations but which aren't incentivized by the training. The bias towards satisfying the user request swamps the learned patterns of valid proofs, so that the LLM becomes a "clever arguer" rather than sticking to valid proof steps (even though it has a good understanding of "valid proof step" across many areas of mathematics).
Example 1 might be a related problem: perhaps LLMs try to answer too quickly, rather than reasoning things out step-by-step, due to strong priors about what knowledgeable people answering questions should look like. On this hypothesis, Example 1 type failures would probably be resolved by the same sorts of intellectual-honesty training which could resolve Example 2 type failures.
I should note that I haven't tried the sort of category-theoretic discussion from Example 1 with reasoning LLMs. It seems possible that reasoning LLMs are significantly better at applying the patterns of mathematical reasoning correctly to not-quite-textbook examples (this is exactly the sort of thing they're supposed to be good at!). However, I am a little pessimistic about this, because in my experience, problems like Example 2 persist in reasoning models. This seems to be due to an alignment problem; reasoning models have a serious lying problem.
We should also consider the hypothesis that Example 1 and Example 2 derive from a more fundamental issue in the generalization ability of LLMs: basically, they are capable of "interpolation" (they can do things that are very similar to what they've seen in textbooks) but are very bad at "extrapolation" (applying these ideas to new cases).
The Whack-A-Mole Argument
During the live debate, Mateusz (an attendee of the event) made the following argument:
- There's a common pattern in AI doom debates where the doomer makes a specific risk argument, the techno-optimist comes up with a way of addressing that problem, the doomer describes a second risk argument, the optimist comes up with a way of handling that problem, etc. After this goes back-and-forth for a bit, the doomer calls on the optimist to generalize:
- "I can keep naming potential problems, and you can keep naming ways to avoid that specific problem, but even if you're optimistic about all of your solutions not only panning out research-wise, but also being implemented in frontier models, you should expect to be killed by yet another problem which no one has thought of yet. You're essentially playing a game of whack-a-mole where the first mole which you don't wack in time is game over. This is why we need a systematic solution to the AI safety problem, which addresses all potential problems in advance, rather than simply patching problems as we see them."
- Mateusz compares this to my debate with Tsvi. Tsvi can point out a specific shortcoming of LLMs, and I can suggest a plausible way of getting around that shortcoming -- but at some point I should generalize, and expect LLMs to have shortcomings which haven't been articulated yet. This is why "expecting strong superintelligence soon" needs to come with a systematic understanding of intelligence which addresses all potential shortcomings in advance, rather than playing whack-a-mole with potential obstacles.
I'm not sure how well this reflects Tsvi's position. Maybe Tsvi is pointing to one big shortcoming of LLMs (something like "creativity" or "originariness") rather than naming one specific shortcoming after another. Nonetheless, Mateusz' position seems like a plausible objection: maybe human intelligence relies on a lot of specific stuff, and the long-timelines intuition can be defended by arguing that it will take humans a long time to figure out all that stuff. As Tsvi said above:
2. Evolution got a bunch of algorithmic ideas by running a very rich search (along many algorithmic dimensions, across lots of serial time, in a big beam search / genetic search) with a very rich feedback signal ("how well does this architecture do at setting up the matrix out of which a strong mind grows given many serial seconds of sense data / muscle output / internal play of ideas").
3. We humans do not have many such ideas, and the ones we have aren't that impressive.
My reply is twofold.
First, I don't buy Mateusz' conclusion from the whack-a-mole analogy. AI safety is hard because, once AIs are superintelligent, the first problem you don't catch can kill you. AI capability research is relatively easy because when you fail, you can try again. If AI safety is like a game of whack-a-mole where you lose the first time you miss, AI capabilities research is like whack-a-mole with infinite retries. My argument does not need to involve AI capability researchers coming up with a fully general solution to all the problems (unlike safety). Instead, AI capability researchers can just keep playing whack-a-mole till the end.
Second, as I said near the beginning, I don't need to argue that humans can solve all the problems via whack-a-mole. Instead, I only need to argue that key capabilities required for an intelligence explosion can continue to advance at rapid pace. It is possible that LLMs will continue to have basic limitations compared to humans, but will nonetheless be capable enough to "take the wheel" (perhaps "take the mallet") with respect to the whack-a-mole game, accelerating progress greatly.
Generalization, Size, & Training
What if it isn't a game of whack-a-mole at all; instead, there's one big failure in LLMs which reflects a fundamental difference between LLMs and human intelligence? The whack-a-mole picture suggests that there are lots of individual shortcomings, but each one can be addressed within the current paradigm (IE, we can keep whacking moles). What if, instead, there's at least one fundamental difference that requires really new ideas? Something fundamentally beyond the Deep Learning paradigm?
4. The observed performance of current Architectures doesn't provide very strong evidence that they have the makings of a strong mind. E.g.:
a. poor performance on truly novel / creative tasks,
b. poor sample complexity,
c. huge mismatch on novel tasks compared to "what could a human do, if that human could also do all the performance that the gippity actually can do"--i.e. a very very different generalization profile compared to humans.
I agree with Tsvi on the following:
- Current LLMs show poor performance on novel/creative tasks.
- Current LLMs are very data-hungry in comparison to humans; they require a lot more data to learn the same thing.
- If a human knew all the things that current LLMs knew, that human would also be able to do a lot of things that current LLMs cannot do. They would not merely be a noted expert in lots of fields at once; they would have a sort of synthesis capability (something like the "deductive closure" and "inductive closure" ideas mentioned earlier).
If these properties of Deep Learning continue to hold into the future, that suggests longer timelines.
Unfortunately, I don't think these properties are so fundamental.
- First and foremost, I updated away from this view when I read about the BabyLM Challenge. The purpose of this challenge is to learn language from amounts of data comparable to what humans learn from, rather than the massive quantities of data which ChatGPT, Claude, Gemini, Grok, etc are trained on. This has been broadly successful: with some architectural tweaks and more training epochs over the given data, Transformer-based models can achieve GPT2 levels of competence on human-scale training data. (Some rough arithmetic on the size of this data gap appears after this list.)
- Thus, as frontier capability labs hit a data bottleneck, they might implement strategies similar to those seen in the BabyLM challenge to overcome that bottleneck. The resulting gains in data efficiency and generalization might eliminate the sorts of limitations we are currently seeing.
- Second, larger models are generally more data-efficient. This observation opens up the possibility that the fundamental limitations of LLMs mentioned by Tsvi are primarily due to size. Think of a modern LLM as a parrot trained on the whole internet. (I am not claiming that modern LLM sizes are exactly parrot-like; the point here is just that parrots have smaller brains than humans.) It makes sense that the parrot might be great at textbook-like examples but struggle to generalize. Thus, the limitations of LLMs might disappear as models continue to grow in size.
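To give a rough sense of the data gap being discussed, here is some back-of-the-envelope arithmetic; the specific numbers are my own ballpark assumptions (roughly the BabyLM "strict" track budget and a round figure for frontier pretraining), not official figures.

```python
# Back-of-the-envelope comparison of data regimes (ballpark assumptions only).
babylm_words = 100e6          # BabyLM "strict" track: roughly 100M words
epochs = 10                   # reuse the same data many times
tokens_per_word = 1.3         # rough conversion factor
babylm_tokens_seen = babylm_words * tokens_per_word * epochs

frontier_tokens = 10e12       # frontier pretraining: on the order of 10T tokens

print(f"BabyLM-style training: ~{babylm_tokens_seen:.1e} tokens seen")
print(f"Frontier-style training: ~{frontier_tokens:.1e} tokens seen")
print(f"ratio: roughly {frontier_tokens / babylm_tokens_seen:,.0f}x")
```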
Creativity & Originariness
The ML-centric frame of "generalization" could be accused of being overly broad. Failure to generalize is actually a huge grab-bag of specific learning failures when you squint at it. Tsvi does some work to point at a more specific sort of failure, which he sometimes calls "creativity" but here calls "originariness".
To clarify a little bit: there's two ways to get an idea.
- Originarily. If you have an idea in an originary way, you're the origin of (your apprehension of) the idea. The origin of something is something like "from whence it rises / stirs" (apparently not cognate with "-gen").
- Non-originarily. For example, you copied the idea.
Originariness is not the same as novelty. Novel implies originary, but an originary idea could be "independently reinvented".
Human children do most of their learning originarily. They do not mainly copy the concept of a chair. Rather, they learn to think of chairs largely independently--originarily--and then they learn to hook up that concept with the word "chair". (This is not to say that words don't play an important role in thinking, including in terms of transmission--they do--but still.)
Gippities and diffusers don't do that.
Tsvi anticipates my main two replies to this:
The hidden assumption I'm not sure how to state exactly, or maybe it varies from person to person (which is part of why I try to elicit this by asking questions). The assumption might be like
The tasks that Architectures have had success on have expanded to include performance that's relatively more and more novel and creative; this trend will continue.
Or it could be
Current Architectures are not very creative, but they don't need to be in order to make human AGI researchers get to creative AI in the next couple years.
In my own words:
- Current LLMs are a little bit creative, rather than zero creative. I think this is somewhat demonstrated by the connect-the-dots paper. Current LLMs mostly learn about chairs by copying from humans, rather than inventing the concept independently and then later learning the word for it, like human infants. However, they are somewhat able to learn new concepts inductively. They are not completely lacking this capability. This ability seems liable to improve over time, mostly as a simple consequence of the models getting larger, and also as a consequence of focused effort to improve capabilities.
- An intelligence explosion within the next five years does not centrally require this type of creativity. Frontier labs are focusing on programming capabilities and agency, in part because this is what they need to continue to automate more and more of what current ML researchers do. As they automate more of this type of work, they'll get better feedback loops wrt what capabilities are needed. If you automate all the 'hard work' parts of the research, ML engineers will be freed up to think more creatively themselves, which will lead to faster iteration over paradigms -- the next paradigm shifts of comparable size to RLHF or RLVF will come at an increasing pace.
If it's something like
The tasks that Architectures have had success on have expanded to include performance that's relatively more and more novel and creative; this trend will continue.
then we have an even more annoying enthymeme. WHAT JUSTIFIES THIS INDUCTION??
To sum up my argument thus far, what justifies the induction is the following:
- The abstract ceiling of "deductive closure" seems like a high ceiling, which already seems pretty dangerous in itself. This is a ceiling which current LLMs cannot hit, but which abstractly seems quite possible to hit.
- While current models often fail to generalize in seemingly simple ways, this seems like it might be an alignment issue (IE possible to solve with better ideas of how to train LLMs), or a model size issue (possible to solve by continuing to scale up), or a more basic training issue (possible to solve with techniques similar to what was employed in the BabyLM challenge), or some combination of those things.
- If these failures are more whack-a-mole like, it seems possible to solve them by continuing to play the currently-popular game of trying to train LLMs to perform well on benchmarks. (People will continue to make benchmarks like ARC-AGI which demonstrate the shortcomings of current LLMs.)
- I somewhat doubt that these issues are more fundamental to the overall Deep Learning paradigm, due to the BabyLM results and to a lesser extent because generalization ability is tied to model size, which continues to increase.