I am worried about near-term non-LLM AI developments

by testingthewaters
31st Jul 2025
6 min read
56 comments, sorted by top scoring
[-]Trent Hodgeson2mo6144

To the degree that worries of this general shape are legitimate (we think they very much are), it seems like it would be wise for the alignment community to more seriously pursue and evaluate the many neglected approaches that might solve the fundamental underlying alignment problem, rather than investing the vast majority of resources in things like evals and demos of misalignment failure modes in current LLMs, which are certainly nice to have, but which almost certainly won't themselves directly yield scalable solutions to robustly aligning AGI/ASI.

Reply1
[-]JasonBrown2mo412

I made a manifold post for this for those who wish to bet on it: https://manifold.markets/JasonBrown/will-a-gpt4-level-efficient-hrm-bas?r=SmFzb25Ccm93bg

Reply221
[-]otto.barten1mo*41

It's currently at 12%, which seems really xrisk-relevant to me. I think xrisk could increase quite a bit if such a model were much more lightweight (2+ OOMs) than LLMs.

Also, I think this would still be very relevant if the timeline were longer. Maybe a new post for, e.g., five years?

Reply
[-]Yair Halberstadt1mo32

Currently at 9%.

That's the level where Manifold can't really tell you the exact probability, because betting "no" ties up a lot of capital for minimal upside.

Also, by 2026 I'd expect to have GPT-4-level LLMs with 1/10th the parameter count just due to algorithmic improvements (maybe I'm wildly wrong here), so doing the same with a different architecture isn't necessarily as indicative as it seems.

Reply
[-]otto.barten1mo10

Doing the same with a different architecture could open up the possibility of doing better down the road. I'd be equally interested in how fast it gets better as in how good it is. It would also raise the question: if two architectures can do this, how many more? Do they all max out at the same point or not at all? I think it could be quite important. I'd be curious how big experts think the probability is that a different architecture could do LLM-level thinking on a reasonably wide range of tasks in, say, five or ten years.

Reply
[-]JasonBrown1mo50

I've ended up making another post somewhat to this effect, trying to predict any significant architectural shifts over the next year and a half: https://manifold.markets/Jasonb/significant-advancement-in-frontier

Reply
[-]ACCount2mo2717

There is an awful lot of "promising new architectures" being thrown around. Few have demonstrated any notable results whatsoever. Fewer still have demonstrated an ability to compete with transformer LLMs on the kinds of tasks transformer LLMs are well suited for.

It's basically just Mamba SSMs and diffusion models, and they aren't "better LLMs". They seem like sidegrades to transformer LLMs at best.

HRMs, for example, seem to do incredibly, suspiciously well on certain kinds of puzzles, but I have yet to see them do anything in the language domain, or in math, coding, etc. Are HRMs generalists, like transformers? No evidence of that yet.

Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026) with ~75% probability:

Basically, off the top of my head: I'd put 10% on that. Too short of a timeframe.

Reply
[-]p.b.2mo142

SSMs are really quite similar to transformers. As with all the "sub-quadratic" transformer variants, the expectation is at best that they will do the same thing as transformers, but more efficiently.

HRMs, continuous thought machines, or KANs, on the other hand, contain new and different ideas that make a discontinuous jump in abilities at least conceivable. So I think one should distinguish between those two types of "promising new architectures".

My view is that these new ideas accumulate, and at some point somebody will be able to put them together in a new way to build actual AGI.

But the authors of these papers are not stupid. If there were straightforward applicability to language modelling, they would already have pursued it. If there were line of sight to GPT-4-level abilities in six months, they probably wouldn't publish the paper.

Reply2
[-]tailcalled2mo30

KANs seem obviously of limited utility to me...?

Reply
[-]p.b.2mo20

I think it is a cool idea and has its application but you are right that it seems very unlikely to contribute to AGI in any way. But there was nonetheless excitement about integrating KANs into transformers which was easy to do but just didn't improve anything. 

Reply
[-]ryan_b1mo20

Ah, but is it a point-in-time sidegrade with a faster capability curve in the future? At the scale we are working now, even a marginal efficiency improvement threatens to considerably accelerate at least the conventional concerns (power concentration, job loss, etc).

Reply
[-]Fejfo1mo10

It's my impression that a lot of the "promising new architectures" are indeed promising. IMO a lot of them could compete with transformers if you invest in them. It just isn't worth the risk while the transformer gold-mine is still open. Why do you disagree?

Reply
[-]ACCount1mo10

I disagree because I'm yet to see any of those "promising new architectures" outperform even something like GPT-2 345M, weight for weight, at similar tasks. Or show similar performance with a radical reduction in dataset size. Or anything of the sort.

I don't doubt that a better architecture than LLM is possible. But if we're talking AGI, then we need an actual general architecture. Not a benchmark-specific AI that destroys a specific benchmark, but a more general purpose AI that happens to do reasonably well at a variety of benchmarks it wasn't purposefully trained for.

We aren't exactly swimming in that kind of thing.

Reply
[-]Cole Wyeth2mo2110

I think you haven't given sufficient reason to expect your predictions to come true.

Most things don't scale. Why should we expect HRMs to be an exception?

Reply2
[-]testingthewaters2mo11-12

Mostly because the efficiencies being claimed here are truly staggering. HRM claims SOTA performance on ARC-AGI problem sets with 27 million parameters and 1000 training examples (with no general pretraining!). o3-mini-high, the next model on the board, has a parameter count on the order of hundreds of billions and was trained on the entire internet. That is a ~3700x improvement in parameter-count efficiency and probably more than a 10000x improvement in data efficiency. At this scale, even if the scaling advantage peters out at 1000x, you are talking about an entirely new paradigm, especially now that there is a partial replication as well.

Reply
[-]Cole Wyeth2mo167

Well, o3 wasn’t optimized to perform well on ARC-AGI as its primary purpose - is this a fair comparison? 

Reply
[-]testingthewaters2mo32

I've also been tracking the development of these kinds of techniques for a year, and they have consistently been showing surprising improvements. The test time training people have been publishing for a while, and the ceiling seems nowhere in sight.

To be honest a complete recounting of why I believe this is somewhat beyond a comment's length. I also feel very strongly that progress is accelerating, and that if I don't say something now there will not be much time to react. Hence the post.

Reply11
[-]Cole Wyeth2mo1612

Perhaps it's beyond the length of a comment, but why not recount it in the post?

Reply
[-]testingthewaters2mo103

I think that the most legible arguments are already in the post. The only thing I would make clearer is that the human brain is an existence proof that such highly efficient continuous learning algorithms exist, and therefore I see the development of these models as not particularly surprising. Steven Byrnes has some similar intuitions in his Foom and Doom series.

Reply21
[-]Cole Wyeth2mo93

I read (some of) that series and disagreed with his assessment on the same grounds. 

Reply
[-]testingthewaters2mo52

In which case, the best thing is probably for us to wait and see if the predictions come to pass, if that's okay with you. I might also be afk for a bit so might not be able to immediately reply to any further comments.

Reply
[-]Random Developer2mo70

The other thing that seems strange here is that the parameter counts are far, far below those of the human brain. I mean, yeah, o3 is probably 1 or 2 OOMs below the brain, but that could be explained away by the fact that o3 is still missing some very basic human abilities, and perhaps it has found interesting efficiencies.

But 27 million parameters is just so many OOMs below the human brain that I'm expecting a gotcha. Even at 16-bit quantization, that's only about 54MB. Which means that the model's world knowledge must be ridiculously limited compared to even simple LLMs. Maybe if it's a specialist model that's tuned for just a handful of very specific benchmarks?

Reply
[-]Steven Byrnes2mo4012

“An HRM model trained to do ARC-AGI and nothing else” seems vaguely analogous to “An AlphaZero model trained to play Go and nothing else”. Right?

If you buy that, then I would note that HRM & AlphaZero probably have quite similar numbers of parameters (27M vs maybe 23M, see footnote here). And AlphaZero was just a plain old ResNet, I think.

So I don’t see “HRM solves ARC-AGI with 27M parameters” as evidence for HRM having unusual parameter efficiency. Right? Sorry if I’m misunderstanding.

Reply
[-]RussellThor2mo20

It is weak evidence; we simply won't know until we scale it up. If it is automatically good at 3D spatial understanding with extra scale, then that starts to become evidence that it has better scaling properties. (To me it is clear that LLMs/transformers won't scale to AGI: xAI has already close to maxed out scaling, and Tesla Autopilot probably does everything mostly right but is far less data-efficient than people.)

Reply
[-]ACCount2mo125

ARC-AGI, v1 and v2 both, is a very spatial-reasoning-shaped problem. And LLMs are not very spatial-reasoning-shaped.

It could be that this arch is an unusually good fit for spatial reasoning problems, and a poor fit for others. 

We haven't seen it used for either text generation or image generation, both of which are hot topics in AI right now. Which is very weak evidence that it's unsuitable for that kind of task. And much stronger evidence that the authors couldn't get it to work on this kind of task.

Reply
[-]Kaj_Sotala2mo71

I wonder how much we need to worry about hybrid architectures. If LLMs do text generation well and continuous learning models do spatial reasoning well, and someone figures out an architecture that lets their strengths synergize with each other...

Reply
[-]testingthewaters2mo60

That is the basic idea behind Energy based transformers and test time training!

Reply1
[-]RussellThor2mo73

OK, our intelligence is very spatial-reasoning-shaped. Biological architecture can't do language until it has many params. If it is terrible at text or image gen, that isn't evidence that it won't in fact scale to AGI and best transformers with more compute. We simply won't know until it is scaled up.

Reply
[-]james oofou2mo95

Doesn't the human brain's structure provide something closer to an upper bound rather than a lower bound on the number of parameters required for higher reasoning? 

Higher reasoning evolved in humans over a short period of time. And it is speculated that it was mostly arrived at simply by scaling up chimp brains.

This implies that our brains are very far from optimised for higher reasoning, so we should expect that to whatever extent factors other than scale can contribute to higher-reasoning ability, it is possible for brains smaller than our own to engage in higher reasoning. 

The human brain should be seen as evidence that a certain scale is ~sufficient, but not that it is necessary. 

Reply
[-]Random Developer1mo116

The human brain should be seen as evidence that a certain scale is ~sufficient, but not that it is necessary.

The human brain is often estimated to have 10^14 synapses, which would be a 100T model, give or take. Except that individual neurons also have a bunch of internal parameters, which might complicate things.

If you told me that the human brain was massively inefficient, and that you had squeezed human level AGI into 1T parameters, I would be only mildly surprised.

For that matter, if you told me you had squeezed a weak AGI into 30B parameters, I'd be interested in the claim. Qwen3 really is surprisingly capable in that size range. If you told me 4B, I'd be very skeptical, but then again, Gemma 3n does implausibly well on my diverse private benchmarks, and it's technically multi-modal. At the very least, I'd accept it as the premise of a science fiction horror story about tiny, unaligned AIs.

But if we drop all the way to 30 million parameters, I am profoundly suspicious of any kind of general model with language skills and reasonable world knowledge. Even if you store language and world knowledge as compressed text files, you're going to hit some pretty hard limits at that size. That's a 60MB ZIP file or less. You'd be talking about needing only about 1/3,000,000th of the parameters of the brain. Which is a lot of orders of magnitude.
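In round numbers (assuming 16-bit weights and the ~10^14 synapse estimate above, which are my assumptions, not figures from the paper):

```latex
27 \times 10^{6}\ \text{params} \times 2\ \text{bytes/param} \approx 54\ \text{MB},
\qquad
\frac{10^{14}\ \text{synapses}}{27 \times 10^{6}\ \text{params}} \approx 3.7 \times 10^{6}
```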

At that size, I'm assuming that any kind of genuinely interesting model would be something like AlphaGo, which demonstrates impressive knowledge and learning abilities in a very narrow domain. Which is fine! It might even be the final warning that AGI is inevitable. But I would still expect more than 6 months to be required to scale from such a tiny model up to something with general world knowledge, language, and common sense.

Reply
[-]Vladimir_Nesov2mo31

Higher reasoning evolved in humans over a short period of time. ... The human brain should be seen as evidence that a certain scale is ~sufficient, but not that it is necessary.

We can still see that a chimp-scale brain with this architecture isn't sufficient, and human-built AI architectures were also only developed over a short period of time. Backprop and large-scale training in parallel for one individual might give AIs an advantage that chimp/human brains don't have, but it's unclear whether this overcomes the widely applicable unhobbling from the much longer efforts by evolution to build minds for efficient online-learning robots.

Reply
[-]testingthewaters2mo30

To be clear, I don't think that HRM with 27 million params will be natural language capable. However, if my assumptions are correct, a scaled up version of HRM should be able to attain performance similar to frontier models while learning online and being relatively much smaller in size (1-2 OOMs smaller, based on a rough hunch).

Reply1
[-]alexlyzhov1mo170

I was very impressed with the ARC-AGI results so I read the entire paper and also browsed the code a fair amount.

Only after browsing the code did I realize that they likely train on all evaluation tasks in addition to training tasks—correct me if I'm wrong. During inference they only condition on x* and on the task embedding to predict y*, instead of on (x,y)_{1..3}. The only way they could get that task embedding is by training it. Evaluation tasks are harder than those in the train set.

They score a lot higher than the baseline transformer so clearly there's a lot of merit in what they're doing. But in the setting of training on evaluation tasks you can train on only 1 target ARC-AGI-1 task instead of 1000 tasks and still get 20% accuracy: https://iliao2345.github.io/blog_posts/arc_agi_without_pretraining/arc_agi_without_pretraining.html. Given this, it doesn't look earth-shattering.

Reply
[-]Kaj_Sotala1mo*80

It does sound like it; someone on Twitter pointed out this passage from page 12 of the HRM paper (emphasis added):

For ARC-AGI challenge, we start with all input-output example pairs in the training and the evaluation sets. The dataset is augmented by applying translations, rotations, flips, and color permutations to the puzzles. Each task example is prepended with a learnable special token that represents the puzzle it belongs to. At test time, we proceed as follows for each test input in the evaluation set: (1) Generate and solve 1000 augmented variants and, for each, apply the inverse-augmentation transform to obtain a prediction. (2) Choose the two most popular predictions as the final outputs. All results are reported on the evaluation set.

Which sounds like they used the initial training and evaluation set for training, then generated a new set of "augmented variants" of the original puzzles for the actual evaluation. So when the model is evaluated, it knows that "this is a variant of puzzle X which I've seen before".

EDIT: Though, also per somebody on Twitter, it sounds like this might just be the ARC-AGI materials being confusingly named:

No foul play here. The ARC-AGI dataset is confusing because it has 2 different kinds of "train" sets. There are train set puzzles and train input-output examples within each puzzle. They trained on only the train examples of the train and validation puzzle sets, which is fair. [...]

It's "all input-output example pairs" not "all input-output pairs". The examples are the training data in ARC-AGI. 

Reply
[-]alexlyzhov1mo73

They trained on only the train examples of the train and validation puzzle sets, which is fair.

Yes, I agree—I wasn't implying that's foul play. I just think it's less impressive than I originally thought because:

  • It's finetuning on task examples and not in-context few-shot learning
  • They finetune on the harder evaluation set and not only on the easier train set, so they don't demonstrate generalization across the easy->hard distribution shift
  • The result I linked to was 20% on ARC-AGI-1 by only fitting examples for 1 evaluation task using an MLP-type network vs the 40% result in the paper using 1000 tasks. These numbers are not directly comparable because they did a fair bit of custom architectural engineering to reach 20%, but it really put 40% in perspective for me.
Reply
[-]Erik Jenner1mo120

There are literal interpretations of these predictions that aren't very strong:

  1. I expect a new model to be released, one which does not rely on adapting pretrained transformers or distilling a larger pretrained model
  2. It will be inspired by the line of research I have outlined above, or a direct continuation of one of the listed architectures
  3. It will have language capabilities equal to or surpassing GPT-4
  4. It will have a smaller parameter count (by 1-2+ OOMs) compared to GPT-4

GPT-4 was rumored to have 1.8T parameters, so <180B parameters would technically satisfy 4. My impression is that current ~70B open-weight models (e.g. Qwen 2.5) are already roughly as good as the original GPT-4 was. (Of course that's not a fair comparison since the 1.8T parameter rumor is for an MoE model.)

So the load-bearing part is arguably "inspired by [this] line of research," but I'm not sure what would or wouldn't count for that. E.g. a broad interpretation could argue that any test-time training / continual learning approach would count, even if most of the capabilities still come from pretraining similar to current approaches. (Still a non-trivial prediction to be clear!)

My impression was that you're intending to make stronger claims than this broad interpretation. If so, you could consider picking slightly different concretizations to make the predictions more impressive if you end up being right. For example, I'd consider 2 OOMs fewer parameters than GPT-4 noticeably more impressive than 1 OOM (and my guess would be that the divergence between your view and the broader community would be even larger at 3 OOMs fewer parameters). Might be even better to tie it to compute and/or training data instead of parameters. You could also try to make the "inspired by this research" claim more concrete (e.g. "<10% of training compute before model release is spent on training on offline/IID data", if you believe some claim of that form).

Reply
[-]Noosphere891mo115

For what it's worth, while I do think this will matter 5-10 years down the line (or more), I am currently relatively skeptical that this will be achieved soon, and my current view is that a lot of these papers will have gotchas that make them less useful than they seem.

The HRM paper has a gotcha pretty much immediately:

https://www.lesswrong.com/posts/tEZa7PouYatK78bbb/?commentId=ELTcESCdWjikCq3HT

That said, worth keeping an eye out here.

I'd put my probability on a new architecture surpassing transformers within 2 years at much closer to 1%.

Reply
[-]Galen1mo112

Flagging that the HRM paper strongly reads as low-substance. After seeing this post I revisited it for a deeper read to fully understand their method, and for me this confirmed my initial impressions. I used to get very excited about every novel architecture published, and over time I think there's some amount of cognitive immunity you can build up; e.g. spending most of the paper rehashing vague "inspirations" tends to be a dark pattern employed when you want to make your use of a standard method seem more novel than it is.

I don't really have the time to dissect the paper, but a good general heuristic is to understand something well enough to, e.g., re-implement it in PyTorch before accepting its results at face value. If that is the case here and you still believe that it's a meaningful research advance, then it's probably just a difference in research taste and you should ignore this comment.

Otherwise I agree with the general take here.

Reply
[-]testingthewaters1mo40

I agree that the paper is not terribly novel. There is no new maths here or radical new technique. If you wanted to be reductive you could just call it two RNN modules stacked on top of each other and run at different clock speeds. But the fact that this simple setup is enough to trigger such outsized improvements (which have been partially replicated externally) is what is alarming to me.

Reply
[-]Galen1mo63

I agree that the results are legit; I'm just taking issue with the authors presenting them without prior-work context (e.g. setting the wrong reference class such that the improvement over baselines appears larger). RNNs getting outsized performance on maze/sudoku is to be expected, and the main ARC result seems to be more of a strong data augmentation + SGD baseline rather than something unique to the architecture; ARC-AGI-1 was pretty susceptible to this (e.g. ARC-AGI Without Pretraining).

That being said, I think the fact that various RNN architectures have such different characteristics on these limit cases for transformers points to a pretty large jump in capabilities when scaling/pretraining is cracked. I think it'd be good for more people working on alignment to study what types of behaviors are exhibited in these sorts of models at small scale, with the expectation that the paradigm will eventually shift in this direction.

Reply
[-]Galen1mo50

Update: ARC has published a blog post analyzing this: https://arcprize.org/blog/hrm-analysis. As expected, swapping in a transformer works approximately the same.

Reply
[-]SorenJ1mo80

What are your priors, or what is your base rate calculation, for how often promising new ML architectures end up scaling and generalizing well to new tasks?

Reply
[-]Mikhail Samin1mo7-2

I think if you expect some architecture to be a lot more efficient than current LLMs, you should not talk about it!

LessWrong isn't a secret society with vows to not spread the knowledge we've learned here if it might cause the end of the world. It's a public website, and many who work on AI capabilities read it.

If you want the community to pay attention to the fact that other architectures might scale well- that's okay, feel free to talk about it.

If you think you have exceptionally good intuitions about what will scale, sharing this publicly eats maybe the single most valuable common resource that we have: the timeline. Please don't do that & don't accelerate AI capabilities.

Reply
[-]testingthewaters1mo111

Hey Mikhail,

I don't claim to have any secret knowledge or exceptional intuition about what will scale well. Everything I link to is already public, and already quite well known (test time training was the number 2 pinned post on r/singularity, the Hierarchical Reasoning Model is blowing up X/Twitter, ARC-AGI Without Pretraining was on the front page of Hacker News). In fact, commenters keep bringing up links that I already featured in the main post, suggesting that these developments are well known even within the LessWrong memesphere.

With that in mind, my goal was less to warn about any individual architecture than to point out the overall trend of alternative-architecture work I'm observing, and some reasons I expect the trend to continue. I want to make sure that AI safety does not over-index on LLM safety and ignore other avenues of risk. Basically exactly what you said here:

If you want the community to pay attention to the fact that other architectures might scale well- that's okay, feel free to talk about it.

Hope this makes sense.

Reply1
[-]Matrice Jacobine1mo71

A single human brain has the energy demands of a lightbulb, instead of the energy demands of all the humans in Wyoming.

This is a non sequitur. The reason AI models don't have the energy demands of a lightbulb isn't that they're too big and current algorithms are too inefficient. Quite the contrary: an actual whole-brain emulation would require the world's largest supercomputer. Current computers are just nowhere near as efficient as the human brain.

Reply
[-]kaiwilliams13d30

How does ARC-AGI's replication of the HRM result and ablations update you? [Link].

Basically, they claim that the HRM wasn't important; instead it was the training process behind it that had most of the effect.

Reply
[-]Cole Wyeth13d20

And most of those tricks seem unlikely to generalize beyond ARC.

Though the refinement process does sound a bit like the feared “neuralese.” I’m not too worried about this though - the problem with this kind of recurrence is that it doesn’t scale, and HRMs are indeed small models that lag SOTA. So, I don’t see much reason to expect it to work this time??

Reply
[-]testingthewaters13d10

Both the ARC-AGI replication and GPT-5's strong performance on agentic evals* have moved me away from expecting a very rapid rise of a new AI paradigm. I expect my concrete predictions to be less likely to come true. However, I still stand by the original point behind this post: it is not a particular model but a line of research that is of concern, and I still think the research could bear fruit in unexpected and not-priced-in ways.

* Including cyber!! The X-Bow report has really not been discussed here much; people just seem to take OpenAI's word that GPT-5 is not a big step ahead in agentic threat models.

Reply
[-]otto.barten1mo*30

Super interesting post! I'm agnostic on whether this will happen and when, but I have something to add to the "what we should do" section.

You are basically only talking there about alignment-action on the new models. I think that would be good to do, but at the same time I'm sceptical about alignment as a solution. Reasons include that I'm uncertain about the offense-defense balance in a multipolar scenario and very sceptical that the goals we set for an ASI in a unipolar scenario will be good in the medium term (>10 yrs) (even if we solve technical alignment). I don't think humanity is ready for having a god, even a steerable god. In addition, of course it could be that technical alignment does not get solved (in time), which is a more mainstream worry on LW.

Mostly for these reasons I put more trust in a regulatory approach. In this approach, we'd first need to inform the public (which is what I have worked on for the past four years) about the dangers of superintelligence (incl. human extinction), and then states would coordinate to arrive at global regulation (e.g. via our proposal, the Conditional AI Safety Treaty). By now, similar approaches are fairly mainstream in MIRI (for technical alignment reasons), EA, FLI, PauseAI, and lots of other orgs. Hardware regulation is the most common way to enforce treaties, with sub-approaches such as FlexHEGs and HEMs.

If AGI needed a lot fewer FLOPs, this would get a lot more difficult. I think it's plausible that we arrive at this situation due to a new paradigm. Some say hardware regulation is not feasible at all anymore in such a case. I think it depends on the specifics: how many FLOPs are needed, how much societal awareness do we have, which regulation is feasible?

I think that in addition to your "what we should do" list, we should also:

  • Try our best to find out how many FLOPs, how much memory, and how much money are needed for takeover-level AI (a probability distribution may be a sensible output).
  • For the most likely outcomes, figure out hardware regulation plans that would likely be able to pause/switch off development in case political support is available. (My org will work on this as well.)
  • Double down on FlexHEG/HEM hardware regulation options, while taking into account the scenario that far fewer FLOPs, less memory, and less money might be needed than previously expected.
  • Double down on increasing public awareness of xrisk.
  • Explore options beyond hardware regulation that might succeed in enforcing a pause/off switch for a longer time, while doing as little damage as possible.
Reply
[-]Gordon Seidoh Worley1mo20

I agree that this is the dangerous thing to build and we shouldn't do it until we are sure we know how to align it, but I know we'll probably just build it anyway.

Reply
[-]dr_s1mo20

Is there so much difference in terms of transferability of alignment techniques? For example, in EBT, the verification model doesn't sound too different from the architectures we have today, and in fact it's an excellent candidate to be the target of alignment and act as the "conscience" of the continuously running model.

Reply
[-]Josh Snider2mo10

https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/foom-and-doom-1-brain-in-a-box-in-a-basement (and the sequel) seem highly related.

Reply
[-]testingthewaters2mo10

Yeah, the original post quoted that post directly and it was definitely a big inspiration (see another comment I left on a different thread).

Reply
[-]Igor Ivanov2mo10

How do you think we can prepare in advance to study the safety of such systems?

Reply
[-]testingthewaters2mo20

That's kind of the overall purpose behind my project, which generally aims to study how machine learning and human learning can overlap. The idea is to see where the theoretical overlap is and see how we can leverage that combined with empirical evidence to show how easy/difficult things like lie detection or value transfer are.

Reply
[-]ZY2mo10

I do agree we should not just focus on LLMs (LLM bases or agents), but also on other architectures.

Reply

TL;DR

I believe that:

  • There exists a parallel track of AI research which has been largely ignored by the AI safety community.  This agenda aims to implement human-like online learning in ML models, and it is now close to maturity. Keywords: Hierarchical Reasoning Model, Energy-based Model, Test time training.
  • Within 6 months this line of research will produce a small natural-language capable model that will perform at the level of a model like GPT-4, but with improved persistence and effectively no "context limit" since it is constantly learning and updating weights.
  • Further development of this research will produce models that fulfill most of the criteria we associate with "AGI".

Overview

Almost all frontier models today share two major features in their training regime: they are trained offline and out of sequence. 

  • By offline, I mean that there is a distinct "pretraining" phase, followed by a separate "post-training", "character/safety training", or "fine-tuning" phase, and then finally a "deployment" phase. New iterations of the models must also go through these stages. In almost all cases, deployed models do not receive weight updates when they run and perform "inference".
  • By out of sequence, I mean that models receive random samples from the training set, instead of continuous sequences of tokens, characters, images, or audio samples. This is a necessary consequence of batched SGD, which attempts both to approximate an ideal gradient descent algorithm and to prevent problems like catastrophic forgetting. As a result, models cannot use past inputs to predict future inputs; they can only learn a general solution to the task they are being trained on. (The sketch after this list contrasts the two regimes.)
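To make the distinction concrete, here is a minimal sketch of the two regimes (a toy PyTorch example of mine; the model, data, and next-step objective are hypothetical stand-ins, not any lab's actual pipeline): offline training on shuffled i.i.d. mini-batches with weights frozen afterwards, versus online in-sequence training that updates the weights after every single observation.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                  # stand-in for any predictive model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Offline / out-of-sequence: shuffled i.i.d. mini-batches from a fixed training set,
# after which the weights are frozen at "deployment".
data = torch.randn(1024, 16)               # toy dataset (illustrative only)
targets = torch.roll(data, shifts=1, dims=0)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data, targets), batch_size=64, shuffle=True
)
for x, y in loader:
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Online / in-sequence: one observation at a time, in order, with a weight update
# after every step; learning continues even at "inference" time.
stream = torch.randn(256, 16)              # a single continuous episode
prev = stream[0]
for t in range(1, len(stream)):
    opt.zero_grad()
    loss_fn(model(prev), stream[t]).backward()   # predict the next observation
    opt.step()
    prev = stream[t]
```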

These features of training regimes make sense if you believe the classic function approximation or statistical approximation explanation of machine learning. In this story the model is meant to learn some fixed "target distribution" or "target function" by sampling i.i.d. data points from the training set. The model is then tested on a holdout "test set" which contains new input-output pairs from the same target distribution or function. If the model generalises across the train set and test set, it is considered a good model of the target.
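For reference, this is the standard empirical risk minimisation framing being described (the notation is mine, not taken from any of the papers discussed below): minimise the expected loss under a fixed target distribution, approximated by the average loss over i.i.d. training samples, with the held-out test set estimating the same quantity.

```latex
% Learn a fixed target distribution \mathcal{D} from i.i.d. samples; the expected risk
% is approximated by the empirical risk, and the held-out test set estimates the same quantity.
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f_\theta(x),\,y)\big]
\;\approx\;
\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}\ell\big(f_\theta(x_i),\,y_i\big),
\qquad (x_i,y_i)\overset{\text{i.i.d.}}{\sim}\mathcal{D}
```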

For many reasons, this story makes no sense when applied to the idea of AGI or to developing an ML model that is good at navigating the real world. Humans do not spend the first few years of their lives in a sensory deprivation tank, getting random webcam footage from different places on earth before they stop learning forever and are "deployed" into reality. Furthermore, if your plan is to learn the fixed distribution of all possible English sentences, you will naturally need a representative sample of... all possible English sentences. This is not how humans acquire language skills either, and it helps explain why current ML approaches to natural language generation are becoming prohibitively expensive.

Most of us would agree that humans learn continuously, meaning that they learn online and in sequence. Instead of seeing a wide "context" made up of randomly sampled data from all across the internet, we have a very narrow "context" focused on the here and now. To make up for this, we are able to leverage our memories of the immediate and distant past to predict the future. In effect we live in one continuous learning "episode" that lasts from the moment we are born to the moment we die. Naturally, AI researchers have tried to find ways to replicate this in ML models.

The Agenda I am Worried About

I think that the AI safety community has seriously overindexed on LLMs and ChatGPT-style model safety. This is a reasonable choice, because LLMs are a novel, truly impressive, and promising line of AI development. However, in my opinion, research into online in-sequence learning has the potential to lead to human-level AGI much more quickly. A single human brain has the energy demands of a lightbulb, instead of the energy demands of all the humans in Wyoming.

I am not alone in this belief. Research into online in-sequence learning has focused on small-model, RNN-like approaches that do not use backpropagation through time (BPTT). Instead, models must update their weights online and generalise based on only one input at a time, forcing them to learn how to leverage their memory/hidden state to predict future data points if they wish to be effective. By contrast, transformers are explicitly encouraged to memorise surface-level patterns to blindly apply to large blocks of context, instead of internalising the content and using it to predict future inputs.
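As a rough illustration of that recipe, here is a minimal sketch of an online, no-BPTT recurrent learner (PyTorch; the GRU cell, sizes, and next-step prediction loss are my own illustrative choices rather than the setup of any specific paper): the hidden state is detached at every step so gradients never flow backwards through time, and the weights are updated after every observation.

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=32, hidden_size=64)   # recurrent core
readout = nn.Linear(64, 32)                        # predicts the next observation
opt = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

h = torch.zeros(1, 64)                             # hidden state ("memory")
stream = torch.randn(500, 1, 32)                   # one continuous episode, step by step

for t in range(len(stream) - 1):
    h = cell(stream[t], h.detach())                # detach: no backprop through time
    loss = loss_fn(readout(h), stream[t + 1])      # next-step prediction from memory alone
    opt.zero_grad()
    loss.backward()
    opt.step()
```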

Some notable papers applying this research include the Hierarchical Reasoning Model, Energy-based Models, ARC-AGI without pretraining, and Test Time Training. Some of these techniques (like Test Time Training or Energy-based Models) augment existing transformer architectures, while others represent entirely novel architectures like the ARC-AGI with no pretraining model and the Hierarchical Reasoning Model. These models share the same idea of getting more information out of each data point than a single backpropagation pass can extract. For example, Test Time Training uses a neural network as its hidden state. It also has an internal update step where key information contained in any incoming data point is compressed into the weights of the hidden state network. ARC-AGI without pretraining trains a new network on each data point (a single puzzle in the ARC-AGI corpus), again aiming to get some compressed representation of the key structural information contained in that puzzle. The Hierarchical Reasoning Model and Energy-based Model iterate on their internal representations either for some fixed number of cycles or until some convergence threshold is reached. That way they can extract maximum information from each data point and give themselves more "thinking time" compared to transformers, which must output the next token immediately after one forward pass. The Hierarchical Reasoning Model also uses higher and lower level recurrent modules to separate cognition into low level execution steps and high level planning steps.
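To give a flavour of the "higher and lower level recurrent modules" idea attributed to the Hierarchical Reasoning Model above, here is a heavily simplified sketch (PyTorch; the module types, sizes, and fixed cycle counts are my own stand-ins, and this is not the paper's architecture or training scheme). The point is only the control flow: the fast state is refined several times against the input before the slow state updates once, and the whole loop repeats for a fixed number of cycles before anything is output.

```python
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    """Toy two-timescale recurrence: a slow 'planner' state updated once per cycle
    and a fast 'worker' state iterated several times within each cycle."""

    def __init__(self, dim=128, inner_steps=4, cycles=8):
        super().__init__()
        self.fast = nn.GRUCell(dim, dim)   # low-level execution steps
        self.slow = nn.GRUCell(dim, dim)   # high-level planning steps
        self.inner_steps = inner_steps
        self.cycles = cycles

    def forward(self, x):
        h_fast = torch.zeros_like(x)
        h_slow = torch.zeros_like(x)
        for _ in range(self.cycles):              # fixed number of refinement cycles
            for _ in range(self.inner_steps):     # fast module iterates on the input
                h_fast = self.fast(x + h_slow, h_fast)
            h_slow = self.slow(h_fast, h_slow)    # slow module updates once per cycle
        return h_slow                             # refined internal representation

core = TwoTimescaleCore()
out = core(torch.randn(2, 128))                   # a batch of 2 toy inputs
```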

So far, research within this track has produced strong or claimed-to-be-SOTA results for ARC-AGI-1, ARC-AGI-2 (along with Sudoku and maze-solving), and long-context video generation. These models excel at tasks that current frontier LLMs or reasoning models struggle at, or radically improve the otherwise lacklustre performance of standard LLMs. Despite differences in implementation, models in this line of research generally learn online and in linear time (i.e. seeing one data point after another, without parallelisation schemes like BPTT). They generally take less data to train, have smaller parameter counts, and have better bootstrapping performance (generalising based on a limited number of data points). Many of them also claim to be inspired by brain-like architectures or by how humans learn. I think that, of the approaches listed above, the Hierarchical Reasoning Model is the most promising candidate to come out of this line of research so far.

Concrete Predictions

I believe that within 6 months this line of research will produce a small natural-language capable model that will perform at the level of a model like GPT-4, but with improved persistence and effectively no "context limit" since it is constantly learning and updating weights. It is likely that this model will not come from an existing major frontier lab, but rather a smaller lab focused on this line of research like Sapient (who developed the Hierarchical Reasoning Model). The simplest case would be something like "We have adapted the HRM for natural language tasks and scaled it up, and it just works".

I believe that further development of this research will produce models that fulfill most of the criteria we associate with "AGI". In general I define this as a model that learns continuously and online from new data, generalises efficiently to new domains while avoiding catastrophic forgetting, and is skilled in a wide variety of tasks associated with human intelligence: natural language generation and understanding, pattern matching, problem solving, planning, playing games, scientific research, narrative writing etc.

Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026) with ~75% probability:

  • I expect a new model to be released, one which does not rely on adapting pretrained transformers or distilling a larger pretrained model
  • It will be inspired by the line of research I have outlined above, or a direct continuation of one of the listed architectures
  • It will have language capabilities equal to or surpassing GPT-4
  • It will have a smaller parameter count (by 1-2+ OOMs) compared to GPT-4

Bonus points:

  • It will not be from a major lab (OpenAI, Google, Anthropic, Facebook)
  • It will feature continuous learning prominently as a selling point

What I think we should do

  • Move some resources away from LLM centric safety efforts and investigate these new architectures
  • Examine the possibility of aligning continuous learning models or facilitating value transfer
    • What does it mean for a model that is constantly updating its weights to be "aligned" or "safe"?
    • Is there overlap between how continuous learning/learning in general might work in models vs. humans? (This is my current research project, if you have ideas please reach out)
  • Examine the possibilities of alignment techniques or plans that do not involve pretraining and then aligning a model which is finally "deployed" as a "finished product"
    • For example, one approach I find promising might be to train/align a continuous learning model by interacting with it instead of using a fixed training corpus, like how we raise humans. If these models can learn at a near-human rate with human levels of training data, this becomes a possibility.
  • Reach out to labs, groups, or companies researching this line of models and investigate what their safety plans look like
Mentioned in
We should think about the pivotal act again. Here's a better version of it.