I am worried about near-term non-LLM AI developments

by testingthewaters
31st Jul 2025
6 min read
56 comments, sorted by top scoring
[-]Trent Hodgeson2mo6144

To the degree that worries of this general shape are legitimate (we think they very much are), it seems like it would be wise for the alignment community to more seriously pursue and evaluate the many neglected approaches that might solve the fundamental underlying alignment problem, rather than investing the vast majority of resources in things like evals and demos of misalignment failure modes in current LLMs, which are certainly nice to have, but which almost certainly won't themselves directly yield scalable solutions to robustly aligning AGI/ASI.

Reply1
[-]JasonBrown2mo412

I made a manifold post for this for those who wish to bet on it: https://manifold.markets/JasonBrown/will-a-gpt4-level-efficient-hrm-bas?r=SmFzb25Ccm93bg

Reply221
[-]otto.barten1mo*41

It's currently at 12%, which seems really xrisk-relevant to me. I think xrisk could increase quite a bit if such a model were much more lightweight (2+ OOMs) than LLMs.

Also, I think this would still be very relevant if the timeline were longer. Maybe a new post for, e.g., five years?

Reply
[-]Yair Halberstadt1mo32

Currently at 9%.

That's the level where Manifold can't really tell you the exact probability, because betting "no" ties up a lot of capital for minimal upside.

Also, by 2026 I'd expect to have GPT-4-level LLMs with 1/10th the parameter count just due to algorithmic improvements (maybe I'm wildly wrong here), so doing the same with a different architecture isn't necessarily as indicative as it seems.

Reply
[-]otto.barten1mo10

Doing the same with a different architecture could open up the possibility of doing better down the road. I'd be equally interested in how fast it gets better as in how good it is. It would also raise the question: if two architectures can do this, how many more? Do they all max out at the same point or not at all? I think it could be quite important. I'd be curious how big experts think the probability is that a different architecture could do LLM-level thinking on a reasonably wide range of tasks in, say, five or ten years.

Reply
[-]JasonBrown1mo50

I've ended up making another post somewhat to this effect, trying to predict any significant architectural shifts over the next year and a half: https://manifold.markets/Jasonb/significant-advancement-in-frontier

Reply
[-]ACCount2mo2717

There is an awful lot of "promising new architectures" being thrown around. Few have demonstrated any notable results whatsoever. Fewer still have demonstrated an ability to compete with transformer LLMs on the kinds of tasks transformer LLMs are well suited for.

It's basically just Mamba SSMs and diffusion models, and they aren't "better LLMs". They seem like sidegrades to transformer LLMs at best.

HRMs, for example, seem to do incredibly, suspiciously well on certain kinds of puzzles, but I have yet to see them do anything in the language domain, or in math, coding, etc. Are HRMs generalists, like transformers? No evidence of that yet.

Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026) with ~75% probability:

Basically, off the top of my head: I'd put 10% on that. Too short of a timeframe.

Reply
[-]p.b.2mo142

SSMs are really quite similar to transformers. As with all the "sub-quadratic" transformer variants, the expectation is at best that they will do the same thing as transformers, but more efficiently.

HRMs, continuous thought machines, or KANs, on the other hand, contain new and different ideas that make a discontinuous jump in abilities at least conceivable. So I think one should distinguish between those two types of "promising new architectures".

My view is that these new ideas accumulate, and at some point somebody will be able to put them together in a new way to build actual AGI.

But the authors of these papers are not stupid. If there were straightforward applicability to language modelling, they would already have pursued it. If there were line of sight to GPT-4-level abilities in six months, they probably wouldn't publish the paper.

Reply2
[-]tailcalled2mo30

KANs seem obviously of limited utility to me...?

Reply
[-]p.b.2mo20

I think it is a cool idea and has its application but you are right that it seems very unlikely to contribute to AGI in any way. But there was nonetheless excitement about integrating KANs into transformers which was easy to do but just didn't improve anything. 

Reply
[-]ryan_b1mo20

Ah, but is it a point-in-time sidegrade with a faster capability curve in the future? At the scale we are working now, even a marginal efficiency improvement threatens to considerably accelerate at least the conventional concerns (power concentration, job loss, etc).

Reply
[-]Fejfo1mo10

It's my impression that a lot of the "promising new architectures" are indeed promising. IMO a lot of them could compete with transformers if you invest in them. It just isn't worth the risk while the transformer gold-mine is still open. Why do you disagree?

Reply
[-]ACCount1mo10

I disagree because I'm yet to see any of those "promising new architectures" outperform even something like GPT-2 345M, weight for weight, at similar tasks. Or show similar performance with a radical reduction in dataset size. Or anything of the sort.

I don't doubt that a better architecture than LLM is possible. But if we're talking AGI, then we need an actual general architecture. Not a benchmark-specific AI that destroys a specific benchmark, but a more general purpose AI that happens to do reasonably well at a variety of benchmarks it wasn't purposefully trained for.

We aren't exactly swimming in that kind of thing.

Reply
[-]Cole Wyeth2mo2110

I think you haven't given sufficient reason to expect your predictions to come true.

Most things don't scale. Why should we expect HRMs to be an exception?

Reply2
[-]testingthewaters2mo11-12

Mostly because the efficiencies being claimed here are truly staggering. HRM claims SOTA performance on ARC-AGI problem sets with 27 million parameters and 1000 training examples (with no general pretraining!). o3-mini-high, the next model on the board, has a parameter count on the order of hundreds of billions and was trained on the entire internet. That is a ~3700x improvement in parameter-count efficiency and probably more than a 10000x improvement in data efficiency. At this scale, even if the scaling advantage peters out at 1000x, you are talking about an entirely new paradigm, especially now that there is a partial replication as well.

Reply
[-]Cole Wyeth2mo167

Well, o3 wasn’t optimized to perform well on ARC-AGI as its primary purpose - is this a fair comparison? 

Reply
[-]testingthewaters2mo32

I've also been tracking the development of these kinds of techniques for a year, and they have consistently been showing surprising improvements. The test time training people have been publishing for a while, and the ceiling seems nowhere in sight.

To be honest a complete recounting of why I believe this is somewhat beyond a comment's length. I also feel very strongly that progress is accelerating, and that if I don't say something now there will not be much time to react. Hence the post.

Reply11
[-]Cole Wyeth2mo1612

Perhaps it's beyond the length of a comment, but why not recount it in the post?

Reply
[-]testingthewaters2mo103

I think that the most legible arguments are already in the post. The only thing I would make clearer is that the human brain is an existence proof that such highly efficient continuous learning algorithms exist, and therefore I see the development of these models as not particularly surprising. Steven Byrnes has some similar intuitions in his Foom and Doom series.

Reply21
[-]Cole Wyeth2mo93

I read (some of) that series and disagreed with his assessment on the same grounds. 

Reply
[-]testingthewaters2mo52

In which case, the best thing is probably for us to wait and see if the predictions come to pass, if that's okay with you. I might also be afk for a bit so might not be able to immediately reply to any further comments.

Reply
[-]Random Developer2mo70

The other thing that seems strange here is that the parameter counts are far, far below those of the human brain. I mean, yeah, o3 is probably 1 or 2 OOMs below the brain, but that could be explained away by the fact that o3 is still missing some very basic human abilities, and perhaps it has found interesting efficiencies.

But 27 million parameters is just so many OOMs below the human brain that I'm expecting a gotcha. Even at 16-bit quantization, that's only about 54MB. Which means that the model's world knowledge must be ridiculously limited compared to even simple LLMs. Maybe if it's a specialist model that's tuned for just a handful of very specific benchmarks?

Reply
[-]Steven Byrnes2mo4012

“An HRM model trained to do ARC-AGI and nothing else” seems vaguely analogous to “An AlphaZero model trained to play Go and nothing else”. Right?

If you buy that, then I would note that HRM & AlphaZero probably have quite similar numbers of parameters (27M vs maybe 23M, see footnote here). And AlphaZero was just a plain old ResNet, I think.

So I don’t see “HRM solves ARC-AGI with 27M parameters” as evidence for HRM having unusual parameter efficiency. Right? Sorry if I’m misunderstanding.

Reply
[-]RussellThor2mo20

It is weak evidence; we simply won't know until we scale it up. If it is automatically good at 3D spatial understanding with extra scale, then that starts to become evidence that it has better scaling properties. (To me it is clear that LLMs/transformers won't scale to AGI: xAI has already close to maxed out scaling, and Tesla Autopilot probably does everything mostly right but is far less data-efficient than people.)

Reply
[-]ACCount2mo125

ARC-AGI, v1 and v2 both, is a very spatial-reasoning-shaped problem. And LLMs are not very spatial-reasoning-shaped.

It could be that this arch is an unusually good fit for spatial reasoning problems, and a poor fit for others. 

We haven't seen it used for either text generation or image generation, both of which are hot topics in AI right now. Which is very weak evidence that it's unsuitable for that kind of task. And much stronger evidence that the authors couldn't get it to work on this kind of task.

Reply
[-]Kaj_Sotala2mo71

I wonder how much we need to worry about hybrid architectures. If LLMs do text generation well and continuous learning models do spatial reasoning well, and someone figures out an architecture that lets their strengths synergize with each other...

Reply
[-]testingthewaters2mo60

That is the basic idea behind Energy based transformers and test time training!

Reply1
[-]RussellThor2mo73

OK, our intelligence is very spatial-reasoning-shaped. Biological architecture can't do language until it has many params. If it is terrible at text or image gen, that isn't evidence that it won't in fact scale to AGI and best transformers with more compute. We simply won't know until it is scaled up.

Reply
[-]james oofou2mo95

Doesn't the human brain's structure provide something closer to an upper bound rather than a lower bound on the number of parameters required for higher reasoning? 

Higher reasoning evolved in humans over a short period of time. And it is speculated that it was mostly arrived at simply by scaling up chimp brains.

This implies that our brains are very far from optimised for higher reasoning, so we should expect that to whatever extent factors other than scale can contribute to higher-reasoning ability, it is possible for brains smaller than our own to engage in higher reasoning. 

The human brain should be seen as evidence that a certain scale is ~sufficient, but not that it is necessary. 

Reply
[-]Random Developer1mo116

The human brain should be seen as evidence that a certain scale is ~sufficient, but not that it is necessary.

The human brain is often estimated to have 10^14 synapses, which would be a 100T model, give or take. Except that individual neurons also have a bunch of internal parameters, which might complicate things.

If you told me that the human brain was massively inefficient, and that you had squeezed human level AGI into 1T parameters, I would be only mildly surprised.

For that matter, if you told me you had squeezed a weak AGI into 30B parameters, I'd be interested in the claim. Qwen3 really is surprisingly capable in that size range. If you told me 4B, I'd be very skeptical, but then again, Gemma 3n does implausibly well on my diverse private benchmarks, and it's technically multi-modal. At the very least, I'd accept it as the premise of a science fiction horror story about tiny, unaligned AIs.

But if we drop all the way to 30 million parameters, I am profoundly suspicious of any kind of general model with language skills and reasonable world knowledge. Even if you store language and world knowledge as compressed text files, you're going to hit some pretty hard limits at that size. That's a 60MB ZIP file or less. You'd be talking about needing only about 1/3,000,000th of the parameters of the brain. Which is a lot of orders of magnitude.
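In round numbers (assuming 16-bit weights and the ~10^14 synapse estimate above, which are my assumptions, not figures from the paper):

```latex
27 \times 10^{6}\ \text{params} \times 2\ \text{bytes/param} \approx 54\ \text{MB},
\qquad
\frac{10^{14}\ \text{synapses}}{27 \times 10^{6}\ \text{params}} \approx 3.7 \times 10^{6}
```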

At that size, I'm assuming that any kind of genuinely interesting model would be something like AlphaGo, which demonstrates impressive knowledge and learning abilities in a very narrow domain. Which is fine! It might even be the final warning that AGI is inevitable. But I would still expect more than 6 months to be required to scale from such a tiny model up to something with general world knowledge, language, and common sense.

Reply
[-]Vladimir_Nesov2mo31

Higher reasoning evolved in humans over a short period of time. ... The human brain should be seen as evidence that a certain scale is ~sufficient, but not that it is necessary.

We can still see that a chimp-scale brain with this architecture isn't sufficient, and human-built AI architectures were also only developed over a short period of time. Backprop and large-scale training in parallel for one individual might give AIs an advantage that chimp/human brains don't have, but it's unclear whether this overcomes the widely applicable unhobbling from the much longer efforts by evolution to build minds for efficient online-learning robots.

Reply
[-]testingthewaters2mo30

To be clear, I don't think that HRM with 27 million params will be natural language capable. However, if my assumptions are correct, a scaled up version of HRM should be able to attain performance similar to frontier models while learning online and being relatively much smaller in size (1-2 OOMs smaller, based on a rough hunch).

Reply1
[-]alexlyzhov1mo170

I was very impressed with the ARC-AGI results so I read the entire paper and also browsed the code a fair amount.

Only after browsing the code did I realize that they likely train on all evaluation tasks in addition to training tasks—correct me if I'm wrong. During inference they only condition on x* and on the task embedding to predict y*, instead of on (x,y)_{1..3}. The only way they could get that task embedding is by training it. Evaluation tasks are harder than those in the train set.

They score a lot higher than the baseline transformer so clearly there's a lot of merit in what they're doing. But in the setting of training on evaluation tasks you can train on only 1 target ARC-AGI-1 task instead of 1000 tasks and still get 20% accuracy: https://iliao2345.github.io/blog_posts/arc_agi_without_pretraining/arc_agi_without_pretraining.html. Given this, it doesn't look earth-shattering.

Reply
[-]Kaj_Sotala1mo*80

It does sound like it; someone on Twitter pointed out this passage from page 12 of the HRM paper (emphasis added):

For ARC-AGI challenge, we start with all input-output example pairs in the training and the evaluation sets. The dataset is augmented by applying translations, rotations, flips, and color permutations to the puzzles. Each task example is prepended with a learnable special token that represents the puzzle it belongs to. At test time, we proceed as follows for each test input in the evaluation set: (1) Generate and solve 1000 augmented variants and, for each, apply the inverse-augmentation transform to obtain a prediction. (2) Choose the two most popular predictions as the final outputs. All results are reported on the evaluation set.

Which sounds like they used the initial training and evaluation set for training, then generated a new set of "augmented variants" of the original puzzles for the actual evaluation. So when the model is evaluated, it knows that "this is a variant of puzzle X which I've seen before".

EDIT: Though, also per somebody on Twitter, it sounds like this might just be the ARC-AGI materials being confusingly named:

No foul play here. The ARC-AGI dataset is confusing because it has 2 different kinds of "train" sets. There are train set puzzles and train input-output examples within each puzzle. They trained on only the train examples of the train and validation puzzle sets, which is fair. [...]

It's "all input-output example pairs" not "all input-output pairs". The examples are the training data in ARC-AGI. 

Reply
[-]alexlyzhov1mo73

They trained on only the train examples of the train and validation puzzle sets, which is fair.

Yes, I agree—I wasn't implying that's foul play. I just think it's less impressive than I originally thought because:

  • It's finetuning on task examples and not in-context few-shot learning
  • They finetune on the harder evaluation set and not only on the easier train set, so they don't demonstrate generalization across the easy->hard distribution shift
  • The result I linked to was 20% on ARC-AGI-1 by only fitting examples for 1 evaluation task using an MLP-type network vs the 40% result in the paper using 1000 tasks. These numbers are not directly comparable because they did a fair bit of custom architectural engineering to reach 20%, but it really put 40% in perspective for me.
Reply
[-]Erik Jenner1mo120

There are literal interpretations of these predictions that aren't very strong:

  1. I expect a new model to be released, one which does not rely on adapting pretrained transformers or distilling a larger pretrained model
  2. It will be inspired by the line of research I have outlined above, or a direct continuation of one of the listed architectures
  3. It will have language capabilities equal to or surpassing GPT-4
  4. It will have a smaller parameter count (by 1-2+ OOMs) compared to GPT-4

GPT-4 was rumored to have 1.8T parameters, so <180B parameters would technically satisfy 4. My impression is that current ~70B open-weight models (e.g. Qwen 2.5) are already roughly as good as the original GPT-4 was. (Of course that's not a fair comparison since the 1.8T parameter rumor is for an MoE model.)

So the load-bearing part is arguably "inspired by [this] line of research," but I'm not sure what would or wouldn't count for that. E.g. a broad interpretation could argue that any test-time training / continual learning approach would count, even if most of the capabilities still come from pretraining similar to current approaches. (Still a non-trivial prediction to be clear!)

My impression was that you're intending to make stronger claims than this broad interpretation. If so, you could consider picking slightly different concretizations to make the predictions more impressive if you end up being right. For example, I'd consider 2 OOMs fewer parameters than GPT-4 noticeably more impressive than 1 OOM (and my guess would be that the divergence between your view and the broader community would be even larger at 3 OOMs fewer parameters). Might be even better to tie it to compute and/or training data instead of parameters. You could also try to make the "inspired by this research" claim more concrete (e.g. "<10% of training compute before model release is spent on training on offline/IID data", if you believe some claim of that form).

Reply
[-]Noosphere891mo115

For what it's worth, while I do think this will matter 5-10 years down the line (or more), I am currently relatively skeptical that this will be achieved soon, and my current view is that a lot of these papers will have gotchas that make them less useful than they seem.

The HRM paper has a gotcha pretty much immediately:

https://www.lesswrong.com/posts/tEZa7PouYatK78bbb/?commentId=ELTcESCdWjikCq3HT

That said, worth keeping an eye out here.

I'd put my probability on a new architecture surpassing transformers within 2 years at much closer to 1%.

Reply
[-]Galen1mo112

Flagging that the HRM paper strongly reads as low-substance. After seeing this post I revisited it for a deeper read to fully understand their method, and for me this confirmed my initial impressions. I used to get very excited about every novel architecture published, and over time I think there's some amount of cognitive immunity you can build up; e.g. spending most of the paper rehashing vague "inspirations" tends to be a dark pattern employed when you want to make your use of a standard method seem more novel than it is.

I don't really have the time to dissect the paper, but a good general heuristic is to understand something well enough to, e.g., re-implement it in PyTorch before accepting its results at face value. If that is the case here and you still believe that it's a meaningful research advance, then it's probably just a difference in research taste and you should ignore this comment.

Otherwise I agree with the general take here.

Reply
[-]testingthewaters1mo40

I agree that the paper is not terribly novel. There is no new maths here or radical new technique. If you wanted to be reductive you could just call it two RNN modules stacked on top of each other and run at different clock speeds. But the fact that this simple setup is enough to trigger such outsized improvements (which have been partially replicated externally) is what is alarming to me.

Reply
[-]Galen1mo63

I agree that the results are legit; I'm just taking issue with the authors presenting them without prior-work context (e.g. setting the wrong reference class such that the improvement over baselines appears larger). RNNs getting outsized performance on maze/sudoku is to be expected, and the main ARC result seems to be more of a strong data augmentation + SGD baseline rather than something unique to the architecture; ARC-AGI-1 was pretty susceptible to this (e.g. ARC-AGI Without Pretraining).

That being said, I think the fact that various RNN architectures have such different characteristics on these limit cases for transformers points to a pretty large jump in capabilities when scaling/pretraining is cracked. I think it'd be good for more people working on alignment to study what types of behaviors are exhibited in these sorts of models at small scale, with the expectation that the paradigm will eventually shift in this direction.

Reply
[-]Galen1mo50

Update: ARC has published a blog post analyzing this: https://arcprize.org/blog/hrm-analysis. As expected, swapping in a transformer works approximately the same.

Reply
[-]SorenJ1mo80

What are your priors, or what is your base rate calculation, for how often promising new ML architectures end up scaling and generalizing well to new tasks?

Reply
[-]Mikhail Samin1mo7-2

I think if you expect some architecture to be a lot more efficient than current LLMs, you should not talk about it!

LessWrong isn't a secret society with vows to not spread the knowledge we've learned here if it might cause the end of the world. It's a public website, and many who work on AI capabilities read it.

If you want the community to pay attention to the fact that other architectures might scale well- that's okay, feel free to talk about it.

If you think you have exceptionally good intuitions about what will scale, sharing this publicly eats maybe the single most valuable common resource that we have: the timeline. Please don't do that & don't accelerate AI capabilities.

Reply
[-]testingthewaters1mo111

Hey Mikhail,

I don't claim to have any secret knowledge or exceptional intuition about what will scale well. Everything I link to is already public, and already quite well known (test time training was the number 2 pinned post on r/singularity, the Hierarchical Reasoning Model is blowing up X/Twitter, ARC-AGI Without Pretraining was on the front page of Hacker News). In fact, commenters keep bringing up links that I already featured in the main post, suggesting that these developments are well known even within the LessWrong memesphere.

With that in mind, my goal was less to warn about any individual architecture than to point out the overall trend of alternative-architecture work I'm observing, and some reasons I expect the trend to continue. I want to make sure that AI safety does not over-index on LLM safety and ignore other avenues of risk. Basically exactly what you said here:

If you want the community to pay attention to the fact that other architectures might scale well- that's okay, feel free to talk about it.

Hope this makes sense.

Reply1
[-]Matrice Jacobine1mo71

A single human brain has the energy demands of a lightbulb, instead of the energy demands of all the humans in Wyoming.

This is a non sequitur. The reason AI models don't have the energy demands of a lightbulb isn't that they're too big and current algorithms are too inefficient. Quite the contrary: an actual whole-brain emulation would require the world's largest supercomputer. Current computers are just nowhere near as efficient as the human brain.

Reply
[-]kaiwilliams13d30

How does ARC-AGI's replication of the HRM result and ablations update you? [Link].

Basically, they claim that the HRM wasn't important; instead it was the training process behind it that had most of the effect.

Reply
[-]Cole Wyeth13d20

And most of those tricks seem unlikely to generalize beyond ARC.

Though the refinement process does sound a bit like the feared “neuralese.” I’m not too worried about this though - the problem with this kind of recurrence is that it doesn’t scale, and HRMs are indeed small models that lag SOTA. So, I don’t see much reason to expect it to work this time??

Reply
[-]testingthewaters13d10

Both the ARC-AGI replication and GPT-5's strong performance on agentic evals* have moved me away from expecting a very rapid rise of a new AI paradigm. I expect my concrete predictions to be less likely to come true. However, I still stand by the original point behind this post: it is not a particular model but a line of research that is of concern, and I still think the research could bear fruit in unexpected and not-priced-in ways.

* Including cyber!! The X-Bow report has really not been discussed here much; people just seem to take OpenAI's word that GPT-5 is not a big step ahead in agentic threat models.

Reply
[-]otto.barten1mo*30

Super interesting post! I'm agnostic on whether this will happen and when, but I have something to add to the "what we should do" section.

You are basically only talking there about alignment-action on the new models. I think that would be good to do, but at the same time I'm sceptical about alignment as a solution. Reasons include that I'm uncertain about the offense-defense balance in a multipolar scenario and very sceptical that the goals we set for an ASI in a unipolar scenario will be good in the medium term (>10 yrs) (even if we solve technical alignment). I don't think humanity is ready for having a god, even a steerable god. In addition, of course it could be that technical alignment does not get solved (in time), which is a more mainstream worry on LW.

Mostly for these reasons I put more trust in a regulatory approach. In this approach, we'd first need to inform the public (which is what I have worked on for the past four years) about the dangers of superintelligence (incl. human extinction), and then states would coordinate to arrive at global regulation (e.g. via our proposal, the Conditional AI Safety Treaty). By now, similar approaches are fairly mainstream in MIRI (for technical alignment reasons), EA, FLI, PauseAI, and lots of other orgs. Hardware regulation is the most common way to enforce treaties, with sub-approaches such as FlexHEGs and HEMs.

If AGI needed a lot fewer FLOPs, this would get a lot more difficult. I think it's plausible that we arrive at this situation due to a new paradigm. Some say hardware regulation is not feasible at all anymore in such a case. I think it depends on the specifics: how many FLOPs are needed, how much societal awareness do we have, which regulation is feasible?

I think that in addition to your "what we should do" list, we should also:

  • Try our best to find out how many FLOPs, how much memory, and how much money are needed for takeover-level AI (a probability distribution may be a sensible output).
  • For the most likely outcomes, figure out hardware regulation plans that would likely be able to pause/switch off development in case political support is available. (My org will work on this as well.)
  • Double down on FlexHEG/HEM hardware regulation options, while taking into account the scenario that far fewer FLOPs, less memory, and less money might be needed than previously expected.
  • Double down on increasing public awareness of xrisk.
  • Explore options beyond hardware regulation that might succeed in enforcing a pause/off switch for a longer time, while doing as little damage as possible.
Reply
[-]Gordon Seidoh Worley1mo20

I agree that this is the dangerous thing to build and we shouldn't do it until we are sure we know how to align it, but I know we'll probably just build it anyway.

Reply
[-]dr_s1mo20

Is there so much difference in terms of transferability of alignment techniques? For example, in EBT, the verification model doesn't sound too different from the architectures we have today, and in fact it's an excellent candidate to be the target of alignment and act as the "conscience" of the continuously running model.

Reply
[-]Josh Snider2mo10

https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/foom-and-doom-1-brain-in-a-box-in-a-basement (and the sequel) seem highly related.

Reply
[-]testingthewaters2mo10

Yeah, the original post quoted that post directly and it was definitely a big inspiration (see another comment I left on a different thread).

Reply
[-]Igor Ivanov2mo10

How do you think we can prepare in advance to study the safety of such systems?

Reply
[-]testingthewaters2mo20

That's kind of the overall purpose behind my project, which generally aims to study how machine learning and human learning can overlap. The idea is to see where the theoretical overlap is and see how we can leverage that combined with empirical evidence to show how easy/difficult things like lie detection or value transfer are.

Reply
[-]ZY2mo10

I do agree we should not just focus on LLMs (LLM bases or agents), but also on other architectures.

Reply

TL;DR

I believe that:

  • There exists a parallel track of AI research which has been largely ignored by the AI safety community.  This agenda aims to implement human-like online learning in ML models, and it is now close to maturity. Keywords: Hierarchical Reasoning Model, Energy-based Model, Test time training.
  • Within 6 months this line of research will produce a small natural-language capable model that will perform at the level of a model like GPT-4, but with improved persistence and effectively no "context limit" since it is constantly learning and updating weights.
  • Further development of this research will produce models that fulfill most of the criteria we associate with "AGI".

Overview

Almost all frontier models today share two major features in their training regime: they are trained offline and out of sequence. 

  • By offline, I mean that there is a distinct "pretraining" phase, followed by a separate "post-training", "character/safety training", or "fine-tuning" phase, and then finally a "deployment" phase. New iterations of the models must also go through these stages. In almost all cases, deployed models do not receive weight updates when they run and perform "inference".
  • By out of sequence, I mean that models receive random samples from the training set, instead of continuous sequences of tokens, characters, images, or audio samples. This is a necessary consequence of batched SGD, which attempts both to approximate an ideal gradient descent algorithm and to prevent problems like catastrophic forgetting. As a result, models cannot use past inputs to predict future inputs; they can only learn a general solution to the task they are being trained on. (The sketch after this list contrasts the two regimes.)
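To make the distinction concrete, here is a minimal sketch of the two regimes (a toy PyTorch example of mine; the model, data, and next-step objective are hypothetical stand-ins, not any lab's actual pipeline): offline training on shuffled i.i.d. mini-batches with weights frozen afterwards, versus online in-sequence training that updates the weights after every single observation.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                  # stand-in for any predictive model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Offline / out-of-sequence: shuffled i.i.d. mini-batches from a fixed training set,
# after which the weights are frozen at "deployment".
data = torch.randn(1024, 16)               # toy dataset (illustrative only)
targets = torch.roll(data, shifts=1, dims=0)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data, targets), batch_size=64, shuffle=True
)
for x, y in loader:
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Online / in-sequence: one observation at a time, in order, with a weight update
# after every step; learning continues even at "inference" time.
stream = torch.randn(256, 16)              # a single continuous episode
prev = stream[0]
for t in range(1, len(stream)):
    opt.zero_grad()
    loss_fn(model(prev), stream[t]).backward()   # predict the next observation
    opt.step()
    prev = stream[t]
```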

These features of training regimes make sense if you believe the classic function approximation or statistical approximation explanation of machine learning. In this story the model is meant to learn some fixed "target distribution" or "target function" by sampling i.i.d. data points from the training set. The model is then tested on a holdout "test set" which contains new input-output pairs from the same target distribution or function. If the model generalises across the train set and test set, it is considered a good model of the target.
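For reference, this is the standard empirical risk minimisation framing being described (the notation is mine, not taken from any of the papers discussed below): minimise the expected loss under a fixed target distribution, approximated by the average loss over i.i.d. training samples, with the held-out test set estimating the same quantity.

```latex
% Learn a fixed target distribution \mathcal{D} from i.i.d. samples; the expected risk
% is approximated by the empirical risk, and the held-out test set estimates the same quantity.
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f_\theta(x),\,y)\big]
\;\approx\;
\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}\ell\big(f_\theta(x_i),\,y_i\big),
\qquad (x_i,y_i)\overset{\text{i.i.d.}}{\sim}\mathcal{D}
```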

For many reasons, this story makes no sense when applied to the idea of AGI or to developing an ML model that is good at navigating the real world. Humans do not spend the first few years of their lives in a sensory deprivation tank, getting random webcam footage from different places on earth before they stop learning forever and are "deployed" into reality. Furthermore, if your plan is to learn the fixed distribution of all possible English sentences, you will naturally need a representative sample of... all possible English sentences. This is not how humans acquire language skills either, and it helps explain why current ML approaches to natural language generation are becoming prohibitively expensive.

Most of us would agree that humans learn continuously, meaning that they learn online and in sequence. Instead of seeing a wide "context" made up of randomly sampled data from all across the internet, we have a very narrow "context" focused on the here and now. To make up for this, we are able to leverage our memories of the immediate and distant past to predict the future. In effect we live in one continuous learning "episode" that lasts from the moment we are born to the moment we die. Naturally, AI researchers have tried to find ways to replicate this in ML models.

The Agenda I am Worried About

I think that the AI safety community has seriously overindexed on LLMs and ChatGPT-style model safety. This is a reasonable choice, because LLMs are a novel, truly impressive, and promising line of AI development. However, in my opinion, research into online in-sequence learning has the potential to lead to human-level AGI much more quickly. A single human brain has the energy demands of a lightbulb, instead of the energy demands of all the humans in Wyoming.

I am not alone in this belief. Research into online in-sequence learning has focused on small-model, RNN-like approaches that do not use backpropagation through time (BPTT). Instead, models must update their weights online and generalise based on only one input at a time, forcing them to learn how to leverage their memory/hidden state to predict future data points if they wish to be effective. By contrast, transformers are explicitly encouraged to memorise surface-level patterns to blindly apply to large blocks of context, instead of internalising the content and using it to predict future inputs.
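As a rough illustration of that recipe, here is a minimal sketch of an online, no-BPTT recurrent learner (PyTorch; the GRU cell, sizes, and next-step prediction loss are my own illustrative choices rather than the setup of any specific paper): the hidden state is detached at every step so gradients never flow backwards through time, and the weights are updated after every observation.

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=32, hidden_size=64)   # recurrent core
readout = nn.Linear(64, 32)                        # predicts the next observation
opt = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

h = torch.zeros(1, 64)                             # hidden state ("memory")
stream = torch.randn(500, 1, 32)                   # one continuous episode, step by step

for t in range(len(stream) - 1):
    h = cell(stream[t], h.detach())                # detach: no backprop through time
    loss = loss_fn(readout(h), stream[t + 1])      # next-step prediction from memory alone
    opt.zero_grad()
    loss.backward()
    opt.step()
```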

Some notable papers applying this research include the Hierarchical Reasoning Model, Energy-based Models, ARC-AGI without pretraining, and Test Time Training. Some of these techniques (like Test Time Training or Energy-based Models) augment existing transformer architectures, while others represent entirely novel architectures like the ARC-AGI with no pretraining model and the Hierarchical Reasoning Model. These models share the same idea of getting more information out of each data point than a single backpropagation pass can extract. For example, Test Time Training uses a neural network as its hidden state. It also has an internal update step where key information contained in any incoming data point is compressed into the weights of the hidden state network. ARC-AGI without pretraining trains a new network on each data point (a single puzzle in the ARC-AGI corpus), again aiming to get some compressed representation of the key structural information contained in that puzzle. The Hierarchical Reasoning Model and Energy-based Model iterate on their internal representations either for some fixed number of cycles or until some convergence threshold is reached. That way they can extract maximum information from each data point and give themselves more "thinking time" compared to transformers, which must output the next token immediately after one forward pass. The Hierarchical Reasoning Model also uses higher and lower level recurrent modules to separate cognition into low level execution steps and high level planning steps.
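To give a flavour of the "higher and lower level recurrent modules" idea attributed to the Hierarchical Reasoning Model above, here is a heavily simplified sketch (PyTorch; the module types, sizes, and fixed cycle counts are my own stand-ins, and this is not the paper's architecture or training scheme). The point is only the control flow: the fast state is refined several times against the input before the slow state updates once, and the whole loop repeats for a fixed number of cycles before anything is output.

```python
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    """Toy two-timescale recurrence: a slow 'planner' state updated once per cycle
    and a fast 'worker' state iterated several times within each cycle."""

    def __init__(self, dim=128, inner_steps=4, cycles=8):
        super().__init__()
        self.fast = nn.GRUCell(dim, dim)   # low-level execution steps
        self.slow = nn.GRUCell(dim, dim)   # high-level planning steps
        self.inner_steps = inner_steps
        self.cycles = cycles

    def forward(self, x):
        h_fast = torch.zeros_like(x)
        h_slow = torch.zeros_like(x)
        for _ in range(self.cycles):              # fixed number of refinement cycles
            for _ in range(self.inner_steps):     # fast module iterates on the input
                h_fast = self.fast(x + h_slow, h_fast)
            h_slow = self.slow(h_fast, h_slow)    # slow module updates once per cycle
        return h_slow                             # refined internal representation

core = TwoTimescaleCore()
out = core(torch.randn(2, 128))                   # a batch of 2 toy inputs
```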

So far, research within this track has produced strong or claimed-to-be-SOTA results for ARC-AGI-1, ARC-AGI-2 (along with Sudoku and maze-solving), and long-context video generation. These models excel at tasks that current frontier LLMs or reasoning models struggle at, or radically improve the otherwise lacklustre performance of standard LLMs. Despite differences in implementation, models in this line of research generally learn online and in linear time (i.e. seeing one data point after another, without parallelisation schemes like BPTT). They generally take less data to train, have smaller parameter counts, and have better bootstrapping performance (generalising based on a limited number of data points). Many of them also claim to be inspired by brain-like architectures or by how humans learn. I think that, of the approaches listed above, the Hierarchical Reasoning Model is the most promising candidate to come out of this line of research so far.

Concrete Predictions

I believe that within 6 months this line of research will produce a small natural-language capable model that will perform at the level of a model like GPT-4, but with improved persistence and effectively no "context limit" since it is constantly learning and updating weights. It is likely that this model will not come from an existing major frontier lab, but rather a smaller lab focused on this line of research like Sapient (who developed the Hierarchical Reasoning Model). The simplest case would be something like "We have adapted the HRM for natural language tasks and scaled it up, and it just works".

I believe that further development of this research will produce models that fulfill most of the criteria we associate with "AGI". In general I define this as a model that learns continuously and online from new data, generalises efficiently to new domains while avoiding catastrophic forgetting, and is skilled in a wide variety of tasks associated with human intelligence: natural language generation and understanding, pattern matching, problem solving, planning, playing games, scientific research, narrative writing etc.

Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026) with ~75% probability:

  • I expect a new model to be released, one which does not rely on adapting pretrained transformers or distilling a larger pretrained model
  • It will be inspired by the line of research I have outlined above, or a direct continuation of one of the listed architectures
  • It will have language capabilities equal to or surpassing GPT-4
  • It will have a smaller parameter count (by 1-2+ OOMs) compared to GPT-4

Bonus points:

  • It will not be from a major lab (OpenAI, Google, Anthropic, Facebook)
  • It will feature continuous learning prominently as a selling point

What I think we should do

  • Move some resources away from LLM centric safety efforts and investigate these new architectures
  • Examine the possibility of aligning continuous learning models or facilitating value transfer
    • What does it mean for a model that is constantly updating its weights to be "aligned" or "safe"?
    • Is there overlap between how continuous learning/learning in general might work in models vs. humans? (This is my current research project, if you have ideas please reach out)
  • Examine the possibilities of alignment techniques or plans that do not involve pretraining and then aligning a model which is finally "deployed" as a "finished product"
    • For example, one approach I find promising might be to train/align a continuous learning model by interacting with it instead of using a fixed training corpus, like how we raise humans. If these models can learn at a near-human rate with human levels of training data, this becomes a possibility.
  • Reach out to labs, groups, or companies researching this line of models and investigate what their safety plans look like
Mentioned in
We should think about the pivotal act again. Here's a better version of it.