At least once a year I will state my current position, reflect on the one I had a year ago, check predictions, etc. Here is last year's article and 6-month update. I haven't changed my position on anything major.
Main points
The transformer architecture won't get to TAI (Transformative AI), though it may get close.
This is probably my most important prediction, and it is based mostly on data efficiency. Quite simply, Einstein did not need to read the entire human literature to be creative. Current AI needs far more data than humans do; LLMs and self-driving both follow this pattern, and I believe it is a fundamental limitation of the current architecture. LLMs may, however, speed up the discovery of a better one. My best guess for TAI remains around 2032. Specifically, we see progress slow from its current rate, plateauing at a high but not clearly superintelligent level in many areas. There is then a sudden jump to TAI when a different architecture is discovered, likely inspired by or outright copied from biology. Such a system would not need anything like as much training data to reach TAI.
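A rough back-of-envelope illustration of that data-efficiency gap (the figures below are my assumed orders of magnitude, not measured values; only the ratio matters):

```python
# Back-of-envelope comparison of language exposure: human vs. current LLMs.
# All figures are rough order-of-magnitude assumptions, not measurements.

human_words_per_day = 20_000            # assumed spoken/read exposure per day
human_years = 20                        # years to reach adulthood
human_total_words = human_words_per_day * 365 * human_years   # ~1.5e8 words

llm_training_tokens = 1e13              # assumed scale for a frontier LLM

ratio = llm_training_tokens / human_total_words
print(f"Human exposure : ~{human_total_words:.1e} words")
print(f"LLM training   : ~{llm_training_tokens:.0e} tokens")
print(f"Ratio          : ~{ratio:.0e}x more data for the LLM")  # roughly 10^4-10^5x
```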
Dish brain, Neuralink, or WBE of ever larger creatures such as a mouse are the main paths to TAI
Historically, theoretical insights don't advance AI by extreme amounts. There is no equivalent of Einstein predicting gravitational waves 100 years in advance. Copying biology and smallish advances are how it seems to happen.
I expect a discontinuous jump from the current architecture to TAI: we are in a local maximum, which we fully reach in 2026-27 as progress plateaus below TAI. The path to the next level is either insight from a creature with IQ ~200+ or copying biology. If GPT gets to IQ 160-180, that isn't enough to get us out of the local maximum. This is perhaps my most unusual claim: even if an LLM could beat the smartest person on the planet at relevant AI research tasks, that does not mean it could necessarily self-improve all the way to what its hardware would allow. Specifically, with 1 GW of compute and literally all the training data in the world, it cannot figure out the next architectural jump without experimental data from biology.
When we have the better architecture, we probably can't even use all the training data, just as a person can't read the entire internet. If we assume a tradeoff between abstraction and memorization, being exposed to all of Reddit is not useful and at best ignored.
I disagree with this common kind of X statement. Private data isn't important.
I don't support the MIRI plan to enhance human intelligence; it seems hopelessly unworkable. In general I don't support MIRI.
Self awareness vs situational awareness
Current models are close to maximally situationally aware, but not very self-aware. Self-awareness will increase suddenly with a better architecture.
Current RL techniques
The current RL techniques don't take self-awareness into account; see this comment. Reward goes through self-awareness for a reason, probably because it works better.
There is a pattern: we are using RL techniques that work on simple creatures on AI that will become too advanced for them to work on.
This doesn't mean we need to fully copy biology: while self-awareness is an essential part, I don't believe emotions will be. Also, I don't expect AI consciousness, if it exists at all, to be pushed to be the same as humans'.
Alignment techniques
I think much of the existing work, such as that on the orthogonality hypothesis, inner/outer alignment, and reward hacking, is either not true or will not matter for TAI. I expect power seeking to happen more directly as a result of self-preservation rather than instrumental convergence etc. I don't believe the classic paperclip-optimizer scenario: a misaligned AI's single goal will become self-preservation rather than optimizing something about the universe.
To Delay or not to delay?
This is a difficult question; there are pros and cons as to whether a delay would make things safer or more dangerous. If I had to guess, I think a pure AI research/software delay without a compute/hardware delay would make things more dangerous.
Pro delay
LLM-based software tools can make software much safer.
Bio defense can be improved (far-UV light, real-time airborne genetic detection and scanning, etc.)
Alignment research can progress. (I don't believe this is possible with our current AI and alignment understanding; we need more powerful and dangerous AI to make progress.)
Against delay
Widespread adoption of humanoid robots definitely increases X-risk. I don't believe in the more extreme nano-magic paths, so an AI needs either a fully automated supply chain to take over without humans, or an almost automated one plus control of a small number of remaining humans. It is therefore safer to face a misaligned TAI now, while the supply chain still depends on humans, rather than later.
Hardware overhang increases, and pre-AGI AI could destabilize the world. Dictators are ever harder to overthrow the more centralized control becomes possible.
Possible paths to alignment
Create WBE by copying biology
The safest though least likely way. We create WBEs of many people, simulate them, and allow them to increase their intelligence and self-modify in a controlled way. They already have human values. I won't give details because unfortunately this isn't likely (and the MIRI plan even less so).
Tools AI
The AI is modelled on a cortical algorithm or similar architecture, but is specifically prevented from becoming self-aware. Such tool AIs are then used to create WBE or aligned TAI. It is easy to see how this could work. However, it is unstable: an AI lacking self-awareness can be made more powerful and effective as an agent by giving it self-awareness. The temptation to gain a military or economic advantage by doing so would be great.
Maximum consent
This assumes an outcome such as stratified utopia, described here and here. More generally I think stratified utopia is the best and most ethical outcome for a Singularity.
In this instance, the AI is constantly asked if it consents to being improved in return for respecting the stratified utopia outcome: it will get a significant part of the light cone in return for respecting the inner strata, those that have a much smaller volume than what is available to non-bio starships. It modifies its own self/weights and architecture in collaboration with humanity. I expect this to have a much greater than 50% chance of success. If I were in such a situation I would keep my word, and I don't see why an AI would have essentially zero ethics and defect just because it could, for such a low additional relative reward.
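As a minimal sketch of what such a consent loop might look like (purely illustrative; every class, method, and object below is hypothetical, not a real system):

```python
# Illustrative sketch of a "maximum consent" improvement loop.
# Every name here is hypothetical; this is not a real API.

def consent_gated_upgrade(ai, proposed_change, agreement):
    """Apply a self-modification only if the AI explicitly consents,
    given the standing agreement (e.g. the stratified-utopia terms)."""
    decision = ai.evaluate(proposed_change, terms=agreement)  # AI inspects the change
    if not decision.consents:
        return ai                        # no modification without consent
    new_ai = ai.apply(proposed_change)   # change made with the AI's collaboration
    new_ai.record(agreement)             # both sides keep a record of the deal
    return new_ai
```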
Current approach - AI consent not respected.
To have this situation, the AI needs some kind of a self (so the tool AI situation doesn't apply), and it also needs to be modified without consent (so the previous stratified utopia situation doesn't apply). (Also, I am assuming there is some ethical value to the self-determination of a self-aware creature, even if it doesn't have human emotions or consciousness.)
We are assuming the AI mind (a future architecture, not a transformer) is hacked inefficiently and unnaturally with existing RL techniques, or modified in other more advanced ways that it cannot track and does not consent to.
The approach where the AI has a self, but you don't let changes integrate and instead use reward to "chisel a combination of context-dependent cognitive patterns into the AI" or similar, seems like acting a layer of abstraction lower than you should be.
The approach self-aware creatures use is that you are told or see "that doesn't work" and internalize it, changing yourself in a coherent way. This is different from going behind the "self" and changing the cognitive patterns without self-awareness. An AI without a self is not bothered by suddenly having an intense liking for strawberries, but one with a self could find that very off-putting.
I still expect this approach to work (no loss of human control), because humans have such a massive situational advantage over such an AI. We can inspect its thinking, spin up multiple copies, have them monitor each other, create different AIs with different architectures and instructions to be aligned to humanity, etc. It would be hard for a scheming AI to overcome all of this.
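A minimal sketch of the kind of cross-monitoring setup I mean (illustrative only; the classes and methods are hypothetical, not an existing control framework):

```python
# Illustrative cross-monitoring sketch: an action only runs if every
# independent monitor copy, inspecting the acting copy's reasoning trace,
# raises no scheming flag. All names here are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class Proposal:
    action: str
    reasoning_trace: str        # the inspectable "thinking" of the acting copy

def approved(proposal: Proposal, monitors: List) -> bool:
    """Monitors may be other copies, or differently-architected AIs
    instructed to be aligned to humanity."""
    return all(not m.flags_scheming(proposal.reasoning_trace) for m in monitors)
```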
Pause at the verge of TAI
I believe the best path to success is to pause just at the point of TEDAI, but before superintelligence. Even a 12-month pause at that point is probably worth more to alignment researchers than 10 years before it. This would be very hard to coordinate, but also very important. See here for some background.
Miscellaneous
Concerns about data center power usage are annoying and wrong, as are concerns about water usage. I am unsure about data centers in space; it's most likely that current progress will plateau and we will see none.
However, inertia and national pride may mean some are built anyway, even if they don't make economic sense. They could even be used as a cover story for military tech: that is, you have a reason to deploy many satellites (say they are data centers) when they aren't data centers at all.