Telepathy Is (Algorithmically) Easy

Elliot Callender

Thought-sharing is the easiest approach for intelligence amplification given appropriate hardware. The main risks are psychosis and dissociative symptoms from identity disruption.

I'm around 30% that an implanted group of 10 would actually manage a pivotal act, conditional on hardware being solved.

Speech and text are extremely inefficient. For example, math textbooks are routinely more than one page long.

This sucks! I want the entirety of human hard-science results to pass through my mind at least once. Someone learned each of those concepts, but they can't just copy their Understanding to me.^[1]

Or perhaps they can?

If we can read and write enough neural state, then communication is an unusually friendly target for cognitive augmentation. Unlike most enhancements, it doesn't need (non-hardware) neuroscience breakthroughs in about half of possible worlds from my perspective.

Humans are already exceptionally skilled at communication despite terrible bandwidth. By speaking while learning neuralese, we can use spoken language and feature engineering as training wheels to bootstrap telepathy.

(To be clear, I'm talking about hardware and software to pass carefully-translated brain activity between people. It's not spooky.)

Groups of experts could then share deep understanding in minutes-to-days; I'd wager that, with help from a mathematician, I could understand most of modern algebraic topology in a week instead of a year.

This could go a few ways. We'll start with the most pessimistic success case, which I estimate is the top ~55% of possibilities. (Most of the bottom ~45%, where telepathy entirely fails, are worlds where bootstraps don't scale.)

Say that we have absolutely no idea how to implement any algorithms which aren't scientifically replicated as of mid-2026.

Neurotech labs already translate low-dimensional data for speech, movement, and audio-visual stimuli. So we take thousands of these decoders running at much higher resolution across brain surface, and start by training a model on stimuli from a VR headset and haptic suit.

Figure. Left: computational graph for feature-engineered bootstrapping of the telepathy model's write component. The system learns to convert stimuli into neural activations. Right: the same, for reading states.
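To make the bootstrapping step concrete, here is a minimal sketch under heavy assumptions: neural activity is simulated as a noisy linear function of stimulus features, and both the write (stimulus to activations) and read (activations to stimulus) components are plain ridge regressions. The dimensions, noise level, and names like `fit_ridge` are all hypothetical; a real system would presumably need far richer models and data.

```python
import numpy as np

def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression: W minimizing ||X W - Y||^2 + lam ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def r2(pred, target):
    """Fraction of variance explained, pooled over all output dimensions."""
    resid = np.sum((pred - target) ** 2)
    total = np.sum((target - target.mean(axis=0)) ** 2)
    return 1.0 - resid / total

rng = np.random.default_rng(0)
n_samples, n_stim, n_neural = 2000, 8, 64  # hypothetical sizes

# Synthetic stand-in for VR/haptic stimuli and the activity they evoke.
true_map = rng.normal(size=(n_stim, n_neural))
stimuli = rng.normal(size=(n_samples, n_stim))
activations = stimuli @ true_map + 0.1 * rng.normal(size=(n_samples, n_neural))

train, test = slice(0, 1500), slice(1500, None)
W_write = fit_ridge(stimuli[train], activations[train])  # stimulus -> activations
W_read = fit_ridge(activations[train], stimuli[train])   # activations -> stimulus

write_r2 = r2(stimuli[test] @ W_write, activations[test])
read_r2 = r2(activations[test] @ W_read, stimuli[test])
print(f"write R^2 = {write_r2:.3f}, read R^2 = {read_r2:.3f}")
```

On this toy data both directions fit almost perfectly, which only shows that the pipeline is wired correctly; it says nothing about how well real cortex would cooperate.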

We have a basis. This can decode and re-encode simple stimuli. We now train the model to predict what text this person will write and speak in a few seconds given their current activations; this takes a good bit longer, probably a few months. If scaling broke, this is where I'd expect it to.

Same idea as earlier, but now the model must learn *anticipatory* signals. Text pulled from Wikipedia. Actual delays probably gradually ramp from 1 second to 10 or more seconds.
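A rough sketch of what that delay ramp could look like as a training curriculum. The linear schedule, the function names, and the idea of pairing each activation sample with the token produced a fixed number of samples later are all assumptions for illustration, not a claim about how any lab does this.

```python
def anticipation_delay(step, total_steps, start_s=1.0, end_s=10.0):
    """Linearly ramp the prediction horizon from start_s to end_s seconds."""
    if total_steps <= 1:
        return end_s
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start_s + frac * (end_s - start_s)

def make_training_pairs(activations, text_tokens, delay_s, sample_rate_hz):
    """Pair the activation at time t with the token emitted delay_s later."""
    shift = int(round(delay_s * sample_rate_hz))
    n = max(len(activations) - shift, 0)
    return [(activations[t], text_tokens[t + shift]) for t in range(n)]

# Toy usage: ten time steps sampled at 1 Hz, predicting 2 seconds ahead.
toy_activations = list(range(10))
toy_tokens = list("abcdefghij")
pairs = make_training_pairs(toy_activations, toy_tokens, delay_s=2.0, sample_rate_hz=1.0)
```

The model is then trained on `pairs` built with the current value of `anticipation_delay`, so the target slides further into the future as training proceeds.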

And now, we connect two people using a shared translator model^[2]. They've learned explicit "macros" so it's a light application of will to send thoughts to the other person.
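Since the footnote describes the shared translator as CLIP-like, here is a minimal sketch of the symmetric contrastive objective that CLIP-style models use, written in NumPy. It assumes we have paired embeddings from two people (row i from person A and row i from person B correspond to the same moment or stimulus), which is my assumption about how training pairs would be collected; the temperature value is likewise arbitrary.

```python
import numpy as np

def _log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """CLIP-style loss: row i of emb_a should match row i of emb_b."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_a_to_b = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)

rng = np.random.default_rng(0)
person_a = rng.normal(size=(16, 32))
# Well-aligned pairs versus deliberately mismatched pairs (rows rotated by one).
aligned = symmetric_contrastive_loss(person_a, person_a + 0.01 * rng.normal(size=(16, 32)))
shuffled = symmetric_contrastive_loss(person_a, np.roll(person_a, 1, axis=0))
print(f"aligned: {aligned:.3f}, shuffled: {shuffled:.3f}")
```

Minimizing this loss pulls matching states from the two people together in the shared space and pushes mismatched ones apart.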

Figure. A simple intent-to-transmit decoder allows deliberate, controlled communication, though it's still extremely lossy.

This goes pretty terribly at first. Very imprecise. We keep the signal gain low to reduce weird effects (particularly psychosis, which I'll get to later).
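As a toy illustration of "deliberate transmission at low gain", the sketch below forwards a message only when a (hypothetical) intent decoder is confident, and scales it down before it reaches the receiver. The threshold and gain values are placeholders, not recommendations.

```python
import numpy as np

def gated_transmit(message, intent_prob, threshold=0.9, gain=0.1):
    """Forward a scaled-down message only when decoded intent is confident."""
    message = np.asarray(message, dtype=float)
    if intent_prob < threshold:
        return np.zeros_like(message)  # nothing leaves the sender
    return gain * message

thought = np.array([1.0, -2.0, 0.5])
sent = gated_transmit(thought, intent_prob=0.97)      # confident intent: sent at low gain
withheld = gated_transmit(thought, intent_prob=0.40)  # ambiguous intent: suppressed
```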

The pair simply talk about interesting things together. As humans do, they begin to build stronger models of each other; neuralese becomes increasingly useful for refining communication.

Figure. The external verbal refinement loop. In this example, the sender (left) is transmitting the molecular structure of paraxanthine, which the receiver (right) interprets as caffeine.

After ~4 months of this, the pair now has much better bandwidth than unaided speech. It's more efficient to share learned insights than to learn independently.

Figure. As earlier, but now the refinement loop has tightened, being mostly nonverbal. "Where am I wrong about this being the chemical I canonically associate with coffee?" -> "This red atom is replaced with a hydrogen."

In more extreme hypotheticals (the upper 10% by my estimate), after about a year, they're better thought of as one entity than two. As typical brains split computation between hemispheres, so too the minds fluently delegate fractional thoughts.

Scaling the number of people gives nearly linear returns^[3]; we'd need router minds, but beyond that, scaling doesn't have a hard limit.

Alright, what if we know the brain's local learning algorithm and can do whatever extra cortical mass would do?

We could then train the translator much more efficiently; after pretraining to convert to a blurry common language, we run the translator at much higher learning rate to reduce local error.

As in, we make the translator convert messages into whatever each mind is asking for.
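A toy version of this, assuming a linear translator and a receiver who can signal exactly what it "asked for": starting from a blurry pretrained map, online least-mean-squares updates shrink the local error, and a higher learning rate gets there much faster. Everything here (the linear setup, `final_error`, the specific rates) is a hypothetical stand-in for whatever the brain's actual local learning rule turns out to be.

```python
import numpy as np

def final_error(lr, steps=500, dim=16, seed=0):
    """Run online LMS updates; return mean squared error over the last 50 steps."""
    rng = np.random.default_rng(seed)
    # The map this particular receiver actually wants.
    ideal = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    # "Blurry common language": pretrained translator that is close but not exact.
    translator = ideal + 0.3 * rng.normal(size=(dim, dim)) / np.sqrt(dim)
    errors = []
    for _ in range(steps):
        sender_state = rng.normal(size=dim)
        requested = sender_state @ ideal
        error = sender_state @ translator - requested
        # Gradient of 0.5 * ||error||^2 with respect to the translator.
        translator -= lr * np.outer(sender_state, error)
        errors.append(float(np.mean(error ** 2)))
    return float(np.mean(errors[-50:]))

slow = final_error(lr=0.002)
fast = final_error(lr=0.02)
print(f"slow learner: {slow:.2e}, fast learner: {fast:.2e}")
```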

Thus we needn't wait for the two humans to become fluent in neuralese. The translator can adapt much quicker than human minds. Bottlenecks here are mostly psychological.

And what about the case where we can dramatically improve memory consolidation?

Here as well, we can probably accelerate translator convergence. Unlike most cognition, I strongly suspect that cross-human neuralese benefits (accounting for resources used) from strategically written replay code;

person Q was thinking P and then said something which resulted in idea K

seems like it could be pretty effectively scaffolded with some custom-built tools.

Alright, but beaming stimuli into my mind sounds a lot like hallucinations! I don't have agency over what I'm "thinking".

This is a misconception; in humanlike intelligences, "control" is the result of lots of local computations with no central deciding entity. Those processes would probably not implode if they merged into one monolithic entity, although "I" would be less well-defined.

But the process which calls itself a me will still be disrupted by this change, and we don't want a crazy superintelligence pointed at human values.

So, at minimum, each person has control over sending and receiving neuralese.

Frequency-coded working memory gives a good inductive bias for message-passing. "Person X is thinking Y" goes on one channel, where "person X" and "Y" are flexibly-bound preexisting circuits.^[4]

Working memory might be very similar to FM radio, in which case members of a small group could "tune in" to others' broadcast channels.
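To illustrate the "tune in" idea, here is a small sketch of frequency-division multiplexing: each sender's message rides on its own carrier frequency, all carriers are summed on a shared medium, and a receiver recovers one sender by projecting onto that sender's carrier. Note that this uses amplitude modulation rather than true FM, purely because it is the simplest thing that shows channel separation; the carrier frequencies and payloads are made up.

```python
import numpy as np

fs = 1000.0                    # samples per second (hypothetical)
t = np.arange(int(fs)) / fs    # exactly one second of samples
carriers_hz = {"X": 40.0, "Y": 70.0}   # one carrier per sender (hypothetical)
payloads = {"X": 0.8, "Y": -0.3}       # scalar "message" from each sender

# Shared medium: every sender's modulated carrier summed together.
broadcast = sum(payloads[k] * np.cos(2 * np.pi * f * t) for k, f in carriers_hz.items())

def tune_in(signal, carrier_hz):
    """Coherent demodulation: project the mixed signal onto one carrier."""
    reference = np.cos(2 * np.pi * carrier_hz * t)
    return 2.0 * np.mean(signal * reference)

recovered = {k: tune_in(broadcast, f) for k, f in carriers_hz.items()}
print(recovered)
```

Because the carriers are orthogonal over the window, each receiver gets back only the channel it tuned to, even though both messages share one signal.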

We'd probably also include a loss term in the translator for raw sensory and motor signals, since these cause the worst subjective loss-of-agency feelings (sensory / movement data is mostly irrelevant to communication anyway).
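One simple way such a loss term could look, assuming the translator's output channels can be labeled as sensory/motor or not: add a penalty on any energy the translator writes into the labeled channels. The mask, the weighting, and the squared-error form are all my assumptions.

```python
import numpy as np

def translator_loss(output, target, sensorimotor_mask, penalty_weight=1.0):
    """Reconstruction error plus a penalty on energy in sensory/motor channels."""
    recon = np.mean((output - target) ** 2)
    leak = np.mean((output * sensorimotor_mask) ** 2)
    return recon + penalty_weight * leak

target = np.array([1.0, 1.0, 0.0, 0.0])
mask = np.array([0.0, 0.0, 1.0, 1.0])   # last two channels are sensory/motor
clean = translator_loss(np.array([1.0, 1.0, 0.0, 0.0]), target, mask)
leaky = translator_loss(np.array([1.0, 1.0, 0.5, 0.5]), target, mask)
print(clean, leaky)
```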

I'm around 75% confident that these combined approaches would prevent first-order hallucinatory and psychotic effects, and around 80% confident, conditional on avoiding acute psychosis, that we'd also avoid second-order (learned, more chronic) psychosis.

To restate:

Bootstrap the decoder using cheap data like stimulation and writing/speech so that augmentees can communicate anything useful at all; we want it to at least be coherent signals they're sending.
Augmentees talk, lots, for a long time, while simultaneously trying to send their thoughts through the neuralese channel.
Humans are pretty damn good at communication for having such trash bandwidth; so the augmentees get better at communicating much faster than we'd expect from performance on other tasks. There's a tight feedback loop of "what's the person actually saying?" which accelerates this much better than it would if they just worked on challenges together without speaking.
As this loop closes, it starts to close faster since they're now thinking more than speaking at each other; feedback loops are nearly thought-speed.

Out of the four approaches I've covered, I'm most confident that neuralese/telepathy is tractable with sufficient hardware.

Which brings us to hardware!

^[1] This is one reason why bureaucracies aren't even vaguely superintelligent entities, despite often being composed of many individually very smart people.
^[2] This architecture (CLIP) is used in multimodal embedding for some tasks like text-conditioned image diffusion and AI-guided molecular search.
^[3] By the time linearity is saturated, the group is decidedly a superintelligence.
^[4] Also note that, at group sizes where routing becomes a bottleneck, working memory items are probably the most interesting things to broadcast; they've been selected by the augmentee's cognition to be most relevant to whatever's happening.
^[5] For example, broadcast storms.

Say that we have absolutely no idea how to implement any algorithms which aren’t scientifically replicated as of mid-2026. ... We now train the model to predict what text this person will write and speak in a few seconds given their current activations

Is this an algorithm which is scientifically replicated as of mid-2026, translating what will be spoken seconds in the future? That does not match my understanding of the field and I would be interested in a link to such research.

~~Yes, with similar accuracy (45% vs 68% in this low-res study) to instantaneous phoneme decoding:~~

Meanwhile, the area 55b arrays, and the dorsal 55b array in particular, appeared to encode the longer units of language, short sentences and sentences (i.e., those with contextual information), much better than phonemes and words, especially during the reading phase (Figure 5B).

The translation accuracy and precision in that study is quite unimpressive; as I mentioned though, resolution makes an enormous difference:

Enabled by these high-resolution recordings, our study participant—who can no longer speak intelligibly owing to amyotrophic lateral sclerosis—achieved a 9.1% word error rate on a 50-word vocabulary (2.7 times fewer errors than the previous state-of-the-art speech BCI2) and a 23.8% word error rate on a 125,000-word vocabulary (the first successful demonstration, to our knowledge, of large-vocabulary decoding). Our participant’s attempted speech was decoded at 62 words per minute, which is 3.4 times as fast as the previous record

This is with 256 total electrode channels. The tech I'm proposing has about a million times this resolution.

The linked study's 68% accuracy figure is on an exercise predicting which one of ten ~4 word phrases the subject has been cued to speak.

I find it unreasonable to call it "the most pessimistic" way things could go when you extrapolate that to "We will be able to read any novel improvised sentence out of people's brains faster than they can speak them." I can imagine a scenario much more pessimistic than that.

I see (agree), was misreading the decoder architecture. Will amend this post when I get back to my laptop.

The original study had two different architectures; one decoded phonemes and matched to the nearest of 50 words, while the other was not phonemic, matching only ~10 phrases. I completely missed this architectural gap for the first 20 minutes after Ninety-Three's second response.

Phonemic decoding seems to scale extremely well; going from 50 words to 125,000 words raises the word error rate only ~2.6x (9.1% to 23.8%). I expect anticipatory / semantic decoding to scale worse, but not extremely poorly.

I agree that 68% and 45% accuracy are terrible, especially on a 50 word vocabulary. The 68% figure was to contextualize the 45% anticipatory accuracy; to show that anticipation doesn't cause a dramatic accuracy hit, per your original comment.

Then we see that improved methods at 256-electrode resolution (second study) bring accuracy up to ~76% at a 125,000-word vocabulary.
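For anyone who wants to check the arithmetic, here it is spelled out using only the figures quoted above. Treating "accuracy" as one minus word error rate is my simplification, and the last line just multiplies out the "million times" resolution claim.

```python
wer_small_vocab = 0.091    # word error rate, 50-word vocabulary
wer_large_vocab = 0.238    # word error rate, 125,000-word vocabulary

error_ratio = wer_large_vocab / wer_small_vocab   # how much worse errors get
vocab_ratio = 125_000 / 50                        # how much larger the vocabulary is
word_accuracy_large = 1.0 - wer_large_vocab       # simplification: accuracy = 1 - WER

channels = 256
proposed_channels = channels * 1_000_000          # "about a million times this resolution"

print(f"vocabulary grows {vocab_ratio:.0f}x, error rate grows {error_ratio:.2f}x")
print(f"large-vocabulary accuracy ~ {word_accuracy_large:.1%}")
print(f"proposed channel count ~ {proposed_channels:,}")
```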

So what I'm extrapolating from this is that, given 2023 SOTA, anticipatory accuracy on ~125,000 word decoding should be ~60-75%. I don't see why having even a mere hundred times the resolution should get less than 95% accuracy on priors?

Also, the study had a small *test* set, but that's not the same as *training* on only 10 phrases. Very different statements about underlying capacity.