AR Might be the Key to BCI (and eventually, Emulation)

by ixotope
17th Sep 2025
Linkpost from ixotopic.substack.com
This is a linkpost for AR Might be the Key to BCI from my Substack.


Summary: Pairing augmented reality with yet-unrealized advances in noninvasive, high-resolution brain scanning could be the key to collecting data at the scale needed to enable robust brain-computer interfaces. I see this as especially useful in learning a shared latent space (i.e., an embedding interface) between neural data and modalities such as audio, vision, text, or even behavior. As the component technologies mature and become widespread, this could eventually open multiple paths to human emulation.

I was inspired to collect these thoughts more concretely after finally clicking through the hypertext in The Intelligence Curse and coming across Richard Ngo’s piece The Gentle Romance in the Asimov Press.

Setting the Stage (in the Theater of the Mind)

In The Gentle Romance, the narrator's path to post-humanism begins with a pair of augmented reality glasses, a device not too dissimilar to nascent offerings from the likes of Meta, Xreal, and Viture. The spectacles offer a private virtual workspace, immersive gaming, and, critically, an AI assistant that learns from the user's habits to better assist them.

In the story, the narrator quickly becomes frustrated at his speed of communication with the assistant. The sluggishness of text pales in comparison to the more abstract visual communication system demonstrated by an old college friend: a feature that allows the assistant to superimpose patterns of light in his periphery, initially pairing them with a verbal translation to bootstrap comprehension. After enabling it, the narrator becomes adept at interpreting the shapes and patterns, the pictorial pidgin coming to serve as an intermediary between the language of his mind and the AI's incomprehensible embedding space. Eventually, hemodynamic brain scanning technology becomes mature enough for commercial applications; the assistant, or "meta-self" (previously a limited model of the narrator's cognition extrapolated from his actions), can now "read thoughts in real-time." In the short run, this gives the narrator greater command over his tools. In the long run, it is the first step towards simulating mental states learned straight from the source.

It seems to me that reading the mind is contingent upon getting a good signal from the brain, associating it with a stimulus or an intent, and doing this a lot. I think AR is the most viable pathway to this end.

What is a BCI?

A brain-computer interface, as the name suggests, is a device enabling information transmission between the brain and external digital systems. Clinically, BCIs can be categorized by their level of invasiveness. At one extreme, invasive BCIs (such as those under development by Neuralink leveraging implanted cortical threads) are installed via deep surgical intervention in the brain. In exchange for greater measurement density and a high SNR, devices like these contend with challenges posed by the physical plasticity of gray matter and higher risks of complications. At the other extreme, non-invasive solutions (potentially involving extra-cranial electrodes, ultrasound, or calcium imaging) are likely to be safer (or at the very least perceived as such) and more convenient for the user, at the expense of difficulty in detecting a usable signal at all, a property that makes such devices challenging to build with current technology. (In between these two extremes, a semi-invasive BCI could be something like an electrode sheet implanted between the skull and the brain, as in electrocorticography, offering the potential for a better SNR without the need to physically enter the brain itself.)

From a functional standpoint, I find it useful to categorize these devices by the direction of information transfer. Most BCI research to date has involved what I term brainreaders, which extract signals from the brain. When these signals are processed and interpreted for the purpose of controlling or influencing external devices— for example, to move a computer cursor or reconstruct the image the user is envisioning— the brainreader becomes a mindreader.

On the other hand, a brainwriter would be an external device that induces electrical activity in the brain. Still mostly the stuff of science fiction, this could potentially be used to effect some mental change in the user; for instance, mocking sensory input to induce a given perceptual experience, or triggering memory formation to impart knowledge or transmit skills. I refer to such a device as a mindwriter.

A bidirectional BCI is some combination of these types and would enable two-way information transfer between the brain and external devices.

AR → Mindreaders

How might a scanner learn to read thoughts? Through a physicalist lens, the mind's inner state is a direct product of the physical behavior of the substrate. That is, brain states and the experience of the brain-owner are the same up to some mapping; to make a brainreader into a mindreader, we just need to learn the dictionary that translates between the two. From a functionalist perspective, the same qualia could arise from a variety of physical states; what matters about any given state is its function within the causal interplay between states. In this interpretation, reading the mind would require the context of a(n arbitrarily long) history of prior states to understand how the instantaneous experience has arisen.

Despite their differences, both perspectives are materialist, rejecting any dualism between mind and matter. As such, though they may disagree on the details of implementation, proponents of either theory should agree that a sufficiently powerful ML model ought to be able to translate between thoughts (inner states represented as sequences of brain states) and the conditions giving rise to those states.

The transformer architecture is a natural choice for converting between sequences, given sufficient model size and training data. It stands to reason that pairing a large corpus of [image, audio, behavior, etc.] sequences with time-matched neural data could enable training a transformer to convert between the two. These corpora could be produced by combining a brainreading device with an implement that records the external conditions giving rise to the detected activity. AR is a compelling choice here; a non-intrusive pair of glasses could provide:

1. always- (or usually-)on binocular audio/video recording (i.e., lots of data);
2. the ability to directly expose the wearer to a broad variety of content they might not otherwise see, hear, or read, widening the space of observable brain states (i.e., diversifying data); and
3. direct capture of user behavior (e.g., rich integration with the work environment, eye tracking for focus detection), laying the groundwork for predicting intent from brain activity.
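To make this concrete, here is a minimal sketch (in PyTorch) of what such a translation model might look like. Everything here is an illustrative assumption rather than a recipe: the channel count, the embedding widths, and the idea of regressing onto precomputed stimulus embeddings (e.g., CLIP-style image features) are all placeholders.

```python
import torch
import torch.nn as nn

NEURAL_CHANNELS = 256  # assumed channel count of a hypothetical scanner
EMBED_DIM = 512        # shared model width
STIM_DIM = 768         # assumed width of precomputed stimulus embeddings

class NeuralToStimulus(nn.Module):
    """Seq2seq transformer from neural recordings to stimulus embeddings."""
    def __init__(self):
        super().__init__()
        self.neural_proj = nn.Linear(NEURAL_CHANNELS, EMBED_DIM)
        self.stim_proj = nn.Linear(STIM_DIM, EMBED_DIM)
        self.transformer = nn.Transformer(
            d_model=EMBED_DIM, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.head = nn.Linear(EMBED_DIM, STIM_DIM)

    def forward(self, neural, stim_prefix):
        # neural: (batch, time, channels); stim_prefix: (batch, steps, STIM_DIM)
        src = self.neural_proj(neural)
        tgt = self.stim_proj(stim_prefix)
        # Causal mask so each stimulus step only attends to earlier steps.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.head(self.transformer(src, tgt, tgt_mask=mask))

# One training step on a time-matched (neural, stimulus) pair:
model = NeuralToStimulus()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
neural = torch.randn(8, 1000, NEURAL_CHANNELS)  # e.g., 1 s at 1 kHz
stim = torch.randn(8, 32, STIM_DIM)             # aligned stimulus embeddings
pred = model(neural, stim[:, :-1])              # teacher-forced prefix
loss = nn.functional.mse_loss(pred, stim[:, 1:])
opt.zero_grad(); loss.backward(); opt.step()
```

A contrastive objective (à la CLIP) over the shared latent space would be a natural alternative to direct regression; the point is only that the data format AR provides, long time-aligned paired sequences, is exactly what this family of models consumes.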

While the development direction for such AR devices is fairly conceptually clear, with remaining challenges increasingly boiling down to engineering problems, the path forward for precise brainreaders is muddier. I summarize the present state and imminent challenges in the (§) Scanners portion of the BCI Tech Tree section below.

Aside: Universal vs. Personal Mindreading

If I collect a bunch of data about how my brain responds to different stimuli and invokes different actions, can this be easily translated to another user? My guess is that one-to-one correspondence is unlikely. We have yet to learn much about how the brain represents information, but given its remarkable plasticity (for example, its reported ability to maintain normal social functioning with over 90% of its mass missing), I see no reason to believe that different individuals' brains converge on similar or identical representations.

In general, though, there may be some broad structural similarities across most typical brains that can be exploited to bootstrap the mindreading process. For example, one could imagine a foundation model trained on a variety of minds that needs only a small amount of supplementary data to fine-tune to a new user, as sketched below.
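A minimal sketch of that personalization step, continuing the illustrative model above (the adapter design and all names are hypothetical):

```python
import torch
import torch.nn as nn

foundation = NeuralToStimulus()   # imagine this pretrained on many subjects
for p in foundation.parameters():
    p.requires_grad_(False)       # the shared representation stays frozen

# Per-user adapter: a cheap remap of the new wearer's channel layout onto
# the channel space the foundation model was trained against.
adapter = nn.Linear(NEURAL_CHANNELS, NEURAL_CHANNELS)
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)

def finetune_step(neural, stim):
    """One gradient step on the user's own (small) paired dataset."""
    pred = foundation(adapter(neural), stim[:, :-1])
    loss = nn.functional.mse_loss(pred, stim[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Only the adapter's parameters are trained, which is why the supplementary dataset can be small relative to what the foundation model consumed.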

Mindreaders → Mindwriters

Say we have used our collected data to train a model capable of distilling some amount of neural data into a predicted image or token sequence or action with some degree of accuracy. It would seem that reversing the direction of translation would allow us to determine what signals need to be injected back into the brain to induce the desired images, inner speech, or (scarily) actions. So is a mindreading model tautologically a mindwriting one, too?

My intuition is that this is not necessarily the case, and that the efficacy of mindwriting will depend not just on the spatiotemporal fidelity of the writing device itself, but also on the fidelity of the data used to train the model. A weak analogy might be figuring out the gestalt of an image from which patches of pixels have been removed. It's plausible that half the image could be ablated and one could still tell that it's an ocean scene with a ship; but trying to fill the patches back in, even with an understanding of the overarching scene, will surely produce something which materially differs from the original, unbroken image. In our case, the patchy image is a neural recording where each sample is taken from a group of neurons (rather than having a point per neuron), and the gestalt is the stimulus we are trying to recreate. I suspect the sensitivity to this effect will differ based on the modality we are trying to predict, with more diverse and information-dense modalities being harder to write; e.g., maybe language is easier than audio, which is easier than images.
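One way to make the asymmetry concrete: naively, mindwriting could be attempted by numerically inverting a frozen mindreading model, searching over neural inputs until the model decodes them into a target stimulus. A toy sketch, reusing the illustrative model from earlier (nothing here reflects real hardware constraints):

```python
import torch
import torch.nn as nn

reader = NeuralToStimulus()             # frozen, "trained" mindreader from above
for p in reader.parameters():
    p.requires_grad_(False)

target = torch.randn(1, 32, STIM_DIM)   # the experience we want to induce
neural = torch.zeros(1, 1000, NEURAL_CHANNELS, requires_grad=True)
opt = torch.optim.Adam([neural], lr=1e-2)

for step in range(500):
    pred = reader(neural, target[:, :-1])
    loss = nn.functional.mse_loss(pred, target[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
```

The catch is the null space: many different `neural` tensors reach the same loss, and the coarser the recordings the model was trained on, the larger that null space, mirroring the ablated-image analogy above.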

In general, though, as the technology of brainreading improves, we are likely to see gains in the detail we can extract from neural signals, and can therefore expect more faithful backwards translation.

The other key component here is, of course, the capability of the brainwriting device. The spatial localization of impulses, their density within the brain, the maximum stimulation frequency, and the parts of the brain that are covered are all likely to affect both the fidelity of information that can be written and the way it is interpreted by the brain. For example, the capacity to write detailed memories may depend on the (more technically challenging) ability to rapidly and densely stimulate the hippocampus deep within the brain, whereas mocking reasonable-fidelity audio may be comparatively easy, requiring only shallow stimulation of the auditory cortex.

The Path to Emulation

Copious quantities of stimulus, action, and neural data could be used to build emulations of human behavior and/or cognitive processes in a variety of ways.

The traditional form of mind uploading described in futurist works by Bostrom, Sandberg, Hanson, and others is whole brain emulation (WBE). To create a WBE, one would need to build a model, usually of a particular human brain, obeying the laws of physics. This model could be run in a physical simulator at the appropriate scale of interaction (ranging from neuron-level activations of the connectome all the way down to quantum effects at the most compute-intensive extreme) to approximate the behavior of its physical counterpart.

Due to the intricacies required for even the least granular form of such a model, this is probably not enabled by the progression I have laid out (unless the mindreader/writer are composed of something like neural dust distributed throughout the brain). However, our intensive data collection gives us the chance to approximate this paradigm to some degree.

For instance, the coarsest approximation would be to train a model to predict behavior directly from stimulus. This avoids the "middle man" of individual neural states, saving a lot of computation; however, the result (to me) feels very p-zombie-esque. Instead, one might take a history of past neural states and an instantaneous set of stimuli and train a model to simply predict the next neural state. From there, the mindreader could be used to extract action as a natural auxiliary result of the new state.
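A sketch of that second approach, with all shapes and names assumed for illustration: an autoregressive model over summarized neural states, conditioned on the concurrent stimulus stream, with behavior decoded from the predicted states rather than from the stimulus directly.

```python
import torch
import torch.nn as nn

STATE_DIM = 1024  # assumed width of one summarized neural state
STIM_DIM = 768    # assumed width of one stimulus embedding

class NextStateModel(nn.Module):
    """Predict neural state t+1 from states and stimuli up to t."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(STATE_DIM + STIM_DIM, STATE_DIM)
        layer = nn.TransformerEncoderLayer(
            d_model=STATE_DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(STATE_DIM, STATE_DIM)

    def forward(self, states, stimuli):
        # states, stimuli: (batch, time, dim), aligned in time.
        x = self.in_proj(torch.cat([states, stimuli], dim=-1))
        t = x.size(1)
        # Causal mask keeps the rollout autoregressive.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        return self.out(self.encoder(x, mask=mask))

# At emulation time: roll the model forward one state at a time, feeding
# each predicted state back in, and hand the states to the mindreader
# (trained separately) to read off the emulated behavior.
```

The emulation's fidelity is then bounded by two models at once: the next-state predictor and the mindreader used to decode its outputs.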

The ethical and philosophical implications of such techniques are immense and have been treated at length by other authors. A few questions:

- Are there differences between deterministic models and quantum (random) models?
- Can behaviors be directly surmised from inputs given enough examples?
- How long a history of prior states is needed to compute the next state?
- Can states be skipped?

BCI Tech Tree

[Figure: a diagram outlining how different data sources, technologies, and models are related.]

There are three initial technologies upon whose maturity much of the picture above depends: AR, brain scanning, and AI. Below I summarize the current state of these technologies and further development needed.

Augmented Reality

The path forward for AR is the clearest. VR, while a very different medium with its own set of challenges and objectives, has been a significant donor of AR-related technologies, including optics, displays, sensors (e.g., accelerometers, outward- and inward-facing cameras), and processors. We have already seen compelling demonstrations of pass-through "HUD-like" glasses, and it appears that all that remains is further miniaturization and improvement in fidelity.

Scanners

Invasive BCIs are surgical in both senses: they offer more precision, and they literally require surgery to install. Electrodes planted directly in the cortex seem the most efficient way to get a clean signal out, offering high spatial and temporal resolution. However, this strategy comes with a number of disadvantages. Neuralink has already run up against challenges posed by the physical malleability of the brain: highly publicized reports indicated that over 85% of its electrode threads retracted shortly after the start of the company's first human trial. The risk of more severe, as-yet-unknown complications, in addition to a potentially steep human cost, also poses a barrier to mass adoption.

In The Intelligence Curse, Luke Drago and Rudolf Laine pose BCIs as a potential example of pro-human AI-enabled technology, and urge that "BCIs should be noninvasive to reduce adoption barriers." Indeed, because non-invasive BCIs involve no direct intracranial intervention, they are likely to be perceived as much safer, lowering the barrier to entry and unlocking mass data potential. The tradeoff is, of course, the difficulty of getting any signal at all.

Artificial Intelligence

To map brain states to text, images, audio, etc., we are likely to need a lot of data. As discussed briefly in (§) Aside: Universal vs. Personal Mindreading, the source and quantity of this data may be contingent upon the universality of brain structure. If there are enough commonalities in brain structure and the information patterns within, we could be in a regime not unlike today's LLMs: train a foundation model, refine it, and provide it to the end user as-is. This more centralized training process reduces the burden on scanning technology, as more capital-intensive methods could be shouldered by institutions. If different users are sufficiently unique, though, there may have to be widespread distribution of inexpensive scanning tech in order for individuals to train their own models.
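For a sense of the volumes involved, a back-of-envelope calculation under loudly assumed parameters (channel count, sampling rate, and precision are all guesses):

```python
channels = 256          # hypothetical noninvasive scanner
sample_rate_hz = 1_000
bytes_per_sample = 2    # 16-bit samples
hours_per_day = 8       # waking hours with the glasses on

bytes_per_day = channels * sample_rate_hz * bytes_per_sample * 3600 * hours_per_day
print(f"{bytes_per_day / 1e9:.1f} GB of raw neural data per user per day")
# ~14.7 GB/day before compression, alongside the time-matched audio/video
# stream; a year of steady wear is on the order of 5 TB per user.
```

Whether that counts as "a lot" depends on which regime we land in: trivial for an institution aggregating many users, but nontrivial to store, move, and train on for an individual.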

Another question is whether continual learning is needed. It is commonly cited as a requirement for advanced AI in the context of human-like adaptability; for example, Dwarkesh Patel believes that many of the major problems with today's LLMs stem from an inability to "get better over time the way a human would". It may be that improving sample efficiency is paramount to reducing hallucination, synthesizing disparate pieces of information, and responding to novel stimuli, all of which would be important for future mindreaders and -writers.

Caveats and Dangers

As expected, the dangers of BCIs scale with the increased capabilities of the underlying technology.

At the lowest level is simply safety. Active sensing might irradiate the brain, frying neurons; attempting to mindwrite with inappropriate inputs could induce a seizure. Implants could be rejected by the immune system or cause an infection. But a functioning, medically safe BCI unlocks a host of other concerns; in particular, privacy and agency.

Privacy

Mindreaders provide a direct link to someone's psyche, a modality ripe for exploitation by bad actors. If not properly encrypted, raw brain data, combined with the transformations needed to render it in a legible format, could be used to steal personally identifiable information such as someone's SSN, home address, bank and payment info, and other sensitive data, just by the victim thinking about it. Even scarier is the possibility of deep personal secrets being extracted for the purposes of public humiliation, bullying, or blackmail.
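The baseline mitigation is to authenticate and encrypt every frame of neural data on-device, before anything leaves the glasses. A minimal sketch using Python's `cryptography` package (key management, the genuinely hard part, is waved away here):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()  # in practice: sealed in a secure element
aead = ChaCha20Poly1305(key)

def seal_frame(frame: bytes, session_id: bytes) -> bytes:
    """Encrypt and authenticate one neural-data frame before transmission."""
    nonce = os.urandom(12)  # must be unique per frame under a given key
    # Binding the session id as associated data lets the receiver reject
    # frames replayed into the wrong stream.
    return nonce + aead.encrypt(nonce, frame, session_id)
```

Encryption in transit and at rest only narrows the attack surface, though; the decoded output still has to live somewhere, and policy around that matters as much as the cryptography.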

Additionally, if BCIs are not intuitive to use and chock-full of safeguards, user error alone could result in snafus that are embarrassing at best and catastrophic at worst; imagine, for instance, having an intrusive thought while texting and mistakenly sending it— or unintentionally slandering someone in a public forum simply by thinking negatively of them at the wrong time.

Agency

Enabling write access to the brain invites a whole new suite of problems. On its face, the most benign is intracranial advertisement, an idea widely parodied in media and yet completely plausible if consumer protections aren’t strong enough. Similarly, social media addiction is already prevalent; the endless flow of content on tap is likely to become all the more compelling— and all the more addictive— when that media is extremely immersive and instantly accessible (see: the experience machine). On the more sinister side, attackers could flood a victim’s mind with disturbing images, sounds, sensations, or full-fledged experiences. And if the motor cortex is in-bounds, a puppeteering attack could have horrific consequences.

Conclusion

In sum, augmented reality coupled with high-resolution, noninvasive brain scanning is a plausible bridge to mindreading and mindwriting— though with significant safety and privacy caveats. In particular, AR solves the problem of scale: continuous, time-synced streams of stimulus, behavior, and neural data allow for translation to be learned. Alone, these technologies could revolutionize the way we live and work; but eventually, the very same technologies and the data they provide could open the door to whole-brain emulation.