The purpose of this post is to sketch some ways that Brain Computer Interface (BCI) technology might help with various AI alignment techniques. Roughly, we can divide the strategic relevance of BCI technology into three broad categories.[1]

  1. Enhancement. BCI technology could enhance human intelligence - for example by providing new sensory modalities, or augmenting cognition. [2]

  2. Merge. BCI technology could enable a “merge” between AIs and humans. This is advocated by, among others, Sam Altman and Elon Musk, and is the stated raison d'être of Neuralink:

“I think if we can effectively merge with AI by improving the neural link between your cortex and your digital extension of yourself, which already...exists, just has a bandwidth issue. And then effectively you become an AI-human symbiote. And if that then is widespread, with anyone who wants it can have it, then we solve the control problem as well, we don't have to worry about some evil dictator AI because we are the AI collectively. That seems like the best outcome I can think of.” -Elon Musk, interview with Y Combinator (2016) [3]

  • On these proposals, humans are not merely enhanced - in some radical sense, humans merge with AI. It’s not entirely clear what these “merge” proposals mean or what merging would look like (Niplav: “It seems worrying that a complete company has been built on a vision that has no clearly articulated path to success.”), and "merge" as an alignment strategy seems to be quite unpopular in the AI safety community. In future work, I’d like to clarify merge proposals more.
  3. Alignment aid. BCI allows us to get data from the brain that could improve the effectiveness of various AI alignment techniques. Whereas enhancement would indirectly help alignment by making alignment researchers smarter, alignment aid proposals are about directly improving the techniques themselves.

This post is about category 3. In conversation, several AI safety researchers have mentioned that BCI could help with AI alignment by giving us more data or better data. The purpose of this post is to sketch a few ways that this could go, and prompt further scrutiny of these ideas.

1. Types of Brain Computer Interfaces

Various technologies fall under the term “brain computer interface”. The commonality is that they record neural activity and transmit information to a computer for further use -- e.g., controlling an electronic device, moving a prosthetic arm, or playing pong. In April 2021, a prominent Neuralink demo showed a macaque monkey playing 'mind pong' using “a 1,024 electrode fully-implanted neural recording and [Bluetooth!] data transmission device”.

Some BCIs only “read” neural signals and use them, as in the ‘mind pong’ demo, while other BCIs also involve “writing” to the brain. While both reading and writing would be necessary for enhancement and merge, I will assume in this post that “reading” capabilities alone would suffice for the alignment aid proposals.

In assessing the state of BCI technology, we can look at three main criteria:

  • Invasiveness: do we have to (e.g.) drill a hole in someone’s skull, or not?
  • Resolution: how fine-grained is the data, both temporally and spatially?
  • Scale: how much of the brain can we record from?

Different kinds of BCI, each with different ways of being relevant to AI, score differently on these measures. Non-invasive techniques are, as you would imagine, more common, and it is non-invasive techniques that are used in current commercial applications. From “Progress in Brain Computer Interface: Challenges and Opportunities” (2021):

Non-invasive BCI exploiting EEG are most common, although more recently, functional near infrared spectroscopy (fNIRS) (Matthews et al., 2007), magnetoencephalography (MEG) (Fukuma et al., 2016), functional magnetic resonance imaging (fMRI) (Kaas et al., 2019) and functional transcranial Doppler ultrasonography (Faress and Chau, 2013; Lu et al., 2015; Khalaf et al., 2019) have been exploited.

Kernel, one of the most prominent BCI companies, has two products using non-invasive techniques: Kernel Flow, which uses functional near-infrared spectroscopy (fNIRS) (see here for a demo), and Kernel Flux, which uses MEG. The resolution of non-invasive techniques is limited by hard physical constraints, given the fundamental difficulty of listening in on individual neurons when there is scalp, fat, and skull in the way. With current technology, invasive techniques are necessary (though not sufficient) for recording from individual neurons:

In contrast, invasive intracortical electrodes (Pandarinath et al., 2017) and electrocorticography (ECoG) (Kaiju et al., 2017) have been used, providing a superior signal-to-noise ratio and better localization of brain activity.

But of course invasiveness comes with a variety of problems of its own: greater cost and risk, and the body's eventual rejection of implanted devices.

It seems that for significant human augmentation or for “merge”-like scenarios, we would need not just improvements to current methods, but breakthroughs in implantation techniques and materials.[4] Alignment aid proposals, in contrast, might be possible with currently available non-invasive BCI. That said, I’m not at all clear on what levels of scale and resolution are necessary for these alignment proposals to be feasible, or how their usefulness would scale with resolution.

2. BCI for alignment

Most of these alignment aid ideas come from conversations with Evan Hubinger. Any errors or unclarity are due to my transmission.

High-level picture

A key component of various AI alignment proposals is teaching AIs something about humans: how humans think, or why we do the things we do, or what we value. AIs have a limited amount of data from which to learn these things. BCI technology might improve the quantity and quality of data available for learning.

Quantity: AI models will need a ton of data to learn effectively. And the amount of information one can extract from, e.g., the sentences a human is generating is a lot lower than the amount one could extract if one also had information about what's happening inside that human’s brain.

Quality: if training data includes detailed information about how humans think, then we can train models not just to output the same sentences that a human would, but to think about those sentences like a human would. (A perennial worry is that we will train models that don't think in the ways that humans do, but instead latch onto inhuman concepts.) But having data which comes directly from human brains could help us train models to think in a more human-like way.
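One way to make "think about those sentences like a human would" measurable is representational similarity analysis (RSA), a standard technique in neuroscience. This is our illustration, not something proposed in the conversations above: present the same stimuli to a model and to a person, build a dissimilarity matrix over stimuli for each, and correlate the two. The sketch assumes per-stimulus model activations and BCI response patterns have already been extracted; those inputs and the function names are hypothetical.

```python
import numpy as np


def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the response patterns evoked by each pair of stimuli.

    responses: (n_stimuli, n_units), one row per stimulus.
    """
    return 1.0 - np.corrcoef(responses)


def rsa_score(model_acts: np.ndarray, brain_acts: np.ndarray) -> float:
    """Pearson correlation between the off-diagonal upper triangles of
    the model RDM and the brain RDM (Spearman is also commonly used).

    Both inputs need the same stimuli in the same row order, but may have
    different numbers of units (model neurons vs. recording channels).
    """
    a, b = rdm(model_acts), rdm(brain_acts)
    upper = np.triu_indices_from(a, k=1)  # skip the zero diagonal
    return float(np.corrcoef(a[upper], b[upper])[0, 1])
```

A score near 1 means the model groups these stimuli the way the recorded activity does; a score near 0 means no detectable relationship. Because RSA compares geometry rather than raw units, it sidesteps having to map model neurons onto recording channels one to one.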

So at a very high level: BCI might give us more training data and better training data for working with humans. A person sitting in a room for 30 minutes answering questions on a computer would generate far more data with a BCI.

(Question from Evan Hubinger: “You can probably estimate empirically, or with careful Fermi estimation: in 30 minutes of a person with some BCI attached to them, how much meaningful data (excluding noise) do they produce compared to a person who's just typing at a keyboard? What is the actual informational entropy of this data?”)
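As a starting point for the Fermi estimate Hubinger suggests, here is a minimal sketch. Every default is a placeholder assumption: 40 words per minute and 5 characters per word for typing, about 1 bit per character (in line with Shannon's classic estimates for the entropy of English), and, for the BCI, a hypothetical 64-channel recording treated as independent Gaussian channels of 40 Hz bandwidth at an SNR of 1. The BCI figure uses the Shannon-Hartley capacity, so it is an upper bound, not an estimate of meaningful information.

```python
import math

# All default parameter values are illustrative assumptions, not measurements.


def typing_bits(minutes: float, words_per_minute: float = 40.0,
                chars_per_word: float = 5.0,
                bits_per_char: float = 1.0) -> float:
    """Information in typed English text over a session.

    bits_per_char ~ 1 follows Shannon's estimates of the entropy of
    printed English (roughly 0.6 to 1.3 bits per character).
    """
    chars = minutes * words_per_minute * chars_per_word
    return chars * bits_per_char


def bci_bits_upper_bound(minutes: float, channels: int = 64,
                         bandwidth_hz: float = 40.0,
                         snr: float = 1.0) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + SNR), summed over channels.

    Treats every channel as independent, so this overestimates: real
    channels are correlated, and much of the signal is task-irrelevant.
    """
    per_channel_bps = bandwidth_hz * math.log2(1.0 + snr)
    return minutes * 60.0 * channels * per_channel_bps


if __name__ == "__main__":
    session = 30.0  # minutes, as in the question above
    typed = typing_bits(session)
    bci = bci_bits_upper_bound(session)
    print(f"typing: {typed:,.0f} bits")
    print(f"BCI (upper bound): {bci:,.0f} bits")
```

With these defaults the script prints 6,000 bits for typing and 4,608,000 bits for the BCI bound. The gap says little by itself: correlated channels, a low effective SNR, and activity unrelated to the task would all shrink the BCI number, and pinning those down empirically is the real work.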

Intersection with AI alignment techniques:

a. Imitative amplification

Instead of training models to imitate human language, one could train them to imitate language plus (some subset of) the outputs of the BCI. (Imitating all of the outputs of a BCI might be too difficult, and unnecessary.) Alternatively, one could give BCI outputs to the model, while having the task be to imitate only the sentences. (Hubinger: “This second alternative is a little bit tricky because this is not how machine learning is usually set up, so you'd have to think a little bit about how to get the BCI output information to the model”).
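The first alternative can be read as a multi-task objective: the usual next-token cross-entropy on the human's sentences plus an auxiliary term for predicting a chosen subset of BCI features. The sketch assumes, hypothetically, that those features have already been extracted and aligned one vector per token; the mean-squared-error term and the `bci_weight` coefficient are illustrative choices, not part of any existing proposal.

```python
import numpy as np


def token_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy of next-token predictions.

    logits: (T, V) unnormalized scores; targets: (T,) integer token ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def imitation_loss(logits: np.ndarray, targets: np.ndarray,
                   bci_pred: np.ndarray, bci_target: np.ndarray,
                   bci_weight: float = 0.1) -> float:
    """Imitate the human's tokens and (a subset of) their BCI features.

    bci_pred, bci_target: (T, F) arrays; assumes BCI features were already
    extracted and aligned to tokens (hypothetical preprocessing step).
    """
    language_term = token_cross_entropy(logits, targets)
    bci_term = float(np.mean((bci_pred - bci_target) ** 2))
    return language_term + bci_weight * bci_term
```

Setting `bci_weight=0` recovers ordinary imitation. The second alternative would drop the auxiliary term and instead feed the BCI features to the model as extra inputs, which is where Hubinger's caveat about the unusual setup applies.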

b. Approval signals for debate or approval-based amplification

In either debate or approval-based amplification, one could use BCI to get a richer approval signal, a signal that contains more information about how the human thinks and feels. This richer signal could produce a much richer loss function than a binary yes-or-no approval signal could.

At a high level, BCI would give us access to internal data about how the human is reacting to things.

Presumably, one would need to “ground” BCI signals in actual sentiments; one would need to train a model to decode sentiment from BCI signals. And given the complexity and range of human reactions, this might be difficult to implement. It’s an open question whether adding BCI sentiment is significantly better than just having people rate things on a more fine-grained scale from (say) 1 - 10, or than other ways of eliciting fine-grained feedback (see below).
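In its simplest form, the grounding step could be a supervised decoder fitted on calibration trials where both BCI features and the person's own ratings are recorded. The sketch uses ridge regression (our choice, for illustration) to map a feature vector to ratings on one or more axes; the feature extraction, the rating axes, and the simulated data are all assumptions.

```python
import numpy as np


def _with_bias(features: np.ndarray) -> np.ndarray:
    """Append a constant column so the decoder can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge_decoder(features: np.ndarray, ratings: np.ndarray,
                      alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression, W = (X^T X + alpha*I)^-1 X^T Y.

    features: (n_trials, n_features) BCI features, one row per stimulus.
    ratings:  (n_trials, n_axes) self-reported ratings for the same trials.
    Returns W of shape (n_features + 1, n_axes); the last row is the bias.
    """
    X = _with_bias(features)
    penalty = alpha * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ ratings)


def decode(W: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Predict ratings for new BCI recordings."""
    return _with_bias(features) @ W


# Usage on simulated data (not real recordings).
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(200, 8))   # 200 calibration trials, 8 features
W_true = rng.normal(size=(8, 3))    # 3 hypothetical rating axes
Y_cal = X_cal @ W_true + 0.1 * rng.normal(size=(200, 3))
W = fit_ridge_decoder(X_cal, Y_cal, alpha=1.0)
predicted = decode(W, rng.normal(size=(5, 8)))  # shape (5, 3)
```

Once fitted, `decode` turns new recordings into a vector of predicted ratings that could stand in for a single yes-or-no approval bit. Whether such a decoder beats simply asking for a 1 - 10 rating is the open question raised above.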

c. Value learning

In conversation, the BCI-alignment connection people usually make is to propose using BCI for value learning. The general idea is that information about the brain could help us learn (approximations of) human value functions more effectively than language and behavior alone. The hope is that additional neural information could help constrain the set of possible value functions that could be learned. There might be gains from this additional information, whether or not it makes sense to think we could “read” the value representations in the brain.
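A toy Bayesian picture of "constraining the set of possible value functions": keep a posterior over a few candidate value functions and multiply in one likelihood term per data source. All numbers are made up for illustration; in particular, the neural likelihoods assume some hypothetical model linking each value function to predicted brain activity, which is the hard part.

```python
import numpy as np


def posterior(prior, *likelihoods):
    """Bayes' rule over a discrete set of hypotheses: multiply the prior by
    each data source's likelihood vector, then renormalize."""
    p = np.asarray(prior, dtype=float).copy()
    for lik in likelihoods:
        p = p * np.asarray(lik, dtype=float)
    return p / p.sum()


def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())


# Three hypothetical candidate value functions V0, V1, V2 (made-up numbers).
prior = [1 / 3, 1 / 3, 1 / 3]
# P(observed behavior | Vi): V0 and V1 explain the choices equally well.
behavior_lik = [0.5, 0.5, 0.05]
# P(observed neural data | Vi): assumes a model predicting brain activity from Vi.
neural_lik = [0.8, 0.2, 0.5]

post_behavior = posterior(prior, behavior_lik)
post_both = posterior(prior, behavior_lik, neural_lik)

print("behavior only:", post_behavior.round(3), entropy_bits(post_behavior))
print("behavior + neural:", post_both.round(3), entropy_bits(post_both))
```

Here behavior alone cannot separate V0 from V1 (they stay tied at about 0.48 each), while the extra neural term shifts mass toward V0 (about 0.76 versus 0.19) and lowers the posterior entropy. Nothing in this requires the neural signal to be a literal readout of a value representation, which matches the point above.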

While “value learning” is in some sense the most obvious connection to AI alignment, it's hard to say more about what this proposal would look like without detailed ideas about the neuroscience of value representation and about the state of the art in value learning. This is one of the key further questions in the final section.

Other sources of more data besides BCI

Of course, neural recording is not the only source of additional information about humans. Other examples include video (in particular, information about body language and facial expression), pupil dilation, heart rate, and electrodermal activity. The AI safety researchers I have spoken to are not aware of any work that uses these sources. One reason is that there is so much more to explore using language alone, and this research is both cheaper and prima facie more promising than work incorporating these additional measures.

3. Future directions and open questions

As noted, all of these proposals are very sketchy. To evaluate their plausibility, we need: a clearer picture of the proposal, and a more fine-grained analysis of what level of BCI technology would be needed. Greater knowledge of neuroscience than I have would be helpful here (paging Steve Byrnes).

While there is a lot to be gained from more conceptual clarity, I do suspect that many key questions would only resolve in light of actually trying some experiments. Even with today’s limited BCI technologies, I surmise that we could learn a lot from toy experiments using non-invasive techniques like fNIRS or EEG. That said, even a toy experiment would be rather expensive given the hardware required.

More generally, I hope this prompts more work on BCI. As Niplav has recently noted, writing about BCI in the AI strategy community is mostly cursory and scattered around the Internet.[5] Even if the AI safety community concludes that BCI technology is not likely to be strategically relevant at all, it would be good to have more clearly articulated reasons for why.[6]

Thanks to Evan Hubinger, Anders Sandberg, Steve Byrnes, and Oliver Habryka for discussion. Thanks to Miranda Dixon-Luinenburg and the LessWrong feedback service!


  1. I take this distinction from Borg and Sandberg (unpublished draft), who break down BCI relevance into “enhancement”, “merge”, and “improving our ability to accurately learn human values”. For my third category, I look at using BCI for a few different alignment proposals, not just “learning human values”. ↩︎

  2. For intriguing examples of sensory augmentation, see Thomson et al. (2013) on using a neuroprosthesis to allow rats to see otherwise invisible infrared light, and Schumann and O'Regan (2017) on a non-invasive method for training a ‘sense’ of magnetic North. ↩︎

  3. More quotes from Musk and Altman: this tweet by Musk. The essay "The merge" by Altman: "[U]nless we destroy ourselves first, superhuman AI is going to happen, genetic enhancement is going to happen, and brain-machine interfaces are going to happen....The merge can take a lot of forms: We could plug electrodes into our brains, or we could all just become really close friends with a chatbot. But I think a merge is probably our best-case scenario. If two different species both want the same thing and only one can have it—in this case, to be the dominant species on the planet and beyond—they are going to have conflict. We should all want one team where all members care about the well-being of everyone else." ↩︎

  4. Tim Urban’s Wait But Why article on Neuralink covers some of the proposed new methods and materials; “Physical principles for scalable neural recording” (2014) maps out the fundamental physical constraints facing neural recording technology. ↩︎

  5. The most extensive treatment of BCI in the AI safety literature is Bostrom's discussion in Superintelligence, ch. 2, where he is pessimistic that BCI-enhanced humans would be competitive with “pure” AI. These arguments do not directly apply to using BCI as an alignment aid. ↩︎

  6. Some further BCI research projects that could be useful: further evaluating the ‘enhancement’ and ‘merge’ proposals; a picture of the BCI landscape: the state of the art, current actors, amount of funding; forecasting future BCI capabilities; other ways BCI could be strategically relevant for AI: ‘mind-reading’, advancing neuroscience research (including consciousness research), political impacts. ↩︎


A few little nitpicks:

Scale: how much of the brain can we record from?

Maybe "coverage" would have been a better term here? Like I'm thinking: if some future researcher recorded 100000 neurons in 1mm^3 of brain tissue, totally ignoring the rest of the brain, they would still presumably call it "large-scale brain recording" in their press release. :-P

Kernel, one of the most prominent commercial applications of BCI, uses functional near infrared spectroscopy fNIRS

I think Kernel has a fNIRS device ("kernel flow") and an MEG device ("kernel flux").

difficulty of listening in on individual neurons when there is scalp, fat, and skull in the way. With current technology, invasive techniques are necessary for recording from individual neurons: …

You don't exactly say this, but your text here kinda suggests that ECoG records individual neurons, which is wrong. If anyone is curious: ECoG is basically the same as EEG but instead of putting electrodes on the scalp, you put them directly on the surface of the brain. ECoG spatial resolution is a few millimeters at best, I think.

Hi Steven, thanks!

  1. On terminology, I agree.

Wait But Why, which of course is not an authoritative neuroscience source, uses "scale" to mean "how many neurons can be simultaneously recorded". But then it says fMRI and EEG have "high scale" but "low spatial resolution", which is somewhat confusing, since low spatial resolution means that fMRI and EEG don't record any individual neurons. So my gloss on "scale" is closer to what WBW is actually talking about, and is probably better called "coverage". And then it's best to just talk about "number of simultaneously recorded [individual] neurons" without giving that a shorthand, and only talk about that when we really are recording individual neurons. That's what Stevenson and Kording (2011) do in "How advances in neural recording affect data analysis".

  2. Good call on Kernel, I'll edit to reflect that.

  3. Yep - invasive techniques are necessary - but not sufficient, as the case of ECoG shows.

Thanks for writing this, it's a helpful post!

rate things on a more fine-grained scale from (say) 1 - 10

Something that jumps out at me here is, BCI could enable more than one scale—e.g. this scores 4/10 on approval, 6/10 on cringiness, 1/10 on grossness, 7/10 on funny, 2/10 on exciting, …. Those are definitely there in the brain, but hard to get out without BCI, because who can introspect along 15 (or whatever) different axes simultaneously?

(Might be hard to get that information with BCIs too, because things like insula & cingulate cortex & amygdala are kinda deep in the brain. Hmm, actually maybe this is an area where we want to use skin conductance, microexpressions, etc.? Dunno.)

No opinion on whether it actually helps for alignment to have 15 (or whatever) different axes instead of just "approval".

Even if we were able to get good readings from insula & cingulate cortex & amygdala et alia, do you have thoughts on how and whether we could "ground" these readings? Would we calibrate on someone's cringe signal, then their gross signal, then their funny signal - matching various readings to various stimuli and subjective reports?

In principle, I think different reactions should project to different subcortical structures (e.g. the hypothalamus has lots of little cell groups that look different and do different things, I think). In practice, I dunno, I guess what you said sounds about right.

I'm interested in whether explaining brain data causes more weight to be placed on hypotheses that seem similar to our naive introspection, or less. Arguments for "less" might be if our introspection is self-deceptive in ways we don't want to be disabused of, or if more data would promote a more microphysical, less psychological way of looking at humans. But I think our success rides on overcoming these arguments and designing AI where more is better.

I guess what I'm saying is that I see work on meta-preferences and regularization as being crucial inputs to work that uses brain data, and conversely, using brain data might be an important testbed for our ability to extract psychological models from physical data. Does that make sense?

Thanks for your thoughts! I think I'm having a bit of trouble unpacking this. Can you help me unpack this sentence:

"But I our success rides on overcoming these arguments and designing AI where more is better."

What is "more"? And what are "these arguments"? And how does this sentence relate to the question of whether explaining brain data makes us place more or less weight on similar-to-introspection hypotheses?

Whoops, I accidentally a word there.

I've edited that sentence to "But I think our success rides on overcoming these arguments and designing AI where more is better."

Where "more" means more data about humans, or more ability to process the information it already has. And "these arguments" means the arguments for why too much data might lead the AI to do things we don't want (maybe the most mathematically clear example is how CIRL stops being corrigible if it can accurately predict you).

So to rephrase: there are some reasons why adding brain activity data might cause current AI designs to do things we don't want. That's bad; we want value learning schemes that come with principled arguments that more data will lead to better outcomes.

Nice article! I can't find anything I disagree with, and especially like the distinction between enhancement, merging and alignment aid.

Also good point about the grounding in value learning. Outside of value learning, perhaps one won't need to ground BCI signals in actual sentiments? Especially if we decide to focus more on human imitation, just the raw signals might be enough. Or we learn how to extract some representation of inconsistent proto-preferences from the BCI data and then apply some methods to make them consistent (though that might require a much more detailed understanding of the brain).

There's also a small typo where you credit Anders "Samberg" instead of "Sandberg", unless there's two researchers with very similar names in this area :-)

fixed the "Samberg" typo - thanks!
