Large Language Models Suggest a Path to Ems

anithite

TL;DR:

Whole brain emulation from first principles isn't happening
LLMs are pretty good/human-like
- This suggests neural nets are flexible enough to implement human-like cognition despite the alien architecture used
Training a (mostly) human-shaped neural net on brain-implant-sourced data would be the fastest way to aligned human-level AGI.

TL;DR end

supposition: ways of learning intelligence, from hardest to easiest:

from scratch in a simulated world via RL (reinforcement learning) (AlphaGo Zero, XLand, Reward is Enough)
from humans via imitation learning (LLMs, RL pretrained on human play (Minecraft), human children)

plus a bit of RL to continuously learn the context required to interpret training data and to fine-tune after the fact (e.g. self-play to reach superhuman levels)

Distillation (learn from teacher in white box model)

IE: look inside teacher model during training

Distillation in a nutshell:

why train a model from another model?
to make it cheaper to run (teach a smaller model to imitate a bigger one)
- EG: Stable Diffusion model with half the layers doing 4x the work per pass
teacher is not a black box, internal state of teacher is used:
- teacher attention
- patterns of activation (feature based distillation)
- output distribution (AKA "soft answers"): a list of (candidate answer, probability) pairs for a given input
- if model has similar architecture just train for similarity of intermediate activations
- etc.
this requires:
- perfect information about the teacher model (or some parts thereof)
- ability to run complex linear algebra on internal states and weights
  - differentiation, attribution, finding teacher/student state projections etc.
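
The ingredients listed above can be combined into a single training loss. What follows is a minimal NumPy sketch, not a reference implementation: it assumes a KL term on temperature-softened output distributions (the "soft answers") plus a mean-squared-error term on intermediate activations, with a linear projection mapping student activations into the teacher's space. The function names, the temperature and the weighting `alpha` are all illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_answer_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))

def feature_loss(student_hidden, teacher_hidden, projection):
    """MSE between teacher activations and linearly projected student activations."""
    return float(np.mean((student_hidden @ projection - teacher_hidden) ** 2))

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden, projection, alpha=0.5):
    """Weighted sum of the output-level and feature-level terms (alpha is an assumption)."""
    return (alpha * soft_answer_loss(student_logits, teacher_logits)
            + (1.0 - alpha) * feature_loss(student_hidden, teacher_hidden, projection))
```

The feature term is what makes the teacher a white box: it gives the student a target at an intermediate layer instead of only at the output.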

supposition: why this works

intermediate states give insight into teacher model sub-computations
- student only has to learn sub-computations not derive everything from scratch
  - student model can obviously be smaller because it's doing less exploration of the compute space
  - https://www.lesswrong.com/posts/iNaLHBaqh3mL45aH8/magna-alta-doctrina#Who_art_thou_calling_a_Local_optimizer
soft answers give a little more insight into sub-computations near the end of the model giving better gradients at all layers of the network.

My pitch: Train AI from humans via brain implant generated data.

Ideally:
- crack open volunteer skulls like eggshells
- apply reading electrodes liberally
Other output channels include:
- gaze tracking
- fMRI (doesn't scale well but has already yielded some connection topology data)
- EEG (the bare minimum)
Tradeoff of (invasiveness/medical risk) vs. (data rate, granularity)
- likely too much: 100,000 Neuralink fibers/subject
- likely too little: EEG skull caps
- better off paying for more people without EEG caps
nothing gives complete access to all neurons, but any of these is much better than a black box
progress on the visual system was made via poking individual neurons
fMRI data seems granular enough to do some mind-reading
this is very encouraging
a good middle ground might be non-penetrating or minimally penetrating electrodes

How this might work:

figure out brain topology
imitate with connected neural net layers as modules
add some globally connected modules with varying feedback time scales
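
A toy illustration of the three steps above, in NumPy. Everything here is an assumption made for the sake of the sketch: the connectivity matrix stands in for measured brain topology, each module is a single leaky tanh layer, and the per-module time constants `tau` stand in for "varying feedback time scales". It only shows the wiring, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_modules, width = 4, 8

# Hypothetical connectivity: adjacency[i, j] = 1 means module j feeds module i.
# Modules 0 -> 1 -> 2 form a chain; module 3 reads from and feeds back to all others.
adjacency = np.array([
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
])

# Larger tau means the module's state changes more slowly.
tau = np.array([1.0, 2.0, 4.0, 16.0])

# One random (untrained) weight matrix per possible connection.
weights = rng.normal(size=(n_modules, n_modules, width, width)) / np.sqrt(width)

def step(state, external_input):
    """Advance every module by one time step. state has shape (n_modules, width)."""
    new_state = np.empty_like(state)
    for i in range(n_modules):
        drive = external_input[i].copy()
        for j in range(n_modules):
            if adjacency[i, j]:
                drive += state[j] @ weights[i, j]
        target = np.tanh(drive)
        # Leaky update: move a fraction 1/tau of the way towards the target.
        new_state[i] = state[i] + (target - state[i]) / tau[i]
    return new_state

state = np.zeros((n_modules, width))
inputs = np.zeros((n_modules, width))
inputs[0] = rng.normal(size=width)  # only module 0 receives external input
for _ in range(50):
    state = step(state, inputs)
```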

Problems:

some of the interesting stuff is under other stuff
some of the stuff is dynamic (e.g. memory, learning)
but whatever happens there will be lots of data to help figure things out
even a human level model lacking long term memory/learning is very useful

Alignment?

Obviously get training data from kind non-psychopaths. If the resulting models have similar behavior to the humans they're modeled from they're going to be aligned because they're basically copies. Problems arise if the resulting copies are damaged in some way especially if that leads to loss of empathy, insanity or something.

Reasons for Hope

RL and other pure AI approaches seem to require a lot of tinkering to get working. Domain-specific agents designed around the task can do pretty well, but AGI is elusive. They do well when training data can be generated through self-play (Chess, Go, DOTA) or where human world models don't apply (AlphaFold (protein folding), FermiNet (quantum mechanics)), and in those cases the AI is beating the best human-designed algorithms. But pure RL approaches have to learn a world model and a policy from scratch. Human-data-based approaches get both from the training dataset. This is an after-the-fact rationalisation of why LLMs have had so much success.

As a real-life example, RL approaches have a ways to go before they can do things like solve real-world action planning problems, which LLMs get more or less for free (PaLM-SayCan).

Could this go Superhuman?

Larger models should just end up overparameterized. All bets are off if you add 20 zeroes as with any ML system.

Capabilities Generalization?

Capabilities research for imitation AI shouldn't transfer well to RL because the core problem with RL is training signal sparsity and the resulting credit attribution problem. Knowing how the adult human brain works might help but info on early childhood development won't be there (hopefully) which is a good thing for ASI risk because early childhood brain development details would be very useful to anyone designing RL based AIs. Additionally, the human brain uses feedback connections extensively. Distilling an existing human brain from lots of intermediate neuron level data should stabilize things but from scratch training of a similar network won’t work using modern methods.

The bad scenario is if the data collected allows reverse engineering a general-purpose learning algorithm. Maybe the early childhood info isn't that important. Someone then clones a GitHub repo, adds some zeroes to the config, and three months into the training run the world ends. Neuralink has some data from live animal brains collected over month-long timespans. Despite this, the world hasn't ended, which lowers my belief that this sort of data is dangerous. They also might be politely sitting on something very dangerous.

If capabilities don't transfer there's plausibly some buffer time between AGI and ASI. Scalable human level AGI based on smart technically capable non psychopaths is a very powerful tool that might be enough to solve big problems. There's a number of plausible okayish outcomes that could see us not all dying! That's a win in my book.

Isn't this just ems?

Yes, yes it is ... probably. In an ideal world maybe we wouldn't do this? On the one hand, human-imitation-based AGIs are in my moral circle, and abusing them is bad. On the other hand, some humans are workaholics voluntarily and/or don't have no-messing-with-my-brain terminal values, such that they would approve of becoming aligned workaholics or contributing to an aligned workaholic gestalt, especially in the pursuit of altruistic goals. Outcomes where ems don't end up in virtual hell are acceptable to me, or at least better than the alternative (paperclips). Best-case scenario, em (AI/conventional) programmers automate the boring stuff so no morally valuable entities have to do boring mindless work. There are likely paths where high-skill-level ems won't be forced to do things they hate. The status quo is worse in some ways. Lots of people hate their jobs.

Won't a country with no ethics boards do this first?

They might have problems trusting the resulting ems. You could say they'd have a bit of an AGI alignment problem. Also democratic countries have an oligopoly on leading edge chip manufacturing so large scale deployment might be tough. Another country could also steal some em copies and offer better working conditions. There's a book to be written in all of this somewhere.

Practical next steps (no brain implants)

Develop distillation methods that rely on degraded information about teacher model internal states
- Concretely: Develop a distillation method using a noisy, randomly initialised, lower-dimensional projection of teacher internal state
- Concretely: Develop "mind reading" for the teacher model using similar noisy data
Develop methods to convert between recurrent and single pass networks
- this could have practical applications (e.g. make LLMs cheaper to run by using a recurrent model trained to imitate internal states in the LLM, thus re-using intermediate results)
- really this is more to show that training that normally blows up (large recurrent nets or nets with feedback) can be stabilized by having internal state from another model available to give local objectives to the training process.
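
The first "Concretely" item can be prototyped at toy scale. The sketch below is an assumption-laden illustration rather than a proposed method: the "teacher" is a single random tanh layer, its internal state is only visible through a fixed random lower-dimensional projection `P` with added Gaussian noise, and a smaller student layer plus a learned readout is trained by plain gradient descent to predict those degraded observations. All sizes, names and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_teacher, d_obs, d_student = 8, 32, 6, 16

# Toy teacher: its hidden activations play the role of "internal state".
W_teacher = rng.normal(size=(d_in, d_teacher)) / np.sqrt(d_in)

# Degraded access: a fixed random projection to fewer dimensions.
P = rng.normal(size=(d_teacher, d_obs)) / np.sqrt(d_teacher)

def observe(x, noise_std=0.1):
    """Noisy low-dimensional view of the teacher's hidden state for inputs x."""
    h = np.tanh(x @ W_teacher)
    return h @ P + noise_std * rng.normal(size=(x.shape[0], d_obs))

def student_loss(V, R, x, z):
    """Mean squared error between the student's readout and the observations."""
    return float(np.mean((np.tanh(x @ V) @ R - z) ** 2))

def train_step(V, R, x, z, lr=0.05):
    """One gradient step on 0.5 * squared error, averaged over the batch."""
    h = np.tanh(x @ V)
    err = (h @ R - z) / x.shape[0]
    grad_R = h.T @ err
    grad_V = x.T @ ((err @ R.T) * (1.0 - h ** 2))  # backprop through tanh
    return V - lr * grad_V, R - lr * grad_R

# Student hidden layer V and readout R, both smaller than the teacher.
V = 0.1 * rng.normal(size=(d_in, d_student))
R = 0.1 * rng.normal(size=(d_student, d_obs))

# Evaluate against noise-free observations to see whether the signal was learned.
x_eval = rng.normal(size=(512, d_in))
z_eval = observe(x_eval, noise_std=0.0)
loss_before = student_loss(V, R, x_eval, z_eval)

for _ in range(500):
    x = rng.normal(size=(64, d_in))
    V, R = train_step(V, R, x, observe(x))

loss_after = student_loss(V, R, x_eval, z_eval)
```

Comparing `loss_before` with `loss_after` gives a first check on whether a degraded view of internal state still provides a usable local training objective. Scaling `noise_std` up and `d_obs` down would be the natural way to probe how much degradation the approach tolerates.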

If this can be shown to work and work efficiently, starting the real world biotech side becomes more promising.

Gordon Seidoh Worley (2y)

Interesting idea but I'm suspicious that LLMs are enough for us to accept these as EMs. I think more likely people will treat such trained models as not true EMs but rather like ghosts who are fixed on who the person was when the model was trained.

The idea from fiction that came to mind is the people in portraits in Harry Potter.

Of course, such a thing is still pretty useful! I'm not sure LLMs are good enough at the sort of online learning and ontological shifts and other complex things we expect from people and thus EMs.

anithite (2y)

This is just a way to take a bunch of humans and copy paste till current pressing problems are solvable. If public opinion doesn't affect deployment it doesn't matter.

Models that can't learn or change don't go insane. Fine tuning on later brain data once subjects have learned a new capability can substitute. Getting the em/model to learn in silicon is a problem to solve after there's a working model.

I edited the TL;DR to better emphasize that the preferred implementation is using brain data to train whatever shape of model the data suggests, not necessarily transformers.

The key point is that using internal brain state for training an ML model to imitate a human is probably the fastest way to get a passable copy of that human and that's AGI solved.
