Notable Progress Has Been Made in Whole Brain Emulation

Dom Polsinelli

Summary

We have [relatively] recently scanned the whole fruit fly brain, simulated it, confirmed it is pretty highly constrained by morphology alone. Other groups have been working on optical techniques and genetic work to make the scanning process faster and simulations more accurate.

Fruit Flies When You’re Having Fun

The Seung Lab famously mapped the fruit fly connectome using serial section electron microscopy. What is underappreciated is that another group used this information to create a whole brain emulation of the fruit fly. Now, it used leaky integrate and fire neurons and did not model the body of the fly, but it is still a huge technical achievement. The first author has gone off to work at Eon Systems, which is very explicitly aimed at human whole brain emulation.

They did some cool things in the simulation. One is that they shuffled the synaptic weights to see how much that changed the neural activity. Turns out, quite a bit. This is a good thing because it means they’re probably right about how synaptic weight manifests in morphology.

Although modelling using the correct connectome results in robust activation of MN9 in 100% of simulations when sugar-sensing neurons are activated at 100 Hz, only 1 of 100 shuffled simulations did (Supplementary Table 1d). Therefore, the predictive accuracy of our computational model depends on the actual connectivity weights of the fly connectome.

I would recommend reading the whole paper. I think I would do it a disservice by giving an intermediate level of detail in a summary. They just got mind-blowingly good results for such a simple model and it really gives me hope that the actual simulation aspect is a much more tractable problem than I once thought^[1].

Connectome Tracing Now And The Near Future

The two biggest issues with connectome tracing right now are speed and accuracy. It takes a long time to image all the samples and is very costly to parallelize the process because electron microscopes are expensive. As for accuracy, it seems like it would be unreasonable to ask for more resolution than an electron microscope offers. This is true but because everything is grayscale segmentation becomes hugely challenging. One of the biggest bottlenecks in the pipeline is human proofreading of the data. We have a good algorithm for this, but it does require a substantial human effort after the first pass. The whole fruit fly brain took ~33 years of human proofreading to complete. Accuracy stays around 90% in the most optimistic case without human involvement. A naïve extrapolation from the fly → mouse brain time would be ~10,000 years of proofreading which is suboptimal. Additionally, many of the proofreaders were trained in neuroanatomy which would further increase the difficulty of using human workers for this process. So yeah, I really want people to work on this problem it seems very important to me.

I am of the opinion that electron microscopy is not the way forward because of these factors and others that will be discussed later. Still, it is the only proven technology and there may be a place to do a hybrid approach with optical providing some information using traditional stains and electron microscopes providing the highest possible resolution.

There are also issues of sample preparation and the exact kind of electron microscope you use. Samples must be sectioned very thin in the axial direction as scanning microscopes can’t see subsurface detail and transmission microscopes have limited penetration depth. If the samples are cut mechanically they generally have artifacts from that which make segmentation across the boundary more challenging. Samples can be destroyed with ion milling or treated such that they photodegrade, this leaves a much cleaner surface for the next imaged section but destroying the samples makes multiple imaging steps challenging.

For much, much more detail I recommend reading this projection for what would it would take to image a whole mouse brain.

Multimodal Data Analysis

There are two big obvious limitations with the fruit fly simulation. The first is that it does not even attempt to model the rest of the fly’s body. I’m comfortable with this, people have been trying to simulate C. elegans for decades now and they still don’t have a complete biophysical model. This is a big challenge, but not my chief interest. The second limitation is in their cell model itself. They used a leaky integrate and fire model that was identical for each neuron. I understand why they did this and I don’t think they actually could have done much better with the data they had, but they also openly admit this is a limitation. Well, there is some recent progress that addresses this gap.

Neurons are inhomogeneous in many ways. One is electrical activity, two neurons will spike differently when given the same current stimulus. Another is gene expression. There are a lot of genes that are known to govern ion channels which determine the electrical activity in the neuron. It is a very natural to ask whether or not you can predict the electrophysiology properties of the neurons based on the gene expression. Well, one recent paper sets out to answer just that. I would say that the big conclusion relevant to this is

… that the variation in gene expression levels can be used to predict electrophysiology accurately on a family level but less so on the cell-type level.

Despite this, I am still confident about this technique being viable for generating models of individual neurons. Why is that? Because the technique they used to measure gene expression is known to be inaccurate. Other methods of measuring gene expression, I am a proponent of MERFISH^[2], are comparable or perhaps even better. In the event that these techniques remain inaccurate or are insufficient by themselves, it seems likely that traditional antibody staining could allow for direct measurement of ion channel density^[3].

I also want to make it very clear that I have an extreme admiration for the work that they did. I personally tried using some of the same data to achieve the gene expression to biophysical model transformation and can attest to the fact that it is quite challenging. Their paper has a lot of stuff in it I wish I tried, and is quite readable in my opinion. Specifically, I applaud them for trying to fit a relatively simple model. One of my biggest frustrations when I read neuroscience papers are people trying to answer questions they clearly didn’t lay the groundwork for.

Now, even once that is achieved we will still be missing some important factors like how hormones or peptides influence the activity of a neuron. But this is a step in the right direction. Knowing the connectome with weights gets you a long way, making specific cell models gets you closer, knowing the effects of hormones, peptides, blood flow, whatever glial cells are doing, etc. matters but might have a collectively smaller effect than the first two factors. I am not sure how confident I am in that statement, biology is an bottomless well of complexity and some of those higher order effects could be much more important than I appreciate. But all this is really just dancing around my main opinion, which I endorse quite strongly, that we need to have a model more specific than a single template leaky integrate and fire neuron for most of the neurons in the brain and we can likely achieve this with current generation imaging techniques.

E11 Bio

E11 Bio is a focused research organization that is, well, focused on researching connectome mapping. They have a cool technique combining expansion microscopy and genetic barcoding to make tracing neurons much much easier.

I discussed the limitations of electron microscopy above. Well, expansion microscopy is a cool way to get around this. The sample can be permeated by a hydrogel that swells causing the whole thing to expand roughly homogenously. This can be up to 10x in a single step iirc but that cool thing is that you can do it multiple times if you really want. E11 Bio is doing 5x which I trust is sufficient for their need. The genetic barcoding is a way to have functionally infinite color channels such that you can uniquely identify each neuron. I’m not natively a genetics guy so I might summarize this wrong, but my understanding is that each neuron is infected by a random subset of viruses that are injected into the brain. Each virus codes for a specific protein that can be bound to antibody stains. By sequentially staining and then washing away antibodies bound to fluorescent probes you can image the sample once for each possible virus. Each neuron will either be infected or not for each given virus and so it will either fluoresce or not for each given stain/image/wash cycle. This gives each one a unique bit string to identify it even across long projections. All in all, very cool and computationally more simple than trying to segment cell images taken in grayscale. It only marginally improves (~5x fewer errors) the automatic segmentation accuracy and would still rely heavily on human proofreading^[4]. But still, very obvious step in the right direction and I am glad to hear it is being worked on.

You do start to get issues with distortion if you expand too much but then it becomes an engineering trade off. Would you rather the computer have to correct for these distortions, or deal with the numerous physical and computational challenges EM data introduces? I’ll admit I’m biased here but the technology is really cool and opens up a huge range of microscopy techniques giving potential OOMs improvements for imaging and post processing speed. If you are interested in connectome tracing feasibility, I would recommend this paper comparing expansion microscopy to electron microscopy. Their most optimistic timeline for mice is ~5 days but ~30 years for a human brain. 30 years is a long time to wait around, improvements will be made in speed and cost allowing more work to be done in parallel but it is unclear if imaging a whole human brain in sufficient resolution will be feasible any time soon.

What I Would Work On

Based on the above, there are several key problems that I think need to be addressed if we want to do whole brain emulation. This is by no means an exhaustive list, these are things I can point to as clearly identified gaps.

High throughput imaging with sufficient detail, ideally less than 10nm in all directions^[5]
1. Improve mechanical or destructive sectioning to gather all necessary information while minimizing artifacts at the boundaries
2. Speed is the biggest consideration, this can be achieved by bringing cost down so more microscopes can operate in parallel or by making each on faster without increasing cost proportionally
More sub cellular detail
1. Find the density of ion channels for a particular neuron
2. Identify gap junctions between neurons
3. Identify the neurotransmitters used by each neuron more accurately^[6]
4. A way to extract information relevant to neuromodulation, this is not possible or extremely hard with EM data
Improve automated segmentation, eliminating the need for any human proofreading is ideal
More advanced modeling of cells with verification that the subcellular details listed above recreate the electrical and chemical activities accurately
This is a lot of data, you need a lot of storage and fast transfers to avoid that bottlenecking the microscopes^[7]

^{^}
I am still really worried about biological learning rules, I don’t think anyone understands those well enough that we could make a WBE of a mouse and have it memorize a maze or something. This is a drum I beat frequently but this is not the time to go into the gory details and honestly I should know more than I do before making such sweeping claims.
^{^}
MERFISH can measure a specific subset of genes optically. It requires multiple steps to attach and detach the antibodies but because it optical it can be done in parallel with large FOV microscopes. I am unsure if it can be combined with E11’s PRISM but if it could I think that would be super neat and should not add any time.
^{^}
As far as I know, nobody has used antibody staining to measure ion channel densities and create a corresponding, accurate, biophysical model. If such a thing exists, this sections is largely moot but I would be really happy to read that paper.
^{^}
I’m not doing the “accuracy” metric justice in that sentence or this footnote. It breaks down into a few sub problems. There is identifying which cell is which and then there is identifying which cells are connected. There are the cells falsely being split apart leaving something just hanging out unassigned or parts being falsely merged with the wrong cell. Bottom line is this: if you know how to do computer vision you should work on this problem, it is important and cool!
^{^}
As said previous, expansion microscopy lets you get away with a microscope that does not have that high resolution natively. If you have a 10x expansion factor you can have a resolution of 100nm. The fruit fly brain was mapped with 4x4x40 nm voxels.
^{^}
It is often assumed that they only use one, this is called Dale’s Law and is not 100% accurate. It is unclear to me how important the second or third most commonly used neurotransmitter is to a particular neuron or the computation at large.
^{^}
I hesitate to put this here because it feel like a problem that will be solved by the normal computer industry well before it becomes a real issue for WBE but it was mentioned as a serious problem, exabytes of data are no joke

(Just noting that I agree with your footnote that the learning part is the hard part; that's the part that seems necessary for real minds, and that, when I ask neuro people about it, they're like "oh yeah, we don't know how to do that".)

I do wonder how behaviorally similar a synapse-frozen human brain emulation would be to a weight-frozen LLM.

I can't say for sure but a (really interesting) worst case scenario is this guy with a 7 second memory:

Wouldn't this be analogous to a LLM with a very tiny context window in addition to frozen weights?

I'm skeptical of WBE happening for reasons discussed here but still enthusiastic about the connectomics stuff for other reasons discussed here.

RE Dale's Law, I recall reading that almost every neuron in the brain detects and produces one or more neuropeptides.

Hey Steven this is unrelated but I wanted to say I really appreciate your posts and comments here!

Having just reread your objections to uploading without reverse engineering I think it merits a response more detailed than the one I am about to give. It may be correct or at least have some room for middle ground where a lot of the short timescale/easy stuff is directly simulated and then corrections are ~~spaghetti coded~~ added on to prevent particular failures with data from real experiments.

That said, my (limited) experience with trying to reverse engineer what is going on in a mouse's brain during social interaction makes me feel utterly hopeless and everyday I dream about how much easier it would be if we could do a barebones direct simulation like the fruit fly simulation to see if we are even on the right track. Because of this (again, quite limited) experience trying to do something like the reverse engineering you suggest, I expect it to take ~forever whereas disentangling the mess that is all the higher order corrections past simple electrical models of cells connected with one way chemical synapses would merely take a really, really long time.

Also, I think doing the kind of reverse engineering on humans is challenging for purely ethical reasons whereas sufficiently detailed models for neuron/other components from mice would just carry over to human WBE much better than a fully reverse engineered mouse. I may be misunderstanding what depth you feel reverse engineering is necessary and what experiments it would require.

oh oops sorry if I already shared that with you, I forgot, didn’t mean to spam.

My actual expectation is that WBE just ain’t gonna happen at all (at least not before ASI), for better or worse. I think the without-reverse-engineering path is impossible, and the with-reverse-engineering path would be possible given infinite time, but would incidentally involve figuring out how to make ASI way before the project is done, and that recipe would leak (or they would try it themselves). Or even more realistically, someone else on Earth would invent ASI first, via an unrelated effort. So I spend very little time thinking about WBE.

You left a comment one a previous post of mine about this but that was almost a year ago I think, so hardly spam.

My perfect very wishful thinking world involves ASI miraculously not happening and normal human neuroscience efforts shifting toward uploading and away from what it is now, which is a lot of wheel spinning and performative science. I do not assign a high probability to either of these. I also feel I am not well informed enough on either to make such sweeping claims.

then corrections are spaghetti coded added on to prevent particular failures with data from real experiments

My guess would be that the failures would be quite systematic, and would reflect the absence of substantial algorithms. That would suggest that you either have to come up with more algorithms, and/or you have to learn them from data. But to learn them from data without coming up with the algorithms or with algorithmic search spaces that sufficiently promote the relevant pieces, you need a lot of data; and brain algorithms that work on a time scale of an hour or a day have correspondingly or $10^{5}$ less data feasibly available compared to ~second-long events.

Anyone trying to use super-resolution microscopy techniques without or alongside expansion? Or is that still under "microscopes too expensive" according to popular wisdom? I guess yeah, trying to modulate the phase of polarized light so you can get extra spatial data from the Fourier transform (or whatever) sounds expensive.

I can't say that I have heard of anyone doing that specifically for connectomics. I would not be surprised if I just missed it this paper (already in the post) or it is being done for other biological research but is too expensive for connectomics. I would also recommend looking at this guy's new lab/old work for another place to start.

If you want to scan the whole brain you don't want contrasting agents, you just scan the biofield, which gives you accuracy down to photons. There is a lot of noise in the biofield, but you can reduce it by focusing on fewer parts at a time. You get molecular fingerprints, which would be unlabeled, and the big effort is effectively labeling which is which. Nonetheless, if your goal is just WBE you don't need to know "what each part is", but just the cause and effect of a sequence of variables. This has already been done...but I digress.

I do not understand exactly what you mean. Are you proposing something like this: https://en.wikipedia.org/wiki/Energy-dispersive_X-ray_spectroscopy

in molecular detail? Looking up the words "biofield" gives some weird and highly varied stuff. Anyway, iirc from

https://gwern.net/doc/ai/scaling/hardware/2008-sandberg-wholebrainemulationroadmap.pdf

most scientists don't think a direct molecular simulation is wise or necessary.

Where has it already been done? What sequence of variables do we know the cause and effect of?

Bioresonance. Current (public) hardware and software can get you to 1mm resolution, advancements could probably get you to cell size. You don't exactly get molecular resolution, but exchange of energy (electrons, photons in bulk and their spin), which is what I mean by cause and effect. The technology is non invasive and is not damaging (like xrays) so you can collect lots of data and then train a NN on it. The output of the NN still needs labeling though, but labeling the outputs is a much smaller problem. Its "probably" been done before because the technology is old but not really popular in US as it is in Russia (and east Europe), but still not public because not all science is open.