A one-year project with over 45 expert contributors from MIT, UC Berkeley, Allen Institute, Harvard, Fudan University, Google and other institutions.
You can find all of the content on https://brainemulation.mxschons.com
If you are new to the field, please check out the companion article on Asimov Press: https://www.asimov.press/p/brains/
Over the upcoming weeks I'll be posting highlights from the work on X, and you can also subscribe on the report website to get updates on additional data releases and translations.
I'll paste the executive summary verbatim below. Enjoy!
Accurate brain emulations would occupy a unique position in science: combining the experimental control of computational models with the biological fidelity needed to study how neural activity gives rise to cognition, disease, and perhaps consciousness.
A brain emulation is a computational model that aims to match a brain’s biological components and internal, causal dynamics at a chosen level of biophysical detail. Building a brain emulation requires three core capabilities: 1) recording brain activity, 2) reconstructing brain wiring, and 3) digitally modelling brains using the resulting data. In this report, we explain how all three capabilities have advanced substantially over the past two decades, to the point where neuroscientists are collecting enough data to emulate the brains of sub-million neuron organisms, such as zebrafish larvae and fruit flies.
The first core capability required to build brain emulations is recording neural dynamics. Electrophysiology uses electrodes to record how neurons fire, from a few dozen up to several thousand at a time. Functional optical imaging has transitioned from a nascent technology to large-scale recordings: calcium imaging, where genetically encoded indicators report correlates of neural activity, now captures approximately one million cortical neurons in mice (though without resolving individual spikes), while voltage imaging resolves individual spikes in tens of thousands of neurons in larval zebrafish. Taking neuron count and sampling rate into account, these improvements represent roughly a two-order-of-magnitude increase in the effective data bandwidth of neural recordings over the past two decades.
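To make the bandwidth comparison concrete, here is a rough back-of-the-envelope sketch in Python. The specific neuron counts and sampling rates are illustrative assumptions I chose for the example, not figures from the report; only the arithmetic (neurons multiplied by sampling rate) reflects the comparison described above.

```python
import math

def effective_bandwidth(n_neurons: int, samples_per_second: float) -> float:
    """Effective data bandwidth: neurons recorded x samples per second per neuron."""
    return n_neurons * samples_per_second

# Hypothetical early-2000s electrode recording: tens of neurons, ~1 kHz effective rate.
early = effective_bandwidth(n_neurons=50, samples_per_second=1_000)

# Hypothetical modern large-scale imaging: tens of thousands of neurons at ~100 Hz.
modern = effective_bandwidth(n_neurons=50_000, samples_per_second=100)

print(f"early:  {early:,.0f} samples/s")
print(f"modern: {modern:,.0f} samples/s")
print(f"increase: ~{math.log10(modern / early):.0f} orders of magnitude")
```

With these placeholder numbers the ratio works out to roughly two orders of magnitude; swapping in other plausible setups changes the exact figure but not the overall trend.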
Causal perturbation methods, like optogenetics, have also improved. It is now feasible to propose systematically reverse-engineering neuron-level input-output relationships across entire small nervous systems. Yet neural activity recording today still faces significant trade-offs across spatial coverage, temporal resolution, recording duration, invasiveness, signal quality, and behavioral repertoire. Even more challenging is recording modulatory molecules like hormones and neuropeptides. Defining “whole-brain” as capturing more than 95 percent of neurons across 95 percent of brain volume simultaneously, no experiment to date has delivered that scale with single-neuron, single-spike resolution in any organism during any behavior. It seems plausible that this barrier will be overcome for sub-million neuron organisms in the coming years.
The second core capability, Connectomics, is used to reconstruct wiring diagrams for all neurons in a brain. Connectomics has moved past C. elegans worm brain mappings to produce, more recently, two fully reconstructed adult fruit fly brain connectomes. This is a major achievement because fruit flies have about three orders of magnitude more neurons than a C. elegans worm. Several additional scans in other organisms, such as larval zebrafish, have also been acquired and are expected to complete processing in the near future. Dataset sizes now increasingly reach petabyte scale, which strains storage and backup infrastructure, raising not only costs but also barriers to sharing and collaboration.
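To see why connectomic datasets reach petabyte scale, here is a rough back-of-the-envelope sketch. The tissue volume, voxel dimensions, and bytes per voxel below are illustrative assumptions rather than figures from the report; the point is only that nanometre-scale voxels over millimetre-scale tissue multiply out to petabytes.

```python
def em_dataset_size_bytes(volume_mm3, voxel_nm, bytes_per_voxel=1.0):
    """Raw size of an electron-microscopy image volume, before compression."""
    nm3_per_voxel = voxel_nm[0] * voxel_nm[1] * voxel_nm[2]
    volume_nm3 = volume_mm3 * 1e18  # 1 mm^3 = 1e18 nm^3
    return (volume_nm3 / nm3_per_voxel) * bytes_per_voxel

# Illustrative example: ~1 mm^3 of tissue imaged at 4 x 4 x 40 nm voxels,
# with one byte stored per voxel.
size = em_dataset_size_bytes(volume_mm3=1.0, voxel_nm=(4, 4, 40))
print(f"~{size / 1e15:.1f} PB of raw imagery")
```

Under these assumptions a single cubic millimetre already comes to roughly 1.6 PB of raw imagery, before any derived segmentation data.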
It is faster to make connectomic maps today than it was just a few years ago, in part because of how the underlying images are acquired and “stitched” together. Progress is being enabled by a mix of faster electron microscopy, automated tissue-handling pipelines, and algorithmic image processing and neuron tracing. Together, these improvements have pushed the cost per reconstructed neuron from an estimated $16,500 in the original C. elegans connectome to roughly $100 in recent larval zebrafish projects. Proofreading, the manual process of fixing errors in automated neuron tracing, remains the most time- and cost-consuming factor. This holds particularly for mammalian neurons, with their large size and complex morphologies. Experts are optimistic that machine learning will eventually overcome this bottleneck and reduce costs further. As of now, all reconstruction efforts are essentially limited to contour tracing to reconstruct wiring diagrams; they lack molecular annotations of key proteins, limiting their direct utility for functional interpretation and computational modeling. Many experts are optimistic that, in the future, one might be able to build connectomes much more cheaply by using expansion microscopy, rather than electron microscopy, combined with techniques that enable molecular annotation, including protein barcoding for self-proofreading.

The final capability is Computational Neuroscience, or the ability to model brains faithfully. The capacity to simulate neural systems has advanced, enabled by richer datasets and more powerful software and hardware. In C. elegans, connectome-constrained and embodied models now reproduce specific behaviors, while in the fruit fly, whole-brain models recapitulate known circuit dynamics. At the other end of the spectrum, feasibility studies on large GPU clusters have demonstrated simulations approaching human-brain scale, albeit with simplified biophysical assumptions.
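As an illustration of what “connectome-constrained” modelling can mean in practice, here is a minimal sketch of a rate-based network whose weights come from a connectome-style adjacency matrix. The model form, the random adjacency matrix, the sign assignments, and all parameters are hypothetical simplifications for illustration, not the models described in the report.

```python
import numpy as np

# Minimal sketch of a connectome-constrained rate model: each neuron's firing
# rate is driven by connectome-weighted input from the other neurons. The
# adjacency matrix below is random and stands in for real synapse counts.
rng = np.random.default_rng(0)
n = 300                                     # number of neurons (toy scale)
synapse_counts = rng.poisson(0.5, (n, n))   # placeholder for connectome counts
signs = rng.choice([-1.0, 1.0], n)          # hypothetical excitatory/inhibitory labels
W = synapse_counts * signs[None, :] * 0.05  # signed, scaled connectivity matrix

def simulate(W, external_input, dt=1e-3, tau=0.02, steps=2000):
    """Leaky rate dynamics: tau * dr/dt = -r + tanh(W @ r + input)."""
    r = np.zeros(W.shape[0])
    trace = []
    for _ in range(steps):
        r = r + dt / tau * (-r + np.tanh(W @ r + external_input))
        trace.append(r.copy())
    return np.array(trace)

# Drive a small subset of "sensory" neurons and observe network-wide activity.
stim = np.zeros(n)
stim[:10] = 1.0
activity = simulate(W, stim)
print(activity.shape)            # (steps, neurons)
print(activity[-1, :5].round(3))
```

Real connectome-constrained models add many refinements (cell-type-specific dynamics, synaptic delays, learned free parameters), but the core idea, fixing the wiring from data and simulating the dynamics on top of it, is the same.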
On the hardware side, the field has shifted from specialized CPU supercomputers toward more accessible GPU accelerators. For mammalian-scale simulations, the primary hardware bottlenecks are now memory capacity and interconnect bandwidth, not raw processing power. On the software side, improvements come from automatic differentiation for data-driven model parameter fitting, more efficient simulation methods, and the development of more rigorous evaluation methods. Still, many biological mechanisms, like neuromodulation, are largely omitted. A more fundamental limitation is that models remain severely data-constrained: experimental data are scarce in general, complementary structural and functional datasets from the same individual are rare, and where they exist, they lack sufficient detail. Moreover, passive recordings alone struggle to uniquely specify model parameters, highlighting the need for causal perturbation data.
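To see why memory, rather than raw compute, dominates at mammalian scale (as noted above), here is a rough back-of-the-envelope sketch. The neuron count, synapses per neuron, and bytes of state per synapse and per neuron are illustrative assumptions, not figures from the report.

```python
def simulation_memory_bytes(n_neurons, synapses_per_neuron,
                            bytes_per_synapse, bytes_per_neuron):
    """Rough lower bound on the memory needed to hold a network's state and parameters."""
    return n_neurons * (bytes_per_neuron + synapses_per_neuron * bytes_per_synapse)

# Illustrative human-scale numbers: ~8.6e10 neurons, ~1e4 synapses per neuron,
# ~8 bytes of weight/delay state per synapse, ~1 kB of state per neuron.
mem = simulation_memory_bytes(n_neurons=8.6e10, synapses_per_neuron=1e4,
                              bytes_per_synapse=8, bytes_per_neuron=1_000)
print(f"~{mem / 1e15:.0f} PB just to hold the network")
```

Under these assumptions the network state alone runs to several petabytes; at on the order of 100 GB of high-bandwidth memory per GPU, that is tens of thousands of devices, which is why interconnect bandwidth becomes the next constraint.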
Conclusion

The past two decades delivered meaningfully improved methods and a new era of scale for data acquisition. Two challenges will shape the next phase of research: first, determining which biological features (from gap junctions to glial cells and neuromodulators) are necessary to produce faithful brain emulation models. Empirically answering such questions calls for more comprehensive evaluation criteria that include neural activity prediction, embodied behaviors, and responses to controlled perturbations.
Second, there is a widening gap between our ability to reconstruct ever-larger connectomes and our much more limited capacity to record neural activity across them. This discrepancy necessitates that the neuroscience community develop better methods to infer functional properties of neurons and synapses primarily from structural and molecular data. For both challenges, sub-million neuron organisms — where whole-brain recording is already feasible — present a compelling target. Here, comprehensive functional, structural, and molecular datasets are attainable at scale, making it possible to empirically determine which biological details are necessary for a faithful emulation. Furthermore, the cost-efficient collection of aligned structural and neural activity datasets from multiple individuals provides the essential ground truth for developing and rigorously evaluating methods to predict functional properties from structure alone. The evidence this generates, defining what is needed for emulation and validating methods that infer function from structure, will be critical to guide and justify the large-scale investments required for mammalian brain projects.
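As a toy illustration of inferring function from structure, here is a minimal sketch that fits a logistic classifier to predict a synapse's sign (excitatory vs. inhibitory) from structural features. The features, labels, and data are synthetic placeholders, and the classifier is a generic choice on my part; it is not a method or result from the report.

```python
import numpy as np

# Toy illustration: predict synapse sign (1 = excitatory, 0 = inhibitory) from
# structural features such as synapse size and presynaptic vesicle count.
# All data below are synthetic; a real pipeline would use measured EM features
# and ground-truth labels derived from paired functional recordings.
rng = np.random.default_rng(1)
n_synapses = 5_000
features = rng.normal(size=(n_synapses, 2))   # [synapse size, vesicle count], standardized
true_w = np.array([1.5, -0.8])                # hidden rule generating the synthetic labels
labels = (features @ true_w + rng.normal(scale=0.5, size=n_synapses) > 0).astype(float)

# Logistic regression fitted by plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))   # predicted probability of "excitatory"
    grad_w = features.T @ (p - labels) / n_synapses
    grad_b = np.mean(p - labels)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((p > 0.5) == labels.astype(bool))
print(f"training accuracy: {accuracy:.2f}, learned weights: {w.round(2)}")
```

The value of aligned structural and functional datasets from the same individuals is exactly that they supply the labels such predictors need for training and honest evaluation.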
In short, faithful emulation of small brains is the necessary first step toward emulating larger ones. To make that happen… mammalian brain projects will also require parallel progress in cost-effective connectomics. The deeply integrated, end-to-end nature of this research calls for integrated organizational models to complement the vital contributions of existing labs at universities and research campuses.