(Target audience: People who are already quite familiar with my research.)
Sometimes people ask me for non-obvious, hobby-compatible ways that they can help me do my AGI-safety-targeted neuroscience research.
(“Obvious” things include reading what I write and giving feedback; non-“hobby-compatible” things include going into connectomics or laboratory neuroscience, etc.)
When that happens, I appreciate the thought! 🙏 So here is a collection of neuroscience items from my to-do list that I’m unlikely to get to anytime soon.[1]
If someone wants to try their hand at any of these, have at it!! And good luck!!
(I really don’t expect anything to come of my posting this,[2] but what the hell, worth a shot.)
(I also have lots of to-do list items that are about AGI safety but not really neuroscience. These tend to be very messy and open-ended, not self-contained, and hard to describe. So I’m leaving them out of this post, with one exception (§3.1).)
This is related to my post “I’m confused about innate smell neuroanatomy” (2023). After publishing that post and getting feedback, I wound up sufficiently non-confused that I felt OK dropping the matter and moving on to other things. But I’m still mildly confused about innate smell neuroanatomy in various ways. It would be nice if there were a lit review on this topic that was less half-assed than mine. For example, in the post, I talk about finding a certain paper (“Russo 2018”) to be confusingly worded, and apparently inconsistent with the rest of the literature. Someone could just reach out to the authors or other SMEs and probably sort it out.
(The relevant parts of the “Steering Subsystem” here are mainly the hypothalamus, BNST, and to a lesser extent various brainstem and pallidal areas, I think.)
Such lit reviews already exist—e.g. Mei et al. 2023—but there’s always room for more scrutiny and study and synthesis.
According to my theories of the brain, every innate behavior in animals has a corresponding “symbol grounding problem”: sorting out how that behavior winds up associated with the right external conditions or situations (see here & here). In my work, I’ve mostly focused on symbol grounding for human social instincts, since it’s most directly AI-alignment-relevant, but it would be interesting and informative to learn about others. For example:
I have some half-assed speculations about filial imprinting here. Filial imprinting definitely has a bunch of existing literature, which I mostly haven’t looked into. For the other ones, I basically haven’t looked into them at all.
In general, the easy part is reading papers, and the hard part is coming up with plausible nuts-and-bolts hypotheses.
For example, I sketch out a specific algorithm near the bottom of my post “Woods’ new preprint on object permanence” (2024), and suggest that the superior colliculus runs this algorithm (among other roles). This hypothesis could be compared with the massive experimental literature on superior colliculus structure and function, and refined or replaced.
As another example, if I’m correct that the superior colliculus[3] detects slithering snakes and skittering spiders, exactly where in the superior colliculus does that happen, and how? It’s not a deep mystery—it seems like the kind of thing that the superior colliculus ought to be able to do—but pinning down the detailed pathways and anatomy etc. would be valuable.
For example, write down a circuit that plausibly captures human anger. Why is it so satisfying to find someone to blame? How does anger work? If Ahmed is angry at Bob, Ahmed will generally feel very strongly that Bob should be aware that Ahmed is angry at him—this fact seems like a hint about what’s going on.
Anxiety is probably not a “social behavior” per se, but I’m interested in that one too, see my brief discussion in §6 here.
I feel like I have nice elegant frameworks for understanding NPD, ASPD, and BPD. What about OCPD? What about all the other ones? (DSM lists 6 more personality disorders: “paranoid”, “schizoid”, “schizotypal”, “histrionic”, “avoidant”, “dependent”.) Warning that this might be a dead-end, because these diagnoses might not be “cutting reality at the joints”. For example, I think the ASPD diagnosis is actually applied in practice to two quite different groups of people. (Possibly related to a couple of these: my ideas about schizophrenia—1,2.)
For example, Tadross et al. 2025 set up an open-source pipeline for comparing hypothalamus cells between humans and mice. They then applied that general-purpose tool to analyze mouse-human differences pertinent to obesity. It might be cool to reproduce the pipeline, but instead home in on differences pertinent to sociality, e.g. neurons that produce or detect oxytocin, or cells that have been identified in studies of parenting or aggression or loneliness or whatever.
There’s also the “BICCN cell atlas”, and probably many other resources that I don’t know about. Not sure what there is to be found here.
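To make the “home in on sociality-related cells” idea a bit more concrete, here’s a minimal sketch of what the filtering step might look like in Python with scanpy. Everything specific here is an assumption for illustration—the file name, gene list, and threshold are placeholders, not anything taken from Tadross et al. 2025 or the BICCN atlas.

```python
# Hypothetical sketch: flag candidate sociality-related cells in a single-cell
# hypothalamus atlas. File name, gene list, and cutoff are all placeholders.
import scanpy as sc

adata = sc.read_h5ad("hypothalamus_atlas.h5ad")  # assumed: any atlas in AnnData format

# Assumed genes of interest: oxytocin (OXT), its receptor (OXTR), vasopressin (AVP)
social_genes = [g for g in ["OXT", "OXTR", "AVP"] if g in adata.var_names]

# Score each cell by average expression of those genes, then keep the top 5%
sc.tl.score_genes(adata, gene_list=social_genes, score_name="social_score")
cutoff = adata.obs["social_score"].quantile(0.95)
candidates = adata[adata.obs["social_score"] > cutoff]

print(f"{candidates.n_obs} candidate sociality-related cells out of {adata.n_obs}")
```

The real work would of course be in the cross-species comparison downstream of a filter like this, but something this simple is probably enough to get a first list of cells worth staring at.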
This one is not really neuroscience, but very important. I feel pretty blocked here, on the grounds that one presumably needs to have a plan for AGI alignment before thinking about how to test and validate that plan. But I dunno, maybe there are things to do right now? See some of my discussion here & here.
As for actually building secure test environments and such, that might be a big project. A few people have toyed with the idea of starting such projects, but to my knowledge nobody is actually doing so right now.[4]
Basically, I think human personality variation largely comes down to innate social drives and reactions having different strengths in different people. Now, the literature on human personality variation is a mountain of disparate data, contradictory theories, and decades-old controversies; meanwhile, my ideas on how human innate social drives work are somewhat vague and incomplete. So using each of these to help “solve” the other would be a hell of a project.
It sounds staggeringly arrogant for me to say it, but I think I might have a shot at success! Or at least, it would sure be fun to try. But I have more urgent priorities, and am very unlikely to even start thinking about this before, I dunno, mid-2026, if ever.
In fact, I’ve mostly been on neuroscience research hiatus for the past 10 months and counting, as planned.
As the saying goes, “the first and foremost thing which any ordinary person does is nothing”. More specifically, it’s not that nobody has the initiative and ability to tackle one of these problems, but rather that the people who do have that kind of initiative and ability have also probably already come up with their own things that they want to work on.
Or maybe the neighboring parabigeminal nucleus or whatever.
I have vague recollections that up to three AI safety groups at least toyed with the idea of doing things in this vicinity, but I think none of them wound up following through. I think one of them involved Andrew Critch, the second involved Jacob Cannell, and the third was never publicly announced so I won’t share, but the person seems to have moved on to other things. I might be misremembering though.