SERI MATS '21, Cognitive science @ Yale '22, Meta AI Resident '23, former LTFF grantee. Currently doing prosocial alignment research @ AE Studio. Very interested in work at the intersection of AI x cognitive science x alignment x philosophy.


Paradigm-Building for AGI Safety Research

I'm definitely sympathetic to the general argument here as I understand it: something like, it is better to be more productive when what you're working towards has high EV, and stimulants are one underutilized strategy for being more productive. But I have concerns about the generality of your conclusion: (1) blanket-endorsing or otherwise equating the advantages and disadvantages of all of the things on the y-axis of that plot is painting with too broad a brush. They vary, eg, in addictive potential, demonstrated medical benefit, cost of maintenance, etc. (2) Relatedly, some of these drugs (e.g., Adderall) alter the dopaminergic calibration in the brain, which can lead to significant personality/epistemology changes, typically as a result of modulating people's risk-taking/reward-seeking trade-offs. Similar dopamine agonist drugs used to treat Parkinson's have led to pathological gambling behaviors in patients who took them. There is an argument to be made, for at least some subset of these substances, that the trouble induced by these kinds of personality changes may plausibly outweigh the productivity gains of taking the drugs in the first place.

27 people holding the view is not a counterexample to the claim that it is becoming less popular.

Still feels worthwhile to emphasize that some of these 27 people are, eg, Chief AI Scientist at Meta, co-director of CIFAR, DeepMind staff researchers, etc. 

These people are major decision-makers in some of the world's leading and most well-resourced AI labs, so we should probably pay attention to where they think AI research should go in the short-term—they are among the people who could actually take it there.


See also this survey of NLP

I assume this is the chart you're referring to. I take your point that you see these numbers as increasing or decreasing (even though their absolute levels seem compatible with believing that brain-based AGI is entirely possible), but these increases or decreases are themselves risky statistics to extrapolate from. These sorts of trends could easily asymptote or reverse given volatile field dynamics. For instance, if we linearly extrapolate from the two stats you provided (5% believe scaling could solve everything in 2018; 17% believe it in 2022), this would predict, eg, that 56% of NLP researchers in 2035 would believe scaling could solve everything. Do you actually think something in this ballpark is likely?
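For anyone who wants to check the arithmetic, here is a minimal sketch of that linear extrapolation. The function names are mine and purely illustrative; the only inputs are the two survey figures quoted above (5% in 2018, 17% in 2022), and the straight-line assumption is exactly the thing I am suggesting we shouldn't trust.

```python
def linear_extrapolate(x0, y0, x1, y1, x):
    """Extend the straight line through (x0, y0) and (x1, y1) out to x."""
    slope = (y1 - y0) / (x1 - x0)  # percentage points per year
    return y0 + slope * (x - x0)


def year_reaching(x0, y0, x1, y1, target):
    """Year at which that same straight line would hit `target` percent."""
    slope = (y1 - y0) / (x1 - x0)
    return x0 + (target - y0) / slope


# The two data points quoted in the comment above.
share_2035 = linear_extrapolate(2018, 5.0, 2022, 17.0, 2035)
print(share_2035)  # 56.0

# Illustrative only: where a naive line would cross 100% of respondents.
saturation_year = year_reaching(2018, 5.0, 2022, 17.0, 100.0)
print(round(saturation_year, 1))
```

The slope works out to 3 percentage points per year, and the same line would pass 100% of respondents within a few decades, which is one way of seeing why a two-point linear trend is a shaky basis for prediction.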


Did the paper say that NeuroAI is looking increasingly likely?

I was considering the paper itself as evidence that NeuroAI is looking increasingly likely. 

When people who run many of the world's leading AI labs say they want to devote resources to building NeuroAI in the hopes of getting AGI, I am considering that as a pretty good reason to believe that brain-like AGI is more probable than I thought it was before reading the paper. Do you think this is a mistake?

Certainly, to your point, signaling an intention to try X is not the same as successfully doing X, especially in the world of AI research. But again, if anyone were to be able to push AI research in the direction of being brain-based, would it not be these sorts of labs? 

To be clear, I do not personally think that prosaic AGI and brain-based AGI are necessarily mutually exclusive—eg, brains may be performing computations that we ultimately realize are some emergent product of prosaic AI methods that already basically exist. I do think that the publication of this paper gives us good reason to believe that brain-like AGI is more probable than we might have thought it was, eg, two weeks ago.

However, technological development is not a zero-sum game. Opportunities or enthusiasm in neuroscience don't in themselves make prosaic AGI less likely, and I don't feel like any of the provided arguments are knockdown arguments against ANNs leading to prosaic AGI.

Completely agreed! 

I believe there are two distinct arguments at play in the paper and that they are not mutually exclusive. I think the first is "in contrast to the optimism of those outside the field, many front-line AI researchers believe that major new breakthroughs are needed before we can build artificial systems capable of doing all that a human, or even a much simpler animal like a mouse, can do" and the second is "a better understanding of neural computation will reveal basic ingredients of intelligence and catalyze the next revolution in AI, eventually leading to artificial agents with capabilities that match and perhaps even surpass those of humans." 

The first argument can be read as a reason to negatively update on prosaic AGI (unless you see these 'major new breakthroughs' as also being prosaic) and the second argument can be read as a reason to positively update on brain-like AGI. To be clear, I agree that the second argument is not a good reason to negatively update on prosaic AGI.

Thanks for your comment! 

As far as I can tell the distribution of views in the field of AI is shifting fairly rapidly towards "extrapolation from current systems" (from a low baseline).

I suppose part of the purpose of this post is to point to numerous researchers who serve as counterexamples to this claim—i.e., Yann LeCun, Terry Sejnowski, Yoshua Bengio, Timothy Lillicrap et al seem to disagree with the perspective you're articulating in this comment insofar as they actually endorse the perspective of the paper they've coauthored.

You are obviously a highly credible source on trends in AI research—but so are they, no? 

And if they are explicitly arguing that NeuroAI is the route they think the field should go in order to get AGI, it seems to me unwise to ignore or otherwise dismiss this shift.

Agreed that there are important subtleties here. In this post, I am really just using the safety-via-debate set-up as a sort of intuitive case for getting us thinking about why we generally seem to trust certain algorithms running in the human brain to adjudicate hard evaluative tasks related to AI safety. I don't mean to be making any especially specific claims about safety-via-debate as a strategy (in part for precisely the reasons you specify in this comment).

Thanks for the comment! I do think that, at present, the only working example we have of an agent able to explicitly self-inspect its own values is in the human case, even if getting the base shards 'right' in the prosocial sense would likely entail that they will already be doing self-reflection. Am I misunderstanding your point here?

Thanks Lukas! I just gave your linked comment a read and I broadly agree with what you've written both there and here, especially w.r.t. focusing on the necessary training/evolutionary conditions out of which we might expect to see generally intelligent prosocial agents (like most humans) emerge. This seems like a wonderful topic to explore further IMO. Any other sources you recommend for doing so?

Hi Joe—likewise! This relationship between prosociality and distribution of power in social groups is super interesting to me and not something I've given a lot of thought to yet. My understanding of this critique is that it would predict something like: in a world where there are huge power imbalances, typical prosocial behavior would look less stable/adaptive. This brings to mind for me things like 'generous tit for tat' solutions to prisoner's dilemma scenarios—i.e., where being prosocial/trusting is a bad idea when you're in situations where the social conditions are unforgiving to 'suckers.' I guess I'm not really sure what exactly you have in mind w.r.t. power specifically—maybe you could elaborate on (if I've got the 'prediction' right in the bit above) why one would think that typical prosocial behavior would look less stable/adaptive in a world with huge power imbalances?

I broadly agree with Viliam's comment above. Regarding Dagon's comment (to which yours is a reply), I think that characterizing my position here as 'people who aren't neurotypical shouldn't be trusted' is basically strawmanning, as I explained in this comment. I explicitly don't think this is correct, nor do I think I imply it is anywhere in this post.  

As for your comment, I definitely agree that there is a distinction to be made between prosocial instincts and the learned behavior that these instincts give rise to over the lifespan, but I would think that the sort of 'integrity' that you point at here as well as the self-aware psychopath counterexample are both still drawing on particular classes of prosocial motivations that could be captured algorithmically. See my response to 'plausible critique #1,' where I also discuss self-awareness as an important criterion for prosociality.  

Interesting! Definitely agree that if people's specific social histories are largely what qualify them to be 'in the loop,' this would be hard to replicate for the reasons you bring up. However, consider that, for example,

Young neurotypical children (and even chimpanzees!) instinctively help others accomplish their goals when they believe they are having trouble doing so alone...

which almost certainly has nothing to do with their social history. I think there's a solid argument to be made, then, that a lot of these social histories are essentially a lifelong finetuning of core prosocial algorithms that have in some sense been there all along.  And I am mainly excited about enumerating these. (Note also that figuring out these algorithms and running them in an RL training procedure might get us the relevant social histories training that you reference—but we'd need the core algorithms first.)

"human in the loop" to some extent translates to "we don't actually know why we trust (some) other humans, but there exist humans we trust, so let's delegate the hard part to them".

I totally agree with this statement taken by itself, and my central point is that we should actually attempt to figure out 'why we trust (some) other humans' rather than treating this as a kind of black box. However, if this statement is being put forward as an argument against doing so, then it seems circular to me.

