I randomly met Jeff Dean (Google's lead AI scientist) on my bike ride home today. We were both stuck at a train intersection, and I had a cute kid in tow. We started chatting about my e-bike and the commute, and we got around to jobs. I told him I am a boring tax lawyer. He told me he worked for Google. I pressed a little more, and he explained he was a scientist. I mused, "AI?" and he told me, "Yeah."
I excitedly told him that I've been really interested in alignment the last few months (reading LW, listening to lectures), and it strikes me as a huge problem. I asked him if he was worried.
He told me that he thinks AI will have a big impact on society (some of it worrying) but he doesn't buy into the robots-taking-over thing.
I smiled and asked him, "What's your p(doom)?" to which he responded "very low" and said he thinks the technology will do a lot of good and useful things.
I thought maybe this was because he thinks that the technology will hit a limit soon, so I asked him if he thought LLMs would successfully scale. He responded that he thinks a few more breakthroughs are required but there have been lots of breakthroughs over the last 5-10 years, and so the technology is likely to continue improving in the coming years.
I told him again that I am worried about alignment, but that even if you solve alignment, you are still left with a very obedient superintelligence, which would radically change our society and all our politics.
The train finally passed, I thanked him for the conversation, and we were on our way.
I'm new to this group and the topic in general, and so when I got home, I searched "AI google Palo Alto LinkedIn" and Jeff's picture popped up. I now feel like I bumped into Oppenheimer during the Manhattan Project, but instead of knowing it was Oppenheimer, I spent a majority of the conversation talking about my bike seat.
Anyways, if any of you were looking for a qualitative measure of how much LessWrong has broken through to people, I think one good measure is a tax lawyer asking for Jeff Dean's p(doom) while he was walking home from work.
Lots of very important people spend all day being pestered by people due to their power/importance. At least some of them appreciate occasional interactions with people who just want to chat about something random like their bike seat.
I smiled and asked him, "What's your p(doom)?" to which he responded "very low" and said he thinks the technology will do a lot of good and useful things.
I mean: would he be Google's lead AI scientist if he didn't? He'd have to be insane or incredibly psychopathic.
Unfortunately I think this is a misunderstanding of what a psychopath is.
would he be Google's lead AI scientist if he didn't? He'd have to be insane or incredibly psychopathic.
What matters is not whether p(doom) is low or high, but whether his joining GDM would increase or decrease p(doom). If his joining GDM changed p(doom)[1] from 0.5 to 0.499, then it would arguably be a noble act. Alas, there is an obvious counterargument: his belief that he decreases p(doom) by researching at GDM could simply be mistaken.
However, doom could also be a blind spot, as happened with Musk, who decided to skip red-teaming Grok to the point of the MechaHitler scandal and Grok ranting about white genocide in South Africa...
p(doom) alone could also be a misguided measure. Suppose that doom is actually caused by adopting neuralese without having absolutely solved alignment, while creating an alternate, well-monitorable architecture is genuinely hard. If the effort invested in creating that architecture is far from the threshold where it can compete with neuralese, then a single person joining the effort would likewise be committing a noble act, but it would alter p(doom) only if lots of people did so.
While this act does actually provide dignity in the Yudkowsky sense, one can also imagine a scenario where Anthropoidic doubles down on the alternate architecture while xRiskAI or OpenBrain uses neuralese, wins the capabilities race, and has Anthropoidic shut down.
I think that then goes to my second point though: supposing he did believe that p(doom) is high, and worked as lead AI scientist at Google regardless due to utilitarian calculations, would he talk freely about it to the first passerby?
Politically speaking it would be quite a hefty thing to say. If he wanted to say it publicly, he would do so in a dedicated forum where he gets to control the narrative best. If he wanted to keep it secret he simply wouldn't say it. Either way, talking about it lightly seems out of the question.
Dario Amodei (Anthropic co-founder and CEO), Shane Legg (co-founder and Chief AGI Scientist of Google DeepMind), and others have numbers that are not plausibly construed as "very low."
Interesting, thank you for sharing! As someone also newer to this space, I'm curious about estimates for the proportion of people in leading technical positions similar to "lead AI scientist" at a big company who would actually be interested in this sort of serendipitous conversation. I was under the impression that many in the position of "lead AI scientist" at a big company would be either 1) too wrapped up in thinking about their work/pressing problems or 2) uninterested in mundane small-talk topics to spend "a majority of the conversation talking about [OP's] bike seat," but this clearly provides evidence to the contrary.
Why would being a lead AI scientist make somebody uninterested in small talk? Working on complex/important things doesn't cause you to stop being a regular adult with regular social interactions!
The question of the proportion of AI scientists who would be "interested" in such a conversational topic is interesting and tough; my guess would be very high, though (~85 percent). To become a "lead AI scientist" you have to care a lot about AI and the science surrounding it, and that generally implies you'll like talking about it and its potential harms/benefits with others! Even if their opinion on x-risk rhetoric is dismissive, that opinion is likely something important to them, as it's somewhat of a moral standing, since being a capabilities-advancing AI researcher with a high p(doom) is problematic. You can draw parallels with vegetarianism/veganism: if you eat meat you have to choose between defending the morality of factory farming, accepting that you are being amoral, or living with extreme cognitive dissonance. If you are an AI capabilities researcher, you have to choose between defending the morality of advancing AI (downplaying x-risk), accepting that you are being amoral, or living with extreme cognitive dissonance. I would be extremely surprised if there were a large coalition of top AI researchers who simply "have no opinion" or "don't care" about x-risk, though this is mostly just intuition and I'm happy to be proven wrong!
The acausal/ancestor simulation arguments seem a lot like Pascal's Wager, and just as unconvincing to me. For every "kind" simulator someone imagines who would be disappointed in the AI wiping us out, I can imagine an equally "unkind" simulator that penalizes the AI for not finishing the job.
Provided both are possible and similarly plausible, the probabilities of kind and unkind simulators offset each other, and the logical response is to just ignore the hypothetical. This is pretty much my response to Pascal's Wager.
Here are a few plausible unkind simulators:
I first started thinking about this issue back in high school debate. We had a topic about whether police or social workers should intervene more in domestic violence cases. One debater argued in favor of armed police, not because it would improve the situation, but because it would create more violence, which would entertain the simulators and keep our simulation from getting shut down.
Since the simulators are a black box, it seems easy to ascribe whatever values we want to them.