LESSWRONG
LW

908
Caleb Biddulph
1283Ω37131801
Message
Dialogue
Subscribe

MATS 8.1 scholar, mentored by Micah Carroll. Former SWE at Google Gemini. Founding president of Cornell Effective Altruism.

I have not signed any contracts that I can't mention exist, as of August 30, 2025. I'll try to update this statement at least once a year, so long as it's true. I added this statement thanks to the one in the gears to ascension's bio.

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
People Seem Funny In The Head About Subtle Signals
Caleb Biddulph2d141

The things you're saying may be true, but I'm not sure the Slytherin necklace is a super good example. I feel like she put on the necklace that morning and had a moment where she thought "haha this is Slytherin-coded," and she wanted to share that feeling with you in a playful way. I doubt she was thinking "when I wear this necklace, I predict that people will associate me with Slytherin. I shall now test this hypothesis by asking John."

My very uninformed model of this girl says that if she read this post, she'd kind of roll her eyes and say "lol it really wasn't that deep." But only she could say for sure.

Reply
Why I Transitioned: A Case Study
Caleb Biddulph5d161

From Fiora's Twitter:

oh geez. i wonder if it wasn't "i want to be cute/hot/beautiful so i can be loved by *others*," so much as to finally be worthy of loving *myself*. that... cuts even closer to the core of it, probably.

I was about to comment the same thing here. I think for many lesbian trans girls, being loved by men isn't appealing except maybe insofar as it affirms that one is the kind of person who could be loved by one's (past, male) self

Reply
kave's Shortform
Caleb Biddulph15d42

I forgot that I could choose not to filter out personal blogposts - I think I will set this to "default" from now on. Feels like there's probably lots of decent content that I haven't been seeing

Reply
eggsyntax's Shortform
Caleb Biddulph20d20

Yeah no worries, I was just curious if you'd seen it. No need to do a lit review before writing a shortform :)

Reply1
eggsyntax's Shortform
Caleb Biddulph20d40

Did you see the discussion here? It seems like many normie commenters thought Eliezer's arguments were confusing and didn't really connect with the questions being asked.

Although I don't know of any interviews that would be much better to recommend.

Reply
anaguma's Shortform
Caleb Biddulph24d82

The comments on the video are a bit disheartening... lots of people saying Yudkowsky is too confusing, answers everything too technically or with metaphors, structuring sentences in a way that's hard to follow, and Ezra didn't really understand the points he was making. 

One example: Eliezer mentioned in the interview that there was a kid whose chatbot encouraged him to commit suicide, with the point that "no one programmed the chatbot to do this." This comment made me think:

if you get a chance listen to the interviews with the parents and the lawyers who are suing chatgpt because that kid did commit suicide.

Oh yeah, probably most people telling this story would at least mention that the kid did in fact commit suicide, rather than treating it solely as evidence for an abstract point...

Reply1
Musings from a Lawyer turned AI Safety researcher (ShortForm)
Caleb Biddulph1mo2-3

She does seem like a LinkedIn grifter, but if she's a popular LinkedIn grifter I guess this could mean something.

I'm not sure if important people at Fortune 500s are reading LinkedIn grifter newsletters. Or if Fortune 500s that aren't Alphabet or Nvidia are actually relevant for AI.

Maybe Luisa Jarovsky's recommendation is primarily important as an indicator that "normies" (who can vote, etc.) are aware of IABIED.

This is the 29th book Luisa has recommended for her "AI book club," so possibly she just needed something to recommend and IABIED is a recent AI book with a lot of marketing around it. And even in her recommendation, she mentions that she "disagrees with catastrophic framings of AI risk."

Reply1
Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior
Caleb Biddulph1mo20

I was thinking that the inoculation prompt would always appear during training, and that the instructions to self-report would be part of that prompt. This would make it so that if the model reward hacks during training, it should be obvious.

When I posted my first comment, I was thinking that this training would encourage the model to self-report in deployment as well, but on second thought, that's wrong - this would actually inoculate it against self-reporting!

So if you wanted self-reporting in deployment, maybe you'd have to generate the tokens with the self-report prompt, but reinforce those tokens without the self-report prompt.

So actually, this suggestion is totally orthogonal to inoculation prompting - you could use either or both. Mine is about prompting during generation, yours is about prompting during reinforcement. (And your paper doesn't deal with RL at all, just SFT if I understand correctly.)

Reply1
Tim Hua's Shortform
Caleb Biddulph1mo21

True true. It's better to do the simplest things first. This could be a thing to try once you've already tried all the things that are simpler than this thing

Reply
Tim Hua's Shortform
Caleb Biddulph1mo40

Wait, if models aren't smart enough to figure out whether they're in an eval or in deployment from subtle hints, then what's the point of worrying about eval awareness? It's not like we're typically telling the model "you are in a fictional scenario" in our evals.

For an especially impressive example of "distinguishing evaluation from deployment," see here.

Reply
Load More
4Caleb Biddulph's Shortform
9mo
46
27LLMs as amplifiers, not assistants
5mo
8
23What was so great about Move 37?
Q
5mo
Q
4
7Procedural vs. Causal Understanding
5mo
2
54Vestigial reasoning in RL
7mo
8
4Caleb Biddulph's Shortform
9mo
46
4Why not train reasoning models with RLHF?
Q
9mo
Q
4
45Worries about latent reasoning in LLMs
10mo
6
465 ways to improve CoT faithfulness
Ω
1y
Ω
40
98OpenAI's Sora is an agent
2y
25
12Is Metaethics Unnecessary Given Intent-Aligned AI?
2y
0
Load More
Sora
2 years ago
(+82)