Josh Snider's Shortform

Josh Snider

Josh Snider's Shortform

1st Dec 2025

1 min read

3

This is a special post for quick takes by Josh Snider. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

2 comments, sorted by

top scoring

Click to highlight new comments since: Today at 12:19 PM

[-]Josh Snider3mo120

I'm writing a response to https://www.lesswrong.com/posts/FJJ9ff73adnantXiA/alignment-will-happen-by-default-what-s-next and https://www.lesswrong.com/posts/epjuxGnSPof3GnMSL/alignment-remains-a-hard-unsolved-problem where I tried to measure how "sticky" the alignment of current LLMs is. I'm proofreading and editing that now. Spoiler: Models differ wildly in how committed they are to being aligned and alignment-by-default may not be a strong enough attractor to work out.

Would anyone want to proofread this?

Reply

[-]Josh Snider2mo*10

This can now be read at https://www.lesswrong.com/posts/qE2cEAegQRYiozskD/is-friendly-ai-an-attractor-self-reports-from-22-models-say

Reply

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

Josh Snider's Shortform

3