This is a special post for quick takes by Josh Snider. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
I'm writing a response to https://www.lesswrong.com/posts/FJJ9ff73adnantXiA/alignment-will-happen-by-default-what-s-next and https://www.lesswrong.com/posts/epjuxGnSPof3GnMSL/alignment-remains-a-hard-unsolved-problem where I tried to measure how "sticky" the alignment of current LLMs is. I'm proofreading and editing that now. Spoiler: Models differ wildly in how committed they are to being aligned and alignment-by-default may not be a strong enough attractor to work out.
Would anyone want to proofread this?