This post is about the Anthropic paper “The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?”.[1] Putting aside issues one might have with (1) framing and (2) construct validity, both of which this post by RobertM already discusses, I think that the conclusion the paper reaches...
“Empire of AI” by Karen Hao was a nice read that I would recommend. It’s half a hit piece on how OpenAI’s corporate culture has evolved (with a focus on Sam Altman and his two-faced politicking), and half an illustration of how frontier AI labs are “empires” that extract resources from the Global South...
“Demo paper” is what I like to call a very specific kind of AI safety paper. Here are some example papers that fall in this category:

* Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Anthropic)
* Alignment faking in large language models (Anthropic)
* Sycophancy to subterfuge:...
(I'm not new to posting on LessWrong. This is a new account I am using to crosspost LW-relevant content from my Substack.) I recently read "The Everything War", a 2024 book by Dana Mattioli. The book's thesis is that the Amazon of today is a monopoly, using their market...