I've been reading a lot of posts recently arguing that LLM RL, and persona-shaping (or the lack thereof), is part of the problem for AI misalignment. To name a few: * Why we should expect ruthless sociopath ASI: This is the one that made me decide to write this post....
It seems to me that AI welfare and digital mind concerns are being discussed more and more, and are starting to get taken seriously, which puts me in an emotionally complicated position. On the one hand, AI welfare has been very important to me for a long time now, so...
Here is a little Q&A. Can you explain your position quickly? I think autonomous replication and adaptation in the wild is under-discussed as an AI threat model. And this makes me sad, because this is one of the main reasons I'm worried. I think one of AI Safety people's main...
Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post. Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Simon Cosson, Léo Dana and Diego Dorn for useful feedback. TLDR: We present a new method for safer-by-design AI development. We think...
Slime Mold Time Mold (SMTM) has just published the results of their studies on the potato diet. The idea is to eat nothing but potatoes for four weeks and report how it went. You can read SMTM's reasoning here; the basic idea is that they've observed some anecdotal data point...