Kelsey Piper and I just launched a new blog about AI futurism and AI alignment called Planned Obsolescence. If you’re interested, you can check it out here.
Both of us have thought a fair bit about what we see as the biggest challenges in technical work and in policy for making AI go well, but a lot of our thinking isn't written up, or is buried in long technical reports. This blog is an effort to make that thinking more accessible. That means it's mostly aimed at a broader audience than LessWrong and the EA Forum, although some of you might still find some of the posts interesting.
So far we have seven posts:
- What we're doing here
- "Aligned" shouldn't be a synonym for "good"
- Situational awareness
- Playing the training game
- Training AIs to help us align AIs
- Alignment researchers disagree a lot
- The ethics of AI red-teaming
Thanks to ilzolende for formatting these posts for publication. Each post has an accompanying audio version generated by a voice synthesis model trained on the author's voice using Descript Overdub.
You can submit questions or comments to mailbox@planned-obsolescence.org.
But Google didn't want their AIs to do that, so if the AIs do that then the AIs weren't aligned. Same with the mind-hacking.
In general, your AI has some best guess at what you want it to do, and if it's aligned it'll do that thing. If it doesn't know what you meant, then it might make some mistakes. But the point is that aligned AIs don't take creative actions to disempower humans in ways that humans didn't intend, and that's a separate question from whether the humans intended good things in the first place.