This project was developed as part of the BlueDot AI Alignment Course. Introduction: "AI Control: Improving Safety Despite Intentional Subversion" introduces a variety of strategies to curate trustworthy output from a powerful but untrustworthy AI. One of these strategies is untrusted monitoring, where a second copy of the untrustworthy AI...
Recently, I was talking to a friend who hadn't seen me in a while. They mentioned that my hair had grown noticeably, and then asked whether my hair grew fast or slow. I said that my hair growth was probably around average, but upon consideration, I realized that statement was...
Inspired by "The AI in a box boxes you", "Matryoshka Faraday Box", and "I attempted the AI Box Experiment (and lost)". This is part creative writing exercise, part earnest attempt at constructing an argument that could persuade me to let the AI out of the box. It may be disturbing...
AKA: If it's stupid but it works, it's still stupid. TL;DR: It's easy to slip into a routine that accomplishes some task but is inconvenient or annoying. Sometimes, there's a trivial change that makes the task significantly easier. The trick is noticing that you're stuck in such a routine. 1:...