One reason aligning superintelligent AI will be hard is because we don’t get to test solutions in advance. If we build systems powerful enough to take over the world, and we fail to stop them from wanting to, they aren’t going to give us another chance[1]. It’s common to assume...
[Crossposted from my substack Working Through AI.] Alice is the CEO of a superintelligence lab. Her company maintains an artificial superintelligence called SuperMind. When Alice wakes up in the morning, she’s greeted by her assistant-version of SuperMind, called Bob. Bob is a copy of the core AI, one that has...
[Crossposted from my substack Working Through AI.] When I was finishing my PhD thesis, there came an important moment where I had to pick an inspirational quote to put at the start. This was a big deal to me at the time — impressive books often carry impressive quotes, so...
[Crossposted from my substack Working Through AI.] It’s pretty normal to chunk the alignment problem into two parts. One is working out how to align an AI to anything at all. You want to figure out how to control its goals and values, how to specify something and have it...
[Crossposted from my substack Working Through AI. I'm pretty new to writing about AI safety, so if you have any feedback I would appreciate it if you would leave a comment. If you'd rather do so anonymously, I have a feedback form.] TLDR: When something helps us achieve our goals,...