Principles for the AGI Race
Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race Why form principles for the AGI Race? I worked at OpenAI for 3 years, on the Alignment and Superalignment teams. Our goal was to prepare for the possibility that OpenAI succeeded in its stated mission of building AGI (Artificial General Intelligence, roughly able to do most things a human can do), and then proceed on to make systems smarter than most humans. This will predictably face novel problems in controlling and shaping systems smarter than their supervisors and creators, which we don't currently know how to solve. It's not clear when this will happen, but a number of people would throw around estimates of this happening within a few years. While there, I would sometimes dream about what would have happened if I’d been a nuclear physicist in the 1940s. I do think that many of the kind of people who get involved in the effective altruism movement would have joined, naive but clever technologists worried about the consequences of a dangerous new technology. Maybe I would have followed them, and joined the Manhattan Project with the goal of preventing a world where Hitler could threaten the world with a new magnitude of destructive power. The nightmare is that I would have watched the fallout of bombings of Hiroshima and Nagasaki with a growing gnawing panicked horror in the pit of my stomach, knowing that I had some small share of the responsibility. Maybe, like Albert Einstein, I would have been unable to join the project due to a history of pacifism. If I had joined, I like to think that I would have joined the ranks of Joseph Rotblat and resigned once it became clear that Hitler would not get the Atomic Bomb. Or joined the signatories of the Szilárd petition requesting that the bomb only be used after terms of surrender had been publicly offered to Japan. Maybe I would have done something to try to wake up before the finale of the nightmare. I don’t know what I would h

Would be interested in a quick write-up of what you think are the most important virtues you'd want for AI systems, seems good in terms of having things to aim towards instead of just aiming away from.