TLDR: Alignment as we currently define it seems hard because we recognize that what we as humans want is pretty arbitrary from a non-human perspective. In contrast, global utility maximization is something that an ASI might independently discover as a worthwhile goal, regardless of any human alignment attempts. Global utility...
Epistemic status: The idea here has likely been articulated before; I just haven't noticed it, so it might be worth pointing out again. Foom describes the idea of a rapid AI takeoff caused by an AI's ability to recursively improve itself. Most discussions about Foom assume that each next...
When I tell people that I think there is a decent chance that an unaligned AGI will bring about the apocalypse within the next 20 years or so, they tend not to take it too seriously. Often that's because they think I would act differently if I really assigned a...
Disclaimer: I don't have a background in alignment research or reinforcement learning, and I don't think any of the ideas discussed here are new, but they might be interesting to some. A recent post suggested that humans provide an untapped wealth of evidence about alignment. I strongly agree with that...