Edit (2024-07-31): I changed the introduction. If you have Googled “how to do acceptance” and found a lot of the explanations confusing or in conflict with one another, you have come to the right place. Learning about “acceptance” as a mental move was surprisingly challenging for me too,...
For the Open Philanthropy AI Worldview Contest

Executive Summary

* Conditional on AGI being developed by 2070, we estimate the probability that humanity will suffer an existential catastrophe due to loss of control (LoC) over an AGI system at ~6%, using a quantitative scenario model that attempts to systematically assess...
I'm often confused by how "loss of control" is somewhat similar in meaning to "misalignment" in the context of AGI risk. An average person might reasonably equate the two, or feel that there's a large overlap in meaning between the two terms. Here's my attempt at deconfusing myself. TLDR: *...
TL;DR

* I was feeling quite dissatisfied with a bunch of categories like “inner and outer misalignment” and “reward misspecification and goal misgeneralisation”
* I think most of this dissatisfaction stems from their inability to correctly describe situations where there are multiple points of failure or gaps.
*...