Chance that "AI safety basically [doesn't need] to be solved, we’ll just solve it by default unless we’re completely completely careless"
As part of the 2020 Effective Altruism Student Summit, AI safety researcher Buck Shlegeris advised newcomers to "think through the entire problem," determining what the concrete dangers of unsafe AI actually are and how to address them. In an earnest effort to do just that, Nicholas L. Turner, Aidan Kierans, Morpheus, Quinn Dougherty, Jeffrey Yun, and others started a small AI Safety Forecasting Workshop to practice defining, estimating, and forecasting different outcomes. We've had a small degree of success in making questions more concrete, so we decided to share our thoughts to help other aspirants. The following is adapted from our notes for this week's meeting, with context or elaboration added where necessary.

Specifically, Nick, Aidan, Morpheus, and Quinn pulled up Elicit's x-risk tag and did a deep dive into one of the questions. Source.

> Chance that "AI safety basically [doesn't need] to be solved, we’ll just solve it by default unless we’re completely completely careless"

As one might imagine, this is a very useful question for a group of newbies to consider. Engagement with this question on Elicit has been modest (with two participants, the average estimate of the chance currently sits at 17.5%), but it captured our intuitions well enough to start getting our hands dirty.

Terms

We started by defining some terms, since resolving at least some of the ambiguities of this prompt allows us to better understand the relationship between the prompt, our intuitions about it, and states of the world:

* "solving AI safety"
* "default"
* "careless"
* "we"

Solving AI safety

First, what is the goal we're trying to reach by "solving" AI safety? At what point would we consider AI safety solved or not? Moreover, how should we define AI safety itself?

OPTION 1. AN AI IS SAFE WHEN IT IS TRANSFORMATIVE WITHOUT BEING CATASTROPHIC.

When thinking about future applications of AI, people often imagine a point in time when the technology fundamentally alt
I think at least some people do, but I don't have a good argument or evidence to support that claim. Even if your only terminal values are more traditional conceptions of utility, diversity still serves those values really well. A homogeneous population is not just more boring, but also less resilient to change (and to pathogens, depending on the degree of homogeneity). I think it would be shortsighted and overconfident to design an optimal, identical population, since its members would lack the resilience and variety of experience needed to maintain that optimum once any problems appeared.