tl;dr: Ask questions about AGI Safety as comments on this post, including ones you might otherwise worry seem dumb!
Asking beginner-level questions can be intimidating, but everyone starts out not knowing anything. If we want more people in the world who understand AGI safety, we need a place where it's accepted and encouraged to ask about the basics.
We'll be putting up monthly FAQ posts as a safe space for people to ask all the possibly-dumb questions that may have been bothering them about the whole AGI Safety discussion, but which until now they didn't feel able to ask.
It's okay to ask uninformed questions, and not worry about having done a careful search before asking.
AISafety.info - Interactive FAQ
Additionally, this will serve as a way to spread the project Rob Miles' team[1] has been working on: Stampy and his professional-looking face, aisafety.info. This will provide a single point of access into AI Safety, in the form of a comprehensive interactive FAQ with lots of links to the ecosystem. We'll be using questions and answers from this thread for Stampy (under these copyright rules), so please only post if you're okay with that!

You can help by adding questions (type your question and click "I'm asking something else") or by editing questions and answers. We welcome feedback and questions on the UI/UX, policies, etc. around Stampy, as well as pull requests to his codebase and volunteer developers to help with the conversational agent and front end that we're building.
We've got more to write before he's ready for prime time, but we think Stampy can become an excellent resource for everyone from skeptical newcomers, through people who want to learn more, right up to people who are convinced and want to know how they can best help with their skillsets.
Guidelines for Questioners:
- No previous knowledge of AGI safety is required. If you want to watch a few of the Rob Miles videos, read the WaitButWhy posts, or read The Most Important Century summary from OpenPhil's co-CEO first, that's great, but it's not a prerequisite for asking a question.
- Similarly, you do not need to try to find the answer yourself before asking a question (but if you want to test Stampy's in-browser TensorFlow semantic search, that might get you an answer quicker!).
- Also feel free to ask questions that you're pretty sure you know the answer to, but where you'd like to hear how others would answer the question.
- One question per comment if possible (though if you have a set of closely related questions that you want to ask all together that's ok).
- If you have your own response to your own question, put that response as a reply to your original question rather than including it in the question itself.
- Remember, if something is confusing to you, then it's probably confusing to other people as well. If you ask a question and someone gives a good response, then you are likely doing lots of other people a favor!
- In case you're not comfortable posting a question under your own name, you can use this form to send a question anonymously and I'll post it as a comment.
Guidelines for Answerers:
- Linking to the relevant answer on Stampy is a great way to help people with minimal effort! Improving that answer means that everyone going forward will have a better experience!
- This is a safe space for people to ask stupid questions, so be kind!
- If this post works as intended then it will produce many answers for Stampy's FAQ. It may be worth keeping this in mind as you write your answer. For example, in some cases it might be worth giving a slightly longer / more expansive / more detailed explanation rather than just giving a short response to the specific question asked, in order to address other similar-but-not-precisely-the-same questions that other people might have.
Finally: Please think very carefully before downvoting any questions. Remember, this is the place to ask stupid questions!
[1] If you'd like to join, head over to Rob's Discord and introduce yourself!
In My Childhood Role Model, Eliezer Yudkowsky says that the difference in intelligence between a village idiot and Einstein is tiny relative to the difference between a chimp and a village idiot. This seems to imply (I could be misreading) that {the time between the first AI with chimp intelligence and the first AI with village idiot intelligence} will be much larger than {the time between the first AI with village idiot intelligence and the first AI with Einstein intelligence}. If we consider GPT-2 to be roughly chimp-level, and GPT-4 to be above village idiot level, then it seems like this would predict that we'll get an Einstein-level AI within the next year at the latest. This seems really unlikely, and I don't even think Eliezer currently believes this. If my interpretation is correct, this seems like an important prediction that he got wrong and that I haven't seen acknowledged.
So my question is: Is this a fair representation of Eliezer's beliefs at the time? If so, has this prediction been acknowledged wrong, or was it actually not wrong and there's something I'm missing? If the prediction was wrong, what might the implications be for fast vs slow takeoff? (Initial thought... (read more)
I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when thinking about takeoff.
The best chess-playing machines have been fairly strong (by human standards) since the late 1970s (Chess 4.7 showed expert-level tournament performance in 1978, and Belle, a special-purpose chess machine, was considered a good bit stronger than it). By the early 90s, chess computers at expert level were available to consumers at a modest budget, and the best machine built (Deep Thought) was grandmaster-level. It then took another six years for the Deep Thought approach to be scaled up and tuned to reach world-champion level. These programs were based on manually designed evaluation heuristics, with some aut... (read more)
From 38:58 of the podcast:
Here's a form you can use to send questions anonymously. I'll check for responses and post them as comments.
I regularly find myself in situations where I want to convince people that AI safety is important, but I have very little time before they lose interest. If you had one minute to convince someone with no or almost no previous knowledge, how would you do it? (I have considered printing Eliezer's tweet about nuclear)
A survey was conducted in the summer of 2022 of approximately 4271 researchers who published at the conferences NeurIPS or ICML in 2021, and received 738 responses, some partial, for a 17% response rate. When asked about impact of high-level machine intelligence in the long run, 48% of respondents gave at least 10% chance of an extremely bad outcome (e.g. human extinction).
Anonymous #4 asks:
Anonymous #3 asks:
Is there work attempting to show that alignment of a superintelligence by humans (as we know them) is impossible in principle; and if not, why isn’t this considered highly plausible? For example, not just in practice but in principle, a colony of ants as we currently understand them biologically, and their colony ecologically, cannot substantively align a human. Why should we not think the same is true of any superintelligence worthy of the name? “Superintelligence” is vague. But even if we minimally define it as an entity with 1,000x the knowledge, speed,... (read more)
What's the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we've got for alignment and to be pretty optimistic about it, but I haven't heard anyone else talking about it. Either I'm completely misunderstanding what he's talking about, or he's somehow found a way around all of the alignment problems.
Video of him explaining it here for reference, and thanks in advance:
I have been surprised by how extreme the predicted probability is that AGI will end up making the decision to eradicate all life on earth. I think Eliezer said something along the lines of “most optima don’t include room for human life.” This is obviously something that has been well worked out and understood by the Less Wrong community; it just isn’t very intuitive for me. Any advice on where I can start reading?
Some background on my general AI knowledge: I took Andrew Ng’s Coursera course on machine learning. So I have some basic understanding of n... (read more)
Is there a trick to write a utility satisficer as a utility maximizer?
By "utility maximizer" I mean the ideal Bayesian agent from decision theory that outputs those actions which maximize some expected utility E[U(x)] over states of the world x.
By "utility satisficer" I mean an agent that searches for actions that make E[U(x)] greater than some threshold short of the ideally attainable maximum, and contents itself with the first such action found. For reference, let's fix that 0<U<1 and set the satisficer threshold to 1/2.
The satisficer is not someth... (read more)
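The two definitions above can be sketched in Python. This is only an illustration under strong assumptions: a finite list of candidate actions, and a hypothetical `expected_utility` function that returns E[U(x)] for each action (neither is specified in the comment). It does not answer the question of whether one agent can be rewritten as the other; it just makes the difference in behavior concrete.

```python
from typing import Callable, Iterable, Optional, TypeVar

A = TypeVar("A")

def maximizer(actions: Iterable[A], expected_utility: Callable[[A], float]) -> A:
    """Return the action with the highest expected utility."""
    return max(actions, key=expected_utility)

def satisficer(
    actions: Iterable[A],
    expected_utility: Callable[[A], float],
    threshold: float = 0.5,  # the 1/2 threshold fixed in the comment
) -> Optional[A]:
    """Return the first action whose expected utility exceeds the threshold."""
    for action in actions:
        if expected_utility(action) > threshold:
            return action
    return None  # no action clears the threshold

# Hypothetical toy example: expected utilities in (0, 1), searched in this order.
toy_utilities = {"a": 0.3, "b": 0.6, "c": 0.9}
best = maximizer(toy_utilities, toy_utilities.get)
good_enough = satisficer(toy_utilities, toy_utilities.get)
```

In this toy example the maximizer picks the single best action, while the satisficer stops at the first action that clears the threshold, so its output depends on the search order.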
Anonymous #7 asks:
Is there a primer on what the difference between training LLMs and doing RLHF on those LLMs post-training is? They both seem fundamentally to be doing the same thing: move the weights in the direction that increases the likelihood that they output the given text. But I gather that there are some fundamental differences in how this is done and RLHF isn't quite a second training round done on hand-curated datapoints.
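One way to make the contrast concrete is to write down the two objectives as they are commonly stated. This is a sketch, not a full primer, and all symbols are assumptions introduced here: \(p_\theta\) and \(\pi_\theta\) denote the same model with weights \(\theta\), \(D\) is the pretraining corpus, \(D_{\text{prompt}}\) is a set of prompts, \(r_\phi\) is a learned reward model, \(\pi_{\text{ref}}\) is the frozen pre-RLHF model, and \(\beta\) is a penalty coefficient.

```latex
% Pretraining: maximize the likelihood of tokens that are given in the data.
\mathcal{L}_{\text{pretrain}}(\theta)
  = -\,\mathbb{E}_{x \sim D} \sum_{t} \log p_\theta(x_t \mid x_{<t})

% RLHF (KL-regularized form): the text y is sampled from the model itself
% and scored by a reward model, with a penalty for drifting from \pi_ref.
J_{\text{RLHF}}(\theta)
  = \mathbb{E}_{x \sim D_{\text{prompt}},\; y \sim \pi_\theta(\cdot \mid x)}
    \left[ r_\phi(x, y)
      - \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)} \right]
```

Under this framing, pretraining pushes up the likelihood of fixed target text, whereas RLHF pushes up or down the likelihood of the model's own samples depending on how a reward model scores them.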
Anonymous #5 asks:
I know the answer to "couldn't you just-" is always "no", but couldn't you just make an AI that doesn't try very hard? i.e., it seeks the smallest possible intervention that ensures 95% chance of whatever goal it's intended for.
This isn't a utility maximizer, because it cares about intermediate states. Some of the coherence theorems wouldn't apply.
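The selection rule described above (the smallest intervention that still gives a 95% chance of success) can be sketched as follows. This assumes a finite set of candidate interventions, each with a known `impact` score and `p_success` estimate; both fields and the `Intervention` type are hypothetical, and the sketch takes an impact measure as given rather than proposing one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class Intervention:
    name: str
    impact: float     # hypothetical measure of how much the action changes the world
    p_success: float  # estimated probability that the goal is achieved

def smallest_sufficient(
    candidates: Iterable[Intervention], required: float = 0.95
) -> Optional[Intervention]:
    """Return the lowest-impact candidate whose success probability meets the bar."""
    feasible = [c for c in candidates if c.p_success >= required]
    if not feasible:
        return None  # nothing reaches the required success probability
    return min(feasible, key=lambda c: c.impact)

# Hypothetical candidates for illustration only.
options = [
    Intervention("nudge", impact=1.0, p_success=0.80),
    Intervention("moderate", impact=3.0, p_success=0.96),
    Intervention("drastic", impact=10.0, p_success=0.999),
]
choice = smallest_sufficient(options)
```

Here the rule picks "moderate": it is the least impactful option that clears 95%, even though "drastic" has a higher success probability.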
Anonymous #1 asks:
...
What could be done if a rogue version of AutoGPT gets loose on the internet?
OpenAI can invalidate a specific API key; if they don't know which one, they can cancel all of them. This should halt the thing immediately.
If it were using a local model, the problem would be harder. Copies of local models may be distributed around the internet. I don't know how one could stop the agent in this situation. Can we take inspiration from how viruses and worms have been defeated in the past?
Anonymous #6 asks:
I have noticed in discussions of AI alignment here that there is a particular emphasis on scenarios where there is a single entity which controls the course of the future. In particular, I have seen the idea of a pivotal act (an action which steers the state of the universe in a billion years such that it is better than it otherwise would be) floating around rather a lot, and the term seems to be primarily used in the context of "an unaligned AI will almost certainly steer the future in ways that do not include living humans, and the only way to prevent th... (read more)
Anonymous #2 asks:
Why is there so little mention of the potential role of the military industrial complex in developing AGI rather than a public AI lab? The money is available, as are the will and the history (ARPANET was the precursor to the internet). I am vaguely aware there isn't much to suggest the MIC is on the cutting edge of AI, but there wouldn't be if it were all black budget projects. If that is the case, it presumably implies a very difficult situation because the broader alignment community would have no idea when crucial thresholds were being crossed.
I've been wondering what sorts of ways we have to buy ourselves time to figure out alignment. I'm wondering if maybe a large government organization equipped with many copies of potent tool AI could manage to oversee and regulate significant compute pools well enough to avoid rogue AGI catastrophes. Is there any writing specifically on this subject?
Intuitively, I assume that LLMs trained on human data are unlikely to become much smarter than humans, right? Without some additional huge breakthrough, other than just being a language model?
Hello, this concerns an idea I had back in ~2014 which I abandoned because I didn't see anyone else talking about it and I therefore assumed was transparently stupid. After talking to a few researchers, I have been told the idea is potentially novel and potentially useful, so here I go (sweating violently trying to suppress my sense of transgression).
The idea concerns how one might build safety margin into AI or lesser AGI systems in a way that they can be safely iterated on. It is not intended as anything resembling a solution to alignment, just an easy-t... (read more)
What is the connection between the concepts of intelligence and optimization?
I see that optimization implies intelligence (that optimizing a sufficiently hard task sufficiently well requires sufficient intelligence). But it feels like the case for existential risk from superintelligence is dependent on the idea that intelligence is optimization, or implies optimization, or something like that. (If I remember correctly, sometimes people suggest creating "non-agentic AI", or "AI with no goals/utility", and EY says that they are trying to invent non-wet water o... (read more)
If a superintelligent AI is guaranteed to be manipulative (instrumental convergence), how can we validate any solution to the alignment problem? AFAIK, we can't even guarantee that a model optimizes for the defined objective, due to mesa optimizers. So that adds more complexity to a seemingly unanswerable problem.
My other question is, people here seem to think of intelligence as a single-dimension type of thing. But I always maintained the belief that the type of reasoning useful in scientific discovery does not necessarily unlock the secret of human communicat... (read more)
Is this a plausible take?
- some types of AI can be made non-catastrophic with a modest effort:
  - AI trained only to prove math theorems
  - AI trained only to produce predictive causal models of the world, by observation alone (an observer and learner, not an active agent)
  - AIs trained only to optimize w.r.t. a clearly specified objective and a formal world model (not actually acting in the world and getting feedback - only being rewarded on solving formal optimization problems well)
- the last two kinds (world-learners and formal-model-optimizers) should be kept separate
...
What does foom actually mean? How does it relate to concepts like recursive self-improvement, fast takeoff, winner-takes-all, etc.? I'd appreciate a technical definition. In the past I thought I knew what it meant, but people said my understanding was wrong.