Disclaimer: My English isn't very good, but please don't try to dissuade me on that basis - the sequence itself will be translated by a professional translator.
I want to create a sequence that a smart fifteen- or sixteen-year-old school student can read and that can encourage them to go into alignment. Right now I'm running an extracurricular course for several smart school students, and one of my goals is to "overcome long inferential distances so I will be able to create this sequence".
I deliberately did not include the most important modern trends in machine learning among the topics. I'm optimizing for the scenario "a person reads my sequence, then goes to university for another four years, and only then becomes a researcher". So (with the exception of the last part) I avoided topics that are likely to become obsolete by then.
Here is my (draft) list of topics (the order is not final; it will be refined in the course of writing):
- Introduction - what AI, AGI, and Alignment are. What we are worried about. AI Safety as AI Notkilleveryoneism.
- Why AGI is dangerous. Orthogonality Thesis, Goodhart's Law, Instrumental Convergence. Corrigibility and why it is unnatural.
- Forecasting. AGI timelines. Takeoff Speeds. Arguments for slow and fast takeoff.
- Why AI boxing is hard, perhaps near-impossible. Humans are not secure systems. Why even an Oracle AGI can be dangerous.
- Modern ML in a few words (without math!). Neural networks. Training. Supervised Learning. Reinforcement Learning. Reward is not the goal of an RL agent.
- Interpretability. Why it is hard. Basic ideas on how to do it.
- Inner and outer alignment. Mesa-optimization. Internal, corrigible and deceptive alignment. Why deceptive alignment seems very likely. What can influence its probability.
- Decision theory. Prisoner's Dilemma, Newcomb's problem, Smoking lesion. CDT, EDT and FDT.
- What exactly are optimization and agency? Attempts to define these concepts. Optimization as attractors. Embedded agency problems.
- Eliezer Yudkowsky's point of view. Pivotal acts. Why it can be useful to have an imaginary EY over your shoulder even if you disagree with him.
- Capability externalities and why to avoid them.
- Conclusion. What can be done. Important organisations. What are they working on now?
What else should be here? Maybe something should not be here? Are there reasons why the whole idea could be bad? Any other advice?
My quick take here is that your list of topics is not an introduction to AI Safety; it is an introduction to AI Safety as seen from inside the MIRI/Yudkowsky bubble, where everything is hard and nobody is making any progress. Some more diversity in viewpoints would be better.
For your audience, my go-to source would be to cover parts of Brian Christian's The Alignment Problem.
My overall reactions:
Some specific edits I'd make, in order of their destination:
Thanks for your answer!
I'm going to post each part on LW and collect feedback before I put it all together, to avoid this failure mode in particular.
I will think about it.
I'm not sure it should be in the forecasting section, more like in the introduction (or, if it is harder than I think, in its own separate section).
Seems like a good proposal, thanks!
Your outline has a lot of beliefs you expect your students to walk away with, but basically zero skills. If I were one of your prospective students, this would look a lot more like cult indoctrination than a genuine course where I would learn something.
What skills do you hope your students walk away with? Do you hope that they'll know how to avoid overfitting models? That they'll know how to detect trojaned networks? That they'll be able to find circuits in large language models? I'd recommend figuring this out first, and then working backwards to figure out what to teach.
Also, don't underestimate just how smart smart 15- and 16-year-olds can be. At my high school, for example, there were at least a dozen students who knew calculus at this age, and many more who knew how to program. And this was just a relatively normal public high school.
Thanks for your answer!
This is not so much about "beliefs" - I will make a lot of caveats like "we are not sure", "there are some smart people who disagree", "here are some arguments against this view", etc. (mental note: do it MORE, thank you for your observation) - as about "motivation" and "discourse". Not about technical skills, that's true.
I have a feeling that there is an attractor: "I am an AI researcher and ML is AWESOME, and I will try to make it even more AWESOME, and yes, there are these safety folks, and I know some of their memes, and maybe they have some legitimate concerns, but we will solve it later and everything will be OK". And I think that when someone learns some ML-related technical skills before the basic AI Safety concepts and discourse, it's very easy for them to fall into this attractor. And from that point it's pretty hard to return. So I want to create something like a vaccine against this attractor.
Technical skills are necessary, but for most of them there are already good courses, textbooks and so on. The skills I saw no textbooks for are "understanding AI-safety-speak" and "seeing why alignment-related problem X is hard and why obvious solutions may not work". Because of the previously mentioned attractor, I think it's better to teach these skills before technical skills.
I assume that average 15-16-year-olds in my target audience know how to program at least a little bit (in Russia, basic programming is in theory part of the mandatory school curriculum; I don't know about the US), but don't know calculus (though I think a smart school student can easily understand the concept of a derivative without a strict mathematical definition).