This post is for students who hope to eventually work on technical problems we don’t understand, especially agency and AI alignment, and want to know what to study or practice.
Current alignment researchers have wildly different recommendations on paths into the field, usually correlated with the wildly different paths these researchers have themselves taken into the field. This also correlates with different kinds of work on alignment. This guide largely reflects my own path, and I think it is useful if you want to do the sort of research I do. That means fairly theoretical work (for now), very technical, drawing on models and math from a lot of different areas to understand real-world agents.
Specializing in Problems We Don’t Understand lays out a general framework which guides many of the recommendations here. I’ll also briefly go over some guiding principles more specific to choosing what (and how much) to study:
- Breadth over depth
- Practice generalizing concepts
- Be able to model anything
- High volume of knowledge
Breadth Over Depth
In general, study in any particular topic has decreasing marginal returns. The first exposure or two gives you the basic frames, tells you what kinds of questions to ask and what kinds of tools are available, etc. You may not remember everything, but you can at least remember what things to look up later if you need them - which is a pretty huge improvement over not even knowing that X is a thing you can look up at all!
Another way to frame this: problems-we-don’t-understand rely heavily on bringing in frames and tools from other fields. (If the frames and tools of this field were already sufficient, it wouldn’t be a problem-we-don’t-understand in the first place.) So, you want to have a very large library of frames and tools to apply. On the other hand, you don’t necessarily need very much depth in each frame or tool - just enough to recognize problems where it might apply and maybe try it out in a quick-and-dirty way.
Practice Generalizing Concepts
Bringing in frames and tools from other fields requires the ability to recognize and adapt those frames and tools for problems very different from the field in which we first learned them. So, practice in generalizing concepts from one area to another is particularly important.
Unfortunately, this is not a focus in most courses. There are exceptions - applied math classes often involve applying tools in a wide variety of ways, and low-level physics courses often provide very good practice in applying a few mathematical tools to a wide variety of problems. Ultimately, though, this is something you should probably practice on your own a lot more than it’s practiced in class.
Keeping a list of 10-20 hard problems in the back of your mind, and trying out each new frame or tool on one of those problems, is a particularly useful technique to practice generalization.
Be Able To Model Anything
One common pitfall is to be drawn into areas which advertise extreme generality, but are rarely useful in practice. (A lot of high-level math is like this.) On the other hand, we still want a lot of breadth, including things which are not obviously useful to whatever problem we’re most interested in (e.g. alignment). After all, if the obviously-relevant tools sufficed, then it wouldn’t be a problem-we-don’t-understand in the first place.
To that end, it’s useful to look for frames/tools which are at least useful for something. Keeping a list of 10-20 hard problems in the back of your mind is one useful test for this. Another useful heuristic is “be able to model anything”: if there’s some system or phenomenon which you’re not sure how to model, even in principle, and field X has good tools for modelling it, then study field X.
This heuristic is useful for another reason, too: our intuitions for a problem of interest often come from other systems, and you never know what system will seem like a useful analogue. If we can model anything, then we always know how to formalize a model based on any particular analogy - we’re rarely left confused about how to even set it up.
High Volume of Knowledge
Lastly, one place where I differ from the recommendations which I expect most current alignment researchers to give: I recommend studying a lot. This is based on my own experience - I’ve covered an awful lot of ground, and when I trace the sources of my key thoughts on alignment and agency, they come from an awful lot of places.
To that end: don’t just take whatever courses are readily available. I recommend heavy use of online course material from other schools, as well as textbooks. Sometimes the best sources are a lot better than the typical source - I try to highlight any particularly great sources I know of in this post. Also, I’ve found it useful to “pregame” the material even for my normal college courses - i.e. find a book or set of lectures covering similar material, and go through them before the semester starts, so that the in-person class is a second exposure rather than a first. (This also makes the course a lot easier, and makes it easier to maintain ok grades without sinking pointless amounts of effort into the class.)
Other useful tips to squeeze out every last drop:
- Skipping pre-reqs is often a good idea.
- Audit courses. This doesn’t just have to be at your school - I’ve audited half a dozen courses at schools where I had no formal affiliation. Just walk in on the first day of class and sit down; it’s usually totally fine, and professors love it (since you’re actually interested).
All that said, obviously this advice is for the sort of person who is not already struggling to keep up with a more normal course load. This advice is definitely not for everyone.
With guiding principles out of the way, on to the main event: things to study. We’ll start with technical foundations, i.e. the sort of stuff which might be “common core classes” at a high-end STEM college/university. Then, we’ll cover topics which might be in an (imaginary) “alignment and agent foundations” degree. Finally, I’ll go through a few more topics which aren’t obviously relevant to alignment or agency, but are generally-useful for modelling a wide variety of real-world systems.
If I know of a particularly good source I’ll link to it, but sometimes the only sources I’ve used are mediocre or offline. Sorry. Also, I went to Harvey Mudd College, so any references to classes there are things I did in-person.
If your high-school doesn’t have a programming class, use a MOOC, preferably in Python. There are lots of good sources available nowadays; the “intro to programming” market is very saturated. Heck, the “intro” market is pretty saturated in all of these.
Physics and calculus go together; calculus will likely feel unmotivated without physics, and physics will have a giant calculus-shaped hole in it without calculus.
You should probably take more than one undergrad-level intro programming course, ideally using different languages. Different courses focus on very different things: low-level computer system concepts, high-level algorithms, programming language concepts, etc. Also, different languages serve very different use-cases and induce different thinking-patterns, so it’s definitely worth knowing a few, ideally very different languages.
Besides basic programming fluency, you should learn:
- Basics of big-O analysis
- A conceptual understanding of how a computer works (but probably not all the low-level details)
Personally, I’ve used Harvard’s CS50, a set of intro lectures from UNSW, CS5 & CS60 at Harvey Mudd, plus a Java textbook in high school. At bare minimum, you should probably work with C/C++, Python, and a LISP variant. (Harvard’s CS50 is good for C/C++, MIT has an intro in LISP which is widely considered very good, and lots of courses use Python.)
Once you’ve had one or two intro programming classes, there’s usually a course in data structures. It will cover things like arrays, linked lists, hash tables, trees, heaps, queues, etc. This is the bread-and-butter of most day-to-day programming.
Although the coursework may not emphasize it, I recommend building a habit of keeping a Fermi estimate of program runtime in the back of your head. I’d even say that the main point of learning about all these data structures is to make that Fermi estimate.
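As a toy illustration of that habit (and of basic big-O analysis), here is a quick sketch comparing membership tests on a list vs. a set; the sizes and repetition counts are arbitrary, chosen only to make the asymptotic gap visible:

```python
import timeit

# Membership test: O(n) scan for a list vs O(1) hash lookup for a set.
n = 100_000
data_list = list(range(n))
data_set = set(data_list)

# Query the last element: worst case for the linear scan.
t_list = timeit.timeit(lambda: n - 1 in data_list, number=20)
t_set = timeit.timeit(lambda: n - 1 in data_set, number=20)

# Back-of-the-envelope: the list scan touches ~n elements per query,
# so it should be orders of magnitude slower than the hash lookup.
print(f"list: {t_list:.5f}s  set: {t_set:.7f}s")
```

The point of the Fermi-estimate habit is that you could have predicted the outcome here without running anything: ~20 × 100,000 comparisons for the list, ~20 hash lookups for the set.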
In practice, most real-world uses of linear algebra involve one of two things:
- First or second derivatives of high-dimensional functions, or
- Data on which we calculate correlations/run linear regressions.
Alas, when first studying the subject, it will probably be very abstract and you won’t see good examples of what it’s actually used for. (It is useful, though - I last used linear algebra yesterday, when formulating an abstraction problem as an eigenproblem.)
Linear algebra took me many passes to learn well. I read three textbooks and took two in-person courses (from different schools) in linear algebra, then took another two courses (also from different schools) in linear systems. Out of all that, the only resource I strongly recommend is Boyd’s lectures on linear dynamical systems, probably after one or two courses in linear algebra. I also hear Linear Algebra Done Right is good as an intro, but haven’t used it personally. MIT’s lectures are probably very good, though sadly I don’t think they were online back when I was learning the subject.
If you take more advanced math/engineering, you’ll continue to learn more linear algebra, especially in areas like linear control theory, Fourier methods, and PDEs.
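To make the "correlations/regressions" use-case concrete, here is a minimal sketch of ordinary least squares done by hand in pure Python; the data points are made up for illustration:

```python
# Ordinary least squares for y = slope*x + intercept, using the
# closed-form formulas slope = Cov(x, y) / Var(x), no libraries needed.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # roughly y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n

slope = cov_xy / var_x
intercept = mean_y - slope * mean_x
print(f"y ≈ {slope:.2f}x + {intercept:.2f}")
```

In higher dimensions this same calculation becomes the matrix normal equations, which is exactly where the linear algebra earns its keep.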
Mechanics & Differential Equations
Mechanics (usually a physics class) and differential equations (a math class) are the two courses where you go from mostly-not-knowing-how-to-model-most-things to mostly-having-some-idea-how-to-model-most-things-at-least-in-principle. In particular, I remember differential equations as the milestone where I transitioned from feeling like the things I knew how to model mathematically were small islands in a sea of things I didn’t, to feeling like the things I didn’t know how to model (at least in principle) were the small islands. (I had taken some mechanics before that.)
I took all my mechanics in-person, but I hear the Feynman Lectures are an excellent source. For differential equations, I used MIT’s lectures. You will need some linear algebra for differential equations (at least enough to not run away screaming at the mention of eigenvalues), though not necessarily on the first pass (some schools break it up into a first course without linear algebra and then a second course with it).
In principle, multivariate calculus is what makes linear algebra useful. Unfortunately, multivariate calculus courses in my experience are a grab-bag of topics, some of which are quite useful, others pretty narrow.
The topics in my ideal course in multivariate calculus would be:
- Tensor notation
- Tensor & matrix calculus
- Gradients & gradient descent optimization
- Hessians & Newton’s Method optimization
- Jacobians & Newton’s Method root finding
- Constrained optimization & Lagrange multipliers
- Jacobian determinants & multivariate coordinate transformations for integrals
- Wedge products
- Conservative vector fields & potentials
About half of these are covered very well in Boyd’s convex optimization course (see below). The rest you may have to pick up piecemeal:
- Tensor notation you can just adopt for yourself and practice; it’s very useful for ML, continuum mechanics, and general relativity
- Matrix calculus you’ll pick up if you need to hand-code fast gradient calculations for optimization or simulation problems
- Jacobian determinants will come up whenever a high-dimensional integral requires a coordinate change. Play around with it and then practice it when it’s needed.
- Wedge products are useful whenever an integral is over a multi-dimensional surface in some higher-dimensional space; when you write “dx dy dz” in an integral, that’s secretly a wedge product. Again, play around with it and then practice it when it’s needed.
- Conservative vector fields you’ll see a lot in electricity & magnetism (as well as specific techniques for them)
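As a toy illustration of the gradient and Hessian items from the list above, here is a sketch comparing gradient descent with Newton’s method on a made-up quadratic (for which Newton converges in a single step, since the Hessian is exact):

```python
# Minimize f(x, y) = (x - 1)^2 + 4*(y + 2)^2 two ways:
# gradient descent (many small steps) vs Newton's method (one step).

def grad(x, y):
    # Hand-computed gradient of f
    return 2 * (x - 1), 8 * (y + 2)

# Gradient descent from the origin
x, y = 0.0, 0.0
lr = 0.1
for _ in range(200):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy

# Newton's method: step = H^{-1} * grad, with Hessian H = diag(2, 8)
nx, ny = 0.0, 0.0
gx, gy = grad(nx, ny)
nx, ny = nx - gx / 2, ny - gy / 8

print((x, y), (nx, ny))  # both should land near the minimum (1, -2)
```

The contrast is the point: gradient descent needs only first derivatives but many iterations, while Newton’s method buys speed with second-derivative (Hessian) information.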
Linear algebra, as we use it today, is a relatively recent development:
The separate linear algebra course became a standard part of the college mathematics curriculum in the United States in the 1950s and 60s and some colleges and universities were still adding the course in the early 1970s. (source)
Fifty years ago, linear algebra was new. What new things today will be core technical classes in another fifty years, assuming a recognizable university system still exists?
I think convex optimization is one such topic.
Boyd is the professor to learn this from, and his lectures are excellent. This is one of my strongest not-already-standard recommendations in this post.
Bayesian probability is another topic on the short list for “future STEM core”. I don’t have a 101-level intro which I can personally vouch for - Yudkowsky’s intro is popular, but you’ll probably need a full course in probability before diving into the more advanced stuff.
You can get away with a more traditional probability course and then reading Jaynes (see below), which is what I did, but a proper Bayesian probability course is preferred if you can find a good one.
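For a minimal concrete instance of the core move, here is Bayes’ rule on the classic diagnostic-test example (the rates below are made up for illustration):

```python
# Bayes' rule: a test with 90% sensitivity and a 5% false-positive
# rate, applied to a condition with a 1% base rate.
prior = 0.01
p_pos_given_true = 0.90
p_pos_given_false = 0.05

# P(positive) by the law of total probability
p_pos = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
posterior = p_pos_given_true * prior / p_pos
print(f"P(condition | positive test) ≈ {posterior:.3f}")  # ≈ 0.154
```

The counterintuitive smallness of the posterior, despite the “90% accurate” test, is exactly the kind of intuition a good probability course should build.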
Economics provides the foundations for a ton of agency models.
Any standard 101-level course is probably fine. Lean towards more math if possible; for someone doing all the other courses on this list, there’s little reason not to jump into the math.
Alignment theory involves proving things, so you definitely need to be comfortable writing proofs.
To the extent that proof-writing is taught, it’s unfortunately often taught in Analysis 1, which is mostly-useless in practice other than the proof skills. (There are lots of useful things in analysis, but mostly I recommend you skip the core “analysis” courses and learn the useful parts in other classes, like theoretical mechanics or math finance or PDEs or numerical analysis.) Pick up proof skills elsewhere if you can; you’ll have ample opportunity to practice in all the other classes on this list.
Agency and Alignment “Major”
AI & Related Topics
Mostly this course will provide a first exposure to stuff you’ll study more later. Pay attention to relaxation-based search in particular; it’s a useful unifying framework for a lot of other things.
Turns out we can deduce causality from correlation, it just requires more than two variables. More generally, causal models are the main “language” you need to speak in order to efficiently translate intuitions about the world into Bayesian probabilistic models.
Yudkowsky has a decent intro, although you definitely need more depth than that. Pearl’s books are canonical; Koller & Friedman are unnecessarily long but definitely cover all the key pieces. Koller has a coursera course covering similar material, which would probably be a good choice.
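Here is a toy numerical sketch of the three-variable point: a common cause Z of both X and Y produces marginal correlation between X and Y, but conditioning on Z removes it. The probabilities are made up; the structure is the standard confounder pattern:

```python
from itertools import product

# Common-cause model: Z causes both X and Y (all binary).
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: 0.1, 1: 0.9}   # P(X=1 | Z=z)
p_y_given_z = {0: 0.1, 1: 0.9}   # P(Y=1 | Z=z)

def joint(x, y, z):
    px = p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y == 1 else 1 - p_y_given_z[z]
    return p_z[z] * px * py

# Marginals: X and Y are dependent (0.41 vs 0.25)
p_x1 = sum(joint(1, y, z) for y, z in product((0, 1), repeat=2))
p_y1 = sum(joint(x, 1, z) for x, z in product((0, 1), repeat=2))
p_x1_y1 = sum(joint(1, 1, z) for z in (0, 1))
print(p_x1_y1, p_x1 * p_y1)

# Conditional on Z=1: X and Y are independent (both products ≈ 0.81)
p_x1_y1_given_z1 = joint(1, 1, 1) / p_z[1]
print(p_x1_y1_given_z1, p_x_given_z[1] * p_y_given_z[1])
```

That asymmetry - dependence marginally, independence conditionally - is the kind of signature that lets causal discovery algorithms distinguish between candidate graphs.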
Jaynes’ Probability Theory: The Logic Of Science is a book for which I know no substitute. It is a book on Bayesian probability theory by the leading Bayesian probability theorist of the twentieth century; other books on the topic look sloppy by comparison. There are insights in this book which I have yet to find in any other book or course.
At the bare minimum, read chapters 1-4 and 20. I’ve read it cover-to-cover, and found it immensely valuable.
Information theory is a powerful tool for translating a variety of intuitions into math, especially agency-adjacent intuitions.
I don’t know of any really good source on information theory, but I do remember that there’s one textbook from about 50 years ago which is notoriously terrible. If you find yourself wading through lots of analysis, put the book down and find a different one.
I have used a set of “Information Theory and Entropy” lectures from MIT, which are long but have great coverage of topics, especially touching on more physics-flavored stuff. I also use Cover & Thomas as a reference, mainly because it has good chapters on Kelly betting and portfolio optimization.
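Since Kelly betting came up: here is a minimal sketch of the idea, maximizing expected log-growth of a bankroll on a biased coin at even odds (the 0.6 win probability is arbitrary):

```python
import math

# Kelly betting (as covered in Cover & Thomas): bet a fraction f of
# your bankroll on a p=0.6 coin at even odds. Expected log-growth is
#   g(f) = p*log(1+f) + (1-p)*log(1-f),
# maximized at f* = 2p - 1 for even odds.
p = 0.6

def log_growth(f):
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

# Crude grid search over betting fractions
best_f = max((f / 1000 for f in range(0, 1000)), key=log_growth)
print(best_f)  # should sit at ≈ 0.2, matching the closed form 2p - 1
```

Maximizing log-growth rather than expected value is the information-theoretic move here; the same logic generalizes to the portfolio-optimization chapters mentioned above.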
Godel Escher Bach
Another book for which I know no substitute. Godel Escher Bach is… hard to explain. But it’s a fun read, you should read it cover-to-cover, and you will have much better conceptual foundations for thinking about self-reflection and agency afterwards.
Obviously some hands-on experience with ML is useful for anyone working on AI, even theoretical work - current systems are an important source of “data” on agency, same as biology and economics and psychology/neuroscience. Also, it’s one of those classes which brings together a huge variety of technical skills, so you can practice all that linear algebra and calculus and programming.
Unfortunately, these days there’s a flood of ML intros which don’t have any depth and just tell you how to call magic black-boxes. For theoretical agency/alignment work, that’s basically useless; understanding what goes on inside of these systems is where most of the value comes from. So look for a course/book which involves building as much as possible from scratch.
You might also consider an “old-school” ML course, from back before deep learning took off. I used Andrew Ng’s old lectures back in the day. A lot of the specific algorithms are outdated now, but there’s a lot of math done automagically now which we used to have to do by hand (e.g. backpropagating gradients). Understanding all that math is important for theory work, so doing it the old-fashioned way a few times can be useful.
Other than understanding the internals of deep learning algorithms, I’d also recommend looking into the new generation of probabilistic programming languages (e.g. Pyro), and how they work.
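In the "do the math by hand" spirit, here is a sketch of deriving a gradient manually and sanity-checking it with finite differences - the kind of exercise autodiff frameworks now hide. The data and weights are made up; the model is plain logistic regression:

```python
import math

# Hand-derived gradient of logistic-regression loss, checked against
# a central finite-difference estimate.
xs = [(1.0, 2.0), (2.0, 1.0), (-1.0, -1.5)]
ys = [1, 1, 0]
w = [0.3, -0.2]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss(w):
    total = 0.0
    for (x1, x2), y in zip(xs, ys):
        p = sigmoid(w[0] * x1 + w[1] * x2)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Hand-derived gradient: dL/dw_j = sum_i (p_i - y_i) * x_ij
grad = [0.0, 0.0]
for (x1, x2), y in zip(xs, ys):
    p = sigmoid(w[0] * x1 + w[1] * x2)
    grad[0] += (p - y) * x1
    grad[1] += (p - y) * x2

# Finite-difference check of each coordinate
eps = 1e-6
for j in range(2):
    w_hi = list(w); w_hi[j] += eps
    w_lo = list(w); w_lo[j] -= eps
    numeric = (loss(w_hi) - loss(w_lo)) / (2 * eps)
    assert abs(numeric - grad[j]) < 1e-4
print("analytic gradient matches finite differences")
```

Doing this once by hand for a small model makes backpropagation in a deep network feel like bookkeeping rather than magic.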
I’ve heard a saying that you can become a great programmer either by programming for ten years, or by programming for five years and taking an algorithms class. For theory work, a solid understanding of algorithms is even more important - we need to know what’s easy, what’s hard, and be able to recognize easy vs hard things in the wild.
Algorithms courses vary a lot in what they cover, but some key things which you definitely want:
- Dynamic programming. I’ve used one of Bellman’s books on the subject, which was excellent.
- NP-completeness & reductions. You need to be able to recognize the kinds-of-problems which are usually NP-complete, and be able to prove that they’re NP-complete if necessary.
- Relaxation-based search (i.e. A* search), if you haven’t already covered it in depth in an intro AI course
Depending on how much depth you want on the more theoretical parts, Avi Wigderson has a book with ridiculously deep and up-to-date coverage, though the writing is often overly abstract.
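For a minimal taste of dynamic programming, here is the classic edit-distance computation, built from a table of subproblem solutions:

```python
# Dynamic programming: edit distance between two strings.
# dp[i][j] = minimum edits to turn a[:i] into b[:j].
def edit_distance(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute
    return dp[len(a)][len(b)]

print(edit_distance("kitten", "sitting"))  # 3
```

The recurring DP move - define subproblems so each answer builds on smaller ones - is the same trick underlying Bellman’s control-theory work mentioned above.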
Numerical algorithms are the sort of thing you use for simulating physical systems or for numerical optimization in ML. Besides the obvious object-level usefulness, many key ideas of numerical algorithms (like sparse matrix methods or condition numbers) are really more-general principles of world modelling, which for some reason people don’t talk about much until you’re up to your elbows in actual numerical code.
Courses under names like “numerical algorithms”, “numerical analysis”, or “scientific computing” cover various pieces of the relevant material; it’s kind of a grab-bag.
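As a first taste of the field’s flavor, here is a sketch of forward-Euler integration of a toy ODE, showing how the numerical error shrinks with the step size (the ODE and step sizes are arbitrary):

```python
import math

# Forward-Euler integration of dx/dt = -x from x(0) = 1,
# compared against the exact solution x(t) = e^{-t}.
def euler(h, t_end=1.0):
    x = 1.0
    for _ in range(round(t_end / h)):
        x += h * (-x)
    return x

exact = math.exp(-1.0)
for h in (0.1, 0.01, 0.001):
    print(h, abs(euler(h) - exact))  # error shrinks roughly linearly in h
```

That "error scales like h" behavior is what makes Euler a first-order method; much of numerical analysis is about buying higher-order convergence (and stability) at reasonable cost.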
For purposes of agency and alignment work, biology is one of the main sources of evolved agenty systems. It’s a major source of intuitions and qualitative data for my work (and hopefully quantitative data, some day). Also, if you want to specialize in problems-we-don’t-understand more generally, biology will likely be pretty central.
The two most important books to read are Alon’s Design Principles of Biological Circuits, and the Bionumbers book. The former is about the surprising extent to which evolved biological systems have unifying human-legible design principles (I have a review here). The latter is an entire book of Fermi estimates, and will give you lots of useful intuitions and visualizations for what’s going on in cells.
I also strongly recommend a course in synthetic biology. I used a set of lectures which I think were a pilot for this course.
Like biology, economics is a major source of intuitions and data on agenty systems. Unlike biology, it’s also a major source of mathematical models for agenty systems. I think it is very likely that a successful theory of the foundations of agency will involve market-like structures and math.
I don’t know of any very good source on the “core” market models of modern economics beyond the 101 level. I suspect that Stokey, Lucas and Prescott does a good job (based on other work by the authors), but I haven’t read it myself. I believe you’d typically find this stuff in a first-year grad-school microeconomics course.
If you want to do this the hard way: first take convex optimization (see above), then try to solve the N Economists Problem.
N economists walk into a bar, each with a utility function and a basket of goods. Compute the equilibrium distribution of goods.
This requires making some reasonably-general standard economic assumptions (concave increasing utility functions, rational agents, common knowledge, Law of One Price).
Learning it the hard way takes a while.
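To show what the destination looks like, here is a much-simplified cousin of the problem: N agents with Cobb-Douglas utilities trading two goods. Cobb-Douglas demand has a closed form, so the equilibrium price does too. The agents’ parameters and endowments below are made up for illustration:

```python
# N agents with Cobb-Douglas utility u_i = x^a_i * y^(1-a_i),
# each endowed with some of goods x and y.
agents = [  # (a_i, endowment_x, endowment_y)
    (0.5, 4.0, 1.0),
    (0.3, 1.0, 5.0),
    (0.7, 2.0, 2.0),
]

# Normalize the price of y to 1; solve market clearing for p = price of x.
# Cobb-Douglas demand: each agent spends fraction a_i of wealth on x, so
#   sum_i a_i*(p*ex_i + ey_i)/p = sum_i ex_i  =>  closed form for p.
total_x = sum(ex for _, ex, _ in agents)
p = sum(a * ey for a, _, ey in agents) / (total_x - sum(a * ex for a, ex, _ in agents))

# Equilibrium allocation and market-clearing check
alloc = [(a * (p * ex + ey) / p, (1 - a) * (p * ex + ey)) for a, ex, ey in agents]
demand_x = sum(x for x, _ in alloc)
demand_y = sum(y for _, y in alloc)
print(p, demand_x, demand_y)  # total demand should equal total endowments
```

The full N Economists Problem drops the convenient Cobb-Douglas form, which is why you need convex optimization (and duality) to solve it in general.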
Once you have the tools to solve the N Economists problem (whether from a book/course or by figuring it out the hard way), the next step along the path is “dynamic stochastic general equilibrium” models and “recursive macro”. (These links are to two books I happen to have, but there are others and I don’t have any reason to think these two are unusually good.) You probably do not need to go that far for alignment work, but if you want to specialize in problems-we-don’t-understand more generally, then these tools are the cutting-edge baseline for modelling markets (especially financial markets).
Game theory is the part of economics most directly relevant to alignment and agency, and largely independent of market models, so it gets its own section.
You might want to take an intro-level course if you don’t already know the basics (e.g. what a Nash equilibrium is), but you might just pick that up somewhere along the way. Once you know the very basics, I recommend two books. First, Games and Information by Eric Rasmusen. It’s all about games in which the players have different information - things like principal-agent problems, signalling, mechanism design, bargaining, etc. This is exactly the right set of topics to study, which largely makes up for a writing style which I don’t particularly love. (You might be able to find a course which covers similar material.)
Second, Schelling’s The Strategy of Conflict. As Yudkowsky put it:

Forget rationalist Judo: this is rationalist eye-gouging, rationalist gang warfare, rationalist nuclear deterrence. Techniques that let you win, but you don't want to look in the mirror afterward.

For this book, I don’t know of any good substitute.
Control systems are all over the place in engineered devices. Even your thermostat needs to not be too sensitive in blasting out hot/cold air in response to cold/hot temperatures, lest we get amplifying hot/cold cycles. It’s a simple model, but even complex AI systems (or biological systems, or economic systems) can be modeled as control systems.
You’ll probably pick up the basics of linear control theory in other courses on this list (especially linear dynamical systems). If you want more than that, one of Bellman’s books on dynamic programming and control theory is a good choice, and these lectures on underactuated control are really cool. This is another category where you only need the very basics for thinking about alignment and agency, but more advanced knowledge is often useful for a wide variety of problems.
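The thermostat story can be sketched in a few lines: a discrete-time proportional controller converges for modest gains, but past a critical gain the corrections overshoot harder every step. The numbers below are arbitrary; the instability threshold (gain 2, for this toy model) is the point:

```python
# A thermostat as a discrete-time proportional controller.
# Each step, error shrinks by a factor (1 - gain); for |1 - gain| > 1
# (i.e. gain > 2 here), the corrections amplify instead of damping.
def simulate(gain, steps=50):
    temp, setpoint = 10.0, 20.0
    for _ in range(steps):
        temp += gain * (setpoint - temp)
    return temp

print(simulate(0.5))  # converges to ~20
print(simulate(2.5))  # overshoots harder every step: unstable
```

Linear control theory is largely the study of where that stability boundary sits for systems much more complicated than this one.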
Chaos is conceptually fundamental to all sorts of “complex systems”. It’s quite central to my own work on abstraction, and I wouldn’t be at all surprised if it has other important applications in the theory of agency.
There are many different classes where you might pick up an understanding of chaos, but a course called “Nonlinear Dynamical Systems” (or something similar) is the most likely bet.
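The core phenomenon fits in a few lines: the logistic map at its chaotic parameter amplifies a tiny difference in initial conditions until the two trajectories have nothing to do with each other. The starting point and perturbation size below are arbitrary:

```python
# Sensitive dependence on initial conditions: the logistic map
# x -> r*x*(1-x) at r = 4, run from two starts 1e-9 apart.
def trajectory(x0, steps=60, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.3)
b = trajectory(0.3 + 1e-9)
gap = max(abs(x - y) for x, y in zip(a, b))
print(gap)  # order 1: the 1e-9 perturbation has been amplified enormously
```

That exponential amplification of small perturbations is what "chaos" means quantitatively, and it is why long-run prediction of such systems is hopeless even when the dynamics are fully known.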
Probably my biggest mistake in terms of undergraduate coursework was not taking statistical mechanics. It’s an alternative viewpoint for all the probability theory and information theory stuff, and it’s a viewpoint very concretely applied in everyday situations. Some of it is physics-specific, but it’s an ongoing source of key ideas nonetheless.
If you can learn Bayesian stat mech, that’s ideal, although it’s not taught that way everywhere and I don’t know of a good textbook. (If you want a pretty advanced and dense book, Walter T. Grandy is your guy, but that one is a bit over my head.)
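To hint at the probability-theory connection: the Boltzmann distribution, P(state) ∝ exp(-E/T), is exactly what maximum entropy gives you under an energy constraint (the Jaynesian view). Here is a sketch on a made-up three-level system:

```python
import math

# Boltzmann distribution over a toy three-level system:
# P(state) ∝ exp(-E/T), normalized by the partition function Z.
energies = [0.0, 1.0, 2.0]

def boltzmann(T):
    weights = [math.exp(-e / T) for e in energies]
    Z = sum(weights)  # partition function
    return [w / Z for w in weights]

print(boltzmann(0.1))  # cold: almost all probability on the ground state
print(boltzmann(100))  # hot: nearly uniform over the three states
```

The temperature limits here are the intuition pump: low temperature concentrates probability on low-energy states, high temperature washes the energies out entirely.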
In case nobody mentioned it yet, you probably want to read the sequences, including these two. They’re long, but they cover a huge amount of important conceptual material, and they’re much lighter reading than technical textbooks.
Useful In General, But Not So Much For Alignment
This section is intended for people who want to specialize in technical problems-we-don’t-understand more generally, beyond alignment. It contains courses which I’ve found useful for a fairly broad array of interesting problems, but less so for alignment specifically. I won’t go into as much depth on these, just a quick bullet list with one-sentence blurbs and links.
- Theoretical Mechanics. Using Newton’s laws for everything gets messy in more complicated systems; this course covers cleaner methods. Susskind’s lectures are good.
- Quantum. If you have an itching desire to know how it works, I strongly recommend The Quantum Challenge as a starting point. That book covers the conceptually-”weird” parts much better than most courses.
- Electromagnetism. This is the more theory-heavy part of E&M, circuits is more practical. Griffiths is the standard textbook, and is quite good.
- Electronic circuits. I used MIT’s 6.002 lectures, which were fun.
- Digital logic/VLSI/etc. This is the class where you design a simple computer CPU starting from transistors and wires.
- Systems programming. The gnarly parts of programming - dealing with the OS and low-level code, databases, networks, etc.
- Parallel/asynchronous programming. Self explanatory.
- SQL. Also self explanatory.
- Graphics (esp. Procedural Graphics). Games and animation are one of the places where people need really robust, fast, realistic simulations of all sorts of things, which makes it a really cool area to practice lots of technical skills.
- Robotics. Another fun area to practice lots of technical skills.
- Modular arithmetic, polynomial rings, and related algorithms (polynomial multipoint, GCD, Chinese remainder). Powerful tools for certain kinds of algorithmic problems; might be scattered across a few different classes.
- Materials 101. MIT has some really fun lectures.
- Continuum mechanics (i.e. Elastics & Fluid Mechanics). Core tools for modelling solids and fluids, respectively.
- Math Finance. Ito calculus in particular is a very useful tool. Hull is the standard text; any course using that text will likely cover similar material.
- Fourier. Generally a useful tool for linear PDEs, and the backbone of fast convolutions (as in “convolutional neural network”). Somewhat old-school at this point.
- PDEs. Nonlinear PDEs and Numerical PDEs are usually separate classes, and are also quite useful (the former for qualitative understanding of nonlinear-specific phenomena like shocks, the latter for simulation).
- Complex analysis. These tools sure do seem powerful, but I haven’t gotten much use out of them in practice. Not sure if that’s just me or not.
That was a lot. It took me roughly eight hours of typing just to write it all out, and a lot longer than that to study it all.
With that in mind: you absolutely do not need to study all of this. It’s a sum, not a logical-and. The more you cover, the wider the range of ideas you’ll have to draw from. It’s not like everything will magically click when you study the last piece; it’s just a long gradual accumulation.
If there’s one thing which I don’t think this list conveys enough, it’s the importance of actually playing around with all the frames and tools and trying them out on problems of your own. See how they carry over to new applications; see how to use them. Most of the things on this list I studied because they were relevant to one problem or another I was interested in, and I practiced by trying them out on those problems. Follow things which seem interesting, things for which you already have applications in mind, and you’ll learn them better. More advanced projects will practice large chunks of this list all at once. In large part, the blurbs here were meant to help suggest possible applications and stoke your interest.
Oh, one more thing: practice writing clear explanations and distillations of technical ideas. It’s a pretty huge part of alignment and agency research in practice. I hear blog posts explaining the technical stuff you’re learning are pretty good for that - and also a good way to visibly demonstrate your own understanding.