RAISE AI Safety prerequisites map entirely in one post

[-]moridinamael6y220

I'm going to burn some social capital on asking a stupid question, because it's something that's been bothering me for a long time. The question is, why do we think we know that it's necessary to understand a lot of mathematics to productively engage in FAI research?

My first line of skepticism can perhaps be communicated with a simplified analogy: It's 10,000 BC and two people are watching a handful of wild sheep grazing. The first person wonders out loud if it would be possible to somehow teach the sheep to be more docile.

The second person scoffs, and explains that they know everything there is to know about training animals, and it's not in the disposition of sheep to be docile. They go on to elaborate all the known strategies for training dogs, and how none of them can really change the underlying temperament of the animal.

The first person has observed that certain personality traits seem to pass on from parent to child and from dog to puppy. In a flash of insight they conceive of the idea of intentional breeding.

They cannot powerfully articulate this insight at the level of genetics or breeding rules. They don't even know for a fact that sheep can be bred to be more docile. But nonetheless, in a flash, in something like one second of cognitive experience they've gone from not-knowing to knowing this important secret.

End of analogy. The point being: it is obviously possible to have true insights without having the full descriptive apparatus needed to precisely articulate and/or prove the truth of the insight. In fact I have a suspicion that most true, important insight comes in the form of new understandings that are not well-expressed by existing paradigms, and eventually necessitate a new communication idiom to express the new insight. Einstein invented Einstein notation not just because it's succinct, but because it visually rearranges the information to emphasize what's actually important in the new concept he was communicating and working with.
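As a small illustration of that notational point, compare a matrix-vector product written with and without the summation convention. The symbols y, A, x and the dimension n are placeholders chosen here for the example, not taken from any particular paper.

```latex
% Explicit sum over the repeated index j:
y^i = \sum_{j=1}^{n} A^i{}_j \, x^j

% The same statement in Einstein notation:
% an index repeated once up and once down is summed implicitly.
y^i = A^i{}_j \, x^j
```

The second form drops the bookkeeping and leaves only the index structure visible, which is roughly the sense in which a notation can emphasize what matters.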

So maybe my steelman of "why learn all this math" is something like "because it gives you the language that will help you construct/adapt the new language which will be required to express the breakthrough insight." But that doesn't actually seem like it would be important in being able to come up with that insight in the first place.

I will admit I feel a note of anxiety at the thought that people are looking at this list of "prerequisites" and thinking, wow, I'm never going to be useful in thinking about FAI. Thinking that because they don't know what Cantor's Diagonalization is and don't have the resources in terms of time to learn, their brainpower can't be productively applied to the problem. Whereas, in contrast, I will be shocked if the key, breakthrough insight that makes FAI possible is something that requires understanding Cantor's Diagonalization to grasp. In fact, I will be shocked if the key, breakthrough insight can't be expressed almost completely in 2-5 sentences of jargon-free natural language.

I have spent a lot of words here trying to point at the reason for my uncertainty that "learn all of mathematics" is a prerequisite for FAI research, and my concerns with what I perceive to be the unproven assumption that the pathway to the solution necessarily lies in mastering all these existing techniques. It seems likely that there is an answer here that will make me feel dumb, but if there is, it's not one that I've seen articulated clearly despite being around for a while.

[-]gwern6y190

As a historical fact, you certainly can invent selective breeding without knowing anything we would consider true: consider Robert Bakewell and the wildly wrong theories of heredity current when he invented line breeding and thus demonstrated that breeds could be created by artificial selection. (It's unclear what Bakewell and/or his father thought genetics was, but at least in practice, he seems to have acted similarly to modern breeding practices in selecting equally on mothers/fathers, taking careful measurements and taking into account offspring performance, preserving samples for long-term comparison, and improving the environment as much as possible to allow maximum potential to be reached.) More broadly, humans had no idea what they were doing when they domesticated everything; if Richard Dawkins is to be trusted, it seems that the folk genetics belief was that traits are not inherited and everything regressed to an environmental mean, and so one might as well eat one's best plants/animals since it'll make no difference. And even more broadly, evolution has no idea what 'it' is doing for anything, of course.

The problem is, as Eliezer always pointed out, that selection is extremely slow and inefficient compared to design - the stupidest possible optimization process that'll still work within the lifetime of Earth - and comes with zero guarantees of any kind. Genetic drift might push harmful variants up, environmental fluctuations might extinguish lineages, reproductively fit changes which Goodhart the fitness function might spread, nothing stops a 'treacherous turn', evolved systems tend to have minimal modularity and are incomprehensible, evolution will tend to build in instrumental drives which are extremely dangerous if there is any alignment problem (which there will be), sexual selection can drive a species extinct, evolved replicators can be hijacked by replicators on higher levels like memetics, any effective AGI design process will need to learn inner optimizers/mesa-optimizers which will themselves be unpredictable and only weakly constrained by selection, and so on. If there's one thing that evolutionary computing teaches, it's that these are treacherous little buggers indeed (Lehman et al 2018). The optimization process gives you what you ask for, not what you wanted.
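To make the last point concrete, here is a minimal toy sketch of a selection process that satisfies its fitness function without producing what was intended. Everything in it is an assumption made up for illustration: the intended behaviour `f(x) = 2x`, the lookup-table "genome", the names `TARGET`, `CHECKED`, `proxy_fitness`, `true_score` and `evolve`, and the choice to test only three inputs. It is not a model of any real evolutionary-computing system.

```python
import random

# What we actually want: a lookup table implementing f(x) = 2x for x in 0..9.
TARGET = [2 * x for x in range(10)]
# What we ask for: the fitness function only ever tests these inputs.
CHECKED = [0, 1, 2]


def proxy_fitness(table):
    """Number of *tested* inputs the candidate gets right."""
    return sum(table[x] == TARGET[x] for x in CHECKED)


def true_score(table):
    """Number of *all* inputs the candidate gets right (never seen by selection)."""
    return sum(table[x] == TARGET[x] for x in range(len(TARGET)))


def evolve(steps=20000, seed=0):
    """Mutate one entry at a time; keep the child if the proxy does not get worse."""
    rng = random.Random(seed)
    table = [rng.randint(0, 20) for _ in TARGET]
    for _ in range(steps):
        child = list(table)
        child[rng.randrange(len(child))] = rng.randint(0, 20)
        if proxy_fitness(child) >= proxy_fitness(table):
            table = child
    return table


best = evolve()
print("proxy fitness:", proxy_fitness(best), "of", len(CHECKED))
print("true score:   ", true_score(best), "of", len(TARGET))
```

Selection drives the proxy to its maximum, while the untested entries drift freely because nothing in the process ever looks at them. Only an argument about the whole table, rather than more tested cases, would tell you whether the result does what you wanted.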

So, you probably can 'evolve' an AGI, given sufficient computing power. Indeed, considering how many things in DL or DRL right now take the form of 'we tried a whole bunch of things and X is what worked' (note that a lot of papers are misleading about how many things they tried, and tell little theoretical stories about why their final X worked, which are purely post hoc) and only much later do any theoreticians manage to explain why it (might) work, arguably that's how AI is proceeding right now. Things like doing population-based training for AlphaStar or NAS to invent EfficientNet are just conceding the obvious and replacing 'grad student descent' with gradient descent.

The problem is, we won't understand why they work, won't have any guarantees that they will be Friendly, and they almost certainly will have serious blindspots/flaws (like adversarial examples or AlphaGo's 'delusions' or how OA5/AlphaStar fell apart when they began losing despite playing apparently at pro level before). NNs don't know what they don't know, and neither do we.

Nor are these flaws easy to fix with just some more tinkering. Much like computer security, you can't simply patch your way around all the problems with software written in C (as several decades of endless CVEs have taught us); you need to throw it out and start with formal methods to make errors like buffer overflows impossible. Adversarial examples, for instance: I recall that one conference had something like 5 adversarial defenses, all defined heuristically without proof of efficacy, and all of them were broken between the time of submission and the actual conference. Or AlphaGo's delusions couldn't be fixed despite quite elaborate methods being used to produce Master (which at least had better Elo) until they switched to the rather different architecture of AlphaZero. Neither OA5 nor AlphaStar has been convincingly fixed that I know of; they simply got better to the point where human players couldn't exploit them without a lot of practice to find reproducible ways of triggering blindspots.

So, that's why you want all the math. So you can come up with provably Friendly architectures without hidden flaws which simply haven't been triggered yet.

[-]moridinamael6y70

To be clear, I didn't mean to say that I think AGI should be evolved. The analogy to breeding was merely to point out that you can notice a basically correct trick for manipulating a complex system without being able to prove that the trick works a priori and without understanding the mechanism by which it works. You notice the regularity on the level of pure conceptual thought, something closer to philosophy than math. Then you prove it afterward. As far as I'm aware, this is indeed how most truly novel discoveries are made.

You've forced me to consider, though, that if you know all the math, you're probably going to be much better and faster at spotting those hidden flaws. It may not take great mathematical knowledge to come up with a new and useful insight, but it may indeed require math knowledge to prove that the insight is correct, or to prove that it only applies in some specific cases, or to show that, hey, it wasn't actually that great after all.

[-]Pattern6y40

> The problem is, we won't understand why they work, won't have any guarantees that they will be Friendly, and they almost certainly will have serious blindspots/flaws (like adversarial examples or AlphaGo's 'delusions' or how OA5/AlphaStar fell apart when they began losing despite playing apparently at pro level before). NNs don't know what they don't know, and neither do we.

I hadn't heard about that. I suppose that's what happens when you don't watch all the videos of their play.

[-]Rohin Shah6y100

Fwiw I also think it is not necessary to know lots of areas of math for AI safety research. Note that I do in fact know a lot of areas of math relatively shallowly.

I do think it is important to be able to do mathematical reasoning, which I can roughly operationalize as getting to the postrigorous stage in at least one area of math.


How to use this

Credits

Main path

Level 1. Basic logic

Level 2. Basic set theory

Level 3. Set Theoretic Relations and Enumerability

Level 4. Formal Semantics

Level 5. Formal Proof

Level 6. Turing Machines and the Halting Problem

Level 7. Equivalence Relations and Orderings

Level 8. Abacus Computability and Mathematical Proof by Induction

Level 9. The Natural Numbers in Set Theory and More Induction

Level 10. Recursive Functions

Level 11. Set Theoretic Recursion

Level 12. The Equivalence of Different Notions of Computability

Level 13. Isomorphisms

Level 14. Logic Review and The Relationship Between Computation and Logic

Level 15. Finite and Countable Sets

Level 16 (elective). Linear Orders and Completing the Real Numbers

Level 17. Basic Model Theory

Level 18 (elective). A quick look at cardinal and ordinal numbers

Level 19. Arithmetization and Representation of Recursive Functions

Level 20. Gödel’s Incompleteness Theorems and Axiomatic ZFC

Logic and proof path

Level 1. Basic logic

Level 2. Quantified logic. Introduction to mathematical arguments

Level 3. Formal semantics basics

Level 4. Formal proofs

Level 5. Proof by induction

Set theory path

Level 1. Basic set theory

Level 2. Set theoretic relations

Level 3. Equivalence relations and orderings

Level 4. Natural numbers and induction

Level 5. Set theoretic recursion

Level 6. Operations, structures, isomorphism

Level 7. Cardinality. Finite and countable sets

Elective: Level 8. Linear orderings. Completeness. Uncountable sets

Elective: Level 9. Cardinal numbers

Elective: Level 10. Ordinal numbers. Axiom of replacement. Transfinite induction and recursion

Elective: Level 11. Axiom of choice

Level 12. ZF(C) set theory

Computability theory path

Level 1. Enumerability and diagonalization

Level 2. Turing machines and the halting problem

Level 3. Abacus computability

Level 4. Recursive functions

Level 5. The equivalence of different notions of computability

Level 6. First order logic

Level 7. Undecidability of first order logic

Level 8. Models, their existence. Proofs and completeness.

Level 9. Arithmetization. Representability of recursive functions

Level 10. Indefinability, undecidability, incompleteness. The unprovability of consistency.