This post is taken from a recent Facebook conversation that included Wei Dai, Eliezer Yudkowsky, Vladimir Slepnev, Stuart Armstrong, Maxim Kesin, Qiaochu Yuan and Robby Bensinger, about the ability of academia to make the key intellectual progress required in AI alignment.
[The above people all gave permission to have their comments copied here. Some commenters requested their replies not be made public, and their comment threads were not copied over.]
Eliezer, can you give us your take on this discussion between me, Vladimir Slepnev, and Stuart Armstrong? I'm especially interested to know if you have any thoughts on what is preventing academia from taking or even recognizing certain steps in intellectual progress (e.g., inventing anything resembling Bitcoin or TDT/UDT) that non-academics are capable of. What is going on there and what do we need to do to avoid possibly suffering the same fate? See this and this.
It's a deep issue. But stating the obvious is often a good idea, so to state the obvious parts, we're looking at a lot of principal-agent problems, Goodhart's Law, bad systemic incentives, hypercompetition crowding out voluntary contributions of real work, the blind leading the blind and second-generation illiteracy, etcetera. There just isn't very much in the academic system that does promote any kind of real work getting done, and a lot of other rewards and incentives instead. If you wanted to get productive work done inside academia, you'd have to ignore all the incentives pointing elsewhere, and then you'd (a) be leading a horrible unrewarded life and (b) you would fall off the hypercompetitive frontier of the major journals and (c) nobody else would be particularly incentivized to pay attention to you except under unusual circumstances. Academia isn't about knowledge. To put it another way, although there are deep things to say about the way in which bad incentives arise, the skills that are lost, the particular fallacies that arise, and so on, it doesn't feel to me like the *obvious* bad incentives are inadequate to explain the observations you're pointing to. Unless there's some kind of psychological block preventing people from seeing all the obvious systemic problems, it doesn't feel like the end result ought to be surprising.
Of course, a lot of people do seem to have trouble seeing what I'd consider to be obvious systemic problems. I'd chalk that up to not as much fluency with Moloch's toolbox, plus them not being status-blind and assigning non-zero positive status to academia that makes them emotionally reluctant to correctly take all the obvious problems at face value.
Eliezer Yudkowsky (cont.):
It seems to me that I've watched organizations like OpenPhil try to sponsor academics to work on AI alignment, and it seems to me that they just can't produce what I'd consider to be real work. The journal paper that Stuart Armstrong coauthored on "interruptibility" is a far step down from Armstrong's other work on corrigibility. It had to be dumbed way down (I'm counting obscuration with fancy equations and math results as "dumbing down") to be published in a mainstream journal. It had to be stripped of all the caveats and any mention of explicit incompleteness, which is necessary meta-information for any ongoing incremental progress, not to mention important from a safety standpoint. The root cause can be debated but the observable seems plain. If you want to get real work done, the obvious strategy would be to not subject yourself to any academic incentives or bureaucratic processes. Particularly including peer review by non-"hobbyists" (peer commentary by fellow "hobbyists" still being potentially very valuable), or review by grant committees staffed by the sort of people who are still impressed by academic sage-costuming and will want you to compete against pointlessly obscured but terribly serious-looking equations.
Eliezer Yudkowsky (cont.):
There's a lot of detailed stories about good practice and bad practice, like why mailing lists work better than journals because of that thing I wrote on FB somewhere about why you absolutely need 4 layers of conversation in order to have real progress and journals do 3 layers which doesn't work. If you're asking about those it's a lot of little long stories that add up.
Academia is capable of many deep and important results though, like complexity theory, public-key cryptography, zero knowledge proofs, vNM and Savage's decision theories, to name some that I'm familiar with. It seems like we need a theory that explains why it's able to take certain kinds of steps but not others, or maybe why the situation has gotten a lot worse in recent decades.
That academia may not be able to make progress on AI alignment is something that worries me and a major reason for me to be concerned about this issue now. If we had a better, more nuanced theory of what is wrong with academia, that would be useful for guiding our own expectations on this question and perhaps also help persuade people in charge of organizations like OpenPhil.
Public-key cryptography was invented by GCHQ first, right?
It was independently reinvented by academia, with only a short delay (4 years according to Wikipedia), using far fewer resources than the government agencies. That seems good enough to illustrate my point that academia is (or at least was) capable of doing good and efficient work.
I'm a little concerned about the use of the phrase "academia" in this conversation not cutting reality at the joints. Academia may simply not be very homogeneous over space and time - it certainly seems strange to me to lump von Neumann in with everyone else, for example.
Sure, part of my question here is how to better carve reality at the joints. What's the relevant difference between the parts (in space and/or time) of academia that are productive and the parts that are not?
Academia is often productive. I think the challenge is mainly getting it to be productive on the right problems.
Interesting, so maybe a better way to frame my question is, of the times that academia managed to focus on the right problems, what was responsible for that? Or, what is causing academia to not be able to focus on the right problems in certain fields now?
Things have certainly gotten a lot worse in recent decades. There's various stories I've theorized about that but the primary fact seems pretty blatant. Things might be different if we had the researchers and incentives from the 1940s, but modern academics are only slightly less likely to sprout wings than to solve real alignment problems as opposed to fake ones. They're still the same people and the same incentive structure that ignored the entire issue in the first place.
OpenPhil is better than most funding sources, but not close to adequate. I model them as having not seen past the pretend. I'm not sure that more nuanced theories are what they need to break free. Sure, I have a dozen theories about various factors. But ultimately, most human institutions through history haven't solved hard mental problems. Asking why modern academia doesn't invent UDT may be like asking why JC Penney doesn't. It's just not set up to do that. Nobody is being docked a bonus for writing papers about CDT instead. Feeling worried and like something is out of place about the College of Cardinals in the Catholic Church not inventing cryptocurrencies suggests a basic mental tension that may not be cured by more nuanced theories of the sociology of religion. Success is unusual and calls for explanation, failure doesn't. Academia in a few colleges in a few countries used to be in a weird regime where it could solve hard problems, times changed, it fell out of that weird place.
It's not actually clear to me, even after all this discussion, that 1940s researchers had significantly better core mental habits / mindsets for alignment work than 2010s researchers. A few counter-points:
A lot of the best minds worked on QM in the early 20th century, but I don't see clear evidence that QM progressed differently than AI is progressing today; that is, I don't know of a clear case that falsifies the hypothesis "all the differences in output are due to AI and QM as cognitive problems happening to involve inherently different kinds and degrees of difficulty". In both cases, it seems like people did a good job of applying conventional scientific methods and occasionally achieving conceptual breakthroughs in conventional scientific ways; and in both cases, it seems like there's a huge amount of missing-the-forest-for-the-trees, not-seriously-thinking-about-the-implications-of-beliefs, and generally-approaching-philosophyish-questions-flippantly. It took something like 50 years to go from "Schrodinger's cat is weird" to "OK /maybe/ macroscopic superposition-ish things are real" in physics, and "maybe macroscopic superposition-ish things are real" strikes me as much more obvious and much less demanding of sustained theorizing than, e.g., 'we need to prioritize decision theory research ASAP in order to prevent superintelligent AI systems from destroying the world'. Even von Neumann had non-naturalist views about QM, and if von Neumann is a symptom of intellectual degeneracy then I don't know what isn't.
Ditto for the development of nuclear weapons. I don't see any clear examples of qualitatively better forecasting, strategy, outside-the-box thinking, or scientific productivity on this topic in e.g. the 1930s, compared to what I'd expect to see today. (Though this comparison is harder to make because we've accumulated a lot of knowledge and hard experience with technological GCR as a result of this and similar cases.) The near-success of the secrecy effort might be an exception, since that took some loner agency and coordination that seems harder to imagine today. (Though that might also have been made easier by the smaller and less internationalized scientific community of the day, and by the fact that world war was on everyone's radar?)
Turing and I. J. Good both had enough puzzle pieces to do at least a little serious thinking about alignment, and there was no particular reason for them not to do so. The 1956 Dartmouth workshop shows "maybe true AI isn't that far off" was at least taken somewhat seriously by a fair number of people (though historians tend to overstate the extent to which this was true). If 1940s researchers were dramatically better than 2010s researchers at this kind of thing, and the decay after the 1940s wasn't instantaneous, I'd have expected at least a hint of serious thinking-for-more-than-two-hours about alignment from at least one person working in the 1950s-1960s (if not earlier).
Here's a different hypothesis: Human brains and/or all of the 20th century's standard scientific toolboxes and norms are just really bad at philosophical/conceptual issues, full stop. We're bad at it now, and we were roughly equally bad at it in the 1940s. A lot of fields have slowed down because we've plucked most of the low-hanging fruit that doesn't require deep philosophical/conceptual innovation, and AI in particular happens to be an area where the things human scientists have always been worst at are especially critical for success.
Ok, so the story I'm forming in my mind is that we've always been really bad at philosophical/conceptual issues, and past philosophical/conceptual advances just represent very low-hanging fruit that have been picked. When we invented mailing lists / blogs, the advantage over traditional academic communications allowed us to reach a little higher and pick up a few more fruits but progress is still very limited because we're still not able to reach very high in an absolute sense, and making progress this way depends on gathering together enough hobbyists with the right interests and resources which is a rare occurrence. Rob, I'm not sure how much of this you endorse, but it seems like the best explanation of all the relevant facts I've seen so far.
I think the object-level philosophical progress via mailing lists / blogs was tied to coming up with some good philosophical methodology. One simple narrative about the global situation (pretty close to the standard narrative) is that before 1880 or so, human inquiry was good at exploring weird nonstandard hypotheses, but bad at rigorously demanding testability and precision of those hypotheses. Human inquiry between roughly 1880 and 1980 solved that problem by demanding testability and precision in all things, which (combined with prosaic knowledge accumulation) let them grab a lot of low-hanging scientific fruit really fast, but caused them to be unnecessarily slow at exploring any new perspectives that weren't 100% obviously testable and precise in a certain naive sense (which led to lack-of-serious-inquiry into "weird" questions at the edges of conventional scientific activities, like MWI and Newcomb's problem).
Bayesianism, the cognitive revolution, the slow fade of positivism's influence, the random walk of academic one-upmanship, etc. eventually led to more sophistication in various quarters about what kind of testability and precision are important by the late 20th century, but this process of synthesizing 'explore weird nonstandard hypotheses' with 'demand testability and precision' (which are the two critical pieces of the puzzle for 'do unusually well at philosophy/forecasting/etc.') was very uneven and slow. Thus you get various little islands of especially good philosophy-ish thinking showing up at roughly the same time here and there, including parts of analytic philosophy (e.g., Drescher), mailing lists (e.g., Extropians), and psychology (e.g., Tetlock).
Eliezer, your position is very sharp. A couple questions then:
Do you think e.g. Scott Aaronson's work on quantum computing today falls outside the "weird regime where it could solve hard problems"?
Do you have a clear understanding why e.g. Nick Bostrom isn't excited about TDT/UDT?
Vladimir, can you clarify what you mean by "isn't excited"? Nick did write a few paragraphs about the relevance of decision theory to AI alignment in his Superintelligence, and cited TDT and UDT as "newer candidates [...] which are still under development". I'm not sure what else you'd expect, given that he hasn't specialized in decision theory in his philosophy work? Also, what's your own view of what's causing academia to not be able to make these "outsider steps"?
Wei, at some point you thought of UDT as the solution to anthropic reasoning, right? That's Bostrom's specialty. So if you are right, I'd expect more than a single superficial mention.
My view is that academia certainly tends to go off in wrong directions and it was always like that. But its direction can be influenced with enough effort and understanding, it's been done many times, and the benefits of doing that are too great to overlook.
I'm not sure, maybe he hasn't looked into UDT closely enough to understand the relevance to anthropics or he's committed to a probability view? Probably Stuart has a better idea of this than I do. Oh, I do recall that when I attended a workshop at FHI, he asked me some questions about UDT that seemed to indicate that he didn't understand it very well. I'm guessing he's probably just too busy to do object-level philosophical investigations these days.
Can you give some past examples of academia going off in the wrong direction, and that being fixed by outsiders influencing its direction?
Why do you need the "fixed by outsiders" bit? I think it's easier to change the direction of academia while being in academia, and that's been done many times.
Vladimir Slepnev, the price of admission is pretty high for people who can do otherwise productive work, no? Especially since very few members of the club can have direction-changing impact. Something like finding and convincing existing high-standing members, preferably several of them, seems like a better strategy than joining the club and doing it from the inside yourself.
Vladimir, on LW you wrote "More like a subset of steps in each field that need to be done by outsiders, while both preceding and following steps can be done by academia." If some academic field is going in a wrong direction because it's missing a step that needs to be done by outsiders, how can someone in academia change its direction? I'm confused... Are you saying outsiders should go into academia in order to change its direction, after taking the missing "outsider steps"? Or that there is no direct past evidence that outsiders can change academia's direction but there's evidence that insiders can and that serves as bayesian evidence that outsiders can too? Or something else?
I guess I shouldn't have called them "outsider steps", more like "newcomer steps". Does that make sense?
There's an old question, "What does the Bible God need to do for the Christians to say he is not good?" What would academia need to do before you let it go?
But I don't feel abused! My interactions with academia have been quite pleasant, and reading papers usually gives me nice surprises. When I read your negative comments about academia, I mostly just get confused. At least from what I've read in this discussion today, it seems like the mystical force that's stopping people like Bostrom from going fully on board with ideas like UDT is simple miscommunication on our part, not anything more sinister. If our arguments for using decisions over probabilities aren't convincing enough, perhaps we should work on them some more.
Vladimir, surely those academic fields have had plenty of infusion of newcomers in the form of new Ph.D. students, but the missing steps only got done when people tried to do them while remaining entirely out of academia. Are you sure the relevant factor here is "new to the field" rather than "doing work outside of academia"?
Academic fields are often productive, but narrow. Saying "we should use decision theory instead of probability to deal with anthropics" falls outside of most of the relevant fields, so few academics are interested, because it doesn't solve the problems they are working on.
Vladimir, a lot of people on LW didn't have much trouble understanding UDT as informally presented there, or recognizing it as a step in the right direction. If joining academia makes somebody much less able to recognize progress in decision theory, that seems like a bad thing and we shouldn't be encouraging people to do that (at least until we figure out what exactly is causing the problem and how to fix or avoid it on an institutional or individual level).
I think it's not surprising that many LWers agreed with UDT, because most of them were introduced to the topic by Eliezer's post on Newcomb's problem, which framed the problem in a way that emphasized decisions over probabilities. (Eliezer, if you're listening, that post of yours was the single best example of persuasion I've seen in my life, and for a good goal too. Cheers!) So there's probably no statistical effect saying outsiders are better at grasping UDT on average. It's not that academia is lacking some decision theory skill, they just haven't bought our framing yet. When/if they do, they will be uniquely good at digging into this idea, just as with many other ideas.
If the above is true, then refusing to pay the fixed cost of getting our ideas into academia seems clearly wrong. What do you think?
I think the problem is a mix of specialisation and lack of urgency. If I'd been willing to adapt to the format, I'm sure I could have got my old pro-SIA arguments published. But anthropics wasn't ready for an "ignore the big probability debates you've been having; anthropic probability doesn't exist" paper. And those who were interested in the fundamental interplay between probability and decision theory weren't interested in anthropics (and I wasn't willing to put the effort in to translate it into their language).
This is where the lack of urgency comes in. People found the paper interesting, I'd wager, but it didn't say anything about the questions they were interested in. And they had no real feeling that some questions were far more important than theirs.
I've presented the idea to Nick a few times, but he never seemed to get it fully. It's hard to ignore probabilities when you've spent your life with them.
I will mention for whatever it's worth that I don't think decision theory can eliminate anthropics. That's an intuition I still find credible and it's possible Bostrom felt the same. I've also seen Bostrom contribute at least one decision theory idea to anthropic problems, during a conversation with him by instant messenger, a division-of-responsibility principle that UDT later rendered redundant.
I also disagree with Eliezer about the use of the "interruptible agents" paper. The math is fun but ultimately pointless, and there is little mention of AI safety. However, it was immensely useful for me to write that paper with Laurent, as it taught me so much about how to model things, and how to try and translate those models into things that ML people like. As a consequence, I can now design indifference methods for practically any agent, which was not the case before.
And of course the paper wouldn't mention the hard AI safety problems - not enough people in ML are working on those. The aim was to 1) present part of the problem, 2) present part of the solution, and 3) get both of those sufficiently accepted that harder versions of the problem can then be phrased as "take known problem/solution X, and add an extra assumption..."
That rationale makes sense to me. I think the concern is: if the most visible and widely discussed papers in AI alignment continue to be ones that deliberately obscure their own significance in various ways, then the benefits from the slow build-up to being able to clearly articulate our actual views in mainstream outlets may be outweighed by the costs from many other researchers internalizing the wrong take-aways in the intervening time. This is particularly true if many different build-ups like this are occurring simultaneously, over many years of incremental progress toward just coming out and saying what we actually think.
I think this is a hard problem, and one MIRI's repeatedly had to deal with. Very few of MIRI's academic publications even come close to giving a full rationale for why we care about a given topic or result. The concern is with making it standard practice for high-visibility AI alignment papers to be at least somewhat misleading (in order to get wider attention, meet less resistance, get published, etc.), rather than with the interruptibility paper as an isolated case; and this seems like a larger problem for overstatements of significance than for understatements.
I don't know how best to address this problem. Two approaches MIRI has tried before, which might help FHI navigate this, are: (1) writing a short version of the paper for publication that doesn't fully explain the AI safety rationale, and a longer eprint of the same paper that does explain the rationale; and/or (2) explaining results' significance more clearly and candidly in the blog post announcing the paper.
To put this yet another way, most human bureaucracies and big organizations don't do science. They have incentives for the people inside them which get them to do things other than science. For example, in the FBI, instead of doing science, you can best advance your career by closing big-name murder cases... or whatever. In the field of psychology, instead of doing science, you can get a lot of undergraduates into a room and submit obscured-math impressive-sounding papers with a bunch of tables that claim a p-value less than 0.05. Among the ways we know that this has little to do with science is that the papers don't replicate. P-values are rituals, and being surprised that the rituals don't go hand-in-hand with science says you need to adjust your intuitions about what is surprising. It's like being surprised that your prayers aren't curing cancer and asking how you need to pray differently.
Now, it may be that separately from the standard incentives, decades later, a few heroes get together and try to replicate some of the most prestigious papers. They are doing science. Maybe somebody inside the FBI is also doing science. Lots of people in Christian religious organizations, over the last few centuries, did some science, though fewer now than before. Maybe the public even lauded the science they did, and they got some rewards. It doesn't mean the Catholic Church is set up to teach people how to do real science, or that this is the primary way to get ahead in the Catholic Church such that status-seekers will be driven to seek their promotions by doing great science.
The people doing real science by trying to replicate psychology studies may report ritual p-values and submit for ritual peer-review-by-idiots. Similarly, some doctors in the past no doubt prayed while giving their patients antibiotics. It doesn't mean that prayer works some of the time. It means that these heroes are doing science, and separately, doing bureaucracy and a kind of elaborate ritual that is what our generation considers to be prestigious and mysterious witch doctoring.