Philosophy and Physics, just finished my MSc AI at Edinburgh University. Interested in metaethics, anthropics and technical AI Safety.
MIRI thinks that the fact that evolution hasn't been putting much effort into optimizing for general intelligence is a reason to expect discontinuous progress? Apparently, Paul's point is that once we realize evolution has been putting little effort into optimizing for general intelligence, we realize we can't tell much about the likely course of AGI development from evolutionary history, which leaves us in the default position of ignorance. He then further argues that the default case is that progress is continuous.
So far as I can tell, Paul's point is that, absent specific reasons to think otherwise, the prima facie case is that whenever we are trying hard to optimize for some criterion, we should expect the 'many small changes that add up to one big effect' situation.
Then he goes on to argue that the specific arguments that AGI is, like nuclear weapons, a rare case where this isn't true are either wrong or not strong enough to make discontinuous progress plausible.
From what you just wrote, it seems like the folks at MIRI agree that we should have the prima facie expectation of continuous progress, and I've read elsewhere that Eliezer thinks the case for recursive self-improvement leading to a discontinuity is weaker or less central than it first seemed. So, are MIRI's main reasons for disagreeing with Paul down to other arguments (hence the switch from the intelligence explosion hypothesis to the general idea of rapid capability gain)?
I would think the most likely place to disagree with Paul (if not on the intelligence explosion hypothesis) would be if you expected that the right combination of breakthroughs crosses a 'generality threshold' (or 'secret sauce', as Paul calls it) that leads to a big jump in capability, while inadequate achievement on any one of the breakthroughs won't do.
Stuart Russell gives a list of the elements he thinks will be necessary for the 'secret sauce' of general intelligence in Human Compatible: human-like language comprehension, cumulative learning, discovering new action sets and managing its own mental activity. (I would add that somebody making that list 30 years ago would have added perception and object recognition, and somebody making it 60 years ago would have also added efficient logical reasoning from known facts). Let's go with Russell's list, so we can be a bit more concrete. Perhaps this is your disagreement:
An AI with (e.g.) good perception and object recognition, language comprehension, cumulative learning capability and ability to discover new action sets but a merely adequate or bad ability to manage its mental activity would be (Paul thinks) reasonably capable compared to an AI that is good at all of these things, but (MIRI thinks) it would be much less capable. MIRI has conceptual arguments (to do with the nature of general intelligence) and empirical arguments (comparing human/chimp brains and pragmatic capabilities) in favour of this hypothesis, and Paul thinks the conceptual arguments are too murky and unclear to be persuasive and that the empirical arguments don't show what MIRI thinks they show. Am I on the right track here?
Summary of my response: chimps are nearly useless because they aren’t optimized to be useful, not because evolution was trying to make something useful and wasn’t able to succeed until it got to humans.
So far as I can tell, the best one-line summary of why we should expect a continuous rather than a fast takeoff comes from the interview Paul Christiano gave on the 80k podcast: 'I think if you optimize AI systems for reasoning, it appears much, much earlier.' That is, the equivalent of the 'chimp' milestone on the road to human-level AI does not have approximately the economic utility of a chimp, but a decent fraction of the utility of something that is 'human-level'. This strikes me as an important argument, one that he's repeated here and that was discussed here last April, but other than that it seems to have gone largely unnoticed, and I'm wondering why.
I have a theory about why this didn't get discussed earlier - there is a much more famous bad argument against AGI being an existential risk, the 'intelligence isn't a superpower' argument that sounds similar. From Chollet vs Yudkowsky:
Intelligence is not a superpower; exceptional intelligence does not, on its own, confer you with proportionally exceptional power over your circumstances.
…said the Homo sapiens, surrounded by countless powerful artifacts whose abilities, let alone mechanisms, would be utterly incomprehensible to the organisms of any less intelligent Earthly species.
I worry that, in arguing against the claim that general intelligence isn't a meaningful concept or can't be used to compare different animals, some people have been implicitly assuming that evolution has been putting a decent amount of effort into optimizing for general intelligence. Alternatively, it may be that arguing for one sounds like arguing for the other, or that a lot of people have been arguing for both together without distinguishing between them.
Claiming that you can meaningfully compare evolved minds on the generality of their intelligence needs to be distinguished from claiming that evolution has been optimizing for general intelligence reasonably hard before humans came about.
From one of the linked articles, Christiano talking about takeoff speeds:
I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful. This won’t be enough to create 100% GDP growth, but it will be enough to lead to (say) 50% GDP growth. I think the likely gap between these events is years rather than months or decades.
In particular, this means that incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out). If true, I think it’s an important fact about the strategic situation.
and in your post:
Still, the general strategy of "dealing with things as they come up" is much more viable under continuous takeoff. Therefore, if a continuous takeoff is more likely, we should focus our attention on questions which fundamentally can't be solved as they come up.
I agree that a continuous/slow takeoff is more likely than a fast takeoff, though I have low confidence in that belief (and in most of my beliefs about AGI timelines). But a badly managed continuous/slow takeoff still seems extremely dangerous, and a case where it would be too late to deal with many problems in, e.g., the same year that they arise. Are you imagining something like this?
The scenario where every human gets an intent-aligned AGI, and each AGI learns its own human's particular values, would be a case where each individual AGI is following something like 'Distilled Human Preferences', or possibly just 'Ambitious Learned Value Function', as its Value Definition, so a fairly Direct scenario. However, the overall outcome would be more towards the indirect end, because a multipolar world with lots of powerful humans using AGIs and trying to compromise would (you anticipate) end up converging on our CEV, or Moral Truth, or something similar. I didn't consider direct vs indirect in the context of multipolar scenarios like this (nor did Bostrom, I think), but it seems sufficient to just say that the individual AGIs use a fairly direct Value Definition while the outcome is indirect.
I appreciate the summary, though the way you state the VDP isn't quite the way I meant it.
what should our AI system <@try to do@>(@Clarifying "AI Alignment"@), to have the best chance of a positive outcome?
To me, this reads like 'we have a particular AI, what should we try to get it to do', whereas I meant it as 'what Value Definition should we be building our AI to pursue'. That's why I stated it as 'what should we aim to get our AI to want/target/decide/do' or, to be consistent with your way of writing it, 'what should we try to get our AI system to do to have the best chance of a positive outcome', not 'what should our AI system try to do to have the best chance of a positive outcome'. Aside from that minor terminological difference, that's a good summary of what I was trying to say.
I fall more on the side of preferring indirect approaches, though by that I mean that we should delegate to future humans, as opposed to defining some particular value-finding mechanism into an AI system that eventually produces a definition of values.
I think your opinion is probably the majority opinion. My main point with the 'scale of directness' was to emphasize that our 'particular value-finding mechanisms' can have more or fewer degrees of freedom: from a certain perspective, 'delegate everything to a simulation of future humans' is also a 'particular mechanism', just one with a lot more degrees of freedom. So even if you strongly favour indirect approaches, you will still have to make some decisions about the nature of the delegation.
The original reason that I wrote this post was to get people to explicitly notice the point that we will probably have to do some philosophical labour ourselves at some point, and then I discovered Stuart Armstrong had already made a similar argument. I'm currently working on another post (also based on the same work at EA Hotel) with some more specific arguments about why we should construct a particular value-finding mechanism that doesn't fix us to any particular normative ethical theory, but does fix us to an understanding of what values are - something I call a Coherent Extrapolated Framework (CEF). But again, Stuart Armstrong anticipated a lot (but not all!) of what I was going to say.
Thanks for pointing that out to me; I had not come across your work before! I've had a look through your post and I agree that we're saying similar things. I would say that my 'Value Definition Problem' is an (intentionally) vaguer and broader question about what our research program should be - as I argued in the article, this is mostly an axiological question. Your final statement of the Alignment Problem (informally) is:
A must learn the values of H and H must know enough about A to believe A shares H’s values
while my Value Definition Problem is
“Given that we are trying to solve the Intent Alignment problem for our AI, what should we aim to get our AI to want/target/decide/do, to have the best chance of a positive outcome?”
I would say the VDP is about what our 'guiding principle' or 'target' should be in order to have the best chance of solving the alignment problem. I used Christiano's 'intent alignment' formulation but yours actually fits better with the VDP, I think.
If you stumbled upon this and didn’t realize morality wasn’t essential, well, um, I’m not going to try to convince you of that.
Perhaps this makes little difference to the rest of your post, but it's worth noting that the mind-dependence of morality isn't all-or-nothing. A common view is that there are facts about the right and wrong ways to aggregate preferences, or to turn non-preferences into preferences, without there being unconditional facts about what we should do.
And if I want to find out if morality really does not exist as an essential property of the universe, it’s worthwhile to try to take it out of my language and see if it comes up missing.
Whether moral language is reducible to non-moral language is a separate (though entangled) question from whether there are such things as moral facts. A lot of moral antirealists, e.g. prescriptivists, would say there is something special and indispensable about moral language, and that it means something more than liking or disliking. Or take this from Three Worlds Collide (written by an antirealist):
The Babyeaters strive to do the baby-eating thing to do, the Superhappies output the Super Happy thing to do. None of that tells us anything about the right thing to do. They are not asking the same question we are - no matter what word of their language the translator links to our 'should'. If you're confused at all about that, my lord, I might be able to clear it up."
Even if moral realism is true, and even if moral claims are special in some way, I still think this part is true at least of aesthetic claims (which all your examples were):
there’s no sense in which something can “look good” if there is no observer to assess the quality, so it seems through language we casually mistake preferences for essences.
It's always nice to take in a bit of Joy in the Merely Real. I have sometimes found it a useful exercise to consider just which things would or would not be shocking to people in the past. For example, anyone from before 1500 would be utterly shocked and mystified by any page of printed text or any piece of clothing from 1800 or after.