Rob Bensinger

Communications lead at MIRI. Unless otherwise indicated, my posts and comments here reflect my own views, and not necessarily my employer's.

Rob Bensinger's Comments

Is Rationalist Self-Improvement Real?

I'm confused about how manioc detox is more useful to the group than the individual - each individual self-interestedly would prefer to detox manioc, since they will die (eventually) if they don't.

Yeah, I was wrong about manioc.

Something about the "science is fragile" argument feels off to me. Perhaps it's that I'm not really thinking about RCTs; I'm looking at Archimedes, Newton, and Feynman, and going "surely there's something small that could have been tweaked about culture beforehand to make some of this low-hanging scientific fruit get grabbed earlier by a bunch of decent thinkers, rather than everything needing to wait for lone geniuses". Something feels off to me when I visualize a world where all the stupidly-simple epistemic-methods-that-are-instrumentally-useful fruit got plucked 4000 years ago, but where Feynman can see big gains from mental habits like "look at the water" (which I do think happened).

Your other responses make sense. I'll need to chew on your comments longer to see how much I end up updating overall toward your view.

Is Rationalist Self-Improvement Real?

I'm not sure how much we disagree; it sounds like I disagree with you, but maybe most of that is that we're using different framings / success thresholds.

Efficient markets. Rationalists developed rationalist self-help by thinking about it for a while. This implies that everyone else left a $100 bill on the ground for the past 4000 years. If there were techniques to improve your financial, social, and romantic success that you could develop just by thinking about them, the same people who figured out the manioc detoxification techniques, or oracle bone randomization for hunting, or all the other amazingly complex adaptations they somehow developed, would have come up with them.

If you teleported me 4000 years into the past and deleted all of modernity and rationalism's object-level knowledge of facts from my head, but let me keep as many thinking heuristics and habits of thought as I wanted, I think those heuristics would have a pretty large positive effect on my ability to pursue mundane happiness and success (compared to someone with the same object-level knowledge but more normal-for-the-time heuristics).

The way you described things here feels to me like it would yield a large overestimate of how much deliberate quality-adjusted optimization (or even experimentation and random-cultural-drift-plus-selection-for-things-rationalists-happen-to-value) human individuals and communities probably put into discovering, using, and propagating "rationalist skills that work" throughout all of human history.

Example: implementation intentions / TAPs are an almost comically simple idea. AFAIK, the technique has a large effect size that hasn't fallen victim to the replication crisis (yet!). Humanity crystallized this idea in 1999. A well-calibrated model of "how much optimization humanity has put into generating, using, and propagating rationality techniques" shouldn't strongly predict that an idea this useful and simple would reach fixation in any culture or group throughout human history before the 1990s, since this in fact never happened. But your paragraph above seems to me like it would predict that many societies throughout history would have made heavy use of TAPs.

I'd similarly worry that the "manioc detoxification is the norm + human societies are as efficient at installing mental habits and group norms as they are at detoxifying manioc" model would predict that the useful heuristics underlying the 'scientific method' (e.g., 'test literally everything', 'use controls', 'try to randomize') reach fixation in more societies, earlier than they did.

Plausibly science is more useful to the group than to the individual; but the same is true for manioc detoxification. There's something about ideas like science that caused societies not to converge on them earlier. (And this should hold with even more force for any ideas that are hard to come up with, deploy, or detect-the-usefulness-of without science.)

Another thing that it sounds like your stated model predicts: "adopting prediction markets wouldn't help organizations or societies make money, or they'd already have been widely adopted". (Of course, what helps the group succeed might not be what helps the relevant decisionmakers in that organization succeed. But it didn't sound like you expected rationalists to outperform common practice or common sense on "normal" problems, even at the group level.)

Raemon's Scratchpad

I think I prefer bolding full lines b/c it makes it easier to see who authored what?

Raemon's Scratchpad

I'd be interested in trying it out. At a glance, it feels too much to me like it's trying to get me to read Everything, when I can tell from the titles and snippets that some posts aren't for me. If anything, the posts I've already read are often ones I want emphasized more? (Because I'm curious to see if there are new comments on things I've already read, or I may otherwise want to revisit the post to link others to it, or finish reading it, etc.)

The bold font does look aesthetically fine and breaks things up in an interesting way, so I like the idea of maybe using it for more stuff?

Misconceptions about continuous takeoff

That part of the interview with Paul was super interesting to me, because the following were claims I'd previously heard from Nate and Eliezer in their explanations of how they think about fast takeoff:

[E]volution [hasn't] been putting a decent amount of effort into optimizing for general intelligence. [...]

'I think if you optimize AI systems for reasoning, it appears much, much earlier.'

Ditto things along the lines of this Paul quote from the same 80K interview:

It’s totally conceivable from our current perspective, I think, that an intelligence that was as smart as a crow, but was actually designed for doing science, actually designed for doing engineering, for advancing technology as rapidly as possible -- it is quite conceivable that such a brain would actually outcompete humans pretty badly at those tasks.

I think that’s another important thing to have in mind, and then when we talk about when stuff goes crazy, I would guess humans are an upper bound for when stuff goes crazy. That is we know that if we had cheap simulated humans, that technological progress would be much, much faster than it is today. But probably stuff goes crazy somewhat before you actually get to humans.

This is part of why I don't talk about "human-level" AI when I write things for MIRI.

If you think humans, corvids, etc. aren't well-optimized for economically/pragmatically interesting feats, this predicts that timelines may be shorter and that "human-level" may be an especially bad way of thinking about the relevant threshold(s).

There still remains the question of whether the technological path to "optimizing messy physical environments" (or "science AI", or whatever we want to call it) looks like a small number of "we didn't know how to do this at all, and now we do know how to do this and can suddenly take much better advantage of available compute" events, vs. looking like a large number of individually low-impact events spread out over time.

If no single event is impactful enough, then a series of S-curves ends up looking like a smooth slope when you zoom out; and large historical changes are usually made of many small changes that add up to one big effect. We don't invent nuclear weapons, get hit by a super-asteroid, etc. every other day.
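As a minimal numerical sketch of that zoom-out point (assuming numpy; the number of curves, their timing, and their steepness are arbitrary illustrative choices, not a model of AI progress): summing many staggered logistic S-curves gives an aggregate whose largest single step is tiny relative to its total rise.

```python
# Minimal sketch: many sharp individual S-curves, staggered in time,
# sum to an aggregate that looks nearly smooth when you zoom out.
# All parameters below are arbitrary illustrative choices.
import numpy as np

def logistic(t, midpoint, steepness=5.0):
    """One S-curve: a localized jump centered at `midpoint`."""
    return 1.0 / (1.0 + np.exp(-steepness * (t - midpoint)))

rng = np.random.default_rng(0)
t = np.linspace(0, 100, 2001)
midpoints = rng.uniform(0, 100, size=200)   # 200 small, scattered advances
aggregate = sum(logistic(t, m) for m in midpoints)

# The aggregate rises by roughly 200 units overall, while the largest jump
# between adjacent samples is tiny by comparison -- i.e., zoomed out, the
# sum of sharp jumps reads as a smooth slope.
steps = np.diff(aggregate)
print(f"total rise: {aggregate[-1] - aggregate[0]:.1f}")
print(f"largest single-sample step: {steps.max():.3f}")
```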

A list of good heuristics that the case for AI x-risk fails

This doesn't seem like it belongs on a "list of good heuristics", though!

A list of good heuristics that the case for AI x-risk fails

I helped make this list in 2016 for a post by Nate, partly because I was dissatisfied with Scott's list (which includes people like Richard Sutton, who thinks worrying about AI risk is carbon chauvinism):

Stuart Russell’s Cambridge talk is an excellent introduction to long-term AI risk. Other leading AI researchers who have expressed these kinds of concerns about general AI include Francesca Rossi (IBM), Shane Legg (Google DeepMind), Eric Horvitz (Microsoft), Bart Selman (Cornell), Ilya Sutskever (OpenAI), Andrew Davison (Imperial College London), David McAllester (TTIC), and Jürgen Schmidhuber (IDSIA).

These days I'd probably make a different list, including people like Yoshua Bengio. AI risk stuff is also sufficiently in the Overton window that I care more about researchers' specific views than about "does the alignment problem seem nontrivial to you?". Even if we're just asking the latter question, I think it's more useful to list the specific views and arguments of individuals (e.g., note that Rossi is more optimistic about the alignment problem than Russell), list the views and arguments of the similarly prominent CS people who think worrying about AGI is silly, and let people eyeball which people they think tend to produce better reasons.

Optimization Amplifies

One of the main explanations of the AI alignment problem I link people to.

Useful Does Not Mean Secure

Eliezer also strongly believes that discrete jumps will happen. But the crux for him AFAIK is absolute capability and absolute speed of capability gain in AGI systems, not discontinuity per se (and not particular methods for improving capability, like recursive self-improvement). Hence in So Far: Unfriendly AI Edition Eliezer lists his key claims as:

  • (1) "Orthogonality thesis",
  • (2) "Instrumental convergence",
  • (3) "Rapid capability gain and large capability differences",
  • (A) superhuman intelligence makes things break that don't break at infrahuman levels,
  • (B) "you have to get [important parts of] the design right the first time",
  • (C) "if something goes wrong at any level of abstraction, there may be powerful cognitive processes seeking out flaws and loopholes in your safety measures", and the meta-level
  • (D) "these problems don't show up in qualitatively the same way when people are pursuing their immediate incentives to get today's machine learning systems working today".

From Sam Harris' interview of Eliezer (emphasis added):

Eliezer: [...] I think that artificial general intelligence capabilities, once they exist, are going to scale too fast for that to be a useful way to look at the problem. AlphaZero going from 0 to 120 mph in four hours or a day—that is not out of the question here. And even if it’s a year, a year is still a very short amount of time for things to scale up.

[...] I’d say this is a thesis of capability gain. This is a thesis of how fast artificial general intelligence gains in power once it starts to be around, whether we’re looking at 20 years (in which case this scenario does not happen) or whether we’re looking at something closer to the speed at which Go was developed (in which case it does happen) or the speed at which AlphaZero went from 0 to 120 and better-than-human (in which case there’s a bit of an issue that you better prepare for in advance, because you’re not going to have very long to prepare for it once it starts to happen).

[...] Why do I think that? It’s not that simple. I mean, I think a lot of people who see the power of intelligence will already find that pretty intuitive, but if you don’t, then you should read my paper Intelligence Explosion Microeconomics about returns on cognitive reinvestment. It goes through things like the evolution of human intelligence and how the logic of evolutionary biology tells us that when human brains were increasing in size, there were increasing marginal returns to fitness relative to the previous generations for increasing brain size. Which means that it’s not the case that as you scale intelligence, it gets harder and harder to buy. It’s not the case that as you scale intelligence, you need exponentially larger brains to get linear improvements.

At least something slightly like the opposite of this is true; and we can tell this by looking at the fossil record and using some logic, but that’s not simple.

Sam: Comparing ourselves to chimpanzees works. We don’t have brains that are 40 times the size or 400 times the size of chimpanzees, and yet what we’re doing—I don’t know what measure you would use, but it exceeds what they’re doing by some ridiculous factor.

Eliezer: And I find that convincing, but other people may want additional details.

[...] AlphaZero seems to me like a genuine case in point. That is showing us that capabilities that in humans require a lot of tweaking and that human civilization built up over centuries of masters teaching students how to play Go, and that no individual human could invent in isolation… [...] AlphaZero blew past all of that in less than a day, starting from scratch, without looking at any of the games that humans played, without looking at any of the theories that humans had about Go, without looking at any of the accumulated knowledge that we had, and without very much in the way of special-case code for Go rather than chess—in fact, zero special-case code for Go rather than chess. And that in turn is an example that refutes another thesis about how artificial general intelligence develops slowly and gradually, which is: “Well, it’s just one mind; it can’t beat our whole civilization.”

I would say that there’s a bunch of technical arguments which you walk through, and then after walking through these arguments you assign a bunch of probability, maybe not certainty, to artificial general intelligence that scales in power very fast—a year or less. And in this situation, if alignment is technically difficult, if it is easy to screw up, if it requires a bunch of additional effort—in this scenario, if we have an arms race between people who are trying to get their AGI first by doing a little bit less safety because from their perspective that only drops the probability a little; and then someone else is like, “Oh no, we have to keep up. We need to strip off the safety work too. Let’s strip off a bit more so we can get in the front.”—if you have this scenario, and by a miracle the first people to cross the finish line have actually not screwed up and they actually have a functioning powerful artificial general intelligence that is able to prevent the world from ending, you have to prevent the world from ending. You are in a terrible, terrible situation. You’ve got your one miracle. And this follows from the rapid capability gain thesis and at least the current landscape for how these things are developing.

See also:

The question is simply "Can we do cognition of this quality at all?" [...] The speed and quantity of cognition isn't the big issue, getting to that quality at all is the question. Once you're there, you can solve any problem which can realistically be done with non-exponentially-vast amounts of that exact kind of cognition.
