All of Rob Bensinger's Comments + Replies

Visible Thoughts Project and Bounty Announcement

In case you missed it: we now have an FAQ for this project, last updated Jan. 7.

Ngo and Yudkowsky on AI capability gains

(Daniel Dennett's book Darwin's Dangerous Idea does a good job, I think, of imparting intuitions about the 'Platonic inevitability' of it.)

Possibly when Richard says "evolutionary theory" he means stuff like 'all life on Earth has descended with modification from a common pool of ancestors', not just 'selection is a thing'? It's also an empirical claim that any of the differences between real-world organisms in the same breeding population are heritable.

1 Kenny 3d: That's pretty reasonable, but, yes, I might not have a good sense of what Richard means by "evolutionary theory". Yes! That's a good qualification and important for lots of things. But I think the claim that any/many differences are heritable was massively overdetermined by the time Darwin published his ideas/theory of evolution via natural selection. I think it's easy to overlook the extremely strong prior that "organisms in the same breeding population" produce offspring that are almost always, and obviously, members of the same class/category/population. That certainly seems to imply that a huge variety of possible differences are obviously heritable. I admit tho that it's very difficult (e.g. for me) to adopt a reasonable 'anti-perspective'. I also remember reading something not too long ago about how systematic animal breeding was extremely rare until relatively recently, so that's possibly not as extremely strong of evidence as it now seems like it might have been (with the benefit of hindsight).
Soares, Tallinn, and Yudkowsky discuss AGI cognition

how do you get some substance into every human's body within the same 1 second period? Aren't a bunch of people e.g. in the middle of some national park, away from convenient air vents? Is the substance somehow everywhere in the atmosphere all at once?

I think the intended visualization is simply that you create a very small self-replicating machine, and have it replicate enough times in the atmosphere that every human-sized organism on the planet will on average contain many copies of it.

One of my co-workers at MIRI comments:

(further conjunctive detail for

... (read more)
4 DanielFilan 4d: Ah, that makes sense - thanks!
Animal welfare EA and personal dietary options

When I look at factory-farmed animals, I feel awful for them. So coming into this, I have some expectation that my eventual understanding of consciousness, animal cognition, and morality (C/A/M) will add up to normalcy (i.e. not net positive for many animals).

But:

  • 'It all adds up to normality' doesn't mean 'you should assume your initial intuitions and snap judgments are correct even in cases where there's no evolutionary or physical reason for the intuition/judgment to be correct'. It means 'reductive explanations generally have to recapture the phenomenon
... (read more)
1 GWS 11d: I avoid factory farmed pork because their existence seems net negative to me, but don’t do this for chickens. This is largely because I believe pigs have qualia similar enough to me that I don’t need to worry about the animal cognition part of c/a/m (I do want to note that you seem to be arguing from a perspective wherein pro-existence is the null, and so you need to reason yourself out of it to be anti-natalist for the animals). I find chickens difficult to model using the machinery I use for humans, but that machinery works okay on pigs (although this is largely through seeing videos of them instead of in person interaction, so it’s absolutely possible I’m mistaken). I’m not sure how to handle the “consciousness” part, since they cannot advocate for themselves or express preferences for or against existence in ways that are legible to me.
4 TurnTrout 11d: I'm confused why you wrote "It doesn't mean 'you should assume your initial intuitions and snap judgments are correct'" when in the very next sentence I said "But maybe my gut reaction isn't that trustworthy—that's often the case in ethical dilemmas."? OK, but do you disagree with the claim 'Turntrout's gut reaction is that factory-farmed pigs would be better off not existing'? Because that's true for me, at least on my first consideration of the issue. [ETA: Removed superfluous reaction] Attempted restatement of my point: My gut reaction is evidence about what my implicit C/A/M theories predict, which I should take seriously to the extent that I have been actually ingraining all the thought experiments I've considered. And just because the reaction isn't subvocalized via a verbalized explicit theory, doesn't mean it's not important evidence. Similarly: When considering an action, I may snap-judge it to be squidgy and bad, even though I didn't yet run a full-blown game-theoretic analysis in my head. (Let me know if I also seem to be sliding off of your point!)
Animal welfare EA and personal dietary options

Note that there might be other crucial factors in assessing whether 'more factory farming' or 'less factory farming' is good on net — e.g., the effect on wild animals, including indirect effects like 'factory farming changes the global climate, which changes various ecosystems around the world, which increases/decreases the population of various species (or changes what their lives are like)'.

It then matters a lot how likely various wild animal species are to be moral patients, whether their lives tend to be 'worse than death' vs. 'better than death', etc.... (read more)

Animal welfare EA and personal dietary options

I'd guess the most controversial part of this post will be the claim 'it's not incredibly obvious that factory-farmed animals (if conscious) have lives that are worse than nonexistence'?

But I don't see why. It's hard to be confident of any view on this, when we understand so little about consciousness, animal cognition, or morality. Combining three different mysteries doesn't tend to create an environment for extreme confidence — rather, you end up even more uncertain in the combination than in each individual component.

And there are obvious (speciesist) r... (read more)

4 Kaj_Sotala 10d: When you say that it could be true, do you mean that it could be true that the person themselves would judge their experience as better than nonexistence? (Your paragraph reads to me as implying that there could be some more objective answer to this separate from a person's own judgment of it, but it's hard for me to imagine what that would even mean.)

Pretty much all the writing I've read by Holocaust survivors says that this was not true, that the experience was unambiguously worse than being dead, and that the only thing that kept them going was the hope of being freed. (E.g. according to Viktor Frankl in "Man's Search for Meaning", all the prisoners in his camp agreed that, not only was it worse than being dead, it was so bad that any good experiences after being freed could not make up for how bad it was. Why they didn't kill themselves is an interesting question that he explores a bit in the book.) Are there any Holocaust survivors who claim otherwise?

I would guess that humans' nightmarish experience in concentration camps was usually better than nonexistence; and even if you suspect this is false, it seems easy to imagine how it could be true, because there's a lot more to human experience than 'pain, and beyond that pain, darkness'.

I can't really imagine this – at least for people in extermination camps, who weren't killed. I'd assume that, all else equal, the vast majority of prisoners would choose to skip that part of their life. But maybe I'm missing something or have unusual intuitions.

4 TurnTrout 11d: When I look at factory-farmed animals, I feel awful for them. So coming into this, I have some expectation that my eventual understanding of consciousness, animal cognition, and morality (C/A/M) will add up to normalcy (i.e. not net positive for many animals). But maybe my gut reaction isn't that trustworthy—that's often the case in ethical dilemmas. I do think that that gut reaction is important information, even though I don't have a detailed model of C/A/M. (I think the main way I end up changing my mind here is being persuaded that my gut reaction is balking at their bad quality of life, but not actually considering the net-positive/negative question)
The Map-Territory Distinction Creates Confusion

I haven't done anything like a careful analysis, but at a guess, this shift has some promise for unifying the classical split between epistemic and instrumental rationality. Rationality becomes the art of seeking interaction with reality such that your anticipations keep synching up more and more exactly over time.

"Unifying epistemic and instrumental reality" doesn't seem desirable to me — winning and world-mapping are different things. We have to choose between them sometimes, which is messy, but such is the nature of caring about more than one thing in l... (read more)

What would you like from Microcovid.org? How valuable would it be to you?

I don't use microCOVID much. Two things I'd like from the site:

  • A simple, reasonable, user-friendly tool for non-rationalists I know who are more worried about COVID than me (e.g., family).
  • A tool I can use if a future strain arises that's a lot more scary. Something fast and early, that updates regularly as new information comes in.

The latter goal seems more useful in general, and my sense is that microCOVID isn't currently set up to do that kind of thing -- the site currently says "Not yet updated for the Omicron variant", over a month in.

For the latter go... (read more)

4 jacobjacob 18d: I also tried and failed to get my family to use it :( Among other things, I think they bounced off particularly hard on the massive drop-down of 10 different risk categories of ppl and various levels of being in a bubble. I don't think the blocker here was fundamentally quantitative -- they think a bunch about personal finance and budgeting, so that metaphor made sense to them (and I actually expect this to be true for a lot of non-STEM ppl). Instead, I think UX improvements could go a long way.
4 Elizabeth 18d: What kind of simplifications would you like to see, while keeping the product something that's still fundamentally microcovid?
6 Raemon 19d: A thing that sticks out is that I don't actually know who has a good forecasting track record – I know some people have made predictions, but those predictions aren't aggregated anywhere I know of that makes it easy to check.
Quis cancellat ipsos cancellores?

Buddhism is a huge part of Joshin's life (which seems fine to note), but if there's an implied argument 'Buddhism is causally responsible for this style of discourse', 'all Buddhists tend to be like this', etc., you'll have to spell that out more.

5 zerker2000 1mo: The explicit argument I would make here is, the post makes some reference to the author being Buddhist, and therefore less likely to say things they can't verify. Or even things believed true that would cause drama. And then elaborates that the post will do both these things anyway, because there is a "conflict of interest" between speaking divisively against Aella, and speaking(?) divisively against those Jōshin seeks to warn away. It is my understanding that whatever value one assigns to whisper networks, cancellations, and so on, "devout buddhist" is a social role that practically defines itself as foregoing that value in favor of inner peace, and that this contradiction is what Duncan found perhaps worthy of scorn. (The particular one-word comment has been correctly downvoted as having no place on LessWrong, at least according to a discourse norms pledge Duncan himself authored and various other of his posts.) In Jōshin's shoes of having committed to X and finding ¬X to have profound importance to the safety of those around me, I would either try to fob off the publishing of the accusations onto someone else more comfortable with X to keep somewhat to the letter of the pledges, or lay out a stronger case for reneging on X in a separate post. At minimum, any "so normally I avoid doing this even when it seems like a good idea, but" normnotes ought to go in something like a footnote, not up-front to emphasize that because something was a significant update for you it ought to be a more significant update to the reader. Certainly for audiences already not sharing your priors, such attempted emphasis as we see falls flat.
2 Duncan_Sabien 1mo: There is no such implied argument; the scare-quotes were an attempt to signal humor. Next time I'll try something like: " " Buddhists " " ... although hm, that might imply a criticism of the trueness/validity of Jōshin's claim to Buddhism, which would also not be intended (since I lack any expertise whatsoever by which to judge).
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

Firstly, I (partially?) agree that the current DL paradigm isn't strongly alignable (in a robust, high certainty paradigm), we may or may not agree to what extent it is approximately/weakly alignable.

I don't know what "strongly alignable", "robust, high certainty paradigm", or "approximately/weakly alignable" mean here. As I said in another comment:

There are two problems here:

  • Problem #1: Align limited task AGI to do some minimal act that ensures no one else can destroy the world with AGI.
  • Problem #2: Solve the full problem of using AGI to help us achieve an
... (read more)
3 Steven Byrnes 1mo: For what it's worth I'm cautiously optimistic that "reverse-engineering the circuits underlying empathy/love/altruism/etc." is a realistic thing to do in years not decades, and can mostly be done in our current state of knowledge (i.e. before we have AGI-capable learning algorithms to play with—basically I think of AGI capabilities as largely involving learning algorithm development and empathy/whatnot as largely involving supervisory signals such as reward functions). I can share more details if you're interested.
2 jacob_cannell 1mo: No I meant "merely as aligned as a human". Which is why I used "approximately/weakly" aligned - as the system which mostly aligns humans to humans is imperfect and not what I would have assumed you meant as a full Problem #2 type solution. Alright so now I'm guessing the crux is that you believe the DL based reverse engineered human empathy/altruism type solution I was alluding to - let's just call that DLA - may take subjective centuries, which thus suggests that you believe: * That DLA is significantly more difficult than DL AGI in general * That uploading is likewise significantly more difficult or perhaps * DLA isn't necessarily super hard, but irrelevant because non-DL AGI (for which DLA isn't effective) comes first Is any of that right?
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

Logical induction, Löbian cooperation, reflection in HOL, and functional decision theory are all results where researchers have expressed surprise to MIRI that the results were achievable even in principle.

I think a common culprit is people misunderstanding Gödel's theorems as blocking more things than they actually do. There's also field-specific folklore — e.g., a lot of traditional academic decision theorists seem to have somehow acquired the belief that you can't assign probabilities to your own actions, on pain of paradox.

6 shminux 1mo: How many of those results are accepted as interesting and insightful outside MIRI?
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

I... think that makes more sense? Though Eliezer was saying the field's progress overall was insufficient, not saying 'decision theory good, ML bad'. He singled out eg Paul Christiano and Chris Olah as two of the field's best researchers.

In any case, thanks for explaining!

Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

I'd argue instead that MIRI bet heavily against connectivism/DL, and lost on that bet just as heavily.  

I think this is straightforwardly true in two different ways:

  • Prior to the deep learning revolution, Eliezer didn't predict that ANNs would be a big deal — he expected other, neither-GOFAI-nor-connectionist approaches to AI to be the ones that hit milestones like 'solve Go'.
  • MIRI thinks the current DL paradigm isn't alignable, so we made a bet on trying to come up with more alignable AI approaches (which we thought probably wouldn't succeed, but consi
... (read more)
6 jacob_cannell 1mo: Thanks, strong upvote, this is especially clarifying. Firstly, I (partially?) agree that the current DL paradigm isn't strongly alignable (in a robust, high certainty paradigm), we may or may not agree to what extent it is approximately/weakly alignable. The weakly alignable baseline should be "marginally better than humans". Achieving that baseline as an MVP should be an emergency level high priority civilization project, even if risk of doom from DL AGI is only 1% (and to be clear, I'm quite uncertain, but it's probably considerably higher). Ideally we should always have an MVP alignment solution in place. My thoughts on your last question are probably best expressed in a short post rather than a comment thread, but in summary: DL methods are based on simple universal learning architectures (eg transformers, but AGI will probably be built on something even more powerful). The important properties of resulting agents are thus much more a function of the data / training environment rather than the architecture. You can rather easily limit an AGI's power by constraining its environment. For example we have nothing to fear from AGIs trained solely in Atari. We have much more to fear from agents trained by eating the internet. Boxing is stupid, but sim sandboxing is key. As DL methods are already a success story in partial brain reverse engineering (explicitly in deepmind's case), there's hope for reverse engineering the circuits underlying empathy/love/altruism/etc in humans - ie the approximate alignment solution that evolution found. We can then improve and iterate on that in simulations. I'm somewhat optimistic that it's no more complex than other major brain systems we've already mostly reverse engineered. The danger of course is that testing and iterating could use enormous resources, past the point where you already have a dangerous architecture that could be extracted. Nonetheless, I think this approach is much better than nothing, and amenable to (pote
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

There are two problems here:

  • Problem #1: Align limited task AGI to do some minimal act that ensures no one else can destroy the world with AGI.
  • Problem #2: Solve the full problem of using AGI to help us achieve an awesome future.

Problem #1 is the one I was talking about in the OP, and I think of it as the problem we need to solve on a deadline. Problem #2 is also indispensable (and a lot more philosophically fraught), but it's something humanity can solve at its leisure once we've solved #1 and therefore aren't at immediate risk of destroying ourselves.

3 Samuel Shadrach 1mo: Thanks, this makes sense.
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

The rhetorical approach of the comment is also weird to me. 'So you've never heard of CIRL?' surely isn't a hypothesis you'd give more weight to than 'You think CIRL wasn't a large advance', 'You think CIRL is MIRI-ish', 'You disagree with me about the size and importance of the alignment problem such that you think it should be a major civilizational effort', 'You think CIRL is cool but think we aren't yet hitting diminishing returns on CIRL-sized insights and are therefore liable to come up with a lot more of them in the future', etc. So I assume the question is rhetorical; but then it's not clear to me what you believe about CIRL or what point you want to make with it.

(Ditto value learning, IRL, etc.)

I'd argue instead that MIRI bet heavily against connectivism/DL, and lost on that bet just as heavily.  

I think this is straightforwardly true in two different ways:

  • Prior to the deep learning revolution, Eliezer didn't predict that ANNs would be a big deal — he expected other, neither-GOFAI-nor-connectionist approaches to AI to be the ones that hit milestones like 'solve Go'.
  • MIRI thinks the current DL paradigm isn't alignable, so we made a bet on trying to come up with more alignable AI approaches (which we thought probably wouldn't succeed, but consi
... (read more)
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment

So you haven't heard of IRL, CIRL, value learning, that whole DL safety track, etc? Or are you outright dismissing them? I'd argue instead that MIRI bet heavily against connectivism/DL, and lost on that bet just as heavily.

This comment and the entire conversation that spawned from it is weirdly ungrounded in the text — I never even mentioned DL. The thing I was expressing was 'relative to the capacity of the human race, and relative to the importance and (likely) difficulty of the alignment problem, very few research-hours have gone into the alignment prob... (read more)

3 jacob_cannell 1mo: So I can see how that is a reasonable interpretation of what you were expressing. However, given the opening framing where you said you basically agreed with Eliezer's pessimistic viewpoint that seems to dismiss most alignment research, I hope you can understand how I interpreted you saying "People haven't tried very hard to find non-MIRI-ish approaches that might work" as dismissing ML-safety research like IRL, CIRL, etc.
7 Rob Bensinger 1mo: The rhetorical approach of the comment is also weird to me. 'So you've never heard of CIRL?' surely isn't a hypothesis you'd give more weight to than 'You think CIRL wasn't a large advance', 'You think CIRL is MIRI-ish', 'You disagree with me about the size and importance of the alignment problem such that you think it should be a major civilizational effort', 'You think CIRL is cool but think we aren't yet hitting diminishing returns on CIRL-sized insights and are therefore liable to come up with a lot more of them in the future', etc. So I assume the question is rhetorical; but then it's not clear to me what you believe about CIRL or what point you want to make with it. (Ditto value learning, IRL, etc.)
More Christiano, Cotra, and Yudkowsky on AI progress

Relative to what I mean by 'reasoning about messy physical environments at all', MuZero and Tesla Autopilot don't count. I could see an argument for GPT-3 counting, but I don't think it's in fact doing the thing.

6 ESRogs 1mo: Gotcha, thanks for the follow-up. Btw, I just wrote up my current thoughts [https://www.lesswrong.com/posts/BdyRhPcQaxte8bc8y/esrogs-s-shortform?commentId=kk5s5gc94cJCJgwmS] on the path from here to AGI, inspired in part by this discussion. I'd be curious to know where others disagree with my model.
Biology-Inspired AGI Timelines: The Trick That Never Works

Making a map of your map is another one of those techniques that seem to provide more grounding but do not actually.

Sounds to me like one of the things Eliezer is pointing at in Hero Licensing:

Look, thinking things like that is just not how the inside of my head is organized. There’s just the book I have in my head and the question of whether I can translate that image into reality. My mental world is about the book, not about me.

You do want to train your brain, and you want to understand your strengths and weaknesses. But dwelling on your biases at the ex... (read more)

Biology-Inspired AGI Timelines: The Trick That Never Works

(I'm not sure whether your summary captures Eliezer's view, but strong-upvoted for what strikes me as a reasonable attempt.)

More Christiano, Cotra, and Yudkowsky on AI progress

My Eliezer-model thinks that "there will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles" is far less than 30% likely, because it's so conjunctive:

  • It requires that there ever be a one-year interval in which the world output doubles.
  • It requires that there be a preceding four-year interval in which world output doubles.
    • So, it requires that the facts of CS be such that we can realistically get AI tech that capable before the world ends...
    • ... and separately, that this capability not ac
... (read more)
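A quick toy calculation to make the 'conjunctive' point concrete (the 70%-per-condition figure below is purely an illustrative assumption, not anyone's stated estimate): a conjunction can never be more probable than its least probable conjunct, and every added condition multiplies in another factor of at most 1.

\[
P\!\left(A_1 \wedge \dots \wedge A_n\right) \;=\; \prod_{i=1}^{n} P\!\left(A_i \mid A_1, \dots, A_{i-1}\right),
\qquad \text{e.g. } 0.7^4 \approx 0.24 < 0.30 .
\]

So even four conditions that each look ~70% likely (conditional on the ones before them) already push the conjunction below the 30% mark.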
8 Eliezer Yudkowsky 1mo: Affirmed.
5 Evan R. Murphy 1mo: Well said! This resonates with my Eliezer-model too. Taking this into account I'd update my guess of Eliezer's position to: * Eliezer: 5% soft takeoff, 80% hard takeoff, 15% something else This last "something else" bucket added because "the Future is notoriously difficult to predict" (paraphrasing Eliezer).
Conversation on technology forecasting and gradualism

Is this 5 years of engineering effort and then humans leaving it alone with infinite compute?

Maybe something like '5 years of engineering effort to start automating work that qualitatively (but incredibly slowly and inefficiently) is helping with AI research, and then a few decades of throwing more compute at that for the AI to reach superintelligence'?

With infinite compute you could just recapitulate evolution, so I doubt Paul thinks there's a crux like that? But there could be a crux that's about whether GPT-3.5 plus a few decades of hardware progress achieves superintelligence, or about whether that's approximately the fastest way to get to superintelligence, or something.

More Christiano, Cotra, and Yudkowsky on AI progress

Do you think that human generality of thought requires a unique algorithm and/or brain structure that's not present in chimps? Rather than our brains just being scaled up chimp brains that then cross a threshold of generality (analogous to how GPT-3 had much more general capabilities than GPT-2)?

I think human brains aren't just bigger chimp brains, yeah.

(Though it's not obvious to me that this is a crux. If human brains were just scaled up chimp-brains, it wouldn't necessarily be the case that chimps are scaled-up 'thing-that-works-like-GPT' brains, or sca... (read more)

4 ESRogs 1mo: Do none of A) GPT-3 producing continuations [https://cs.nyu.edu/~davise/papers/GPT3CompleteTests.html] about physical environments, or B) MuZero learning a model [https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules] of the environment, or even C) a Tesla driving on Autopilot, count? It seems to me that you could consider these to be systems that reason about the messy physical world poorly, but definitely 'at all'. Is there maybe some kind of self-directedness or agenty-ness that you're looking for that these systems don't have? (EDIT: I'm digging in on this in part because it seems related to a potential crux that Ajeya and Nate noted here [https://www.lesswrong.com/posts/nPauymrHwpoNr6ipx/conversation-on-technology-forecasting-and-gradualism].)
More Christiano, Cotra, and Yudkowsky on AI progress

I think I don't understand Carl's "separate, additional miracle" argument. From my perspective, the basic AGI argument is:

  1. "General intelligence" makes sense as a discrete thing you can invent at a particular time. We can think of it as: performing long chains of reasoning to steer messy physical environments into specific complicated states, in the way that humans do science and technology to reshape their environment to match human goals. Another way of thinking about it is 'AlphaGo, but the game environment is now the physical world rather than a Go boar
... (read more)
2 PeterMcCluskey 1mo: This seems like a fairly important crux. I see it as something that has been developed via many steps that are mostly small.

Thanks for the in-depth response! I think I have a better idea now where you're coming from. A couple follow-up questions:

But I don't in fact believe on this basis that we already have baby AGIs. And if the argument isn't 'we already have baby AGIs' but rather 'the idea of "AGI" is wrong, we're going to (e.g.) gradually get one science after another rather than getting all the sciences at once', then that seems like directionally the wrong update to make from Atari, AlphaZero, GPT-3, etc

Do you think that human generality of thought requires a unique algori... (read more)

Leaving Orbit

"I suspect I would start to attach that same meaning to any code phrase" and "I think that even talking about either using a code phrase or to spell it out inevitably pushes toward that being a norm" are both concerns of mine, but I think I'm more optimistic than you that they just won't be big issues by default, and that we can deliberately avoid them if they start creeping in. I'm also perfectly happy in principle to euphemism-treadmill stuff and keep rolling out new terms, as long as the swap is happening (say) once every 15 years and not once every 2 years.

Leaving Orbit

Why not say something like "hey, I'm bowing out of this conversation now, but it's not intended to be any sort of reflection on you or the topic, I'm not making a statement, I'm just doing what's good for me and that's all"?

That seems fine too, if I feel like putting the effort into writing a long thing like that, customizing it for the particular circumstances, etc. But I've noticed many times that it's a surprisingly large effort to hit exactly the right balance of social signals in a case like this, given what an important and commonplace move it is. (A... (read more)

2Rob Bensinger1mo"I suspect I would start to attach that same meaning to any code phrase" and "I think that even talking about either using a code phrase or to spell it out inevitably pushes toward that being a norm" are both concerns of mine, but I think I'm more optimistic than you that they just won't be big issues by default, and that we can deliberately avoid them if they start creeping in. I'm also perfectly happy in principle to euphemism-treadmill stuff and keep rolling out new terms, as long as the swap is happening (say) once every 15 years and not once every 2 years.
Leaving Orbit

I think in most cases with public, online, asynchronous communication, it probably makes the most sense to just exit without a message about it.

In a minority of cases, though (e.g., where I've engaged in a series of back-and-forths and then abruptly stopped responding, or where someone asks me a direct Q or what-have-you), I find that I want an easy boilerplate way to notify others that I'm unlikely to respond more. I think "(Leaving orbit. 🙂)" or similar solves that specific problem for me.

Leaving Orbit

Yeah, I would favor "tapping out" if it felt more neutral to me. 'Tapping out', 'bowing out', etc. sound a little resentful/aggressive to my ear, like you're exiting an annoying scuffle that's beneath your time. Even the combat-ish associations are a thing I'd prefer to avoid, if possible.

2jmh1mo"Sorry, gotta go now."? Or perhaps a phrase the Koreans say "I'll leave first."
Biology-Inspired AGI Timelines: The Trick That Never Works

When I try to mentally simulate negative reader-reactions to the dialogue, I usually get a complicated feeling that's some combination of:

  • Some amount of conflict aversion: Harsh language feels conflict-y, which is inherently unpleasant.
  • Empathy for, or identification with, the people or views Eliezer was criticizing. It feels bad to be criticized, and it feels doubly bad to be told 'you are making basic mistakes'.
  • Something status-regulation-y: My reader-model here finds the implied threat to the status hierarchy salient (whether or not Eliezer is just tryin
... (read more)

I think part of what I was reacting to is a kind of half-formed argument that goes something like:

  • My prior credence is very low that all these really smart, carefully thought-through people are making the kinds of stupid or biased mistakes they are being accused of.
  • In fact, my prior for the above is sufficiently low that I suspect it's more likely that the author is the one making the mistake(s) here, at least in the sense of straw-manning his opponents.
  • But if that's the case then I shouldn't trust the other things he says as much, because it looks lik
... (read more)

I had mixed feelings about the dialogue personally. I enjoy the writing style and think Eliezer is a great writer with a lot of good opinions and arguments, which made it enjoyable.

But at the same time, it felt like he was taking down a strawman. Maybe you’d label it part of “conflict aversion”, but I tend to get a negative reaction to take-downs of straw-people who agree with me.

To give an unfair and exaggerated comparison, it would be a bit like reading a take-down of a straw-rationalist in which the straw-rationalist occasionally insists such things as ... (read more)

Solve Corrigibility Week

When I try to think of gift ideas for dolphins, am I failing to notice some way in which I'm "selfishly" projecting what I think dolphins should want onto them, or am I violating some coherence axiom?

I think it's rather that 'it's easy to think of ways to help a dolphin (and a smart AGI would presumably find this easy too), but it's hard to make a general intelligence that robustly wants to just help dolphins, and it's hard to safely coerce an AGI into helping dolphins in any major way if that's not what it really wants'.

I think the argument is two-part, a... (read more)

The Rationalists of the 1950s (and before) also called themselves “Rationalists”

No, 'rational' here is being used in opposition to 'irrational', 'religious', 'superstitious', etc., not in opposition to 'empirical'.

Quoting Wikipedia:

In politics, rationalism, since the Enlightenment, historically emphasized a "politics of reason" centered upon rational choice, deontology, utilitarianism, secularism, and irreligion[23] – the latter aspect's antitheism was later softened by the adoption of pluralistic reasoning methods practicable regardless of religious or irreligious ideology.[24][25] In this regard, the philosopher John Cottingham

... (read more)
9 Owain_Evans 1mo: Thanks, Rob! I agree with this summary. It is unfortunate that "rationalism" has this standard usage in philosophy ("rationalist vs empiricist"). This usage is not completely unrelated to the "rational vs superstitious/irrational" distinction, which makes it more likely to confuse. That said, outside of the fields of philosophy and intellectual history, not many people are aware of the rationalist/empiricist distinction, and so I don't see it as a major problem.
Shulman and Yudkowsky on AI progress

Note: I've written up short summaries of each entry in this sequence so far on https://intelligence.org/late-2021-miri-conversations/, and included links to audio recordings of most of the posts.

Biology-Inspired AGI Timelines: The Trick That Never Works

I've gotten one private message expressing more or less the same thing about this post, so I don't think this is a super unusual reaction.

Soares, Tallinn, and Yudkowsky discuss AGI cognition

I don't know Eliezer's view on this — presumably he either disagrees that the example he gave is "mundane AI safety stuff", or he disagrees that "mundane AI safety stuff" is widespread? I'll note that you're a MIRI research associate, so I wouldn't have auto-assumed your stuff is representative of the stuff Eliezer is criticizing.

Safely Interruptible Agents is an example Eliezer's given in the past of work that isn't "real" (back in 2017):

[...]

It seems to me that I've watched organizations like OpenPhil try to sponsor academics to work on AI alignment, and

... (read more)
5 Vanessa Kosoy 1mo: There is ample discussion of distribution shifts ("seems to generalize to the more complicated and intelligent validation set, but which kills you on the test set") by other people. Random examples: Christiano [https://ai-alignment.com/some-thoughts-on-training-highly-reliable-models-2c78c17e266d], Shah [https://www.alignmentforum.org/posts/nM99oLhRzrmLWozoM/an-134-underspecification-as-a-cause-of-fragility-to], DeepMind [https://arxiv.org/pdf/2110.11328.pdf]. Maybe Eliezer is talking specifically about the context of transparency. Personally, I haven't worked much on transparency because IMO (i) even if we solve transparency perfectly but don't solve actual alignment, we are still dead, (ii) if we solve actual alignment without transparency, then theoretically we might succeed (although in practice it would sure help a lot to have transparency to catch errors in time) and (iii) there are less strong reasons to think transparency must be robustly solvable compared to reasons to think alignment must be robustly solvable. In any case, I really don't understand why Eliezer thinks the rest of AI safety are unaware of the type of attack vectors he describes. I agree that currently publishing in mainstream venues seems to require dumbing down, but IMO we should proceed by publishing dumbed-down versions in the mainstream + smarted-up versions/commentary in our own venues. And, not all of AI safety is focused on publishing in mainstream venues? There is plenty of stuff on the alignment forum, on various blogs etc. Overall I actually agree that lots of work by the AI safety community is unimpressive (tbh I wish MIRI would lead by example instead of going stealth-mode, but maybe I don't understand the considerations). What I'm confused by is the particular example in the OP. I also dunno about "fancy equations and math results", I feel like the field would benefit from getting a lot more mathy (ofc in meaningful ways rather than just using mathematical notation as dec
Christiano, Cotra, and Yudkowsky on AI progress

Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious

My Eliezer-model doesn't categorically object to this. See, e.g., Fake Causality:

[Phlogiston] feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may fe

... (read more)
Biology-Inspired AGI Timelines: The Trick That Never Works

(This post was partly written as a follow-up to Eliezer's conversations with Paul and Ajeya, so I've inserted it into the conversations sequence.)

It does fit well there, but I think it was more inspired by the person I met who thought I was being way too arrogant by not updating in the direction of OpenPhil's timeline estimates to the extent I was uncertain.

Christiano, Cotra, and Yudkowsky on AI progress

(I'll emphasize again, by the way, that this is a relative comparison of my model of Paul vs. Eliezer. If Paul and Eliezer's views on some topic are pretty close in absolute terms, the above might misleadingly suggest more disagreement than there in fact is.)

Christiano, Cotra, and Yudkowsky on AI progress

I would frame the question more as 'Is this question important for the entire chain of actions humanity needs to select in order to steer to good outcomes?', rather than 'Is there a specific thing Paul or Eliezer personally should do differently tomorrow if they update to the other's view?' (though the latter is an interesting question too).

Some implications of having a more Eliezer-ish view include:

  • In the Eliezer-world, humanity's task is more foresight-loaded. You don't get a long period of time in advance of AGI where the path to AGI is clear; nor do yo
... (read more)
5 landfish 1mo: Thanks, this is helpful! I'd be very curious to see where Paul agreed or disagreed with the summary/implications of his view here.
Christiano, Cotra, and Yudkowsky on AI progress

My Eliezer-model is a lot less surprised by lulls than my Paul-model (because we're missing key insights for AGI, progress on insights is jumpy and hard to predict, the future is generally very unpredictable, etc.). I don't know exactly how large of a lull or winter would start to surprise Eliezer (or how much that surprise would change if the lull is occurring two years from now, vs. ten years from now, for example).

In Yudkowsky and Christiano Discuss "Takeoff Speeds", Eliezer says:

I have a rough intuitive feeling that it [AI progress] was going faster in

... (read more)
4 paulfchristiano 1mo: I generally expect smoother progress, but predictions about lulls are probably dominated by Eliezer's shorter timelines. Also lulls are generally easier than spurts, e.g. I think that if you just slow investment growth you get a lull and that's not too unlikely (whereas part of why it's hard to get a spurt is that investment rises to levels where you can't rapidly grow it further).
2 Vanessa Kosoy 1mo: Makes some sense, but Yudkowsky's prediction that TAI will arrive before AI has large economic impact does forbid a lot of plateau scenarios. Given a plateau that's sufficiently high and sufficiently long, AI will land in the market, I think. Even if regulatory hurdles are the bottleneck for a lot of things atm, eventually in some country AI will become important and the others will have to follow or fall behind.
Christiano, Cotra, and Yudkowsky on AI progress

Found two Eliezer-posts from 2016 (on Facebook) that I feel helped me better grok his perspective.

Sep. 14, 2016:

It is amazing that our neural networks work at all; terrifying that we can dump in so much GPU power that our training methods work at all; and the fact that AlphaGo can even exist is still blowing my mind. It's like watching a trillion spiders with the intelligence of earthworms, working for 100,000 years, using tissue paper to construct nuclear weapons.

And earlier, Jan. 27, 2016:

People occasionally ask me about signs that the remaining timeline

... (read more)
Soares, Tallinn, and Yudkowsky discuss AGI cognition

Minor note: This post comes earlier in the sequence than Christiano, Cotra, and Yudkowsky on AI progress. I posted the Christiano/Cotra/Yudkowsky piece sooner, at Eliezer's request, to help inform the ongoing discussion of "Takeoff Speeds".

Christiano, Cotra, and Yudkowsky on AI progress

To which my Eliezer-model's response is "Indeed, we should expect that the first AGI systems will be pathetic in relative terms, comparing them to later AGI systems. But the impact of the first AGI systems in absolute terms is dependent on computer-science facts, just as the impact of the first nuclear bombs was dependent on facts of nuclear physics. Nuclear bombs have improved enormously since Trinity and Little Boy, but there is no law of nature requiring all prototypes to have approximately the same real-world impact, independent of what the thing is a prototype of."

Ngo and Yudkowsky on alignment difficulty

Thanks for doing this, Kat! :)

I’ve listened to them as is and I find it pretty easy to follow, but if you’re interested in making it even easier for people to follow, these fine gentlemen have put up a ~$230 RFP/bounty for anybody who turns it into audio where each person has a different voice.  

That link isn't working for me; where's the bounty?

Edit: Bounty link is working now: https://twitter.com/lxrjl/status/1464119232749318155 

Christiano, Cotra, and Yudkowsky on AI progress

Transcript error fixed -- the line that previously read

[Yudkowsky][17:40]  

I expect it to go away before the end of days

but with there having been a big architectural innovation, not Stack More Layers

[Christiano][17:40]  

I expect it to go away before the end of days

but with there having been a big architectural innovation, not Stack More Layers

[Yudkowsky][17:40]  

if you name 5 possible architectural innovations I can call them small or large

should be

[Yudkowsky][17:40]  

I expect it to go away before the end of days

but with there having b

... (read more)
Yudkowsky and Christiano discuss "Takeoff Speeds"

It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%).

My model of Eliezer may be wrong, but I'd guess that this isn't a domain where he has many over-50% predictions of novel events at all? See also 'I don't necessarily expect self-driving cars before the apocalypse'.

My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like 'maybe we'll make insane progress on theorem-proving (or whatever) out of the bl... (read more)

2 Rob Bensinger 2mo: (Ah, EY already replied.)
Christiano, Cotra, and Yudkowsky on AI progress

One may ask: why aren't elephants making rockets and computers yet?

But one may ask the same question about any uncontacted human tribe.

Seems more surprising for elephants, by default: elephants have apparently had similarly large brains for about 20 million years, which is far more time than uncontacted human tribes have had to build rockets. (~100x as long as anatomically modern humans have existed at all, for example.)

5 RomanS 2mo: I agree. Additionally, the life expectancy of elephants is significantly higher than that of paleolithic humans (1 [https://genomics.senescence.info/species/entry.php?species=Loxodonta_africana], 2 [https://en.wikipedia.org/wiki/Life_expectancy#Variation_over_time]). Thus, individual elephants have much more time to learn stuff. In humans, technological progress is not a given. Across different populations, it seems to be determined by the local culture, and not by neurobiological differences. For example, the ancestors of Wernher von Braun left their technological local minimum thousands of years later than the Egyptians or Chinese. And the ancestors of Sergei Korolev lived their primitive lives well into the 8th century C.E. If a Han dynasty scholar had visited the Germanic and Slavic tribes, he would've described them as hopeless barbarians, perhaps even as inherently predisposed to barbarism. Maybe if we give elephants more time, they will overcome their biological limitations (limited speech, limited "hand", fewer neurons in neocortex, etc.), and will escape the local minimum. But maybe not.