Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a repository for miscellaneous short things I want to post. Other people are welcome to make top-level comments here if they want. (E.g., questions for me you'd rather discuss publicly than via PM; links you think will be interesting to people in this comment section but not to LW as a whole; etc.)


Shared with permission, a google doc exchange confirming Eliezer still finds the arguments for alignment optimism, slower takeoffs, etc. unconvincing:

Daniel Filan: I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is "from Yudkowsky/Bostrom to What Failure Looks Like part 2 part 1") and I still don't totally get why.

Eliezer Yudkowsky: My bitter take:  I tried cutting back on talking to do research; and so people talked a bunch about a different scenario that was nicer to think about, and ended up with their thoughts staying there, because that's what happens if nobody else is arguing them out of it.

That is: this social-space's thought processes are not robust enough against mildly adversarial noise, that trying a bunch of different arguments for something relatively nicer to believe, won't Goodhart up a plausible-to-the-social-space argument for the thing that's nicer to believe.  If you talk people out of one error, somebody else searches around in the space of plausible arguments and finds a new error.  I wasn't fighting a mistaken argument for why AI niceness isn't too intractable and takeoffs

...

FWIW, I think Yudkowsky is basically right here and would be happy to explain why if anyone wants to discuss. I'd likewise be interested in hearing contrary perspectives.

Which of the "Reasons to expect fast takeoff" from Paul's post do you find convincing, and what is your argument against what Paul says there? Or do you have some other reasons for expecting a hard takeoff? I've seen this post of yours, but as far as I know, you haven't said much about hard vs soft takeoff in general.
Daniel Kokotajlo:
It's a combination of not finding Paul+Katja's counterarguments convincing (AI Impacts has a slightly different version of the post, I think of this as the Paul+Katja post since I don't know how much each of them did), having various other arguments that they didn't consider, and thinking they may be making mistakes in how they frame things and what questions they ask. I originally planned to write a line-by-line rebuttal of the Paul+Katja posts, but instead I ended up writing a sequence of posts that collectively constitute my (indirect) response. If you want a more direct response, I can put it on my list of things to do, haha... sorry... I am a bit overwhelmed... OK here's maybe some quick (mostly cached) thoughts:

1. What we care about is point of no return, NOT GDP doubling in a year or whatever.

2. PONR seems not particularly correlated with GDP acceleration time or speed, and thus maybe Paul and I are just talking past each other -- he's asking and answering the wrong questions.

3. Slow takeoff means shorter timelines, so if our timelines are independently pretty short, we should update against slow takeoff. My timelines are independently pretty short. (See my other sequence.) Paul runs this argument in the other direction I think; since takeoff will be slow, and we aren't seeing the beginnings of it now, timelines must be long. (I don't know how heavily he leans on this argument though, probably not much. Ajeya does this too, and does it too much I think.) Also, concretely, if crazy AI stuff happens in <10 years, probably the EMH has failed in this domain and probably we can get AI by just scaling up stuff and therefore probably takeoff will be fairly fast (at least, it seems that way extrapolating from GPT-1, GPT-2, and GPT-3. One year apart, significantly qualitatively and quantitatively better. If that's what progress looks like when we are entering the "human range" then we will cross it quickly, it seems.)

4. Discontinuities totally do sometimes hap
Thanks! My understanding of the Bostrom+Yudkowsky takeoff argument goes like this: at some point, some AI team will discover the final piece of deep math needed to create an AGI; they will then combine this final piece with all of the other existing insights and build an AGI, which will quickly gain in capability and take over the world. (You can search "a brain in a box in a basement" on this page or see here for some more quotes.) In contrast, the scenario you imagine seems to be more like (I'm not very confident I am getting all of this right): there isn't some piece of deep math needed in the final step. Instead, we already have the tools (mathematical, computational, data, etc.) needed to build an AGI, but nobody has decided to just go for it. When one project finally decides to go for an AGI, this EMH failure allows them to maintain enough of a lead to do crazy stuff (conquistadors, persuasion tools, etc.), and this leads to DSA. Or maybe the EMH failure isn't even required, just enough of a clock time lead to be able to do the crazy stuff. If the above is right, then it does seem quite different from Paul+Katja, but also different from Bostrom+Yudkowsky, since the reason why the outcome is unipolar is different. Whereas Bostrom+Yudkowsky say the reason one project is ahead is because there is some hard step at the end, you instead say it's because of some combination of EMH failure and natural lag between projects.
Daniel Kokotajlo:
Ah, this is helpful, thanks -- I think we just have different interpretations of Bostrom+Yudkowsky. You've probably been around before I was and read more of their stuff, but I first got interested in this around 2013, pre-ordered Superintelligence and read it with keen interest, etc. and the scenario you describe as mine is what I always thought Bostrom+Yudkowsky believed was most likely, and the scenario you describe as theirs -- involving "deep math" and "one hard step at the end" is something I thought they held up as an example of how things could be super fast, but not as what they actually believed was most likely. From what I've read, Yudkowsky did seem to think there would be more insights and less "just make blob of compute bigger" about a decade or two ago, but he's long since updated towards "dear lord, people really are just going to make big blobs of inscrutable matrices, the fools!" and I don't think this counts as a point against his epistemics because predicting the future is hard and most everyone else around him did even worse, I'd bet.
Ok I see, thanks for explaining. I think what's confusing to me is that Eliezer did stop talking about the deep math of intelligence sometime after 2011 and then started talking about big blobs of matrices as you say starting around 2016, but as far as I know he has never gone back to his older AI takeoff writings and been like "actually I don't believe this stuff anymore; I think hard takeoff is actually more likely to be due to EMH failure and natural lag between projects". (He has done similar things for his older writings that he no longer thinks are true, so I would have expected him to do the same for takeoff stuff if his beliefs had indeed changed.) So I've been under the impression that Eliezer actually believes his old writings are still correct, and that somehow his recent remarks and old writings are all consistent. He also hasn't (as far as I know) written up a more complete sketch of how he thinks takeoff is likely to go given what we now know about ML. So when I see him saying things like what's quoted in Rob's OP, I feel like he is referring to the pre-2012 "deep math" takeoff argument. (I also don't remember if Bostrom gave any sketch of how he expects hard takeoff to go in Superintelligence; I couldn't find one after spending a bit of time.) If you have any links/quotes related to the above, I would love to know! (By the way, I was a lurker on LessWrong starting back in 2010-2011, but was only vaguely familiar with AI risk stuff back then. It was only around the publication of Superintelligence that I started following along more closely, and only much later in 2017 that I started putting in significant amounts of my time into AI safety and making it my overwhelming priority. I did write several timelines though, and recently did a pretty thorough reading of AI takeoff arguments for a modeling project, so that is mostly where my knowledge of the older arguments comes from.)
Daniel Kokotajlo:
For all I know you are right about Yudkowsky's pre-2011 view about deep math. However, (a) that wasn't Bostrom's view AFAICT, and (b) I think that's just not what this OP quote is talking about. From the OP: It's Yudkowsky/Bostrom, not Yudkowsky. And it's WFLLp1, not p2. Part 2 is the one where the AIs do a treacherous turn; part 1 is where actually everything is fine except that "you get what you measure" and our dumb obedient AIs are optimizing for the things we told them to optimize for rather than for what we want. I am pretty confident that WFLLp1 is not the main thing we should be worrying about; WFLLp2 is closer, but even it involves this slow-takeoff view (in the strong sense, in which economy is growing fast before the point of no return) which I've argued against. I do not think that the reason people shifted from "yudkowsky/bostrom" (which in this context seems to mean "single AI project builds AI in the wrong way, AI takes over world") to WFLLp1 is that people rationally considered all the arguments and decided that WFLLp1 was on balance more likely. I think instead that probably some sort of optimism bias was involved, and more importantly win by default (Yud + Bostrom stopped talking about their scenarios and arguing for them, whereas Paul wrote a bunch of detailed posts laying out his scenarios and arguments, and so in the absence of visible counterarguments Paul wins the debate by default). Part of my feeling about this is that it's a failure on my part; when Paul+Katja wrote their big post on takeoff speeds I disagreed with it and considered writing a big point-by-point response, but never did, even after various people posted questions asking "has there been any serious response to Paul+Katja?"
Re (a): I looked at chapters 4 and 5 of Superintelligence again, and I can kind of see what you mean, but I'm also frustrated that Bostrom seems really non-committal in the book. He lists a whole bunch of possibilities but then doesn't seem to actually come out and give his mainline visualization/"median future". For example he looks at historical examples of technology races and compares how much lag there was, which seems a lot like the kind of thinking you are doing, but then he also says things like "For example, if human-level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level without even touching the intermediary rungs." which sounds like the deep math view. Another relevant quote: Re (b): I don't disagree with you here. (The only part that worries me is, I don't have a good idea of what percentage of "AI safety people" shifted from one view to the other, whether there were also new people with different views coming in to the field, etc.) I realize the OP was mainly about failure scenarios, but it did also mention takeoffs ("takeoffs won't be too fast") and I was most curious about that part.
Daniel Kokotajlo:
I also wish I knew what Bostrom's median future was like, though I perhaps understand why he didn't put it in his book -- the incentives all push against it. Predicting the future is hard and people will hold it against you if you fail, whereas if you never try at all and instead say lots of vague prophecies, people will laud you as a visionary prophet. Re (b) cool, I think we are on the same page then. Re takeoff being too fast--I think a lot of people these days think there will be plenty of big scary warning shots and fire alarms that motivate lots of people to care about AI risk and take it seriously. I think that suggests that a lot of people expect a fairly slow takeoff, slower than I think is warranted. Might happen, yes, but I don't think Paul & Katja's arguments are that convincing that takeoff will be this slow. It's a big source of uncertainty for me though.
Matthew Barnett:
I'd personally like to find some cruxes between us some time, though I don't yet know the best format to do that. I think I'll wait to see your responses to Issa's question first.
Daniel Kokotajlo:
Likewise! I'm up for a video call if you like. Or we could have a big LW thread, or an email chain. I think my preference would be a video call. I like Walled Garden, we could do it there and invite other people maybe. IDK.
Did this ever happen?
Daniel Kokotajlo:
I don't think so? It's possible that it did and I forgot.
A belief can be a negation in the sense of a contradiction, whilst not being a negation in the sense of a disproof. I don't think EY disproved RH's position. I don't think he is confident he did himself, since his summary was called "what I believe if not why I believe it". And I don't think lack of time was the problem, since the debate was immense.
Daniel Kokotajlo:
Interesting, yeah I wonder why he titled it that. Still though it seems like he is claiming here to have disproved RH's position to some extent at least. I for one think RH's position is pretty implausible, for reasons Yudkowsky probably mentioned (I don't remember exactly what Yud said).
Why the "seems"? A master rationalist should be able to state things clearly, surely?

Rolf Degen, summarizing part of Barbara Finlay's "The neuroscience of vision and pain":

Humans may have evolved to experience far greater pain, malaise and suffering than the rest of the animal kingdom, due to their intense sociality giving them a reasonable chance of receiving help.

From the paper:

Several years ago, we proposed the idea that pain, and sickness behaviour had become systematically increased in humans compared with our primate relatives, because human intense sociality allowed that we could ask for help and have a reasonable chance of receiving it. We called this hypothesis ‘the pain of altruism’ [68]. This idea derives from, but is a substantive extension of Wall’s account of the placebo response [43]. Starting from human childbirth as an example (but applying the idea to all kinds of trauma and illness), we hypothesized that labour pains are more painful in humans so that we might get help, an ‘obligatory midwifery’ which most other primates avoid and which improves survival in human childbirth substantially ([67]; see also [69]). Additionally, labour pains do not arise from tissue damage, but rather predict possible
...

[Epistemic status: Thinking out loud]

If the evolutionary logic here is right, I'd naively also expect non-human animals to suffer more to the extent they're (a) more social, and (b) better at communicating specific, achievable needs and desires.

There are reasons the logic might not generalize, though. Humans have fine-grained language that lets us express very complicated propositions about our internal states. That puts a lot of pressure on individual humans to have a totally ironclad, consistent "story" they can express to others. I'd expect there to be a lot more evolutionary pressure to actually experience suffering, since a human will be better at spotting holes in the narratives of a human who fakes it (compared to, e.g., a bonobo trying to detect whether another bonobo is really in that much pain).

It seems like there should be an arms race across many social species to give increasingly costly signals of distress, up until the costs outweigh the amount of help they can hope to get. But if you don't have the language to actually express concrete propositions like "Bob took care of me the last time I got sick, six months ago, and he can attest that I had a hard time walking that time too", then those costly signals might be mostly or entirely things like "shriek louder in response to percept X", rather than things like "internally represent a hard-to-endure pain-state so I can more convincingly stick to a verbal narrative going forward about how hard-to-endure this was".

[Epistemic status: Piecemeal wild speculation; not the kind of reasoning you should gamble the future on.]

Some things that make me think suffering (or 'pain-style suffering' specifically) might be surprisingly neurologically conditional and/or complex, and therefore more likely to be rare in non-human animals (and in subsystems of human brains, in AGI subsystems that aren't highly optimized to function as high-fidelity models of humans, etc.):

1. Degen and Finlay's social account of suffering above.

2. Which things we suffer from seems to depend heavily on mental narratives and mindset. See, e.g., Julia Galef's Reflections on Pain, from the Burn Unit.

Pain management is one of the main things hypnosis appears to be useful for. Ability to cognitively regulate suffering is also one of the main claims of meditators, and seems related to existential psychotherapy's claim that narratives are more important for well-being than material circumstances.

Even if suffering isn't highly social (pace Degen and Finlay), its dependence on higher cognition suggests that it is much more complex and conditional than it might appear on initial introspection, which on its own reduces the probability of it

...
Rob Bensinger:
Devoodooifying Psychology says "the best studies now suggest that the placebo effect is probably very weak and limited to controlling pain".
Rudi C:
How is the signal being kept “costly/honest” though? Is the pain itself the cost? That seems somewhat weird ...

Facebook comment I wrote in February, in response to the question 'Why might having beauty in the world matter?':

I assume you're asking about why it might be better for beautiful objects in the world to exist (even if no one experiences them), and not asking about why it might be better for experiences of beauty to exist.

[... S]ome reasons I think this:

1. If it cost me literally nothing, I feel like I'd rather there exist a planet that's beautiful, ornate, and complex than one that's dull and simple -- even if the planet can never be seen or visited by anyone, and has no other impact on anyone's life. This feels like a weak preference, but it helps get a foot in the door for beauty.

(The obvious counterargument here is that my brain might be bad at simulating the scenario where there's literally zero chance I'll ever interact with a thing; or I may be otherwise confused about my values.)

2. Another weak foot-in-the-door argument: People seem to value beauty, and some people claim to value it terminally. Since human value is complicated and messy and idiosyncratic (compare person-specific ASMR triggers or nostalgia triggers or culinary pref...

Somewhat more meta level: Heuristically speaking, it seems wrong and dangerous for the answer to "which expressed human preferences are valid?" to be anything other than "all of them". There's a common pattern in metaethics which looks like:

1. People seem to have preference X

2. X is instrumentally valuable as a source of Y and Z. The instrumental-value relation explains how the preference for X was originally acquired.

3. [Fallacious] Therefore preference X can be ignored without losing value, so long as Y and Z are optimized.

In the human brain algorithm, if you optimize something instrumentally for a while, you start to value it terminally. I think this is the source of a surprisingly large fraction of our values.

Rob Bensinger:
Old discussion of this on LW:

Collecting all of the quantitative AI predictions I know of MIRI leadership making on Arbital (let me know if I missed any):

Some caveats:

  • Arbital predictions range from 1% to 99%.
  • I assume these are generally ~5 years old. Views may have shifted.
  • By default, I assume that the standard caveats for probabilities like these apply: I treat these as off-the-cuff ass numbers unless stated otherwise, products of 'thinking about the problem on and off for years and then querying my gut about what it expects to actually see', more so than of building Guesstimate models or trying too hard to make sure all the probabilities are perfectly coher
...

On my model, the point of ass numbers isn't to demand perfection of your gut (e.g., of the sort that would be needed to avoid multiple-stage fallacies when trying to conditionalize a lot), but to:

  1. Communicate with more precision than English-language words like 'likely' or 'unlikely' allow. Even very vague or uncertain numbers will, at least some of the time, be a better guide than natural-language terms that weren't designed to cover the space of probabilities (and that can vary somewhat in meaning from person to person).
  2. At least very vaguely and roughly bring your intuitions into contact with reality, and with each other, so you can more readily notice things like 'I'm miscalibrated', 'reality went differently than I expected', 'these two probabilities don't make sense together', etc.

It may still be a terrible idea to spend too much time generating ass numbers, since "real numbers" are not the native format human brains compute probability with, and spending a lot of time working in a non-native format may skew your reasoning.

(Maybe there's some individual variation here?)

But they're at least a good tool to use sometimes, for the sake of crisper communication, calibration practice (so you can generate non-awful future probabilities when you need to), etc.
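
A rough sketch of what the "bring your intuitions into contact with reality, and with each other" use can look like, assuming you keep a log of (stated probability, outcome) pairs. The function names and the example log are hypothetical. The Brier score and bucketed calibration table are standard scoring tools picked for illustration, not something the predictions above were checked with. The coherence check uses the usual bounds on a conjunction: P(A and B) can't exceed either P(A) or P(B), and can't fall below P(A) + P(B) - 1.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared gap between stated probabilities and 0/1 outcomes (lower is better)."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - float(happened)) ** 2 for p, happened in forecasts) / len(forecasts)


def calibration_table(forecasts, n_buckets=5):
    """Group forecasts into equal-width probability buckets.

    Returns {bucket_index: (mean stated probability, observed frequency, count)}.
    A big gap between the first two numbers in a bucket suggests miscalibration.
    """
    buckets = defaultdict(list)
    for p, happened in forecasts:
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the top bucket
        buckets[idx].append((p, happened))
    table = {}
    for idx in sorted(buckets):
        items = buckets[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(1 for _, happened in items if happened) / len(items)
        table[idx] = (mean_p, freq, len(items))
    return table


def conjunction_is_coherent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Check that P(A and B) lies between max(0, P(A)+P(B)-1) and min(P(A), P(B))."""
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower - tol <= p_a_and_b <= upper + tol


if __name__ == "__main__":
    # Hypothetical log of gut numbers and what actually happened.
    hypothetical_log = [
        (0.9, True), (0.9, True), (0.9, False), (0.9, False),
        (0.2, False), (0.2, False), (0.2, True), (0.6, True),
    ]
    print("Brier score:", brier_score(hypothetical_log))
    for idx, (mean_p, freq, count) in calibration_table(hypothetical_log).items():
        print(f"bucket {idx}: said {mean_p:.2f}, happened {freq:.2f} (n={count})")
    print("coherent?", conjunction_is_coherent(0.3, 0.9, 0.5))
```

None of this demands precision from the gut numbers themselves; it only makes the 'I'm miscalibrated' and 'these two probabilities don't make sense together' cases easier to spot.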

Suppose most people think there's a shrew in the basement, and Richard Feynman thinks there's a beaver. If you're pretty sure it's not a shrew, two possible reactions include:

- 'Ah, the truth is probably somewhere in between these competing perspectives. So maybe it's an intermediate-sized rodent, like a squirrel.'

- 'Ah, Feynman has an absurdly good epistemic track record, and early data does indicate that the animal's probably bigger than a shrew. So I'll go with his guess and say it's probably a beaver.'

But a third possible response is:

- 'Ah, if Feynman's right, then a lot of people are massively underestimating the rodent's size. Feynman is a person too, and might be making the same error (just to a lesser degree); so my modal guess will be that it's something bigger than a beaver, like a capybara.'

In particular, you may want to go more extreme than Feynman if you think there's something systematically causing people to underestimate a quantity (e.g., a cognitive bias -- the person who speaks out first against a bias might still be affected by it, just to a lesser degree), or systematically causing people to make weaker claims than they really believe (e.g., maybe people don't want to sound extreme or out-of-step with the mainstream view).

This is true! But I think it's important to acknowledge that this depends a lot on details of Feynman's reasoning process, and it doesn't go in a consistent direction. If Feynman is aware of the bias, he may have already compensated for it in his own estimate, so compensating on his behalf would be double-counting the adjustment. And sometimes the net incentive is to overestimate, not to underestimate, because you're trying to sway the opinion of averagers, or because being more contrarian gets attention, or because shrew-thinkers feel like an outgroup. In the end, you can't escape from detail. But if you were to put full power into making this heuristic work, the way to do it would be to look at past cases of Feynman-vs-world disagreement (broadening the "Feynman" and "world" categories until there's enough training data), and try to get a distribution empirically.
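
A minimal sketch of that last suggestion, under loud assumptions: each past disagreement is reduced to three numbers (the consensus estimate, the contrarian's estimate, and what turned out to be true), and the cases below are made up purely to exercise the code. The "overshoot ratio" is a hypothetical summary statistic, not an established one: 0 means the consensus was right, 1 means the contrarian was right, values between 0 and 1 are the "squirrel" outcome, and values above 1 are the "capybara" outcome.

```python
from statistics import median


def overshoot_ratio(consensus, contrarian, truth):
    """Where the truth landed on the line from the consensus (0) to the contrarian (1)."""
    if contrarian == consensus:
        raise ValueError("no disagreement to measure")
    return (truth - consensus) / (contrarian - consensus)


def summarize_disagreements(cases):
    """Empirical distribution of overshoot ratios over past (consensus, contrarian, truth) cases."""
    ratios = sorted(overshoot_ratio(c, e, t) for c, e, t in cases)
    return {
        "ratios": ratios,
        "median_ratio": median(ratios),
        "fraction_beyond_contrarian": sum(r > 1 for r in ratios) / len(ratios),
    }


def adjusted_estimate(consensus, contrarian, ratio):
    """Move from the consensus toward (or past) the contrarian by the given ratio."""
    return consensus + ratio * (contrarian - consensus)


if __name__ == "__main__":
    # Hypothetical past cases: (consensus estimate, contrarian estimate, truth).
    hypothetical_cases = [(10, 20, 25), (10, 20, 15), (0, 4, 6), (5, 10, 10)]
    summary = summarize_disagreements(hypothetical_cases)
    print(summary)
    # New disagreement: consensus says 10, the contrarian says 30.
    print(adjusted_estimate(10, 30, summary["median_ratio"]))
```

With real data, the spread of the ratios matters as much as the median: a wide spread is what "it doesn't go in a consistent direction" would look like empirically.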
Thomas Kwa:
Have you seen this ever work for an advance prediction? It seems like you need to be in a better epistemic position than Feynman, which is pretty hard.

From Twitter:

I am not sure I can write out the full AI x-risk scenario.

1. AI quickly becomes super clever

2. Alignment is hard, like getting your great x10 grandchildren to think you're a good person

3. The AI probably embarks on a big project which ignores us and accidentally kills us

Where am I wrong? Happy to be sent stuff to read.

I replied:

"1. AI quickly becomes super clever"

My AI risk model (which is not the same as everyone's) more specifically says:

1a. We'll eventually figure out how to make AI that's 'generally good at science' -- like how humans can do sciences that didn't exist when our brains evolved.

1b. AGI / STEM AI will have a large, fast, and discontinuous impact. Discontinuous because it's a new sort of intelligence (not just AlphaGo 2 or GPT-5); large and fast because STEM is powerful, plus humans suck at STEM and aren't cheap software that scales as you add hardware.

(Warning: argument is compressed for Twitter character count. There are other factors too, like recursive self-improvement.)

"2. Alignment is hard, like getting your great x10 grandchildren to think you're a good person"

I'd say it's hard like building a large/complex, novel software system that exhibits so

...

Chana Messinger, replying to Brandon Bradford:

I find this very deep

"Easy to make everything a conspiracy when you don't know how anything works."

Everything literally is a conspiracy (in some nonstandard technical sense), and if you don't know how anything works, then it's a secret conspiracy.

How does water get to your faucet? How many people are responsible for your internet? What set of events had to transpire to make you late for work? How does one build a microwave?

Something about this points at how complicated everything is and how little we individually know about it.

Gordon Seidoh Worley:
It's a conspiracy, man. You gotta pay for water that's just sitting there in the ground. It's the corporations tricking people into thinking they gotta drink water that came from their pipes because it's been "sanitized". Wake up sheeple, they're sanitizing your brains!!! And don't get me started on showing up later for work! We all know I could be there in just 15 minutes if there were no traffic, but you see the corporations make more money the longer you drive your car, so they make sure everyone has to be at work at the same time so there's a traffic jam. Then we all use more fuel, end up buying fast food and other junk because we lost so much time in traffic, and gotta put in extra effort at work because the boss is always mad about how we're showing up late. And to top it all off, all those exhaust fumes are causing ozone holes and global warming which is just what they want because then they can use it to sell you more stuff like sunscreen or to control your life to fight "climate change". As for the microwave? Well, let's just say it's not a coincidence they blow up if you put tin foil in them. Me? I'm keeping my tin foil right where intended: on my head to block out the thought rays and subliminal messages sent out by their so-called "internet".

From an April 2019 Facebook discussion:

Rob Bensinger: avacyn:

I think one strong argument in favor of eating meat is that beef cattle (esp. grass-fed) might have net positive lives. If this is true, then the utilitarian line is to 1) eat more beef to increase demand, 2) continue advocating for welfare reforms that will make cows' lives even more positive.

Beef cattle are different than e.g. factory farmed chicken in that they live a long time (around 3 years on average vs 6-7 weeks for broilers), and spend much of their lives grazing on stockers where they might have natural-ish lives.

Another argument in favor of eating beef is that it tends to lead to deforestation, which decreases total wild animal habitat, which one might think is worse than beef farms.

... I love how EA does veganism / animal welfare things. It's really good.

(From the comment section on

[... Note that in posting this I'm not intending] to advocate for a specific intervention; it's more that it makes me happy to see thorough and outside-the-box reasoning from folks who are trying to help others, whether or not they have the same backgr

...
I really like the FB crossposts here, and also really like this specific comment. Might be worth polishing it into a top-level post, either here or on the EA Forum sometime.
Rob Bensinger:
Thanks! :) I'm currently not planning to polish it; part of the appeal of cross-posting from Facebook for me is that I can keep it timeboxed by treating it as an artifact of something I already said. I guess someone else could cannibalize it into a prettier stand-alone post.

While your comment was clearly written in good faith, it seems to me like you're missing some context. You recommend that EY recommend that the detractors read books. EY doesn't just recommend people read books. He wrote the equivalent of like three books on the subjects relevant to this conversation in particular which he gives away for free. Also, most of the people in this conversation are already big into reading books.

It is my impression he also helped establish the Center for Applied Rationality, which has the explicit mission of training skills. (I'm not sure if he technically did but he was part of the community which did and he helped promote it in its early days.)

Eli Tyre:
Eliezer was involved with CFAR in the early days, but has not been involved since at least 2016.

From an April 2019 Facebook discussion:

Rob Bensinger:

Julia Galef: Another one of your posts that has stayed with me is a post in which you were responding to someone's question -- I think the question was, “What are your favorite virtues?” And you described three. They were compassion for yourself; creating conditions where you'll learn the truth; and sovereignty. [...] Can you explain briefly what sovereignty means?

Kelsey Piper: Yeah, so I characterize sovereignty as the virtue of believing yourself qualified to reason about your life, and to reason about the world, and to act based on your understanding of it.

I think it is surprisingly common to feel fundamentally unqualified even to reason about what you like, what makes you happy, which of several activities in front of you you want to do, which of your priorities are really important to you.

I think a lot of people feel the need to answer those questions by asking society what the objectively correct answer is, or trying to understand which answer won't get them in trouble. And so I think it's just really important to learn to answer those questions with what you actually want and what you actually care about. [...]

Julia Galef:

...
Rob Bensinger:
Rob Wiblin, August 2019: Cf. Brienne Yudkowsky on shame and the discussion on

Copied from some conversations on Twitter:

· · · · · · · · · · · · · · · 

Eric Rogstad: I think "illusionism" is a really misleading term. As far as I can tell, illusionists believe that consciousness is real, but has some diff properties than others believe.

It's like if you called Einstein an "illusionist" w.r.t. space or time.

See my comments here: 

Rob Bensinger: I mostly disagree. It's possible to define a theory-neutral notion of 'consciousness', but I think it's just true that 'there's no such thing as subjective awareness / qualia / etc.', and I think this cuts real dang deep into the heart of what most people mean by consciousness.

Before the name illusionism caught on, I had to use the term 'eliminativism', but I had to do a lot of work to clarify that I'm not like old-school eliminativists who think consciousness is obviously or analytically fake. Glad to have a clearer term now.

I think people get caught up in knots about the hard problem of consciousness because they try to gesture at 'the fact that they have subjective awareness', without realizing they're gesturing... (read more)

2Rob Bensinger
Hrothgar: What's your answer to the hard problem of consciousness?

Rob Bensinger: The hard problem makes sense, and seems to successfully do away with 'consciousness is real and reducible'. But 'consciousness is real and irreducible' isn't tenable: it either implies violations of physics as we know it (interactionism), or implies we can't know we're conscious (epiphenomenalism). So we seem to be forced to accept that consciousness (of the sort cited in the hard problem) is somehow illusory. This is... very weird and hard to wrap one's head around. But some version of this view (illusionism) seems incredibly hard to avoid. (Note: This is a twitter-length statement of my view, so it leaves out a lot of details. E.g., I think panpsychist views must be interactionist or epiphenomenalist, in the sense that matters. But this isn't trivial to establish.)

Hrothgar: What does "illusory" mean here? I think I'm interpreting as gesturing toward denying consciousness is happening, which is, like, the one thing that can't even be doubted (since the experience of doubt requires a conscious experiencer in the first place)

Rob Bensinger: I think "the fact that I'm having an experience" seems undeniable. E.g., it seems to just be a fact that I'm experiencing this exact color of redness as I look at the chair next to me. There's a long philosophical tradition of treating experience as 'directly given', the foundation on which all our other knowledge is built. I find this super compelling and intuitive at a glance, even if I can't explain how you'd actually build a brain/computer that has infallible 'directly given' knowledge about some of its inner workings. But I think the arguments alluded to above ultimately force us to reject this picture, and endorse the crazy-sounding view 'the character of my own experiences can be illusory, even though it seems obviously directly given'. An attempt to clarify what this means:
Edit: What it implies is violations of physicalism. You can accept that physics is a map that predicts observations, without accepting that it is the map, to which all other maps must be reduced. The epiphenomenalist worry is that, if qualia are not denied entirely, they have no causal role to play, since physical causation already accounts for everything that needs to be accounted for. But physics is a set of theories and descriptions...a map. Usually, the ability of one map to explain is not exclusive of another map's ability to do so. We can explain the death of Mr Smith as the result of a bullet entering his heart, or as the result of a finger squeezing a trigger, or as a result of the insurance policy recently taken out on his life, and so on. So why can't we resolve the epiphenomenal worry by saying that physical causation and mental causation are just different, non-rivalrous maps? "I screamed because my pain fibres fired" alongside -- not versus -- "I screamed because I felt a sharp pain". It is not the case that there is physical stuff that is doing all the causation, and mental stuff that is doing none of it: rather there is a physical view of what is going on, and a mentalistic view. Physicalists are reluctant to go down this route, because physicalism is based on the idea that there is something special about the physical map, which means it is not just another map. This special quality means that a physical explanation excludes others, unlike a typical map. But what is it? It's rooted in reductionism, the idea that every other map (that is, every theory of the special sciences) can or should reduce to the physical map. But the reducibility of consciousness is the center of the Hard Problem. If consciousness really is irreducible, and not just unreduced, then that is evidence against the reduction of everything to the physical, and, in turn, evidence against the special, exclusive nature of the physical map. So, without the reducibility of cons
2Rob Bensinger
If the physics map doesn't imply the mind map (because of the zombie argument, the Mary's room argument, etc.), then how do you come to know about the mind map? The causal process by which you come to know the physics map is easy to understand: What is the version of this story for the mind map, once we assume that the mind map has contents that have no causal effect on the physical world? (E.g., your mind map had absolutely no effect on the words you typed into the LW page.) At some point you didn't have a concept for "qualia"; how did you learn it, if your qualia have no causal effects? At some point you heard about the zombie argument and concluded "ah yes, my mental map must be logically independent of my physical map"; how did you do that without your mental map having any effects?   I can imagine an interactionist video game, where my brain has more processing power than the game and therefore can't be fully represented in the game itself. It would then make sense that I can talk about properties that don't exist within the game's engine: I myself exist outside the game universe, and I can use that fact to causally change the game's outcomes in ways that a less computationally powerful agent could not. Equally, I can imagine an epiphenomenal video game, where I'm strapped into a headset but forbidden from using the controls. I passively watch the events occurring in the game; but no event in the game ever reflects or takes note of the fact that I exist or have any 'unphysical' properties, and if there is an AI steering my avatar or camera's behavior, the AI knows zilch about  me. (You could imagine a programmer deliberately designing the game to have NPCs talk about entities outside the game world; but then the programmer's game-transcending cognitive capacities are not epiphenomenal relative to the game.) The thing that doesn't make sense is to import intuitions from the interactionist game to the epiphenomenal game, while insisting it's all still epip
Direct evidence. That's the starting point of the whole thing. People think that they have qualia because it seems to them that they do. Edit: In fact, it's the other way round: we are always using the mind map, but we remove the subjectivity, "warm fuzzies" from it to arrive at the physics map. How do we know that physics is the whole story, when we start with our experience, and make a subset of it? I'm not assuming that. I'm arguing against epiphenomenalism. So I am saying that the mental is causal, but I am not saying that it is a kind of physical causality, as per reductive physicalism. Reductive physicalism is false because consciousness is irreducible, as you agree. Since mental causation isn't a kind of physical causation, I don't have to give a physical account of it. And I am further not saying that the physical and mental are two separate ontological domains, two separate territories. I am talking about maps, not territories. Without ontological dualism, there are no issues of overdetermination or interaction.

It's apparently not true that 90% of startups fail. From Ben Kuhn:

Hot take: the outside view is overrated.

(“Outside view” = e.g. asking “what % of startups succeed?” and assuming that’s ~= your chance of success.)

In theory it seems obviously useful. In practice, it makes people underrate themselves and prematurely give up their ambition. 

One problem is that finding the right comparison group is hard.

For instance, in one commonly-cited statistic that “90% of startups fail,” (

... (read more)

I don't have a cite handy, as this is from memory from 2014, but when I looked into it, I recall the 7-year failure rate, excluding the obvious dumb stuff like restaurants, was something like 70%. Importantly, that 70% number included acquisitions, so the actual failure rate was something like 60%-ish.

A blurb for the book "The Feeling of Value":

This revolutionary treatise starts from one fundamental premise: that our phenomenal consciousness includes direct experience of value. For too long, ethical theorists have looked for value in external states of affairs or reduced value to a projection of the mind onto these same external states of affairs. The result, unsurprisingly, is widespread antirealism about ethics.

In this book, Sharon Hewitt Rawlette turns our metaethical gaze inward and dares us to consider that value, rather than being something “out t

... (read more)

Ben Weinstein-Raun wrote on social media:

It seems to me that the basic appeal of panpsychism goes like "It seems really weird that you can put together some apparently unfeeling pieces, and then out comes this thing that feels. Maybe those things aren't actually unfeeling? That would sort of explain where the feeling-ness comes from."

But this feels kind of analogous to a being that doesn't have a good theory about houses, but is aware that some things are houses and some things aren't, by their experiences of those things. Such a being might analogously re

... (read more)

I think panpsychism is outrageously false, and profoundly misguided as an approach to the hard problem.

What do you think of Brian Tomasik's flavor of panpsychism, which he says is compatible with (and, indeed, follows from) type-A materialism? As he puts it,

It's unsurprising that a type-A physicalist should attribute nonzero consciousness to all systems. After all, "consciousness" is a concept -- a "cluster in thingspace" -- and all points in thingspace are less than infinitely far away from the centroid of the "consciousness" cluster. By a similar argument, we might say that any system displays nonzero similarity to any concept (except maybe for strictly partitioned concepts that map onto the universe's fundamental ontology, like the difference between matter vs. antimatter). Panpsychism on consciousness is just one particular example of that principle.

4Rob Bensinger
I haven't read Brian Tomasik's thoughts on this, so let me know if you think I'm misunderstanding him / should read more. The hard problem of consciousness at least gives us a prima facie reason to consider panpsychism. (Though I think this ultimately falls apart when we consider 'we couldn't know about the hard problem of consciousness if non-interactionist panpsychism were true; and interactionist panpsychism would mean new, detectable physics'.) If we deny the hard problem, then I don't see any reason to give panpsychism any consideration in the first place. We could distinguish two panpsychist views here: 'trivial' (doesn't have any practical implications, just amounts to defining 'consciousness' so broadly as to include anything and everything); and 'nontrivial' (has practical implications, or at least the potential for such; e.g., perhaps the revelation that panpsychism is true should cause us to treat electrons as moral patients, with their own rights and/or their own welfare). But I see no reason whatsoever to think that electrons are moral patients, or that electrons have any other nontrivial mental property. The mere fact that we don't fully understand how human brains work is not a reason to ask whether there's some new undiscovered feature of particles ~10^31 times smaller than a human brain that explains the comically larger macro-process -- any more than limitations in our understanding of stomachs would be a reason to ask whether individual electrons have some hidden digestive properties.
(Brian Tomasik's view superficially sounds a lot like what Ben Weinstein-Raun is criticizing in his second paragraph, so I thought I'd add here the comment I wrote in response to Ben's post: I'm not sure if I should quote Ben's reply to me, since his post is not public, but he pretty much said that his original post was not addressing type-A physicalist panpsychism, although he finds this view unuseful for other reasons.)
Thanks for sharing. :) Yeah, it seems like most people have in mind type-F monism when they refer to panpsychism, since that's the kind of panpsychism that's growing in popularity in philosophy in recent years. I agree with Rob's reasons for rejecting that view.
There's another theory that isn't even on Chalmers's list: dual aspect neutral monism. This holds that the physical sciences are one possible map of territory which is not itself, intrinsically, physical (or, for that matter, mental). Consciousness is another map, or aspect. This approach has the advantage of dualism, in that there is no longer a need to explain the mental in terms of the physical, to reduce it to the physical, because the physical is no longer regarded as fundamental (nor is the mental, hence the "neutral"). Although an ontological identity between the physical and mental is accepted, the epistemic irreducibility of the mental to the physical is also accepted. Physicalism, in the sense that the physical sciences have a unique and privileged explanatory role, is therefore rejected. Nonetheless, the fact that the physical sciences "work" in many ways, that the physical map can be accurate, is retained. Moreover, since Dual Aspect theory is not fully fledged dualism, it is able to sidestep most or all of the standard objections to dualism. To take one example, since a conscious mental state and physical brain state are ultimately the same thing, the expected relationships hold between them. For instance, mental states cannot vary without some change in the physical state (supervenience follows directly from identity, without any special apparatus); furthermore, since mental states are ultimately identical to physical brain states, they share the causal powers of brain states (again without the need to posit special explanatory apparatus such as "psychophysical laws").

[Epistemic status: Thinking out loud, just for fun, without having done any scholarship on the topic at all.]

It seems like a lot of horror games/movies are converging on things like 'old people', 'diseased-looking people', 'psychologically ill people', 'women', 'children', 'dolls', etc. as particularly scary.

Why would that be, from an evolutionary perspective? If horror is about fear, and fear is about protecting the fearful from threats, why would weird / uncanny / out-of-evolutionary-distribution threats have a bigger impact than e.g. 'lots of human warr

... (read more)
6Rob Bensinger
A second question is why horror films and games seem to be increasingly converging on the creepy/uncanny/mysterious cluster of things, rather than on the overtly physically threatening cluster -- assuming this is a real trend.

Some hypotheses about the second question:

* A: Horror games and movies are increasingly optimizing for dread instead of terror these days, maybe because it's novel -- pure terror feels overdone and out-of-fashion. Or because dread just lends itself to a more fun multi-hour viewing/playing experience, because it's more of a 'slow burn'. Or something else.
* B: Horror games aren't optimizing for dread to the exclusion of terror; rather, they've discovered that dread is a better way to maximize terror.

Why would B be true? One just-so story you could tell is that humans have multiple responses to possible dangers, ranging from 'do some Machiavellian scheming to undermine a political rival' to 'avoid eating that weird-smelling food' to 'be cautious near that precipice' to 'attack' to 'flee'. Different emotions correspond to different priors on 'what reaction is likeliest to be warranted here?', and different movie genres optimize for different sets of emotions. And optimizing for a particular emotion usually involves steering clear of things that prime a person to experience a different emotion -- people want a 'purer' experience.

So one possibility is: big muscular agents, lion-like agents, etc. are likelier to be dangerous (in reality) than a decrepit corpse or a creepy child or a mysterious frail woman; but the correct response to hulking masculine agents is much more mixed between 'fight / confront' and 'run away / avoid', whereas the correct response to situations that evoke disgust, anxiety, uncertainty, and dread is a lot more skewed toward 'run away / avoid'. And an excess of jumpscare-ish, heart-pounding terror does tend to incline people more toward running away than toward fighting back, so it might be that both terror and dread

The wiki glossary for the sequences / Rationality: A-Z is updated now with the glossary entries from the print edition of vol. 1-2.

New entries from Map and Territory:

anthropics, availability heuristic, Bayes's theorem, Bayesian, Bayesian updating, bit, Blue and Green, calibration, causal decision theory, cognitive bias, conditional probability, confirmation bias, conjunction fallacy, deontology, directed acyclic graph, elan vital, Everett branch, expected value, Fermi paradox, foozality, hindsight bias,
... (read more)
6Said Achmiz
This reminds me of something I’ve been meaning to ask: Last I checked, the contents of the Less Wrong Wiki were licensed under the GNU Free Documentation License, which is… rather inconvenient. Is it at all possible to re-license it (ideally as CC BY-NC-SA, to match R:AZ itself)? (My interest in this comes from the fact that the Glossary is mirrored on, and I’d prefer not to have to deal with two different licenses, as I currently have to.)
I can reach out to Trike Apps about this, but can we actually do this? Seems plausible that we would have to ask for permission from all editors involved in a page before we can change the license.
4Said Achmiz
I have no idea; I cannot claim to really understand the GFDL well enough to know… but if doable, this seems worthwhile, as there’s a lot of material on the wiki which I and others could do various useful/interesting things with, if it were released under a convenient license.
4Rob Bensinger
Are there any other OK-quality rationalist glossaries out there? is the only one I know of. I vaguely recall there being one on at some point, but I might be misremembering.
It's optimized on a *very* different axis, but there's the Rationality Cardinality card database.
2Rob Bensinger
That counts! :) Part of why I'm asking is in case we want to build a proper LW glossary, and Rationality Cardinality could at least provide ideas for terms we might be missing.

Jeffrey Ladish asked on Twitter:

Do you think the singularity (technological singularity) is a useful term? I've been seeing it used less among people talking about the future of humanity and I don't understand why. Many people still think an intelligence explosion is likely, even if it's "slow"

I replied:

'Singularity' was vague and got too associated with Kurzweilian magical thinking, so MIRI switched to something like:

'rapid capability gain' = progress from pretty-low-impact AI to astro

... (read more)
I suspect there's a school of thought for which "singularity" was massively overoptimistic. Is this what you mean by Kurzweilian magical thinking: that it's a transition in a very short period of time from scarcity-based capitalism to post-scarcity utopia, rather than a simple destruction of most of humanity, and of the freedom and value of those remaining?
4Rob Bensinger
No, that part of Kurzweil's view is 100% fine. In fact, I believe I expect a sharper transition than Kurzweil expects. My objection to Kurzweil's thinking isn't 'realistic mature futurists are supposed to be pessimistic across the board', it's specific unsupported flaws in his arguments:

* Rejection of Eliezer's five theses (which were written in response to Kurzweil): intelligence explosion, orthogonality, convergent instrumental goals, complexity of value, fragility of value.
* Mystical, quasi-Hegelian thinking about surface trends like 'economic growth'. See the 'Actual Ray Kurzweil' quote in
* Otherwise weird and un-Bayesian-sounding attitudes toward forecasting. Seems to think he has a crystal ball that lets him exactly time tech developments, even where he has no model of a causal path by which he could be entangled with evidence about that future development...?

From Facebook:

Mark Norris Lance: [...] There is a long history of differential evaluation of actions taken by grassroots groups and similar actions taken by elites or those in power. This is evident when we discuss violence. If a low-power group places someone under their control it is kidnapping. If they assess their crimes or punish them for it, it is mob justice or vigilantism. [...]

John Maxwell: Does the low power group in question have a democratic process for appointing judges who then issue arrest warrants?

That's a key issue for me... "Mob rule" is

... (read more)


Why are online political discussions perceived to contain elevated levels of hostility compared to offline discussions? In this manuscript, we leverage cross-national representative surveys and online behavioral experiments to [test] the mismatch hypothesis regarding this hostility gap. The mismatch hypothesis entails that novel features of online communication technology induce biased behavior and perceptions such that ordinary people are, e.g., less able to regulate negative emotions in online

... (read more)

Yeah, I'm an EA: an Estimated-as-Effective-in-Expectation (in Excess of Endeavors with Equivalent Ends I've Evaluated) Agent with an Audaciously Altruistic Agenda.

5Rob Bensinger
This is being cute, but I do think parsing 'effective altruist' this way makes a bit more sense than tacking on the word 'aspiring' and saying 'aspiring EA'. (Unless you actually are a non-EA who's aspiring to become one.) I'm not an 'aspiring effective altruist'. It's not that I'm hoping to effectively optimize altruistic goals someday. It's that I'm already trying to do that, but I'm uncertain about whether I'm succeeding. It's an ongoing bet, not an aspiration to do something in the future.   'Aspiring rationalist' is better, but it feels at least a little bit artificial or faux-modest to me -- I'm not aspiring to be a rationalist, I'm aspiring to be rational. I feel like rationalism is weight-training, and rationality is the goal. If people are unhealthy, we might use 'health-ism' to refer to a community or a practice for improving health. If everyone is already healthy, it seems fine to say they're healthy but weird to say 'they're healthists'. Why is it an ism? Isn't it just a fact about their physiology?

How would you feel about the creation of a Sequence of Shortform Feeds? (Including this one?) (Not a mod.)

2Rob Bensinger
I can't speak for Rob but I'd be fine with my own shortform feed being included.

In the context of a conversation with Balaji Srinivasan about my AI views snapshot, I asked Nate Soares what sorts of alignment results would impress him, and he said:

example thing that would be relatively impressive to me: specific, comprehensive understanding of models (with the caveat that that knowledge may lend itself more (and sooner) to capabilities before alignment). demonstrated e.g. by the ability to precisely predict the capabilities and quirks of the next generation (before running it)

i'd also still be impressed by simple theories of aimable co

... (read more)

For being too indistinguishable from GPT-3.