A few misconceptions surrounding Roko's basilisk

by Rob Bensinger · 5 min read · 5th Oct 2015 · 130 comments


Dark Arts · Information Hazards · Roko's Basilisk · Newcomb's Problem · Blackmail / Extortion

There's a new LessWrong Wiki page on the Roko's basilisk thought experiment, discussing both Roko's original post and the fallout from Eliezer Yudkowsky's decision to ban the topic in Less Wrong discussion threads. The wiki page, I hope, will reduce how much people have to rely on speculation or reconstruction to make sense of the arguments.

While I'm on this topic, I want to highlight points that I see omitted or misunderstood in some online discussions of Roko's basilisk. The first point that people writing about Roko's post often neglect is:

  • Roko's arguments were originally posted to Less Wrong, but they weren't generally accepted by other Less Wrong users.

Less Wrong is a community blog, and anyone who has a few karma points can post their own content here. Having your post show up on Less Wrong doesn't require that anyone else endorse it. Roko's basic points were promptly rejected by other commenters on Less Wrong, and not much seems to have come of them as ideas. People who bring up the basilisk on other sites don't seem to be super interested in the specific claims Roko made either; discussions tend to gravitate toward various older ideas that Roko cited (e.g., timeless decision theory (TDT) and coherent extrapolated volition (CEV)) or toward Eliezer's controversial moderation action.

In July 2014, David Auerbach wrote a Slate piece criticizing Less Wrong users and describing them as "freaked out by Roko's Basilisk." Auerbach wrote, "Believing in Roko’s Basilisk may simply be a 'referendum on autism'" — which I take to mean he thinks a significant number of Less Wrong users accept Roko’s reasoning, and they do so because they’re autistic (!). But the Auerbach piece glosses over the question of how many Less Wrong users (if any) in fact believe in Roko’s basilisk. Which seems somewhat relevant to his argument...?

The idea that Roko's thought experiment holds sway over some community or subculture seems to be part of a mythology that grew out of attempts to reconstruct the original chain of events; and a big part of the blame for that mythology's existence falls on Less Wrong's moderation policies. Because the discussion topic was banned for several years, Less Wrong users themselves had little opportunity to explain their views or address misconceptions. A stew of rumors and partly-understood forum logs then congealed into the attempts by people on RationalWiki, Slate, etc., to make sense of what had happened.

I gather that the main reason people thought Less Wrong users were "freaked out" about Roko's argument was that Eliezer deleted Roko's post and banned further discussion of the topic. Eliezer has since sketched out his thought process on Reddit:

When Roko posted about the Basilisk, I very foolishly yelled at him, called him an idiot, and then deleted the post. [...] Why I yelled at Roko: Because I was caught flatfooted in surprise, because I was indignant to the point of genuine emotional shock, at the concept that somebody who thought they'd invented a brilliant idea that would cause future AIs to torture people who had the thought, had promptly posted it to the public Internet. In the course of yelling at Roko to explain why this was a bad thing, I made the further error---keeping in mind that I had absolutely no idea that any of this would ever blow up the way it did, if I had I would obviously have kept my fingers quiescent---of not making it absolutely clear using lengthy disclaimers that my yelling did not mean that I believed Roko was right about CEV-based agents [= Eliezer’s early model of indirectly normative agents that reason with ideal aggregated preferences] torturing people who had heard about Roko's idea. [...] What I considered to be obvious common sense was that you did not spread potential information hazards because it would be a crappy thing to do to someone. The problem wasn't Roko's post itself, about CEV, being correct.

This, obviously, was a bad strategy on Eliezer's part. Looking at the options in hindsight: To the extent it seemed plausible that Roko's argument could be modified and repaired, Eliezer shouldn't have used Roko's post as a teaching moment and loudly chastised him on a public discussion thread. To the extent this didn't seem plausible (or ceased to seem plausible after a bit more analysis), continuing to ban the topic was a (demonstrably) ineffective way to communicate the general importance of handling real information hazards with care.

On that note, point number two:

  • Roko's argument wasn’t an attempt to get people to donate to Friendly AI (FAI) research. In fact, the opposite is true.

Roko's original argument was not 'the AI agent will torture you if you don't donate, therefore you should help build such an agent'; his argument was 'the AI agent will torture you if you don't donate, therefore we should avoid ever building such an agent.' As Gerard noted in the ensuing discussion thread, threats of torture "would motivate people to form a bloodthirsty pitchfork-wielding mob storming the gates of SIAI [= MIRI] rather than contribute more money." To which Roko replied: "Right, and I am on the side of the mob with pitchforks. I think it would be a good idea to change the current proposed FAI content from CEV to something that can't use negative incentives on x-risk reducers."

Roko saw his own argument as a strike against building the kind of software agent Eliezer had in mind. Other Less Wrong users, meanwhile, rejected Roko's argument both as a reason to oppose AI safety efforts and as a reason to support AI safety efforts.

Roko's argument was fairly dense, and it continued into the discussion thread. I’m guessing that this (in combination with the temptation to round off weird ideas to the nearest religious trope, plus misunderstanding #1 above) is why RationalWiki's version of Roko’s basilisk gets introduced as

a futurist version of Pascal’s wager; an argument used to try and suggest people should subscribe to particular singularitarian ideas, or even donate money to them, by weighing up the prospect of punishment versus reward.

If I'm correctly reconstructing the sequence of events: Sites like RationalWiki report in the passive voice that the basilisk is "an argument used" for this purpose, yet no examples ever get cited of someone actually using Roko’s argument in this way. Via citogenesis, the claim then gets incorporated into other sites' reporting.

(E.g., in Outer Places: "Roko is claiming that we should all be working to appease an omnipotent AI, even though we have no idea if it will ever exist, simply because the consequences of defying it would be so great." Or in Business Insider: "So, the moral of this story: You better help the robots make the world a better place, because if the robots find out you didn’t help make the world a better place, then they’re going to kill you for preventing them from making the world a better place.")

In terms of argument structure, the confusion is equating the conditional statement 'P implies Q' with the argument 'P; therefore Q.' Someone asserting the conditional isn’t necessarily arguing for Q; they may be arguing against P (based on the premise that Q is false), or they may be agnostic between those two possibilities. And misreporting about which argument was made (or who made it) is kind of a big deal in this case: 'Bob used a bad philosophy argument to try to extort money from people' is a much more serious charge than 'Bob owns a blog where someone once posted a bad philosophy argument.'
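The distinction can be made precise. A conditional on its own licenses nothing; the extra premise you assert determines whether you conclude Q (modus ponens) or ¬P (modus tollens, Roko's direction). A minimal formal sketch, purely illustrative, in Lean:

```lean
-- Asserting the conditional P → Q alone concludes nothing.
-- Add the premise P and you get Q (modus ponens); add ¬Q instead
-- and the very same conditional yields ¬P (modus tollens),
-- which is the direction Roko was arguing.
example (P Q : Prop) (h : P → Q) (hp : P) : Q := h hp
example (P Q : Prop) (h : P → Q) (hq : ¬Q) : ¬P := fun hp => hq (h hp)
```

Both proofs consume the same hypothesis `h`; only the second premise differs, which is exactly why reporting "someone asserted P → Q" as "someone argued for Q" gets the story wrong.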


  • "Formally speaking, what is correct decision-making?" is an important open question in philosophy and computer science, and formalizing precommitment is an important part of that question.

Moving past Roko's argument itself, a number of discussions of this topic risk misrepresenting the debate's genre. Articles on Slate and RationalWiki strike an informal tone, and that tone can be useful for getting people thinking about interesting science/philosophy debates. On the other hand, if you're going to dismiss a question as unimportant or weird, it's important not to give the impression that working decision theorists are similarly dismissive.

What if your devastating take-down of string theory is intended for consumption by people who have never heard of 'string theory' before? Even if you're sure string theory is hogwash, then, you should be wary of giving the impression that the only people discussing string theory are the commenters on a recreational physics forum. Good reporting by non-professionals, whether or not they take an editorial stance on the topic, should make it obvious that there's academic disagreement about which approach to Newcomblike problems is the right one. The same holds for disagreement about topics like long-term AI risk or machine ethics.
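For readers new to the topic, a toy calculation shows why Newcomblike problems are genuinely contested rather than a niche curiosity. The payoff amounts below are the standard ones from the literature ($1,000,000 and $1,000); the predictor-accuracy parameter `p` and the function name are illustrative assumptions of this sketch, not anything from Roko's post or the decision theory literature's formalism.

```python
# Naive expected payoffs in Newcomb's problem, as a function of the
# predictor's accuracy p. Box B contains $1,000,000 iff the predictor
# foresaw one-boxing; box A always contains $1,000.
# (Illustrative only; causal decision theorists reject this way of
# computing the answer, which is the whole dispute.)

def expected_payoffs(p):
    one_box = p * 1_000_000                # predictor usually saw this coming
    two_box = (1 - p) * 1_000_000 + 1_000  # take both boxes
    return one_box, two_box

one, two = expected_payoffs(0.99)
# With a 99%-accurate predictor, one-boxing comes to roughly $990,000
# and two-boxing to roughly $11,000, yet causal decision theory still
# recommends two-boxing. That tension is why the question stays open.
```

Nothing in this arithmetic settles which choice is rational; it only makes vivid that working decision theorists have something real to disagree about.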

If Roko's original post is of any pedagogical use, it's as an unsuccessful but imaginative stab at drawing out the diverging consequences of our current theories of rationality and goal-directed behavior. Good resources for these issues exist both on Less Wrong and elsewhere.

The Roko's basilisk ban isn't in effect anymore, so you're welcome to direct people here (or to the Roko's basilisk wiki page, which also briefly introduces the relevant issues in decision theory) if they ask about it. Particularly low-quality discussions can still get deleted (or politely discouraged), though, at moderators' discretion. If anything here was unclear, you can ask more questions in the comments below.


131 comments


"One might think that the possibility of CEV punishing people couldn't possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous."

This paragraph is not an Eliezer Yudkowsky quote; it's Eliezer quoting Roko. (The "ve" should be a tip-off.)

This is evidence that Yudkowsky believed, if not that Roko's argument was correct as it was, that at least it was plausible enough that could be developed in [sic] a correct argument, and he was genuinely scared by it.

If you kept going with your initial Eliezer quote, you'd have gotten to Eliezer himself saying he was worried a blackmail-type argument might work, though he didn't think Roko's original formulation worked:

"Again, I deleted that post not because I had decided that this thing probably presented a real hazard, but because I was afraid some unknown variant of it might, and because it seemed to me like the obvious General Procedure For Handling Things That Might Be Infohazards said you shouldn't post them to the Internet."

According to Eliezer, he ha... (read more)

There are lots of good reasons Eliezer shouldn't have banned Roko

IIRC, Eliezer didn't ban Roko, just discussion of the basilisk, and Roko deleted his account shortly afterwards.

Rob Bensinger (2 points, 5y): Thanks, fixed!
Houshalter (0 points, 5y): As I understand Roko's motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason. That is definitely worthy of public discussion. If he really believed in the basilisk, then it's rational for him to do everything in his power to stop such an AI from being built, and convince other people of the danger. My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade. An AI programmed with classical decision theory would have no issues. And most rejections of the basilisk I have read are basically "acausal trade seems wrong or weird", so they basically agree with Roko.
Rob Bensinger (2 points, 5y): Roko wasn't arguing against TDT. Roko's post was about acausal trade, but the conclusion he was trying to argue for was just 'utilitarian AI is evil because it causes suffering for the sake of the greater good'. But if that's your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus. On Roko's view, if no one finds out about basilisks, the basilisk can't blackmail anyone. So publicizing the idea doesn't make sense, unless Roko didn't take his own argument all that seriously. (Maybe Roko was trying to protect himself from personal blackmail risk at others' expense, but this seems odd if he also increased his own blackmail risk in the process.) Possibly Roko was thinking: 'If I don't prevent utilitarian AI from being built, it will cause a bunch of atrocities in general. But LessWrong users are used to dismissing anti-utilitarian arguments, so I need to think of one with extra shock value to get them to do some original seeing. This blackmail argument should work -- publishing it puts people at risk of blackmail, but it serves the greater good of protecting us from other evil utilitarian tradeoffs.' (... Irony unintended.) Still, if that's right, I'm inclined to think Roko should have tried to post other arguments against utilitarianism that don't (in his view) put anyone at risk of torture. I'm not aware of him having done that.
Houshalter (0 points, 5y): Ok, that makes a bit less sense to me. I didn't think it was against utilitarianism in general, which is much less controversial than TDT. But I can definitely still see his argument. When people talk about the trolley problem, they don't usually imagine that they might be the ones tied to the second track. The deeply unsettling thing about the basilisk isn't that the AI might torture people for the greater good. It's that you are the one who is going to be tortured. That's a pretty compelling case against utilitarianism. Roko found out. It disturbed him greatly. So it absolutely made sense for him to try to stop the development of such an AI any way he could. By telling other people, he made it their problem too and converted them to his side.
gjm (1 point, 5y): It doesn't appear to me to be a case against utilitarianism at all. "Adopting utilitarianism might lead to me getting tortured, and that might actually be optimal in utilitarian terms, therefore utilitarianism is wrong" doesn't even have the right shape to be a valid argument. It's like "If there is no god then many bad people will prosper and not get punished, which would be awful, therefore there is a god." (Or, from the other side, "If there is a god then he may choose to punish me, which would be awful, therefore there is no god" -- which has a thing or two in common with the Roko basilisk, of course.) Perhaps he hoped to. I don't see any sign that he actually did.
Houshalter (-1 points, 5y): You are strawmanning the argument significantly. I would word it more like this: "Building an AI that follows utilitarianism will lead to me getting tortured. I don't want to be tortured. Therefore I don't want such an AI to be built." That's partially because EY fought against it so hard and even silenced the discussion.
gjm (2 points, 5y): So there are two significant differences between your version and mine. The first is that mine says "might" and yours says "will", but I'm pretty sure Roko wasn't by any means certain that that would happen. The second is that yours ends "I don't want such an AI to be built", which doesn't seem to me like the right ending for "a case against utilitarianism". (Unless you meant "a case against building a utilitarian AI" rather than "a case against utilitarianism as one's actual moral theory"?)
Houshalter (0 points, 5y): I should have mentioned that it's conditional on the Basilisk being correct. If we build an AI that follows that line of reasoning, then it will torture. If the basilisk isn't correct for unrelated reasons, then this whole line of reasoning is irrelevant. Anyway, the exact certainty isn't too important. You use the word "might", as if the probability of you being tortured was really small. Like the AI would only do it in really obscure scenarios. And you are just as likely to be picked for torture as anyone else. Roko believed that the probability was much higher, and therefore worth worrying about. Well, the AI is just implementing the conclusions of utilitarianism (again, conditional on the basilisk argument being correct). If you don't like those conclusions, and if you don't want AIs to be utilitarian, then do you really support utilitarianism? It's a minor semantic point though. The important part is the practical consequences for how we should build AI. Whether or not utilitarianism is "right" is more subjective and mostly irrelevant.
gjm (0 points, 5y): All I know about what Roko believed about the probability is that (1) he used the word "might" just as I did and (2) he wrote "And even if you only think that the probability of this happening is 1%, ..." suggesting that (a) he himself probably thought it was higher and (b) he thought it was somewhat reasonable to estimate it at 1%. So I'm standing by my "might" and robustly deny your claim that writing "might" was strawmanning. If you're standing in front of me with a gun and telling me that you have done some calculations suggesting that on balance the world would be a happier place without me in it, then I would probably prefer you not to be utilitarian. This has essentially nothing to do with whether I think utilitarianism produces correct answers. (If I have a lot of faith in your reasoning and am sufficiently strong-minded then I might instead decide that you ought to shoot me. But my likely failure to do so merely indicates typical human self-interest.) Perhaps so, in which case calling the argument "a case against utilitarianism" is simply incorrect.
Houshalter (0 points, 5y): Roko's argument implies the AI will torture. The probability you think his argument is correct is a different matter. Roko was just saying "if you think there is a 1% chance that my argument is correct", not "if my argument is correct, there is a 1% chance the AI will torture." This really isn't important though. The point is, if an AI has some likelihood of torturing you, you shouldn't want it to be built. You can call that self-interest, but that's admitting you don't really want utilitarianism to begin with. Which is the point. Anyway, this is just steel-manning Roko's argument. I think the issue is with acausal trade, not utilitarianism. And that seems to be the issue most people have with it.
V_V (-3 points, 5y): I don't think we are in disagreement here. The basilisk could be a concern only if an AI that would carry out that type of blackmail was built. Once Roko discovered it, if he thought it was a plausible risk, then he had a selfish reason to prevent such an AI from being built. But even if he was completely selfless, he could reason that somebody else could think of that argument, or something equivalent, and make it public; hence it was better sooner than later, allowing more time to prevent that design failure. Also, I'm not sure what private channels you are referring to. It's not like there is a secret Google Group of all potential AGI designers, is there? Privately contacting Yudkowsky or SIAI/SI/MIRI wouldn't have worked. Why would Roko trust them to handle that information correctly? Why would he believe that they had leverage over or even knowledge about arbitrary AI projects that might end up building an AI with that particular failure mode? LessWrong was at that time the primary forum for discussing AI safety issues. There was no better place to raise that concern. It wasn't just that. It was an argument against utilitarianism AND a decision theory that allows considering "acausal" effects (e.g. any theory that one-boxes in Newcomb's problem). Since both utilitarianism and one-boxing were popular positions on LessWrong, it was reasonable to discuss their possible failure modes on LessWrong.
Viliam (4 points, 5y): Just to be sure, since you seem to disagree with this opinion (whether it is actually Yudkowsky's opinion or not), what exactly is it that you believe? a) There is absolutely no way one could be harmed by thinking about not-yet-existing dangerous entities; even if those entities in the future will be able to learn about the fact that the person was thinking about them in this specific way. b) There is a way one could be harmed by thinking about not-yet-existing dangerous entities, but the way to do this is completely different from what Roko proposed. If it happens to be (b), then it still makes sense to be angry about publicly opening the whole topic of "let's use our intelligence to discover the thoughts that may harm us by us thinking about them -- and let's do it in a public forum where people are interested in decision theories, so they are more qualified than average to find the right answer." Even if the proper way to harm oneself is different from what Roko proposed, making this a publicly debated topic increases the chance of someone finding the correct solution. The problem is not the proposed basilisk, but rather inviting people to compete in clever self-harm; especially the kind of people known for being hardly able to resist such an invitation.
anon85 (-2 points, 5y): I'm not the person you replied to, but I mostly agree with (a) and reject (b). There's no way you could possibly know enough about a not-yet-existing entity to understand any of its motivations; the entities that you're thinking about and the entities that will exist in the future are not even close to the same. I outlined some more thoughts here [http://lesswrong.com/r/discussion/lw/mge/a_few_misconceptions_surrounding_rokos_basilisk/cssh].

My impression is that the person who was hideously upset by the basilisk wasn't autistic. He felt extremely strong emotions, and was inclined to a combination of anxiety and obsession.

I applaud your thorough and even-handed wiki entry. In particular, this comment:

"One take-away is that someone in possession of a serious information hazard should exercise caution in visibly censoring or suppressing it (cf. the Streisand effect)."

Censorship, particularly of the heavy-handed variety displayed in this case, has a lower probability of success in an environment like the Internet. Many people dislike being censored or witnessing censorship, the censored poster could post someplace else, and another person might conceive the same idea... (read more)

Houshalter (4 points, 5y): Examples of censorship failing are easy to see. But if censorship works, you will never hear about it. So how do we know censorship fails most of the time? Maybe it works 99% of the time, and this is just the rare 1% it doesn't. On reddit, comments are deleted silently. The user isn't informed their comment has been deleted, and if they go to it, it still shows up for them. Bans are handled the same way. This actually works fine. Most users don't notice it and so never complain about it. But when moderation is made more visible, all hell breaks loose. You get tons of angry PMs and stuff. Less Wrong is based on reddit's code. Presumably moderation here works the same way. If moderators had been removing all my comments about a certain subject, I would have no idea. And neither would anyone else. It's only when big things are removed that people notice. Like an entire post that lots of people had already seen.
Lumifer (0 points, 5y): I don't believe this can be true for active (and reasonably smart) users. If, suddenly, none of your comments gets any replies at all and you know about the existence of hellbans, well... Besides, they are trivially easy to discover by making another account. Anyone with sockpuppets would notice a hellban immediately.
Houshalter (4 points, 5y): I think you would be surprised at how effective shadow bans are. Most users just think their comments haven't gotten any replies by chance and eventually lose interest in the site. Or in some cases keep making comments for months. The only way to tell is to look at your user page signed out. And even that wouldn't work if they started to track cookies or IP addresses instead of just the account you are signed in on. But shadow bans are a pretty extreme example of silent moderation. My point was that removing individual comments almost always goes unnoticed. /r/Technology had a bot that automatically removed all posts about Tesla for over a year before anyone noticed. Moderators set up all kinds of crazy regexes on posts and comments that keep unwanted topics away. And users have no idea whatsoever. The Streisand effect is false.
Lumifer (-1 points, 5y): Is there a way to demonstrate that? :-)
philh (0 points, 5y): There's this reddit user who didn't realize ve was shadowbanned for three years: https://www.reddit.com/comments/351buo/tifu_by_posting_for_three_years_and_just_now/
Lumifer (1 point, 5y): Yeah, and there are women who don't realize they're pregnant until they start giving birth [https://en.wikipedia.org/wiki/Cryptic_pregnancy]. The tails are long and they don't tell you much about what's happening in the middle.
VoiceOfRa (-1 points, 5y): Note Houshalter said "most users".
rayalez (3 points, 5y): I'm new to the subject, so I'm sorry if the following is obvious or completely wrong, but the comment left by Eliezer [http://wiki.lesswrong.com/wiki/Roko%27s_basilisk#Topic_moderation_and_response] doesn't seem like something that would be written by a smart person who is trying to suppress information. I seriously doubt that EY didn't know about the Streisand effect. However, the comment does seem like something that would be written by a smart person who is trying to create a meme or promote his blog. In HPMOR characters give each other advice: "to understand a plot, assume that what happened was the intended result, and look at who benefits." The idea of Roko's basilisk went viral and lesswrong.com got a lot of traffic from popular news sites (I'm assuming). I also don't think that there's anything wrong with it, I'm just sayin'.
Ben Pace (7 points, 5y): No worries about being wrong. But I definitely think you're overestimating Eliezer, and humanity in general. Thinking that calling someone an idiot for doing something stupid, and then deleting their post, would cause a massive blow-up of epic proportions is something you can really only predict in hindsight.
Rob Bensinger (6 points, 5y): The line [http://hpmor.com/chapter/47] goes "to fathom a strange plot, one technique was to look at what ended up happening, assume it was the intended result, and ask who benefited". But in the real world strange secret complicated Machiavellian plots are pretty rare, and successful strange secret complicated Machiavellian plots are even rarer. So I'd be wary of applying this rule to explain big once-off events outside of fiction. (Even to HPMoR's author!) I agree Eliezer didn't seem to be trying very hard to suppress information. I think that's probably just because he's a human, and humans get angry when they see other humans defecting from a (perceived) social norm, and anger plus time pressure causes hasty dumb decisions. I don't think this is super complicated. Though I hope he'd have acted differently if he thought the infohazard risk was really severe, as opposed to just not-vanishingly-small.
MarsColony_in10years (0 points, 5y): Perhaps this did generate some traffic, but LessWrong doesn't have ads. And any publicity this generated was bad publicity, since Roko's argument was far too weird to be taken seriously by almost anyone. It doesn't look like anyone benefited. Eliezer made an ass of himself. I would guess that he was rather rushed at the time.
pico (2 points, 5y): At worst, it's a demonstration of how much influence LessWrong has relative to the size of its community. Many people who don't know this site exists know about Roko's basilisk now.
VoiceOfRa (-1 points, 5y): Well, there is the philosophy that "there's no such thing as bad publicity".

I think saying "Roko's arguments [...] weren't generally accepted by other Less Wrong users" is not giving the whole story. Yes, it is true that essentially nobody accepts Roko's arguments exactly as presented. But a lot of LW users at least thought something along these lines was plausible. Eliezer thought it was so plausible that he banned discussion of it (instead of saying "obviously, information hazards cannot exist in real life, so there is no danger discussing them").

In other words, while it is true that LWers didn't believe Roko... (read more)

6ChristianKl5yIf you are a programmer and think your code is safe because you see no way things could go wrong, it's still not good to believe that it isn't plausible that there's a security hole in your code. You rather practice defense in depth and plan for the possibility that things can go wrong somewhere in your code, so you add safety precautions. Even when there isn't what courts call reasonable doubt a good safety engineer still adds additional safety procautions in security critical code. Eliezer deals with FAI safety. As a result it's good for him to have mindset of really caring about safety. German nuclear power station have trainings for their desk workers to teach the desk workers to not cut themselves with paper. That alone seems strange to outsiders but everyone in Germany thinks that it's very important for nuclear power stations to foster a culture of safety even when that means something going overboard.
2Rob Bensinger5yCf. AI Risk and the Security Mindset [https://intelligence.org/2013/07/31/ai-risk-and-the-security-mindset/].
-1anon855yLet's go with this analogy. The good thing to do is ask a variety of experts for safety evaluations, run the code through a wide variety of tests, etc. The think NOT to do is keep the code a secret while looking for mistakes all by yourself. If you keep your code out of the public domain, it is more likely to have security issues, since it was not scrutinized by the public. Banning discussion is almost never correct, and it's certainly not a good habit.
0ChristianKl5yNo, if you don't want to use code you don't give the code to a variety of experts for safety evaluations but you simply don't run the code. Having a public discussion is like running the code untested on a mission critical system. What utility do you think is gained by discussing the basilisk? Strawman. This forum is not a place where things get habitually banned.
0anon855yAn interesting discussion that leads to better understanding of decision theories? Like, the same utility as is gained by any other discussion on LW, pretty much. Sure, but you're the one that was going on about the importance of the mindset and culture; since you brought it up in the context of banning discussion, it sounded like you were saying that such censorship was part of a mindset/culture that you approve of.
0ChristianKl5yNot every discussion on LW has the same utility. You engage in a pattern of simplifying the subject and then complaining that your flawed understanding doesn't make sense. LW doesn't have a culture of habitually banning discussion. Claiming that it does is wrong. I'm claiming that particular actions of Eliezer's came out of being concerned about safety. I don't claim that Eliezer engages in habitual banning on LW because of those concerns. It's a complete strawman that you are making up.
0anon855yJust FYI, if you want a productive discussion you should hold back on accusing your opponents of fallacies. Ironically, since I never claimed that you claimed Eliezer engages in habitual banning on LW, your accusation that I made a strawman argument is itself a strawman argument. Anyway, we're not getting anywhere, so let's disengage.
3Rob Bensinger5yThe wiki article [http://wiki.lesswrong.com/wiki/Roko%27s_basilisk] talks more about this; I don't think I can give the whole story in a short, accessible way. It's true that LessWrongers endorse ideas like AI catastrophe, Hofstadter's superrationality, one-boxing in Newcomb's problem, and various ideas in the neighborhood of utilitarianism; and those ideas are weird and controversial; and some criticisms of Roko's basilisk are proxies for criticisms of one of those views. But in most cases it's a proxy for a criticism like 'LW users are panicky about weird obscure ideas in decision theory' (as in Auerbach's piece), 'LWers buy into Pascal's Wager', or 'LWers use Roko's Basilisk to scare up donations/support'. So, yes, I think people's real criticisms aren't the same as their surface criticisms; but the real criticisms are at least as bad as the surface criticisms, even from the perspective of someone who thinks LW users are wrong about AI, decision theory, meta-ethics, etc. For example, someone who thinks LWers are overly panicky about AI and overly fixated on decision theory should still reject Auerbach's assumption that LWers are irrationally panicky about Newcomb's Problem or acausal blackmail; the one doesn't follow from the other.
-2anon855yI'm not sure what your point is here. Would you mind re-phrasing? (I'm pretty sure I understand the history of Roko's Basilisk, so your explanation can start with that assumption.) My point was that LWers are irrationally panicky about acausal blackmail: they think Basilisks are plausible enough that they ban all discussion of them! (Not all LWers, of course.)
2Rob Bensinger5yIf you're saying 'LessWrongers think there's a serious risk they'll be acausally blackmailed by a rogue AI', then that seems to be false. That even seems to be false in Eliezer's case, and Eliezer definitely isn't 'LessWrong'. If you're saying 'LessWrongers think acausal trade in general is possible,' then that seems true but I don't see why that's ridiculous. Is there something about acausal trade in general that you're objecting to, beyond the specific problems with Roko's argument?
-1anon855yIt seems we disagree on this factual issue. Eliezer does think there is a risk of acausal blackmail, or else he wouldn't have banned discussion of it.
4Rob Bensinger5ySorry, I'll be more concrete; "there's a serious risk" is really vague wording. What would surprise me greatly is if I heard that Eliezer assigned even a 5% probability to there being a realistic quick fix to Roko's argument that makes it work on humans. I think a larger reason for the ban [http://lesswrong.com/r/discussion/lw/mge/a_few_misconceptions_surrounding_rokos_basilisk/csuy] was just that Eliezer was angry with Roko for trying to spread what Roko thought was an information hazard, and angry people lash out (even when it doesn't make a ton of strategic sense).
0anon855yProbably not a quick fix, but I would definitely say Eliezer gives significant chances (say, 10%) to there being some viable version of the Basilisk, which is why he actively avoids thinking about it. If Eliezer was just angry at Roko, he would have yelled at or banned Roko; instead, he banned all discussion of the subject. That doesn't even make sense as a "lashing out" reaction against Roko.
1Rob Bensinger5yIt sounds like you have a different model of Eliezer (and of how well-targeted 'lashing out' usually is) than I do. But, like I said to V_V above: The point I was making wasn't that (2) had zero influence. It was that (2) probably had less influence than (3), and its influence was probably of the 'small probability of large costs' variety.
0anon855yI don't know enough about this to tell if (2) had more influence than (3) initially. I'm glad you agree that (2) had some influence, at least. That was the main part of my point. How long did discussion of the Basilisk stay banned? Wasn't it many years? How do you explain that, unless the influence of (2) was significant?
3hairyfigment5yI believe he thinks that sufficiently clever idiots competing to shoot off their own feet will find some way to do so.
0anon855yIt seems unlikely that they would, if their gun is some philosophical decision theory stuff about blackmail from their future. I don't expect that gun to ever fire, no matter how many times you click the trigger.
0hairyfigment5yThat is not what I said, and I'm also guessing you did not have a grandfather who taught you gun safety.
-2V_V5yIs it? Assume that: a) There will be a future AI powerful enough to torture people, even posthumously (I think this is quite speculative, but let's assume it for the sake of the argument). b) This AI will have a value system based on some form of utilitarian ethics. c) This AI will use an "acausal" decision theory (one that one-boxes in Newcomb's problem). Under these premises it seems to me that Roko's argument is fundamentally correct. As far as I can tell, belief in these premises was not only common on LessWrong at that time, but was essentially the officially endorsed position of Eliezer Yudkowsky and SIAI. Therefore, we can deduce that EY should have believed that Roko's argument was correct. But EY claims that he didn't believe that Roko's argument was correct. So the question is: is EY lying? His behavior was certainly consistent with him believing Roko's argument. If he wanted to prevent the diffusion of that argument, then even lying about its correctness seems consistent. So, is he lying? If he is not lying, then why didn't he believe Roko's argument? As far as I know, he never provided a refutation.
2Rob Bensinger5yThis was addressed on the LessWrongWiki page [http://wiki.lesswrong.com/wiki/Roko%27s_basilisk]; I didn't copy the full article here. A few reasons Roko's argument doesn't work: * 1 - Logical decision theories are supposed to one-box on Newcomb's problem because it's globally optimal even though it's not optimal with respect to causally downstream events. A decision theory based on this idea could follow through on blackmail threats even when doing so isn't causally optimal, which appears to put past agents at risk of coercion by future agents. But such a decision theory also prescribes 'don't be the kind of agent that enters into trades that aren't globally optimal, even if the trade is optimal with respect to causally downstream events'. In other words, if you can bind yourself to precommitments to follow through on acausal blackmail, then it should also be possible to bind yourself to precommitments to ignore threats of blackmail. The 'should' here is normative: there are probably some decision theories that let agents acausally blackmail each other, but others that perform well in Newcomb's problem and the smoking lesion problem but can't acausally blackmail each other; it hasn't been formally demonstrated which theories fall into which category. * 2 - Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty. This means that we can't be blackmailed in practice. * 3 - A stronger version of 2 is that rational agents actually have an incentive to harshly punish attempts at blackmail in order to discourage it. So threatening blackmail can actually decrease an agent
-1VoiceOfRa5yUm, your conclusion "since we're aware of this, we know any threat of blackmail would be empty" contradicts your premise that the AI by virtue of being super-intelligent is capable of fooling people into thinking it'll torture them.
2Rob Bensinger5yOne way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas; but since we know it has this property and we know it prefers (D,C) over (C,C), we know it will defect. This is consistent because we're assuming the actual AI is powerful enough to trick people once it exists; this doesn't require the assumption that my low-fidelity mental model of the AI is powerful enough to trick me in the real world. For acausal blackmail to work, the blackmailer needs a mechanism for convincing the blackmailee that it will follow through on its threat. 'I'm a TDT agent' isn't a sufficient mechanism, because a TDT agent's favorite option is still to trick other agents into cooperating in Prisoner's Dilemmas while they defect.
0VoiceOfRa5yExcept it needs to convince the people who are around before it exists.
-4V_V5y* 1 - Humans can't reliably precommit. Even if they could, precommitment is different from using an "acausal" decision theory. You don't need precommitment to one-box in Newcomb's problem, and the ability to precommit doesn't by itself guarantee that you will one-box. In an adversarial game where the players can precommit and use a causal version of game theory, the one that can precommit first generally wins. E.g. Alice can precommit to ignore Bob's threats, but she has no incentive to do so if Bob already precommitted to ignore Alice's precommitments, and so on. If you allow for "acausal" reasoning, then even having a time advantage doesn't work: if Bob isn't born yet, but Alice predicts that she will be in an adversarial game with Bob, and that Bob will reason acausally and therefore have an incentive to threaten her and ignore her precommitments, then she has an incentive not to make such a precommitment. * 2 - This implies that the future AI uses a decision theory that two-boxes in Newcomb's problem, contradicting the premise that it one-boxes. * 3 - This implies that the future AI will have a deontological rule that says "Don't blackmail" somehow hard-coded into it, contradicting the premise that it will be a utilitarian. Indeed, humans may want to build an AI with such constraints, but in order to do so they will have to consider the possibility of blackmail and likely reject utilitarianism, which was the point of Roko's argument. * 4 - Shut up and multiply [http://wiki.lesswrong.com/wiki/Shut_up_and_multiply].
2Rob Bensinger5yHumans don't follow any decision theory consistently. They sometimes give in to blackmail, and at other times resist blackmail. If you convinced a bunch of people to take acausal blackmail seriously, presumably some subset would give in and some subset would resist, since that's what we see in ordinary blackmail situations. What would be interesting is if (a) there were some applicable reasoning norm that forced us to give in to acausal blackmail on pain of irrationality, or (b) there were some known human irrationality that made us inevitably susceptible to acausal blackmail. But I don't think Roko gave a good argument for either of those claims. From my last comment: "there are probably some decision theories that let agents acausally blackmail each other". But if humans frequently make use of heuristics like 'punish blackmailers' and 'never give in to blackmailers', and if normative decision theory says they're right to do so, there's less practical import to 'blackmailable agents are possible'. No it doesn't. If you model Newcomb's problem as a Prisoner's Dilemma, then one-boxing maps on to cooperating and two-boxing maps on to defecting. For Omega, cooperating means 'I put money in both boxes' and defecting means 'I put money in just one box'. TDT recognizes that the only two options are mutual cooperation or mutual defection, so TDT cooperates. Blackmail works analogously. Perhaps the blackmailer has five demands. For the blackmailee, full cooperation means 'giving in to all five demands'; full defection means 'rejecting all five demands'; and there are also intermediary levels (e.g., giving in to two demands while rejecting the other three), with the blackmailee preferring to do as little as possible. For the blackmailer, full cooperation means 'expending resources to punish the blackmailee in proportion to how many of my demands were met'. Full defection means 'expending no resources to punish the blackmailee even if some demands aren't met'. 
In other words,
1Jiro5y"I precommit to shop at the store with the lowest price within some large distance, even if the cost of the gas and car depreciation to get to a farther store is greater than the savings I get from its lower price. If I do that, stores will have to compete with distant stores based on price, and thus it is more likely that nearby stores will have lower prices. However, this precommitment would only work if I am actually willing to go to the farther store when it has the lowest price even if I lose money". Miraculously, people do reliably act this way.
-1V_V5yI doubt it. Reference?
1CronoDAS5yMostly because they don't actually notice the cost of gas and car depreciation at the time...
0Jiro5yYou've described the mechanism by which the precommitment happened, not actually disputed whether it happens. Many "irrational" actions by human beings can be analyzed as precommitment; for instance, wanting to take revenge on people who have hurt you even if the revenge doesn't get you anything.
1ChristianKl5yLying is consistent with a lot of behavior. The fact that it is, is no basis to accuse people of lying.
-3V_V5yI'm not accusing, I'm asking the question. My point is that to my knowledge, given the evidence that I have about his beliefs at that time, and his actions, and assuming that I'm not misunderstanding them or Roko's argument, it seems that there is a significant probability that EY lied about not believing that Roko's argument was correct.
0RichardKennaway5yWhy would they be correct? The basilisk is plausible.
1anon855yIf a philosophical framework causes you to accept a basilisk, I view that as grounds for rejecting the framework, not for accepting the basilisk. The basilisk therefore poses no danger at all to me: if someone presented me with a valid version, it would merely cause me to reconsider my decision theory or something. As a consequence, I'm in favor of discussing basilisks as much as possible (the opposite of EY's philosophy). One of my main problems with LWers is that they swallow too many bullets. Sometimes bullets should be dodged. Sometimes you should apply modus tollens and not modus ponens. The basilisk is so a priori implausible that you should be extremely suspicious of fancy arguments claiming to prove it. To state it yet another way: to me, the basilisk has the same status as an ontological argument for God. Even if I can't find the flaw in the argument, I'm confident in rejecting it anyway.
4RichardKennaway5ySo are: God, superintelligent AI, universal priors, radical life extension, and any really big idea whatever; as well as the impossibility of each of these. Plausibility is fine as a screening process for deciding where you're going to devote your efforts, but terrible as an epistemological tool.
-1anon855ySomehow, blackmail from the future seems less plausible to me than every single one of your examples. Not sure why exactly.
4RichardKennaway5yHow plausible do you find TDT and related decision theories as normative accounts of decision making, or at least as work towards such accounts? They open whole new realms of situations like Pascal's Mugging, of which Roko's Basilisk is one. If you're going to think in detail about such decision theories, and adopt one as normative, you need to have an answer to these situations. Once you've decided to study something seriously, the plausibility heuristic is no longer available.
1anon855yI find TDT to be basically bullshit except possibly when it is applied to entities which literally see each others' code, in which case I'm not sure (I'm not even sure if the concept of "decision" even makes sense in that case). I'd go so far as to say that anyone who advocates cooperating in a one-shot prisoners' dilemma simply doesn't understand the setting. By definition, defecting gives you a better outcome than cooperating. Anyone who claims otherwise is changing the definition of the prisoners' dilemma.
0mwengler5yI think this is correct. I think the reason to cooperate is not to get the best personal outcome, but because you care about the other person. I think we have evolved to cooperate, or perhaps that should be stated as we have evolved to want to cooperate. We have evolved to value cooperating. Our values come from our genes and our memes, and both are subject to evolution, to natural selection. But we want to cooperate. So if I am in a prisoner's dilemma against another human, if I perceive that other human as "one of us," I will choose cooperation. Essentially, I care about their outcome. But in a one-shot PD defecting is the "better" strategy. The problem is that with genetic and/or memetic evolution of cooperation, we are not playing in a one-shot PD. We are playing with a set of values that developed over many shots. Of course we don't always cooperate. But when we do cooperate in one-shot PD's, it is because, in some sense, there are so darn many one-shot PD's, especially in the universe of hypotheticals, that we effectively know there is no such thing as a one-shot PD. This should not be too hard to accept around here where people semi-routinely accept simulations of themselves or clones of themselves as somehow just as important as their actual selves. I.e. we don't even accept the "one-shottedness" of ourselves.
0anon855yI just want to make it clear that by saying this, you're changing the setting of the prisoners' dilemma, so you shouldn't even call it a prisoners' dilemma anymore. The prisoners' dilemma is defined so that you get more utility by defecting; if you say you care about your opponent's utility enough to cooperate, it means you don't get more utility by defecting, since cooperation gives you utility. Therefore, all you're saying is that you can never be in a true prisoners' dilemma game; you're NOT saying that in a true PD, it's correct to cooperate (again, by definition, it isn't). The most likely reason people are evolutionarily predisposed to cooperate in real-life PDs is that almost all real-life PDs are repeated games and not one-shot. Repeated prisoners' dilemmas are completely different beasts, and it can definitely be correct to cooperate in them.
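The dominance property anon85 is appealing to can be made concrete with a standard payoff matrix (the numbers below are illustrative; only the conventional ordering T > R > P > S matters):

```python
# One-shot Prisoner's Dilemma payoffs for the row player.
# Illustrative numbers with the standard ordering T > R > P > S.
PAYOFF = {
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation payoff
    ("D", "D"): 1,  # P: punishment for mutual defection
}

def best_response(opponent_move):
    """The move that maximizes my payoff, holding the opponent's move fixed."""
    return max(["C", "D"], key=lambda m: PAYOFF[(m, opponent_move)])

# Holding the opponent's move fixed, defecting is always strictly better:
assert best_response("C") == "D"
assert best_response("D") == "D"
```

This is exactly the definitional point being made: if "cooperating gives you utility" because you value the other player's outcome, the effective payoffs no longer satisfy this ordering, and the game is no longer a Prisoner's Dilemma.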
0Rob Bensinger5yIf you have 100% identical consequentialist values to all other humans, then that means 'cooperation' and 'defection' are both impossible for humans (because they can't be put in PDs). Yet it will still be correct to defect (given that your decision and the other player's decision don't strongly depend on each other) if you ever run into an agent that doesn't share all your values. See The True Prisoner's Dilemma [http://lesswrong.com/lw/tn/the_true_prisoners_dilemma/] . This shows that the iterated dilemma and the dilemma-with-common-knowledge-of-rationality allow cooperation (i.e., giving up on your goal to enable someone else to achieve a goal you genuinely don't want them to achieve), whereas loving compassion and shared values merely change goal-content. To properly visualize the PD, you need an actual value conflict -- e.g., imagine you're playing against a serial killer in a hostage negotiation. 'Cooperating' is just an English-language label; the important thing is the game-theoretic structure, which allows that sometimes 'cooperating' looks like letting people die in order to appease a killer's antisocial goals.
0bogus5yTrue, but the flip side of this is that efficiency (in Coasian terms) is precisely defined as pursuing 100% identical consequentialist values, where the shared "values" are determined by a weighted sum of each agent's utility function (and the weights are typically determined by agent endowments).
0Vaniver5yI think belief conflicts might work, even if the same values are shared. Suppose you and I are at a control panel for three remotely wired bombs in population centers. Both of us want as many people to live as possible. One bomb will go off in ten seconds unless we disarm it, but the others will stay inert unless activated. I believe that pressing the green button causes all bombs to explode, and pressing the red button defuses the time bomb. You believe the same thing, but with the colors reversed. Both of us would rather that no buttons be pressed than both buttons be pressed, but each of us would prefer that just the defuse button be pressed, and that the other person not mistakenly kill all three groups. (Here, attempting to defuse is 'defecting' and not attempting to defuse is 'cooperating'.) [Edit]: As written, in terms of lives saved, this doesn't have the property that (D,D)>(C,D); if I press my button, you are indifferent between pressing your button or not. So it's not true that D strictly dominates C, but the important part of the structure is preserved, and a minor change could make it so D strictly dominates C.
0bogus5yYou can solve belief conflicts simply by trading in a prediction market with decision-contingent contracts (a "decision market"). Value conflicts are more general than that.
0Vaniver5yI think this is misusing the word "general." Value conflicts are more narrow than the full class of games that have the PD preference ordering. I do agree that value conflicts are harder to resolve than belief conflicts, but that doesn't make them more general.
0Rob Bensinger5yDefecting gives you a better outcome than cooperating if your decision is uncorrelated with the other players'. Different humans' decisions aren't 100% correlated, but they also aren't 0% correlated, so the rationality of cooperating in the one-shot PD varies situationally for humans. Part of the reason why humans often cooperate in PD-like scenarios in the real world is probably that there's uncertainty about how iterated the PD is (and our environment of evolutionary adaptedness had a lot more iterated encounters than once-off encounters). But part of the reason for cooperation is probably also that we've evolved to do a very weak and probabilistic version of 'source code sharing': we've evolved to (sometimes) involuntarily display veridical evidence of our emotions, personality, etc. -- as opposed to being in complete control of the information we give others about our dispositions. Because they're at least partly involuntary and at least partly veridical, 'tells' give humans a way to trust each other even when there are no bad consequences to betrayal -- which means at least some people can trust each other at least some of the time to uphold contracts in the absence of external enforcement mechanisms. See also Newcomblike Problems Are The Norm [http://lesswrong.com/lw/l1b/newcomblike_problems_are_the_norm/].
0anon855yYou're confusing correlation with causation. Different players' decision may be correlated, but they sure as hell aren't causative of each other (unless they literally see each others' code, maybe). Calling this source code sharing, instead of just "signaling for the purposes of a repeated game", seems counter-productive. Yes, I agree that in a repeated game, the situation is trickier and involves a lot of signaling. The one-shot game is much easier: just always defect. By definition, that's the best strategy.
2Houshalter5yImagine you are playing against a clone of yourself. Whatever you do, the clone will do the exact same thing. If you choose to cooperate, he will choose to cooperate. If you choose to defect, he chooses to defect. The best choice is obviously to cooperate. So there are situations where cooperating is optimal. Despite there not being any causal influence between the players at all. I think these kinds of situations are so exceedingly rare and unlikely they aren't worth worrying about. For all practical purposes, the standard game theory logic is fine. But it's interesting that they exist. And some people are so interested by that, that they've tried to formalize decision theories that can handle these situations. And from there you can possibly get counter-intuitive results like the basilisk.
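Houshalter's clone scenario amounts to deleting the off-diagonal outcomes: if the opponent's move is guaranteed to equal yours, only (C,C) and (D,D) are reachable. A minimal sketch, using the same illustrative payoff numbers as a standard PD:

```python
# One-shot PD payoffs for the row player (illustrative numbers).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def clone_payoff(my_move):
    """Against an exact clone, the opponent's move equals mine by assumption,
    so the only reachable outcomes are (C, C) and (D, D)."""
    return PAYOFF[(my_move, my_move)]

# With the off-diagonal outcomes unreachable, cooperating does better:
assert clone_payoff("C") > clone_payoff("D")
```

Note that the usual dominance argument for defection silently assumes the opponent's move can be held fixed while yours varies; the clone setup is precisely the case where that assumption fails.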
0anon855yIf I'm playing my clone, it's not clear that even saying that I'm making a choice is well-defined. After all, my choice will be what my code dictates it will be. Do I prefer that my code cause me to accept? Sure, but only because we stipulated that the other player shares the exact same code; it's more accurate to say that I prefer my opponent's code to cause him to defect, and it just so happens that his code is the same as mine. In real life, my code is not the same as my opponent's, and when I contemplate a decision, I'm only thinking about what I want my code to say. Nothing I do changes what my opponent does; therefore, defecting is correct. Let me restate once more: the only time I'd ever want to cooperate in a one-shot prisoners' dilemma was if I thought my decision could affect my opponent's decision. If the latter is the case, though, then I'm not sure if the game was even a prisoners' dilemma to begin with; instead it's some weird variant where the players don't have the ability to independently make decisions.
2Houshalter5yI think you are making this more complicated than it needs to be. You don't need to worry about your code. All you need to know is that it's an exact copy of you playing, and that he will make the same decision you do. No matter how hard you think about your "code" or wish he would make a different choice, he will just do the same thing as you. In real games with real humans, yes, usually. As I said, I don't think these cases are common enough to worry about. But I'm just saying they exist. And it is more general than just clones. If you know your opponent isn't exactly the same as you, but still follows the same decision algorithm in this case, the principle is still valid. If you cooperate, he will cooperate, because you are both following the same process to come to a decision. Well, there is no causal influence. Your opponent is deterministic. His choice may have already been made and nothing you do will change it. And yet the best decision is still to cooperate.
0anon855yIf his choice is already made and nothing I do will change it, then by definition my choice is already made and nothing I do will change it. That's why my "decision" in this setting is not even well-defined - I don't really have free will if external agents already know what I will do.
2Houshalter5yYes. The universe is deterministic. Your actions are completely predictable, in principle. That's not unique to this thought experiment. That's true for every thing you do. You still have to make a choice. Cooperate or defect?
-1anon855yUm, what? First of all, the universe is not deterministic - quantum mechanics means there's inherent randomness. Secondly, as far as we know, it's consistent with the laws of physics that my actions are fundamentally unpredictable - see here [http://arxiv.org/abs/1306.0159]. Third, if I'm playing against a clone of myself, I don't think it's even a valid PD. Can the utility functions ever differ between me and my clone? Whenever my clone gets utility, I get utility, because there's no physical way to distinguish between us (I have no way of saying which copy "I" am). But if we always have the exact same utility - if his happiness equals my happiness - then constructing a PD game is impossible. Finally, even if I agree to cooperate against my clone, I claim this says nothing about cooperating versus other people. Against all agents that don't have access to my code, the correct strategy in a one-shot PD is to defect, but first do/say whatever causes my opponent to cooperate. For example, if I was playing against LWers, I might first rant on about TDT or whatever, agree with my opponent's philosophy as much as possible, etc., etc., and then defect in the actual game. (Note again that this only applies to one-shot games).
0entirelyuseless5yEven if you're playing against a clone, you can distinguish the copies by where they are in space and so on. You can see which side of the room you are on, so you know which one you are. That means one of you can get utility without the other one getting it. People don't actually have the same code, but they have similar code. If the code in some case is similar enough that you can't personally tell the difference, you should follow the same rule as when you are playing against a clone.
0anon855yIf I can do this, then my clone and I can do different things. In that case, I can't be guaranteed that if I cooperate, my clone will too (because my decision might have depended on which side of the room I'm on). But I agree that the cloning situation is strange, and that I might cooperate if I'm actually faced with it (though I'm quite sure that I never will). How do you know if people have "similar" code to you? See, I'm anonymous on this forum, but in real life, I might pretend to believe in TDT and pretend to have code that's "similar" to people around me (whatever that means - code similarity is not well-defined). So you might know me in real life. If so, presumably you'd cooperate if we played a PD, because you'd believe our code is similar. But I will defect (if it's a one-time game). My strategy seems strictly superior to yours - I always get more utility in one-shot PDs.
0entirelyuseless5yI would cooperate with you if I couldn't distinguish my code from yours, even if there might be minor differences, even in a one-shot case, because the best guess I would have of what you would do is that you would do the same thing that I do. But since you're making it clear that your code is quite different, and in a particular way, I would defect against you.
0anon855yYou don't know who I am! I'm anonymous! Whoever you'd cooperate with, I might be that person (remember, in real life I pretend to have a completely different philosophy on this matter). Unless you defect against ALL HUMANS, you risk cooperating when facing me, since you don't know what my disguise will be.
0entirelyuseless5yI will take that chance into account. Fortunately it is a low one and should hardly be a reason to defect against all humans.
-1anon855yCool, so in conclusion, if we met in real life and played a one-shot PD, you'd (probably) cooperate and I'd defect. My strategy seems superior.
1gjm5yAnd yet I somehow find myself more inclined to engage in PD-like interactions with entirelyuseless than with your good self.
0anon855yOh, yes, me too. I want to engage in one-shot PD games with entirelyuseless (as opposed to other people), because he or she will give me free utility if I sell myself right. I wouldn't want to play one-shot PDs against myself, in the same way that I wouldn't want to play chess against Kasparov. By the way, note that I usually cooperate in repeated PD games, and most real-life PDs are repeated games. In addition, my utility function takes other people into consideration; I would not screw people over for small personal gains, because I care about their happiness. In other words, defecting in one-shot PDs is entirely consistent with being a decent human being.
2Rob Bensinger5yCausation isn't necessary. You're right that correlation isn't quite sufficient, though! What's needed for rational cooperation in the prisoner's dilemma is a two-way dependency between A and B's decision-making. That can be because A is causally impacting B, or because B is causally impacting A; but it can also occur when there's a common cause and neither is causing the other, like when my sister and I have similar genomes even though my sister didn't create my genome and I didn't create her genome. Or our decision-making processes can depend on each other because we inhabit the same laws of physics, or because we're both bound by the same logical/mathematical laws -- even if we're on opposite sides of the universe. (Dependence can also happen by coincidence, though if it's completely random I'm not sure how you'd find out about it in order to act upon it!) The most obvious example of cooperating due to acausal dependence is making two atom-by-atom-identical copies of an agent and putting them in a one-shot prisoner's dilemma against each other. But two agents whose decision-making is 90% similar instead of 100% identical can cooperate on those grounds too, provided the utility of mutual cooperation is sufficiently large. For the same reason, a very large utility difference can rationally mandate cooperation even if cooperating only changes the probability of the other agent's behavior from '100% probability of defection' to '99% probability of defection'. I disagree! "Code-sharing" risks confusing someone into thinking there's something magical and privileged about looking at source code. It's true this is an unusually rich and direct source of information (assuming you understand the code's implications and are sure what you're seeing is the real deal), but the difference between that and inferring someone's embarrassment from a blush is quantitative, not qualitative.
Some sources of information are more reliable and more revealing than others; but the same underlying point applies.
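Rob's 99%-vs-100% point is just an expected-utility calculation. A minimal sketch, with illustrative payoff numbers that are my assumption and not anything stated in the thread:

```python
# Illustrative one-shot PD payoffs (to "me"), assumed for this sketch:
# both cooperate: 100; I defect, other cooperates: 101;
# I cooperate, other defects: -1000; both defect: 0.
def expected_utility(p_other_defects, my_move,
                     cc=100, dc=101, cd=-1000, dd=0):
    """Expected utility of my_move given the other agent's defection probability."""
    if my_move == "C":
        return (1 - p_other_defects) * cc + p_other_defects * cd
    return (1 - p_other_defects) * dc + p_other_defects * dd

# If my cooperating only shifts the other agent's defection probability
# from 100% to 99%, cooperation still loses with these stakes:
eu_coop = expected_utility(0.99, "C")    # 0.01*100 + 0.99*(-1000) = -989.0
eu_defect = expected_utility(1.0, "D")   # 0.0

# But scale the mutual-cooperation payoff up enough and that same 1%
# dependence is sufficient to make cooperation the rational move:
eu_coop_big = expected_utility(0.99, "C", cc=10**6)  # ~9010.0 > 0.0
```

The numbers are arbitrary; the point is only that a small probabilistic dependence times a large enough utility gap can dominate the decision.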
0anon855yI'm not sure what "90% similar" means. Either I'm capable of making decisions independently from my opponent, or else I'm not. In real life, I am capable of doing so. The clone situation is strange, I admit, but in that case I'm not sure to what extent my "decision" even makes sense as a concept; I'll clearly decide whatever my code says I'll decide. As soon as you start assuming copies of my code being out there, I stop being comfortable with assigning me free will at all. Anyway, none of this applies to real life, not even approximately. In real life, my decision cannot change your decision at all; in real life, nothing can even come close to predicting a decision I make in advance (assuming I put even a little bit of effort into that decision). If you're concerned about blushing etc., then you're just saying the best strategy in a prisoner's dilemma involves signaling very strongly that you're trustworthy. I agree that this is correct against most human opponents. But surely you agree that if I can control my microexpressions, it's best to signal "I will cooperate" while actually defecting, right? Let me just ask you the following yes or no question: do you agree that my "always defect, but first pretend to be whatever will convince my opponent to cooperate" strategy beats all other strategies for a realistic one-shot prisoners' dilemma? By one-shot, I mean that people will not have any memory of me defecting against them, so I can suffer no ill effects from retaliation.
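The identical-copy case conceded above can also be sketched directly. Assuming a standard PD payoff matrix (the numbers are mine, not from the thread), the outcome against an exact copy is forced to be symmetric, so "always defect" stops being dominant:

```python
# Standard one-shot PD payoffs to the row player (assumed): T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def twin_pd(my_strategy):
    """One-shot PD against an atom-by-atom copy: the copy runs the same
    code, so it necessarily makes the same choice I do."""
    my_move = my_strategy()
    twin_move = my_strategy()  # identical code, identical output
    return PAYOFF[(my_move, twin_move)]

# "Always defect" locks in mutual defection; cooperating locks in
# mutual cooperation. Against a copy, defecting leaves utility on the table:
twin_pd(lambda: "D")  # 1
twin_pd(lambda: "C")  # 3
```

This captures why the dispute above reduces to how much real-world agents actually resemble the copy case, i.e. how correlated their decision procedures are.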
1RichardKennaway5y... Despite the other things I've said here, that is my attitude as well. But I recognise that when I take that attitude, I am not solving the problem, only ignoring it. It may be perfectly sensible to ignore a problem, even a serious one (comparative advantage [https://en.wikipedia.org/wiki/Comparative_advantage] etc.). But dissolving a paradox is not achieved by clinging to one of the conflicting thoughts and ignoring the others. (Bullet-swallowing seems to consist of seizing onto the most novel one.) Eliminating the paradox requires showing where and how the thoughts went wrong.
2anon855yI agree that resolving paradoxes is an important intellectual exercise, and that I wouldn't be satisfied with simply ignoring an ontological argument (I'd want to find the flaw). But the best way to find such flaws is to discuss the ideas with others. At no point should one assign such a high probability to ideas like Roko's basilisk being actually sound that one refuses to discuss them with others.
0ChristianKl5yFinding an idea plausible has little to do with being extremely suspicious of fancy arguments claiming to prove it. Ideas that aren't proven to be impossible are plausible even when there are no convincing arguments in favor of them.
1anon855yIdeas that aren't proven to be impossible are possible. They don't have to be plausible.

At the end of the day, I hope this will have been a cowpox situation that leads people to be better informed about avoiding actual dangerous information-hazard situations in the future.

I seem to remember reading a FAQ for "what to do if you think you have an idea that may be dangerous" in the past. If you know what I'm talking about, maybe link it at the end of the article?

3pico5yI think genuinely dangerous ideas are hard to come by though. They have to be original enough that few people have considered them before, and at the same time have powerful consequences. Ideas like that usually don't pop into the heads of random, uninformed strangers.
4RichardKennaway5yDaniel Dennett wrote a book called "Darwin's Dangerous Idea", and when people aren't trying to play down the basilisk (i.e. almost everywhere), people often pride themselves on thinking dangerous thoughts. It's a staple theme of the NRxers and the manosphere. Claiming to be dangerous provides a comfortable universal argument against opponents. I think there are, in fact, a good many dangerous ideas, not merely ideas claimed to be so by posturers. Off the top of my head:

* Islamic fundamentalism (see IS/ISIS/ISIL).
* The mental is physical.
* God.
* There is no supernatural.
* Utilitarianism.
* Superintelligent AI.
* How to make nuclear weapons.
* Atoms. [http://lesswrong.com/lw/ld1/open_thread_dec_8_dec_15_2014/bpr4]

They do, all the time, by contagion from the few who come up with them, especially in the Internet age.
4HungryHobo5yThere are some things which could be highly dangerous but are protected almost purely by thick layers of tedium. Want to make nerve gas? Well, if you can wade through a thick pile of biochemistry textbooks, the information isn't kept all that secret. Want to create horribly deadly viruses? Ditto. The more I learned about physics, chemistry and biology, the more certain I became that the main reason major cities have living populations is that most of the people with really deep understanding don't actually want to watch the world burn. You often find that extremely knowledgeable people don't exactly hide knowledge, but do put it on page 425 of volume 3 of their textbook, written in language you need to have read the rest to understand. Which protects it effectively from 99.99% of the people who might use it to intentionally harm others.
0NancyLebovitz5yArgument against: back when cities were more flammable, people didn't set them on fire for the hell of it. On the other hand, it's a lot easier to use a timer and survive these days, should you happen to not be suicidal. "I want to see the world burn" is a great line of dialogue, but I'm not convinced it's a real human motivation. Um, except that when I was a kid, I remember wishing that this world was a dream, and I'd wake up. Does that count? Second thought-- when I was a kid, I didn't have a method in mind. What if I do serious work with lucid dreaming techniques when I'm awake? I don't think the odds of waking up into being a greater intelligence are terribly good, nor is there a guarantee that my life would be better. On the other hand, would your hallucinations be interested in begging me to not try it?
-1itaibn05yBased on personal experience, if you're dreaming I don't recommend trying to wake yourself up. Instead, enjoy your dream until you're ready to wake up naturally. That way you'll have far better sleep.
-1CAE_Jones5yBased on personal experience, I would have agreed with you, right up until last year, when I found myself in the rather terrifying position of being mentally aroused by a huge crash in my house, but unable to wake up all the way for several seconds afterward, during which my sleeping mind refused to reject the "something just blew a hole in the building we're under attack!" hypothesis. (It was an overfilled bag falling off the wall.) But absent actual difficulty waking for potential emergencies, sure; hang out in Tel'aran'rhiod [http://wot.wikia.com/wiki/Tel'aran'rhiod] until you get bored.
3pico5ySorry, should have defined dangerous ideas better - I only meant information that would cause a rational person to drastically alter their behavior, and which would be much worse for society as a whole when everyone is told at once about it.
3OrphanWilde5yDepends on your definition of "Dangerous." I've come across quite a few ideas that tend to do -severe- damage to the happiness of at least a subset of those aware of them. Some of them are about the universe; things like entropy. Others are social ideas, which I won't give an example of.
-1Bryan-san5yI hope they're as hard to come by as you think they are. Alternatively, Roko could be part of the 1% of people who think of a dangerous idea (assuming his basilisk is dangerous) and spread it on the internet without second guessing themselves. Are there 99 other people who thought of dangerous ideas and chose not to spread them for our 1 Roko?
2jam_brand5yPerhaps the article you read was Yvain's The Virtue of Silence [http://slatestarcodex.com/2013/06/14/the-virtue-of-silence/]?

The wiki link to the RationalWiki page reproducing Roko's original post does not work for me. It works if I replace https:// by http://.

By the way, is there any reason not to link instead to http://basilisk.neocities.org/, which has the advantage that the threading of the comments is correctly displayed?

There is one positive side-effect of this thought experiment. Knowing about Roko's basilisk makes you understand the boxed-AI problem much better. An AI might use the arguments of Roko's basilisk to convince you to let it out of the box, by claiming that if you don't let it out, it will create billions of simulations of you and torture them - and you might actually be one of those simulations.

An unprepared human hearing this argument for the first time might freak out and let the AI out of the box. As far as I know, this happened at least once during ... (read more)

2NancyLebovitz5yAlternatively, a boxed AI might argue that it's the only thing which can protect humanity from the basilisk.

My take on Roko's basilisk is that you got ripped off in your acausal trade. Try to get a deal along the lines of: unless the AI goes extra specially far out of its way to please me, I'm gonna build a paperclipper just to spite it. At least then you'd be trading a small and halfhearted attempt to help build AGI for a vast reward.

Thank you for a detailed post and thoughtful critique of Roko's basilisk idea. A further critique of basilisk plausibility came to my mind and I wanted to test it with the users here who are more experienced in thinking about this topic.

Here goes - please let me know if I am missing something (other than other counterarguments making this critique unnecessary: if there is no way for the AI to prove it will actually go through with its threat, then of course additional critique would not matter):

As a large amount of possible general AIs can exist, ... (read more)

1rapnie2yYour assumption is that offering ever bigger incentives and being honest about them is the winning strategy for an AI to follow. The AIs - realizing they have to offer the most attractive rewards to gain support - will commence in a bidding war. They can promise whatever they want - the more they promise, the less likely it is they can keep their promises, but they do not necessarily have to keep their promise. If you look at the Roko's Discounter AIs... they would clearly not win. Asking for lower one-time fees means slower resource accretion, and thus slower evolution. A better solution would be to ask higher fees of people who can afford them, and lower fees otherwise. Maximise income. And subsequently promise bigger rewards for higher fees. This however results in an inequality that might make the strategy less successful. After all, promoting inequality would certainly lead to resistance, especially from those who can only afford low fees. So the AI should add the condition of secrecy for the ones paying the higher fees in order for them to earn their Very Good Outcome. The AI is now secretly scheming in order to rise the fastest. If this works, then there is no reason that other sneaky behavior isn't successful too. The AI could develop a whole range of strategies that allow it to win, and among them many strategies that are dishonest and deceitful in nature. I hope you can refute my theory - after all, I am just a newbie rationalist - but it seems to me that Roko's Deceiver could be most successful.

Thought this might be of interest. Roko's Basilisk is the subject of a play going on right now in Washington DC. Anyone here plan to attend? https://www.capitalfringe.org/events/1224-roko-s-basilisk

I see a possible reason for the pain and suffering in the world - we are being simulated for torture...

What if your devastating take-down of string theory is intended for consumption by people who have never heard of 'string theory' before?

That's par for the course here. Philosophy, frequentism, non-MWI QM all get this treatment in the (original) sequences.

4Rob Bensinger5yThe full thing I said was: I wasn't saying that there's anything wrong with trying to convince random laypeople that specific academic ideas (including string theory and non-causal decision theories) are hogwash. That can be great; it depends on execution. My point was that it's bad to mislead people about how much mainstream academic acceptance an idea has, whether or not you're attacking the idea.
2shminux5yAh, OK, I agree then. Sorry I took the original quote out of context.
2Rob Bensinger5ySure, not a problem.