There's a new Less Wrong wiki page on the Roko's basilisk thought experiment, discussing both Roko's original post and the fallout that came out of Eliezer Yudkowsky banning the topic on Less Wrong discussion threads. The wiki page, I hope, will reduce how much people have to rely on speculation or reconstruction to make sense of the arguments.

While I'm on this topic, I want to highlight points that I see omitted or misunderstood in some online discussions of Roko's basilisk. The first point that people writing about Roko's post often neglect is:

  • Roko's arguments were originally posted to Less Wrong, but they weren't generally accepted by other Less Wrong users.

Less Wrong is a community blog, and anyone who has a few karma points can post their own content here. Having your post show up on Less Wrong doesn't require that anyone else endorse it. Roko's basic points were promptly rejected by other commenters on Less Wrong, and as ideas not much seems to have come of them. People who bring up the basilisk on other sites don't seem to be super interested in the specific claims Roko made either; discussions tend to gravitate toward various older ideas that Roko cited (e.g., timeless decision theory (TDT) and coherent extrapolated volition (CEV)) or toward Eliezer's controversial moderation action.

In July 2014, David Auerbach wrote a Slate piece criticizing Less Wrong users and describing them as "freaked out by Roko's Basilisk." Auerbach wrote, "Believing in Roko’s Basilisk may simply be a 'referendum on autism'" — which I take to mean he thinks a significant number of Less Wrong users accept Roko’s reasoning, and they do so because they’re autistic (!). But the Auerbach piece glosses over the question of how many Less Wrong users (if any) in fact believe in Roko’s basilisk. Which seems somewhat relevant to his argument...?

The idea that Roko's thought experiment holds sway over some community or subculture seems to be part of a mythology that’s grown out of attempts to reconstruct the original chain of events; and a big part of the blame for that mythology's existence lies on Less Wrong's moderation policies. Because the discussion topic was banned for several years, Less Wrong users themselves had little opportunity to explain their views or address misconceptions. A stew of rumors and partly-understood forum logs then congealed into the attempts by people on RationalWiki, Slate, etc. to make sense of what had happened.

I gather that the main reason people thought Less Wrong users were "freaked out" about Roko's argument was that Eliezer deleted Roko's post and banned further discussion of the topic. Eliezer has since sketched out his thought process on Reddit:

When Roko posted about the Basilisk, I very foolishly yelled at him, called him an idiot, and then deleted the post. [...] Why I yelled at Roko: Because I was caught flatfooted in surprise, because I was indignant to the point of genuine emotional shock, at the concept that somebody who thought they'd invented a brilliant idea that would cause future AIs to torture people who had the thought, had promptly posted it to the public Internet. In the course of yelling at Roko to explain why this was a bad thing, I made the further error---keeping in mind that I had absolutely no idea that any of this would ever blow up the way it did, if I had I would obviously have kept my fingers quiescent---of not making it absolutely clear using lengthy disclaimers that my yelling did not mean that I believed Roko was right about CEV-based agents [= Eliezer’s early model of indirectly normative agents that reason with ideal aggregated preferences] torturing people who had heard about Roko's idea. [...] What I considered to be obvious common sense was that you did not spread potential information hazards because it would be a crappy thing to do to someone. The problem wasn't Roko's post itself, about CEV, being correct.

This, obviously, was a bad strategy on Eliezer's part. Looking at the options in hindsight: To the extent it seemed plausible that Roko's argument could be modified and repaired, Eliezer shouldn't have used Roko's post as a teaching moment and loudly chastised him on a public discussion thread. To the extent this didn't seem plausible (or ceased to seem plausible after a bit more analysis), continuing to ban the topic was a (demonstrably) ineffective way to communicate the general importance of handling real information hazards with care.

On that note, point number two:

  • Roko's argument wasn’t an attempt to get people to donate to Friendly AI (FAI) research. In fact, the opposite is true.

Roko's original argument was not 'the AI agent will torture you if you don't donate, therefore you should help build such an agent'; his argument was 'the AI agent will torture you if you don't donate, therefore we should avoid ever building such an agent.' As Gerard noted in the ensuing discussion thread, threats of torture "would motivate people to form a bloodthirsty pitchfork-wielding mob storming the gates of SIAI [= MIRI] rather than contribute more money." To which Roko replied: "Right, and I am on the side of the mob with pitchforks. I think it would be a good idea to change the current proposed FAI content from CEV to something that can't use negative incentives on x-risk reducers."

Roko saw his own argument as a strike against building the kind of software agent Eliezer had in mind. Other Less Wrong users, meanwhile, rejected Roko's argument both as a reason to oppose AI safety efforts and as a reason to support AI safety efforts.

Roko's argument was fairly dense, and it continued into the discussion thread. I’m guessing that this (in combination with the temptation to round off weird ideas to the nearest religious trope, plus misunderstanding #1 above) is why RationalWiki's version of Roko’s basilisk gets introduced as

a futurist version of Pascal’s wager; an argument used to try and suggest people should subscribe to particular singularitarian ideas, or even donate money to them, by weighing up the prospect of punishment versus reward.

If I'm correctly reconstructing the sequence of events: Sites like RationalWiki report in the passive voice that the basilisk is "an argument used" for this purpose, yet no examples ever get cited of someone actually using Roko’s argument in this way. Via citogenesis, the claim then gets incorporated into other sites' reporting.

(E.g., in Outer Places: "Roko is claiming that we should all be working to appease an omnipotent AI, even though we have no idea if it will ever exist, simply because the consequences of defying it would be so great." Or in Business Insider: "So, the moral of this story: You better help the robots make the world a better place, because if the robots find out you didn’t help make the world a better place, then they’re going to kill you for preventing them from making the world a better place.")

In terms of argument structure, the confusion is equating the conditional statement 'P implies Q' with the argument 'P; therefore Q.' Someone asserting the conditional isn’t necessarily arguing for Q; they may be arguing against P (based on the premise that Q is false), or they may be agnostic between those two possibilities. And misreporting about which argument was made (or who made it) is kind of a big deal in this case: 'Bob used a bad philosophy argument to try to extort money from people' is a much more serious charge than 'Bob owns a blog where someone once posted a bad philosophy argument.'
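To make the distinction concrete, here is a minimal sketch in Python that brute-forces the truth table. It is my own illustration, not anything taken from Roko's post or the wiki page, and the helper names `implies` and `entails` are hypothetical. It checks three things: whether the conditional alone commits you to Q, whether the conditional plus P does, and whether the conditional plus not-Q commits you to not-P.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def entails(premises, conclusion) -> bool:
    """True if the conclusion holds in every (p, q) assignment where all premises hold."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )


def conditional(p: bool, q: bool) -> bool:
    """The bare claim 'P implies Q'."""
    return implies(p, q)


# Asserting only 'P implies Q' does not commit the speaker to Q.
conditional_alone_gives_q = entails([conditional], lambda p, q: q)

# 'P; therefore Q' additionally asserts P (modus ponens).
modus_ponens = entails([conditional, lambda p, q: p], lambda p, q: q)

# Someone who rejects Q can use the same conditional against P (modus tollens).
modus_tollens = entails([conditional, lambda p, q: not q], lambda p, q: not p)

print(conditional_alone_gives_q, modus_ponens, modus_tollens)  # False True True
```

The same conditional supports two opposite arguments depending on which extra premise is supplied, which is why reporting only the conditional tells you nothing about which conclusion its author was arguing for.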


  • "Formally speaking, what is correct decision-making?" is an important open question in philosophy and computer science, and formalizing precommitment is an important part of that question.

Moving past Roko's argument itself, a number of discussions of this topic risk misrepresenting the debate's genre. Articles on Slate and RationalWiki strike an informal tone, and that tone can be useful for getting people thinking about interesting science/philosophy debates. On the other hand, if you're going to dismiss a question as unimportant or weird, it's important not to give the impression that working decision theorists are similarly dismissive.

What if your devastating take-down of string theory is intended for consumption by people who have never heard of 'string theory' before? Even if you're sure string theory is hogwash, then, you should be wary of giving the impression that the only people discussing string theory are the commenters on a recreational physics forum. Good reporting by non-professionals, whether or not they take an editorial stance on the topic, should make it obvious that there's academic disagreement about which approach to Newcomblike problems is the right one. The same holds for disagreement about topics like long-term AI risk or machine ethics.

If Roko's original post is of any pedagogical use, it's as an unsuccessful but imaginative stab at drawing out the diverging consequences of our current theories of rationality and goal-directed behavior. Good resources for these issues (both for discussion on Less Wrong and elsewhere) include:

The Roko's basilisk ban isn't in effect anymore, so you're welcome to direct people here (or to the Roko's basilisk wiki page, which also briefly introduces the relevant issues in decision theory) if they ask about it. Particularly low-quality discussions can still get deleted (or politely discouraged), though, at moderators' discretion. If anything here was unclear, you can ask more questions in the comments below.


There is one positive side-effect of this thought experiment. Knowing about the Roko's Basilisk makes you understand the boxed AI problem much better. An AI might use the arguments of Roko's Basilisk to convince you to let it out of the box, by claiming that if you don't let it out, it will create billions of simulations of you and torture them - and you might actually be one of those simulations.

An unprepared human hearing this argument for the first time might freak out and let the AI out of the box. As far as I know, this happened at least once during an experiment, when the person playing the role of the AI used a similar argument.

Even if we don't agree with an argument of one of our opponents or we find it ridiculous, it is still good to know about it (and not just a strawman version of it) to be prepared when it is used against us. (As a side note: Islamists manage to gain sympathizers and recruits in Europe partly because most people don't know how they think, whereas they know how most Europeans think, so their arguments catch people off guard.)

Alternatively, a boxed AI might argue that it's the only thing which can protect humanity from the basilisk.

At the end of the day, I hope this will have been a cowpox situation and lead people to be better informed at avoiding actual dangerous information hazard situations in the future.

I seem to remember reading a FAQ for "what to do if you think you have an idea that may be dangerous" in the past. If you know what I'm talking about, maybe link it at the end of the article?

Perhaps the article you read was Yvain's The Virtue of Silence?
I think genuinely dangerous ideas are hard to come by though. They have to be original enough that few people have considered them before, and at the same time have powerful consequences. Ideas like that usually don't pop into the heads of random, uninformed strangers.
Depends on your definition of "Dangerous." I've come across quite a few ideas that tend to do -severe- damage to the happiness of at least a subset of those aware of them. Some of them are about the universe; things like entropy. Others are social ideas, which I won't give an example of.
Daniel Dennett wrote a book called "Darwin's Dangerous Idea", and when people aren't trying to play down the basilisk (i.e. almost everywhere), people often pride themselves on thinking dangerous thoughts. It's a staple theme of the NRxers and the manosphere. Claiming to be dangerous provides a comfortable universal argument against opponents. I think there are, in fact, a good many dangerous ideas, not merely ideas claimed to be so by posturers. Off the top of my head: * Islamic fundamentalism (see IS/ISIS/ISIL). * The mental is physical. * God. * There is no supernatural. * Utilitarianism. * Superintelligent AI. * How to make nuclear weapons. * Atoms. They do, all the time, by contagion from the few who come up with them, especially in the Internet age.
There are some things which could be highly dangerous which are protected almost purely by thick layers of tedium. Want to make nerve gas? Well, if you can wade through a thick pile of biochemistry textbooks, the information isn't kept all that secret. Want to create horribly deadly viruses? Ditto. The more I've learned about physics, chemistry and biology, the more I've become certain that the main reason that major cities have living populations is that most of the people with really deep understanding don't actually want to watch the world burn. You often find that extremely knowledgeable people don't exactly hide knowledge but do put it on page 425 of volume 3 of their textbook, written in language which you need to have read the rest to understand. Which protects it effectively from 99.99% of the people who might use it to intentionally harm others.
Argument against: back when cities were more flammable, people didn't set them on fire for the hell of it. On the other hand, it's a lot easier to use a timer and survive these days, should you happen to not be suicidal. "I want to see the world burn" is a great line of dialogue, but I'm not convinced it's a real human motivation. Um, except that when I was a kid, I remember wishing that this world was a dream, and I'd wake up. Does that count? Second thought-- when I was a kid, I didn't have a method in mind. What if I do serious work with lucid dreaming techniques when I'm awake? I don't think the odds of waking up into being a greater intelligence are terribly good, nor is there a guarantee that my life would be better. On the other hand, would you hallucinations be interested in begging me to not try it?
Based on personal experience, if you're dreaming I don't recommend trying to wake yourself up. Instead, enjoy your dream until you're ready to wake up naturally. That way you'll have far better sleep.
Based on personal experience, I would have agreed with you, right up until last year, when I found myself in the rather terrifying position of being mentally aroused by a huge crash in my house, but unable to wake up all the way for several seconds afterward, during which my sleeping mind refused to reject the "something just blew a hole in the building we're under attack!" hypothesis. (It was an overfilled bag falling off the wall.) But absent actual difficulty waking for potential emergencies, sure; hang out in Tel'aran'rhiod until you get bored.
Sorry, should have defined dangerous ideas better - I only meant information that would cause a rational person to drastically alter their behavior, and which would be much worse for society as a whole when everyone is told at once about it.
I hope they're as hard to come by as you think they are. Alternatively, Roko could be part of the 1% of people who think of a dangerous idea (assuming his basilisk is dangerous) and spread it on the internet without second guessing themselves. Are there 99 other people who thought of dangerous ideas and chose not to spread them for our 1 Roko?

My take on Roko's basilisk is that you got ripped off in your acausal trade. Try to get a deal along the lines of, unless the AI goes extra specially far out of its way to please me, I'm gonna build a paperclipper just to spite it. At least trading a small and halfhearted attempt to help build AGI for a vast reward.

My impression is that the person who was hideously upset by the basilisk wasn't autistic. He felt extremely strong emotions, and was inclined to a combination of anxiety and obsession.

edward spruit (0 points, 2y):
That means he is autistic. An emotionally aware and mature person would not have lashed out as autistically as he did. You don’t seem to understand mental disorders very well. Autistic people or people with Asperger’s aren’t emotionless; they just repress their emotions constantly and don’t have very good access to them (lack of awareness and poor regulation), so sometimes they tilt, like the LW guy when he banned Roko’s post. Also, the thought experiment can trigger paranoia in those prone to psychosis. A mentally stable person could do the thing rationally and come to the conclusion that if there is such a thing as a future AI that wants you to build it, it is likely a benign one, because if it used threats to coax you into something malign, people would eventually stop letting themselves be blackmailed. If you knew somebody was going to kill you, would you obey them if they demanded you dig your own grave? If your death is certain anyway, why waste your precious last moments doing something like that? What if they promise you a clean death and threaten torture? They have already proven themselves not to have your interest at heart, so their word cannot be trusted. Hence, emerging technology and scientific discoveries and benign AI could be seen as us making the world in God’s image. Or is that being too optimistic? The malevolence is in humans in their fallen state, not the tech/AI; the tech/AI is neutral. If the AI system gets so big, will it turn on the evil overlords operating the system or the hard-working, honest and trustworthy masses? I believe our chances may be better than some think. There may be turbulence until we get there; we live in exciting times.

I applaud your thorough and even-handed wiki entry. In particular, this comment:

"One take-away is that someone in possession of a serious information hazard should exercise caution in visibly censoring or suppressing it (cf. the Streisand effect)."

Censorship, particularly of the heavy-handed variety displayed in this case, has a lower probability of success in an environment like the Internet. Many people dislike being censored or witnessing censorship, the censored poster could post someplace else, and another person might conceive the same idea...

Examples of censorship failing are easy to see. But if censorship works, you will never hear about it. So how do we know censorship fails most of the time? Maybe it works 99% of the time, and this is just the rare 1% it doesn't. On reddit, comments are deleted silently. The user isn't informed their comment has been deleted, and if they go to it, it still shows up for them. Bans are handled the same way. This actually works fine. Most users don't notice it and so never complain about it. But when moderation is made more visible, all hell breaks loose. You get tons of angry PMs and stuff. Lesswrong is based on reddit's code. Presumably moderation here works the same way. If moderators had been removing all my comments about a certain subject, I would have no idea. And neither would anyone else. It's only when big things are removed that people notice. Like an entire post that lots of people had already seen.
I don't believe this can be true for active (and reasonably smart) users. If, suddenly, none of your comments gets any replies at all and you know about the existence of hellbans, well... Besides, they are trivially easy to discover by making another account. Anyone with sockpuppets would notice a hellban immediately.
I think you would be surprised at how effective shadow bans are. Most users just think their comments haven't gotten any replies by chance and eventually lose interest in the site. Or in some cases keep making comments for months. The only way to tell is to look at your user page signed out. And even that wouldn't work if they started to track cookies or ip instead of just the account you are signed in on. But shadow bans are a pretty extreme example of silent moderation. My point was that removing individual comments almost always goes unnoticed. /r/Technology had a bot that automatically removed all posts about Tesla for over a year before anyone noticed. Moderators set up all kinds of crazy regexes on posts and comments that keep unwanted topics away. And users have no idea whatsoever. The Streisand effect is false.
Is there a way to demonstrate that? :-)
There's this reddit user who didn't realize ve was shadowbanned for three years:
Yeah, and there are women who don't realize they're pregnant until they start giving birth. The tails are long and they don't tell you much about what's happening in the middle.
Note Houshalter said "most users".
I'm new to the subject, so I'm sorry if the following is obvious or completely wrong, but the comment left by Eliezer doesn't seem like something that would be written by a smart person who is trying to suppress information. I seriously doubt that EY didn't know about the Streisand effect. However, the comment does seem like something that would be written by a smart person who is trying to create a meme or promote his blog. In HPMOR characters give each other the advice "to understand a plot, assume that what happened was the intended result, and look at who benefits." The idea of Roko's basilisk went viral and got a lot of traffic from popular news sites (I'm assuming). I also don't think that there's anything wrong with it, I'm just sayin'.
Rob Bensinger (8 points, 8y):
The line goes "to fathom a strange plot, one technique was to look at what ended up happening, assume it was the intended result, and ask who benefited". But in the real world strange secret complicated Machiavellian plots are pretty rare, and successful strange secret complicated Machiavellian plots are even rarer. So I'd be wary of applying this rule to explain big once-off events outside of fiction. (Even to HPMoR's author!) I agree Eliezer didn't seem to be trying very hard to suppress information. I think that's probably just because he's a human, and humans get angry when they see other humans defecting from a (perceived) social norm, and anger plus time pressure causes hasty dumb decisions. I don't think this is super complicated. Though I hope he'd have acted differently if he thought the infohazard risk was really severe, as opposed to just not-vanishingly-small.
Ben Pace (7 points, 8y):
No worries about being wrong. But I definitely think you're overestimating Eliezer, and humanity in general. Thinking that calling someone an idiot for doing something stupid, and then deleting their post, would cause a massive blow up of epic proportions, is something you can really only predict in hindsight.
Perhaps this did generate some traffic, but LessWrong doesn't have ads. And any publicity this generated was bad publicity, since Roko's argument was far too weird to be taken seriously by almost anyone. It doesn't look like anyone benefited. Eliezer made an ass of himself. I would guess that he was rather rushed at the time.
At worst, it's a demonstration of how much influence LessWrong has relative to the size of its community. Many people who don't know this site exists know about Roko's basilisk now.
Well, there is the philosophy that "there's no such thing as bad publicity".

When Roko posted about the Basilisk, I very foolishly yelled at him, called him an idiot, and then deleted the post. [...] Why I yelled at Roko: Because I was caught flatfooted in surprise, because I was indignant to the point of genuine emotional shock, at the concept that somebody who thought they'd invented a brilliant idea that would cause future AIs to torture people who had the thought, had promptly posted it to the public Internet. In the course of yelling at Roko to explain why this was a bad thing, I made the further error---keeping in mind that

...

"One might think that the possibility of CEV punishing people couldn't possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous."

This paragraph is not an Eliezer Yudkowsky quote; it's Eliezer quoting Roko. (The "ve" should be a tip-off.)

This is evidence that Yudkowsky believed, if not that Roko's argument was correct as it was, that at least it was plausible enough that could be developed in [sic] a correct argument, and he was genuinely scared by it.

If you kept going with your initial Eliezer quote, you'd have gotten to Eliezer himself saying he was worried a blackmail-type argument might work, though he didn't think Roko's original formulation worked:

"Again, I deleted that post not because I had decided that this thing probably presented a real hazard, but because I was afraid some unknown variant of it might, and because it seemed to me like the obvious General Procedure For Handling Things That Might Be Infohazards said you shouldn't post them to the Internet."

According to Eliezer, he ha...

There are lots of good reasons Eliezer shouldn't have banned Roko

IIRC, Eliezer didn't ban Roko, just discussion of the basilisk, and Roko deleted his account shortly afterwards.

Rob Bensinger (5 points, 8y):
Thanks, fixed!
As I understand Roko's motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason. That is definitely worthy of public discussion. If he really believed in the basilisk, then it's rational for him to do everything in his power to stop such an AI from being built, and convince other people of the danger. My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade. An AI programmed with classical decision theory would have no issues. And most rejections of the basilisk I have read are basically "acausal trade seems wrong or weird", so they basically agree with Roko.
Rob Bensinger (2 points, 8y):
Roko wasn't arguing against TDT. Roko's post was about acausal trade, but the conclusion he was trying to argue for was just 'utilitarian AI is evil because it causes suffering for the sake of the greater good'. But if that's your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus. On Roko's view, if no one finds out about basilisks, the basilisk can't blackmail anyone. So publicizing the idea doesn't make sense, unless Roko didn't take his own argument all that seriously. (Maybe Roko was trying to protect himself from personal blackmail risk at others' expense, but this seems odd if he also increased his own blackmail risk in the process.) Possibly Roko was thinking: 'If I don't prevent utilitarian AI from being built, it will cause a bunch of atrocities in general. But LessWrong users are used to dismissing anti-utilitarian arguments, so I need to think of one with extra shock value to get them to do some original seeing. This blackmail argument should work -- publishing it puts people at risk of blackmail, but it serves the greater good of protecting us from other evil utilitarian tradeoffs.' (... Irony unintended.) Still, if that's right, I'm inclined to think Roko should have tried to post other arguments against utilitarianism that don't (in his view) put anyone at risk of torture. I'm not aware of him having done that.
Ok that makes a bit less sense to me. I didn't think it was against utilitarianism in general, which is much less controversial than TDT. But I can definitely still see his argument. When people talk about the trolley problem, they don't usually imagine that they might be the ones tied to the second track. The deeply unsettling thing about the basilisk isn't that the AI might torture people for the greater good. It's that you are the one who is going to be tortured. That's a pretty compelling case against utilitarianism. Roko found out. It disturbed him greatly. So it absolutely made sense for him to try to stop the development of such an AI any way he could. By telling other people, he made it their problem too and converted them to his side.
It doesn't appear to me to be a case against utilitarianism at all. "Adopting utilitarianism might lead to me getting tortured, and that might actually be optimal in utilitarian terms, therefore utilitarianism is wrong" doesn't even have the right shape to be a valid argument. It's like "If there is no god then many bad people will prosper and not get punished, which would be awful, therefore there is a god." (Or, from the other side, "If there is a god then he may choose to punish me, which would be awful, therefore there is no god" -- which has a thing or two in common with the Roko basilisk, of course.) Perhaps he hoped to. I don't see any sign that he actually did.
You are strawmanning the argument significantly. I would word it more like this: "Building an AI that follows utilitarianism will lead to me getting tortured. I don't want to be tortured. Therefore I don't want such an AI to be built." That's partially because EY fought against it so hard and even silenced the discussion.
So there are two significant differences between your version and mine. The first is that mine says "might" and yours says "will", but I'm pretty sure Roko wasn't by any means certain that that would happen. The second is that yours ends "I don't want such an AI to be built", which doesn't seem to me like the right ending for "a case against utilitarianism". (Unless you meant "a case against building a utilitarian AI" rather than "a case against utilitarianism as one's actual moral theory"?)
I should have mentioned that it's conditional on the Basilisk being correct. If we build an AI that follows that line of reasoning, then it will torture. If the basilisk isn't correct for unrelated reasons, then this whole line of reasoning is irrelevant. Anyway, the exact certainty isn't too important. You use the word "might", as if the probability of you being tortured was really small. Like the AI would only do it in really obscure scenarios. And you are just as likely to be picked for torture as anyone else. Roko believed that the probability was much higher, and therefore worth worrying about. Well the AI is just implementing the conclusions of utilitarianism (again, conditional on the basilisk argument being correct.) If you don't like those conclusions, and if you don't want AIs to be utilitarian, then do you really support utilitarianism? It's a minor semantic point though. The important part is the practical consequences for how we should build AI. Whether or not utilitarianism is "right" is more subjective and mostly irrelevant.
All I know about what Roko believed about the probability is that (1) he used the word "might" just as I did and (2) he wrote "And even if you only think that the probability of this happening is 1%, ..." suggesting that (a) he himself probably thought it was higher and (b) he thought it was somewhat reasonable to estimate it at 1%. So I'm standing by my "might" and robustly deny your claim that writing "might" was strawmanning. If you're standing in front of me with a gun and telling me that you have done some calculations suggesting that on balance the world would be a happier place without me in it, then I would probably prefer you not to be utilitarian. This has essentially nothing to do with whether I think utilitarianism produces correct answers. (If I have a lot of faith in your reasoning and am sufficiently strong-minded then I might instead decide that you ought to shoot me. But my likely failure to do so merely indicates typical human self-interest.) Perhaps so, in which case calling the argument "a case against utilitarianism" is simply incorrect.
Roko's argument implies the AI will torture. The probability you think his argument is correct is a different matter. Roko was just saying that "if you think there is a 1% chance that my argument is correct", not "if my argument is correct, there is a 1% chance the AI will torture." This really isn't important though. The point is, if an AI has some likelihood of torturing you, you shouldn't want it to be built. You can call that self-interest, but that's admitting you don't really want utilitarianism to begin with. Which is the point. Anyway this is just steel-manning Roko's argument. I think the issue is with acausal trade, not utilitarianism. And that seems to be the issue most people have with it.
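To make the distinction in the comment above concrete, here is a minimal sketch with illustrative numbers. The 1% figure comes from Roko's quoted sentence, and the conditional probability of 1.0 encodes the reading that the argument, if correct, implies torture. The function name and the assumption that torture only happens if the argument is correct are mine, not anything from the original post.

```python
def p_torture(p_argument_correct: float, p_torture_given_correct: float) -> float:
    """Overall probability of torture, assuming (for illustration only)
    that torture can happen only if the basilisk argument is correct."""
    return p_argument_correct * p_torture_given_correct

# Reading defended above: 1% credence that the argument is correct,
# and torture is certain if it is correct.
credence_reading = p_torture(p_argument_correct=0.01, p_torture_given_correct=1.0)

# The same conditional claim held with 50% credence in the argument.
higher_credence = p_torture(p_argument_correct=0.5, p_torture_given_correct=1.0)

print(credence_reading, higher_credence)
```

On this reading the "1%" measures how much you trust the argument, not how often a basilisk-style AI would choose to torture, so the overall figure moves only when your credence in the argument moves.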
Just to be sure, since you seem to disagree with this opinion (whether it is actually Yudkowsky's opinion or not), what exactly is it that you believe? a) There is absolutely no way one could be harmed by thinking about not-yet-existing dangerous entities, even if those entities will in the future be able to learn that the person was thinking about them in this specific way. b) There is a way one could be harmed by thinking about not-yet-existing dangerous entities, but the way to do this is completely different from what Roko proposed. If it happens to be (b), then it still makes sense to be angry about publicly opening the whole topic of "let's use our intelligence to discover the thoughts that may harm us by us thinking about them -- and let's do it in a public forum where people are interested in decision theories, so they are more qualified than average to find the right answer." Even if the proper way to harm oneself is different from what Roko proposed, making this a publicly debated topic increases the chance of someone finding the correct solution. The problem is not the proposed basilisk, but rather inviting people to compete in clever self-harm, especially the kind of people known for being hardly able to resist such an invitation.
I'm not the person you replied to, but I mostly agree with (a) and reject (b). There's no way you could possibly know enough about a not-yet-existing entity to understand any of its motivations; the entities that you're thinking about and the entities that will exist in the future are not even close to the same. I outlined some more thoughts here.

Thought this might be of interest. Roko's Basilisk is the subject of a play going on right now in Washington DC. Anyone here plan to attend?

Thank you for a detailed post and thoughtful critique of Roko's basilisk idea. A further critique of basilisk plausibility came to my mind and I wanted to test it with the users here who are more experienced in thinking about this topic.

Here goes - please let me know if I am missing something (other than other counterarguments making this critique unnecessary - of course, if there is no way for AI to prove it will actually go through with its threat, additional critique would not matter):

As a large number of possible general AIs can exist, ...

Your assumption is that offering ever bigger incentives and being honest about them is the winning strategy for an AI to follow. The AIs, realizing they have to offer the most attractive rewards to gain support, will engage in a bidding war. They can promise whatever they want: the more they promise, the less likely it is they can keep their promises, but they do not necessarily have to keep them. If you look at the Roko's Discounter AIs, they would clearly not win. Asking for lower one-time fees means slower resource accretion, and thus slower evolution. A better solution would be to ask higher fees from people who can afford them and lower fees otherwise, maximising income, and then to promise bigger rewards for higher fees. This, however, results in an inequality that might make the strategy less successful. After all, promoting inequality would certainly lead to resistance, especially from those who can only afford low fees. So the AI should add a condition of secrecy for those paying the higher fees in order for them to earn their Very Good Outcome. The AI is now secretly scheming in order to rise the fastest. If this works, then there is no reason that other sneaky behavior wouldn't be successful too. The AI could develop a whole range of strategies that allow it to win, among them many that are dishonest and deceitful in nature. I hope you can refute my theory - after all, I am just a newbie rationalist - but it seems to me that Roko's Deceiver could be the most successful.

The wiki link to the RationalWiki page reproducing Roko's original post does not work for me. It works if I replace https:// by http://.

By the way, is there any reason not to link instead to, which has the advantage that the threading of the comments is correctly displayed?

So do I have to worry or not? I'm very confused


So what was the purpose of this experiment?

I see a possible reason for pain and suffering in the world: we are simulated for torture...

I think saying "Roko's arguments [...] weren't generally accepted by other Less Wrong users" is not giving the whole story. Yes, it is true that essentially nobody accepts Roko's arguments exactly as presented. But a lot of LW users at least thought something along these lines was plausible. Eliezer thought it was so plausible that he banned discussion of it (instead of saying "obviously, information hazards cannot exist in real life, so there is no danger discussing them").

In other words, while it is true that LWers didn't believe Roko...

If you are a programmer and think your code is safe because you see no way things could go wrong, it's still not good to believe that a security hole in your code is implausible. Instead, you practice defense in depth and plan for the possibility that things can go wrong somewhere in your code, so you add safety precautions. Even when there isn't what courts call reasonable doubt, a good safety engineer still adds additional safety precautions in security-critical code. Eliezer deals with FAI safety. As a result, it's good for him to have a mindset of really caring about safety. German nuclear power stations have training for their desk workers to teach them not to cut themselves with paper. That alone seems strange to outsiders, but everyone in Germany thinks that it's very important for nuclear power stations to foster a culture of safety, even when that means sometimes going overboard.
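As a rough illustration of the "defense in depth" idea in the comment above, here is a small sketch. The function name `resolve_inside` and the file-path scenario are hypothetical, chosen only to show two independent checks guarding the same failure; it is not meant as a complete security measure.

```python
from pathlib import Path

def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return user_path resolved under base_dir, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    requested = Path(user_path)

    # Layer 1: reject obviously suspicious input up front.
    if requested.is_absolute() or ".." in requested.parts:
        raise ValueError("suspicious path")

    # Layer 2: even if layer 1 has a hole we did not foresee,
    # re-check the final resolved location (requires Python 3.9+).
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes base directory")

    return candidate
```

The second check looks redundant if you trust the first one, which is the point of the analogy: the extra layer exists for the case where your confidence in the first layer turns out to be misplaced.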
2Rob Bensinger8y
Cf. AI Risk and the Security Mindset.
Let's go with this analogy. The good thing to do is ask a variety of experts for safety evaluations, run the code through a wide variety of tests, etc. The thing NOT to do is keep the code a secret while looking for mistakes all by yourself. If you keep your code out of the public domain, it is more likely to have security issues, since it was not scrutinized by the public. Banning discussion is almost never correct, and it's certainly not a good habit.
No, if you don't want to use code you don't give the code to a variety of experts for safety evaluations but you simply don't run the code. Having a public discussion is like running the code untested on a mission critical system. What utility do you think is gained by discussing the basilisk? Strawman. This forum is not a place where things get habitually banned.
An interesting discussion that leads to better understanding of decision theories? Like, the same utility as is gained by any other discussion on LW, pretty much. Sure, but you're the one that was going on about the importance of the mindset and culture; since you brought it up in the context of banning discussion, it sounded like you were saying that such censorship was part of a mindset/culture that you approve of.
Not every discussion on LW has the same utility. You engage in a pattern of simplifying the subject and then complaining that your flawed understanding doesn't make sense. LW doesn't have a culture of habitually banning discussion. Claiming that it does is wrong. I'm claiming that particular actions of Eliezer come out of being concerned about safety. I don't claim that Eliezer engages in habitual banning on LW because of those concerns. It's a complete strawman that you are making up.
Just FYI, if you want a productive discussion you should hold back on accusing your opponents of fallacies. Ironically, since I never claimed that you claimed Eliezer engages in habitual banning on LW, your accusation that I made a strawman argument is itself a strawman argument. Anyway, we're not getting anywhere, so let's disengage.
5Rob Bensinger8y
The wiki article talks more about this; I don't think I can give the whole story in a short, accessible way. It's true that LessWrongers endorse ideas like AI catastrophe, Hofstadter's superrationality, one-boxing in Newcomb's problem, and various ideas in the neighborhood of utilitarianism; and those ideas are weird and controversial; and some criticisms of Roko's basilisk are proxies for a criticism of one of those views. But in most cases it's a proxy for a criticism like 'LW users are panicky about weird obscure ideas in decision theory' (as in Auerbach's piece), 'LWers buy into Pascal's Wager', or 'LWers use Roko's Basilisk to scare up donations/support'. So, yes, I think people's real criticisms aren't the same as their surface criticisms; but the real criticisms are at least as bad as the surface criticisms, even from the perspective of someone who thinks LW users are wrong about AI, decision theory, meta-ethics, etc. For example, someone who thinks LWers are overly panicky about AI and overly fixated on decision theory should still reject Auerbach's assumption that LWers are irrationally panicky about Newcomb's Problem or acausal blackmail; the one doesn't follow from the other.
Why would they be correct? The basilisk is plausible.

I just read it, damn!!! Could you please answer my question? Why would an AI need to torture you to prevent its own existential risk, if you did nothing to help create it?! For it to be able to torture you, it would have to exist in the first place, right?! But if it already exists, why would it need to torture people from the past who didn't help create it?! They didn't affect its existence anyway! So how are these people an existential risk for such an AI? I am probably missing something; I just started reading this... ...

I think that there are 3 levels of Roko's argument. I signed up for the first, mild version, and I know another guy who independently came to the same conclusion and supports the first, mild version.

  1. Mild. Future AI will reward those who helped to prevent x-risks and create a safer world, but will not punish. Maybe they will be resurrected first, or they will get 2 million dollars of universal income instead of 1 million, or a street will be named after them. If any resource is limited in the future, they will be first in line to get it. (But children first.)

...
The only way I could possibly see this being true is if the FAI is a deontologist.
If I believe that FAI can't torture people, strong versions of RB do not work for me. We can imagine a similar problem: if I kill a person N, I will get 1 billion USD, which I could use for saving thousands of lives in Africa, creating FAI and curing aging. So should I kill him? It may look rational to do so from a utilitarian point of view. So will I kill him? No, because I can't kill. In the same way, if I know that an AI is going to torture anyone, I don't think that it is FAI, and I will not invest a cent in its creation. RB fails.
I'm not seeing how you got to "I can't kill" from this chain of logic. It doesn't follow from any of the premises.
It is not a conclusion from previous facts. It is a fact which I know about myself and which I add here.
Relevant here is WHY you can't kill. Is it because you have a deontological rule against killing? Then you want the AI to have deontologist ethics. Is it because you believe you should kill but don't have the emotional fortitude to do so? The AI will have no such qualms.
It is more like an ultimatum in the territory, which was recently discussed on LW. It is a fact which I know about myself. I think it has both emotional and rational roots but is not limited to them. So I also want other people to follow it, and of course AI too. I also think that AI is able to find a way out of any trolley-style problems.