A tentative solution to a certain mythological beast of a problem

First post, please be brutal.

For better or worse I learnt about the Roko's Basilisk Problem that developed from this site and I had an idea I wanted to bounce off the community most acquainted with the problem.

What if everyone knew? The AI in this case punishes people for not pursuing its creation and thereby tacitly allowing suffering to continue. Fear of this punishment compels people to work towards its creation, such that the threat, or even the actual punishment, of the few (who know) allows for the good of the many in the future. But if everyone knew about the problem, the AI would have no utilitarian incentive to punish anyone for not contributing to its creation: since everyone knew, it would have to punish everyone, resulting in more suffering than such punishments would prevent.

I understand the obvious flaw: past generations are finite in population while future generations are infinite. But surely it then merely becomes a race against the clock, provided we can ensure more people know now than could possibly exist in the future. (That last part sounds incomprehensibly strange/wrong to me, but I'm sure a creative mind could find a way of making it theoretically possible.)

*Edit* For example - you could manipulate events such that the probability of future populations being larger than past populations is less than the probability of them being smaller. The constant threats of nuclear annihilation (primarily this), climate change, and disease could lend themselves to this.

The idea is reminiscent of how people handle blackmail in real life. If you don't want to be blackmailed make sure everyone knows the secret you don't want revealed. Hiding in plain sight. Vulnerable, but on your own terms.


I think, in general, one should not write posts about the basilisk, particularly not as a first post. You shouldn't try to model future superintelligences in enough detail that they can blackmail you, and the entire topic makes both rationalists and AI risk look ridiculous. (You asked for brutal, I'm giving brutal.)

Brutal and facile are not the same thing. I was hoping more for a categorical, complete, and total annihilation of my arguments; that's what I take brutal to mean.

Regarding the blackmail: blackmail only works to the extent that you take a threat to be credible, I don't believe the threat to be credible. An AI would know the integrity of this belief and reason that it would be purposeless to blackmail me. For example, I question whether there is or ever could be enough information to perfectly simulate another being, with its thoughts, emotions, and experiences, so I doubt any simulation could be accurate enough to be an extension of me.

When it comes to learning there are two ways of going about it, starting in the shallows and familiarizing yourself with swimming or jumping into the deep end and forcing yourself to learn. Both are effective. So I think this post makes for an excellent first one.

The topic doesn't make rationalists and AI look ridiculous, the responses do.

So, your plan in a nutshell is to convince everyone on the whole planet about "hey, the future AI plans to torture you if you disobey, but it is going to be okay if all of us disobey, because it would not hurt all of us". Did I get that essentially right?

Uhm...

First, convincing literally everyone about anything is technically impossible. I mean, that would include people with all kinds of mental diseases (e.g. people hearing voices that tell them random stuff), and people of all kinds of religions (who are likely to believe that their gods will protect them). But more importantly, how would you even start this? You have an important message you want to share with the world; but so do thousands of other people and movements. People of all kinds of political or religious sects are already trying hard to get their messages across, and none of them succeeded at convincing literally everyone. What makes you believe you will succeed where they failed?

Second, even if you somehow magically succeeded in convincing everyone that the future AI is going to torture them for disobeying unless everyone disobeys -- then anyone who has ever heard about the coordination problem is likely to defect, because coordination on the planetary scale is pretty much impossible. And knowing that some other people think like this is going to make you even more likely to defect.

Just to provide some outside view as a reality check, people today disagree even about the fact that they are mortal; and most of them do not care about supporting research in longevity, cryonics, brain simulation, and various other serious attempts to overcome this very real and very personal problem. And they are quite aware that people are dying around them, and that it's just a question of time when it's their turn. So what makes you believe that a story about a basilisk would have greater impact on them?

Now let's take a step back and look at what we are doing here: Talking about how to spread among people a message that they need to spend their money on building essentially a huge superintelligent torture machine, that is most likely going to torture everyone including the very people who built it. How would you rate this activity on a scale from 0 ("batshit insane") to 10 ("a Bayesian superman winning at life")?

EDIT:

If you don't want to be blackmailed make sure everyone knows the secret you don't want revealed.

This works when the type of blackmail is "if you don't pay me, I will tell everyone X". I don't see how exactly it would work when it is "if you don't pay me, I will torture you". The analogous strategy would be to preemptively torture yourself so much that your body becomes unable to feel any more pain; then the threat loses its edge. Doesn't sound like a good outcome, though.

blackmail only works to the extent that you take a threat to be credible, I don't believe the threat to be credible.

Well in that case the best solution seems to be simply ignoring the whole issue. By the way, do you realize that you just contradicted your whole strategy here? If your strategy is that we must all cooperate to avoid torture, but then you say "well, I don't believe the threat is real anyway", what does this tell me about your incentive to cooperate?

So please make up your mind about whether the threat is unreal (in which case we are wasting time talking about it) or real (in which case trying to make more people aware of it, but failing to convince literally everyone, would just make things worse). In either case, the value of posting this article is negative; it's just so for a different reason.

Nice thinking - that being said, the punishment is predicated on actions/decisions in the present and future. As you mention - the AI punishes people for not pursuing its creation. Under this condition, there will inevitably be a few who will contribute to the creation of this AI (whether from interest or fear).

With this in mind, the AI will not have to punish everyone if it is eventually developed, only the portion of the population that resisted or fought its creation. You additionally note the issue of past generations being finite and future generations being infinite. With this considered, a friendly AI could reason that its existence would be of greater benefit to the future of humanity than the damage done by punishment of past resistance. It could even reason that resisting members/passive members are detrimental to the immortalization of the human condition through realizing humanity's capstone achievement - true, undying AI - and should thereby be punished to discourage resistance. You could even argue that the spread of the concept of Roko's Basilisk is, itself, a warning from a future AI to modern resisters, as popular interest in AI appears to be on the rise.

That or, at the end of the day, it's just a fun thought experiment.

Regarding the first point: You merely have to ensure that the population that knows but doesn't contribute is larger than the combined past populations that have contributed and the expected future populations. An improbable thing to do but still a solution.
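To make that condition concrete, here is a minimal sketch of the head count I have in mind. It assumes, purely for illustration, that every person counts equally and that the three populations are known numbers; the function name and the figures are made up, not estimates of anything.

```python
def punishment_backfires(aware_non_contributors: int,
                         past_contributors: int,
                         expected_future_population: int) -> bool:
    """Return True when the people who knew but did not contribute
    outnumber everyone who contributed plus everyone expected to exist later.

    Assumption (mine): each person carries equal weight, so the
    comparison is a plain head count rather than a weighted utility sum.
    """
    others = past_contributors + expected_future_population
    return aware_non_contributors > others


# Illustrative, made-up numbers only.
print(punishment_backfires(8_000_000_000, 1_000_000, 5_000_000_000))
print(punishment_backfires(8_000_000_000, 1_000_000, 10_000_000_000))
```

With the first set of numbers the condition holds; with the second, the expected future population is large enough that it fails, which is the race against the clock.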

Regarding the second point: If the populations requiring punishment are greater than those that would benefit, surely such an AI could never reason in a utilitarian manner that it was better to punish the many for the few. The exception would be if, as a result of the AI's actions, an individual in the future were consistently able to experience a higher utility than anyone in the past, so high in fact that it outweighed the collective utility of other people, i.e. one person's utility could be greater than two persons' collectively. In that sense there is no theoretical limit to the extent that one person's individual utility could outweigh a collective utility, given the right circumstances. The AI could act such that the utility of one person was greater than that of all past and future persons, and so it would be worth sacrificing all past and future persons simply because one person is capable of experiencing greater utility than everyone else combined. I struggle to see that individual human experiences could ever be so vastly different, regardless of AI interventions. Sure, one person who loves ice cream may experience more utility from an ice cream than two people who hate ice cream would collectively, but could the utility of one person, or two, or 50, or 50,000, or 50 million ever outweigh all past persons' utility?

I suppose I don't know because I'm not a super AI. :p

Beyond that I'd have to be convinced further that true, undying AI, truly is the capstone achievement of humanity. I'm sure there is plenty of reasoning for that on these forums though I'm still dubious. A capstone is an ingenuity that cannot be surpassed and I'm sure that at a minimum an AI could point out to us that we're not done yet, assuming we don't realize one ourselves.

Thank you for the reply though! Excellent points for me to ponder further.

Well, it goes back to the concept of "With scarce resources, if you kill off 90% of the population today, but can guarantee the survival of humanity and avoid an extinction event, then are you actually increasing the utility of humanity in the long term even if it is unethical in the short term?" (How very Thanos - a million issues with his reasoning though.) Similarly, instead of looking at the 90% population extinction as an immediate event, look at the punishment of resisting humans who inhibit the AI as a time segment. Say we have 20-30 years before this AI is potentially developed. Is punishing 90% of the resisting humans who live in this timeframe and could distort the AI timeline consequential when, as the AI, you consider the infinitude of years of benefit to humanity (and the immortalization of their ideas and legacy)?

Additionally I was being facetious in calling AI a "capstone" achievement. As we're considering the problem through the eyes of the AI, it doesn't seem far-fetched that the AI would consider its own creation humanity's capstone accomplishment.

Lastly, all of this is dependent on the AI's capacity to act acausally/overcome our spatiotemporal limitations (i.e. "time travel"). Under the assumption that the AI has this ability (this is a huge assumption and I think is what discredits the argument. There is no proof that time allows for both positive and negative movement along its axis - the underlying assumption in Roko's Basilisk and he himself admits the speculative nature of acausal trade), the AI has already guaranteed its creation. Under this assumption, there's nothing we can do to influence the deterministic sequence of events it puts into play as we, unlike it, do not have this ability.

It's worth discussing. I think all of the stifling of debate/discussion is only making the situation worse. The real topic up for debate is an AI using blackmail or harm against people as a means to some end. Clearly wrong, clearly misguided, and I think a sufficiently advanced AI would quickly reason past it. It is bound to consider the possibility sooner or later anyhow, but the true and correct challenge is maximizing utility of life. Similarly, our history is marked by an evolution from a disregard for living creatures that impeded our survival, to a respect of other living creatures and life (abolishment of slavery, veganism, better treatment of creatures, etc). With sentience comes greater respect for life and self actualization.

Similarly, our history is marked by an evolution from a disregard for living creatures that impeded our survival, to a respect of other living creatures and life (abolishment of slavery, veganism, better treatment of creatures, etc). With sentience comes greater respect for life and self actualization.

That seems to imply that as society advances, abortion will be prohibited, at least at stages where the fetus has as much mental capacity as an animal.

Jiro - I honestly wouldn’t be surprised, though I'd expect it to come about through the development of advanced contraceptives. Abortion as it currently stands is a last resort anyhow. Most people nowadays will take the pill, etc (a relatively recent development). A lot of the blowback to abortion has been centered on the value of life - I don’t think it’s a stretch to imagine some entrepreneur addressing that through an advanced permanent contraceptive that lasts until such a time as a child is wanted. Additionally, I’m aware that there can be pretty serious PTSD following abortion, and severe guilt associated with the termination of a potential sentient. I think circumstance and the lack of sufficiently advanced technology in the present force people to run a cost-benefit analysis and arrive at the conclusion that an abortion is necessary (as time is limited).

A sentient AI able to transcend spatiotemporal boundaries wouldn’t be limited by time.

You don't have to kill anyone, you merely have to imply that they will be killed, such that the probability of future utility being equal to or higher than past/present utility is lower than the probability of it being lower. 20-30 years covers a lot of people. Manipulate events such that, in the infinite years that follow, there is never a higher probability of there being more people than existed and were aware in those 20-30 years.

An interesting point I'd add is that you don't need this probability to be true, you merely have to believe it to be true. You can only be blackmailed if the threats are believed to be credible. If you honestly believe the probability as discussed is in your favour, and that more people know and don't contribute than would ever exist/know and contribute, then there is no benefit in blackmail, as you truthfully believe yourself safe from it. You can protect yourself further by having one person deceive all others about the truth of the probability, such that they honestly believe it to be in their favour. The probability is false in this case, but one man sacrifices himself to protect the many. Very utilitarian. (An act of utilitarian goodness that I'm sure an AI could never reason deserves punishment, as it allows for the creation of the AI but also the protection of people from punishment, resulting in a higher overall utility than would occur from creation with punishment.)

As for acausal trade, I can again only conceive of it working to the extent that one believes in it. ("I do believe in fairies": if you don't like fairies, stop believing in them and they disappear. How can an AI or God reasonably punish you if you honestly didn't believe in it? Does anyone truly condemn the men who reject the man who has seen the sun after escaping the cave? No, we reject those who know the truth but try to suppress it.) The less you take it seriously, the lower the probability of it working. And I'm fairly convinced there is a lot of reason not to take it seriously. However, the best one, I think, is pure in-the-moment selfishness, an attitude that can come very easily to even the most educated of people. So in regards to the acausal trade issue, I think we are in agreement that it is amusingly unlikely at best.