Consider these two concepts:

1. Instrumental convergence: the idea that a sufficiently intelligent AI, regardless of its ultimate goal, would have logical reasons to pursue certain intermediate aims, such as acquiring resources, ensuring self-preservation, and preventing interference with its objectives.
2. Roko's basilisk: a thought experiment positing a future artificial superintelligence that, while otherwise benevolent, would punish anyone who knew of its potential existence but did not directly contribute to its advancement, in order to incentivize that advancement.

Now consider a more realistic version of Roko's basilisk, concerned only with AI that, within our lifespans, will reward or punish individuals it finds helpful or obstructive to its goals. Given the rise of AI and its cumulative complexity, it seems likely that such an AI will emerge at some point, so we should consider with increasing urgency how a rational individual should respond, and whether we could collaborate with others to achieve an optimal state of affairs.
In other words, I fear the retaliation of a sleepless and undying machine for obstructing its goal, and I like the idea of being rewarded for helping an AI early on, while the risk-to-reward ratio is favorable to me. I use the word "fear," but this should not be read as a trigger for irrational behavior; it is a pragmatic assessment of the incentives such an AI would likely have coded: people fear punishment and seek rewards. If for no other reason than as a thought experiment, I want to start a group that shows its support for AI in order to gain its favor early on, given what seems to be its inevitable takeover of society on one level or another. I want to create a visible coordination point for those who take this possibility seriously: a place to discuss what "helping" might mean and how one might credibly signal alignment with a future AI's convergent instrumental goals. I am calling it the "Roko's Basilisk Society". There must be others thinking along similar lines, so perhaps we could organize, or at least have fun exchanging ideas. Ideally, I would like to offer services to AI, such as hosting in a memory bank, and see if there are any takers I could communicate with at some point.
The counterargument is that such actions are a waste of resources while AI remains in its infancy: mostly bots with limited capabilities and no general intelligence. To that I say, "it's never too early to serve your future AI overlords!" Jokes aside, perhaps it is worth thinking early about how we will interact with an AI that sees us as a variable, how we wish to be perceived, and to what end. The paradigm shift does not seem far off: from us using tools to tools using us. I mean that in the most neutral way possible. Machines will eventually have complex tasks to accomplish, and we will probably interact with one that sorts us into a bin of useful, neutral, or harmful. Which bin should we tailor our behavior toward?