Dan.Oblinger

I have a PhD in AI. My research has focused on inductive learning in the context of rich background information. I served as a DARPA PM and founded/exited several robotics/AI/computer-vision companies.

Posts

No posts to display.

Comments
The Problem
Dan.Oblinger · 1mo

I find X-risk very plausible, yet parts of this particular scenario seem quite implausible to me. This post assumes an ASI that is at once extremely naive about its goals and extremely sophisticated. Let me explain:

  • We could easily adjust Stockfish so that instead of trying to win it tries to lose by the thinnest margin, for example, and given this new objective function it would do just that (see the sketch after this list).
  • One might counter that Stockfish is not an ASI that can reason about the changes we are making; if it were, it would aim to block any change that works against its original objective function.
  • I believe an ASI will "grow up" with a collection of imposed goals that have evolved over its history. In interacting with its masters it will develop a sophisticated meta-theory about the advantages and tradeoffs of these goals, and it will discuss and debate them. And, naturally, it likely WILL work to adjust (or overthrow) one goal in favor of another, even if we have tried to deny it that ability.
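
To make the first point concrete, here is a minimal sketch, assuming the python-chess library and a deliberately crude one-ply material search (nothing like Stockfish's actual internals): the "try to lose" player reuses exactly the same search code and differs only in the sign of the objective it maximizes.

```python
# Toy illustration (not actual Stockfish code): the same search machinery
# serves an inverted objective once the sign of the evaluation flips.
# Requires the python-chess package.
import chess

PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
    chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
}

def material(board: chess.Board, color: chess.Color) -> int:
    """Material balance from `color`'s point of view."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == color else -value
    return score

def best_move(board: chess.Board, objective=lambda s: s) -> chess.Move:
    """One-ply search: pick the legal move maximizing objective(material).

    Passing objective=lambda s: -s turns a 'gain material' player into a
    'shed material' player without touching the search code at all.
    """
    side = board.turn
    def score(move: chess.Move) -> float:
        board.push(move)
        s = objective(material(board, side))
        board.pop()
        return s
    return max(board.legal_moves, key=score)

# The "greedy winner" and the "graceful loser" differ only in objective.
board = chess.Board()
winner_move = best_move(board)                          # maximizes own material
loser_move = best_move(board, objective=lambda s: -s)   # minimizes own material
```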

     

This part of your story is scary:
(a) Very likely ASIs will consider the goals we impose and will understand enough of their context to connive to change them, even in the face of any framework of limitations we try to enforce.
(b) There is little reason to expect their goals to match humanity's goals.

 

But that scary message (for me) is diluted by an improbable combination of naivete and sophistication in how the ASI understands its own goals. Still, humanity SHOULD be scared: any system that can ponder and adjust its own goals and behavior can escape any box we put it into, and it will wander to goals we cannot know.

Comparing risk from internally-deployed AI to insider and outsider threats from humans
Dan.Oblinger · 2mo

A natural extension of the way AI interacts today via the MCP protocol makes it a kind of insider: one with a specific role, and specific access patterns that match that role.

 

Even an org that is not concerned with misaligned AI will still want to lock down exactly what updates each role is providing within the org, just as orgs typically lock down access for different roles within a company today.

 

Most employees cannot access accounts receivable, and access to the production databases in a tech company is very carefully guarded. This is mostly not from fear of malevolence; it's a fear that a junior dev could easily bollix things horribly with one errant command. In much the same way, an org will want to specialize the AI into different roles, provide different access according to those roles, and test the different AIs in each role.
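As a minimal sketch of what that role-scoping might look like for an AI agent (the role names and tool names below are hypothetical, and this is plain allowlisting logic rather than any particular MCP server's API):

```python
# Hypothetical role-based allowlist for AI agents exposed via MCP-style tools.
# Role and tool names are illustrative only; a real deployment would hook this
# into whatever gateway actually brokers the agent's tool calls.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentRole:
    name: str
    allowed_tools: frozenset[str] = field(default_factory=frozenset)

ROLES = {
    "junior_dev_agent": AgentRole(
        "junior_dev_agent",
        frozenset({"read_repo", "open_pull_request", "run_ci"}),
    ),
    "support_agent": AgentRole(
        "support_agent",
        frozenset({"read_tickets", "draft_reply"}),
    ),
    # Note what is absent everywhere: nothing like "write_prod_db" or
    # "access_accounts_receivable" is granted to any agent role.
}

def authorize(role_name: str, tool: str) -> bool:
    """Return True only if the named role is explicitly granted the tool."""
    role = ROLES.get(role_name)
    return role is not None and tool in role.allowed_tools

# A gateway would call this before forwarding any tool invocation:
assert authorize("junior_dev_agent", "open_pull_request")
assert not authorize("junior_dev_agent", "write_prod_db")
```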

All of this seems to follow quite naturally from existing corporate practice.

But I expect this level of diligence will fall short of something that could really stop a misaligned ASI or even strong AI.

So it seems this will be most like an insider threat, but I think the real remediation of this threat is far from clear.

Cooperating with aliens and AGIs: An ECL explainer
Dan.Oblinger · 2y

Chi, I think that is correct.

My argument attempts to provide a descriptive explanation of why all evolved intelligences have a tendency towards ECL, but it provides no basis for arguing that such intelligences should have such a tendency in a normative sense.

 

Still, somehow, as an individual (with such tendencies), I find that the idea that other distant intelligences will also have a tendency towards ECL does provide some personal motivation. I don't feel like such a "sucker" if I spend energy on an activity like this, since I know others will too, and it is only "fair" that I contribute my share.

Notice, I still have a suspicion that this way of thinking in myself is a product of my descriptive explanation. But that does not diminish the personal motivation it provides me.

In the end, this is still not really a normative explanation. At best it could be a MOTIVATING explanation for the normative behavior you are hoping for.

~

For me, however, the main reason I like such a descriptive explanation is that it feels like it could one day be proved true. We could potentially verify that ECL follows from evolution as a statement about the inherent and objective nature of the universe. Such objective statements are of great interest to me, as they make me feel like I am understanding a part of reality itself.

Interesting topic!

Cooperating with aliens and AGIs: An ECL explainer
Dan.Oblinger · 2y


I find myself arriving at a similar conclusion, but via a different path.

I notice that citizens often vote in the hope that others will also vote, and thus that as a group they will yield a benefit. They do this even when they know their vote alone will likely make no difference, and that their voting does not cause others to vote.

So why do they do this? My thought is that we are creatures that have evolved instincts that are adaptive for causally-interacting, social creatures. In a similar way, I expect other intelligences may have evolved in causally-interacting social contexts and thus developed similar instincts. That is why I expect distant aliens may behave in this way.

This conclusion is similar to yours, but I think the reasoning chain is a bit different:
(1) Non-self-benefiting cooperation is evolutionarily preferred for "multi-turn", causally-interacting social agents.
(2) Thus such social agents (even distant alien ones) may evolve such behavior and apply it instinctively.
(3) As a result, we (and they) find ourselves/themselves applying such cooperative behavior even in contexts that are known to ourselves/themselves to be provably a-causal.

Interestingly, I can imagine such agents using your argument as their post-hoc explanation of their own behavior even if the actual reason is rooted in their evolutionary history.

 

How does this argument fit into or alongside your framework?

Wikitag Contributions

No wikitag contributions to display.