Anti-Pascaline satisficer

Stuart_Armstrong

A putative new idea for AI control; index here.

It occurred to me that the anti-Pascaline agent design could be used as part of a satisficer approach.

The obvious way to reduce dangerous optimisation pressure is to give the agent a bounded utility function with an easily achievable bound, such as a utility linear in paperclips that maxes out at 10.

The problem with this is that, if the entity is a maximiser (which it might become), it can never be sure that it's achieved its goals. Even after building 10 paperclips, and an extra 2 to be sure, and an extra 20 to be really sure, and an extra 3^^^3 to be really really sure, and extra cameras to count them, with redundant robots patrolling the cameras to make sure that they're all behaving well, etc., there's still an ε chance that it might have just dreamed this, say, or that its memory is faulty. So it has a current utility of (1-ε)10, and can increase this by reducing ε - hence by building even more paperclips.

Hum... ε, you say? This seems a place where the anti-Pascaline design could help. Here we would use it at the lower bound of utility. The agent currently has probability ε of having utility < 10 (i.e. it has not built 10 paperclips) and (1-ε) of having utility = 10. Therefore an anti-Pascaline agent with an ε lower bound would round this off to 10, discounting the unlikely event that it has been deluded, and thus it has no need to build more paperclips or paperclip-counting devices.
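To make the rounding step concrete, here is a minimal Python sketch. It assumes one simple reading of the anti-Pascaline rule, namely lower-tail winsorising: outcomes in the bottom `tail` of probability mass are raised to the utility found at that quantile. It also models "deluded" as having zero real paperclips. The names `bounded_utility`, `expected_utility` and `anti_pascaline_utility` are illustrative, not part of the original design.

```python
from typing import List, Tuple

# A lottery is a list of (probability, utility) pairs; probabilities sum to 1.
Lottery = List[Tuple[float, float]]

CAP = 10  # utility is linear in paperclips and maxes out at 10


def bounded_utility(paperclips: int) -> float:
    """Utility linear in paperclips, capped at CAP."""
    return float(min(paperclips, CAP))


def expected_utility(lottery: Lottery) -> float:
    """Plain expectation, as a maximiser would compute it."""
    return sum(p * u for p, u in lottery)


def anti_pascaline_utility(lottery: Lottery, tail: float) -> float:
    """Expectation after rounding the lowest `tail` of probability mass up.

    Assumed reading: find the lowest utility q whose cumulative probability
    exceeds `tail`, then treat every outcome worse than q as if it were q.
    """
    outcomes = sorted(lottery, key=lambda pu: pu[1])
    floor_u = outcomes[-1][1]
    cumulative = 0.0
    for p, u in outcomes:
        cumulative += p
        if cumulative > tail:
            floor_u = u
            break
    return sum(p * max(u, floor_u) for p, u in outcomes)


if __name__ == "__main__":
    for eps in (1e-3, 1e-6):
        # probability eps that the agent is deluded and has no real paperclips
        lottery = [(eps, bounded_utility(0)), (1 - eps, bounded_utility(10))]
        print(eps, expected_utility(lottery), anti_pascaline_utility(lottery, tail=1e-3))
```

The maximiser's value rises as ε shrinks, which is exactly the pressure to keep building. The rounded value stays at 10 for any ε no larger than the agent's tail threshold, so reducing ε further buys it nothing.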

Note that this is an un-optimising approach, not an anti-optimising one, so the agent may still build more paperclips anyway - it just has no pressure to do so.

So it has a current utility of (1-ε)10, and can increase this by reducing ε - hence by building even more paperclips.

I take ε to be the probability that something weird is happening, like you're hallucinating your paperclips. Why would building more paperclips reduce ε? If you are dreaming, you're just making more dream paperclips.

I'm sure you'd spend your time trying to find increasingly elaborate ways to probe for bugs in Descartes' demon's simulation. It is not clear to me why your increasingly paranoid bug probes would involve making paperclips.

I agree that making more paperclips does not reduce ε, but an unsure AI might build more paperclips nonetheless.

If x is the probability that any one paperclip is hallucinated, the AI will never be certain it has created 10 paperclips (or any, for that matter) as long as x > 0. But it can increase the probability that it has created 10 paperclips by making 3^^^3 of them.

Bug probes may be a more efficient way to increase the probability, but that isn't certain.
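The argument in this comment can be checked with a short binomial calculation. The sketch below assumes that each paperclip is hallucinated independently with probability x, which is our assumption rather than something stated above, and computes the chance that at least 10 of n built paperclips are real. The function name `prob_at_least_k_real` is hypothetical.

```python
from math import comb


def prob_at_least_k_real(n: int, x: float, k: int = 10) -> float:
    """Probability that at least k of n paperclips are real, assuming each
    one is independently hallucinated with probability x."""
    if n < k:
        return 0.0  # cannot have k real paperclips with fewer than k built
    # Complement: probability that fewer than k of the n paperclips are real.
    fewer = sum(comb(n, j) * (1 - x) ** j * x ** (n - j) for j in range(k))
    return 1.0 - fewer


if __name__ == "__main__":
    for n in (10, 12, 32, 100):
        print(n, prob_at_least_k_real(n, x=0.1))
```

The independence assumption is doing all the work here. If instead the whole experience is a dream with probability ε, as in the earlier comment, extra paperclips leave that probability unchanged.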

It is not clear to me why your increasingly paranoid bug probes would involve making paperclips.

It need not. The problem occurs for any measure that burns resources (and probing the universe for bugs in the Descartes demon would be spectacular at burning resources).

One possibility is a design that makes the agent strongly sensitive to negative utility when it invests more time and resources in unnecessary actions after it has, with high enough probability, achieved its original goal.

In the paperclip example: wasting time and resources on building more paperclips, or on building more sensors/cameras to check the result, should carry enough negative utility for the agent, compared to alternative actions.
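A minimal Python sketch of this proposal, under hypothetical choices: a linear charge `penalty` per unit of extra resources, applied only once the probability of success reaches a threshold `THRESHOLD`. Neither the linear form nor the numbers come from the comment; they are placeholders for illustration.

```python
CAP = 10.0        # bounded utility: at most 10 paperclips' worth
THRESHOLD = 0.99  # hypothetical "high enough probability" of success


def value(p_goal: float, extra_cost: float, penalty: float) -> float:
    """Expected bounded utility minus a linear charge on extra resources."""
    return CAP * p_goal - penalty * extra_cost


def prefers_to_stop(p_now: float, p_after: float, extra_cost: float,
                    penalty: float) -> bool:
    """Compare stopping now with spending extra_cost to raise p_goal to p_after."""
    if p_now < THRESHOLD:
        return False  # goal not yet likely achieved: keep working
    return value(p_now, 0.0, penalty) >= value(p_after, extra_cost, penalty)


if __name__ == "__main__":
    print(prefers_to_stop(0.999, 0.999999, extra_cost=1.0, penalty=1.0))  # True
    print(prefers_to_stop(0.999, 0.999999, extra_cost=1.0, penalty=0.0))  # False
```

With no penalty, the agent still gains a little from each extra increment of probability, which is the original problem. As the reply points out, a charge like this only covers resources the agent spends directly, and could be sidestepped by building subagents.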

This has problems with the creation of subagents: http://lesswrong.com/lw/lur/detecting_agents_and_subagents/

You can use a few resources to create subagents without that restriction.

Hasn't this been mentioned before, as satisficing on probability of object-level satisfaction?

Well, the agent design hasn't been mentioned before, but the idea seems plausible enough that an equivalent one could have been found.