Checksum Sensor Alignment

by lsusr
11th Jul 2022
2 comments, sorted by top scoring
Milli | Martin, 3y

Very nice solution.

You a word here:

Either the AGI has the camera or the AGI has hacked your other nine sensors.

lsusr, 3y

Fixed the typo. Thanks.


Problem

Suppose you have a powerful AGI protecting a diamond in a vault[1]. You have various sensors in the room, all monitoring the diamond. There is a camera watching the diamond. There is a pressure plate under the diamond confirming that it is still in its stand. In total there are 10 sensors, and the powerful AGI can hack all 10 of them.

How do you align the AGI such that it does not hack your sensors? You cannot tell (directly) whether the AGI believes the diamond is in the vault. You must describe your instructions to the AGI in terms of the sensor readings.

Solution

Instruct the AGI to perform the simplest series of actions that cause 9 of your sensors to agree and 1 of your sensors to disagree. Suppose (without loss of generality) that the sensor which disagrees is your camera. Either the AGI has hacked the camera or it has hacked your other 9 sensors. Because you asked for the simplest series of actions, and hacking 1 sensor is presumably simpler than hacking 9, the AGI has probably just hacked your camera. Carefully examine what the AGI did to hack your camera. Fix the vulnerabilities.
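The 9-versus-1 check can be sketched in a few lines of Python. This is only an illustration: the post does not specify how sensor readings are represented, so the dictionary of boolean readings, the sensor names, and the function `find_dissenting_sensor` are all hypothetical.

```python
from collections import Counter

def find_dissenting_sensor(readings, total=10):
    """Return the name of the one sensor that disagrees with all the others,
    or None if the readings do not split exactly (total - 1) to 1.

    `readings` maps a hypothetical sensor name to its reported value,
    e.g. True for "diamond present".
    """
    counts = Counter(readings.values())
    # We need exactly two distinct values across exactly `total` sensors.
    if len(readings) != total or len(counts) != 2:
        return None
    (_, n_major), (minority, n_minor) = counts.most_common(2)
    if (n_major, n_minor) != (total - 1, 1):
        return None
    # Exactly one sensor reported the minority value; find it.
    return next(name for name, value in readings.items() if value == minority)

# Assumed example: nine sensors agree, the camera disagrees.
readings = {f"sensor_{i}": True for i in range(9)}
readings["camera"] = False
suspect = find_dissenting_sensor(readings)
```

In this example `suspect` names the camera, which is the sensor whose hack you would then examine.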

Repeat the above process until the AGI can no longer hack your sensors.
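As a rough sketch, the repeat-until-unhackable loop might look like the toy model below. Everything here is an assumption made for illustration: sensors are modeled as sets of named vulnerabilities, `toy_agi` is a stand-in for the AGI that always exploits the first weakness it finds, and "fixing" a vulnerability is just removing it from the set. In practice the examination step is done by humans and is the hard part.

```python
def harden_sensors(sensors, find_simplest_hack, max_rounds=100):
    """Repeatedly ask for the simplest single-sensor hack and patch it.

    `sensors` maps a hypothetical sensor name to a set of unpatched
    vulnerabilities. `find_simplest_hack` stands in for the AGI: it
    returns (sensor_name, vulnerability), or None if it cannot hack
    any sensor. Returns the list of patches applied, in order.
    """
    patched = []
    for _ in range(max_rounds):
        hack = find_simplest_hack(sensors)
        if hack is None:
            break  # the AGI can no longer hack any sensor
        name, vulnerability = hack
        # "Examine what the AGI did, then fix the vulnerability."
        sensors[name].discard(vulnerability)
        patched.append((name, vulnerability))
    return patched

def toy_agi(sensors):
    """Hypothetical AGI stand-in: exploit the first vulnerable sensor."""
    for name in sorted(sensors):
        if sensors[name]:
            return name, min(sensors[name])
    return None

# Assumed example data; the vulnerability names are invented.
sensors = {
    "camera": {"loop_footage", "lens_cover"},
    "pressure_plate": {"weight_swap"},
}
log = harden_sensors(sensors, toy_agi)
```

The `max_rounds` cap is a design choice of this sketch, not part of the post: it keeps the loop from running forever if new vulnerabilities keep turning up.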


  1. This is a variant on ARC's ELK competition. ↩︎
