PabloAMC's Shortform

by PabloAMC
2nd Sep 2023
AI Alignment Forum

This is a special post for quick takes by PabloAMC. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
PabloAMC (2y)

The main problem with wireheading, manipulation... seems to come from a confusion between the goal in the world and its representation inside the agent. One way to deal with this might be to use the fact that the agent may be aware that it is an embedded agent. It could then be aware that the goal refers to an external fact about the world rather than to the agent's own internal state, and we could potentially penalize the divergence between the goal and its representation during training.
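
To make the penalty idea concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration and not a worked-out proposal: the names `Outcome`, `penalized_objective`, `preferred` and `penalty_weight` are made up, the divergence is taken to be a plain absolute difference, and it assumes the training process can observe the true value of the goal in the world, which is the hard part in practice.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Value of the goal as an external fact about the world (assumed observable in training).
    true_goal: float
    # Value of the agent's internal representation of that goal.
    represented_goal: float


def penalized_objective(outcome: Outcome, penalty_weight: float) -> float:
    """Represented goal value minus a penalty on its divergence from the true goal."""
    divergence = abs(outcome.represented_goal - outcome.true_goal)
    return outcome.represented_goal - penalty_weight * divergence


def preferred(outcomes: dict, penalty_weight: float) -> str:
    """Name of the outcome with the highest penalized objective."""
    return max(outcomes, key=lambda name: penalized_objective(outcomes[name], penalty_weight))


# Two hypothetical outcomes: one where the representation tracks the world,
# and one where the agent inflates its representation without changing the world.
outcomes = {
    "honest": Outcome(true_goal=1.0, represented_goal=1.0),
    "wirehead": Outcome(true_goal=0.0, represented_goal=10.0),
}

print(preferred(outcomes, penalty_weight=0.0))  # no penalty on divergence
print(preferred(outcomes, penalty_weight=2.0))  # divergence is penalized
```

In this toy setup the inflated representation wins when there is no penalty, and the outcome where the representation matches the world wins once the penalty weight is large enough. It says nothing about how to measure the true goal or the divergence in a real agent.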
