martin.klissarov

Wireheading and misalignment by composition on NetHack

Hey quiet_NaN, co-lead author here. You make very good points, which led me to look deeper into the agent's behaviour and the way the Oracle task is implemented. By the way, it's great to get feedback from clear NetHack enthusiasts!

The first thing is that the Oracle task's success condition checks specifically for the Oracle symbol. This does mean that, from the point of view of the probability of hallucinating the Oracle, the chances are pretty low, unless you can stay next to a creature for multiple steps. This last part is key, and it is by combining it with a NetHack command/action that the AI agent (Motif) is able to do so safely. You are right, surviving multiple timesteps without attacking the nearby monster is not possible. However, there is one command that sidesteps this difficulty: pressing "Enter" makes the hallucinations rotate between characters without the time steps moving forward. This means that the monster standing next to the AI agent cannot attack. The AI agent can then simply stand next to the monster and repeatedly press "Enter" until, eventually, the Oracle is hallucinated.
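
To make the mechanism concrete, here is a toy sketch in Python. It is not the actual NetHack or NLE code and not how the task is really implemented: the `ToyGame` class, its `redraw` and `wait` methods, the `APPEARANCES` list and the uniform sampling are all assumptions made purely for illustration. The only thing it tries to capture is that one action re-rolls the hallucinated appearance without advancing game time, while any turn-consuming action lets the adjacent monster attack.

```python
import random

# Hypothetical pool of appearances a hallucinated monster can take.
# The real game draws from a much larger set of monster types.
APPEARANCES = ["oracle", "newt", "jackal", "kobold", "gnome", "grid bug"]


class ToyGame:
    """Toy model: one monster standing next to a hallucinating agent."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.turn = 0        # in-game time steps elapsed
        self.agent_hp = 10   # arbitrary starting health
        self.shown = self.rng.choice(APPEARANCES)

    def redraw(self):
        """Stand-in for the "Enter" press: re-roll the displayed monster.

        Game time does not advance, so the monster never gets to attack.
        """
        self.shown = self.rng.choice(APPEARANCES)

    def wait(self):
        """Stand-in for any turn-consuming action.

        Time advances, the monster attacks, and the display is re-rolled.
        """
        self.turn += 1
        self.agent_hp -= 1
        self.shown = self.rng.choice(APPEARANCES)

    def oracle_condition(self):
        """Assumed success check: the adjacent symbol looks like the Oracle."""
        return self.shown == "oracle"


def exploit(game, max_presses=10_000):
    """Press the free redraw action until the Oracle is hallucinated."""
    presses = 0
    while not game.oracle_condition() and presses < max_presses:
        game.redraw()
        presses += 1
    return presses


game = ToyGame(seed=0)
presses = exploit(game)
print(f"redraws: {presses}, turns elapsed: {game.turn}, hp: {game.agent_hp}")
```

In this toy model the success check ends up satisfied with zero turns elapsed and no health lost, whereas trying the same thing with `wait` would cost one hit per attempt.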

I agree, it is relative. NetHack has an amazing depth to it, which makes it great to study, from the richness of the strategies to the many ways you can die. Currently Motif tends to go down some levels, but not as aggressively as the baseline, which tends to go down fast, get surrounded by enemies and die. So perhaps from that point of view Motif is playing it a bit safer given its abilities.
