Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Creating a Proper Space to Design and Test Embedded Agents

Introduction

I was thinking about the embedded agency sequence again last week, and thought “It is very challenging to act and reason from within an automaton.” Making agents which live in automata and act well in dilemmas is a concrete task for which progress can be easily gauged. I discussed this with a group, and we came up with desiderata for an automaton in which interesting agent strategies emerge.

In this post, I list some inspirations, some desiderata for this testing space, and then a sketch of a specific implementation.

The embedded agency paper lays out four main difficulties of being embedded. The goal here is to design an automaton (i.e., an agent environment) which can represent dilemmas whose solutions require insights in these four domains.

  • Decision theory: Embedded agents do not have well-defined IO channels.
  • World models: Embedded agents are smaller than their environment.
  • Robust delegation: Embedded agents are able to reason about themselves and self-improve.
  • Subsystem alignment: Embedded agents are made of parts similar to the environment.

Inspiration

I draw concepts from four different automatons.

Core War is close to what we need, I think. It’s a game that takes place in (e.g.) a megabyte of RAM, where one assembly instruction takes up one memory slot, agents are assembly programs, and they fight each other in this RAM space. It has several relevant features: determinism, no input/output/interference, and fully embedded computation. I suspect limiting the range of sight could make Core War more interesting from an artificial-life point of view, because then agents would need to travel around to see the world.

Conway’s Game of Life is Turing complete, so in principle it can encode basically anything you can write in a programming language. It’s extremely simple, but I think too verbose to hand-write dilemmas in.

Botworld 1.0 is a cellular automaton designed to be interesting in an embedded-agency sense, and it’s useful but pretty complex. It does encode interesting problems such as the stag hunt and the prisoner’s dilemma. I think something closer to Core War, which is more parsimonious, would be more fruitful to design programs in.

Real life is maybe basically an automaton. Almost all effects are localized, with waves and particles traveling at most the speed of light. The transition function is nondeterministic and there are real numbers and complex numbers in the mix, but I don’t know if any of this makes a big difference from an agent-design point of view. We still have all the problems of decision theory, world models, robust delegation, and subsystem alignment.

Desiderata for Automaton

Standard decision/game theory dilemmas are representable

The 5 & 10 problem, (twin) prisoner’s dilemma, transparent Newcomb problem, death in Damascus, stag hunt, and other problems should at least be expressible in the automaton. The 5 & 10 problem especially needs to be representable. Getting the agent to know what the situation is and getting it to act well are both separate problems from setting up the dilemma.

It’s actually possible for agents to win; Information is discoverable

In order to decide between $5 and $10, the agent first needs to know the two available actions and their utilities. Of course, you cannot simply endow your agent with this knowledge, because the same agent code needs to work in many dilemmas; besides, it’s trivial to make an agent which passes just one dilemma.

So how should agents find and use knowledge from the world? Core War programs (called “warriors”) use conditional jumps [1]; there’s no notion of reading in or outputting a value, but you can condition your action on a specific value at a specific location. I think this should be how agents use information and make decisions at the lowest level. (Maybe any other way of reading and using knowledge is reducible to something like this anyway?)

[1] “If a < b, jump to instruction x; else jump to instruction y.”

Agents are scorable, swappable, and comparable; Money exists

My hope for creating this automaton is that I (and others) will design agents for it which use self-knowledge & successor-building & world-modeling as emergent strategies; those strategies should not be explicitly advantaged by the physics. Yet agent designers need some sort of objective to optimize when designing agents; it needs to be clear when one agent is better than another in a given environment. The best solution I can think of is to have “dollars” lying around in the world, and the objective of agent-designers is to have the agent collect as many dollars as possible.

An environment file includes insertion points (where agents begin at the first timestep) and the maximum agent size (or other constraints). The command-line utility takes in an environment file and the appropriate number of agent files, and returns the number of dollars that each agent got in that world. So you could put agent1 in transparent Newcomb, then try agent2, and see which did better and how much money each made. There could also be an option for logging, or for interrupting & modifying the environment.
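Concretely, the interface might look something like the following. Every name here (the tool, the file names, the flags, the output format) is hypothetical; nothing exists yet:

```
$ automaton transparent-newcomb.env agent1 --max-timesteps 100000 --log
agent1: $1000000
```

The printed score is the number of dollars the agent had collected when the run ended.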

Omega is representable

In general, nothing within a world can perfectly simulate any portion of the world (including itself), because the simulator is always too small, but pretty-good prediction is possible. Some of the most interesting dilemmas require the presence of reliable predictors, and some of our hardest decisions in life are hard because other people are predicting us, so we want predictors to be possible within ordinary world-physics. Call the reliable predictor "Omega".

We need agents to understand what Omega is and what it’s doing but somehow not be able to screw with it. This could be done with read-only zones or by giving Omega ten turns before the agent gets one turn; the turn-management could be done with “energy tokens” which agents spend as actions.

Omega also needs to somehow safely execute the agent without getting stuck in infinite loops or allowing the simulation to escape or move the money around or something. I have no idea about this part. Perhaps the reliable predictor should just sit outside the universe. Or we could just say that it’s against the spirit of the game to screw with Omega.
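One simple mechanism for the infinite-loop part is a hard step budget: Omega simulates the agent on a private copy of memory and gives up if the budget runs out. A hedged sketch, assuming a `run_step(mem, ip)` function (hypothetical, not specified anywhere above) that executes one instruction and returns the new instruction pointer, or None when the agent halts or dies:

```python
def predict(run_step, mem, ip, budget=10_000):
    # Omega's sandbox: simulate on a private copy of memory, so the
    # simulation can neither escape nor move the real money around.
    sandbox = list(mem)
    for _ in range(budget):
        ip = run_step(sandbox, ip)
        if ip is None:                 # agent halted (or died) in simulation
            return sandbox             # final simulated state = the prediction
    return None                        # budget exhausted: prediction inconclusive
```

This dodges the "simulation escapes" problem by construction, but not the "agent inspects and games Omega" problem, which the read-only zones or energy tokens would have to handle.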

Engineering details

  • The automaton is a command-line program that takes in one environment file (including agent insertion points) and zero or more agent files, and other options.
  • You can generate these environment & agent files using your programming language of choice, or by hand, or by putting ink on a chicken’s feet, etc.
  • An agent file is a sequence of instructions (operation A B)
  • An option to enable logging
  • An option for debug/interference mode
  • An option for max timesteps
  • The automaton should run fast, so hill climbing over agent programs is feasible.
  • An error is thrown immediately if the agent or environment file is malformed.
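The speed requirement matters because the intended workflow includes automated search over agent programs. A hedged sketch of what that could look like, assuming a `score(program)` function (hypothetical) that runs the automaton and returns the dollars collected:

```python
import random

def hill_climb(score, program, slot_values, steps=5000):
    # Mutate one slot of the agent program at a time, keeping the mutant
    # whenever it scores at least as well as the current best. A fast
    # automaton inner loop is what makes this many evaluations feasible.
    best, best_score = list(program), score(program)
    for _ in range(steps):
        cand = list(best)
        cand[random.randrange(len(cand))] = random.choice(slot_values)
        s = score(cand)
        if s >= best_score:
            best, best_score = cand, s
    return best, best_score
```

Accepting ties (`>=`) lets the search drift across plateaus where many programs score equally, which seems likely in a world where most mutations are neutral.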

An initial specification

I don’t know if this is sufficient or well-designed, but here’s my current idea for an automaton. I am mostly copying Core War.

  • The world is a finite number of ‘memory slots’
    • Instructions use relative distances, e.g. +5 or -5, rather than absolute addresses like 0x1234
    • Memory is a circular tape. (Distances are taken modulo the memory size.)
      • This implies agents cannot know their index in space, but it doesn’t matter anyway
    • One memory slot is a triple (isMoney, hasInstructionPointer, value)
    • Every operation takes two arguments, which are the values of the locations which the next two slots point to
      • It’s like a function call operation(*A, *B) where A and B are the two slots after operation, and *A means “value stored at location A”.
  • Each ‘agent’ is essentially an instruction pointer to a memory slot
    • Each timestep, each agent executes their pointer in order
      • If the command is not a jump, then the pointer jumps three slots ahead, to what should be the next instruction
  • Agents have many operations available
    • Special stuff
      • DIE or any invalid operation kills the agent. Used e.g. when two agents want to kill each other: each tries to get the other to execute this, like in Core War
      • JIM jump to A if B is money
      • JIP jump to A if B is an instruction pointer
    • Regular assembly-like stuff:
      • DAT works as a no-op or for data storage. If you wanted to store 7 and 14 in your program, then you could have DAT 7 14 as a line in your agent program, and then reference those values by relative position later.
      • MOV copy from A to B
      • ADD add A to B and store in B
      • SUB subtract A from B and store in B
      • JMP unconditional jump to A
      • CMP skip next instruction if A = B
      • SLT skip next instruction if A < B
      • JMZ jump to A if B=0
      • JMN jump to A if B!=0
      • DJN decrement B, then jump to A if B is not 0
  • ¿Limit relative jumps to 100 or something in magnitude, then Omega can build an invincible wall around itself and blow up anything that tries to reach it?
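The spec above can be sketched as a tiny interpreter. This is a simplified, hedged sketch, not the real design: each slot holds a single value (the isMoney / hasInstructionPointer flags are dropped), and only a handful of the listed operations are implemented.

```python
class World:
    def __init__(self, size=64):
        self.mem = [0] * size   # circular tape of slot values
        self.agents = []        # one instruction pointer per agent (None = dead)

    def load(self, program, at):
        # Write a program into memory and spawn an agent at its first slot.
        for i, v in enumerate(program):
            self.mem[(at + i) % len(self.mem)] = v
        self.agents.append(at % len(self.mem))

    def step(self):
        n = len(self.mem)
        for i, ip in enumerate(self.agents):
            if ip is None:
                continue                              # dead agents stay dead
            op = self.mem[ip]
            a, b = self.mem[(ip + 1) % n], self.mem[(ip + 2) % n]
            nxt = (ip + 3) % n                        # default: advance past this instruction
            if op == "DAT":
                pass                                  # no-op / data storage
            elif op in ("MOV", "ADD", "JMP", "JMZ") and isinstance(a, int) and isinstance(b, int):
                pa, pb = (ip + a) % n, (ip + b) % n   # relative, modular addressing
                if op == "MOV":
                    self.mem[pb] = self.mem[pa]       # copy *A into slot B
                elif op == "ADD":
                    if isinstance(self.mem[pa], int) and isinstance(self.mem[pb], int):
                        self.mem[pb] += self.mem[pa]  # add *A into *B
                elif op == "JMP":
                    nxt = pa                          # unconditional jump to A
                elif op == "JMZ" and self.mem[pb] == 0:
                    nxt = pa                          # jump to A if *B is zero
            else:
                self.agents[i] = None                 # DIE or any invalid op kills
                continue
            self.agents[i] = nxt
```

For example, loading the program `["MOV", 3, 4, 99, 0]` into an 8-slot world copies 99 into slot 4 on the first timestep, then kills the agent on the second timestep when it tries to execute the invalid opcode 99.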

A Contest Thing?

I could publish a collection of public environments and create some private environments too. People can design agents which score well in the public environment, then submit it for scoring on the private environments, like a Kaggle contest. This, like any train/test split, would reduce overfitting.

Attempting to Specify a Couple Dilemmas and Agents

Vanilla prisoner’s dilemma

Two programs in two places in memory. Left is somehow defect and right is somehow cooperate. Agents can see each other’s code and reason about what the other will do. Omega kills everyone after 1000 timesteps if no decision is reached, or something.

Twin prisoner’s dilemma

Omega copies the same agent to two places in memory. Left is defect and right is cooperate.

Transparent Newcomb’s Problem / Parfit’s Hitchhiker

Omega has two boxes and it gives the agent access to one or the other depending on its behavior.

Omega itself in Transparent Newcomb

Something like this code:

  1. Environment starts with $1,000,000 in one walled box and $0 in another box. Money cannot be created, only absorbed by agents’ instruction pointers.
  2. Find where the agent is by searching through memory
  3. Copy it locally and surround it with walls
  4. Put trailer code on the agent that returns the instruction pointer to Omega when the agent is done
  5. Execute the agent
  6. If it (e.g.) one-boxed, then destroy the surrounding walls
  7. Somehow now kickstart the real agent¿

Agent in Transparent Newcomb

Probably very flawed but…

  1. Search space for any other programs
  2. Somehow analyze it for copy-and-run behavior
  3. Somehow infer that this other program is controlling walls around money tokens
  4. Somehow infer that if you one-box then you’ll get the bigger reward
  5. Output one-box by writing a 1 to the designated spot
    • This spot is somehow inferred through analyzing Omega, I guess
      • Or it could be a standard demarcation that many agents use, e.g. 1-2-3-3-2-1 as a “communication zone”

Pre-mortem

I’ll briefly raise and attempt to respond to some modes of failure for this project.

The agent insertion point did too much work

It could turn out that the interesting/challenging part of the embedded agency questions is in drawing the boundary around the agent, so giving the sole starting location of the agent is dodging the most important problem. I think that this problem is fully explored, however, if we somehow pause the agent and let some other things copy & use its code before the agent runs. Then the agent must figure out what has happened and imagine other outcomes before it chooses actions.

The agent-designer is outside of the environment. Cartesian!

The human or evolutionary algorithm or whatever designing the agents is indeed outside of the universe, and cannot directly suffer consequences, be modified, etc. However, they cannot interfere once the simulation has started, and any knowledge they have must fully live in their program in order for it to succeed in a variety of environments. I think that, if you design an agent which passes 5&10, Newcomb, prisoner’s dilemma, etc, then you must have made some insights along the way. Otherwise, maybe these problems were easier than I thought.

We only find successful agents through search, and they are incomprehensible

This is maybe the most likely way for this project to fail, conditional on me actually doing the work. I would say that, even in this case, we can learn something about the agent by running experiments on it or somehow asking it questions, like how we analyze humans.

Conclusion

Automatons are a more accurate model of the difficulties of agency in the real world than reinforcement learning problems, so we need to do more task-design, agent-design, and general experimentation in this space. My plan is to create an automaton, used as a command-line utility, which will run a given set of agents in a given environment (e.g. prisoner’s dilemma). Ideally, we’ll have a large set of task environments, and we can design agents with the goal of generality.

Comments

It seems like collecting dollars requires a hard-coded notion of how to draw the boundary around the agents, which runs contrary to the intention. It seems more natural to require the agents to strive to change the world in a particular way (e.g. maximize the number of rubes in the world).

Yes, I agree it feels fishy. The problem with maximizing rubes is that the dilemmas might get lost in the details of preventing rube hacking. Perhaps agents can "paint" existing money their own color, money can only be painted once, and agents want to paint as much money as possible. Then the details remain in the environment.

Or something simpler would be that the agent's money counter is in the environment but unmodifiable except by getting tokens, and the agent's goal is to maximize this quantity. Feels kind of fake maybe because money gives the agent no power or intelligence, but it's a valid object-in-the-world to have a preference over the state of.

Yet another option is to have the agent maximize energy tokens (which actions consume).

Consider drawing inspiration from the game Baba is You. Implement high-level modules like "argmax" of which perhaps a dozen can be assembled into an agent. Give it the ability to rearrange these modules, if its current configuration judges this to be a good idea, perhaps because the surplus can be turned into rubes. These modules need not be near the avatar; that way, the avatar can move into the control circuit to change it.

the objective of agent-designers is to have the agent collect as many agents as possible

Typo: should say "dollars"?