Steven Weiss


Solid post, but I think mentioning GPT in particular distracts from the main point. GPT is a generative model with no reward function, meaning it has no goals it's optimizing for. It's not engineered for reliability (or any other goal), so it's not meaningful to compare its performance against humans on goal-oriented tasks.

If you assume a 1:20,000 annual suicide rate and that 40% of people could kill themselves within a minute (roughly the US gun ownership rate), then the odds of doing it per waking-minute decision are about 1 : (20,000 * 60 * 16 * 365 * 0.4) ≈ 1:2,800,000,000, i.e. a per-decision rate of not doing it of ~99.99999996%.
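A quick sketch of that back-of-envelope calculation, with the assumed inputs labeled (the specific numbers are the rough figures above, not measured data):

```python
# Back-of-envelope: per-decision reliability of "not doing it".
# Assumptions from the comment above:
#   - 1-in-20,000 annual suicide rate
#   - one "decision" per waking minute, 16 waking hours/day
#   - 40% of people have immediate means (rough US gun-ownership rate)
annual_suicide_rate = 1 / 20_000
decisions_per_year = 60 * 16 * 365          # waking-minute decisions per year
fraction_with_means = 0.4

# Odds against doing it on any single armed decision
per_decision_odds = 20_000 * decisions_per_year * fraction_with_means
per_decision_reliability = 1 - 1 / per_decision_odds

print(f"1 in {per_decision_odds:,.0f}")     # → 1 in 2,803,200,000
print(f"{per_decision_reliability:.8%}")
```

So the "1:3,000,000,000" in the comment is the rounded version of ~1:2.8 billion.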

IIUC, people aren't deciding whether to kill themselves once a minute, every minute. The thought only comes up when things are really rough, and thinking about it can take hours or days. That's probably a nitpick.

More importantly, an agent optimizing for not intentionally shooting itself in the face would probably be much more reliable at it than a human. It just has to sit still.

If you look at RL agents in simulated environments where death is possible (e.g. Atari games), the top agents outperform most humans at not dying in most games. E.g. MuZero's average score in Space Invaders is several times higher than the average human baseline, which would require it to die less often on average.

So when an agent is trained not to die, it can be very reliable at it.