AI agents and painted facades

Agents are much cheaper and faster to run than humans, so the amount of information and interactions that humans need to oversee will drastically increase.

But 1, 2, 4 are issues humans already face with managing other humans.

A decent chunk of people are "RL'd" into doing the bare minimum and playing organizational politics to optimize their effort:reward ratio.
Many employees are treated as fungible and are given limited context into their entire org. Even in a fully transparent org, if it's large enough, then it's impossible for everything to be in an individual's context anyway.
People fail all the time in quite surprising and subtle ways! We're just really bad at documenting and noticing most of them since we usually only explore and share really catastrophic failures. If someone loses ~$100, it's whatever. When someone loses ~$100M dollars, it's a lawsuit.

This is a long winded way of saying: I'm optimistic we can address managing agents (with their current capabilities) by drawing analogies to how we already do management in effective organizations, and find avenues to scale these management strategies 100x (or however many OOMs one might think we'll need).

[-]Karl Krueger2mo00

society depends on a loop of humans managing other humans.

What does this mean? "Managing" is typically a hierarchical relationship, but "loop" implies something cyclical rather than hierarchical.

[-]t14n2mo10

I interpreted loop as referring to an "OODA loop" where managers are observing those they are managing, delegating and action, and then waiting for feedback to then go back to the beginning of the loop.

e.g. nowadays, I delegate a decent chunk of implementation work to coding agents and the "loop" is me giving them a task, letting them riff on it for a few minutes, and then reviewing output before requesting changes or committing them in.

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

38

AI agents and painted facades

38

38

Why is it hard to build good evals?

From evals to monitoring

Towards systems that work