Agent Foundations, Orthogonality Thesis, AI
Frontpage

Thou shalt not command an aligned AI

by Martin Vlach
11th May 2025
2 min read
4 comments, sorted by top scoring
Robert Cousineau · 4mo

Here is a copy-edited version from Claude:

Sorry, You Should Not Command the Aligned AI
By Martin Vlach, Benjamin Schmidt
May 11, 2025
2 min read

Benjamin slumps in his chair, visibly tired. "I don't think we even know what alignment is. We can't even define it properly."

I straighten up across the table at the Mediterranean restaurant. "I disagree. Give me three seconds and I can define it."

"Fine," he says after a pause.

"Can we narrow it to alignment of AI to humans?" I ask.

"Yes, let's narrow it to alignment of one AI to one person."

"The AI is aligned if you give it a goal and it pursues that goal without modifying it with its own intentions or goals."

Benjamin frowns. "That sounds far too abstract."

"In what sense?"

"Like the goal—what is that, more precisely?"

"A state of the world you want to achieve, or a series of states."

"But how would you specify that?"

"You can describe it in infinitely many ways. There's a scale of detail you can choose, which implies a level of approximation of the state."

"That won't describe the state completely, though?"

"Well, maybe if you could describe to the quantum state level, but that's obviously impractical."

"So then the AI must somehow interpret your goal, right?"

"Not exactly, but you mean it would have to interpolate to fill in the under-specified parts of your goal description?"

"Yes, that's a good way to put it."

"Then what we've discovered is another axis, orthogonal to alignment, which controls to what level of under-specification we want the AI to interpolate versus where it needs to ask you to fill in gaps before pursuing your goal."

"We can't be saying 'Create a picture of a dog' and then need to specify each pixel."

"Of course not. But perhaps the AI should ask whether you want the picture on paper or digitally, using a reasonable threshold for necessary clarification."

"People want things they don't actually need though..."

"And they can end up in a bad state even with an aligned AI."

"So how do you make alignment guarantee good outcomes? People are stupid..."

"And that's on them. You can call it incompetence, but I'd call it misuse."​​​​​​​​​​​​​​​​

Martin Vlach · 4mo

You mean the chevrons like this are non-standard, but also sub-standard, although they have the neat property of representing >Speaker one< and >>Speaker two<<? I can see the typography of those here is meh at best.

Robert Cousineau · 4mo

I personally have not seen that style of writing dialogue before, and did not recognize that was what you were doing until reading this comment from you. It, along with the typos, made it difficult for me to understand, so I had Claude copy-edit it for me (and then figured maybe someone else would find that useful).

Robert Cousineau · 4mo

In response to what I understand to be your question ("So what do you do to make the alignment guarantee good outcomes? People are stupid.."), I think one commonly accepted answer here is: 

Yes, that is a real problem.  Something like CEV offers a solution (with a spherical cow, in a vacuum).  

There is also a useful differentiation to be made between Inner Alignment and Outer Alignment.


Raymond is tired. He exhales, exhausted: >>I don't think we even know what alignment is, like we are not able to define it.<<

I hop up on my chair in the Mediterranean restaurant: >I disagree, if you give me 3 seconds, I can define it.<

>>---<< 

>Can we narrow it to alignment of AI to humans?<

>>Yes, let's narrow it to alignment of one AI to one person.<<

>Fine. The AI is aligned if you give it a goal and it pursues that goal without modifying it with its own intentions or goals.<

>>That sounds way too abstract...<<

>Yeah, but in what sense do you mean?<

>>Like the goal, what is that, more precisely?<< 

>That is a state of the world you want to achieve or a series of states of the world.<

>>Oh, but how would you specify that?<<

>You can specify it, describe it, in infinitely many ways; there is a scale of how detailed a description you choose, which will imply a level of approximation of the state.<

>>Oh, but that won't describe the state completely...?<<

>Well, maybe if you could describe it down to the quantum state level, but surely that is not practical.<

>>So then the AI must somehow interpret your goal, right?<< 

>Ehmmm, well no, but you mean it would have to interpolate to fill in the under-specified spots in the description of your goal...?<

>>Yes, that is a good expression for what would need to happen.<< 

>Then what we've discovered here is another axis, orthogonal to alignment, which would control to what level of under-specification we want the AI to interpolate and where it would need to ask you to fill in the gaps (more) before moving towards your goal.<

>>Oh, but we also can't be like "Create a picture of a dog" and then need to specify each pixel.<<

>Sure. But maybe the AI must ask you whether you want the picture on paper or digitally on your screen, with a reasonable threshold for clarification.<

>>Hmm, but people want things they do not have...<<

>and they can end up in a state they feel bad in, even with an aligned AI.<

>>So what do you do to make the alignment guarantee good outcomes? People are stupid...<<

>and that's on them. You can call it incompetence, but I'd call that misuse.<
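
A minimal sketch, in Python, of the "reasonable threshold for clarification" idea from the exchange above. Everything here is hypothetical and only illustrates the orthogonal axis we discussed: `GoalSpec`, `interpretation_spread`, and `ASK_THRESHOLD` are made-up names, and the spread measure is a crude stand-in for however one would actually quantify how under-specified a goal description still is.

```python
# Hypothetical illustration only: the names and the crude "spread" measure are
# invented for this sketch; this is not an actual alignment mechanism.
from dataclasses import dataclass


@dataclass
class GoalSpec:
    description: str           # the user's (under-specified) goal description
    open_questions: list[str]  # aspects the description leaves unresolved


# The orthogonal axis from the dialogue: how much residual under-specification
# the AI may fill in on its own before it has to ask the user instead.
ASK_THRESHOLD = 0.2


def interpretation_spread(goal: GoalSpec) -> float:
    """Crude proxy for how under-specified the goal still is (0 = fully pinned down)."""
    return min(1.0, len(goal.open_questions) / 10)


def next_step(goal: GoalSpec) -> str:
    """Either ask the user to fill a gap or interpolate the rest and proceed."""
    if interpretation_spread(goal) > ASK_THRESHOLD:
        return f"ask the user: {goal.open_questions[0]}"
    return "interpolate the remaining details and pursue the goal"


if __name__ == "__main__":
    dog_picture = GoalSpec(
        description="Create a picture of a dog",
        open_questions=["on paper or digitally?", "which breed?", "what style?"],
    )
    print(next_step(dog_picture))  # -> ask the user: on paper or digitally?
```

Sliding `ASK_THRESHOLD` up or down moves the AI along that axis, toward more autonomous interpolation or toward more clarification questions, without changing whether it is aligned in the sense defined above.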