I mostly don't think about corrigibility. But when I do, it's generally to label departures from an AI having agenty structure. Some people like to think about corrigibility in terms of stimulus-response patterns, or behavioral guarantees, or as a grab-bag of doomed attempts to give orders to something smarter than you without understanding it. These are all fine too.

I definitely think about values more in terms of abstract states. By "abstract" I mean that states don't have to be specific states of the universe's quantum wavefunction; they can be anything that fills the role of "state" in a hierarchical set of models of the world.

It's not that I'm hardcore committed to an AI never learning values that are about process. But I tend to think of even those in terms of state: the AI has a model of itself and its own decision-making, and that model contains variables it can control, like "how do I make decisions?" (or even something as vague as "Am I being good and just?").

Basically this is because I think that some state-based preferences are really important, and once you have those, deontological rules that have no grounding in state whatsoever are unnatural.

LESSWRONG

[ Question ]

Simple question about corrigibility and values in AI.

Oct 22, 2022
