I'm looking for a name for a problem. I expect it already has one, but I don't know what it is.
The problem: suppose we have an AI trying to learn what people want - e.g. an IRL variant. Intuitively speaking, we point at a bunch of humans and say “figure out what they want, then do that”. A few possible ways the AI could respond:
- “Hmm, to the extent that those things have utility functions, it looks like they want friendship, challenge, status, etc…”
- “Hmm, it looks like they want to maximize the number of copies of the information-carrying molecules in their cells.”
- “Hmm, it looks like they’re trying to maximize entropy in the universe.”
- “Hmm, it looks like they’re trying to minimize physical action.”
Why would the AI think these things? Well, you’re pointing at a bunch of atoms, and the microscopic laws of motion which govern those atoms can be interpreted as minimizing a quantity called action. Or you’re pointing at a bunch of organisms subject to a selection process which (locally) maximizes the number of copies of some information-carrying molecules. How is the AI supposed to know which optimization process you’re pointing to? How can it know which level of abstraction you’re talking about?
What data could tell the AI that you're pointing at humans, not the atoms they're made of?
This sounds like a question which would already have a name, so if anybody could point me to that name, I'd appreciate it.