The Pointers Problem

Consider an agent with a model of the world W. How does W relate to the real world? W might contain a chair. For W to be useful, it needs to map to reality, i.e. there must be a function f with W_chair ↦ R_chair, where R_chair is the real chair.

The pointers problem is about figuring out f.

In the words of John, who introduced the concept in a post of the same name:

What functions of what variables (if any) in the environment and/or another world-model correspond to the latent variables in the agent’s world-model?

This relates to alignment, as we would like an AI that acts based on real-world human values, not just human estimates of their own values. The two will differ in many situations, since humans are not all-seeing or all-knowing. Therefore we'd like to figure out how to point to our values directly.
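The gap between real-world values and human estimates of them can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than anything from the original post: the `World` class, the "friend is happy" latent variable, and the three outcomes are assumptions chosen only to show how optimizing the estimate can diverge from optimizing the thing the estimate points at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """Hypothetical ground-truth state (an assumption for illustration)."""
    friend_is_happy: bool    # what the human actually cares about
    friend_is_smiling: bool  # the only thing the human can observe


def human_estimate(world: World) -> bool:
    """The human's latent variable 'my friend is happy', inferred from observation alone."""
    return world.friend_is_smiling


def true_value(world: World) -> bool:
    """What that latent variable is meant to point at in the real world."""
    return world.friend_is_happy


# Outcomes an AI could bring about (made up for this sketch).
OUTCOMES = {
    "help_friend": World(friend_is_happy=True, friend_is_smiling=True),
    "force_smile": World(friend_is_happy=False, friend_is_smiling=True),
    "do_nothing": World(friend_is_happy=False, friend_is_smiling=False),
}


def acceptable_actions(score):
    """Names of the outcomes that the given scoring function approves of."""
    return sorted(name for name, world in OUTCOMES.items() if score(world))


by_estimate = acceptable_actions(human_estimate)
by_referent = acceptable_actions(true_value)

print("approved by the human's estimate:", by_estimate)
print("approved by the real referent:  ", by_referent)
```

In this toy setting, an AI scored on the human's estimate is indifferent between helping the friend and merely forcing a smile, while one scored on the real referent is not. The sketch sidesteps the hard part, which is that in practice nobody hands us `true_value`; finding it is the pointers problem.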

