The Pointers Problem — LessWrong

Edited by Johannes C. Mayer, 1point7point4, et al. Last updated 18th Oct 2024.

Consider an agent with a model of the world, W. How does W relate to the real world, R? Suppose W contains a chair. For W to be useful, it needs to map onto reality: there must be some function f that takes the chair in the model to the chair in the world, f: W_chair ↦ R_chair.

The pointers problem is about figuring out f.

In the words of John, who introduced the concept:

What functions of what variables (if any) in the environment and/or another world-model correspond to the latent variables in the agent’s world-model?

This relates to alignment: we would like an AI that acts on real-world human values, not just on humans' estimates of their own values. The two will differ in many situations, since humans are not all-seeing or all-knowing. Therefore we'd like to figure out how to point to our values directly.
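The gap between a latent variable in the model and whatever it points to in the world can be made concrete with a toy example. The following is only an illustrative sketch, not anything from the original posts: the names `LatentVariable`, `chair_pointer`, `estimated_value`, and `actual_value`, the environment keys, and the value scale of 10 are all hypothetical. Most importantly, the pointer f is simply written by hand here, whereas the pointers problem is precisely that we do not know how to find f.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical low-level environment: raw variables only, no built-in "chair".
Environment = Dict[str, float]


@dataclass
class LatentVariable:
    """A variable in the agent's world-model W, plus a candidate pointer f."""
    name: str
    believed_value: float                    # what the agent's model currently says
    pointer: Callable[[Environment], float]  # candidate f: a function of environment variables


def chair_pointer(env: Environment) -> float:
    """Assumed, hand-written f: 'chair' means at least four legs and a usable seat."""
    return 1.0 if env["legs"] >= 4 and env["seat_area_m2"] > 0.1 else 0.0


def estimated_value(latent: LatentVariable) -> float:
    """Value computed from the agent's belief about the latent variable."""
    return 10.0 * latent.believed_value


def actual_value(latent: LatentVariable, env: Environment) -> float:
    """Value computed from what the latent variable points to in the environment."""
    return 10.0 * latent.pointer(env)


# The agent believes there is a chair, but the real object has only three legs.
env: Environment = {"legs": 3.0, "seat_area_m2": 0.2}
chair = LatentVariable(name="chair", believed_value=1.0, pointer=chair_pointer)

print(estimated_value(chair))    # 10.0
print(actual_value(chair, env))  # 0.0
```

An optimizer that only sees `estimated_value` is satisfied by changing the belief, while one that acts through the pointer has to change the environment. In this toy setting, that is the difference between acting on estimates of values and acting on the values themselves.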

Posts tagged The Pointers Problem

The Pointers Problem: Clarifications/Variations (61 karma, 5y, 16 comments, tag relevance 10)
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables (132 karma, 5y, 50 comments, tag relevance 8)
Don't design agents which exploit adversarial inputs, by TurnTrout, Garrett Baker (72 karma, 3y, 64 comments, tag relevance 3)
Robust Delegation, by abramdemski, Scott Garrabrant (116 karma, 7y, 10 comments, tag relevance 2)
Alignment allows "nonrobust" decision-influences and doesn't require robust grading (62 karma, 3y, 41 comments, tag relevance 2)
Don't align agents to evaluations of plans (48 karma, 3y, 49 comments, tag relevance 2)
[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation (46 karma, 4y, 11 comments, tag relevance 2)
The Pointer Resolution Problem (41 karma, 2y, 20 comments, tag relevance 2)
People care about each other even though they have imperfect motivational pointers? (33 karma, 3y, 25 comments, tag relevance 2)
Stable Pointers to Value: An Agent Embedded in Its Own Utility Function (24 karma, 8y, 9 comments, tag relevance 2)
Stable Pointers to Value III: Recursive Quantilization (20 karma, 7y, 4 comments, tag relevance 2)
Stable Pointers to Value II: Environmental Goals (19 karma, 8y, 3 comments, tag relevance 2)
Updating Utility Functions, by JustinShovelain, Joar Skalse (42 karma, 4y, 6 comments, tag relevance 1)
Human sexuality as an interesting case study of alignment (39 karma, 3y, 26 comments, tag relevance 1)
Half-baked idea: a straightforward method for learning environmental goals? (16 karma, 10mo, 7 comments, tag relevance 1)
