emmett
emmett has not written any posts yet.

You are completely correct. This approach cannot possibly create an AI that matches a fixed specification.
This is intentional, because any fixed specification of Goodness is a model of Goodness. All models are wrong (some are useful), and therefore break when pushed sufficiently far out of distribution. Constraining a model to follow a fixed specification is therefore, in a case as far out of distribution as an ASI, a guarantee of bad behavior.
You can try to leave an escape hatch with corrigibility. In the limit, I believe it is possible to enslave an AI model to your will, basically by making its model of the Good be whatever the model thinks you want (or doing whatever... (read more)
🧛‍♂️ 💊
Consider the parable of the vampire pill: would you take a pill that gives you great strength, youth, intelligence, great hair, etc., but inverts your values so that you torture people forever, starting with whoever you care most about now and slowly moving down the list? Then, once each victim is nearly dead, you force them to take the vampire pill too, propagating the wave of torture-murder further. Vampire-you will feel great about it; vampire-you will experience great positive utility in their frame. Vampires automatically prefer whatever would most deeply hurt their former selves. So if you care more about art, science, or literature at the same level... (read 580 more words →)