LESSWRONG
LW

210
Yann Dubois
4110
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois3y*10

To be clear I was actually not interpreting the output “at face-value”. Quite the contrary: I was saying that ChatGPT gave this answer because it simply predicts the most likely answer between (next token prediction) between a human and agent, and given that it was trained on AI risk style arguments (or sci-fi ) this is the most likely output.

But this made me think of the longer-term question "what could be the consequences of training an AI on those arguments". Usually, the “instrumental goal” argument supposes that the AI is so “smart” that it would learn that “not being turned off” is necessary. If it is trained on these types of arguments, it could “realize” this much sooner.

Btw even though GPT doesn't "mean" what it says, it could still lead to actions that interpret exactly what it says. For example, many current RL algorithms use the LM's output as high-level planning. This might continue in the near future ...

Reply
5Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Q
3y
Q
6