[ Question ]

Examples of self-fulfilling prophecies in AI alignment?

by Chris Lakin
3rd Mar 2025

6 Answers sorted by
top scoring

Chris Lakin

Mar 03, 2025

90

https://x.com/sama/status/1621621724507938816 


Chris Lakin

Mar 03, 2025

80

Training on Documents About Reward Hacking Induces Reward Hacking


Chris Lakin

Mar 03, 2025

70

Situational Awareness and race dynamics? h/t Jan Kulveit @Jan_Kulveit 

Xavi CF

6mo

30

Situational Awareness probably caused Project Stargate to some extent. Getting the Republican Party to take AI seriously enough to let the project launch at the White House is no small feat, and it would have been less likely without the essay.

It also started the website-essay meta, which is part of why AI 2027, The Compendium, and Gradual Disempowerment all launched the way they did, so there are knock-on effects too.


DivineMango

Apr 03, 2025

30

Superintelligence Strategy is pretty explicitly trying to be self-fulfilling, e.g. "This dynamic stabilizes the strategic landscape without lengthy treaty negotiations—all that is necessary is that states collectively recognize their strategic situation" (which this paper popularly argues exists in the first place)


Chris Lakin

Apr 03, 2025*

30

https://x.com/saffronhuang/status/1907863453009867183


Chris Lakin

Jul 16, 2025

20

Grok prompts lately, kinda "Don't think about elephants"

  • https://x.com/bitcloud/status/1942792945238983022 
In the spirit of "Self-fulfilling misalignment data might be poisoning our AI models": what are historical examples of self-fulfilling prophecies that have affected AI alignment and development?

I've posted a few potential examples as answers to seed discussion.