Aligned AI Proposals
Posts tagged Aligned AI Proposals
Most Relevant
How to Control an LLM's Behavior (why my P(DOOM) went down) (Ω); RogerDearnaley; 6mo; relevance 5; karma 64; 30 comments
Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor (Ω); RogerDearnaley; 4mo; relevance 5; karma 46; 8 comments
Striking Implications for Learning Theory, Interpretability — and Safety?; RogerDearnaley; 4mo; relevance 5; karma 35; 4 comments
Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?; RogerDearnaley; 4mo; relevance 5; karma 22; 4 comments
Requirements for a Basin of Attraction to Alignment (Ω); RogerDearnaley; 3mo; relevance 5; karma 21; 6 comments
Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis; RogerDearnaley; 4mo; relevance 5; karma 4; 15 comments
Interpreting the Learning of Deceit (Ω); RogerDearnaley; 5mo; relevance 4; karma 30; 10 comments
A list of core AI safety problems and how I hope to solve them (Ω); davidad; 9mo; relevance 2; karma 161; 26 comments
AI Alignment Metastrategy (Ω); Vanessa Kosoy; 5mo; relevance 2; karma 113; 12 comments
Safety First: safety before full alignment. The deontic sufficiency hypothesis. (Ω); Chipmonk; 5mo; relevance 2; karma 47; 3 comments
an Evangelion dialogue explaining the QACI alignment plan (Ω); Tamsin Leake; 1y; relevance 2; karma 45; 15 comments
We have promising alignment plans with low taxes (Ω); Seth Herd; 6mo; relevance 2; karma 31; 9 comments
The (partial) fallacy of dumb superintelligence (Ω); Seth Herd; 7mo; relevance 2; karma 27; 5 comments
Two paths to win the AGI transition; Nathan Helm-Burger; 10mo; relevance 2; karma 11; 8 comments
Desiderata for an AI; Nathan Helm-Burger; 10mo; relevance 2; karma 8; 0 comments