LESSWRONG

AI-Assisted Alignment

Edited by habryka, niplav, et al.; last updated 20th May 2025

AI-Assisted Alignment is a cluster of alignment plans in which AI systems contribute significantly to alignment research. The assistance could come from weak tool AI or from more advanced AGI doing original research.

There has been some debate about how practical this approach is.

Relatedly, AI systems going through a phase of self-improvement will likely try to solve alignment for their own modifications and/or successors.

Other search terms for this tag: AI aligning AI, automated AI alignment, automated alignment research

Posts tagged AI-Assisted Alignment
A "Bitter Lesson" Approach to Aligning AGI and ASI (Ω) · RogerDearnaley · 1y · 64 karma · 41 comments
Requirements for a Basin of Attraction to Alignment (Ω) · RogerDearnaley · 2y · 41 karma · 12 comments
The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem? (Ω) · RogerDearnaley · 3mo · 30 karma · 34 comments
We have to Upgrade · Jed McCaleb · 2y · 131 karma · 35 comments
[Link] Why I’m optimistic about OpenAI’s alignment approach (Ω) · janleike · 3y · 98 karma · 15 comments
Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI · sudo · 2y · 14 karma · 9 comments
Cyborgism (Ω) · NicholasKees, janus · 3y · 332 karma · 47 comments
Beliefs and Disagreements about Automating Alignment Research (Ω) · Ian McKenzie · 3y · 107 karma · 4 comments
How to Control an LLM's Behavior (why my P(DOOM) went down) (Ω) · RogerDearnaley · 2y · 65 karma · 30 comments
[Link] A minimal viable product for alignment (Ω) · janleike · 3y · 53 karma · 38 comments
Infinite Possibility Space and the Shutdown Problem · magfrump · 3y · 9 karma · 0 comments
Discussion with Nate Soares on a key alignment difficulty (Ω) · HoldenKarnofsky · 2y · 267 karma · 43 comments
[Linkpost] Introducing Superalignment (Ω) · beren · 2y · 175 karma · 69 comments
Davidad's Bold Plan for Alignment: An In-Depth Explanation (Ω) · Charbel-Raphaël, Gabin · 2y · 168 karma · 40 comments
Agentized LLMs will change the alignment landscape · Seth Herd · 2y · 160 karma · 102 comments
(Showing 15 of 106 tagged posts.)