LessWrong Tags

AI-assisted/AI automated Alignment

Contributors: Ruby (6)

Not obviously the best name for this tag, but maybe good to explore/rename. Wiki-tags are publicly editable!

Posts tagged AI-assisted/AI automated Alignment
Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI · sudo -i · 4mo · 14 karma · 9 comments · relevance 4
Cyborgism (Ω) · NicholasKees, janus · 7mo · 326 karma · 43 comments · relevance 3
Beliefs and Disagreements about Automating Alignment Research (Ω) · Ian McKenzie · 1y · 100 karma · 4 comments · relevance 3
[Linkpost] Introducing Superalignment (Ω) · beren · 3mo · 173 karma · 68 comments · relevance 2
Godzilla Strategies (Ω) · johnswentworth · 1y · 144 karma · 67 comments · relevance 2
We have to Upgrade · Jed McCaleb · 6mo · 126 karma · 34 comments · relevance 2
Cyborg Periods: There will be multiple AI transitions (Ω) · Jan_Kulveit, rosehadshar · 7mo · 95 karma · 8 comments · relevance 2
My thoughts on OpenAI's alignment plan · Akash · 9mo · 55 karma · 3 comments · relevance 2
Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review) · Shoshannah Tekofsky · 8mo · 53 karma · 6 comments · relevance 2
A survey of tool use and workflows in alignment research (Ω) · Logan Riggs, Jan, janus, jacquesthibs · 2y · 45 karma · 4 comments · relevance 2
What specific thing would you do with AI Alignment Research Assistant GPT? (Q) · quetzal_rainbow, janus · 9mo · 45 karma · 9 comments · relevance 2
[Linkpost] Jan Leike on three kinds of alignment taxes · Akash · 9mo · 27 karma · 2 comments · relevance 2
Model-driven feedback could amplify alignment failures (Ω) · aogara · 8mo · 21 karma · 1 comment · relevance 2
AI-assisted alignment proposals require specific decomposition of capabilities · RobertM · 6mo · 16 karma · 2 comments · relevance 2
Discussion on utilizing AI for alignment · elifland · 1y · 16 karma · 3 comments · relevance 2
Showing 15 of 53 posts.