AI-assisted/AI automated Alignment

Contributors: Ruby (6)

This is not obviously the best name for this tag; it may be worth exploring alternatives or renaming it. Wiki-tags are publicly editable!

Posts tagged AI-assisted/AI automated Alignment
Relevance | Karma | Title | Author(s) | Age | Comments
4 | 9 | Proposed Alignment Technique: OSNR (Output Sanitization via Noising and Reconstruction) for Safer Usage of Potentially Misaligned AGI | sudo -i | 18d | 9
3 | 321 | Cyborgism Ω | NicholasKees, janus | 4mo | 43
3 | 99 | Beliefs and Disagreements about Automating Alignment Research Ω | Ian McKenzie | 10mo | 4
2 | 144 | Godzilla Strategies Ω | johnswentworth | 1y | 66
2 | 109 | We have to Upgrade | Jed McCaleb | 3mo | 33
2 | 87 | Cyborg Periods: There will be multiple AI transitions Ω | Jan_Kulveit, rosehadshar | 4mo | 8
2 | 54 | My thoughts on OpenAI's alignment plan | Akash | 6mo | 3
2 | 52 | Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review) | Shoshannah Tekofsky | 5mo | 6
2 | 45 | What specific thing would you do with AI Alignment Research Assistant GPT? Q | quetzal_rainbow, janus | 5mo | 9
2 | 45 | A survey of tool use and workflows in alignment research Ω | Logan Riggs, Jan, janus, jacquesthibs | 1y | 4
2 | 27 | [Linkpost] Jan Leike on three kinds of alignment taxes | Akash | 5mo | 2
2 | 21 | Model-driven feedback could amplify alignment failures Ω | aogara | 5mo | 1
2 | 16 | Discussion on utilizing AI for alignment | elifland | 10mo | 3
2 | 16 | AI-assisted alignment proposals require specific decomposition of capabilities | RobertM | 3mo | 2
2 | 15 | Making it harder for an AGI to "trick" us, with STVs Ω | Tor Økland Barstad | 1y | 5
Showing 15 of 47 tagged posts.