Aligned AI Proposals

Edited by Dakara, last updated 4th Jan 2025

Aligned AI Proposals are proposals aimed at ensuring artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment).

The main goal of these proposals is to ensure that AI systems will, all things considered, benefit humanity.
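As a loose illustration of the distinction between the two notions, consider the toy sketch below. Everything in it is invented for illustration and is not drawn from any of the proposals listed on this page: an intent-aligned system does what its principal actually asked for, while a value-aligned system also screens its plans against a model of human values.

```python
# Hypothetical toy sketch of the intent- vs. value-alignment distinction.
# All names and logic here are illustrative; no real proposal is this simple.

def candidate_plans(instruction: str) -> list[str]:
    # Stand-in for a planner: returns possible plans for an instruction.
    return [
        f"literal plan for {instruction!r}",
        f"cautious plan for {instruction!r}",
    ]

def intent_aligned_choice(instruction: str) -> str:
    # Intent alignment: faithfully do what the principal actually asked for.
    return candidate_plans(instruction)[0]

def value_aligned_choice(instruction: str, violates_values) -> str:
    # Value alignment: also screen plans against a model of human values,
    # refusing when no plan is compatible with them.
    acceptable = [p for p in candidate_plans(instruction) if not violates_values(p)]
    return acceptable[0] if acceptable else "refuse: no value-compatible plan"

print(intent_aligned_choice("maximize paperclip output"))
print(value_aligned_choice("maximize paperclip output",
                           violates_values=lambda plan: "literal" in plan))
```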

Posts tagged Aligned AI Proposals (15 of 56 shown; Ω marks posts crossposted to the Alignment Forum)

The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem? (Ω) · RogerDearnaley · 3mo · 30 karma · 34 comments
A "Bitter Lesson" Approach to Aligning AGI and ASI (Ω) · RogerDearnaley · 1y · 64 karma · 41 comments
Why Aligning an LLM is Hard, and How to Make it Easier (Ω) · RogerDearnaley · 7mo · 34 karma · 3 comments
How to Control an LLM's Behavior (why my P(DOOM) went down) (Ω) · RogerDearnaley · 2y · 65 karma · 30 comments
Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor (Ω) · RogerDearnaley · 2y · 48 karma · 8 comments
Requirements for a Basin of Attraction to Alignment (Ω) · RogerDearnaley · 2y · 41 karma · 12 comments
Striking Implications for Learning Theory, Interpretability — and Safety? · RogerDearnaley · 2y · 37 karma · 4 comments
Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI? · RogerDearnaley · 2y · 35 karma · 4 comments
Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis · RogerDearnaley · 2y · 16 karma · 15 comments
Interpreting the Learning of Deceit (Ω) · RogerDearnaley · 2y · 30 karma · 14 comments
A list of core AI safety problems and how I hope to solve them (Ω) · davidad · 2y · 165 karma · 29 comments
AI Alignment Metastrategy (Ω) · Vanessa Kosoy · 2y · 124 karma · 13 comments
How might we solve the alignment problem? (Part 1: Intro, summary, ontology) · Joe Carlsmith · 10mo · 54 karma · 5 comments
Safety First: safety before full alignment. The deontic sufficiency hypothesis. (Ω) · Chris Lakin · 2y · 48 karma · 3 comments
We have promising alignment plans with low taxes (Ω) · Seth Herd · 2y · 44 karma · 9 comments