Aligned AI Proposals
• Applied to How to safely use an optimizer by Mateusz Bagiński 1mo ago
• Applied to Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing? by Benjamin Bourlier 2mo ago
• Applied to Alignment in Thought Chains by Faust Nemesis 2mo ago
• Applied to Update on Developing an Ethics Calculator to Align an AGI to by sweenesm 2mo ago
• Applied to Requirements for a Basin of Attraction to Alignment by RogerDearnaley 3mo ago
• Applied to Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis by RogerDearnaley 3mo ago
• Applied to Proposal for an AI Safety Prize by sweenesm 3mo ago
• Applied to Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor by RogerDearnaley 4mo ago
• Applied to Moral realism and AI alignment by Caspar Oesterheld 4mo ago
• Applied to Striking Implications for Learning Theory, Interpretability — and Safety? by RogerDearnaley 4mo ago
• Applied to Safety First: safety before full alignment. The deontic sufficiency hypothesis. by RogerDearnaley 4mo ago
• Applied to Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI? by RogerDearnaley 4mo ago
• Applied to AI Alignment Metastrategy by Gunnar_Zarncke 4mo ago
• Applied to Gaia Network: a practical, incremental pathway to Open Agency Architecture by Roman Leventov 4mo ago
• Applied to Interpreting the Learning of Deceit by RogerDearnaley 4mo ago
• Applied to Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem by RogerDearnaley 4mo ago
• Applied to Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment by RogerDearnaley 5mo ago