
AI

Edited by plex, Ruby, Ben Pace, jimrandomh, et al. Last updated 23rd Jan 2025.

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring [powerful] AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize for something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.
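That central concern, a capable optimizer pursuing a proxy for what its designers actually wanted, can be made concrete with a toy sketch. The snippet below is illustrative only and is not drawn from this page or any particular alignment result; the objective functions and the hill-climbing loop are invented for the example. It shows Goodhart's Law in miniature: the harder a simple optimizer pushes on a proxy objective, the worse it scores on the true objective the proxy was meant to stand in for.

```python
import random


def true_objective(x: float) -> float:
    """What the designers actually care about: best at x = 1, worse the further you stray."""
    return -(x - 1.0) ** 2


def proxy_objective(x: float) -> float:
    """The measurable stand-in the system actually optimizes: 'more is always better'."""
    return x


def hill_climb(objective, steps: int, step_size: float = 0.1) -> float:
    """Greedy random search: keep any perturbation that increases the objective."""
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) > objective(x):
            x = candidate
    return x


if __name__ == "__main__":
    random.seed(0)
    for steps in (10, 100, 10_000):
        x = hill_climb(proxy_objective, steps)
        print(f"optimization pressure {steps:>6}: proxy = {proxy_objective(x):8.2f}, "
              f"true objective = {true_objective(x):12.2f}")
    # Under weak optimization the proxy tracks the true objective reasonably well;
    # under strong optimization the proxy keeps climbing while the true objective collapses.
```

Real systems are far more complicated than this sketch, but the qualitative pattern it shows, proxy and true objective agreeing under light optimization and diverging under heavy optimization, is the kind of failure the alignment problem is about.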

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything that is part of the broad cluster of understanding AI and its future impact on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you're trying to get the AI to do something like cure Alzheimer's disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you're trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you're trying to have the AI do that thing rather than make a lot of paperclips.


Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
General Intelligence
Goal-Directedness
Goodhart's Law
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Shard Theory
Sharp Left Turn
Simulator Theory
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions

Engineering Alignment

Agent Foundations
AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Interpretability (ML & AI)
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Tool AI
Value Learning
Whole Brain Emulation

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas
Posts tagged AI
A case for courage, when speaking of AI danger · So8res · 7d · 492 karma · 118 comments
New Endorsements for “If Anyone Builds It, Everyone Dies” · Malo · 1mo · 478 karma · 55 comments
So You Think You've Awoken ChatGPT · JustisMills · 4d · 164 karma · 33 comments
Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies · So8res · 2mo · 636 karma · 113 comments
the jackpot age · thiccythot · 3d · 140 karma · 10 comments
Lessons from the Iraq War for AI policy · Buck · 4d · 154 karma · 23 comments
What We Learned from Briefing 70+ Lawmakers on the Threat from AI · leticiagarcia · 2mo · 477 karma · 15 comments
A deep critique of AI 2027’s bad timeline models · titotal · 25d · 343 karma · 39 comments
AI 2027: What Superintelligence Looks Like · Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo · 3mo · 656 karma · 222 comments
the void · nostalgebraist · 1mo · 357 karma · 104 comments
Foom & Doom 1: “Brain in a box in a basement” · Steven Byrnes · 10d · 269 karma · 102 comments
Vitalik's Response to AI 2027 · Daniel Kokotajlo · 3d · 103 karma · 36 comments
Why Do Some Language Models Fake Alignment While Others Don't? · abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger · 6d · 138 karma · 14 comments
Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild · Adam Karvonen, Sam Marks · 12d · 186 karma · 25 comments
Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems) · LawrenceC · 1mo · 287 karma · 19 comments