AI

Written by plex, Ruby, Ben Pace, jimrandomh, et al. Last updated 23rd Jan 2025.

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring that powerful AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything that is part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you're trying to get an AI to do something like cure Alzheimer's disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you're trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you're trying to have the AI do the intended thing rather than, say, make a lot of paperclips.
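As a rough illustration of why a misspecified objective goes wrong (a minimal toy sketch, not taken from this page; the dishwashing scenario, reward functions, and variable names below are invented for the example), consider an optimizer that maximizes a proxy reward its designers wrote down instead of the outcome they actually wanted:

```python
# Toy illustration of a misspecified objective (cf. Goodhart's Law).
# The designer wants clean dishes, but rewards water usage as a proxy,
# because in ordinary operation water use correlates with washing.

def true_goal(minutes_scrubbing: int, litres_of_water: int) -> int:
    """What the designer actually cares about: how clean the dishes get."""
    return min(minutes_scrubbing, 10)  # cleanliness saturates after 10 minutes

def proxy_reward(minutes_scrubbing: int, litres_of_water: int) -> int:
    """What the designer wrote down: more water looks like more washing."""
    return litres_of_water

# A simple but thorough optimizer: search every action and pick the one
# with the highest proxy reward.
candidate_actions = [(m, w) for m in range(0, 21) for w in range(0, 1001)]
best = max(candidate_actions, key=lambda action: proxy_reward(*action))

print("Chosen action (minutes scrubbing, litres of water):", best)
print("Proxy reward:", proxy_reward(*best))    # maximal: 1000
print("True goal achieved:", true_goal(*best))  # zero: no scrubbing at all
```

The stronger the search, the more reliably it lands on the degenerate action: the proxy is driven to its maximum while the intended outcome is ignored entirely. The alignment problem is the much harder version of this situation in which the optimizer is a highly capable AI and the misspecified objective is whatever its creators managed to specify.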


Basic Alignment Theory

Agent Foundations
AI-assisted Alignment
AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Goal-Directedness
Goodhart's Law
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Sharp Left Turn
Simulator Theory
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation

Engineering Alignment

AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Interpretability (ML & AI)
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Value Learning

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
General Intelligence
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

Posts tagged AI
Karma | Title | Author(s) | Posted | Comments
945 | AGI Ruin: A List of Lethalities (Ω) | Eliezer Yudkowsky | 3y | 711
152 | There's No Fire Alarm for Artificial General Intelligence | Eliezer Yudkowsky | 7y | 72
139 | Superintelligence FAQ | Scott Alexander | 9y | 39
220 | An overview of 11 proposals for building safe advanced AI (Ω) | evhub | 5y | 37
187 | Risks from Learned Optimization: Introduction (Ω) | evhub, Chris van Merwijk, Vlad Mikulik, Joar Skalse, Scott Garrabrant | 6y | 42
436 | What failure looks like (Ω) | paulfchristiano | 6y | 55
234 | Embedded Agents (Ω) | abramdemski, Scott Garrabrant | 7y | 42
228 | The Rocket Alignment Problem (Ω) | Eliezer Yudkowsky | 7y | 44
124 | Challenges to Christiano’s capability amplification proposal (Ω) | Eliezer Yudkowsky | 7y | 54
210 | Embedded Agency (full-text version) (Ω) | Scott Garrabrant, abramdemski | 7y | 17
158 | Biology-Inspired AGI Timelines: The Trick That Never Works (Ω) | Eliezer Yudkowsky | 4y | 142
55 | A space of proposals for building safe advanced AI (Ω) | Richard_Ngo | 5y | 4
260 | larger language models may disappoint you [or, an eternally unfinished draft] (Ω) | nostalgebraist | 4y | 31
131 | Robustness to Scale (Ω) | Scott Garrabrant | 7y | 23
86 | Deepmind's Gopher--more powerful than GPT-3 (Ω) | hath | 4y | 26

(Ω: also crossposted to the AI Alignment Forum)
(Showing 15 of 10,810 posts.)