AI

Edited by plex, Ruby, Ben Pace, jimrandomh, et al. last updated 23rd Jan 2025

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring powerful AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize for something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.
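
To make the concern concrete, here is a minimal sketch of the dynamic, a toy model invented purely for this illustration (none of the names, numbers, or distributions come from this page or from any specific alignment result), in the spirit of Goodhart's Law: an optimizer that maximizes an imperfect proxy for what its designers want is increasingly selected for the proxy's error as optimization pressure grows.

```python
# Toy model (illustrative assumptions only): the AI maximizes a proxy
# that imperfectly tracks the designers' true objective. Searching a
# larger pool of options = more optimization pressure. The proxy-best
# option is increasingly chosen for its proxy *error*, so the measured
# score climbs while the true value stagnates.

import random

random.seed(0)

def sample_option():
    true_value = random.gauss(0, 1)    # what the designers care about
    error = random.expovariate(1.0)    # heavy-tailed proxy misspecification
    proxy = true_value + error         # what the AI is actually told to maximize
    return proxy, true_value

for pool_size in (10, 1_000, 100_000):          # rising optimization pressure
    options = [sample_option() for _ in range(pool_size)]
    best_proxy, its_true_value = max(options)   # select the proxy-best option
    print(f"searched {pool_size:>6} options: "
          f"proxy {best_proxy:5.2f}, true value {its_true_value:5.2f}")
```

Under these toy assumptions the proxy score keeps rising with search effort while the true value plateaus; the worry stated above is that a sufficiently powerful optimizer sits at the far end of this curve.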

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything that falls within the broad cluster of understanding AI and its future impact on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get the AI to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you’re trying to have the AI do the intended thing rather than, say, make a lot of paperclips.
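
One way to see the common structure is a deliberately tiny sketch (the action names and payoff numbers below are invented for illustration): the same goal-directed machinery serves whatever objective is plugged into it, so alignment, narrow or ambitious, is a question of which objective that is.

```python
# Toy sketch (invented actions and payoffs): a generic maximizer picks
# whichever action scores highest under the objective it is handed.
# Nothing in the optimizer itself distinguishes curing a disease from
# making paperclips; only the plugged-in objective does.

outcomes = {
    "cure_disease":    {"paperclips": 0, "human_flourishing": 9},
    "make_paperclips": {"paperclips": 9, "human_flourishing": 0},
}

def best_action(utility):
    # Generic goal-directed agent: argmax over the available actions.
    return max(outcomes, key=utility)

paperclip_count = lambda action: outcomes[action]["paperclips"]
intended_value  = lambda action: outcomes[action]["human_flourishing"]

print(best_action(paperclip_count))  # -> make_paperclips
print(best_action(intended_value))   # -> cure_disease
```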


Basic Alignment Theory

Agent Foundations
AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
General Intelligence
Goal-Directedness
Goodhart's Law
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Sharp Left Turn
Simulator Theory
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation
Engineering Alignment

AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Interpretability (ML & AI)
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Value Learning
Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research
Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development
Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas
Posts tagged AI
(karma · title · authors · age · comments; Ω marks posts crossposted to the Alignment Forum)

945 · AGI Ruin: A List of Lethalities (Ω) · Eliezer Yudkowsky · 3y · 711
152 · There's No Fire Alarm for Artificial General Intelligence · Eliezer Yudkowsky · 7y · 72
139 · Superintelligence FAQ · Scott Alexander · 9y · 39
220 · An overview of 11 proposals for building safe advanced AI (Ω) · evhub · 5y · 37
187 · Risks from Learned Optimization: Introduction (Ω) · evhub, Chris van Merwijk, Vlad Mikulik, Joar Skalse, Scott Garrabrant · 6y · 42
436 · What failure looks like (Ω) · paulfchristiano · 6y · 55
234 · Embedded Agents (Ω) · abramdemski, Scott Garrabrant · 7y · 42
228 · The Rocket Alignment Problem (Ω) · Eliezer Yudkowsky · 7y · 44
124 · Challenges to Christiano’s capability amplification proposal (Ω) · Eliezer Yudkowsky · 7y · 54
210 · Embedded Agency (full-text version) (Ω) · Scott Garrabrant, abramdemski · 7y · 17
158 · Biology-Inspired AGI Timelines: The Trick That Never Works (Ω) · Eliezer Yudkowsky · 4y · 142
55 · A space of proposals for building safe advanced AI (Ω) · Richard_Ngo · 5y · 4
260 · larger language models may disappoint you [or, an eternally unfinished draft] (Ω) · nostalgebraist · 4y · 31
131 · Robustness to Scale (Ω) · Scott Garrabrant · 7y · 23
86 · Deepmind's Gopher--more powerful than GPT-3 (Ω) · hath · 4y · 26