AI

Edited by plex, Ruby, Ben Pace, jimrandomh, et al. last updated 23rd Jan 2025

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring powerful AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.
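
The failure mode described above, an optimizer faithfully maximizing a proxy that merely correlates with what its designers intended, can be made concrete with a toy sketch. The example below is illustrative only and not part of the wiki entry: `true_value`, `proxy_reward`, and the hill-climbing loop are invented assumptions, a minimal stand-in for Goodhart's Law rather than a real training setup.

```python
import random

def true_value(x: float) -> float:
    """What the designers actually care about: highest at x = 1."""
    return -(x - 1.0) ** 2

def proxy_reward(x: float) -> float:
    """The measurable stand-in the optimizer is given. Near the intended
    range it correlates with true_value (moving x from 0 toward 1 improves
    both), but it keeps rewarding larger x without bound."""
    return x

def hill_climb(reward, steps: int = 5000, step_size: float = 0.1) -> float:
    """Greedy local search on the given reward signal."""
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if reward(candidate) > reward(x):
            x = candidate
    return x

x_final = hill_climb(proxy_reward)
print(f"x after optimization: {x_final:.1f}")
print(f"proxy reward: {proxy_reward(x_final):.1f}")
print(f"true value:   {true_value(x_final):.1f}")
# The proxy keeps climbing while the true value collapses: the optimizer
# did exactly what it was told, not what was intended.
```

Running this drives x into the hundreds, so the proxy reward looks excellent while the true value is deeply negative; the gap between the two curves is the toy version of the alignment problem.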

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything that is part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you're trying to get the AI to do something like cure Alzheimer's disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you're trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you're trying to have the AI do that thing rather than make a lot of paperclips.

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Goodhart's Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Interpretability (ML & AI)
Value Learning

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

Posts tagged AI

Toddler Shoggoth Has Plenty Of Raw Material (The Memetic Cocoon Threat Model) by KAP
What the new generation of AI believers sees by ParrotRobot
What would my 12-year-old self think of agent foundations? by Alex_Altair
Transformative AI expectations are not new by ParrotRobot
Wiki AI by abramdemski
Darwin’s LLMs - Natural Selection is Already Shaping AI by Ben Turtel
Biology of the Living - A Conversation with two generations of Google AI by matthew allen
The new Pluribus TV show is a great and unusual analogy for AI. by AGoyet
Considering the Relevance of Computational Uncertainty for AI Safety by Cole Wyeth
Diagonalization: A (slightly) more rigorous model of paranoia by habryka
Support the Movement against AI extinction risk by samuelshadrach
AI safety undervalues founders by Ryan Kidd
Why does ChatGPT think mammoths were alive in December? by Steffee
AI loves octopuses by Sean Herrington
Private Latent Notation and AI-Human Alignment by Robert Shuler