AI

Edited by plex, Ruby, Ben Pace, jimrandomh, et al. last updated 23rd Jan 2025

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring that powerful AI systems are aligned with human values and interests. The central concern is that a sufficiently powerful AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level-intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get the AI to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you’re trying to have the AI do that thing rather than making a lot of paperclips.
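As a loose, toy illustration of the "optimizes something unintended" concern above (not drawn from the entry itself; the objective functions and hill-climbing loop are hypothetical stand-ins), here is a short Python sketch of Goodhart-style divergence, where pushing hard on a measured proxy keeps improving the proxy score while the designers' true objective gets worse:

```python
# Toy sketch only: a hypothetical setup where the designers' true objective
# and the measured proxy agree for "reasonable" inputs but come apart when
# the proxy is optimized hard (Goodhart's Law).

import numpy as np

rng = np.random.default_rng(0)

def true_objective(x: np.ndarray) -> float:
    # What the designers actually want: stay near a target point.
    return float(-np.sum((x - 1.0) ** 2))

def proxy_objective(x: np.ndarray) -> float:
    # What the system is actually trained to maximize: a crude, unbounded
    # stand-in that only tracks the true objective for small values of x.
    return float(np.sum(x))

# Naive hill-climbing on the proxy.
x = np.zeros(3)
for _ in range(1000):
    candidate = x + rng.normal(scale=0.1, size=x.shape)
    if proxy_objective(candidate) > proxy_objective(x):
        x = candidate

print("proxy score:", round(proxy_objective(x), 2))  # keeps climbing
print("true score: ", round(true_objective(x), 2))   # collapses
```

The point is only qualitative: once the proxy and the intended objective diverge, a strong optimizer follows the proxy, which is the failure pattern both the narrow and the ambitious framings of alignment are trying to avoid.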

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Goodhart's Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Interpretability (ML & AI)
Value Learning
 

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

Posts tagged AI
The Rise of Parasitic AI (Adele Lopez, 1mo; 658 karma, 175 comments)
Towards a Typology of Strange LLM Chains-of-Thought (1a3orn, 5d; 262 karma, 21 comments)
If Anyone Builds It Everyone Dies, a semi-outsider review (dvd, 5d; 191 karma, 43 comments)
The "Length" of "Horizons" (Adam Scholl, 2d; 133 karma, 21 comments)
Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures (Charbel-Raphaël, 1mo; 336 karma, 27 comments)
How Does A Blind Model See The Earth? (henry, 2mo; 475 karma, 38 comments)
Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies (So8res, 5mo; 648 karma, 114 comments)
“If Anyone Builds It, Everyone Dies” release day! (alexvermeer, 1mo; 291 karma, 3 comments)
AI Induced Psychosis: A shallow investigation (Tim Hua, 1mo; 360 karma, 43 comments)
Reasons to sell frontier lab equity to donate now rather than later (Daniel_Eth, Ethan Perez, ryan_greenblatt, 22d; 237 karma, 33 comments)
A case for courage, when speaking of AI danger (So8res, 3mo; 521 karma, 128 comments)
Four ways learning Econ makes people dumber re: future AI (Steven Byrnes, 19d; 361 karma, 49 comments)
Recontextualization Mitigates Specification Gaming Without Modifying the Specification (ariana_azarbal, vgillioz, TurnTrout, cloud, 5d; 109 karma, 11 comments)
AI 2027: What Superintelligence Looks Like (Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo, 6mo; 665 karma, 222 comments)
The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take) (Katalina Hernandez, 1mo; 249 karma, 63 comments)