AI — LessWrong

AI

Edited by plex, Ruby, Ben Pace, jimrandomh, et al. last updated 23rd Jan 2025

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring that [powerful] AI systems are aligned with human values and interests. The central concern is that a sufficiently powerful AI, if not designed and implemented with sufficient understanding, would optimize for something its creators did not intend and pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything that is part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get the AI to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But narrow and ambitious alignment have in common that you’re trying to have the AI do that thing rather than make a lot of paperclips.

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Goodhart's Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Interpretability (ML & AI)
Value Learning

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas


Posts tagged AI

Relevance | Karma | Title | Author(s) | Age | Comments
31 | 976 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 4y | 715
30 | 153 | There's No Fire Alarm for Artificial General Intelligence | Eliezer Yudkowsky | 8y | 72
23 | 143 | Superintelligence FAQ | Scott Alexander | 10y | 39
20 | 222 | An overview of 11 proposals for building safe advanced AI | (not listed) | 6y | 37
19 | 188 | Risks from Learned Optimization: Introduction | evhub, Chris van Merwijk, Vlad Mikulik, Joar Skalse, Scott Garrabrant | 7y | 42
18 | 452 | What failure looks like | paulfchristiano | 7y | 55
18 | 245 | Embedded Agents | abramdemski, Scott Garrabrant | 7y | 42
16 | 245 | The Rocket Alignment Problem | Eliezer Yudkowsky | 8y | 44
15 | 127 | Challenges to Christiano’s capability amplification proposal | Eliezer Yudkowsky | 8y | 54
14 | 222 | Embedded Agency (full-text version) | Scott Garrabrant, abramdemski | 7y | 17
14 | 160 | Biology-Inspired AGI Timelines: The Trick That Never Works | Eliezer Yudkowsky | 4y | 151
14 | 55 | A space of proposals for building safe advanced AI | (not listed) | 6y | 4
11 | 261 | larger language models may disappoint you [or, an eternally unfinished draft] | (not listed) | 4y | 31
11 | 144 | Robustness to Scale | Scott Garrabrant | 8y | 23
11 | 87 | Deepmind's Gopher--more powerful than GPT-3 | (not listed) | 4y | 26
