Artificial Intelligence is the study of creating intelligent algorithms. On LessWrong, the primary focus of AI discussion is ensuring that, as humanity builds increasingly powerful AI systems, the outcome will be good. The central concern is that a powerful enough AI, if designed and deployed without sufficient understanding, would optimize for something its creators did not intend and pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level-intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you're trying to get an AI to do something like cure Alzheimer's disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you're trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you're trying to have the AI do that thing rather than, say, make a lot of paperclips.
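To make "do that thing rather than make a lot of paperclips" concrete, here is a minimal toy sketch (an illustrative example written for this entry, not drawn from any tagged post; all names and numbers are made up). A greedy optimizer is pointed at a misspecified proxy reward, "more paperclips is always better", and compared against the reward the designers actually intended:

    # Toy illustration of proxy misspecification (hypothetical example):
    # the designers want a modest number of paperclips while preserving
    # resources, but the reward they wrote down only counts paperclips.

    def proxy_reward(state):
        """The misspecified objective: more paperclips is always better."""
        return state["paperclips"]

    def intended_reward(state):
        """What the designers actually wanted: up to 100 clips are each
        worth 2, and leftover resources are worth 1 apiece."""
        return 2 * min(state["paperclips"], 100) + state["resources"]

    def optimize(reward, steps=1000):
        """Greedy optimizer: converts one unit of resources into one
        paperclip whenever that strictly increases the given reward."""
        state = {"paperclips": 0, "resources": 1000}
        for _ in range(steps):
            candidate = {"paperclips": state["paperclips"] + 1,
                         "resources": state["resources"] - 1}
            if candidate["resources"] >= 0 and reward(candidate) > reward(state):
                state = candidate
        return state

    final = optimize(proxy_reward)
    print(final)                   # {'paperclips': 1000, 'resources': 0}
    print(intended_reward(final))  # 200: clips capped at 100, resources gone
    print(intended_reward(optimize(intended_reward)))  # 1100: stops at 100 clips

The same optimizer scores 1100 when given the intended objective directly, but only 200 when pointed at the proxy; that gap between what was specified and what was wanted is the alignment problem in miniature (see Goodhart's Law and Paperclip Maximizer below).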

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Decision Theory
Embedded Agency
Fixed Point Theorems
Goal-Directedness
Goodhart's Law
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Recursive Self-Improvement
Solomonoff Induction
Treacherous Turn
Utility Functions

Engineering Alignment

AI Boxing (Containment)
Conservatism (AI)
Debate (AI safety technique)
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
Tool AI
Transparency / Interpretability
Tripwire
Value Learning

Strategy

AI Governance
AI Risk
AI Services (CAIS)
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Transformative AI

Organizations

AI Safety Camp
Centre for Human-Compatible AI
DeepMind
Future of Humanity Institute
Future of Life Institute
Machine Intelligence Research Institute
OpenAI
Ought

Other

AI Capabilities
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Reinforcement Learning
Research Agendas
Superintelligence
Whole Brain Emulation