Basic Alignment Theory: AIXI, Coherent Extrapolated Volition, Complexity of Value, Corrigibility, Deceptive Alignment, Decision Theory, Embedded Agency, Fixed Point Theorems, Goodhart's Law, Goal-Directedness, Gradient Hacking, Infra-Bayesianism, Inner Alignment, Instrumental Convergence, Intelligence Explosion, Logical Induction, Logical Uncertainty, Mesa-Optimization, Multipolar Scenarios, Myopia, Newcomb's Problem, Optimization, Orthogonality Thesis, Outer Alignment, Paperclip Maximizer, Power Seeking (AI), Recursive Self-Improvement, Simulator Theory, Sharp Left Turn, Solomonoff Induction, Superintelligence, Symbol Grounding, Transformative AI, Treacherous Turn, Utility Functions, Whole Brain Emulation

Engineering Alignment: Agent Foundations, AI-assisted Alignment, AI Boxing (Containment), Conservatism (AI), Debate (AI safety technique), Eliciting Latent Knowledge, Factored Cognition, Humans Consulting HCH, Impact Measures, Inverse Reinforcement Learning, Iterated Amplification, Mild Optimization, Oracle AI, Reward Functions, RLHF, Shard Theory, Tool AI, Transparency / Interpretability, Tripwire, Value Learning

Organizations: AI Safety Camp, Alignment Research Center, Anthropic, Apart Research, AXRP, CHAI (UC Berkeley), Conjecture (org), DeepMind, FHI (Oxford), Future of Life Institute, MIRI, OpenAI, Ought, Redwood Research, SERI MATS