Power Seeking (AI)
• Applied to Steering Llama-2 with contrastive activation additions by TurnTrout, 6mo ago
• Applied to Natural Abstraction: Convergent Preferences Over Information Structures by paulom, 8mo ago
• Applied to You can't fetch the coffee if you're dead: an AI dilemma by hennyge, 10mo ago
• Applied to The Game of Dominance by Karl von Wendt, 10mo ago
• Applied to Incentives from a causal perspective by tom4everitt, 1y ago
• Applied to Instrumental Convergence? [Draft] by Dan H, 1y ago
• Applied to Categorical-measure-theoretic approach to optimal policies tending to seek power by Vika, 1y ago
• Applied to My Overview of the AI Alignment Landscape: Threat Models by Michelle Viotti, 1y ago
• Applied to Ideas for studies on AGI risk by dr_s, 1y ago
• Applied to Instrumental convergence in single-agent systems by Jacob Pfau, 1y ago
• Applied to Risks from GPT-4 Byproduct of Recursively Optimizing AIs by ben hayum, 1y ago
• Applied to [Linkpost] Shorter version of report on existential risk from power-seeking AI by Ruby, 1y ago
• Applied to The Waluigi Effect (mega-post) by Cleo Nardo, 1y ago
• Applied to Power-seeking can be probable and predictive for trained agents by Vika, 1y ago
• Applied to Power-Seeking = Minimising free energy by Jonas Hallgren, 1y ago