LessWrong Tags

Power Seeking (AI)

• Applied to Steering Llama-2 with contrastive activation additions by TurnTrout 6mo ago
• Applied to Natural Abstraction: Convergent Preferences Over Information Structures by paulom 8mo ago
• Applied to You can't fetch the coffee if you're dead: an AI dilemma by hennyge 10mo ago
• Applied to The Game of Dominance by Karl von Wendt 10mo ago
• Applied to Incentives from a causal perspective by tom4everitt 1y ago
• Applied to Instrumental Convergence? [Draft] by Dan H 1y ago
• Applied to Categorical-measure-theoretic approach to optimal policies tending to seek power by Vika 1y ago
• Applied to My Overview of the AI Alignment Landscape: Threat Models by Michelle Viotti 1y ago
• Applied to Ideas for studies on AGI risk by dr_s 1y ago
• Applied to Instrumental convergence in single-agent systems by Jacob Pfau 1y ago
• Applied to Risks from GPT-4 Byproduct of Recursively Optimizing AIs by ben hayum 1y ago
• Applied to [Linkpost] Shorter version of report on existential risk from power-seeking AI by Ruby 1y ago
• Applied to The Waluigi Effect (mega-post) by Cleo Nardo 1y ago
• Applied to Power-seeking can be probable and predictive for trained agents by Vika 1y ago
• Applied to Power-Seeking = Minimising free energy by Jonas Hallgren 1y ago