LESSWRONG
Corrigibility
• Applied to Improvement on MIRI's Corrigibility by WCargo 8d ago
• Applied to Shutdown-Seeking AI by Simon Goldstein 16d ago
• Applied to Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program by Christopher King 21d ago
• Applied to Mr. Meeseeks as an AI capability tripwire by Eric Zhang 1mo ago
• Applied to Collective Identity by NicholasKees 1mo ago
• Applied to (Slightly) Scalable RLHF Alternatives: A Productive Path for Slow Takeoff Worlds? by marc/er 1mo ago
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus 1mo ago
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus 1mo ago
• Applied to Aggregating Utilities for Corrigible AI [Feedback Draft] by Raemon 1mo ago
• Applied to A Corrigibility Metaphore - Big Gambles by WCargo 1mo ago
• Applied to Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios by Simon Lermen 1mo ago
• Applied to Corrigibility, Much more detail than anyone wants to Read by Gunnar_Zarncke 1mo ago
• Applied to Archetypal Transfer Learning and a Corrigibility-Friendly Optimization Technique by MiguelDev 1mo ago
• Applied to «Boundaries/Membranes» and AI safety compilation by Chipmonk 1mo ago
• Applied to An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility by Audere 1mo ago
• Applied to Simulating a possible alignment solution in GPT2-medium using Archetypal Transfer Learning by MiguelDev 2mo ago
• Applied to Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner & Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium by MiguelDev 2mo ago
• Applied to Thinking about maximization and corrigibility by James Payor 2mo ago
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd 2mo ago