LESSWRONG
Corrigibility
• Applied to Shutdown-Seeking AI by Simon Goldstein (1d ago)
• Applied to Mr. Meeseeks as an AI capability tripwire by Eric Zhang (14d ago)
• Applied to Collective Identity by NicholasKees (15d ago)
• Applied to (Slightly) Scalable RLHF Alternatives: A Productive Path for Slow Takeoff Worlds? by marc/er (16d ago)
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus (16d ago)
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus (16d ago)
• Applied to Aggregating Utilities for Corrigible AI [Feedback Draft] by Raemon (20d ago)
• Applied to A Corrigibility Metaphore - Big Gambles by WCargo (23d ago)
• Applied to Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios by Simon Lermen (24d ago)
• Applied to Corrigibility, Much more detail than anyone wants to Read by Gunnar_Zarncke (1mo ago)
• Applied to Archetypal Transfer Learning and a Corrigibility-Friendly Optimization Technique by MiguelDev (1mo ago)
• Applied to «Boundaries/Membranes» and AI safety compilation by Chipmonk (1mo ago)
• Applied to An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility by Audere (1mo ago)
• Applied to Simulating a possible alignment solution in GPT2-medium using Archetypal Transfer Learning by MiguelDev (1mo ago)
• Applied to Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner & Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium by MiguelDev (1mo ago)
• Applied to Thinking about maximization and corrigibility by James Payor (1mo ago)
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd (1mo ago)
• Applied to The Guardian Version 1 by MiguelDev (2mo ago)
• Applied to Paying the corrigibility tax by Max H (2mo ago)