LESSWRONG
Eliciting Latent Knowledge (ELK)
• Applied to Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS) by Scott Emmons 23d ago
• Applied to Trying to measure AI deception capabilities using temporary simulation fine-tuning by alenoach 1mo ago
• Applied to [ASoT] Some thoughts on human abstractions by leogao 3mo ago
• Applied to What Discovering Latent Knowledge Did and Did Not Find by Ruby 3mo ago
• Applied to Thoughts on self-inspecting neural networks. by Deruwyn 3mo ago
• Applied to Searching for a model's concepts by their shape – a theoretical framework by Kaarel 3mo ago
• Applied to Why I’m not working on {debate, RRM, ELK, natural abstractions} by Steven Byrnes 4mo ago
• Applied to [RFC] Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision". by gekaklam 4mo ago
• Applied to Collin Burns on Alignment Research And Discovering Latent Knowledge Without Supervision by Raemon 5mo ago
• Applied to [ASoT] Simulators show us behavioural properties by default by Jozdien 5mo ago
• Applied to Can we efficiently distinguish different mechanisms? by Multicore 5mo ago
• Applied to How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme by Charbel-Raphaël 5mo ago
• Applied to My Reservations about Discovering Latent Knowledge (Burns, Ye, et al) by Robert_AIZI 5mo ago
• Applied to Article Review: Discovering Latent Knowledge (Burns, Ye, et al) by Robert_AIZI 5mo ago
• Applied to Can we efficiently explain model behaviors? by Raemon 6mo ago
• Applied to How is ARC planning to use ELK? by Raemon 6mo ago
• Applied to Discovering Latent Knowledge in Language Models Without Supervision by Xodarap 6mo ago