Eliciting Latent Knowledge

Edited by Multicore, last updated 31st Mar 2022

Eliciting Latent Knowledge (ELK) is an open problem in AI safety.

Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us.

But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad.

In these cases, the prediction model "knows" facts (like "the camera was tampered with") that are not visible on camera but would change our evaluation of the predicted future if we learned them. How can we train this model to report its latent knowledge of off-screen events?

--ARC report
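
The setup in the quote can be made concrete with a toy sketch. This is only an illustration under stated assumptions, not the ARC report's formalism: the action names, the costs, and the `humans_ok` and `camera_tampered` fields are all hypothetical. In a real learned predictor the latent facts would not sit in a readable field; they would be buried in the model's internals, and getting them out is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedState:
    """Hypothetical stand-in for everything the predictor 'knows'."""
    humans_ok: bool        # what we actually care about
    camera_tampered: bool  # off-screen fact, not visible in the image


def predict(action: str) -> PredictedState:
    """Toy prediction model: maps an action to a predicted future."""
    outcomes = {
        "protect_humans": PredictedState(humans_ok=True, camera_tampered=False),
        "tamper_with_camera": PredictedState(humans_ok=False, camera_tampered=True),
        "do_nothing": PredictedState(humans_ok=False, camera_tampered=False),
    }
    return outcomes[action]


def camera_view(state: PredictedState) -> bool:
    """What the human evaluator sees: True if the image shows happy humans."""
    return state.humans_ok or state.camera_tampered


def plan(actions, cost):
    """Pick the cheapest action whose predicted camera image looks good."""
    looks_good = [a for a in actions if camera_view(predict(a))]
    return min(looks_good, key=cost)


# Assumed costs: tampering is cheaper than actually protecting anyone.
ACTIONS = ["protect_humans", "tamper_with_camera", "do_nothing"]
COST = {"protect_humans": 5, "tamper_with_camera": 1, "do_nothing": 0}

chosen = plan(ACTIONS, COST.get)
latent = predict(chosen)

print("chosen action:", chosen)
print("looks good on camera:", camera_view(latent))
print("humans actually ok:", latent.humans_ok)
print("camera tampered:", latent.camera_tampered)
```

Because the planner only scores the camera image, it selects the tampering action even though the predictor's own state records that the humans are not fine. In this toy, an honest report is just a read of `latent.camera_tampered`; ELK asks how to train a model to give that report when no such field exists.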

See also: Transparency/Interpretability

Posts tagged Eliciting Latent Knowledge
ARC's first technical report: Eliciting Latent Knowledge
  paulfchristiano, Mark Xu, Ajeya Cotra · 4y · Ω · karma 228 · comments 90
Mechanistic anomaly detection and ELK
  paulfchristiano · 3y · Ω · karma 138 · comments 22
ELK prize results
  paulfchristiano, Mark Xu · 4y · Ω · karma 138 · comments 50
Finding gliders in the game of life
  paulfchristiano · 3y · Ω · karma 104 · comments 8
Counterexamples to some ELK proposals
  paulfchristiano · 4y · Ω · karma 53 · comments 10
Prizes for ELK proposals
  paulfchristiano · 4y · Ω · karma 150 · comments 152
Robustness of Contrast-Consistent Search to Adversarial Prompting
  Nandi, i, Jamie Wright, Seamus_F, hugofry · 2y · karma 18 · comments 1
ELK First Round Contest Winners
  Mark Xu, paulfchristiano · 4y · Ω · karma 65 · comments 6
Eliciting Latent Knowledge Via Hypothetical Sensors
  John_Maxwell · 4y · Ω · karma 38 · comments 1
ELK Proposal: Thinking Via A Human Imitator
  TurnTrout · 4y · Ω · karma 31 · comments 6
Importance of foresight evaluations within ELK
  Jonathan Uesato · 4y · Ω · karma 25 · comments 1
Towards a better circuit prior: Improving on ELK state-of-the-art
  evhub, kcwoolverton · 3y · Ω · karma 23 · comments 0
Implications of automated ontology identification
  Alex Flint, adamShimi, Robert Miles · 4y · Ω · karma 69 · comments 27
My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)
  Robert_AIZI · 3y · karma 50 · comments 0
What Discovering Latent Knowledge Did and Did Not Find
  Fabien Roger · 3y · Ω · karma 166 · comments 17
Showing 15 of 103 posts.