Spurious Counterfactuals

Edited by abramdemski, last updated 20th Jun 2022

Spurious Counterfactuals are spuriously low evaluations of the quality of a potential action, which are provable only because they are self-fulfilling (usually due to Löb's theorem). For example, if I know that I go left, then it is logically true that if I went right, I would get -10 utility, because in classical logic a false statement implies any statement. Believing that going right is that bad, I would choose to go left. So if I could prove that I go left, then I would indeed go left. Löb's theorem turns this conditional into an unconditional conclusion: because the statement "if it is provable that I go left, then I go left" is itself provable, it is provable that I go left, and so I indeed go left.
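
The argument above can be laid out step by step. The following is only a sketch under stated assumptions, not a derivation taken from the entry itself: it assumes an agent that goes left as soon as it finds a proof of a sentence S saying "left yields 5 and right yields -10", and a proof system strong enough to reason about the agent's own proof search. The symbols A, U, S and the box operator, as well as the payoff 5 for going left, are illustrative choices introduced here.

```latex
% Assumed notation (not from the original entry):
%   A = the agent's action, U = the resulting utility,
%   \Box X = "X is provable in the agent's proof system".
%   S := (A = left -> U = 5) /\ (A = right -> U = -10)
%   The payoff 5 for going left is a hypothetical placeholder.
\begin{align*}
&1.\quad \vdash \Box(A = \mathrm{left}) \rightarrow \Box(A = \mathrm{right} \rightarrow U = -10)
  && \text{a false antecedent implies anything} \\
&2.\quad \vdash \Box(A = \mathrm{left} \rightarrow U = 5)
  && \text{assumed provable fact about the environment} \\
&3.\quad \vdash \Box(A = \mathrm{left}) \rightarrow \Box S
  && \text{from 1 and 2} \\
&4.\quad \vdash \Box S \rightarrow A = \mathrm{left}
  && \text{assumed decision rule of the agent} \\
&5.\quad \vdash \Box(A = \mathrm{left}) \rightarrow A = \mathrm{left}
  && \text{from 3 and 4} \\
&6.\quad \vdash A = \mathrm{left}
  && \text{L\"ob's theorem applied to 5}
\end{align*}
```

Nothing in steps 1 to 6 depends on what going right would actually yield, which is why the resulting evaluation of that action is called spurious.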

Building agents who avoid this line of reasoning, despite having full access to their own source code and the ability to logically reason about their own behavior, is one goal of Embedded Agency.

Posts tagged Spurious Counterfactuals

Embedded Agency (full-text version), by Scott Garrabrant and abramdemski
An Introduction to Löb's Theorem in MIRI Research, by orthonormal
A Possible Resolution To Spurious Counterfactuals, by JoshuaOSHickman
Modeling naturalized decision problems in linear logic, by jessicata
Threatening to do the impossible: A solution to spurious counterfactuals for functional decision theory via proof theory, by Christopher King