Scott Garrabrant

What Would I Do? Self-prediction in Simple Algorithms

Having be won't work.

Surprisingly, having go to 0 at any quickly computable rate won't work. For example, you could imagine having a logical inductor built out of a collection of traders where one trader has almost all the money and says that on days of the form , utility conditioned on going left is 0 (where is a fast-growing function). Then, you have a second trader that forces the probability on day of the statement that the agent goes left to be slightly higher than . Finally, you have a third trader that forces the expected utility conditioned on right to be very slightly above 0 on days of the form .

The first trader never loses money, since the condition is never met. The second trader only loses a bounded amount of money, since it is forcing the probability of a sentence that will be false to be very small. The third trader similarly only loses a bounded amount of money. The exploration clause will never trigger, and the agent will never go left on any day of the form .

The issue here is that we need to not only explore infinitely often, we need to explore infinitely often on all simple subsets of days. Even if the probability goes to 0 slowly, you can just look at a subset of days that is sufficiently sparse.
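To make the sparse-subset point concrete, here is a minimal sketch. It assumes, purely for illustration, an exploration probability of 1/n on day n and the sparse subset of days 2, 4, 8, ...; neither choice comes from the comment above, and the function names are hypothetical. Over all days the expected number of explorations keeps growing, but on the sparse subset it stays below 1, so nothing forces exploration to keep happening there.

```python
def exploration_rate(day: int) -> float:
    """Hypothetical exploration probability on a given day (assumed: 1/day)."""
    return 1.0 / day


def expected_explorations(days) -> float:
    """Expected number of explorations over the given collection of days."""
    return sum(exploration_rate(d) for d in days)


# All days up to 2**20, versus only the days that are powers of two.
all_days = range(1, 2**20 + 1)
sparse_days = [2**k for k in range(1, 21)]

total_all = expected_explorations(all_days)        # harmonic sum, grows like log(n)
total_sparse = expected_explorations(sparse_days)  # geometric sum, bounded by 1

print(f"expected explorations on all days:    {total_all:.3f}")
print(f"expected explorations on sparse days: {total_sparse:.6f}")
```

Extending the horizon makes the first number grow without bound, while the second never reaches 1.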

There are ways around this that allow us to make a logical induction agent that explores with density 0 (meaning that the limit as goes to infinity of the proportion of days that the agent explores is 0). This is done by explicitly exploring infinitely often on every quickly computable subset of days, while still having the probability of exploring go to 0.

What Would I Do? Self-prediction in Simple Algorithms

It does not approach it from above or below. As goes to infinity, the proportion of for which =="Left" need not converge to 1/2, but it must have 1/2 as a limit point, so the proportion of for which =="Left" is arbitrarily close to 1/2 infinitely often. Further, the same is true for any easy to compute subsequence of rounds.

So, unfortunately, it might be that goes left many, many times in a row (e.g. for all between and ), but it will still be unpredictable, just not locally independent.
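The difference between "converges to 1/2" and "has 1/2 as a limit point" can be seen in a toy sequence. The sketch below is an assumption-laden illustration, not the agent from the post: the sequence is deterministic and trivially predictable, and the names `action` and `running_proportions` are made up here. It only shows how long runs of the same action let the running proportion of "Left" swing between roughly 1/3 and 2/3 forever while still passing through 1/2 on every swing.

```python
def action(n: int) -> str:
    """Toy sequence (hypothetical): block k has length 2**k, rounds are 0-indexed.

    Even-numbered blocks are all "Left", odd-numbered blocks are all "Right".
    """
    block = (n + 1).bit_length() - 1
    return "Left" if block % 2 == 0 else "Right"


def running_proportions(num_rounds: int) -> list:
    """props[i] is the fraction of "Left" among the first i + 1 rounds."""
    lefts = 0
    props = []
    for n in range(num_rounds):
        lefts += action(n) == "Left"
        props.append(lefts / (n + 1))
    return props


# 2**12 - 1 rounds ends exactly at the end of block 11 (a "Right" block).
props = running_proportions(2**12 - 1)

after_left_block = props[2**11 - 2]   # end of block 10, a long run of "Left"
after_right_block = props[-1]         # end of block 11, a long run of "Right"
closest_to_half = min(abs(p - 0.5) for p in props[2**11 - 1:])

print(after_left_block, after_right_block, closest_to_half)
```

The proportion never settles, yet it comes arbitrarily close to 1/2 infinitely often, which is the weaker guarantee described above.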

Sunday July 12 — talks by Scott Garrabrant, Alexflint, alexei, Stuart_Armstrong

Date is wrong. It says June 28.

[Site Meta] Feature Update: More Tags! (Experimental)

Here are some maybe useful tags. Interpret these as ideas, not requests.

Mechanism Design (I think I am imagining including systemization that aligns incentives within yourself in here, which maybe means you would want a more general name like "Aligning Incentives" but I think I prefer "Mechanism Design")

Fake Frameworks (When I first thought of this, I was thinking of people tagging their own posts. Maybe it is a little weird to have people tagging each other's posts as fake.)

Embedded Agency (Where I am imagining this as being largely for technical work) (In particular, I personally would get more use out of one big embedded agency tag than a bunch of smaller tags, since I feel like all the most interesting stuff in embedded agency cuts across tags like "decision theory")

Something like the class including: Toward a New Technical Explanation of Technical Explanation, Embedded World Models, technical logical uncertainty work, things about dealing with the fact that Bayes is not a viable strategy for embedded agents. "Embedded World Models" "Resource Bounded Epistemics" "Embedded Epistemics" "Post-Bayesianism" I would hope the name here does not make people think it should only be for technical things.

Something like the class including: How I Lost 100 Pounds Using TDT, Humans Are Embedded Agents Too, Inner alignment in the brain, Sources of intuitions and data on AGI, things about applying AI alignment theory to human rationality and vice versa. Maybe more generally about applying results from one field to another field. "Interdisciplinary Analogies"?

A method for fair bargaining over odds in 2 player bets!

https://www.lesswrong.com/posts/aiz4FCKTgFBtKiWsE/even-odds is another proposal that gives incentive-compatible betting by having the bet be smaller than the maximum. (Maybe it's the same; I haven't checked.)

What's the upper bound of how long COVID is contagious?

You should (more strongly?) disambiguate between how long after being sick you are safe and how long after being 100% isolated you are safe.

Voting Phase of 2018 LW Review

Is it pro-social or anti-social to vote on posts I have skimmed but not read?

Humans Are Embedded Agents Too

We actually avoided talking about AI in most of the cartoon, and tried to just imply it by having a picture of a robot.

The first time (I think) I presented the factoring in the embedded agency sequence was at a MIRI CFAR collaboration workshop, so parallels with humans were live in my thinking.

The first time we presented the cartoon in roughly its current form was at MSFP 2018, where we purposely did it on the first night before a CFAR workshop, so people could draw analogies that might help them transfer their curiosity in both directions.

Somewhere in between? I have a reliable intuition about what would happen that comes before being able to construct the proof, but that can reliably be turned into the proof. All of the proofs that these agents do what I say they do can be found by asking:

Assume that the probability does not converge as I say it does. How can I use this to make money if I am allowed to see (continuously) the logical inductor's beliefs, and bet against them?

For example, in the first example, if the probability were greater than 1/2+δ infinitely often, I could wait until the probability is greater than 1/2+δ, then bet that the agent goes right. This bet will always pay out and double my money, and I can do this forever.
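Here is a rough sketch of that betting argument. Several things in it are my assumptions for illustration and are not stated in this comment: the agent goes left exactly when the predicted probability of left is below 1/2, a share paying 1 if the agent goes right costs one minus the probability of left, and the bettor stakes everything each time. Real traders in a logical inductor have to trade continuously in the prices, which this toy ignores. All function names are hypothetical.

```python
def agent_action(p_left: float) -> str:
    """Assumed agent rule: do the opposite of what the prediction favors."""
    return "Left" if p_left < 0.5 else "Right"


def bet_on_right(wealth: float, p_left: float) -> float:
    """Stake all wealth on "Right" at the price implied by p_left."""
    price = 1.0 - p_left       # assumed price of a share paying 1 on "Right"
    shares = wealth / price
    return shares if agent_action(p_left) == "Right" else 0.0


def exploit(beliefs, delta: float, wealth: float = 1.0) -> float:
    """Bet on "Right" on every day where P(Left) exceeds 1/2 + delta."""
    for p_left in beliefs:
        if p_left > 0.5 + delta:
            # The price is below 1/2 - delta and the agent does go right,
            # so each such bet more than doubles the stake.
            wealth = bet_on_right(wealth, p_left)
    return wealth


# Hypothetical belief sequence: two days exceed the 1/2 + delta threshold.
beliefs = [0.5, 0.7, 0.45, 0.8, 0.55]
final_wealth = exploit(beliefs, delta=0.1)
print(final_wealth)
```

If the threshold were exceeded infinitely often, the bettor's wealth would grow without bound, which is what a logical inductor rules out.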