All of Scott Garrabrant's Comments + Replies

Note that I tentatively think this will be the last post in the Geometric Rationality sequence.

I think it is EDT vs UDT. We prefer B to A, but we prefer CA to CB, not because of Dutch books, but because CA is good enough for Bob to be fair, and A is not good enough for Bob.

...huh. So UDT in general gets to just ignore the independence axiom because:

* UDT's whole shtick is credibly pre-committing to seemingly bad choices in some worlds in order to get good outcomes in others, and/or
* UDT is optimizing over policies rather than actions, and I guess there's nothing stopping us having preferences over properties of the policy, like fairness (instead of only ordering policies by their "ground level" outcomes).
* And this is where G comes in; it's one way of encoding something-like-fairness.

Sound about right?


I don't know that I have much "solid theoretical grounding." From my perspective, this sequence is me putting together a bunch of related concepts (and thus doing some of the hard parts of noticing that they are related), but not really giving good theoretical grounding. In fact, I was putting off posting this sequence, so I could have time to develop theoretical grounding, but then gave up on that and just posted what I had in response to the community wanting orientation around FTX. 

Yeah, I think this definition is more centrally talking about Nash bargaining than Kelly betting. Kelly betting can be expressed as maximizing a utility function that is logarithmic in wealth, and so can be seen as VNM-rational.
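To make the Kelly-as-log-utility point concrete, here is a minimal sketch (the bet parameters are my own, chosen for illustration): for a repeated even-money bet that wins with probability p, the fraction of wealth f that maximizes expected log wealth is f* = 2p − 1.

```python
import numpy as np

# Sketch: Kelly betting = maximizing expected log wealth.
# Assumed setup (not from the post): an even-money bet with win probability p = 0.6.
p = 0.6
fractions = np.linspace(0, 0.99, 1000)
# Expected log growth per bet when staking fraction f of wealth:
growth = p * np.log(1 + fractions) + (1 - p) * np.log(1 - fractions)
best = fractions[np.argmax(growth)]
print(round(best, 3))  # close to the Kelly fraction 2p - 1 = 0.2
```

Maximizing anything other than log wealth (e.g. expected wealth itself) recommends a different fraction, which is the sense in which Kelly is just VNM with a particular utility function.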

The point I was trying to make with the partial functions was something like "Yeah, there are 0s, yeah it is bad, but at least we can never assign low probability to any event that any of the hypotheses actually cares about." I guess I could have made that argument more clearly if instead, I just pointed out that any event in the sigma algebra of any of the hypotheses will have probability at least equal to the probability of that hypothesis times the probability of that event in that hypothesis. Thus the 0s (and the s) are really coming from the ... (read more)

I agree with all your intuition here. The thing about the partial functions is unsatisfactory, because it is discontinuous.

It is trying to be #1, but a little more ambitious. I want the distribution on distributions to be a new type of epistemic state, and the geometric maximization to be the mechanism for converting the new epistemic state to a traditional probability distribution. I think that any decent notion of an embedded epistemic state needs to be closed under both mixing and coarsening, and this is trying to satisfy that as naturally as possible.

I... (read more)

I think your numbers are wrong, and the right column on the output should say 20% 20% 20%.

The output actually agrees with each of the components on every event in that component's sigma algebra. The input distributions don't actually have any conflicting beliefs, and so of course the output chooses a distribution that doesn't disagree with either.

I agree that the 0s are a bit unfortunate.

I think the best way to think of the type of the object you get out is not a probability distribution on  but what I am calling a partial probability distribut... (read more)

Vivek Hebbar (2d):
Yeah, the right column should obviously be all 20s. There must be a bug in my code[1] :/

Take the following hypothesis h3. If I add this into P with weight 10⁻⁹, then the middle column is still nearly zero. But I can now ask for the probability of the event in h3 corresponding to the center square, and I get back an answer very close to zero. Where did this confidence come from?

I guess I'm basically wondering what this procedure is aspiring to be. Some candidates I have in mind:

1. Extension to the coarse case of regular hypothesis mixing (where we go from P(w) and Q(w) to aP(w) + (1−a)Q(w))
2. Extension of some kind of Bayesian update-flavored thing where we go to P(w)Q(w) and then renormalize
   * ETA: P(w)^a Q(w)^(1−a) seems more plausible than P(w)Q(w)
3. Some kind of "aggregation of experts who we trust a lot unless they contradict each other", which isn't cleanly analogous to either of the above

Even in case 3, the near-zeros are really weird. The only cases I can think of where it makes sense are things like: "The events are outcomes of a quantum process. Physics technique 1 creates hypothesis 1, and technique 2 creates hypothesis 2. Both techniques are very accurate, and the uncertainty they express is due to fundamental unknowability. Since we know both tables are correct, we can confidently rule out the middle column, and thus rule out certain events in hypothesis 3." But more typically, the uncertainty is in the maps of the respective hypotheses, not in the territory, in which case the middle zeros seem unfounded.

And to be clear, the reason it seems like a real issue[2] is that when you add in hypothesis 3 you have events in the middle which you can query, but the values can stay arbitrarily close to zero if you add in hypothesis 3 with low weight.

1. ^ ETA: Found the bug; it was fixable by substituting a single character
2. ^ Rather than "if a zero falls in th
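To see the zeros mechanically, here is a tiny sketch of the geometric-pooling candidate from this thread, R(w) ∝ P(w)^a Q(w)^(1−a) renormalized, with made-up numbers: any outcome that either input zeroes out stays at zero no matter the weight.

```python
import numpy as np

# Hypothetical 4-outcome example (the numbers are mine, not from the thread):
# geometric pooling R(w) proportional to P(w)^a * Q(w)^(1-a).
P = np.array([0.2, 0.0, 0.3, 0.5])      # assigns 0 to outcome index 1
Q = np.array([0.25, 0.25, 0.25, 0.25])  # uniform
a = 0.5
R = P**a * Q**(1 - a)
R /= R.sum()
print(R[1])  # 0.0 -- a zero in either input survives pooling for any a in (0, 1)
```

This is the multiplicative mechanism by which near-zeros persist even when the zeroing hypothesis gets very low weight: lowering a shrinks the exponent but never lifts an exact zero, and makes near-zeros decay only slowly.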

Yeah, I think Thompson sampling is even more robust, but I don't know much about the nice properties of Thompson sampling besides the density 0 exploration.
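For reference, here is a minimal Thompson sampling sketch (Beta–Bernoulli bandit; the arm means are my own, made up for illustration) showing the exploration behavior: exploration never fully stops, but the better arm comes to dominate.

```python
import random

# Minimal Thompson sampling on a 2-armed Bernoulli bandit (illustrative only).
random.seed(0)
true_means = [0.3, 0.6]
successes = [1, 1]  # Beta(1, 1) priors on each arm's mean
failures = [1, 1]
pulls = [0, 0]
for _ in range(5000):
    # Sample a plausible mean for each arm from its posterior, pull the argmax.
    samples = [random.betavariate(successes[i], failures[i]) for i in range(2)]
    arm = max(range(2), key=lambda i: samples[i])
    reward = random.random() < true_means[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
    pulls[arm] += 1
print(pulls)  # the better arm (index 1) gets the large majority of pulls
```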

Note that the cross entropy (and thus G) is dependent on meaningless details of what events you consider the same vs. different, but the KL divergence is not (as much), and when maximizing with respect to Q, this is the same maximization.

(I am just pointing out that KL divergence is a more natural concept than cross entropy.)

The middle piece here should be G_{x∼P}[Q(x)] / G_{x∼P}[P(x)], right? Anyway, KL divergence is based.
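A quick numeric check of the identity behind this exchange, with the geometric expectation defined as G_{x∼P}[f(x)] := exp(E_{x∼P}[log f(x)]) (the distributions below are arbitrary examples):

```python
import numpy as np

# G_{x~P}[Q(x)] = exp(-crossentropy(P, Q)), and the ratio
# G_{x~P}[Q(x)] / G_{x~P}[P(x)] = exp(-KL(P || Q)).
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

def G(P, f):
    """Geometric expectation of f under P."""
    return np.exp(np.sum(P * np.log(f)))

kl = np.sum(P * np.log(P / Q))
print(G(P, Q) / G(P, P), np.exp(-kl))  # the two values agree
```

So maximizing G_{x∼P}[Q(x)] over Q is the same maximization as minimizing KL(P‖Q), which is the sense in which the two differ only by the P-dependent normalizer G_{x∼P}[P(x)].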

You draw a part, which is a subset of , and thus has a probability according to .

Yeah, the Thompson sampling and Nash bargaining are different in that the Thompson sampling proposal has two argmaxes, whereas Nash bargaining only has one. There are really two things being brought in with Thompson sampling, and Plurality is what you get if you only add the inner argmax, and something like Nash bargaining is what you get if you only add the geometric part. There is no reason you have to add the two things at the same place. All I know is Thompson sampling has some pretty nice asymptotic guarantees.

You could just Nash bargain between your... (read more)

Yeah, that is correct.

The thing I originally said about (almost) uniqueness was maybe wrong. Oops! I edited, and it is correct now.

To see that there might be many solutions under the weakest notion of egalitarianism, consider the case where there are three people, , and , with utility , and each with probability . The constraints on utility are that , and that . The thing is that if we give a small enough weight to , then almost everything we can do with  and ... (read more)

Where by the "same odds," I mean if you can take 3:2 for True, you can take 2:3 for False.


Be warned that this explanation only applies if the environment is offering both sides of every event at the same odds.

Matt Goldenberg (6d):
Yes, I got down to the Nash bargaining part, which is a bit harder, and got confused again, but this helped as a very simple math intuition for why to Kelly bet, if not how to calculate it in most real-world betting situations.

1) So, the VNM utility theorem, assuming the space of lotteries is closed under arbitrary mixtures (where, e.g., you can specify a sequence of lotteries and take the mixture that assigns probability 2⁻ⁿ to the nth lottery), implies bounded utilities, since otherwise you can get a lottery with infinite utility, and violate continuity.

I think there are some reasons to not want to allow arbitrary lotteries, and then, you could technically have unbounded utility, but then you get a utility function that can only assign utilities in such a way tha... (read more)
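For a numeric illustration of the unboundedness argument above: in the standard St. Petersburg-style construction (the exact symbols in the comment didn't survive extraction, so the weights 2⁻ⁿ and utilities 2ⁿ here are the canonical choices, not a quote), each lottery contributes exactly 1 to the expected utility of the mixture, so the sum diverges.

```python
# Lottery n gets mixture weight 2^-n and (hypothetically) utility 2^n,
# so the compound lottery's expected utility grows without bound.
total = 0.0
for n in range(1, 31):
    total += (0.5 ** n) * (2.0 ** n)  # each term is exactly 1
print(total)  # 30.0 after 30 terms; the full series diverges
```

With bounded utilities no such sequence exists, which is how closure under arbitrary mixtures forces boundedness.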

Agreed. I should have had a disclaimer that I was talking about preference utilitarianism.

I am not sure what is true about what most people think.

My guess is that most philosophers who identify with utilitarianism mean welfare. 

I would guess that most readers of LessWrong would not identify with utilitarianism, but would say they identify more with preference utilitarianism than welfare utilitarianism.

My guess is that a larger (relative to LW) proportion of EAs identify with utilitarianism, and also they identify with the welfare version (relative... (read more)

I agree with your guesses. I don't have a crisp definition of this, but I just mean that, e.g., we compare the following two worlds: (1) 99.99% of agents are non-sentient paperclippers, and each agent has equal (bargaining) power. (2) 99.99% of agents are non-sentient paperclippers, and the paperclippers are all confined to some box. According to plenty of intuitive-to-me value systems, you only (maybe) have reason to increase paperclips in (1), not (2). But if the paperclippers felt really sad about the world not having more paperclips, I'd care—to an extent that depends on the details of the situation—about increasing paperclips even in (2).

Yep, I had been wanting to write this sequence for months, but FTX caused me to sprint for a week until it was all done, because it seems like now is the time people are especially hungry for this theory.

This sequence was going to be my main priority for December (and Kelly betting was going to be my most central example). I thought the main reason EAs needed it was to be able to not feel guilty every time they stop to have fun, to not get Pascal-mugged by calculations about the amount of matter in the universe, to not let longtermism take over the ent... (read more)

Yeah, remember the above is all for updateless agents, which are already computationally intractable. For updateful agents, we will want to talk about conditional counterfactability. For example, if you and I are in a prisoner's dilemma, we would condition on all the stuff that happened prior to us being put in separate cells, and given this condition, the histories are much smaller.

Also, we could do all of our reasoning up to a high level world model that makes histories more reasonably sized.

Also, if we could think of counterfactability as a... (read more)

I agree, this is why I said I am being sloppy with conflating the output and our understanding of the output. We want our understanding of the output to screen off the history.

I mean, the definition is a little vague. If your meaning is something like "It goes in A if it is more accurately described as controlled by the viscera, and it goes in P if it is more accurately described as controlled by the environment," then I guess you can get a bijection by definition, but it is not obvious these are natural categories. I think there will be parts of the boundary that feel like they are controlled by both or neither, depending on how strictly you mean "controlled by."

My default plan is to not try to rename Cartesian frames, mostly because the benefit seems small, and I care more about building up the FFS ontology over the Cartesian frame one.

I agree completely. I am not really happy with any of the language in this post, and I want it to have scope limited to this post. I will for the most part say boundary for both the additive and multiplicative variants.

To be clear, everywhere I say "is wrong," I mean I wish the model were slightly different, not that anything is actually mistaken. In most cases, I don't really have much of an idea how to actually implement my recommendation.

Forcing the AxP bijection is an interesting idea, but it feels a little too approximate to my taste.

Oh yeah, oops, that is what it says. Wasn’t careful, and was responding to reading an old draft. I agree that the post is already saying roughly what I want there. Instead, I should have said that the B=AxP bijection is especially unrealistic. Sorry.

Why is it unrealistic? Do you actually mean it's unrealistic that the set I've defined as "A" will be interpretable as "actions" in the usual coarse-grained sense? If so I think that's a topic for another post when I get into talking about the coarsened variables V_c, A_c, P_c, E_c...

Overall, this is my favorite thing I have read on lesswrong in the last year.


I agree very strongly with most of this post, both in the way you are thinking about boundaries, and in the scale and scope of applications of boundaries to important problems.

In particular on the applications, I think that boundaries as you are defining them are crucial to developing decision theory and bargaining theory (and indeed are already helpful for thinking about bargaining and fairness in real life), but I also agree with your other potential applications.

I pa... (read more)

Scott Garrabrant (1mo):
More of my thoughts here.
Thanks, Scott! Are you sure? The informal description I gave for A and P allow for the active boundary to be a bit passive and the passive boundary to be a bit active. From the post: There's a question of how to factor B into a zillion fine-grained features in the first place, but given such a factorization, I think we can define A and P fairly straightforwardly using Shapley value to decide how much V versus E is controlling each feature, and then A and P won't overlap and will cover everything.
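For what it's worth, the Shapley proposal above can be sketched very simply with V and E as the two players. The value function below is made up purely for illustration; in the actual proposal it would measure how well each coalition predicts/controls a given boundary feature.

```python
from itertools import permutations

# Hypothetical 2-player Shapley attribution: how much do V (viscera) and
# E (environment) control a single boundary feature? (Values are made up.)
players = ["V", "E"]
value = {(): 0.0, ("V",): 0.7, ("E",): 0.2, ("E", "V"): 1.0}

def shapley(player):
    # Average the player's marginal contribution over all join orders.
    total = 0.0
    orders = list(permutations(players))
    for order in orders:
        i = order.index(player)
        before = tuple(sorted(order[:i]))
        with_p = tuple(sorted(order[:i + 1]))
        total += value[with_p] - value[before]
    return total / len(orders)

print({p: shapley(p) for p in players})  # V gets about 0.75, E about 0.25
```

A feature with a Shapley split like this would then go in A (mostly V-controlled) rather than P, and since the two attributions always sum to the total value, the A/P split covers everything with no overlap.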

Note that I wrote this post a month ago, while seeing an earlier draft of the sequence post 3a (before active/passive distinction was central) and was waiting to post it until after that post. I am posting it now unedited, so some of the thoughts here might be outdated. In particular, I think this post does not respect enough the sense in which the FFS ontology is wrong in that it does not have space for expressing the direction of entanglement.

Ok, I believe this version of the Lemma, and am moving on to trying to get the rest of the argument.

Ok, here are some questions to help me understand/poke holes in this proof. (Don't think too hard on these questions. If the answers are not obvious to you, then I am asking the wrong questions.)

  1. Does the argument (or a simple refactorization of the argument you also believe) decompose through "If  strictly dominates , then there is a  that also strictly dominates  such that the probability of any voter being indifferent between something sampled from  and something sampled from  is 0 (or
... (read more)
#1: It doesn't. The previous version implied that there was a B′ for which the probability of ties was arbitrarily low, but the new version can have lots of voters who are indifferent. If B puts its mass in the interior of a face F, then we redistribute probability mass within the interior of F, but some voters assign the same utility to everything in F.

#4: The current lemma is:

I still haven't understood all of your argument, but have you missed the fact that some faces are entirely contained in ?

(Your arguments look similar to stuff we did when trying to apply this paper.)

I think this is OK (though still lots of room for subtleties). Spelling this aspect out in more detail:

* Fix some arbitrary A which is strictly dominated by B.
* We claim that there exists a face F and a continuous B′ over F such that B′ also dominates A.
* Sample some lottery from B to obtain a concrete lottery b that strictly dominates A.
* If b is a vertex we are done. Otherwise, let F be the face such that b lies in the interior of F.
* For each voter, their level sets are either hyperplanes in F or else they are all of F.
* We can ignore the voters who are indifferent within all of F, because any B′ supported in F will be the same as b from those voters' perspectives.
* Now define X_u, X′_u, L_u, L as before, but restricting to the voters who have preferences within F.
* We obtain a continuous distribution B′ for which f(B′, A) ≈ f(b, A) if we ignore the voters who were indifferent. But f(B′, A) = f(b, A) for the voters who are indifferent, so we have f(B′, A) ≈ f(b, A) overall.
* (Of course this just goes through the existence of an open set of lotteries all of which strictly dominate A; we can just take B′ uniform over that set.)

This lemma is what we need, because we will run follow-the-leader over the space of pairs (F, A) where F is a face and A is a distribution over that face. So we conclude that the limit is not dominated by any pair (F, B′) where F is a face and B′ is a continuous distribution.

For example, if there is a unanimous winner, you only have to pick them half the time, and can do whatever you want the other half of the time.

Vanessa Kosoy (1mo):
Yes, this is a good point. Maybe we can strengthen the "weak-MLL" criterion in other ways while preserving existence. For example, we can consider the "p-dominance" condition Pr[v(B) > v(A)] ≤ 1 − p and look for an LL that is "weak p-maximal" for the highest possible p. The function on the LHS is lower-semicontinuous, hence there exists a maximal p for which a weak p-maximal LL exists.

Yeah, I believe this works, and that it feels too weak.


Maximal lottery-lotteries also satisfy consistency and participation, since they are just maximal lotteries over a larger set of candidates.

Actually, I think there is a decent chance I am wrong here. It might be that merging two electorates that produce the same lottery-lottery over candidates also produces that lottery-lottery, but we can't say the same thing if we collapse that to lotteries, which is what we would need for consistency.

If  assigns positive probability to  we need one more step. The idea is that for each point in  we can find a "safe" direction to shift the probability from  so that it's no longer in : if we pick a random direction each voter prefers to move one way or the other, and so one direction must have majority approval (note that the argument would break at this point if  depended on the probability that  was at least as good as , rather than being defined symmetrically). Once we'v

... (read more)

I agree that's a bug in the proof (and the lemma obviously can't be true as written given that e.g. if 90% of voters are single-issue voters who hate candidate X, then no continuous lottery-lottery can dominate a lottery that puts 0 probability on X).

I haven't thought about how serious this is. It feels like we could hope to fix things by considering a game where you pick (i) which candidates to give probability 0, which is a discrete choice from a set of size and hence relatively unproblematic, (ii) what distribution to play within the correspond... (read more)

When a voter compares two lottery-lotteries, they take expected utilities with respect to the inner lotteries, but they sample with respect to the outer lottery, and support whichever sampled thing they prefer.

If we collapse and treat everything like the outer lottery, that just gives us the original maximal lotteries, which e.g. is bad because it chooses anarchy in the above example.

If we collapse and treat everything like the inner lottery, then existence will fail, because there can be non-transitive cycles of majority preferences.
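The non-transitivity is easy to exhibit; here is the standard three-voter Condorcet cycle (a generic textbook example, not from the post), where every candidate loses some pairwise majority vote, so no majority-undominated option exists:

```python
# Three voters with rotated rankings produce cyclic majority preferences.
rankings = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2

print(majority_prefers("A", "B"),
      majority_prefers("B", "C"),
      majority_prefers("C", "A"))  # True True True: A beats B beats C beats A
```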

Yes, although I bet that Condorcet and consistency are still incompatible for deterministic voting systems, even if you allow taking in the entire utility functions.

It is not the case that voters on average necessarily prefer the maximal lottery to any other lottery in expected utility. It is only the case that voters on average prefer a candidate sampled at random from the maximal lottery to a candidate sampled at random from any other lottery. Doing the first thing would be impossible, because there can be cycles.

This is how anarchy wins in maximal lotteries in the example in the next section. If you compare anarchy to choosing a leader at random, in expected utility, choosing a leader at random is a Pareto improvement, but if you sample a candidate at random, they will lose to Anarchy by 1 vote. 

Utility functions contain all the preferences on all gambles.

If you just maximize average utility as score voting does, yes it is true that the incentives point to always giving utility 0 or 1, which recreates approval voting. I don't know about MLL. 

Yeah, that is correct, although note that the ballot could just ask the voters for their utility function. The assumption means the ballot asks for a ranking of candidates, possibly with ties, and no other information.

Note that this is only true for ranked methods, and not scored methods, like Approval Voting, Star Voting, etc.

Yeah, preorder is misleading. I was only trying to say with as few characters as possible that they are only considering a ranking of candidates possibly with ties. (Which is a preorder, but is less general.)

Oh, at first I was interpreting voting block as set of people with the same full preference profile. Obviously, since you are modifying RD, it should just be people with the same first choice, so my point doesn't matter.

Unlike RD and ML, your proposal is not clone invariant.

I don't like this because irrelevant alternatives can split a voting block, and have a large effect.


Feels like this post should somewhere mention Donald Hobson, who (I believe) invented this game.


I would go with the physical object one.

Just to double check: You'd rather say that embedded (in embedded agency) is a synonym of "built-in" (like "In 2005 next generation proprietary embedded controller was introduced.") rather than "ingrained" (like "Less - is embedded metalanguage: valid CSS will be valid less-program with the same semantics."), correct?
Thank you!

I love the "E+M" name. It reminds me of electricity and magnetism, and IMO embedded agency and multi-agent rationality will eventually be seen as two sides of the same coin about as much as electricity and magnetism.

I think our current best theories of both don't look much like each other, and predict that as we progress on each, they will slowly look more and more like one field.

I mostly agree with this post.

Figuring out the True Name of a thing, a mathematical formulation sufficiently precise that one can apply lots of optimization pressure without the formulation breaking down, is absolutely possible and does happen.

Precision feels pretty far from the true name of the important feature of true names, I am not quite sure what precision means, but on one definition, precision is the opposite of generality, and true names seem anti-precise. I am not saying precision is not a virtue, and it does seem like precision is involved. (lik... (read more)

You're right, I wasn't being sufficiently careful about the wording of a bolded sentence. I should have said "robust" where it said "precise". Updated in the post; thank you. Also I basically agree that robustness to optimization is not the True Name of True Names, though it might be a sufficient condition.