Super interesting series! Your second post actually gave me some insights on a problem that was a big part of my phd, but it will take me some time to think through. So here is a ~~simpler~~ unrelated comment.

Thompson sampling seems to be too perfectionistic to me: it wants to take an action that is optimal in some world, rather than one that is merely great in all worlds. For example, suppose you have a suitcase with a 6 digit lock. In each turn, you can make a guess at the correct combination. If you guess correctly, you get 10 utils, if you guess wrong you get 1 util. If you don’t guess but instead pass, you get 9 utils.

(The model is supposed to be greedy and ignore future advantages so information revealed from a wrong guess shouldn’t matter. If this bothers you, you can imaging that when you “pass”, you still get to make a guess at the correct combination, and you learn if it is correct, but you get 9 utils no matter if your guess is correct or not.)

Each possible combination is considered a hypothesis so each hypothesis is deterministic. Thompson sampling would never pass, because that is not the optimal thing to do for any particular hypothesis (although it is the optimal thing to do when you don’t know which hypothesis is true). Instead it would keep guessing combinations, because each combination is optimal is some hypothesis.

This is not an issue of AM vs. GM: the same thing happens in the plurality decision procedure when we only use AM. It is an issue about what we take argmax over. If we consider the combination to be due to randomness within a single hypothesis, the decision procedures will correctly choose to pass until the correct combination has been revealed.

Possibly related question: Is there any reason the AM-GM boundary is at the same level as where we take argmax in the definition of m? Or could we have an arithmetic expectation outside of argmax (hence outside of m) but inside the geometric expectation? Or even have a geometric expectation inside of argmax in m but possibly over the arithmetic expectation?

Reply

[-]Scott Garrabrant3y80

Yeah, the Thompson sampling and Nash bargaining are different in that the Thompson sampling proposal has two argmaxes, where as Nash bargaining only has one. There are really two things being brought it with Thompson sampling, and Plurality is what you get if you only add the inner argmax, and something like Nash bargaining is what you get if you only add the geometric part. There is no reason you have to add the two things at the same place. All I know is Thompson sampling has some pretty nice asymptotic guarantees.

You could just Nash bargain between your hypotheses directly, but then you are dependent on where the 0 point is. One nice thing about Thompson sampling is that it gives you a semi-principled place to put the 0, because the inner argmax means we convert everything to probabilities.

Reply

[-]dust_to_must3y56

Thanks for your posts, Scott! This has been super interesting to follow.

Figuring out where to set the AM-GM boundary strikes me as maybe the key consideration wrt whether I should use GM -- otherwise I don't know how to use it in practical situations, plus it just makes GM feel inelegant.

From your VNM-rationality post, it seems like one way to think about the boundary is commensurability. You use AM within clusters whose members are willing to sacrifice for each other (are willing to make Kaldor-Hicks improvements, and have some common currency s.t. "K-H improvement" is well-defined; or, in another framing, have a meaningfully shared utility function) . Maybe that's roughly the right notion to start with? But then it feels strange to me to not consider things commensurate across epistemic viewpoints, especially if those views are contained in a single person (though GM-ing across internal drives does seem plausible to me).

I'd love to see you (or someone else) explore this idea more, and share hot takes about how to pin down the questions you allude to in the AM-GM boundary section of this post: where to set this boundary, examples of where you personally would set it in different cases, and what desiderata we should have for boundary-setting eventually. (It feels plausible to me that having maximally large clusters is in some important sense the right thing to aim for).

Reply

[-]PaulK3y40

Wow, I came here to say literally the same thing about commensurability: that perhaps AM is for what's commensurable, and GM is for what's incommensurable.

Though, one note is that to me it actually seems fine to consider different epistemic viewpoints as incommensurate. These might be like different islands of low K-complexity, that each get some nice traction on the world but in very different ways, and where the path between them goes through inaccessibly-high K-complexity territory.

Reply

[-]Cole Wyeth8mo20

Great post!

We in universal AI already have a name for the generalization of AIXI that may use a different history distribution: AI\mu. Since you intend \mu to be a distribution over hypotheses, NOT the true environment as Marcus Hutter usually uses \mu, I would also preferably replace \mu -> \nu. This may seem pedantic but it's kind of triggering to use AIXI to refer to AI\nu, because the "XI" part of "AIXI" is (sometimes) intended to translate as \xi, Hutter's chronological version of the universal distribution.

Reply

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

126

Geometric Exploration, Arithmetic Exploitation

126

126

A Exploration/Exploitation Toy Model

AIXI

Plurality

Thompson Sampling

Density Zero Exploration

The AM-GM Boundary