Infrafunctions and Robust Optimization

Here are the most interesting things about these objects to me that I think this post does not capture.

Given a distribution over non-negative non-identically-zero infrafunctions, each defined up to a positive scalar multiple, the pointwise geometric expectation exists, and is an infrafunction (up to a positive scalar multiple).

(I am not going to give all the math and be careful here, but hopefully this comment will provide enough of a pointer if someone wants to investigate this.)

This is a bit of a miracle. Compare this with arithmetic expectation of utility functions, which is not always well defined. For example, suppose you have a sequence of utility functions U_n, each with weight 2^{-n}, which alternate in which of two outcomes they prefer, and each utility function gets an internal scaling that cancels out its small weight and then some. Then the expected utility will not exist: there will be a series of larger and larger utility monsters canceling each other out, and the limit will not exist. You could fix this by requiring that your utility functions be bounded, as is standard for dealing with utility monsters, but it is really interesting that in the case of infrafunctions and geometric expectation, you don't have to.

If you try to do a similar trick with infrafunctions, up to a positive scalar multiple, the geometric expectation will go to infinity, but since you are only working up to a scalar multiple, you can renormalize everything to make things well defined.

We needed the geometric expectation to only be working up to a scalar multiple, and you can't expect to get a utility function if you take a geometric expectation of utility functions. (But you do get an infrafunction!)
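A minimal sketch of the contrast, under assumptions that are not in the comment: two outcomes A and B, a lottery written as p = P(A), even-indexed functions equal to p, odd-indexed ones equal to 1 - p, and an illustrative internal scaling of 4^n. The helper names are hypothetical. The arithmetic partial sums of the A-versus-B utility difference keep flipping sign and growing, while the geometric mix, once rescaled at a reference lottery, does not depend on the scalings at all.

```python
import math

def arithmetic_partial_sum(n_terms):
    """Partial sum of sum_n 2^-n * 4^n * (U_n(A) - U_n(B)), with alternating signs."""
    return sum((2.0 ** -n) * (4.0 ** n) * (1 if n % 2 == 0 else -1)
               for n in range(1, n_terms + 1))

def geometric_mix(lotteries, n_terms, log_scale, ref=0.5):
    """Pointwise geometric expectation, rescaled to equal 1 at the lottery `ref`.

    Function n is exp(log_scale(n)) times p (even n) or 1 - p (odd n),
    and carries weight 2^-n. Everything is computed in log space.
    """
    def log_mix(p):
        total = 0.0
        for n in range(1, n_terms + 1):
            base = p if n % 2 == 0 else 1.0 - p
            total += (2.0 ** -n) * (log_scale(n) + math.log(base))
        return total
    log_ref = log_mix(ref)  # the scalings only shift log_mix by a constant
    return [math.exp(log_mix(p) - log_ref) for p in lotteries]

lotteries = [i / 100 for i in range(1, 100)]  # skip the endpoints to avoid log(0)
plain = geometric_mix(lotteries, 40, lambda n: 0.0)
monsters = geometric_mix(lotteries, 40, lambda n: n * math.log(4.0))
best = lotteries[max(range(len(plain)), key=plain.__getitem__)]
```

In this toy case the rescaled mix is proportional to p^(1/3) (1 - p)^(2/3), so `best` lands near p = 1/3 whether or not the scalings are applied.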

If you start with utility functions, and then merge them geometrically, the resulting infrafunction will be maximized at the Nash bargaining solution, but the entire infrafunction can be thought of as an extended preference over lotteries of the pair of utility functions, whereas Nash bargaining only told you the maximum. In this way geometric merging of infrafunctions starts with an input more general than the utility functions of Nash bargaining, and gives an output more structured than the output of Nash bargaining, and so can be thought of as a way of making Nash bargaining more compositional. (The input and output are now the same type, so you can stack them on top of each other.)
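As a worked instance, assume two utility functions $U_1, U_2 \geq 0$ that are normalized so the disagreement point has utility 0 for both, merged with equal weights (the normalization and the weights are assumptions for illustration, not something stated above). The geometric merge is then

$$G(x) = U_1(x)^{1/2} \, U_2(x)^{1/2}, \qquad \arg\max_x G(x) = \arg\max_x U_1(x) \, U_2(x),$$

so the maximizer of $G$ is the maximizer of the Nash product. Each $U_i$ is affine in the lottery $x$, and the weighted geometric mean is concave and nondecreasing on the non-negative orthant, so $G$ is concave but generally not affine: an infrafunction rather than a utility function.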

For these two reasons (utility monster resistance and extending Nash bargaining), I am very interested in the mathematical object that is non-negative non-identically-zero infrafunctions defined only up to a positive scalar multiple, and more specifically, I am interested in the set of such functions as a convex set where mixing is interpreted as pointwise geometric expectation.

[-]Scott Garrabrant2yΩ7101

I have been thinking about this same mathematical object (although with a different orientation/motivation) as where I want to go with a weaker replacement for utility functions.

I get the impression that for Diffractor/Vanessa, the heart of a concave-value-function-on-lotteries is that it represents the worst case utility over some set of possible utility functions. For me, on the other hand, a concave value function represents the capacity for compromise -- if I get at least half the good if I get what I want with 50% probability, then I have the capacity to merge/compromise with others using tools like Nash bargaining.

This brings us to the same mathematical object, but it feels like I am using the definition of a convex set as one where the line segment connecting any two points in the set is also in the set, whereas Diffractor/Vanessa are using the definition of a convex set as an intersection of half planes.
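In symbols, a sketch of the two descriptions for a value function $f$ on lotteries (the notation $f$, $\mathcal{U}$, $\lambda$ is illustrative and not taken from the post):

$$f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y) \quad \text{for all } \lambda \in [0, 1],$$

$$f(x) = \inf_{U \in \mathcal{U}} U(x) \quad \text{for some set } \mathcal{U} \text{ of affine (utility) functions}.$$

The first line is the line-segment (compromise) reading and the second is the half-plane (worst-case) reading. Under mild regularity assumptions, such as upper semicontinuity, they describe the same functions.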

I think this is a recurring pattern: I am more interested in merging, and Diffractor and Vanessa are more interested in guarantees, but we end up looking at the same math. I think the dual definitions of convex set in part explain (or at least rhyme with) this pattern.

[-]Raemon3yΩ360

note: I tagged this "Infrabayesianism" but wasn't actually sure whether it was or not according to you.

[-]Rohin Shah2yΩ220

I forget if I already mentioned this to you, but another example where you can interpret randomization as worst-case reasoning is MaxEnt RL, see this paper. (I reviewed an earlier version of this paper here (review #3).)

[-]James Payor3y*Ω221

Can I check that I follow how you recover quantilization?

Are you evaluating distributions over actions, and caring about the worst-case expectation of that distribution?

If so, proposing a particular action is evaluated badly? (Since there's a utility function in your set that spikes downward at that action.)

But proposing a range of actions to randomize amongst can be assessed to have decent worst-case expected utility, since particular downward spikes get smoothed over, and you can rely on your knowledge of "in-distribution" behaviour?
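One way to make the question concrete is the toy calculation below. It is a sketch under assumptions that are not from the post: finitely many actions, a uniform base distribution `nu`, and an adversary who may pick any `V` (unbounded below) with expected absolute deviation from `U` of at most `eps` under `nu`. In that setting the adversary's best move is to dump its whole budget on the action where `pi / nu` is largest, which gives the closed form used in `worst_case_value` (a hypothetical helper name).

```python
def worst_case_value(pi, nu, U, eps):
    """min over V with sum_a nu[a] * |U[a] - V[a]| <= eps of sum_a pi[a] * V[a].

    Assumes nu[a] > 0 for every action and V unbounded below, so the minimum
    equals E_pi[U] - eps * max_a pi[a] / nu[a].
    """
    expected = sum(p * u for p, u in zip(pi, U))
    max_ratio = max(p / q for p, q in zip(pi, nu))
    return expected - eps * max_ratio

n = 100
U = [a / n for a in range(n)]             # proxy utility, increasing in the action index
nu = [1.0 / n] * n                        # uniform base distribution
point = [0.0] * (n - 1) + [1.0]           # always propose the single best-looking action
top_tenth = [0.0] * (n - 10) + [0.1] * 10 # randomize uniformly over the top 10%
eps = 0.02

scores = {
    "point": worst_case_value(point, nu, U, eps),
    "top_tenth": worst_case_value(top_tenth, nu, U, eps),
    "base": worst_case_value(nu, nu, U, eps),
}
```

With these numbers the single action scores about -1.01, the base distribution about 0.475, and the top-10% mixture about 0.745, which matches the picture of downward spikes being smoothed over by randomization.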

Edited to add: fwiw it seems awesome to see quantilization formalized as popping out of an adversarial robustness setup! I haven't seen something like this before, and didn't notice if the infrabayes tools were building to these kinds of results. I'm very much wanting to understand why this works in my own native-ontology-pieces.

[-]James Payor3yΩ110

If that's correct, here are some places this conflicts with my intuition about how things should be done:

I feel awkward about the randomness being treated as essential. I'd rather be able to do something other than randomness in order to get my mild optimization, and something feels unstable/non-compositional about needing randomness in place for your evaluations... (Not that I have an alternative that springs to mind!)

I also feel like "worst case" is perhaps problematic, since it's bringing maximization in, and you're then needing to rely on your convex set being some kind of smooth in order to get good outcomes. If I have a distribution over potential utility functions, and quantilize for the worst 10% of possibilities, does that do the same sort of work that "worst case" is doing for mild optimization?

[-]drocta2y10

For the "Crappy Optimizer Theorem", I don't understand why condition 4, that if $f \leq 0$ , then $Q (s) (f) \leq 0$ , isn't just a tautology^[1]. Surely if $\forall x \in X, f (x) \leq c$ , then no matter what $s : (X \to R) \to X$ is being used, since $Q (s) (f) := f (s (f))$ , letting $x = s (f)$ gives $f (x) \leq c$ , and so $Q (s) (f) = f (s (f)) = f (x) \leq c$ .

I guess if the 4 conditions are seen as conditions on a function $F : (X \to R) \to R$ (where they are written for $F = Q (s)$ ), then it is no longer automatic, and it is only when specifying that $F = Q (s)$ for some $s$ that condition 4 becomes automatic?
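A small sketch of the argument above on a finite $X$, with three made-up selectors (the names and the test function are illustrative assumptions): whatever point the selector returns, the value of $f$ there cannot exceed a bound that holds everywhere.

```python
def Q(s):
    """Q(s)(f) := f(s(f)): the value f takes at the point the selector s picks."""
    return lambda f: f(s(f))

X = range(-3, 4)

# Three arbitrary selectors of type (X -> R) -> X, including a deliberately bad one.
selectors = {
    "argmax": lambda f: max(X, key=f),
    "argmin": lambda f: min(X, key=f),
    "constant": lambda f: 0,
}

f = lambda x: -(x - 1) ** 2  # f(x) <= 0 for every x in X

results = {name: Q(s)(f) for name, s in selectors.items()}
```

Here `results` is `{"argmax": 0, "argmin": -16, "constant": -1}`, so every selector gives a value that is at most 0, as the argument predicts.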

______________

[start of section spitballing stuff based on the crappy optimizer theorem]
Spitball 1:
What if instead of saying $s : (X \to R) \to X$ , we had $s : (X \to R) \to Δ X$ ? Would we still get the results of the crappy optimizer theorem?
If $s(f)$ is now a distribution over $X$, then I suppose that instead of writing $Q(s)(f) = f(s(f))$ we should write $Q(s)(f) = s(f)(f)$ (the expectation of $f$ under $s(f)$), and in this case the first 2 and 4th conditions seem just as reasonable. The third condition... seems like it should also be satisfied?
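A sketch of the randomized variant, reading $s(f)(f)$ as the expectation of $f$ under the distribution $s(f)$. The selector `top_fraction` is a hypothetical example (uniform over the best fraction of a finite $X$), chosen only for illustration. It checks the "$f \leq 0$ implies $Q(s)(f) \leq 0$" condition and says nothing about the others.

```python
def Q_random(s):
    """Q(s)(f) := E_{x ~ s(f)}[f(x)], where s(f) maps points of X to probabilities."""
    return lambda f: sum(prob * f(x) for x, prob in s(f).items())

X = list(range(-3, 4))

def top_fraction(q):
    """Hypothetical selector: uniform distribution over the best q-fraction of X under f."""
    def s(f):
        k = max(1, int(q * len(X)))
        best = sorted(X, key=f, reverse=True)[:k]
        return {x: 1.0 / k for x in best}
    return s

f = lambda x: -(x - 1) ** 2  # f(x) <= 0 for every x in X

value = Q_random(top_fraction(0.5))(f)
```

With seven points and `q = 0.5`, the selector spreads its mass over the three best points (1, 0 and 2), so `value` is about -2/3, which is still at most 0 because an average of non-positive numbers is non-positive.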

Spitball 2:
While I would expect that the 4 conditions might not be exactly satisfied by, e.g. gradient descent, I would kind of expect basically any reasonable deterministic optimization process to at least "almost" satisfy them? (like, maybe gradient-descent-in-practice would fail condition 1 due to floating point errors, but not too badly in reasonable cases).
Do you think that a modification of this theorem for functions Q(s) which only approximately satisfy conditions 1-3, would be reasonably achievable?

______________

^[1]
I might be stretching the meaning of "tautology" here. I mean something provable in our usual background mathematics, so that adding it as an additional hypothesis to a theorem doesn't let us show anything that we couldn't show without it.

[-]Jeremy Gillen2y10

I really like infrafunctions as a way of describing the goals of mild optimizers. But I don't think you've described the correct reasons why infrafunctions help with reflective stability. The main reason is you've hidden most of the difficulty of reflective stability in the bound.

My core argument is that a normal quantilizer is reflectively stable^[1] if you have such a bound. In the single-action setting, where it chooses a policy once at the beginning and then follows that policy, it must be reflectively stable because if the chosen policy constructs another optimizer that leads to low true utility, then that policy must have very low base probability (or the bound can't have been true). In a multiple-action setting, we can sample each action conditional on the previous actions, according to the quantilizer distribution, and this will be reflectively stable in the same way (given the bound).

Adding in observations doesn't change anything here if we treat U and V as being expectations over environments.

The way you've described reflective stability in the dynamic consistency section is an incentive to keep the same utility infrafunction no matter what observations are made. I don't see how this is necessary or even strongly related to reflective stability. Can't we have a reflectively stable CDT agent?

Two core difficulties of reflective stability

I think the two core difficulties of reflective stability are 1) getting the $\int | U - V | d ν \leq ϵ$ bound (or similar) and 2) describing an algorithm that lazily does a ~minimal amount of computation for choosing the next few actions. I expect realistic agents need 2 for efficiency. I think utility infrafunctions do help with both of these, to some extent.

The key difficulty of getting a tight $\int | U - V | d ν \leq ϵ$ bound with normal quantilizers is that simple priors over policies don't clearly distinguish policies that create optimizers. So there's always a region at the top where "create an optimizer" makes up most of the mass. My best guess for a workaround for this is to draw simple conservative OOD boundaries in state-space and policy-space (the base distribution is usually just over policy space, and is predefined). When a boundary is crossed, it lowers the lower bound on the utility (gives Murphy more power). These boundaries need to be simple so that they can be learned from relatively few (mostly in-distribution) examples, or maybe from abstract descriptions. Being simple and conservative makes them more robust to adversarial pressure.

Your utility infrafunction is a nice way to represent lots of simple out-of-distribution boundaries in policy-space and state-space. This is much nicer than storing this information in the base distribution of a quantilizer, and it also allows us to modulate how much optimization pressure can be applied to different regions of state or policy-space.

With 2, an infrafunction allows on-the-fly calculation that the consequences of creating a particular optimizer are bad. It can do this as long as the infrafunction treats the agent's own actions and the actions of child-agents as similar, or if it mostly relies on OOD states as the signal that the infrafunction should be uncertain (have lots of low spikes), or some combination of these. Since the max-min calculation is the motivation for randomizing in the first place, an agent that uses this will create other agents that randomize in the same way. If the utility infrafunction is only defined over policies, then it doesn't really give us an efficiency advantage because we already had to calculate the consequences of most policies when we proved the bound.

One disadvantage, which I think can't be avoided, is that an infrafunction over histories is incentivized to stop humans from doing actions that lead to out-of-distribution worlds, whereas an infrafunction over policies is not (to the extent that stopping humans doesn't itself cross boundaries). This seems necessary because it needs to consider the consequences of the actions of optimizers it creates, and this generalizes easily to all consequences since it needs to be robust.

^[1]
Where I'm defining reflective stability as: If you have an anti-Goodhart modification in your decision process (e.g. randomization), ~never follow a plan that indirectly avoids the anti-Goodhart modification (e.g. making a non-randomized optimizer).
The key difficulty here being that the default pathway for achieving a difficult task involves creating new optimization procedures, and by default these won't have the same anti-Goodhart properties as the original.

[This comment is no longer endorsed by its author]

[-]drocta2y20

I thought CDT was considered not reflectively-consistent because it fails Newcomb's problem?
(Well, not if you define reflective stability as meaning preservation of anti-Goodhart features, but, CDT doesn't have an anti-Goodhart feature (compared to some base thing) to preserve, so I assume you meant something a little broader?)
Like, isn't it true that a CDT agent who anticipates being in Newcomb-like scenarios would, given the opportunity to do so, modify itself to be not a CDT agent? (Well, assuming that the Newcomb-like scenarios are of the form "at some point in the future, you will be measured, and based on this measurement, your future response will be predicted, and based on this the boxes will be filled")

My understanding of reflective stability was "the agent would not want to modify its method of reasoning". (E.g., a person with an addiction is not reflectively stable, because they want the thing (and pursue the thing), but would rather not want (or pursue) the thing.)
The idea being that any ideal way of reasoning should be reflectively stable.

And, I thought that what was being described in the part of this article about recovering quantilizers, was not saying "here's how you can use this framework to make quantilizers better", so much as "quantilizers fit within this framework, and can be described within it, where the infrafunction that produces quantilizer-behavior is this one: [the (convex) set of utility functions which differ (in absolute value) from the given one, by, in expectation under the reference policy, at most epsilon]"

So, I think the idea is that, a quantilizer for a given utility function and reference distribution $ν$ is, in effect, optimizing for an infrafunction that is/corresponds-to the set of utility functions $V$ satisfying the bound in question,
and, therefore, any quantilizer, in a sense, is as if it "has this bound" (or, "believes this bound")

And that therefore, any quantilizer should -

- wait.. that doesn't seem right..? I was going to say that any quantilizer should therefore be reflectively stable, but that seems like it must be wrong? What if the reference distribution includes always taking actions to modify oneself in a way that would result in not being a quantilizer? uhhhhhh

Ah, hm, it seems to me like the way I was imagining the distribution $ν$ and the context in which you were considering it, are rather different. I was thinking of $ν$ as being an accurate distribution of behaviors of some known-to-be-acceptably-safe agent, whereas it seems like you were considering it as having a much larger support, being much more spread out in what behaviors it has as comparably likely to other behaviors, with things being more ruled-out rather than ruled-in ?

[-]Jeremy Gillen2y20

Good point on CDT, I forgot about this. I was using a more specific version of reflective stability.

> - wait.. that doesn't seem right..?

Yeah this is also my reaction. Assuming that bound seems wrong.

I think there is a problem with thinking of $ν$ as a known-to-be-acceptably-safe agent, because how can you get this information in the first place without running that agent in the world? To construct a useful estimate of the expected value of the "safe" agent, you'd have to run it lots of times, necessarily sampling from its most dangerous behaviours.

Unless there is some other non-empirical way of knowing an agent is safe?

Yeah I was thinking of having large support of the base distribution. If you just rule-in behaviours, this seems like it'd restrict capabilities too much.

[-]drocta2y20

Well, I was kinda thinking of $ν$ as being, say, a distribution of human behaviors in a certain context (as filtered through a particular user interface), though I guess that way of doing it would only make sense within limited contexts, not general contexts where whether the agent is physically a human or something else would matter. And in this sort of situation, well, the action of "modify yourself to no longer be a quantilizer" would not be in the human distribution, because the actions to do that are not applicable to humans (as humans are, presumably, not quantilizers, and the types of self-modification actions that would be available are not the same). Though, "create a successor agent" could still be in the human distribution.

Of course, one doesn't have practical access to "the true probability distribution of human behaviors in context M", so I guess I was imagining a trained approximation to this distribution.

Hm, well, suppose that the distribution over human-like behaviors includes both making an agent which is a quantilizer and making one which isn't, both of equal probability. Hm. I don't see why a general quantilizer in this case would pick the quantilizer over the plain optimizer, as the utility...

Hm...
I get the idea that the "quantilizers correspond to optimizing an infra-function of form [...]" thing is maybe dealing with a distribution over a single act?

Or.. if we have a utility function over histories until the end of the episode, then, if one has a model of how the environment will be and how one is likely to act in all future steps, given each of one's potential actions in the current step, one gets an expected utility conditioned on each of the potential actions in the current step, and this works as a utility function over actions for the current step,
and if one acts as a quantilizer over that, each step.. does that give the same behavior as an agent optimizing an infra-function defined using the condition with the $L_{1}$ norm described in the post, in terms of the utility function over histories for an entire episode, and reference distributions for the whole episode?

argh, seems difficult...
