A stylized dialogue on John Wentworth's claims about markets and optimization

Algon

I really want to see a debate between Eliezer/Nate and Scott Garrabrant on the necessity of utility functions. More generally, I would like to know what the cruxes between MIRI researchers are on various topics in alignment.

PaulK

IMO, coordination difficulties among sub-agents can't be waved away so easily. The solutions named, side-channel trades and counterfactual coordination, are both limited.

I would frame the nature of their limits, loosely, like this. In real minds (or at least the human ones we are familiar with), the stuff we care about lives in a high-dimensional space. A mind could be said to be, roughly, a network spanning such a space. A trade between elements (~sub-agents) that are nearby in this space will not be too hard to do directly. But for long-distance trades, side-channel reward will need to flow through a series of intermediaries -- this might involve several changes of local currencies (including traded favors or promises). Each local exchange needs to be worthwhile to its participants, and not overload the relationships that it's piggybacking on.

These long-distance trades can be really difficult to set up sometimes, in the same way that it would be hard for a random villager in medieval France to send $10 to another random villager in China.

The difficulty depends on things like the size / dimensionality of the space; how well-connected it is; and how much slack is available in the relevant places in the system (for the intermediate elements to wiggle around enough to make all the local trades possible). Note that the need for slack makes this a holistic constraint: if you just have one really important trade to make, then sure, you can probably make it happen, by using up lots of slack (locking a lot of intermediate elements into orientations optimized for that big trade). But you can't do that for every possible trade. So these issues really show up when you have a lot of heterogeneous trades to make.
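The slack argument can be illustrated with a toy model of our own devising (not from the comment, and not a claim about real minds). Sub-agents are nodes, each link starts with a fixed, hypothetical amount of slack, and a trade of a given size must reserve that much slack on every link along some path between the two trading parties.

```python
from collections import deque


def make_network(links, slack):
    """Build an undirected network of sub-agents where every link starts with
    the same amount of slack (a hypothetical, uniform simplification)."""
    neighbors, capacity = {}, {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
        capacity[frozenset((a, b))] = slack
    return neighbors, capacity


def route_trade(neighbors, capacity, src, dst, size):
    """Breadth-first search for a fewest-hop path whose links each have at
    least `size` slack left; if one exists, lock that slack up and return True."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbors[node]:
            if nxt not in prev and capacity[frozenset((node, nxt))] >= size:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return False
    node = dst
    while prev[node] is not None:  # walk back along the path, consuming slack
        capacity[frozenset((node, prev[node]))] -= size
        node = prev[node]
    return True


chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

# One big long-distance trade fits, but it uses up the slack along its path...
neighbors, capacity = make_network(chain, slack=3)
big_ok = route_trade(neighbors, capacity, "a", "e", size=3)
small_after_big = route_trade(neighbors, capacity, "b", "d", size=1)

# ...whereas several small trades share the same links until one runs dry.
neighbors, capacity = make_network(chain, slack=3)
small_results = [route_trade(neighbors, capacity, s, t, size=1)
                 for s, t in [("a", "e"), ("b", "d"), ("a", "c"), ("a", "e")]]
```

With a slack of 3 per link, the single size-3 trade from a to e succeeds and then blocks even a size-1 trade from b to d. On a fresh network, three size-1 trades succeed and a fourth fails once the b-c link is exhausted, which is the "you can't do that for every possible trade" point in miniature.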

Counterfactual ("logical") coordination has similar issues. If A and B want to counterfactually coordinate, but they're far apart in this mind-space, then they can only communicate or understand one another in a limited way, via intermediaries (or via the small number of dimensions they do share). This just makes things harder: it's hard to get shared meaning, hard to agree on what's fair, and hard to find a solution together that will generalize well instead of being brittle.

BTW, I'm not denying that intelligence (whatever that might mean) helps with all this, but I am denying that it's a panacea.

quetzal_rainbow

It seems to me that this is basically solved by "you put probability distributions over all the things that you don't actually know and may disagree about".

PaulK

This is for logical coordination? How does it help you with that?

quetzal_rainbow

It helps everywhere there is uncertainty, doesn't it? Imagine a problem like "You are in a Prisoner's Dilemma with such-and-such payoffs; find the optimal strategy if your possible opponents are 25% CooperateBots, 33% DefectBots, and 42% agents who actually know decision theory".
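A quick worked version of this example, to show what putting a distribution over opponents buys you. The payoffs (T=5, R=3, P=1, S=0) are our assumption, since the comment leaves them unspecified, and modeling "agents who know decision theory" as simply mirroring your move is a crude stand-in, not a claim about any particular decision theory.

```python
# Hypothetical payoffs (not given in the comment): temptation, reward, punishment, sucker.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# Opponent mix from the comment.
POPULATION = {"cooperate_bot": 0.25, "defect_bot": 0.33, "decision_theorist": 0.42}

PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def opponent_move(kind, my_move):
    if kind == "cooperate_bot":
        return "C"
    if kind == "defect_bot":
        return "D"
    # Modeling assumption: a "decision theorist" cooperates iff I cooperate.
    return my_move


def expected_payoff(my_move):
    """Average my payoff over the opponent distribution."""
    return sum(prob * PAYOFF[(my_move, opponent_move(kind, my_move))]
               for kind, prob in POPULATION.items())


for move in ("C", "D"):
    print(move, round(expected_payoff(move), 4))
```

Under these assumptions cooperating edges out defecting (about 2.01 vs 2.00 expected payoff). With a somewhat smaller share of mirroring opponents the ranking flips, so the answer really is computed from the distribution rather than known in advance.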

PaulK

I still don't know exactly what parts of my comment you're responding to. Maybe talking about a concrete sub-agent coordination problem would help ground this more.

But as a general response: in your example it sounds like you already have the problem very well narrowed down, to 3 possibilities with precise probabilities. What if there were 10^100 possibilities instead? Or what if the true situation isn't contained in your hypothesis space at all?

eapi

Loosely related to this, it would be nice to know whether systems that reliably don't turn down 'free money' must necessarily have almost-magical levels of internal coordination or centralization. If, when the next T seconds of trade offers are known, the only things that can't be tricked into turning down free money are Matrioshka brains at most T light-seconds wide, does that tell us anything useful about the limits of that facet of dominance as a measure of agency?

harsimony

Could someone help me collect the relevant literature here?

I think the complete class theorems are relevant: https://www.lesswrong.com/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations

The Non-Existence of Representative Agents: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3302656

Representative Agents: https://en.wikipedia.org/wiki/Representative_agent

John Wentworth on Subagents: https://www.lesswrong.com/posts/3xF66BNSC5caZuKyC/why-subagents

Richard_Ngo

Ty for post. Just for reference, does John endorse this summary?

So8res

John said "there was not any point at which I thought my views were importantly misrepresented" when I asked him for comment. (I added this note to the top of the post as a parenthetical; thanks.)

johnswentworth

More details:

- I think the argument Nate gave is at least correct for markets of relatively-highly-intelligent agents, and that was a big update for me (thank you, Nate!). I'm still unsure how far it generalizes to relatively less powerful agents.
- Nate left out my other big takeaway: Nate's argument here implies that there's probably a lot of money to be made in real-world markets! In practice, it would probably look like an insurance-like contract, by which two traders would commit to the "side-channel trades at non-market prices" required to make them aggregate into an expected utility maximizer. (Obviously the contract wouldn't be phrased in those terms; most of the work of implementation would be to figure out what trades need to occur in practice under what conditions to achieve aggregability, and then figuring out simple approximations of those conditions to write into contracts.)
- In the year since this discussion, I've also understood better why Nate seems to care mostly about the principles of relatively-highly-intelligent agents, as opposed to e.g. humans. I think that crux was mostly about corrigibility as an alignment target, and I have updated substantially toward that position as well.
- My main remaining disagreement, for purposes of applying this argument to superhuman AI, is that an intelligence which originally develops as a market of relatively-weak agents does not obviously choose to self-modify in the way described in the post, in the process of becoming more intelligent. It's not clear that the component weak subagents themselves "upgrade".
  - Analogy for humans: insofar as human values are well thought of as a "market of weak subagents", it's not clear to me that making the individual subagents more capable (to the point where they make the sort of trades required by Nate's argument) is actually the way I'd prefer to upgrade myself. I'm not convinced that that would actually be the right way to reflectively extend my extant values.

Svyatoslav Usachev

Real markets mostly have this covered, because they have something close to aggregated utilons, namely money, so direct exchanges between two different goods rarely take place.

Also, any business can be seen as a "side-channel trade" -- the market value of one individual's time is often lower than the value they can produce in cooperation with others.

Mikhail Samin

How do I test whether I actually understand the sort of thing Nate describes and not just consider it obvious in hindsight? (I feel like I was able to explain it to people before reading this post)

Ruby

Curated. I love a good dialogue, one where two parties are responding not to a modeled objector but to another person who actually gets to speak back from their own models. And as N says, he's sympathetic to people saying "none of you idiots have any idea wtf you're doing". In this case, N might have known, but I'd be keen to see the day N, J, or co. has their ignorance revealed.

Max H

This recent tweet of Eliezer's crystallized a concept for me which I think is relevant to the concepts of optimization and agents discussed in the dialogue: https://twitter.com/ESYudkowsky/status/1639406023680344064

In complicated systems in real life, the thing that is better at "preimaging outcomes onto choices" is the scary one, and the interesting / complicated systems are the ones where the choosing algorithm is complex.

Sure, it's true that you can construct toy systems in restricted domains (like the mushrooms and peppers one) and define "agents" in these systems which technically violate certain efficiency assumptions.

But the reason these examples aren't compelling (to me) is that it's kind of obvious what all the agents in them will do, once you write down their utility functions and the starting resources available to them. There's not much complexity "left over" for interesting decision algorithms.

Two of the real-world examples in this dialogue actually demonstrate the difference between these kinds of systems nicely:

I could not step into the shoes of a successful hedge fund trader and, given all the same choices and resources available to the trader, make decisions that result in more money in my trading account than the original trader could.

OTOH, if I were some kind of ghost-in-the-machine of a bacterium making ATP, I could (probably) make the same (or better, in cases where that's possible) decisions that the actual bacterium is making, given all the same information and choices available to it. (Though I might need a computer to keep track of all the hormones and blood-glucose levels and feedback loops.)

I can see how both examples might tell us something useful about intelligent systems, but the markets example seems more likely to have something to say about what the actual scary thing looks like.

Review Bot

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

metacoolus

This discussion sheds light on an important consideration in AI: the loss or mutation of agency when aggregating systems or agents. It’s vital to remember that optimizers aren’t always constructed to fit alongside other optimizers neatly, and harmony can involve sacrifices. While John’s point about weak efficiency is well noted, finding a balance between descriptive and prescriptive models is essential. I appreciate the counterargument’s reasoning that capable aggregates of optimizers don't pass up certain gains. Energy is a zero-sum currency. When efficiencies are revealed, smart agents will find ways to fight for them.

Tim Freeman

>if you won't accept 1 pepper for 1 mushroom, then you should accept 2 mushrooms for 1 pepper

You need a bunch more assumptions for this to hold, and I would like to know what they are. For example: If I don't have or want any mushrooms, and nobody I know wants mushrooms, then I can't accept 1 pepper for 1 mushroom because I can't pay the mushroom. But it still doesn't make any sense for me to accept two mushrooms for one pepper either because I don't have any use for two mushrooms. To get intuition about this, replace "mushroom" with something that is both useless and unavailable, such as a pound of neutrinos in a box. There's no way to get the neutrinos into the box, and even if you had them in the box, they would leave the box instantly and still be useless.

In general, there is a tendency for people to use alleged theorems without checking the premises. You can get surprising outcomes when the premises don't hold.
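A toy check of the quoted inference, under assumptions of our own: utilities are linear in (mushrooms, peppers), an agent accepts a trade only if it can pay and ends up strictly better off, and "accept 1 pepper for 1 mushroom" is read as giving up a mushroom to receive a pepper. The utility functions and bundles below are hypothetical.

```python
def accepts(utility, bundle, give, get):
    """Accept a trade iff it is feasible (you own what you give) and strictly
    raises utility. Bundles and trades are (mushrooms, peppers) tuples."""
    new = tuple(have - g + r for have, g, r in zip(bundle, give, get))
    if any(amount < 0 for amount in new):
        return False  # can't pay with goods you don't have
    return utility(*new) > utility(*bundle)


def check_inference(utility, bundle):
    """Premise: refuse giving 1 mushroom for 1 pepper.
    Conclusion: accept giving 1 pepper for 2 mushrooms."""
    refuses_one_for_one = not accepts(utility, bundle, give=(1, 0), get=(0, 1))
    accepts_two_for_one = accepts(utility, bundle, give=(0, 1), get=(2, 0))
    return refuses_one_for_one, accepts_two_for_one


def likes_both(mushrooms, peppers):
    # Both goods valued, mushrooms more so (hypothetical weights).
    return 2 * mushrooms + peppers


def peppers_only(mushrooms, peppers):
    # Tim's case: no use for mushrooms at all.
    return peppers


print(check_inference(likes_both, (5, 5)))    # (True, True): inference holds
print(check_inference(peppers_only, (0, 5)))  # (True, False): inference fails
```

In the first case the premise holds because peppers are worth less than mushrooms at the margin, and the conclusion follows. In Tim's case the premise holds only because the trade is infeasible, and the conclusion fails. Feasibility and strictly positive value for both goods are two of the missing premises this surfaces; whether they are exactly the ones the post intended is not something this sketch settles.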

eapi

> the order-dimension of its preference graph is not 1 / it passes up certain gains

If the order dimension is 1, then the graph is a total order, right? Why the conceptual detour here?
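The premise of the question can be checked concretely: a finite partial order has order dimension 1 exactly when every pair of elements is comparable, i.e. when it is already a linear order. The sketch below uses bundles of two goods and the Pareto-dominance relation as illustrative assumptions (not from the post). It checks totality and confirms that Pareto dominance equals the intersection of two lexicographic orders, so its dimension is at most 2.

```python
from itertools import combinations


def is_total(elements, leq):
    """True iff every pair of elements is comparable under `leq`.
    For a partial order this is exactly the order-dimension-1 case."""
    return all(leq(a, b) or leq(b, a) for a, b in combinations(elements, 2))


def pareto_leq(x, y):
    """Bundle y is at least as good as bundle x in every good."""
    return all(xi <= yi for xi, yi in zip(x, y))


bundles = [(0, 0), (1, 0), (0, 1), (1, 1)]  # hypothetical (mushrooms, peppers)

print(is_total(bundles, pareto_leq))                   # False: (1, 0) vs (0, 1)
print(is_total([(0, 0), (1, 0), (1, 1)], pareto_leq))  # True: a chain

# Python compares tuples lexicographically, so x <= y is one linear order and
# x[::-1] <= y[::-1] (peppers first) is another; their intersection is Pareto.
print(all(pareto_leq(x, y) == (x <= y and x[::-1] <= y[::-1])
          for x in bundles for y in bundles))          # True
```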

quetzal_rainbow

Am I correct that "knowing what system thinks is fair" is equivalent to "knowing under which bargaining solution system acts"?
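One way to see why the choice of bargaining solution matters: on the same feasible set, two standard solutions pick different points. The frontier u2 = 1 - u1^2 and the disagreement point (0, 0) below are arbitrary choices of ours. Nash bargaining maximizes the product of gains, while Kalai-Smorodinsky gives each player the same fraction of their ideal gain.

```python
def frontier(u1):
    """Hypothetical Pareto frontier: the most player 2 can get when player 1
    gets u1. Disagreement point is (0, 0)."""
    return 1.0 - u1 ** 2


def nash_solution(steps=100_000):
    """Nash bargaining: maximize the product of gains (grid search on u1)."""
    u1 = max((i / steps for i in range(steps + 1)), key=lambda u: u * frontier(u))
    return u1, frontier(u1)


def kalai_smorodinsky_solution(tol=1e-12):
    """Kalai-Smorodinsky: the ideal point here is (1, 1), so bisect along the
    diagonal u1 == u2 until it meets the frontier."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if frontier(mid) > mid:
            lo = mid
        else:
            hi = mid
    u1 = (lo + hi) / 2
    return u1, frontier(u1)


print(nash_solution())               # about (0.577, 0.667)
print(kalai_smorodinsky_solution())  # about (0.618, 0.618)
```

An observer who knew the feasible set but not which rule a system follows would mispredict player 1's share by roughly 0.04 units of utility in this example.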

David Johnston

Which is closer to Nate’s position: a) competition leads to highly instrumentally efficient AIs or b) inductive biases lead to highly instrumentally efficient AIs?


Footnote:

This example (of two cases where a market's decision about a trade differs depending on hidden state) relies on the initial wealth distributions being unequal. Legends hold that there are other examples where the hidden state doesn't depend on initial differences, if the utilities aren't logarithmic. John Wentworth tells me he cares in practice about this additional fact, and notes that further information can be found in the literature under the heading of "non-existence of representative agents". I have not myself constructed such an example, and would be interested if someone has a simple one.
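For a simple case in the spirit of this footnote (though not the post's own example), here is a two-good exchange economy with two log-utility (Cobb-Douglas) agents whose tastes differ. The preference weights and endowments are hypothetical. Holding the aggregate endowment fixed and only reshuffling who owns it changes the equilibrium price, so no single representative agent with fixed preferences reproduces the market's choices.

```python
def equilibrium_price(agents):
    """Walrasian price of good x, with good y as numeraire (price 1).

    Each agent is (alpha, endow_x, endow_y) with utility
    alpha * ln(x) + (1 - alpha) * ln(y), so it spends a fraction alpha of its
    wealth p * endow_x + endow_y on x. Clearing the x market gives
    p = sum(alpha * endow_y) / sum((1 - alpha) * endow_x).
    """
    numerator = sum(alpha * ey for alpha, ex, ey in agents)
    denominator = sum((1 - alpha) * ex for alpha, ex, ey in agents)
    return numerator / denominator


# Same aggregate endowment (2 units of each good), different hidden distributions.
rich_x_lover = [(0.8, 2.0, 2.0), (0.2, 0.0, 0.0)]
rich_y_lover = [(0.8, 0.0, 0.0), (0.2, 2.0, 2.0)]
equal_split = [(0.8, 1.0, 1.0), (0.2, 1.0, 1.0)]

for name, economy in [("rich_x_lover", rich_x_lover),
                      ("rich_y_lover", rich_y_lover),
                      ("equal_split", equal_split)]:
    print(name, equilibrium_price(economy))
```

The three distributions give prices of about 4.0, 0.25, and 1.0. An outside trader offering to buy x at a price of 2 would be taken up in the first economy and refused in the second, even though an observer who sees only aggregates cannot tell them apart.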

Footnote:

Hopefully there are canonical solutions. For instance, in an ultimatum game, the Schelling fair point is that both participants get utility halfway between their best and worst deals, a solution that is invariant under affine transformations of each player's utility. Knowing that agents are willing to accept these canonical solutions as 'fair' does not seem like a large additional burden of knowledge.
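The affine-invariance claim can be checked numerically for splitting a unit pie. When utility is linear in the pie, the rule below puts both players exactly halfway. When utilities are nonlinear, "halfway for both" is generally not a point on the frontier, so this sketch substitutes equal normalized gains (the Kalai-Smorodinsky rule), which coincides with the halfway point in the linear case. That substitution, and the example utility functions, are our assumptions.

```python
def fair_split(u1, u2, tol=1e-12):
    """Share s of a unit pie for player 1 (player 2 gets 1 - s) at which both
    players' normalized gains (u - worst) / (best - worst) are equal.
    Assumes u1 and u2 are strictly increasing on [0, 1]."""
    def gain1(s):
        return (u1(s) - u1(0.0)) / (u1(1.0) - u1(0.0))

    def gain2(s):
        return (u2(1.0 - s) - u2(0.0)) / (u2(1.0) - u2(0.0))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # gain1 rises and gain2 falls in s, so bisect
        mid = (lo + hi) / 2
        if gain1(mid) < gain2(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def linear(x):
    return x


def concave(x):  # a hypothetical risk-averse player
    return x ** 0.5


print(fair_split(linear, linear))                         # ~0.5: both halfway
print(fair_split(lambda x: 3 * linear(x) + 7, linear))    # same split after an affine change
print(fair_split(concave, linear))                        # ~0.382 = (3 - 5 ** 0.5) / 2
print(fair_split(lambda x: 10 * concave(x) - 4, linear))  # unchanged again
```

Rescaling or shifting either player's utility leaves the normalized gains, and hence the split, unchanged.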
