Embedded Agency: Not Just an AI Problem

Richard_Ngo 6y 130

We have strong outside-view reasons to expect that the information processing in question probably approximates Bayesian reasoning (for some model of the environment), and the decision-making process approximately maximizes some expected utility function (which itself approximates fitness within the ancestral environment).

The use of "approximates" in this sentence (and in the post as a whole) is so loose as to be deeply misleading - for the same reasons that the "blue-minimising robot" shouldn't be described as maximising some expected utility function, and the information processing done by a single neuron shouldn't be described as Bayesian reasoning (even approximately!).

johnswentworth 6y 160

I think the idea that real-world coherence can't work mainly stems from everybody relying on the VNM utility theorem, and then trying to make it work directly without first formulating the agent's world-model as a separate step. If we just forget about the VNM utility theorem and come at the problem from a more principled Bayesian angle instead, things work out just fine.

Here's the difference: the VNM utility theorem postulates "lotteries" as something already present in the ontology. Agents have preferences over lotteries directly, and agents' preferences must take probabilities as inputs. There's no built-in notion of what exactly "randomness" means, what exactly a "probability" physically corresponds to, or anything like that. If we formulate those notions correctly, then things work, but VNM utility does not itself provide the formulation, so everybody gets confused.
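To make the "lotteries are already in the ontology" point concrete, here is a minimal Python sketch. The names `expected_utility` and `prefers`, the dictionary representation of a lottery, and the example outcomes are all illustrative assumptions, not anything taken from the VNM literature. The thing to notice is that the probabilities arrive as part of the input; nothing in the code says where they come from or what they correspond to physically.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}.

    The probabilities are handed in as primitives, exactly as in the VNM
    setup: this function never derives them from a world-model.
    """
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def prefers(lottery_a, lottery_b, utility):
    """True if an expected-utility maximizer strictly prefers lottery_a."""
    return expected_utility(lottery_a, utility) > expected_utility(lottery_b, utility)


# Hypothetical outcomes and utilities, chosen only for illustration.
utility = {"apple": 1.0, "banana": 0.4, "nothing": 0.0}
sure_banana = {"banana": 1.0}
gamble = {"apple": 0.5, "nothing": 0.5}
```

With these made-up numbers, `prefers(gamble, sure_banana, utility)` compares 0.5 against 0.4. The comparison is only meaningful because someone supplied the 0.5 in advance, which is the gap described above.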

Contrast that with e.g. FTAP (the fundamental theorem of asset pricing) + Dutch book arguments: these provide a similar-looking conclusion to VNM utility theory (i.e. maximize expected utility), but the assumptions are quite different. In particular, they do not start with any inherent notion of "probability" - assuming inexploitability, they show that some (not necessarily unique) probability distribution exists, under which the agent can be interpreted as maximizing utility. This puts focus on the real issue: what exactly is the agent's world-model?
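A toy version of the Dutch book direction can be sketched in a few lines of Python. This is the simplest textbook case, not the FTAP in any generality: it assumes a finite set of mutually exclusive, exhaustive outcomes and an agent willing to buy or sell, at a posted price, a ticket paying 1 if a given outcome occurs. The function names `dutch_book` and `worst_case_profit` are hypothetical. The sketch starts from prices alone, with no probabilities given; if the prices cannot be exploited, they turn out to be non-negative and sum to 1, i.e. they already form a probability distribution.

```python
from fractions import Fraction


def worst_case_profit(prices, trades):
    """Bookie's minimum profit over all outcomes.

    trades[i] > 0 means the bookie buys that many tickets on outcome i from
    the agent; trades[i] < 0 means the bookie sells them to the agent.
    """
    cost = sum(t * p for t, p in zip(trades, prices))
    # If outcome k occurs, only ticket k pays out (1 per ticket held).
    return min(trades[k] - cost for k in range(len(prices)))


def dutch_book(prices):
    """Return (guaranteed_profit, trades) exploiting the agent, or None.

    None means the prices are coherent: non-negative and summing to 1.
    """
    prices = [Fraction(p) for p in prices]
    n = len(prices)
    for i, p in enumerate(prices):
        if p < 0:
            # The agent pays the bookie to take a ticket with payoff >= 0.
            trades = [0] * n
            trades[i] = 1
            return worst_case_profit(prices, trades), trades
    total = sum(prices)
    if total < 1:
        trades = [1] * n   # buy one of each: pay `total`, collect exactly 1
    elif total > 1:
        trades = [-1] * n  # sell one of each: collect `total`, pay exactly 1
    else:
        return None
    return worst_case_profit(prices, trades), trades
```

For example, `dutch_book(["0.5", "0.3", "0.1"])` finds a guaranteed profit by buying one ticket on every outcome, while `dutch_book(["0.5", "0.3", "0.2"])` returns `None`. Prices are passed as strings so that `Fraction` parses them exactly rather than inheriting floating-point error.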

As you say in the post you linked:

those hypothetical choices are always between known lotteries with fixed probabilities, rather than being based on our subjective probability estimates as they are in the real world... VNM coherence is not well-defined in this setup, so if we want to formulate a rigorous version of this argument, we’ll need to specify a new definition of coherence which extends the standard instantaneous-hypothetical one.

... which is exactly right. That's why I consider VNM coherence a bad starting point for this sort of thing.

Getting more into the particulars of that post...

I would summarize the main argument in your post as roughly: "we can't observe counterfactual behavior, and without that we can't map the utility function, unless the utility function is completely static and depends only on current state of the world." So we can't map utilities over trajectories, we can't map off-equilibrium strategies, we can't map time-dependent utilities, etc.

The problem with that line of argument is that it treats the agent as a black box. Breaking open the black box is what embedded agency is all about, including all the examples in the OP. Once the black box is open, we do not need to rely on observed behavior - we know what the internal gears are, so we can talk about counterfactual behavior directly. In particular, once the black box is open, we can (in principle) talk about the agent's internal ontology. Once the agent's internal ontology is known, possibilities like "the agent prefers to travel in circles" are hypotheses we can meaningfully check - not by observing the agent's behavior, but by seeing what computation it performs with its internal notion of "travelling in circles".

Pattern 6y 40

What's the connection to "Embedded Agency", and what do you mean by using the term?

(This piece sounds like it's about extracting utility functions and probability distributions, and it's not clear how that's related (in the framework this post outlines).)

Vaniver 6y 300

See here.

[As a side note, I notice that the habit of "pepper things with hyperlinks whenever possible" seems to be less common on modern LW than it was on old LW, but I think it was actually a pretty great habit and I'd like to see more of it.]

johnswentworth 6y 40

Thanks for bringing up the hyperlink thing; I will use them more liberally in the future. When writing for a LW audience, I tend to lean toward fewer links to avoid sounding patronizing. But actually thinking about it for a second, that seems like a very questionable gain with a significant cost.

habryka 6y 20

Yeah, this seems true. Might be subtle UI things. We could probably also push towards this by making searching for links easier, for example by having a GitHub-style search that shows up when you start typing some character (like / or #).

dxu 6y 20

Seconded.

johnswentworth 6y 20

Let me know if you've read the link Vaniver gave and the connection still isn't clear. If that's the case, then there's an inferential gap I've failed to notice, and I'll probably write a whole additional post to flesh out that connection.

Gurkenglas 6y 10

One way that comes to mind is to use the constructive VNM utility theorem proof. The construction is going to be approximate because the system's rationality is. So the next things to study include in what way the rationality is approximate, and how well this and other constructions preserve this (and other?) approximations.
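As a rough illustration of what "constructive" means here, the usual proof pins down the utility of an outcome as the probability p at which the agent is indifferent between that outcome for certain and a lottery giving the best outcome with probability p and the worst with probability 1 - p. A minimal Python sketch, assuming we can query a preference oracle (the names `elicit_utility`, `prefers_lottery`, and the hidden utilities below are hypothetical stand-ins):

```python
def elicit_utility(outcome, prefers_lottery, tol=1e-6):
    """Estimate the utility of `outcome` on a scale where worst=0 and best=1.

    prefers_lottery(outcome, p) should return True when the agent prefers
    the lottery "best with probability p, else worst" over `outcome` for
    sure. Bisection homes in on the indifference probability.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(outcome, p):
            hi = p  # lottery already attractive: indifference point is below p
        else:
            lo = p  # sure thing still preferred: indifference point is above p
    return (lo + hi) / 2


# Stand-in agent with a hidden utility function, for demonstration only.
hidden_utility = {"worst": 0.0, "a": 0.25, "b": 0.7, "best": 1.0}


def ideal_agent(outcome, p):
    """A perfectly coherent oracle: prefers the lottery iff p exceeds u(outcome)."""
    return p > hidden_utility[outcome]
```

With a perfectly coherent oracle like `ideal_agent`, the bisection recovers the hidden utilities to within `tol`. With an agent that is only approximately coherent, the oracle's answers need not be monotone in p, and how the estimate degrades in that case is the open question raised above.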

Oh, and isn't inverse reinforcement learning about this?

johnswentworth 6y 80

See my reply to ricraz's comment for my thoughts on using the VNM utility theorem in general. The use you suggest could work, but if we lean on VNM then the hard part of the problem is backing out the agent's internal probabilistic model.

IRL is about this, but the key difference is that it black-boxes the agent. It doesn't know what the agent's internal governing equations look like; it just sees the outputs.
