mattmacdermott


Comments

We’re already comparing to the default outcome in that we’re asking “what fraction of the default expected utility minus the worst comes from outcomes at least this good?”.

I think you’re proposing to replace “the worst” with “the default”, in which case we end up dividing by zero.

We could pick some other reference point, neither the worst outcome nor the default expected utility. (But that does introduce the possibility of negative OP, and it still has sensitivity issues.)
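A minimal sketch of the measure as I read it from this thread: OP is minus log2 of the fraction of (default expected utility minus a reference point) that comes from outcomes at least as good as the achieved one, with the reference defaulting to the worst outcome. The helper name `optimisation_power`, its arguments and the toy numbers are hypothetical. It shows the two failure modes mentioned above: using the default expected utility as the reference divides by zero, and a reference between the worst outcome and the default can give negative OP.

```python
import math


def optimisation_power(probs, utils, achieved, reference=None):
    """Bits of optimisation for outcome index `achieved` (hypothetical helper).

    probs: default probability of each outcome; utils: utility of each outcome.
    reference: baseline utility; defaults to the worst outcome's utility.
    """
    ref = min(utils) if reference is None else reference
    # Default expected utility minus the reference point.
    denom = sum(p * (u - ref) for p, u in zip(probs, utils))
    if math.isclose(denom, 0.0, abs_tol=1e-12):
        raise ZeroDivisionError("reference equals the default expected utility")
    u_star = utils[achieved]
    # The part of that quantity contributed by outcomes at least as good as achieved.
    num = sum(p * (u - ref) for p, u in zip(probs, utils) if u >= u_star)
    return -math.log2(num / denom)


probs = [0.5, 0.25, 0.25]   # default distribution over three outcomes
utils = [0.0, 1.0, 2.0]     # so the default expected utility is 0.75

print(optimisation_power(probs, utils, achieved=2))                 # about 0.585 bits
print(optimisation_power(probs, utils, achieved=1, reference=0.5))  # -1.0, i.e. negative OP
try:
    optimisation_power(probs, utils, achieved=2, reference=0.75)
except ZeroDivisionError as err:
    print("undefined:", err)
```

With the reference at 0.5, the outcomes at least as good as outcome 1 contribute twice the total, because the worst outcome now contributes a negative term; that is where the negative OP comes from.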

Nice, I'd read the first but didn't realise there were more. I'll digest later.

I think agents vs optimisation is definitely reality-carving, but I'm not sure I see the point about utility functions and preference orderings. I assume the idea is that an optimisation process just moves the world towards certain states, whereas an agent tries to move the world towards certain states, i.e. chooses actions based on how much they move the world towards those states, so it makes sense to quantify how much weight each state gets in its decision-making. But it's not obvious to me that there's no meaningful way to assign weightings to states for an optimisation process too. For example, if a ball rolling down a hill gets stuck in the large hole twice as often as in the medium hole and ten times as often as in the small hole, maybe it makes sense to quantify this with something like a utility function. Although defining a utility function based on the typical behaviour of the system and then trying to measure its optimisation power against it gets a bit circular.

Anyway, the dynamical systems approach seems good. Have you stopped working on it?

Probably the easy utility function makes agent 1 have more optimisation power. I agree this means comparisons between different utility functions can be unfair, but not sure why that rules out a measure which is invariant under positive affine transformations of a particular utility function?
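A quick check of the invariance claim, assuming the measure has the form below (default distribution p, utility u, achieved outcome x^*, worst utility u_{\min}; this formalisation is my reading of the measure discussed in this thread, not a quoted definition):

$$
\mathrm{OP}_u(x^*) = -\log_2 \frac{\sum_{x:\,u(x)\ge u(x^*)} p(x)\,\bigl(u(x)-u_{\min}\bigr)}{\sum_{x} p(x)\,\bigl(u(x)-u_{\min}\bigr)}
$$

For u' = a u + b with a > 0 we have u'_{\min} = a u_{\min} + b, so u'(x) - u'_{\min} = a(u(x) - u_{\min}), and u'(x) \ge u'(x^*) exactly when u(x) \ge u(x^*). The factor a cancels:

$$
\mathrm{OP}_{u'}(x^*) = -\log_2 \frac{a\sum_{x:\,u(x)\ge u(x^*)} p(x)\,\bigl(u(x)-u_{\min}\bigr)}{a\sum_{x} p(x)\,\bigl(u(x)-u_{\min}\bigr)} = \mathrm{OP}_u(x^*)
$$

This only covers rescaling one utility function; it says nothing about comparing agents with different utility functions.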

Hm, I'm not sure this problem comes up.

Say I've built a room-tidying robot, and I want to measure its optimisation power. The room can be in two states: tidy or untidy. A natural choice of default distribution is my beliefs about how tidy the room will be if I don't put the robot in it. Let's assume I'm pretty knowledgeable and I'm extremely confident that in that case the room will be untidy, so I give "untidy" a probability very close to 1 and "tidy" a very small one (we do have to avoid probabilities of 0, but that's standard in a Bayesian context). But in reality I do put the robot in and it gets the room tidy, for an optimisation power of 11 bits.

Those 11 bits don't come from any uncertainty on my part about the optimisation process, although they do depend on my uncertainty about what would happen in the counterfactual world where I don't put the robot in the room. But it's confidence, not uncertainty, that drives the number up: becoming more confident that the room would be untidy in that world makes me see the robot as more of an optimiser.

Unlike in information theory, these bits aren't measuring a resolution of uncertainty, but a difference between the world and a counterfactual.
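A minimal sketch of the robot example, assuming the measure here is minus log2 of the default probability of an outcome at least as good as the one achieved. The helper `op_bits` and the probability values are hypothetical; P(tidy) = 2^-11 is chosen only because it reproduces the 11 bits mentioned above.

```python
import math


def op_bits(default_probs, utility, achieved):
    """-log2 of the default probability of doing at least as well as `achieved`."""
    u_star = utility[achieved]
    p_at_least = sum(p for state, p in default_probs.items() if utility[state] >= u_star)
    return -math.log2(p_at_least)


utility = {"tidy": 1, "untidy": 0}

# The more confident I am that the room would stay untidy without the robot,
# the more bits the robot's tidy room is worth.
for p_tidy in (2**-10, 2**-11, 2**-12):
    default = {"tidy": p_tidy, "untidy": 1 - p_tidy}
    print(f"P(tidy) = 2^{math.log2(p_tidy):.0f}: {op_bits(default, utility, 'tidy'):.1f} bits")
```

Nothing in the calculation refers to uncertainty about the robot itself: only the counterfactual distribution and the actual outcome enter.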

An interesting point about the agency-as-retargetable-optimisation idea is that it seems like you can make the perturbation in various places upstream of the agent's decision-making, but not downstream, i.e. you can retarget an agent by perturbing its sensors more easily than its actuators.

For example, to change a thermostat-controlled heating system to optimise for a higher temperature, the most natural perturbation might be to turn the temperature dial up, but you could also tamper with its thermistor so that it reports lower temperatures. On the other hand, making its heating element more powerful wouldn't affect the final temperature.

I wonder if this suggests that an agent's goal lives in the last place in a causal chain of things you can perturb to change the set of target states of the system.
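A toy simulation of the thermostat example, under assumptions that are mine rather than the comment's: Newtonian heat loss to a 10 °C exterior, an on/off controller, and made-up parameter values (the function `final_temperature` is hypothetical). In this model the final temperature tracks the dial and the sensor bias, but not the heater power (up to the small on/off ripple), as long as the heater is strong enough to reach the target at all.

```python
def final_temperature(setpoint, sensor_bias=0.0, heater_power=5.0,
                      ambient=10.0, loss_rate=0.1, dt=0.01,
                      steps=10_000, tail=2_000):
    """Mean room temperature over the last `tail` steps of a toy on/off thermostat.

    heater_power is in degrees per minute, loss_rate in 1/minute, dt in minutes.
    """
    temp = ambient
    history = []
    for _ in range(steps):
        reading = temp + sensor_bias  # what the (possibly tampered) thermistor reports
        heating = heater_power if reading < setpoint else 0.0  # on/off control
        temp += (heating - loss_rate * (temp - ambient)) * dt  # heat in minus heat lost
        history.append(temp)
    return sum(history[-tail:]) / tail


baseline = final_temperature(setpoint=20)
dial_up = final_temperature(setpoint=23)                  # turn the dial up
tampered = final_temperature(setpoint=20, sensor_bias=-3)  # thermistor reads 3 degrees low
stronger = final_temperature(setpoint=20, heater_power=10)  # more powerful heating element
print(f"{baseline:.2f} {dial_up:.2f} {tampered:.2f} {stronger:.2f}")
```

Both upstream perturbations move the target by the same amount, while doubling the heater power only changes how quickly the room gets there.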

Do you expect useful generic descriptive models of agency to exist?

Nice, thanks. It seems like the distinction the authors make between 'building agents from the ground up' and 'understanding their behaviour and predicting roughly what they will do' maps to the distinction I'm making, but I'm not convinced by the claim that the second one is a much stronger version of the first.

The argument in the paper is that the first requires an understanding of just one agent, while the second requires an understanding of all agents. But it seems like they require different kinds of understanding, especially if the agent being built is meant to be some theoretical ideal of rationality. Building a perfect chess algorithm is just a different task to summarising the way an arbitrary algorithm plays chess (which you could attempt without even knowing the rules).

1. In our universe, as opposed to the "current basic theory of AI" universe.

2. From Arbital:

A Cartesian agent setup is one where the agent receives sensory information from the environment, and the agent sends motor outputs to the environment, and nothing else can cross the "Cartesian border" separating the agent and environment. If you can eat a psychedelic mushroom that affects the way you process the world - not just presenting you with sensory information, but altering the computations you do to think - then this is an example of an event that "violates the Cartesian boundary". Likewise if the agent drops an anvil on its own head. Nothing that happens in a Cartesian universe can kill a Cartesian agent or modify its processing; all the universe can do is send the agent sensory information, in a particular format, that the agent reads.

3. For embedded agency. In the old frame agents aren't really made of anything.

Thanks. Is there a particular source whose notation yours most aligns with?

When you write  I understand that to mean that  for all . But when I look up definitions of conditional probability it seems that that notation would usually mean  for all  

Am I confused or are you just using non-standard notation?
