tailcalled - LessWrong

Is instrumental convergence a thing for virtue-driven agents?

The assumption of virtue ethics isn't that virtue is unknown and must be discovered - it's that it's known and must be pursued.

If it is known, then why do you not ever answer my queries about providing an explicit algorithm for converting intelligence into virtuous agency, instead running in circles about how There Must Be A Utility Function!?

If the virtuous action, as you posit, is to consume ice cream, intelligence would allow an agent to acquire more ice cream, eat more over time by not making themselves sick, etc.

I'm not disagreeing with this, I'm saying that if you apply the arguments which show that you can fit a utility function to any policy to the policies that turn down some ice cream, then as you increase intelligence and that increases the pursuit of ice cream, the resulting policies will score lower on the utility function which values turning down ice cream.

But any such decision algorithm, for a virtue ethicist, is routing through continued re-evaluation of whether the acts are virtuous, in the current context, not embracing some farcical LDT version of needing to pursue ice cream at all costs. Your assumption, which is evidently that the entire thing turns into a compressed and decontextualized utility function ("algorithm") is ignoring the entire hypothetical.

You're the one who said that virtue ethics implies a utility function! I didn't say anything about it being compressed and decontextualized, except as a hypothetical example of what virtue ethics is because you refused to provide an implementation of virtue ethics and instead require abstracting over it.

I'm not interested in continuing this conversation until you stop strawmanning me.

Is instrumental convergence a thing for virtue-driven agents?

tailcalled3d20

No, that's not my argument.

Let's imagine that True Virtue is seeking and eating ice cream, but that you don't know what true virtue is for some reason.

Now let's imagine that we have some algorithm for turning intelligence into virtuous agency. (This is not an assumption that I'm willing to grant (since you haven't given something like argmax for virtue), and really that's the biggest issue with my proposal, but let's entertain it to see my point.)

If the algorithm is run on the basis of some implementation of intelligence that is not good enough, then the resulting agent might turn down some opportunities to get ice cream, by mistake, and instead do something else, such as pursue money (but less money than you could get the ice cream for). As a result of this, you would conclude that pursuing ice cream is not virtuous, or at least, not as virtuous as pursuing money.

If you then turn up the level of intelligence, the resulting agent would pursue ice cream in this situation where it previously pursued virtue. However, this would make it score worse on your inferred utility function where pursuing money is more virtuous than pursuing intelligence.

Now of course you could say that your conclusion that pursuing ice cream is less virtuous than pursuing money is wrong. But then you can only say that if you grant that you cannot infer a virtue-ethical utility function from a virtue-ethical policy, as this utility function was inferred from the policy.

Is instrumental convergence a thing for virtue-driven agents?

tailcalled3d20

I didn't say you need to understand what an argument is, I said you need to understand your own argument.

It is true that if the utility functions cover a sufficiently broad set of possibilities, any "reasonable" policy (for a controversial definition of "reasonable") maximizes a utility function, and if the utility functions cover an even broader set of possibilities, literally any policy maximizes a utility function.

But, if you want to reference these facts, you should know why they are true. For instance, here's a rough sketch of a method for finding a utility function for the first statement:

If you ask a reasonable policy to pick between two options, it shouldn't have circular preferences, so you should be able to offer it different options and follow the preferred one until you find the absolute best scenario according to the policy. Similarly, you should be able to follow the dispreferred one until you find the absolute worst scenario according to the policy. Then you can define the utility of any outcome based on the probability mixture of the best and worst scenario where the policy switches between preferring the outcome vs preferring the probability mixture.

Now let's say there's an option where e.g. you're not smart enough to realize that option gives you ice cream. Then you won't be counting the ice cream when you decide at what threshold you prefer that option to the mixture. But then that means the induced utility function won't include the preference for ice cream.

Is instrumental convergence a thing for virtue-driven agents?

tailcalled3d-20

I'm showing that the assumptions necessary for your argument don't hold, so you need to better understand your own argument.

Is instrumental convergence a thing for virtue-driven agents?

tailcalled4d20

The methods for converting policies to utility functions assume no systematic errors, which doesn't seem compatible with varying the intelligence levels.

Is instrumental convergence a thing for virtue-driven agents?

tailcalled4d20

This.

In particular imagine if the state space of the MDP factors into three variables x, y and z, and the agent has a bunch of actions with complicated influence on x, y and z but also just some actions that override y directly with a given value.

In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue.

While there are certain circumstances where consequentialism can specify this virtue, it's quite difficult to do in general. (E.g. you can't just minimize the difference between f(x) and y because then it might manipulate x instead of y.)

Is instrumental convergence a thing for virtue-driven agents?

tailcalled5d62

I didn't claim virtue ethics says not to predict consequences of actions. I said that a virtue is more like a procedure than it is like a utility function. A procedure can include a subroutine predicting the consequences of actions and it doesn't become any more of a utility function by that.

The notion that "intelligence is channeled differently" under virtue ethics requires some sort of rule, like the consequentialist argmax or Bayes, for converting intelligence into ways of choosing.

Is instrumental convergence a thing for virtue-driven agents?

tailcalled5d18-3

Consequentialism is an approach for converting intelligence (the ability to make use of symmetries to e.g. generalize information from one context into predictions in another context or to e.g. search through highly structured search spaces) into agency, as one can use the intelligence to predict the consequences of actions and find a policy which achieves some criterion unusually well.

While it seems intuitively appealing that non-consequentialist approaches could be used to convert intelligence into agency, I have tried a lot and not been able to come up with anything convincing. For virtues in particular, I would intuitively think that a virtue is not a motivator per se, but rather the policy generated by the motivator. So I think virtue-driven AI agency just reduces to ordinary programming/GOFAI, and that there's no general virtue-ethical algorithm to convert intelligence into agency.

The most straightforward approach to programming a loyal friend would be to let the structure of the program mirror the structure^[1] of the loyal friendship. That is, you would think of some situation that a loyal friend might encounter, and write some code that detects and handles this situation. Having a program whose internal structure mirrors its external behavior avoids instrumental convergence (or any kind of convergence) because each behavior is specified separately and one can make arbitrary exceptions as one sees fit. However, it also means that the development and maintenance burden scales directly with how many situations the program generalizes to.

^{^}
This is the "standard" way to write programs - e.g. if you make a SaaS app, you often have template files with a fairly 1:1 correspondence to the user interface, database columns with a 1:many correspondence to the user interface fields, etc.. By contrast, a chess bot that does a tree search does not have a 1:1 correspondence between the code and the plays; for instance the piece value table does not clearly affect it's behavior in any one situation, but obviously kinda affects its behavior in almost all situations. (I don't think consequentialism is the only way for the structure of a program to not mirror the structure of its behavior, but it's the most obvious way.)

Latent variables for prediction markets: motivation, technical guide, and design considerations

tailcalled13d20

Not sure what you mean. Are you doing a definitional dispute about what counts as the "standard" definition of Bayesian networks?

Latent variables for prediction markets: motivation, technical guide, and design considerations

tailcalled13d20

Your linked paper is kind of long - is there a single part of it that summarizes the scoring so I don't have to read all of it?

Either way, yes, it does seem plausible that one could create a market structure that supports latent variables without rewarding people in the way I described it.

LESSWRONG
LW

Sequences

Posts

Wikitag Contributions

Comments