Notion of Preference in Ambient Control

[-]Tyrrell_McAllister15y10

This is unlike the situation with A and O, where the agent can't just perform action A, since it's not defined in the way the agent knows how to perform (even though A is (provably) equivalent to one of the constants, the agent can't prove that for any given constant).

It's probably a good idea to maintain the distinction between a constant symbol c and the element v(c) assigned to c by the interpretation's valuation map v into the domain of discourse.

For example, I found the quote above confusing, but I think that you meant the following: "This is unlike the situation with A and O, where the agent can't just perform action v(A), since it's not defined in the way the agent knows how to perform. It is true that we can prove, in the metalanguage, that there exists an action X such that v(A) = X. However, there is no action X such that, for some constant symbol 'X' such that v('X') = X, the agent can prove [A = 'X']."

[-]Vladimir_Nesov15y00

Not what I meant. The set of possible actions is defined syntactically, as a set of formulas that the agent, from outside its theory, can recognize and directly act upon. Definition of A (as it's syntactically given) is not one of those. Thus, the agent can't perform A directly, the best it can hope for is to find another formula B which defines the same value (in all models) and is a possible action. The agent stops short of this goal in proving a moral argument involving A and B instead, [A=B => U=large], and enacts this moral argument by performing B, which is a possible action (as a formula), unlike A. The agent, however, can't prove [A=B], even though [A=B] is provable in agent's theory (see the first named section).

[-]Tyrrell_McAllister15y00

Not what I meant. The set of possible actions is defined syntactically, as a set of formulas that the agent, from outside its theory, can recognize and directly act upon. Definition of A (as it's syntactically given) is not one of those. Thus, the agent can't perform A directly, the best it can hope for is to find another formula B which defines the same value (in all models) and is a possible action. The agent stops short of this goal in proving a moral argument involving A and B instead, [A=B => U=large], and enacts this moral argument by performing B, which is a possible action (as a formula), unlike A.

This looks to me like an explanation for why my original interpretation of your quote is a true statement. So I'm worried that I'm still misunderstanding you, since you say that my interpretation is not what you meant.

Here is my interpretation again, but in more syntactic terms:

"This is unlike the situation with A and O, where the agent can't just perform action v(A), since it's not defined in the way the agent knows how to perform. It is true that we can prove that, in every interpretation v, there is an action-constant X such that v(A) = v(X). However, there is no action-constant X such that the agent can prove [A = X]."

The rest of your parent comment explains why the symbol A will never appear in the position in moral arguments where action-constant-symbols appear. Is that right?

[-]Vladimir_Nesov15y00

While I don't disagree with what you are saying in your reformulation (but for insignificant details), it's a different statement from the one I was making. In my own words, you are stating that the agent can't prove A=X for any (constant denoting) possible action X, but I'm not stating that at all: I'm only saying that A itself is not a possible action, that is as a formula is not an element of the set of formulas that are possible actions. I also don't see why you'd want that v(-) thing in this context: the agent performs an action by examining formulas for possible actions "as text strings", not by magically perceiving their semantics.

[-]Tyrrell_McAllister15y00

I also don't see why you'd want that v(-) thing in this context: the agent performs an action by examining formulas for possible actions "as text strings", not by magically perceiving their semantics.

It's how I help myself to keep the map and the territory distinct. v(A), under the standard interpretation, is what the agent does. The constant A, on the other hand, is a symbol that the agent uses in its reasoning, and which isn't even defined in such a way that the agent can directly perform what it represents.

The valuation v is for my benefit, not the agent's. The agent doesn't use or perceive the semantics of its theory. But I do perceive the semantics when I reason about how the agent's reasoning will affect its actions.

[-]Vladimir_Nesov15y00

v(A), under the standard interpretation,

What does standard interpretation have to do with this? If v(-) maps from formulas to actions, fine, but then A is just a string, so interpretations don't matter.

[-]Tyrrell_McAllister15y00

I think that I'm missing your point. Of course, the interpretation doesn't affect what the agent or its theory can prove. Is that all you're saying?

The reason that I'm led to think in terms of semantics is that your post appeals to properties of the agent that aren't necessarily encoded in the agent's theory. At least, the current post doesn't explicitly say that these properties are encoded in the theory. (Maybe you made it clear how this works in one of your previous posts. I haven't read all of these closely.)

The properties I'm thinking of are (1) the agent's computational constraints and (2) the fact that the agent actually does the action represented by the action-constant that yields the highest computed utility, rather than merely deducing that that constant has the highest computed utility.

For example, you claim that [A=1] must be derivable in the theory if the agent actually does A. The form of your argument, as I understand it, is to note that [A=1] is true in the standard interpretation, and to show that [A=1] is the sort of formula which, if true under one interpretation, must be true in all, so that [A=1] must be a theorem by completeness. I'm still working out why [A=1] is the required kind of formula, but the form of your argument does seem to appeal to a particular interpretation before generalizing to the rest.

[-]Vladimir_Nesov15y00

For example, you claim that [A=1] must be derivable in the theory if the agent actually does A.

If the agent actually does 1 (I assume you meant to say). I don't see what you are trying to say again. I agree with the last paragraph (you could recast the argument that way), but don't understand the third paragraph.

[-]Tyrrell_McAllister15y00

If the agent actually does 1 (I assume you meant to say).

Whoops. Right.

I don't see what you are trying to say again. I agree with the last paragraph (you could recast the argument that way), but don't understand the third paragraph.

Okay. Let me try to make my point by building on the last paragraph, then. According to my understanding, you start out knowing that v(A) = v(1) for a particular interpretation v. Then you infer that v'(A) = v'(1) for an arbitrary interpretation v'. Part of my reason for using the v(.) symbol is to help myself keep the stages of this argument distinct.

[-]Vladimir_Nesov15y00

According to my understanding, you start out knowing that v(A) = v(1) for a particular interpretation v.

If v is an interpretation, it maps (all) terms to elements of corresponding universe, while possible actions are only some formulas, so associated mapping K would map some formulas to the set of actions (which don't have to do anything with any universe). So, we could say that K(1)=1', but K(A) is undefined. K is not an interpretation.

[-]Tyrrell_McAllister15y00

If v is an interpretation, it maps (all) terms to elements of corresponding universe, while possible actions are only some formulas, . . .

Maybe we're not using the terminology in exactly the same way.

For me, an interpretation of a theory is an ordered pair (D, v), where D is a set (the domain of discourse), and v is a map (the valuation map) satisfying certain conditions. In particular, D is the codomain of v restricted to the constant symbols, so v actually contains everything needed to recover the interpretation. For this reason, I sometimes abuse notation and call v itself the interpretation.

The valuation map v

maps constant symbols to elements of D,
maps n-ary function symbols to maps from D^n to D,
maps n-ary predicate symbols to subsets of D^n,
maps sentences of the theory into {T, F}, in a way that satisfies some recursive rules coming from the rules of inference.
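
As a rough illustration of the definition above, here is a minimal Python sketch of an interpretation (D, v) for a toy language. The domain, the symbol names (`succ_mod3`, `Less`), and the helpers `term_value` and `holds` are hypothetical choices made for this example; nothing here comes from the post, and only atomic sentences are handled.

```python
# Toy interpretation (D, v). Every name below is an illustrative assumption.
D = {0, 1, 2}  # domain of discourse

v = {
    # constant symbols -> elements of D
    "A": 1,
    "1": 1,
    "2": 2,
    # unary function symbol -> map from D to D
    "succ_mod3": lambda x: (x + 1) % 3,
    # binary predicate symbol -> subset of D^2
    "Less": {(a, b) for a in D for b in D if a < b},
}

def term_value(term):
    """Evaluate a term: a constant symbol, or a pair (function_symbol, subterm)."""
    if isinstance(term, str):
        return v[term]
    func, arg = term
    return v[func](term_value(arg))

def holds(sentence):
    """Evaluate an atomic sentence (op, s, t) to True or False.

    ("=", s, t) compares the values of the two terms;
    (P, s, t) checks whether the pair of values lies in v[P].
    """
    op, s, t = sentence
    if op == "=":
        return term_value(s) == term_value(t)
    return (term_value(s), term_value(t)) in v[op]

# "A" and "1" are different symbols, but v sends them to the same element of D.
same_element = holds(("=", "A", "1"))
# v(A) = 1 and succ_mod3 sends 1 to 2, so the pair (1, 2) is in v["Less"].
less = holds(("Less", "A", ("succ_mod3", "A")))
```

The point of the sketch is only that asking whether v(A) and v(1) coincide is a question about elements of D, separate from any question about the symbols themselves.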

Now, in the post, you write

Each such statement defines a possible world Y resulting from a possible action X. X and Y can be thought of as constants, just like A and O, or as formulas that define these constants, so that the moral arguments take the form [X(A) => Y(O)].

(Emphasis added.) I've been working with the bolded option ("X and Y can be thought of as constants"), which I understand to be saying that A and 1 are constant symbols. Hence, given an interpretation (D, v), v(A) and v(1) are elements of D, so we can ask whether they are the same element.

[-]Vladimir_Nesov15y00

I agree with everything you wrote here...

[-]Tyrrell_McAllister15y00

I agree with everything you wrote here...

What was your "associated mapping K"? I took it to be what I'm calling the valuation map v. That's the only map that I associate to an interpretation.

[-]Vladimir_Nesov15y00

K has a very small domain. Say, K("2+2")=K("5")="pull the second lever", K("4") undefined, K("A") undefined. Your v doesn't appear to be similarly restricted.
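
A minimal sketch of such a K, assuming it can be modeled as a plain lookup table on formula strings (the function name `perform` is hypothetical):

```python
# Hypothetical partial map K from formula strings to actions.
# It is keyed on the text of the formula, not on what the formula denotes.
K = {
    "2+2": "pull the second lever",
    "5": "pull the second lever",
}

def perform(formula: str):
    """Return the action for a formula the agent recognizes as text, else None."""
    return K.get(formula)

recognized = perform("2+2")
unrecognized = [perform("4"), perform("A")]  # outside K's domain
```

Note that "4" is outside the domain even though it denotes the same number as "2+2", which is the sense in which K is not an interpretation.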

[-]Tyrrell_McAllister15y10

The new symbols extend the language, while their definitions, obtained from agent and world programs respectively by standard methods of defining recursively enumerable functions, extend the theory.

I haven't yet read beyond this point, but this is a kind of confusing thing to write. Definitions can't extend a theory, because they don't give you new theorems. My assumption is that you will add axioms that incorporate the new symbols, and the axioms will extend the theory.

[-]Vladimir_Nesov15y40

Definitions can't extend a theory, because they don't give you new theorems.

A conservative extension of a language/theory doesn't introduce new theorems in the old language, but could introduce new theorems that make use of new symbols, although in the case of extension by definitions, all new theorems can also be expressed in the smaller (original) language and would be the theorems of original theory.
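
Stated a little more formally (these are the standard textbook definitions, in notation chosen here: T is the original theory in language L, and T' is a theory in a larger language L'):

```latex
% Conservative extension: no new theorems in the old language.
\[
T' \text{ is a conservative extension of } T
\iff
\text{for every sentence } \varphi \text{ of } L:\quad
T' \vdash \varphi \;\Leftrightarrow\; T \vdash \varphi .
\]

% Extension by definition of a new constant symbol c:
% allowed when T proves that exactly one object satisfies \psi.
\[
\text{if } T \vdash \exists!\, x\, \psi(x), \quad
\text{then } T' = T \cup \{\psi(c)\} \text{ is a conservative extension of } T .
\]
```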

[-]Tyrrell_McAllister15y10

Okay, thanks. I didn't know that adding certain kinds of axioms was called "extension by definitions".

[-][anonymous]15y00

This is unlike the situation with A and O, where the agent can't just perform action A, since it's not defined in the way the agent knows how to perform (even though A is (provably) equivalent to one of the constants, the agent can't prove that for any given constant).

It might be clearer to maintain the distinction between a constant symbol c and the element v(c), in the domain of discourse, assigned to c by the interpretation valuation v.

For example, I found the quote above confusing, but I think that you meant "This is unlike the situation with v(A) and v(O), where the agent can't just perform action v(A), since it's not defined in the way the agent knows how to perform (even though v(A) is (provably, in the metalogic) equal to the interpretation of one of the constants, the agent can't prove that for any given constant)."

[-]Will_Sawin15y00

Some axioms are definitions.

Previous theorem: All unmarried men are not married.

New definition: "Bachelor" means "unmarried man".

New theorem: All bachelors are unmarried men.

I'm pretty sure that's what he means. Hopefully clarified, if not made perfectly in accord with standard definitions.

[-]Tyrrell_McAllister15y10

I'm pretty sure that's what he means.

I think that he means something analogous to the way that we can add some axioms involving the symbol "+" to the Peano axioms, and then show in second-order logic that the new axioms define addition uniquely.

[-][anonymous]15y00

Still commenting while reading:

The agent normally won't even know "explicit values" of actual action and actual outcome. Knowing actual value would break the illusion of consistent consequences: suppose the agent is consistent, knows that A=2, and isn't out of time yet, then it can prove [A=1 => O=100000], even if in fact O=1000, use that moral argument to beat any other with worse promised outcome, and decide A=1, contradiction.

This would only happen if the agent had a rule of inference that allowed it to infer from

A=1 => O=100000

and

all other promised outcomes are worse than 100000

that

A = 1.

But why would the first-order theory use such a rule of inference? You seem to have just given an argument for why we shouldn't put this rule of inference into the theory.

ETA: I guess that my point leads right to your conclusion, and explains it. The agent is built so that, upon deducing the first two bullet-points, the agent proceeds to do the action assigned to the constant 1 by the interpretation. But the point is that the agent doesn't bother to infer the third bullet-point; the agent just acts. As a result, it never deduces any formulas of the form [A=X], which is what you were saying.

[-]Jonathan_Graehl15y10

Consider a first-order language and a theory in that language (defining the way agent reasons, the kinds of concepts it can understand and the kinds of statements it can prove). This could be a set theory such as ZFC or a theory of arithmetic such as PA. The theory should provide sufficient tools to define recursive functions and/or other necessary concepts.

This is unclear to me, and I've read and understood Enderton. I would have thought that ZFC and PA were sets of axioms and would say nothing about how an agent reasons.

Also,

In first-order logic, all valid statements are also provable by a formal syntactic argument.

Do you mean in the context of some axioms? (of course, you can always talk about whether the statement "PA implies X" is valid, so it doesn't really matter).

I haven't read the rest yet. I'm confident that you have a very precise and formally defined idea in mind, but I'd appreciate it if you could spell out your definitions, or link to them (mathworld, wikipedia, or even some textbook).

[-]Vladimir_Nesov15y00

I would have thought that ZFC and PA were sets of axioms and would say nothing about how an agent reasons.

The other way around: the agent reasons using ZFC or PA. (And not just sets of axioms, but associated deductive system, so rules of what can be proved how.)

In first-order logic, all valid statements are also provable by a formal syntactic argument.

I simply mean completeness of first-order logic.

[-]Jonathan_Graehl15y00

Okay, thanks. I'll certainly read the rest tomorrow :)

[-]Will_Sawin15y00

An agent that reasoned by proving things in ZFC could exist.

Stupid argument: "This program, computed with this data, produces this result" is a statement in ZFC and is provable or disprovable as appropriate.

Obviously, a real ZFC-based AI would be more efficient than that.

ZFC is nice because Newton's laws, for example, can be formulated in ZFC but aren't computable. A computable agent could reason about those laws using ZFC, for example deriving the conservation of energy, which would allow him to compute certain things.

[-]Tyrrell_McAllister15y00

In first-order logic, all valid statements are also provable by a formal syntactic argument.

Do you mean in the context of some axioms?

A first-order logic comes with a set of axioms by definition.

[-]gRR14y00

Continuing from here...

The world() function can be without parameters and can call agent() directly, but it doesn't have to be defined in this way inside the agent's model of the world (the axioms in the first-order language). Instead, the world can be modeled as a function world(X) with X being a variable ranging over possible agents. Then the agent() can find the best possible agent and impersonate it. That is, prove that a decision A is what the best agent would do, and then return A.

This has the benefit that the agent can prove stuff about its true decision without generating contradictions. And also, the utility maximization principle will be embedded within the proof system, without the need for an external rule ("Choose best among moral arguments").
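
A toy sketch of this idea, under strong simplifying assumptions: the candidate agents are constant functions, the payoffs are made up, and brute-force evaluation stands in for the proof search. The names `world`, `CANDIDATE_AGENTS`, and `agent` are illustrative only.

```python
def world(agent_fn):
    """Toy world model parameterized by an agent (a zero-argument function)."""
    decision = agent_fn()
    payoffs = {1: 1000, 2: 100000}  # made-up utilities for each decision
    return payoffs.get(decision, 0)

# Hypothetical space of possible agents: one constant agent per decision.
CANDIDATE_AGENTS = {1: lambda: 1, 2: lambda: 2}

def agent():
    """Find the best candidate under world(X), then return what it would do."""
    best = max(CANDIDATE_AGENTS, key=lambda d: world(CANDIDATE_AGENTS[d]))
    return CANDIDATE_AGENTS[best]()

decision = agent()
```

Because `world` takes the agent as an argument instead of calling `agent()` itself, evaluating `world` on a candidate never requires reasoning about the real agent's own output.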

[-]Vladimir_Nesov15y00

(Clarified definition of utility function in the first paragraph of that section; the previous definition could be interpreted to allow constant functions defined to be equal to the actual utility for any argument and other difficult-to-reason-about uniqueness-breaking pathological functions.)

[-]Tyrrell_McAllister15y00

Still commenting while reading:

The agent normally won't even know "explicit values" of actual action and actual outcome. Knowing actual value would break the illusion of consistent consequences: suppose the agent is consistent, knows that A=2, and isn't out of time yet, then it can prove [A=1 => O=100000], even if in fact O=1000, use that moral argument to beat any other with worse promised outcome, and decide A=1, contradiction.

This would only happen if the agent had a rule of inference that allowed it to infer from

A=1 => O=100000

and

all other promised outcomes are worse than 100000

that

A = 1.

But why would the first-order theory use such a rule of inference? You seem to have just given an argument for why we shouldn't put this rule of inference into the theory.

ETA: I guess that my point leads right to your conclusion, and explains it. The agent is built so that, upon deducing the first two bullet-points, the agent proceeds to do the action assigned to the constant symbol 1 by the interpretation. But the point is that the agent doesn't bother to infer the third bullet-point; the agent just acts. As a result, it never deduces any formulas of the form [A=X], which is what you were saying.
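
A minimal sketch of that reading, assuming moral arguments can be represented as (action-constant, promised outcome) pairs; the class and function names are hypothetical:

```python
from typing import List, NamedTuple

class MoralArgument(NamedTuple):
    action_constant: str   # X in [A=X => O=Y]
    promised_outcome: int  # Y in [A=X => O=Y]

def choose_action(proved: List[MoralArgument]) -> str:
    """Pick the action constant with the best promised outcome.

    The selection happens outside the theory: no formula [A=X]
    is added to the set of proved statements at any point.
    """
    best = max(proved, key=lambda m: m.promised_outcome)
    return best.action_constant

proved = [MoralArgument("1", 100000), MoralArgument("2", 1000)]
action = choose_action(proved)  # the agent then performs this action
```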

[-]Vladimir_Nesov15y00

The agent never proves A=1, but it does (by assumption) prove A=2, while in fact it turns out that the agent acts as A=1 (without proving it), and so in the standard model it's true that A=1, while agent's theory says that in standard model A=2, which means that agent's theory is inconsistent, which contradicts the assumption that it's consistent.

[-]Tyrrell_McAllister15y00

The agent never proves A=1.

Okay. I was confused because you wrote that the agent "can . . . decide A=1", rather than "will do action 1."

Is there a rule of inference in the system such that the first two bullet points above entail the third within the system? I see that [A=1] is true in the standard model, but the agent's theory isn't complete in general. So should we expect that [A=1] is a theorem of the agent's theory? (ETA1: Okay, I think that your comment here explains why [A=1] must be a theorem if the agent actually does 1. But I still need to think it through.)

(ETA2: Now that I better understand the axiom that defines A in the theory, I see why [A=1] must be a theorem if the agent actually does 1.)

Also, it seems that your proof only goes through if the agent "knows that A=2" when the agent will not in fact do action v(2) (which directly contradicts soundness). But if the agent knows [A=X], where v(X) is the actual action, then all we can conclude is that the agent declines to observe my first two bullet-points above in a way that would induce it to do v(1). (Here, v(X) is the interpretation of the constant symbol X under the standard model.)

[-]Vladimir_Nesov15y00

Is there a rule of inference in the system such that the first two bullet points above entail the third within the system? I see that [A=1] is true in the standard model, but the agent's theory isn't complete in general. So why should we expect that [A=1] is a theorem of the agent's theory?

Every recursive function is representable in Robinson's arithmetic Q, that is for any (say, 1-ary) recursive function F, there is a formula w such that F(n)=m => Q|- w(n,m) and F(n)<>m => Q|- ~w(n,m). Hence, statements like this that hold in the standard model, also hold in any model.

That the agent doesn't avoid looking for further moral arguments to prove is reflected in the "isn't out of time" condition, which is not formalized, and is listed as the first open problem. If in fact A=X, then it can't prove ludicrous moral arguments with A=X as the premise, only the actual utility, but it clearly can prove arguments that beat the true one by using a false premise.

[-]Tyrrell_McAllister15y00

Every recursive function is representable in Robinson's arithmetic Q, that is for any (say, 1-ary) recursive function F, there is a formula w such that F(n)=m => Q|- w(n,m) and F(n)<>m => Q|- ~w(n,m). Hence, statements like this that hold in the standard model, also hold in any model.

I think that I need to see this spelled out more. I take it that, in this case, your formula w is w(A,X) = [A=X]. What is your recursive function F?

[-]Vladimir_Nesov15y00

We are representing the no-arguments agent program agent() using a formula w with one free variable, such that agent()=n => Q|- w(n) and agent()<>n => Q|- ~w(n). Actual action is defined by the axiom w(A), where A is a new constant symbol.

[-]Tyrrell_McAllister15y00

Okay, I'm convinced. In case it helps someone else, here is how I now understand your argument.

We have an agent written in some program. Because the agent is a computer program, and because our 1st-order logic can handle recursion, we can write down a wff [w(x)], with one free variable x, such that, for any action-constant X, [w(X)] is a theorem if and only if the agent does v(X).

[ETA: Vladimir points out that it isn't handling recursion alone that suffices. Nonetheless, a theory like PA or ZFC is apparently powerful enough to do this. I don't yet understand the details of how this works, but it certainly seems very plausible to me.]

In particular, if the agent goes through a sequence of operations that conclude with the agent doing v(X), then that sequence of operations can be converted systematically into a proof of [w(X)]. Conversely, if [w(X)] is a theorem, then it has a proof that can be reinterpreted as the sequence of operations that the agent will carry out, and which will conclude with the agent doing v(X).

The wff [w(x)] also has the property that, given two constants U and V, [w(U) & w(V)] entails [U = V].

Now, the agent's axiom system includes [w(A)], where A is a constant symbol. Thus, if the agent does v(X), then [w(A)] and [w(X)] are both theorems, so we must have that [A=X] is a theorem.
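
Laid out step by step (a sketch in notation chosen here: T is the agent's theory, and X is the action-constant for the action the agent actually does):

```latex
\begin{align*}
&T \vdash w(A)
  && \text{(axiom defining } A\text{)}\\
&T \vdash w(X)
  && \text{(representability, since the agent does } v(X)\text{)}\\
&T \vdash \bigl(w(A) \land w(X)\bigr) \rightarrow A = X
  && \text{(uniqueness property of } w\text{)}\\
&T \vdash A = X
  && \text{(from the three lines above)}
\end{align*}
```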

[-]Vladimir_Nesov15y00

This works (although "because our 1st-order logic can handle recursion" is not it, etc.). (Note that "[w(X)] is a theorem of T if and only if X is as we need" is weak T-representability, while I cited the strong kind, that also guarantees [~w(X)] being a theorem if X is not as we need.)

[-]Tyrrell_McAllister15y00

(although "because our 1st-order logic can handle recursion" is not it, etc.).

That was there because of this line from your post: "The theory should provide sufficient tools to define recursive functions and/or other necessary concepts."

[-]Vladimir_Nesov15y00

Doesn't make it an explanatory sufficient condition to conclude what you did: I object to your use of "because".

[-]Tyrrell_McAllister15y00

Okay, thanks. My understanding is definitely vaguest at the point where the agent's program is converted into the wff [w(x)]. Still, the argument is at the point of seeming very plausible to me.

[-]Vladimir_Nesov15y00

No worries. Your logic seems rusty though, so if you want to build something in this direction, you should probably reread a good textbook.

[-]Tyrrell_McAllister15y00

Not so much rusty as never pursued beyond the basics. The logic I know is mostly from popular books like Gödel, Escher, Bach, plus a philosophy course on modal logic, where I learned the basic concepts used to talk about interpretations of theories.

[-]Tyrrell_McAllister15y00

Above, action and utility are defined separately, with axioms that generally don't refer to each other. Axioms that define action don't define utility, and conversely. Moral arguments, on the other hand, define utility in terms of action. If we are sure that one of the moral arguments proved by the agent refers to the actual action (without knowing which one; if we have to choose an actual action based on that set of moral arguments, this condition holds by construction), then actual utility is defined by the axioms of action (the agent) and these moral arguments, without needing preference (axioms of utility).

I'm a little confused here. Here is my understanding so far:

The agent has an axiom-set S and rules of inference, which together define the agent's theory. Given S and some computational constraints, the agent will deduce a certain set M of moral arguments. The moral arguments contain substrings of the form "U = U1". The largest constant U1* appearing in such substrings of the moral arguments in M is, by definition, the actual utility. The definition of "actual utility" that I just wrote (which is in the metalanguage, not the agent's language) is the preference.

In this sense, preference and actual utility are defined by all the axioms in S taken together. Similarly, the actual action is defined by all of S. So, what are you getting at when you talk about having one set of axioms define action while another set of axioms defines utility?

[-]Vladimir_Nesov15y20

There is some background theory S the agent reasons with, say ZFC. This theory is extended by definitions to define action A and utility U. Say, these extensions consist of sets of axioms AX and UX. Then, the agent derives the set of moral arguments M from theory S+AX+UX. By preference, I refer specifically to UX, which defines utility U in the context of agent's theory S. But if M is all (moral arguments) the agent will infer, then S+AX+M also defines U, just as well as S+AX+UX did. Thus, at that point, we can forget about UX and use M instead.
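
A toy sketch of why M can stand in for UX, under heavy simplifying assumptions: Python functions play the role of the axiom sets, the payoffs are made up, and the agent is assumed to have proved a moral argument for every possible action (so one of them refers to the actual action). All names are illustrative.

```python
def agent_action():
    """Stands in for AX: pins down the actual action A."""
    return 2

def utility_from_ux(action):
    """Stands in for UX: the original definition of utility U."""
    return {1: 1000, 2: 100000}[action]

# M: moral arguments [A=x => U=u] the agent has derived, keyed by x.
M = {x: utility_from_ux(x) for x in (1, 2)}

A = agent_action()
u_via_ux = utility_from_ux(A)  # U as defined by S+AX+UX
u_via_m = M[A]                 # U as defined by S+AX+M, without consulting UX
```

Once M is in hand, the last line recovers the same utility using only the action and the moral arguments.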

[-]Tyrrell_McAllister15y00

Okay, thanks. This is clear.

I'm not sure why you want to think in terms of S+AX+M instead of S+AX+UX, though. Doesn't starting with the axiom set S union AX union UX better reflect how the agent actually reasons?

[-]Vladimir_Nesov15y00

It does start with S+AX+UX, but it ends with essentially S+AX+M. This allows one to understand the point of this activity better: by changing original axioms to equivalent ones, the agent expresses the initially separately defined outcome in terms of action, and uses that expression (dependence) to determine the outcome it prefers.

[-]Will_Sawin15y00

This post doesn't seem to introduce a lot of new concepts so I can't see much to discuss.

Open problems:

Your first open problem might not technically always be true, but it doesn't really matter, because as you pointed out, the statement the agent uses to derive its action is always

I don't see how you think agents can do anything agent-y without utility functions.

The instrumental values + agents in games ones look interesting.

I agree that the impossible one is impossible. Although it might be the kind of impossible thing you can do, or it might be the kind of impossible thing Godel proved you can't do.

One thing that's bothering me is that agents in the real world have to use dirty tricks to make problems simpler. Like, two strategies will do the same in 99% of situations, so let's ignore that part and focus on the rest, hmmm calculate calculate calculate this one is better. But when I try to formalize that I lose at Newcomb's problem. So that's an open problem?

[-]timtyler15y-10

(The impossible item.) Given an agent program, define its preference.

If you have the agent's program, you already have a pretty comprehensive model of it. It is harder to infer preferences from behaviour. That's the problem Tim Freeman addressed here.

