Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

jessicata's Comments

Two Alternatives to Logical Counterfactuals

The way you are using it doesn’t necessarily imply real control, it may be imaginary control.

I'm discussing a hypothetical agent who believes itself to have control. So its beliefs include "I have free will". Its belief isn't "I believe that I have free will".

It’s a “para-consistent material conditional” by which I mean the algorithm is limited in such a way as to prevent this explosion.

Yes, that makes sense.

However, were you flowing this all the way back in time?

Yes (see thread with Abram Demski).

What do you mean by dualistic?

Already factorized as an agent interacting with an environment.

Two Alternatives to Logical Counterfactuals

Secondly, “free will” is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion.

Wikipedia says "Free will is the ability to choose between different possible courses of action unimpeded." SEP says "The term “free will” has emerged over the past two millennia as the canonical designator for a significant kind of control over one’s actions." So my usage seems pretty standard.

For example, recently I’ve been arguing in favour of what counts as a valid counterfactual being at least partially a matter of social convention.

All word definitions are determined in large part by social convention. The question is whether the social convention corresponds to a definition (e.g. with truth conditions) or not. If it does, then the social convention is realist, if not, it's nonrealist (perhaps emotivist, etc).

Material conditionals only provide the outcome when we have a consistent counterfactual.

Not necessarily. An agent may be uncertain over its own action, and thus have uncertainty about material conditionals involving its action. The "possible worlds" represented by this uncertainty may be logically inconsistent, in ways the agent can't determine before making the decision.

Proof-based UDT doesn’t quite use material conditionals, it uses a paraconsistent version of them instead.

I don't understand this? I thought it searched for proofs of the form "if I take this action, then I get at least this much utility", which is a material conditional.
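To make the proof-search picture concrete, here is a toy sketch of proof-based UDT on a 5-and-10 style problem. Everything here is a made-up illustration: `provable` is a stand-in for real proof search in a formal theory (it just checks consequences of a hand-coded world model), and the action names and utility bounds are invented.

```python
# Toy sketch of proof-based UDT on a 5-and-10 style problem.
# "provable" stands in for a real proof search in a formal theory;
# here it simply checks consequences of a hand-coded world model.

ACTIONS = ["five", "ten"]
UTILITY_BOUNDS = [10, 5, 0]  # candidate lower bounds, checked best-first

def world_utility(action):
    """Hand-coded world model: taking 'ten' yields 10, 'five' yields 5."""
    return {"five": 5, "ten": 10}[action]

def provable(action, utility):
    """Stand-in for 'it is provable that A() = action implies U() >= utility'."""
    return world_utility(action) >= utility

def proof_based_udt():
    # Search for the strongest provable material conditional of the form
    # "if I take this action, then I get at least this much utility",
    # and take the action from the first proof found.
    for u in UTILITY_BOUNDS:
        for a in ACTIONS:
            if provable(a, u):
                return a
    return ACTIONS[0]  # default action if no proof is found

print(proof_based_udt())  # picks "ten"
```

The searched statement "A() = action implies U() >= utility" is exactly a material conditional, which is why the paraconsistency claim above is puzzling to me.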

So, to imagine counterfactually taking action Y we replace the agent doing X with another agent doing Y and flow causation both forwards and backwards.

Policy-dependent source code does this; one's source code depends on one's policy.

I guess from a philosophical perspective it makes sense to first consider whether policy-dependent source code makes sense and then if it does further ask whether UDT makes sense.

I think UDT makes sense in "dualistic" decision problems that are already factorized as "this policy leads to these consequences". Extending it to a nondualist case brings up difficulties, including the free will / determinism issue. Policy-dependent source code is a way of interpreting UDT in a setting with deterministic, knowable physics.

Two Alternatives to Logical Counterfactuals

I think it's worth examining more closely what it means to be "not a pure optimizer". Formally, a VNM utility function is a rationalization of a coherent policy. Say that you have some idea about what your utility function is, U. Suppose you then decide to follow a policy that does not maximize U. Logically, it follows that U is not really your utility function; either your policy doesn't coherently maximize any utility function, or it maximizes some other utility function. (Because the utility function is, by definition, a rationalization of the policy)
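The "rationalization" relation can be made concrete in a small sketch. The decision problems, option names, and utility numbers below are all invented for illustration; the point is only the definition: U rationalizes a policy iff, in every choice situation, the policy picks an option that maximizes U among the available options.

```python
# Minimal sketch of "a utility function is a rationalization of a policy".
# All choice situations and utility values are made up for illustration.

def rationalizes(U, policy, choice_situations):
    """U rationalizes the policy iff the policy's pick maximizes U
    among the available options in every choice situation."""
    return all(
        U[policy(options)] == max(U[o] for o in options)
        for options in choice_situations
    )

situations = [("left", "right"), ("left", "straight")]

U_idea = {"left": 2, "right": 1, "straight": 0}  # the agent's idea U
always_left = lambda options: "left"
turns_right = lambda options: "right" if "right" in options else "left"

print(rationalizes(U_idea, always_left, situations))   # True
print(rationalizes(U_idea, turns_right, situations))   # False: U_idea isn't
                                                       # this policy's utility function
# ...but some other utility function rationalizes the same policy:
U_other = {"left": 1, "right": 2, "straight": 0}
print(rationalizes(U_other, turns_right, situations))  # True
```

This mirrors the argument above: once the policy deviates from maximizing U, U is simply not that policy's utility function, though some other function may be.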

Failing to disambiguate these two notions of "the agent's utility function" is a map-territory error.

Decision theories require, as input, a utility function to maximize, and output a policy. If a decision theory is adopted by an agent who is using it to determine their policy (rather than already knowing their policy), then they are operating on some preliminary idea about what their utility function is. Their "actual" utility function is dependent on their policy; it need not match up with their idea.

So, it is very much possible for an agent who is operating on an idea U of their utility function, to evaluate counterfactuals in which their true behavioral utility function is not U. Indeed, this is implied by the fact that utility functions are rationalizations of policies.

Let's look at the "turn left/right" example. The agent is operating on a utility function idea U, which is higher the more the agent turns left. When they evaluate the policy of turning "right" on the 10th time, they must conclude that, in this hypothetical, either (a) "right" maximizes U, (b) they are maximizing some utility function other than U, or (c) they aren't a maximizer at all.

The logical counterfactual framework says the answer is (a): that the fixed computation of U-maximization results in turning right, not left. But, this is actually the weirdest of the three worlds. It is hard to imagine ways that "right" maximizes U, whereas it is easy to imagine that the agent is maximizing a utility function other than U, or is not a maximizer.

Yes, the (b) and (c) worlds may be weird in a problematic way. However, it is hard to imagine these being nearly as weird as (a).

One way they could be weird is that an agent having a complex utility function is likely to have been produced by a different process than an agent with a simple utility function. So the more weird exceptional decisions you make, the greater the evidence is that you were produced by the sort of process that produces complex utility functions.

This is pretty similar to the smoking lesion problem, then. I expect that policy-dependent source code will have a lot in common with EDT, as they both consider "what sort of agent I am" to be a consequence of one's policy. (However, as you've pointed out, there are important complications with the framing of the smoking lesion problem)

I think further disambiguation on this could benefit from re-analyzing the smoking lesion problem (or a similar problem), but I'm not sure if I have the right set of concepts for this yet.

Referencing the Unreferencable

If you fix a notion of referenceability rather than equivocating, then the point that talk of unreferenceable entities is absurd will stand.

If you equivocate, then very little can be said in general about referenceability.

(I would say that "our universe's simulators" is referenceable, since it's positing something that causes sensory inputs)

Two Alternatives to Logical Counterfactuals

It seems the approaches we're using are similar: both start from an observation/action history with posited falsifiable laws, with the agent's source code not known a priori, and with the agent considering different policies.

Learning "my source code is A" is quite similar to learning "Omega predicts my action is equal to A()", so these would lead to similar results.

Policy-dependent source code, then, corresponds to Omega making different predictions depending on the agent's intended policy, such that when comparing policies, the agent has to imagine Omega predicting differently (as it would imagine learning different source code under policy-dependent source code).
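The correspondence can be illustrated with a toy Newcomb-style setup, where Omega's prediction tracks the intended policy itself, so comparing policies means imagining Omega predicting differently. The payoff numbers and function names are invented for illustration.

```python
# Toy illustration: Omega predicts from the agent's intended policy, so
# comparing policies means imagining Omega predicting differently
# (mirroring how, under policy-dependent source code, the agent imagines
# learning different source code). Payoffs are a made-up Newcomb setup.

def omega_fills_box(policy):
    """Omega's prediction is a function of the intended policy itself."""
    return policy() == "one-box"

def payoff(policy):
    big = 1_000_000 if omega_fills_box(policy) else 0
    small = 1_000 if policy() == "two-box" else 0
    return big + small

one_box = lambda: "one-box"
two_box = lambda: "two-box"

print(payoff(one_box))   # 1000000
print(payoff(two_box))   # 1000
```

Swapping in a different policy changes Omega's prediction in the same evaluation, just as swapping in a different policy changes which source code the agent expects to learn it has.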

Two Alternatives to Logical Counterfactuals

I agree this is a problem, but isn't this a problem for logical counterfactual approaches as well? Isn't it also weird for known, fixed optimizer source code to produce a different result on this decision, where it's obvious that 'left' is the best decision?

If you assume that the agent chose 'right', it's more reasonable to think it's because it's not a pure optimizer than that a pure optimizer would have chosen 'right', in my view.

If you form the intent to, as a policy, go 'right' on the 100th turn, you should anticipate learning that your source code is not the code of a pure optimizer.

Two Alternatives to Logical Counterfactuals

This indeed makes sense when "obs" is itself a logical fact. If obs is a sensory input, though, 'A(obs) = act' is a logical fact, not a logical counterfactual. (I'm not trying to avoid causal interpretations of source code interpreters here, just logical counterfactuals)

Two Alternatives to Logical Counterfactuals

In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

Conditioning on 'A(obs) = act' is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between "If Oswald didn't kill Kennedy, then someone else did" and "If Oswald didn't kill Kennedy, then someone else would have".

Indeed, troll bridge will present a problem for "playing chicken" approaches, which are probably necessary in counterfactual nonrealism.

For policy-dependent source code, I intend for the agent to be logically updateful, while updateless about observations.

Why is this much better than counterfactuals which keep the source code fixed but imagine the execution trace being different?

Because it doesn't lead to logical incoherence, so reasoning about counterfactuals doesn't have to be limited.

This seems to only push the rough spots further back—there can still be contradictions, e.g. between the source code and the process by which programmers wrote the source code.

If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

This theory isn't fully worked out, but it doesn't seem that it will run into logical incoherence the way logical counterfactuals do.

But then we are faced with the usual questions about spurious counterfactuals, chicken rule, exploration, and Troll Bridge.

Maybe some of these.

Spurious counterfactuals require getting a proof of "I will take action X". The proof proceeds by showing "source code A outputs action X". But an agent who accepts policy-dependent source code will believe they have source code other than A if they don't take action X. So the spurious proof doesn't prevent the counterfactual from being evaluated.

Chicken rule is hence unnecessary.

Exploration is a matter of whether the world model is any good; the world model may, for example, map a policy to a distribution of expected observations. (That is, the world model already has policy counterfactuals as part of it; theories such as physics provide constraints on the world model rather than fully determining it). Learning a good world model is of course a problem in any approach.
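Here is a minimal sketch of a world model with policy counterfactuals built in: it maps a policy directly to a distribution over observation histories, with the "physics" (the coin/reward rule below, invented for illustration) supplying constraints on that map rather than fully determining it.

```python
# Sketch: a world model that maps a policy to a distribution over
# observation histories, so policy counterfactuals are part of the model.
# The two-step coin world below is a made-up illustration.

def world_model(policy):
    """Return {observation_history: probability} under the given policy."""
    dist = {}
    for coin in ("heads", "tails"):   # random event, probability 1/2 each
        action = policy(coin)         # the agent acts on its observation
        # constraint from "physics": the reward observation is determined
        # by the coin and the action
        reward = "win" if action == coin else "lose"
        history = (coin, action, reward)
        dist[history] = dist.get(history, 0) + 0.5
    return dist

def expected_utility(policy):
    return sum(p * (1 if h[-1] == "win" else 0)
               for h, p in world_model(policy).items())

match_coin = lambda obs: obs              # copy the observed coin
always_heads = lambda obs: "heads"

# Comparing policies = comparing their counterfactual distributions.
print(expected_utility(match_coin))       # 1.0
print(expected_utility(always_heads))     # 0.5
```

Exploration then reduces to the quality of `world_model` itself: a good model assigns accurate distributions to policies the agent has never actually run.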

Whether troll bridge is a problem depends on how the source code counterfactual is evaluated. Indeed, many ways of running this counterfactual (e.g. inserting special cases into the source code) are "stupid" and could be punished in a troll bridge problem.

I by no means think "policy-dependent source code" is presently a well worked-out theory; the advantage relative to logical counterfactuals is that in the latter case, there is a strong theoretical obstacle to ever having a well worked-out theory, namely logical incoherence of the counterfactuals. Hence, coming up with a theory of policy-dependent source code seems more likely to succeed than coming up with a theory of logical counterfactuals.

The absurdity of un-referenceable entities

It seems fine to have categories that are necessarily empty, such as "numbers that are both odd and even". "Non-ontologizable thing" may be such a category, or it may be vaguer than that; I'm not sure.

Two Alternatives to Logical Counterfactuals

I'm not using "free will" to mean something distinct from "the ability of an agent, from its perspective, to choose one of multiple possible actions". Maybe this usage is nonstandard but find/replace yields the right meaning.
