# All of jessicata's Comments + Replies

Thoughts on Voting Methods

I agree that moving to distributions and scalar utility is a good way of avoiding Pareto suboptimal outcomes.

Thoughts on Voting Methods

FPTP would be if there weren't more points awarded for winning by more votes.

Here is an example of an election.

3 prefer A > B > C

4 prefer B > C > A

5 prefer C > A > B

(note, this is a Condorcet cycle)

Now we construct the following payoff matrix for a zero sum game, where the number given is for the utility of the row player:

| | A | B | C |
|---|---|---|---|
| A | 0 | 4 | -6 |
| B | -4 | 0 | 2 |
| C | 6 | -2 | 0 |

This is basically rock paper scissors, except that the A strategy wins twice as much when it wins as the B strategy does, and the C strategy wins 3 times as much as the B strategy does.
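The construction of the payoff matrix from the ballots can be sketched in a few lines of Python. This is only an illustration: the names `BALLOTS` and `margin_matrix` are made up here, and the entry for row x, column y is assumed to be the number of voters who prefer x to y minus the number who prefer y to x.

```python
from itertools import combinations

# Ballot profile from the example: (number of voters, ranking from best to worst).
BALLOTS = [
    (3, ["A", "B", "C"]),
    (4, ["B", "C", "A"]),
    (5, ["C", "A", "B"]),
]


def margin_matrix(ballots, candidates):
    """Return m where m[x][y] = (#voters preferring x to y) - (#voters preferring y to x)."""
    m = {x: {y: 0 for y in candidates} for x in candidates}
    for count, ranking in ballots:
        position = {c: i for i, c in enumerate(ranking)}
        for x, y in combinations(candidates, 2):
            # +count if this group ranks x above y, -count otherwise.
            sign = 1 if position[x] < position[y] else -1
            m[x][y] += sign * count
            m[y][x] -= sign * count
    return m


if __name__ == "__main__":
    M = margin_matrix(BALLOTS, ["A", "B", "C"])
    for row in "ABC":
        print(row, [M[row][col] for col in "ABC"])
```

Running it on the three ballot groups reproduces the rows of the matrix given in the comment.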

[2] abramdemski (5mo): Ah, I see. What I was missing from this description was the understanding that we construct a two-player game, not a game where the players are the voters (or even a game where the candidates are the players).
Thoughts on Voting Methods

I don't know what you are saying is the case.

[2] abramdemski (5mo): I believe Adam Smith is saying that your description of "the rule" sounds like something which is true of any voting method. I don't agree, but I must admit that I myself don't understand what you're describing in that paragraph. How is the game supposed to work? Is there a more direct explanation of the distribution, rather than just a characterization as a Nash equilibrium?
Thoughts on Voting Methods

Curious what you think of Consistent Probabilistic Social Choice.

There is a unique consistent voting system in cases where the system may return a stochastic distribution of candidates!

(where consistent means: grouping together populations that agree doesn't change the result, and neither does duplicating candidates)

What is the rule? Take a symmetric zero-sum game where each player picks a candidate, and someone wins if their candidate is preferred by the majority to the other, winning more points if they are preferred by a larger majority. This...
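As a hedged illustration, take the output distribution to be an equilibrium mixed strategy of this game (the reading the replies in this thread use). For a three-candidate cycle the equilibrium has a simple closed form, sketched below for the 3/4/5 ballot example from elsewhere in this thread. The helper names are invented for this sketch, and the shortcut only covers a strict three-way cycle; a general solver would need a linear program.

```python
from fractions import Fraction

# Margin matrix for the cycle example (3: A>B>C, 4: B>C>A, 5: C>A>B); row player's payoff.
M = [
    [0, 4, -6],
    [-4, 0, 2],
    [6, -2, 0],
]


def cycle_equilibrium(m):
    """Mixed strategy for a 3-candidate cycle where 0 beats 1, 1 beats 2, and 2 beats 0.

    Each candidate is weighted by the margin of the one contest it takes no part in,
    which makes every pure strategy score exactly 0 against the mixture.
    """
    a, b, c = m[0][1], m[1][2], m[2][0]
    assert a > 0 and b > 0 and c > 0, "only handles this orientation of a strict cycle"
    total = a + b + c
    return [Fraction(b, total), Fraction(c, total), Fraction(a, total)]


def payoffs_against(m, p):
    """Expected payoff of each pure strategy when the opponent plays mixture p."""
    return [sum(m[i][j] * p[j] for j in range(len(p))) for i in range(len(m))]


if __name__ == "__main__":
    p = cycle_equilibrium(M)
    print("mixture over A, B, C:", p)
    print("pure-strategy payoffs against it:", payoffs_against(M, p))
```

In a symmetric zero-sum game the value is 0, so a mixture is an equilibrium strategy exactly when no pure strategy earns a positive payoff against it; `payoffs_against` is the check for that condition.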

[2] abramdemski (5mo): OK, I basically don't like the voting system. Scott pointed out to me that the Condorcet criterion makes more sense if we include stochastic outcomes. In the cases where the Condorcet winner is the utilitarian-worst candidate, a mixture of other candidates will win over the Condorcet winner. (So that candidate won't really be the Condorcet winner, if we include stochastic outcomes as "candidates".) But that's not what's going on here, because this technique always selects a Condorcet winner, if there is one. So (apparently) it's not including stochastic outcomes in the right way. We can do better by modifying the game: We specify a symmetric two-player zero-sum game where each player selects a distribution over candidates. You score points based on how many more votes your proposed distribution would get against the other player's. The game's Nash equilibrium (a distribution over distributions over candidates) is the output distribution. However, I'm a bit suspicious of this, since I didn't especially like the basic proposal and this is the same thing one level up. Since this is the unique voting system under some consistency conditions, I must not agree with some of the consistency conditions, although I'm not sure which ones I disagree with.
[2] abramdemski (6mo): Sounds awesome!! I'll need to evaluate it to get a better idea of what's going on. I'm not necessarily expecting it to be utilitarian-good or good in a bargaining sense, but still, it sounds really interesting.
The Bayesian Tyrant

The basic point here is that Bayesians lose zero sum games in the long term. This is to be expected, because Bayesianism is a non-adversarial epistemology. (Adversarial Bayesianism is simply game theory.)

This sentence is surprising, though: "It is a truth more fundamental than Bayes’ Law that money will flow from the unclever to the clever".

Clearly, what wins zero sum games wins zero sum games, but what wins zero sum games need not correspond to collective epistemology.

As a foundation for epistemology, many things are superior to "might makes right", including Bayes' rule (despite its limitations).

Legislating Bayesianism in an adversarial context is futile; mechanism design is what is needed.

[6] abramdemski (9mo): Did you read radical probabilism [https://www.lesswrong.com/posts/xJyY5QkQvNJpZLJRo/radical-probabilism-1] yet? I think legislating Bayesianism is just actually a bad idea (even if we could get around the excessive-paperwork-to-show-all-your-evidence problem). I don't think prediction markets are a perfect mechanism, but I think the non-Bayesianism highlighted here is a feature, not a bug. But all my actual arguments for that are in my Radical Probabilism post. I'm curious what alternative mechanism design you might propose.
Many-worlds versus discrete knowledge

Thanks! To the extent that discrete branches can be identified this way, that solves the problem. This is pushing the limits of my knowledge of QM at this point so I'll tag this as something to research further at a later point.

[3] interstice (9mo): You might be interested in the work of Jess Riedel, whose research agenda is centered around finding a formal definition of wavefunction branches, e.g. https://arxiv.org/abs/1608.05377
Many-worlds versus discrete knowledge

I'm not asking for there to be a function to the entire world state, just a function to observations. Otherwise the theory does not explain observations!

(aside: I think Bohm does say there is a definite answer in the cat case, as there is a definite configuration that is the true one; it's Copenhagen that fails to say it is one way or the other)

Many-worlds versus discrete knowledge

Then you need a theory of how the continuous microstate determines the discrete macrostate. E.g. as a function from reals to booleans. What is that theory in the case of the wave function determining photon measurements?

[2] Charlie Steiner (9mo): If the microphysical theory is like quantum mechanics (Bohm-ish mechanics very much included), this is basically Schrödinger's cat argument. It would be absurd if there was not some function from the microphysical state of the world to the truth of the macrophysical fact of whether the cat in the box is alive or dead. Therefore, there is some such function, and if quantum mechanics doesn't support it then quantum mechanics is incomplete. Schrödinger was wrong about the cat thing, as far as we can tell. His knowledge of discrete macrophysical states of cats had an explanation, but didn't directly reflect reality. There are absurd quantum states that don't allow for a function from the microphysical state of the world to whether I observe a photon as having spin left or spin right. If I believe otherwise, my beliefs deserve an explanation, but that doesn't mean they directly reflect reality.
Many-worlds versus discrete knowledge

I'm saying that our microphysical theories should explain our macrophysical observations. If they don't then we toss out the theory (Occam's razor).

Macrophysical observations are discrete.

[2] Charlie Steiner (9mo): I model macrophysical observations as discrete too. But I also model tables and chairs as discrete, without needing to impose any requirements that they not be made of non-discrete stuff. A microphysical explanation of discrete observations doesn't need to be made up of discrete parts.
Many-worlds versus discrete knowledge

Let me know if anyone succeeds at that. I've thought in this direction and found it very difficult.

Many-worlds versus discrete knowledge

Consistent histories may actually solve the problem I'm talking about, because it discusses evolving configurations, not just an evolving wave function.

Many-worlds versus discrete knowledge

The wave function is a fluid in configuration space that evolves over time. You need more theory than that to talk about discrete branches of it (configurations) evolving over time.

I agree that once you have this, you can say the knowledge gained is indexical.

I think it's something like: Sometimes you find that the wavefunction $|\psi\rangle$ is the sum of a discrete number of components $|\psi\rangle = |\psi_1\rangle + |\psi_2\rangle + \cdots$, with the property that $\langle\psi_i|A|\psi_j\rangle \simeq 0$ for any relevant observable $A$ for $i \neq j$. (Here, "$\simeq 0$" also includes things like "has a value that varies quasi-randomly and super-rapidly as a function of time and space, such that it averages to 0 for all intents and purposes", and "relevant observable" likewise means "observable that might come up in practice, as opposed to artificial observables with ...

Many-worlds versus discrete knowledge

It's rather nonstandard to consider things like photon measurements to be nonphysical facts. Presumably, these come within the domain of physical theories.

Suppose we go with Solomonoff induction. Then we only adopt physical theories that explain observations happening over subjective time. These observations include discrete physical measurements.

It's not hard to see how Bohm explains these measurements: they are facts about the true configuration history.

It is hard to see how many worlds explains these measurements. Some sort of bridge law is required. The straightforward way of specifying the bridge law is the Bohm interpretation.

Many-worlds versus discrete knowledge

Yes the argument has to be changed but that's mostly an issue of wording. Just replace discrete knowledge with discrete factual evidence.

If a Bayesian sees that the detector has detected a photon, how is that evidence about the wave function?

Many-worlds versus discrete knowledge

Many worlds plus a location tag is the Bohm interpretation. You need theory for how locations evolve into other locations (in order to talk about multiple events happening in observed time), hence the nontriviality of the Bohm interpretation.

Many worlds plus a location tag is the Bohm interpretation.

Really? I don't think I agree with that. In many-worlds, you can say "The photon passed through the apparatus in the branch of the wavefunction I find myself in", and you can also say "The photon did not pass through the apparatus in other branches of the wavefunction that I do not find myself in". The Bohm interpretation would reject the latter.

And if the measurement just happened on Earth, but you're 4 lightyears away near Alpha Centauri, space-like-separated from the measurement, you can say "Th...

Many-worlds versus discrete knowledge

I believe there are physical theories and physical facts, but that not all facts are straightforwardly physical (although, perhaps these are indirectly physical in a way that requires significant philosophical and conceptual work to determine, and which has degrees of freedom).

The issue in this post is about physical facts, e.g. measurements, needing to be interpreted in terms of a physical reality. These interpretations are required to have explanatory physical theories even if there are also non-physical facts.

[3] abramdemski (9mo): Hmmm. So facts aren't exclusively physical in nature, but physical theories need to do all their explanatory work on their own, without reference to any of the nonphysical facts? I'm still pretty confused. The post makes a lot more sense to me if I read it as yet another puzzle for physicalism, rather than something directly related to your actual ontology. Naively (ie in my naive understanding) it seems like an agent-centric perspective (ie the opposite of a view from nowhere) is more or less like Solomonoff induction (so e.g. solves anthropic reasoning via UDASSA). The world is built outward from the agent, rather than the other way around, but we still get something like indexical facts. So many-worlds seems ok.
Many-worlds versus discrete knowledge

Bayesianism still believes in events, which are facts about the world. So the same problem comes up there, even if no fact can be known with certainty.

(in other words: the same problems that apply to 100% justification of belief apply to 99% justification of belief)

[3] abramdemski (9mo): If so, it seems to require a different argument to point to the problem in that case, since your argument in the post relied on "discrete knowledge". I don't currently see what stops a radical probabilist from interpreting evidence as unreliable information about the wave function. (I do have an intuition that this'll be problematic; I'm just saying that I don't currently see the argument, and I think it's different from the argument in the post.)
[3] abramdemski (9mo): I'm a little confused: you reject physicalism, and yet you seem to be speaking from a physicalist ontology here, requiring there to be a physical fact (a true configuration) or no fact at all (no indexical information).
Why artificial optimism?

I don't have a great theory here, but some pointers at non-hedonic values are:

• "Wanting" as a separate thing from "liking"; what is planned/steered towards, versus what affective states are generated? See this. In a literal sense, people don't very much want to be happy.
• It's common to speak in terms of "mental functions", e.g. perception and planning. The mind has a sort of "telos"/direction, which is not primarily towards maximizing happiness (if it were, we'd be happier); rather, the happiness signal has a function as part of the mind's functioning.
Why artificial optimism?

Experiences of sentient beings are valuable, but have to be "about" something to properly be experiences, rather than, say, imagination.

I would rather that conditions in the universe are good for the lifeforms, and that the lifeforms' emotions track the situation, such that the lifeforms are happy. But if the universe is bad, then it's better (IMO) for the lifeforms to be sad about that.

The issue with evolution is that it's a puzzle that evolution would create animals that try to wirehead themselves; it's not a moral argument against wireheading.

[2] ESRogs (1y): How do you measure this? What does it mean that conditions in the universe are good for the lifeforms other than that it gives them good experiences? You're wanting to ground positive emotions in objectively good states. But I'm wanting to ground the goodness of states in the positive emotions they produce. Perhaps there's some reflexivity here, where we both evaluate positive emotions based on how well they track reality, and we also evaluate reality on how much it produces positive emotions. But we need some way for it to bottom out. For me, I would think positive emotions are more fundamentally good than universe states, so that seems like a safer place to ground the recursion. But I'm curious if you've got another view.
Why artificial optimism?

"Isn't the score I get in the game I'm playing one of the most important part of the 'actual state of affairs'? How would you measure the value of the actual state of affairs other than according to how it affects your (or others') scores?"

I'm not sure if this analogy is, by itself, convincing. But, it's suggestive, in that happiness is a simple, scalar-like thing, and it would be strange for such a simple thing to have a high degree of intrinsic value. Rather, on a broad perspective, it would seem that those things of most intrinsic value are those thi

[2] ESRogs (1y): I get the analogy. And I guess I'd agree that I value more complex positive emotions that are intertwined with the world more than sort of one note ones. (E.g. being on molly felt nice but kind of empty.) But I don't think there's much intrinsic value in the world other than the experiences of sentient beings. A cold and lifeless universe seems not that valuable. And if the universe has life I want those beings to be happy, all else equal. What do you want? And regarding the evolutionary perspective, what do I care what's fit or not? My utility function is not inclusive genetic fitness.
Jimrandomh's Shortform

It's been over 72 hours and the case count is under 110, as would be expected from linear extrapolation.

Estimating COVID-19 Mortality Rates

The intro paragraph seems to be talking about IFR ("around 2% of people who got COVID-19 would die") and suggesting that "we have enough data to check", i.e. that you're estimating IFR and have good data on it.

[4] Benquo (1y): Good point, I should add a clarifying note.
The Presumptuous Philosopher, self-locating information, and Solomonoff induction

I mean efficiently in terms of number of bits, not computation time. Which contributes to posterior probability.

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

Yes, I agree. "Reference class" is a property of some models, not all models.

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

At this point it seems simplest to construct your reference class so as to only contain agents that can be found using the same procedure as yourself. Since you have to be decidable for the hypothesis to predict your observations, all others in your reference class are also decidable.

[4] johnswentworth (1y): Problem is, there isn't necessarily a modular procedure used to identify yourself. It may just be some sort of hard-coded index. A Solomonoff inductor will reason over all possible such indices by reasoning over all programs, and throw out any which turn out to not be consistent with the data. But that behavior is packaged with the inductor, which is not itself a program.
The Presumptuous Philosopher, self-locating information, and Solomonoff induction

If there's a constant-length function mapping the universe description to the number of agents in that universe, doesn't that mean K(n) can't be more than the Kolmogorov complexity of the universe by more than that constant length?

If it isn't constant-length, then it seems strange to assume Solomonoff induction would posit a large objective universe, given that such positing wouldn't help it predict its inputs efficiently (since such prediction requires locating agents).

This still leads to the behavior I'm talking about in the limit; the sum of 1/2^K(n) over all n can be at most 1 so the probabilities on any particular n have to go arbitrarily small in the limit.
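K(n) is uncomputable, so the bound cannot be checked directly, but the same effect shows up with any computable prefix-free code for the integers. The sketch below uses the Elias gamma code as a stand-in; that substitution is an assumption made for this illustration, not something from the discussion. The total weight never exceeds 1, so the weight on any particular large n has to shrink.

```python
from fractions import Fraction


def gamma_code_length(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * (n.bit_length() - 1) + 1


def partial_weight(limit):
    """Sum of 2**-length(n) for 1 <= n < limit, as an exact fraction."""
    return sum(Fraction(1, 2 ** gamma_code_length(n)) for n in range(1, limit))


if __name__ == "__main__":
    for bits in (1, 4, 8, 12):
        n = 2 ** bits
        weight_of_n = Fraction(1, 2 ** gamma_code_length(n))
        # The cumulative weight creeps up towards 1 while the weight on n itself falls.
        print(n, float(partial_weight(n)), float(weight_of_n))
```

The true K(n) can be much smaller than this code length for highly compressible n, but it obeys the same bound on the total, which is all the argument uses.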

[2] TurnTrout (1y): But a Solomonoff inductor doesn't rank hypotheses on whether they allow efficient predictions of some feature of interest; it ranks them based on posterior probabilities (prior probability + to what extent the hypothesis accurately predicted observations so far).
[2] johnswentworth (1y): I'm about 80% on board with that argument. The main loophole I see is that number-of-embedded-agents may not be decidable. That would make a lot of sense, since embedded-agent-detectors are exactly the sort of thing which would help circumvent diagonalization barriers. That does run into the second part of your argument, but notice that there's no reason we need to detect all the agents using a single program in order for the main problem setup to work. They can be addressed one-by-one, by ad-hoc programs, each encoding one of the hypotheses (world model, agent location). (Personally, though, I don't expect number-of-embedded-agents to be undecidable, at least for environments with some kind of private random bit sources.)
The Presumptuous Philosopher, self-locating information, and Solomonoff induction

My understanding is that Solomonoff induction leads to more SSA-like behavior than SIA-like, at least in the limit, so will reject the presumptuous philosopher's argument.

Asserting that there are n people takes at least K(n) bits, so large universe sizes have to get less likely at some point.

[2] Charlie Steiner (1y): I am usually opposed on principle to calling something "SSA" as a description of limiting behavior rather than inside-view reasoning, but I know what you mean and yes I agree :P I am still surprised that everyone is just taking Solomonoff induction at face value here and not arguing for anthropics. I might need to write a follow-up post to defend the Presumptuous Philosopher, because I think there's a real case that Solomonoff induction actually is missing something. I bet I can make it do perverse things in decision problems that involve being copied.
[6] johnswentworth (1y): The problem setup doesn't necessarily require asserting the existence of n people. It just requires setting up a universe in which n people happen to exist. That could take considerably less than K(n) bits, if person-detection is itself fairly expensive. We could even index directly to the Solomonoff inductor's input data without attempting to recognize any agents; that would circumvent the K(number of people) issue.
Nihilism doesn't matter

Active nihilism described in the paragraph definitely includes, but is not limited to, the negation of values. The active nihilists of a moral parliament may paralyze the parliament as a means to an end; perhaps, to cause systems other than the moral parliament to be the primary determinants of action, rather than the moral parliament.

[3] Bob Jacobs (1y): So the very next sentence on the Wikipedia page is: And I agree that this is no longer nihilism but rather existentialism. EDIT: The reason this matters is because you need entirely different arguments against existentialism. As soon as there is an actual end, it ceases to be nihilism. It's not that I don't want to debate existentialism, but rather there has already been so much criticism written about it that I'm not up to date on. If you are interested in arguments against active indecisiveness I suggest you start there.
Nihilism doesn't matter

What you are describing is a passive sort of nihilism. Active nihilism, on the other hand, would actively try to negate the other values. Imagine a parliament where whenever a non-nihilist votes in favor of X, a nihilist votes against X, such that these votes exactly cancel out. Now, if (active) nihilists are a majority, they will ensure that the parliament as a whole has no aggregate preferences.

[3] Bob Jacobs (1y): Your description of active nihilism seems off. As I understand it, active nihilism is the philosophy that after discovering there is no inherent meaning you choose to create your own meaning. Active nihilism is kinda like existentialism. In fact the Wikipedia page you link to says the same thing: The philosophy you're describing is not nihilism but a different philosophy that preaches active indecisiveness. I personally never heard of it.
Modeling naturalized decision problems in linear logic

CDT and EDT have known problems on 5 and 10. TDT/UDT are insufficiently formalized, and seem like they might rely on known-to-be-unformalizable logical counterfactuals.

So 5 and 10 isn't trivial even without spurious counterfactuals.

What does this add over modal UDT?

• No requirement to do infinite proof search
• More elegant handling of multi-step decision problems
• Also works on problems where the agent doesn't know its source code (of course, this prevents logical dependencies due to source code from being taken into account)

Philosophically, it works as a

Consistent Glomarization should be feasible

Why lie on the d100 coming up 1 instead of "can neither confirm nor deny"?

[2] Stuart_Armstrong (1y): Because the proportion of time where we might have done something we wish to hide is low, while the proportion of time you might counterfactually have done something to hide is high. So by asking you every day, a questioner can figure out that 99% of the time, you didn't actually do anything to hide.

Note: the provided utility function is incredibly insecure; even a not-very-powerful individual can manipulate the AI by writing down that hash code under certain conditions.

Also, the best way to minimize V + W is to minimize both V and W (i.e. write the hash code and create hell). If we replace this with min(V, W) then the AI becomes nihilistic if someone writes down the hash code, also a significant security vulnerability.

Topological metaphysics: relating point-set topology and locale theory

Reals are still defined as sets of (a, b) rational intervals. The locale contains countable unions of these, but all these are determined by which (a, b) intervals contain the real number.
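As a rough illustration of a point being determined by which (a, b) intervals contain it, the sketch below represents sqrt(2) purely as a membership test on rational intervals, with no floating-point value anywhere. The function names are invented for this example.

```python
from fractions import Fraction


def sqrt2_in(a, b):
    """True iff the open rational interval (a, b) contains sqrt(2), using only rational arithmetic."""
    below = a < 0 or a * a < 2   # a < sqrt(2)
    above = b > 0 and b * b > 2  # sqrt(2) < b
    return below and above


def refine(a, b, steps):
    """Shrink an interval containing sqrt(2) by bisection, keeping sqrt(2) inside it."""
    for _ in range(steps):
        mid = (a + b) / 2
        # The midpoint is rational, so sqrt(2) lies strictly in exactly one half.
        a, b = (a, mid) if sqrt2_in(a, mid) else (mid, b)
    return a, b


if __name__ == "__main__":
    lo, hi = refine(Fraction(1), Fraction(2), 20)
    print(float(lo), float(hi), sqrt2_in(lo, hi))
```

The predicate `sqrt2_in` plays the role of the set of basis elements containing the point: membership of any countable union of basic intervals is settled by asking whether at least one of its members passes the test.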

Topological metaphysics: relating point-set topology and locale theory

Good point; I've changed the wording to make it clear that the rational-delimited open intervals are the basis, not all the locale elements. Luckily, points can be defined as sets of basis elements containing them, since all other properties follow. (Making the locale itself countable requires weakening the definition by restricting the sets over which unions are formed to a countable collection, e.g. by requiring them to be recursively enumerable.)

[4] Adele Lopez (1y): Another way to make it countable would be to instead go to the category of posets. Then the rational interval basis is a poset with a countable number of elements, and by the Alexandroff construction [https://ncatlab.org/nlab/show/specialization+topology] corresponds to the real line (or at least something very similar). But this construction gives a full and faithful embedding of the category of posets to the category of spaces (which basically means you get all and only continuous maps from monotonic functions). I guess the ontology version in this case would be the category of prosets. (Personally, I'm not sure that ontology of the universe isn't a type error.)
[3] cousin_it (1y): I see. In that case does the procedure for defining points stay the same, or do you need to use recursively enumerable sets of opens, giving you only countably many reals?
Motivating Abstraction-First Decision Theory

I've also been thinking about the application of agency abstractions to decision theory, from a somewhat different angle.

It seems like what you're doing is considering relations between high-level third-person abstractions and low-level third-person abstractions. In contrast, I'm primarily considering relations between high-level first-person abstractions and low-level first-person abstractions.

The VNM abstraction itself assumes that "you" are deciding between different options, each of which has different (stochastic) consequences; thus, it is inherently

[8] johnswentworth (1y): This comment made a bunch of your other writing click for me. I think I see what you're aiming for now; it's a beautiful vision. In retrospect, this is largely what I've been trying to get rid of, in particular by looking for a third-person interpretation of probability [https://www.lesswrong.com/posts/Lz2nCYnBeaZyS68Xb/probability-as-minimal-map]. Obviously frequentism satisfies that criterion, but the strict form is too narrow for most applications and the less-strict form (i.e. "imagine we repeated this one-shot experiment many times...") isn't actually third-person. I've also started thinking about a third-person grounding of utility maximization and the like via selection processes; that's likely to be a whole months-long project in itself in the not-too-distant future.
Subjective implication decision theory in critical agentialism

Looking back on this, it does seem quite similar to EDT. I'm actually, at this point, not clear on how EDT and TDT differ, except in that EDT has potential problems in cases where it's sure about its own action. I'll change the text so it notes the similarity to EDT.

On XOR blackmail, SIDT will indeed pay up.

Two Alternatives to Logical Counterfactuals

Yes, it's about no backwards assumption. Linear has lots of meanings; I'm not concerned about this getting confused with linear algebra, but you can suggest a better term if you have one.

Seemingly Popular Covid-19 Model is Obvious Nonsense

Epistemic Status: Something Is Wrong On The Internet.

If you think this applies, it would seem that "The Internet" is being construed so broadly that it includes the mainstream media, policymaking, and a substantial fraction of people, such that the "Something Is Wrong On The Internet" heuristic points against correction of public disinformation in general.

This is a post that is especially informative, aligned with justice, and likely to save lives, and so it would be a shame if this heuristic were to dissuade you from writing it.

In Defense of Politics

The presumption with conspiracies is that they are engaged in for some local benefit to the conspiracy, to the detriment of the broader society. Hence, the "unilateralist's curse" is a blessing in this case, as the overestimation by one member of a conspiracy of their own utility in having the secret exposed brings their estimation more in line with the estimation of the broader society, whose interests differ from those of the conspirators.

If differences between the interests of different groups were not a problem, then there would be no motive to form a

Solipsism is Underrated

A major problem with physicalist dismissal of experiential evidence (as I've discussed previously) is that the conventional case for believing in physics is that it explains experiential evidence, e.g. experimental results. Solomonoff induction, among the best formalizations of Occam's razor, believes in "my observations".

If basic facts like "I have observations" are being doubted, then any case for belief in physics has to go through something independent of its explanations of experiential evidence. This looks to be a difficult problem.

You could potent

[1] TAG (1y): There are significant differences between observations in the sense of pointer positions, and qualia. That's much more like the easy problem.
Two Alternatives to Logical Counterfactuals

Basically, the assumption that you're participating in a POMDP. The idea is that there's some hidden state that your actions interact with in a temporally linear fashion (i.e. action 1 affects state 2), such that your late actions can't affect early states/observations.
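A toy rollout makes the temporal-linearity assumption concrete. Everything in this sketch is hypothetical: the dynamics, the observation function, and the names are chosen only for illustration, and the dynamics are deterministic for simplicity even though a POMDP would generally be stochastic. The point is that changing a late action leaves all earlier hidden states and observations untouched.

```python
def transition(state, action):
    """Hypothetical deterministic dynamics on hidden states 0..6."""
    return (3 * state + action + 1) % 7


def observe(state):
    """Partial observation: only the parity of the hidden state is visible."""
    return state % 2


def rollout(actions, initial_state=0):
    """Return (hidden states, observations); actions[t] first influences states[t + 1]."""
    states = [initial_state]
    for action in actions:
        states.append(transition(states[-1], action))
    return states, [observe(s) for s in states]


if __name__ == "__main__":
    base = [0, 1, 0, 1]
    changed = [0, 1, 1, 0]  # differs from `base` only from index 2 onward
    s1, o1 = rollout(base)
    s2, o2 = rollout(changed)
    print("shared prefix of states equal:", s1[:3] == s2[:3])
    print("shared prefix of observations equal:", o1[:3] == o2[:3])
    print("states after the change:", s1[3:], s2[3:])
```

Because each state is computed only from the previous state and the previous action, there is no route by which `actions[2]` could alter `states[0..2]`, which is the sense of "linear" intended in the comment.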

[1] capybaralet (1y): OK, so no "backwards causation"? (not sure if that's a technical term and/or if I'm using it right...) Is there a word we could use instead of "linear", which to an ML person sounds like "as in linear algebra"?
Two Alternatives to Logical Counterfactuals

The way you are using it doesn’t necessarily imply real control, it may be imaginary control.

I'm discussing a hypothetical agent who believes itself to have control. So its beliefs include "I have free will". Its belief isn't "I believe that I have free will".

It’s a “para-consistent material conditional” by which I mean the algorithm is limited in such a way as to prevent this explosion.

Yes, that makes sense.

However, were you flowing this all the way back in time?

Yes (see thread with Abram Demski).

What do you mean by dualistic?

[2] Chris_Leong (1y): Hmm, yeah this could be a viable theory. Anyway to summarise the argument I make in Is Backwards Causation Necessarily Absurd? [https://www.lesswrong.com/posts/pa7mvEmEgt336gBSf/is-backwards-causation-necessarily-absurd], I point out that since physics is pretty much reversible, instead of A causing B, it seems as though we could also imagine B causing A and time going backwards. In this view, it would be reasonable to say that one-boxing (backwards-)caused the box to be full in Newcomb's. I only sketched the theory because I don't have enough physics knowledge to evaluate it. But the point is that we can give justification for a non-standard model of causality.
Two Alternatives to Logical Counterfactuals

Secondly, “free will” is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion.

Wikipedia says "Free will is the ability to choose between different possible courses of action unimpeded." SEP says "The term “free will” has emerged over the past two millennia as the canonical designator for a significant kind of control over one’s actions." So my usage seems pretty standard.

For example, recently I’ve been arguing in favour of what counts as a valid counterfactual being at least partially a matter of soc

[2] Chris_Leong (1y): Not quite. The way you are using it doesn't necessarily imply real control, it may be imaginary control. True. Maybe I should clarify what I'm suggesting. My current theory is that there are multiple reasonable definitions of counterfactual and it comes down to social norms as to what we accept as a valid counterfactual. However, it is still very much a work in progress, so I wouldn't be able to provide more than vague details. I guess my point was that this notion of counterfactual isn't strictly a material conditional due to the principle of explosion [https://www.wikiwand.com/en/Principle_of_explosion]. It's a "para-consistent material conditional" by which I mean the algorithm is limited in such a way as to prevent this explosion. Hmm... good point. However, were you flowing this all the way back in time? Such as if you change someone's source code, you'd also have to change the person who programmed them. What do you mean by dualistic?
Two Alternatives to Logical Counterfactuals

I think it's worth examining more closely what it means to be "not a pure optimizer". Formally, a VNM utility function is a rationalization of a coherent policy. Say that you have some idea about what your utility function is, U. Suppose you then decide to follow a policy that does not maximize U. Logically, it follows that U is not really your utility function; either your policy doesn't coherently maximize any utility function, or it maximizes some other utility function. (Because the utility function is, by definition, a rationalization of the poli

4abramdemski1yOK, all of that made sense to me. I find the direction more plausible than when I first read your post, although it still seems like it'll fall to the problem I sketched. I both like and hate that it treats logical uncertainty in a radically different way from empirical uncertainty -- like, because we have so far failed to find any way to treat the two uniformly (besides being entirely updateful that is); and hate, because it still feels so wrong for the two to be very different.
Referencing the Unreferencable

If you fix a notion of referenceability rather than equivocating, then the point that talking of unreferenceable entities is absurd will stand.

If you equivocate, then very little can be said in general about referenceability.

(I would say that "our universe's simulators" is referenceable, since it's positing something that causes sensory inputs)

5TAG1yEquivocation is using a term in different senses *during the course of an argument*, that is, under conditions where it should normatively have a stable meaning. It is still the case that some words are ambiguous, and that recognising ambiguity can solve problems.
2Chris_Leong1yWe will still be able to talk about unreferencable elements, but only by "referencing" them in a different sense than what we mean by "unreferencable". The key is that it might seem like we are only using the word in one sense until we really break down the definition, at which point it becomes clear we are using it in different senses. And it's not equivocating because we are only using the word referencable in a particular way. When we "reference" unreferencable elements we don't call it "referencing" even though we are in the casual sense.
Two Alternatives to Logical Counterfactuals

It seems the approaches we're using are similar, in that they both are starting from observation/action history with posited falsifiable laws, with the agent's source code not known a priori, and the agent considering different policies.

Learning "my source code is A" is quite similar to learning "Omega predicts my action is equal to A()", so these would lead to similar results.

Policy-dependent source code, then, corresponds to Omega making different predictions depending on the agent's intended policy, such that when comparing policies, the agent has to imagine Omega predicting differently (as it would imagine learning different source code under policy-dependent source code).

2Vanessa Kosoy1yWell, in quasi-Bayesianism for each policy you have to consider the worst-case environment in your belief set, which depends on the policy. I guess that in this sense it is analogous.
Two Alternatives to Logical Counterfactuals

I agree this is a problem, but isn't this a problem for logical counterfactual approaches as well? Isn't it also weird for a known fixed optimizer source code to produce a different result on this decision where it's obvious that 'left' is the best decision?

If you assume that the agent chose 'right', it's more reasonable to think it's because it's not a pure optimizer than that a pure optimizer would have chosen 'right', in my view.

If you form the intent to, as a policy, go 'right' on the 100th turn, you should anticipate learning that your source code is not the code of a pure optimizer.

5abramdemski1yI'm left with the feeling that you don't see the problem I'm pointing at. My concern is that the most plausible world where you aren't a pure optimizer might look very very different, and whether this very very different world looks better or worse than the normal-looking world does not seem very relevant to the current decision. Consider the "special exception selves" you mention -- the Nth exception-self has a hard-coded exception "go right if it's been at least N turns and you've gone right at most 1/N of the time". Now let's suppose that the worlds which give rise to exception-selves are a bit wild. That is to say, the rewards in those worlds have pretty high variance. So a significant fraction of them have quite high reward -- let's just say 10% of them have value much higher than is achievable in the real world. So we expect that by around N=10, there will be an exception-self living in a world that looks really good. This suggests to me that the policy-dependent-source agent cannot learn to go left > 90% of the time, because once it crosses that threshold, the exception-self in the really good looking world is ready to trigger its exception -- so going right starts to appear really good. The agent goes right until it is under the threshold again. If that's true, then it seems to me rather bad: the agent ends up repeatedly going right in a situation where it should be able to learn to go left easily. Its reason for repeatedly going right? There is one enticing world, which looks much like the real world, except that in that world the agent definitely goes right. Because that agent is a lucky agent who gets a lot of utility, the actual agent has decided to copy its behavior exactly -- anything else would prove the real agent unlucky, which would be sad. Of course, this outcome is far from obvious; I'm playing fast and loose with how this sort of agent might reason.
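
The "by around N=10" estimate in the reply above can be checked with a little arithmetic. This is only a rough sketch under assumptions that are not in the comment: each exception-world is treated as independently high-reward with probability 10%, and the function names are made up for illustration.

```python
def prob_some_lucky_exception_self(n, p_high=0.10):
    """Chance that at least one of the first n exception-worlds is high-reward.

    Assumes (hypothetically) that worlds are independent, each high-reward
    with probability p_high.
    """
    return 1.0 - (1.0 - p_high) ** n


def expected_lucky_exception_selves(n, p_high=0.10):
    """Expected number of high-reward worlds among the first n."""
    return n * p_high


def left_fraction_ceiling(n):
    """Left-fraction at which the Nth exception becomes ready to trigger.

    The Nth exception fires once right-moves are at most 1/n of all turns,
    which is the same as left-moves being at least 1 - 1/n of all turns.
    """
    return 1.0 - 1.0 / n


for n in (1, 5, 10, 20):
    print(
        n,
        round(prob_some_lucky_exception_self(n), 3),
        expected_lucky_exception_selves(n),
        left_fraction_ceiling(n),
    )
```

Under these assumptions the expected number of lucky exception-selves reaches one at N=10, which is also where the trigger condition corresponds to going left 90% of the time; that matches the threshold named in the reply, though the real agent's reasoning may differ.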
Two Alternatives to Logical Counterfactuals

This indeed makes sense when "obs" is itself a logical fact. If obs is a sensory input, though, 'A(obs) = act' is a logical fact, not a logical counterfactual. (I'm not trying to avoid causal interpretations of source code interpreters here, just logical counterfactuals)

2abramdemski1yAhhh ok.
Two Alternatives to Logical Counterfactuals

In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

Conditioning on 'A(obs) = act' is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between "If Oswald didn't kill Kennedy, then someone else did" and "If Oswald didn't kill Kennedy, then someone else would have".
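
To make the Oswald contrast concrete, here is a toy sketch. It uses a Pearl-style structural model, which is an assumption brought in for illustration and not the formalism of this comment; the priors `P_OSWALD` and `P_BACKUP` and the rule that a backup shooter acts only if Oswald does not are likewise hypothetical.

```python
from itertools import product

P_OSWALD = 0.9  # hypothetical prior that Oswald shoots
P_BACKUP = 0.1  # hypothetical prior that a backup shooter exists


def killer(oswald_shoots, backup_exists):
    """Toy mechanism: the backup shooter acts only if Oswald does not."""
    if oswald_shoots:
        return "oswald"
    if backup_exists:
        return "other"
    return None  # Kennedy survives


def weight(oswald_shoots, backup_exists):
    """Prior probability of one complete world."""
    p_o = P_OSWALD if oswald_shoots else 1 - P_OSWALD
    p_b = P_BACKUP if backup_exists else 1 - P_BACKUP
    return p_o * p_b


def conditional_someone_else_did():
    """P(someone else did it | Kennedy was killed, Oswald did not do it)."""
    num = den = 0.0
    for o, b in product([True, False], repeat=2):
        k = killer(o, b)
        if k is not None and not o:  # evidence: killed, and not by Oswald
            den += weight(o, b)
            if k == "other":
                num += weight(o, b)
    return num / den


def counterfactual_someone_else_would_have():
    """P(someone else would have | actually Oswald shot and Kennedy died)."""
    # Abduction: keep worlds consistent with what actually happened.
    posterior = {b: weight(True, b) for b in (True, False)}
    total = sum(posterior.values())
    # Intervention: force Oswald not to shoot, then rerun the mechanism.
    return sum(w / total for b, w in posterior.items()
               if killer(False, b) == "other")


print(conditional_someone_else_did())
print(counterfactual_someone_else_would_have())
```

In this toy model the conditional is certain, because it filters to worlds where Kennedy died anyway, while the counterfactual only recovers the prior chance of a backup shooter. Conditioning on 'A(obs) = act' is of the first kind.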

Indeed, troll bridge will present a problem for "playing chicken" approaches, whic