
Let Values Drift

I don't see the usual commonsense understanding of "values" (or the understanding used in economics or ethics) as relying on values being ontologically fundamental in any way, though. But you've used the fact that they're not fundamental to make a seemingly unjustified rhetorical leap to "values are just habituations or patterns of action", which just doesn't seem to be true.

The most important reason is that the "values" people are concerned with when they talk about "value drift" are *idealized* values (à la extrapolated volition), not instantaneous values, opinions, or habituations.

For instance, philosophers such as EY hold that changing one's mind in response to a new moral argument is *not* value drift, because it preserves one's idealized values, and that it is generally instrumentally positive because (if it brings one's instantaneous opinions closer to one's idealized values) it makes one better at achieving those idealized values. So indeed, we should let the EAs "drift" in that sense.

On the other hand, getting hit with a cosmic ray that alters your brain, or getting hacked by a remote code execution exploit, *is* value drift, because it does not preserve one's idealized values (and is therefore bad, according to the usual decision-theoretic argument, because it makes you worse at achieving them). And those are the kinds of problems we worry about with AI.

Let Values Drift

When we talk of values as nouns, we are talking about the values that people have, express, find, embrace, and so on. For example, a person might say that altruism is one of their values. But what would it mean to “have” altruism as a value or for it to be one of one’s values? What is the thing possessed or of one in this case? Can you grab altruism and hold onto it, or find it in the mind cleanly separated from other thoughts?

Since this appears to be a crux of your whole (fallacious, in my opinion) argument, I'm going to start by just criticizing this point. This argument proves far too much. It proves that:

- People don't have beliefs, memories or skills
- Books don't have concepts
- Objects don't have colors
- Shapes don't have total internal angles

It seems as if you've rhetorically denied the existence of any abstract properties whatsoever, for the purpose of minimizing values as being "merely" habituations or patterns of action. But I don't see why anyone should actually accept that claim.

Interpretations of "probability"

Doesn't it mean the same thing in either case? Either way, I don't know which way the coin will land or has landed, and I have some odds at which I'll be willing to make a bet. I don't see the problem.

(Though my willingness to bet at all will generally go down over time in the "already flipped" case, due to the increasing possibility that whoever is offering the bet somehow looked at the coin in the intervening time.)

Interpretations of "probability"

The idea that "probability" is some preexisting thing that needs to be "interpreted" as something always seemed a little bit backwards to me. Isn't it more straightforward to say:

1. Beliefs exist, and obey the Kolmogorov axioms (at least, "correct" beliefs do, as formalized by generalizations of logic (Cox's theorem) or by possible-world counting). This is what we refer to as "Bayesian probabilities", and what we code into AIs when we want them to represent beliefs.
2. Measures over imaginary event classes / ensembles also obey the Kolmogorov axioms. "Frequentist probabilities" fall into this category.

Personally I mostly think about #1 because I'm interested in figuring out what I should believe, not about frequencies in arbitrary ensembles. But the fact is that both of these obey the same "probability" axioms, the Kolmogorov axioms. Denying one or the other because "probability" must be "interpreted" as *exclusively either* #1 or #2 is simply wrong (but that's what frequentists effectively do when they loudly shout that you "can't" apply probability to beliefs).

Now, sometimes you *do* need to interpret "probability" as something: in the specific case where someone else makes an utterance containing the word "probability" and you want to figure out what they meant. But the answer there is probably that in many cases people don't even distinguish between #1 and #2, because they'll only commit to a specific number when there's a convenient instance of #2 that makes #1 easy to calculate. For instance, saying 1/6 for a roll of a "fair" die.
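As a rough illustration of how #1 and #2 coincide for a "fair" die, here is a minimal Python sketch. The helper names are hypothetical and the die is assumed fair: one function computes the probability of a six by counting equally weighted possible worlds (#1), the other as a relative frequency over a simulated ensemble of rolls (#2).

```python
import random
from fractions import Fraction

# Hypothetical helpers for illustration; a fair six-sided die is assumed throughout.
FACES = range(1, 7)


def belief_by_world_counting(event):
    """#1: fraction of equally weighted possible worlds (one per face) where `event` holds."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))


def frequency_in_ensemble(event, n_rolls=100_000, seed=0):
    """#2: relative frequency of `event` in a simulated ensemble of fair rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rolls) if event(rng.randint(1, 6)))
    return hits / n_rolls


def is_six(face):
    return face == 6


belief = belief_by_world_counting(is_six)
freq = frequency_in_ensemble(is_six)

# Kolmogorov-style sanity checks on the world-counting measure:
# the sure event has probability 1, and disjoint events add.
assert belief_by_world_counting(lambda face: True) == 1
assert belief_by_world_counting(lambda face: face <= 3) == (
    belief_by_world_counting(lambda face: face <= 2)
    + belief_by_world_counting(lambda face: face == 3)
)

print(belief, round(freq, 3))
```

The world count gives exactly 1/6, and the simulated frequency lands close to it; the die is a case where a convenient ensemble makes the belief easy to compute, so nobody needs to say which one they meant.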

People often act as though their utterances about probability refer to #1, though. For instance, when they misinterpret p-values as the post-data probability of the null hypothesis and go around believing that the effect is real...
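To make the gap concrete, here is a small Python sketch with made-up data (60 heads in 100 flips) and stated assumptions: the null is a fair coin, the alternative puts a uniform prior on the coin's bias, and the two hypotheses start at 50/50. It computes a two-sided binomial p-value alongside the posterior probability of the null under those assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k, n):
    """P(a result at least as far from n/2 as k | fair coin): a two-sided binomial test."""
    dist = abs(k - n / 2)
    return sum(binom_pmf(i, n, 0.5) for i in range(n + 1) if abs(i - n / 2) >= dist)


def posterior_null(k, n, prior_null=0.5):
    """P(fair coin | data), ASSUMING the alternative has a uniform prior on the bias."""
    like_null = binom_pmf(k, n, 0.5)
    # Marginal likelihood under a uniform prior on p: the integral of the
    # binomial pmf over p in [0, 1] equals 1 / (n + 1) for every k.
    like_alt = 1 / (n + 1)
    num = prior_null * like_null
    return num / (num + (1 - prior_null) * like_alt)


n, k = 100, 60  # hypothetical data
print(f"p-value        = {two_sided_p_value(k, n):.3f}")
print(f"P(null | data) = {posterior_null(k, n):.3f}")
```

Here the p-value comes out a little under 0.06 while the posterior probability of the null is slightly above 0.5, so reading the p-value as "the probability the null is true" would be badly off. Different priors give different posteriors, which is rather the point: the p-value alone doesn't determine it.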

Functional Decision Theory vs Causal Decision Theory: Expanding on Newcomb's Problem

No, that doesn't work. It seems to me you've confused yourself by constructing a fake symmetry between these problems. It wouldn't make any sense for Omega to "predict" whether you choose both boxes in Newcomb's if Newcomb's were equivalent to something that doesn't involve choosing boxes.

More explicitly:

Newcomb's Problem is "You sit in front of a pair of boxes, which are both filled with money if Omega predicted you would take one box in *this case*; otherwise only one is filled". Note: describing the problem does not require mentioning "Newcomb's Problem"; it can be expressed as a simple game tree (see here for some explanation of the tree format):

[Image: game tree for Newcomb's Problem]
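For readers without the picture, the leaves of that tree can be sketched in a few lines of Python. The dollar amounts are the conventional ones ($1,000 and $1,000,000) and are an assumption here, as is the perfectly accurate predictor; note that nothing in the sketch needs to name "Newcomb's Problem".

```python
# Payoffs at the leaves of the Newcomb game tree. The dollar amounts are the
# conventional ones and are an assumption; the text above describes the tree in prose.
SMALL = 1_000        # box that is always filled
BIG = 1_000_000      # box that is filled only if Omega predicted one-boxing


def newcomb_payoff(choice, prediction):
    """Payoff for `choice` ("one" = take only the BIG box, "two" = take both)
    given Omega's `prediction` (same encoding)."""
    big = BIG if prediction == "one" else 0
    return big if choice == "one" else big + SMALL


# With a (hypothetically) perfect predictor, the prediction matches the choice.
for choice in ("one", "two"):
    print(choice, newcomb_payoff(choice, prediction=choice))

# Holding the prediction fixed, two-boxing always yields SMALL more;
# this is the dominance argument CDT relies on.
for prediction in ("one", "two"):
    assert newcomb_payoff("two", prediction) == newcomb_payoff("one", prediction) + SMALL
```

The two loops correspond to the two ways of reading the tree: following the prediction along with the choice (one-boxing wins) versus holding the prediction fixed (two-boxing dominates).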

In comparison, your "Inverse Newcomb" is "Omega gives you some money iff it predicts that you take both boxes in Newcomb's Problem, an entirely different scenario (i.e. not this case)."

The latter is more of the form "Omega arbitrarily rewards agents for taking certain hypothetical actions in a different problem" (of which a nearly limitless variety can be invented to justify any chosen decision theory¹), rather than being an actual self-contained problem which can be "solved".

The latter also can't be expressed as any kind of game tree without "cheating" and naming "Newcomb's Problem" verbally --- or rather, you can express a *similar thing* by embedding the Newcomb game tree and referring to the embedded tree, but that converts it into a legitimate decision problem, which FDT of course gives the correct answer to (TODO: draw an example ;).

(¹): Consider Inverse^2 Newcomb, which I consider the proper symmetric inverse of "Inverse Newcomb": Omega puts you in front of two boxes and says "this is *not* Newcomb's Problem, but I have filled both boxes with money iff I predicted that you take one box in standard Newcomb". Obviously here FDT takes both boxes and a tidy $1,001,000 profit (plus the $1,000,000 from Standard Newcomb). Whereas CDT gets... $1000 (plus $1000 from Standard Newcomb).
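The footnote's arithmetic can be checked with a short Python sketch. It assumes the conventional amounts ($1,000 and $1,000,000), a perfect predictor, and that in Inverse^2 Newcomb only the small box is filled when Omega did not predict one-boxing; the agent policies are hypothetical encodings of what FDT and CDT do in each problem.

```python
SMALL, BIG = 1_000, 1_000_000  # conventional amounts (assumed)


def standard_newcomb(newcomb_choice):
    """Perfect predictor: the BIG box is filled iff the agent one-boxes here."""
    return BIG if newcomb_choice == "one" else SMALL


def inverse2_newcomb(newcomb_choice, choice_here):
    """Both boxes are filled iff Omega predicted one-boxing in *standard* Newcomb.

    Assumption: otherwise only the SMALL box is filled, mirroring the standard setup.
    """
    big = BIG if newcomb_choice == "one" else 0
    return big + SMALL if choice_here == "two" else big


# Hypothetical encodings of each agent's choice in each problem.
AGENTS = {
    "FDT": {"standard": "one", "inverse2": "two"},
    "CDT": {"standard": "two", "inverse2": "two"},
}


def payoffs(policy):
    """Return (winnings in Inverse^2 Newcomb, total including standard Newcomb)."""
    here = inverse2_newcomb(policy["standard"], policy["inverse2"])
    return here, here + standard_newcomb(policy["standard"])


for name, policy in AGENTS.items():
    here, total = payoffs(policy)
    print(f"{name}: ${here:,} in Inverse^2 Newcomb, ${total:,} including standard Newcomb")
```

Because the box contents in Inverse^2 Newcomb depend only on the standard-Newcomb prediction, taking both boxes there is free money for either agent; the difference comes entirely from what each agent does in the embedded standard problem.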

The Cacophony Hypothesis:
Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence

Yes, you need to have a theory of physics to write down a transition rule for a physical system. That is a problem, but it's not at all the same problem as the "target format" problem. The only role the transition rule plays here is that it allows one to apply induction to efficiently prove some generalization about the system over all time steps.

In principle a different, more distinguished, concise description of the system's behaviour could play a similar role (perhaps the recording of the states of the system plus the shortest program that outputs the recording?). Or perhaps there's some way of choosing a distinguished "best" formalization of physics. But that's rather outside the scope of what I wanted to suggest here.

But then you are measuring proof shortness relative to that system. And you could be using one of countless other formal systems which always make the same predictions, but relative to which different proofs are short and long.

It would be an O(1) cost to start the proof by translating the axioms into a more convenient format, much as Kolmogorov complexity is "language dependent", but not asymptotically, because any particular universal Turing machine can be simulated in any other for a constant cost.
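Written out, the comparison looks roughly like this (the notation is mine, not the original's). $K_U(x)$ is the Kolmogorov complexity of $x$ relative to universal machine $U$, and $c_{U,V}$ is the length of a $U$-program that simulates $V$, which does not depend on $x$. For proofs, $\ell_A(\varphi)$ is the length of the shortest proof of $\varphi$ in system $A$; the second line assumes that $A$ and $B$ share the same underlying logic and $A$ is finitely axiomatized, so that $c_{A,B}$ can be taken as the total length of $B$-proofs of $A$'s axioms, written once at the start of the proof and then reused.

```latex
\begin{aligned}
K_U(x) &\le K_V(x) + c_{U,V} && \text{for every string } x,\\
\ell_B(\varphi) &\le \ell_A(\varphi) + c_{A,B} && \text{for every theorem } \varphi \text{ of } A.
\end{aligned}
```

The constant can still be large, which is why the argument is asymptotic: switching formal systems shifts every proof length by at most a fixed amount, so the comparison between "short" and "long" proofs survives for sufficiently large computations.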

The assumption (including that it takes in and puts out Arabic numerals, and uses “*” as the multiplication command, and that buttons must be pressed,… and all the other things you need to actually use it) includes that.

These are all things that can be derived from a physical description of the calculator (maybe not in fewer steps than it takes to do long multiplication, but certainly in fewer steps than less trivial computations one might do with a calculator). There's no observer dependency here.

The Cacophony Hypothesis:
Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence

That's not an issue in my formalization. The "logical facts" I speak of in the formalized version would be fully specified mathematical statements, such as "if the simulation starts in state X at t=0, the state of the simulation at t=T is Y" or "given that Alice starts in state X, then <some formalized way of categorising states according to favourite ice cream flavour> returns `Vanilla`". The "target format" is mathematical proofs. Languages (as in English vs Chinese) don't and *can't* come into it, because proof systems are language-ignorant.

Note that the formalized criterion is broader than the informal "could you do something useful with this simulation IRL" criterion, even though the latter is the 'inspiration' for it. For instance, it doesn't matter whether you understand the programming language the simulation is written in. If someone who did understand the language could write the appropriate proofs, then the proofs exist.

Similarly, if a simulation is run under homomorphic encryption, it is nevertheless a valid simulation, despite the fact that you can't read it without the decryption key, because a proof exists which starts by "magically" writing down the key, proving that it's the correct decryption key, and then proceeding from there.

An informal criterion which maybe captures this better would be: If you and your friend both have (view) access to a genuine computation of some logical facts X, it should be possible to convince your friend of X in fewer words by referring to the alleged computation (but you are permitted unlimited time to think first, so you can reverse engineer the simulation, brute-force some encryption keys, learn Chinese, whatever you like, before talking). A bit like how it's more efficient to convince your friend that 637265729567*37265974 = 23748328109134853258 by punching the numbers into a calculator and saying "see?" than by handing over a paper with a complete long multiplication derivation (assuming you are familiar with the calculator and can convince your friend that it calculates correctly).
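The calculator example can be made concrete with a small Python sketch: a schoolbook long multiplication that also counts its single-digit multiplications (a rough, hypothetical stand-in for the length of the written derivation), checked against Python's built-in big-integer `*`, which plays the role of the trusted calculator.

```python
def long_multiply(a: int, b: int) -> tuple:
    """Schoolbook long multiplication on decimal digits (non-negative inputs).

    Returns (product, number of single-digit multiplications performed); the
    count is only a rough proxy for how long the written derivation would be.
    """
    a_digits = [int(d) for d in reversed(str(a))]
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))
    steps = 0
    for i, da in enumerate(a_digits):
        carry = 0
        for j, db in enumerate(b_digits):
            steps += 1
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b_digits)] += carry
    product = int("".join(map(str, reversed(result))))
    return product, steps


a, b = 637265729567, 37265974
product, steps = long_multiply(a, b)
# The "calculator": Python's built-in multiplication, trusted here by assumption.
assert product == a * b
print(product, steps)
```

The product matches the figure quoted above; the derivation needs one digit-product per pair of digits (12 × 8 of them here), whereas pointing at the calculator costs a constant amount of "see?" once your friend trusts it.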

The Cacophony Hypothesis:
Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence

This idea is, as others have commented, pretty much Dust theory.

The solution, in my opinion, is the same as the answer to Dust theory: namely, it is not actually the case that anything is a simulation of anything. Yes, you can *claim* that (for instance) the motion of the atoms in a pebble can be interpreted as a simulation of Alice, in the sense that anything can be mapped to anything... but in a certain more real sense, you can't.

And that sense is this: an *actual* simulation of Alice running on a computer grants you certain powers - you can step through the simulation, examine what Alice does, and determine certain facts such as Alice's favourite ice cream flavour (these are logical facts, given the simulation's initial state). If the simulation is an upload of your friend Alice, then by doing so you learn meaningful new facts about your friend.

In comparison, a pebble "interpreted" as a simulation of Alice affords you no such powers, because the interpretation (mapping from pebble states to simulation data) is entirely post-hoc. The only way to pin down the mapping---such that you could, for instance, explicitly write it down, or take the pebble's state and map it to an answer about Alice's favourite ice cream---is to *already* have carried out the actual simulation, separately, and already know these things about Alice.

In general, "legitimate" computations of certain logical facts (such as the answers to questions one might ask about simulations of people) should, in a certain sense, make it easier to calculate those logical facts than doing so from scratch.

A specific formalization of this idea would be that a proof system equipped with an oracle (axiom schema) describing the states of the physical system which allegedly computed these facts, as well as its transition rule, should be able to find proofs for those logical facts in fewer steps than one without such axioms.
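In symbols, the criterion might be stated as follows; the notation is introduced here for illustration and is not from the original. $T$ is the background proof system, $\mathrm{Ax}(S)$ the oracle axiom schema giving the state of physical system $S$ at each time step, $\mathrm{Rule}(S)$ its transition rule, $\varphi$ a logical fact about the computation, and $\ell(\varphi \mid \cdot)$ the length of the shortest proof of $\varphi$ from the given axioms.

```latex
\ell\bigl(\varphi \mid T \cup \mathrm{Ax}(S) \cup \mathrm{Rule}(S)\bigr) \;<\; \ell\bigl(\varphi \mid T\bigr)
```

On this reading, a pebble $P$ counts as simulating Alice only if adding $\mathrm{Ax}(P)$ and $\mathrm{Rule}(P)$ actually shortens proofs of facts about Alice, which the argument above suggests it does not, since any usable mapping would already have to encode the answers.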

Such proofs will involve first coming up with a mapping (such as interpreting certain electrical junctions as NAND gates), proving it valid using the transition rules, then using induction to jump to "the physical state at timestep t is X, therefore Alice's favourite ice cream flavour is Y". Note that the requirement that these proofs be short naturally results in these "interpretations" being simple.
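The "come up with a mapping, then check it against the transition rule" step can be sketched in Python with deliberately toy physics. Everything here is hypothetical: voltages on wires, a junction whose output is pulled low only when both inputs are high, and a threshold mapping to bits. An exhaustive check over the four logical inputs stands in for a proof, and composing four junctions into XOR shows how a validated interpretation carries through a larger circuit.

```python
from itertools import product

# Toy "physics", purely hypothetical: voltages on wires and one kind of junction.
HIGH, LOW, THRESHOLD = 5.0, 0.2, 2.5


def junction(v1: float, v2: float) -> float:
    """Hypothetical transition rule: output is pulled LOW only when both inputs are high."""
    return LOW if v1 > THRESHOLD and v2 > THRESHOLD else HIGH


def to_bit(voltage: float) -> int:
    """The interpretation: a short, simple mapping from physical states to bits."""
    return 1 if voltage > THRESHOLD else 0


def to_voltage(bit: int) -> float:
    return HIGH if bit else LOW


def interpretation_is_nand() -> bool:
    """Exhaustively check (standing in for a proof) that, read through to_bit,
    the junction's transition rule computes NAND on every logical input."""
    return all(
        to_bit(junction(to_voltage(a), to_voltage(b))) == 1 - (a & b)
        for a, b in product((0, 1), repeat=2)
    )


def xor_circuit(v1: float, v2: float) -> float:
    """Four junctions wired as the standard NAND construction of XOR."""
    n = junction(v1, v2)
    return junction(junction(v1, n), junction(v2, n))


def interpretation_is_xor() -> bool:
    return all(
        to_bit(xor_circuit(to_voltage(a), to_voltage(b))) == a ^ b
        for a, b in product((0, 1), repeat=2)
    )


print(interpretation_is_nand(), interpretation_is_xor())
```

The mapping `to_bit` is one comparison; it can be checked against the transition rule without knowing any outputs in advance, which is exactly what a post-hoc pebble "interpretation" cannot offer.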

As far as I know, this specific formalization of the anti-Dust idea is original to me, though the idea that "interpretations" of things as computations ought to be "simple" is not particularly new.

Highlights from "Integral Spirituality"

We can (and should) have that discussion, we should just have it on a separate post

Can you point to the specific location that discussion "should" happen at?

If what you want is to do the right thing, there's no conflict here.

Conversely, if you don't want to do the right thing, maybe it would be prudent to reconsider doing it...?