cubefox

Is there even anybody claiming there is an experiential difference?

Yep! Ask someone with this view whether the current stream of consciousness continues from their pre-uploaded self to their post-uploaded self, like it continues when they pass through a doorway. The typical claim is some version of "this stream of consciousness will end, what comes next is only oblivion", not "oh sure, the stream of consciousness is going to continue in the same way it always does, but I prefer not to use the English word 'me' to refer to the later parts of that stream of consciousness".

This doesn't show they believe there is a difference in experience. It can simply be a different analysis of the meaning of "the current stream of consciousness continuing". That's a semantic difference, not an empirical one.

cubefox

The thing I'm arguing in the OP is that there can't be an experiential difference here, because there's no physical difference that could be underlying the supposed experiential difference.

Is there even anybody claiming there is an experiential difference? It seems you may be attacking a strawman.

So the disagreement about the first-person facts, I claim, stems from a cognitive error

The alternative to this is that there is a disagreement about the appropriate semantic interpretation/analysis of the question. E.g. about what we mean when we say "I will (not) experience such and such". That seems more charitable than hypothesizing beliefs in "ghosts" or "magic".

cubefox

The problem was that you first seemed to belittle questions about word meanings ("self") as being "just" about "definitions" that are "purely verbal". Luckily now you concede that the question about the meaning of "I" isn't just about (arbitrary) "definitions", which makes calling it a "purely verbal" (read: arbitrary) question inappropriate. Now of course the meaning of "self" is no more arbitrary than the meaning of "I", indeed those terms are clearly meant to refer to the same thing (like "me" or "myself").

The wider point is that the following doesn't seem to be true:

But this post hasn’t been talking about word definitions. It’s been talking about substantive predictive questions like “What’s the very next thing I’m going to see? The other side of the teleporter? Or nothing at all?”

When we evaluate statements or questions of any kind, including the one above, we need to know two things: 1) its meaning, in particular the meaning of the terms involved, and 2) what the empirical facts are. But we already know all the empirical facts: someone goes into the teleporter, and a bit later someone comes out at the other end and sees something. So the issue can only be about the semantic interpretation of that question, about what we mean by expressions like "I will see x". Do we mean "A future person that is psychologically continuous with current-me sees x"? That's not an empirical question, it's a semantic one, but it's not in any way arbitrary, as expressions like "just about definitions" or "purely verbal" would suggest. Conceptual analysis is neither arbitrary nor trivial.

There are several different representation theorems, not just the one by VNM. They differ in what they take to be basic. See the table here in section 2.2.5. As the article emphasizes, nothing about what is more fundamental can be concluded from the direction of representation:

Notice that the order of construction differs between theorems: Ramsey constructs a representation of probability using utility, while von Neumann and Morgenstern begin with probabilities and construct a representation of utility. Thus, although the arrows represent a mathematical relationship of representation, they cannot represent a metaphysical relationship of grounding. The Reality Condition needs to be justified independently of any representation theorem.

E.g. you could also trivially "represent" preferences in terms of utilities by defining x ≻ y iff u(x) > u(y), for an arbitrary real-valued utility function u.

This case isn't mentioned in the table because a representation proof based on it would be too trivial to count as a "theorem" (for example, preferences defined this way are automatically transitive, because utilities are real numbers and the "larger than" relation on the real numbers is transitive).
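
As a toy illustration of why this direction is trivial (my own sketch, not from the linked article): if we simply read the preference relation off an arbitrary utility assignment, properties like transitivity are inherited directly from the ordering of the real numbers.

```python
# Toy sketch (my own, not from the linked article): define a preference
# relation directly from an arbitrary utility assignment. Transitivity is
# inherited "for free" from the ordering of the real numbers, which is why
# a representation proof in this direction is trivial.
utility = {"apple": 2.0, "banana": 1.0, "cherry": 0.5}  # arbitrary example values

def prefers(x, y):
    """x is (weakly) preferred to y iff its utility is at least as large."""
    return utility[x] >= utility[y]

# Transitivity holds automatically: if u(x) >= u(y) and u(y) >= u(z), then u(x) >= u(z).
assert prefers("apple", "banana") and prefers("banana", "cherry") and prefers("apple", "cherry")
```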

If we want to argue about what is more fundamental, we need independent arguments; formal representation relations alone are too arbitrary.

There are indeed a few such arguments. For example, it makes both semantic and psychological sense to interpret "I prefer x to y" as "I want x more than I want y", but it doesn't seem possible to interpret (semantically and psychologically) plausible statements like "I want x much more than I want y" or "I want x about twice as much as I want y" in terms of preferences, or preferences and probabilities. The reason is that the latter force you to interpret utility functions as invariant under the addition of arbitrary constants, which can bring the ratios between utility levels arbitrarily close to 1. So we can interpret preferences as being explained by relations between degrees of desire (strength of wanting), but we can't interpret desires as being explained by preference relations, or by preferences and probabilities together.
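
To make the invariance point concrete, here is a small numerical illustration (my own example, not from the representation-theorem literature): shifting a utility function by a constant leaves every preference ordering intact, but destroys statements about how many times more one thing is wanted than another.

```python
# Small numerical illustration (my own example): adding a constant to a
# utility function leaves all preferences (orderings) intact, but changes
# statements like "I want x twice as much as y".
u = {"x": 2.0, "y": 1.0}
shifted = {k: v + 10.0 for k, v in u.items()}  # same preferences, shifted utilities

print(u["x"] / u["y"])                                # 2.0   -> "twice as much"
print(shifted["x"] / shifted["y"])                    # ~1.09 -> ratio is not preserved
print(u["x"] > u["y"], shifted["x"] > shifted["y"])   # True True -> ordering is preserved
```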

Thanks for this; I've always thought this is a quite fundamental and important issue. I hope Scott Garrabrant chimes in.

Maximizing the geometric expectation makes a lot of sense when we interpret "utility" as measuring wealth or money. Losing all your wealth is obviously much worse than doubling your wealth is good. The geometric expectation accounts for this by making doubling and halving your wealth (at equal odds) cancel out in expectation.
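
A quick numerical check of this (my own sketch): for a 50/50 bet that either doubles or halves your wealth, the arithmetic expectation is positive while the geometric expectation is exactly neutral; and an outcome of zero wealth drags the geometric expectation to zero, which anticipates the point below.

```python
import math

# Quick numerical check (my own sketch): a 50/50 bet that either doubles or
# halves your wealth. The arithmetic expectation says "take it", while the
# geometric expectation is exactly neutral, i.e. doubling and halving cancel out.
wealth = 100.0
outcomes = [2 * wealth, wealth / 2]
probs = [0.5, 0.5]

arithmetic = sum(p * w for p, w in zip(probs, outcomes))        # 125.0
geometric = math.prod(w ** p for p, w in zip(probs, outcomes))  # ~100.0

print(arithmetic, geometric)

# An outcome of zero wealth drags the geometric expectation to zero,
# no matter how good or likely the other outcome is:
print(math.prod(w ** p for p, w in zip([0.5, 0.5], [0.0, 1e9])))  # 0.0
```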

But more often we mean by "utility" the degree of goodness or badness of an outcome ("welfare"), or how strongly we want it to be true or false ("degree of desire"). These values can arguably be both positive and negative. There seems to be no a priori lower bound on the badness of an outcome (or on how strongly we disvalue it), just as there is no upper bound on its goodness (or on how strongly we value it being true).

But the geometric expectation requires that utility is non-negative. Perhaps even positive, as the problems with zero utility show. Usually the geometric mean is only used for positive numbers.

Eric Neyman also made this point a while ago.

So my current take: the geometric expectation seems correct for utility as wealth, while the arithmetic expectation seems correct for utility as welfare, or utility as degree of desire ("values"). Though I haven't yet checked how many of the issues you mention are solved by this.

By the way, you should crosspost this to the EA Forum because of its obvious application to ethics. There should be an option for that in the post options, though perhaps you need to link your EA Forum account first.

One problem with Boltzmann's derivation of the second law of thermodynamics is that it "proves too much": an analogous derivation also says that entropy "increases" in the past direction, not just in the future direction. So we should assume that entropy is at its lowest right now (as you are reading these words), rather than at the beginning. It basically says that the past looked like the future, just mirrored at the present moment, e.g. we grow older in both the past and the future direction. Our memories to the contrary just emerged out of nothing (after we emerged out of a grave), just like we will forget them in the future.

This problem went largely unnoticed for many years, until David Albert pointed it out more explicitly some 20 years ago (though some famous physicists had noticed it earlier, as Barry Loewer, Albert's philosophical partner, points out in an interesting interview with Sean Carroll). To "fix" the issue, we have to add, as an ad hoc assumption, the Past Hypothesis, which simply asserts that the entropy at the beginning of the universe was minimal.

The problem here is that the Past Hypothesis can't be supported by empirical evidence in the way we would naively expect, since its negation predicts that all our records of the past are misleading. So we have to resort to more abstract arguments in its favor, and I haven't seen such an account. David Albert has a short footnote on how assuming a high-entropy past would be "epistemically unstable" (presumably because the entropy being at its lowest "now" is a moving target), but that is far from a precise argument.

From the post:

Suppose that we want to translate between English and an alien language (Klingon). We have plenty of Klingon text, and separately we have plenty of English text, but it’s not matched up and there are no bilingual speakers.

We train GPT on a mix of English and Klingon text and find that it becomes fluent in both. In some sense this model “knows” quite a lot about both Klingon and English, and so it should be able to read a sentence in one language, understand it, and then express the same idea in the other language. But it’s not clear how we could train a translation model.

So he talks about the difficulty of judging whether an unsupervised translation is good: since there are no independent raters who understand both English and Alienese, translations can't be improved with RLHF.

He posted this before OpenAI succeeded in applying RLHF to LLMs. I now think RLHF generally doesn't improve translation ability much anyway compared to prompting a foundation model. Based on what we have seen, it seems generally hard to improve raw LLM abilities with RLHF. Even if RLHF does improve translation relative to good prompting, I would assume that doing RLHF on some known translation pairs (like English and Chinese) would also help for other pairs which weren't mentioned in the RLHF data, e.g. by encouraging the model to mention its uncertainty about the meaning of certain terms when doing translations. Though again, this could likely be achieved with prompting as well.

He also mentions the more general problem of language models not knowing why they believe what they believe. If a model translates X as Y rather than as Z, it can't provide the reasons for its decision (like pointing to specific statistics about the training data), except via post hoc rationalisation / confabulation.

I guess my question would then be whether the translation would work if neither language contained any information on microphysics or advanced math. Would the model be able to translate e.g. "z;0FK(JjjWCxN" into "fruit"?

I think this is almost impossible for humans to do, even with a group of humans and decades of research. Otherwise we wouldn't have needed the Rosetta Stone to read Egyptian hieroglyphs, and we would long since have deciphered the Voynich manuscript.

Interesting reference! So an unsupervised approach from 2017/2018, presumably somewhat primitive by today's standards, already works quite well for English/French translation. This provides some evidence that the (more advanced?) LLM approach, or something similar, would actually work for English/Alienese.

Of course English and French are historically related, and arose on the same planet while being used by the same type of organism. So they are necessarily quite similar in terms of the concepts they encode. English and Alienese would be much more different and harder to translate.

But if it worked, it would mean that sufficiently long messages, with enough effort, basically translate themselves. A spiritual successor to the Pioneer plaque and the Arecibo message, instead of some galaxy-brained hopefully-universally-readable message, would simply consist of several terabytes of human-written text. Smart aliens could use the text to train a self-supervised Earthling/Alienese translation model, and then use this model to translate our text.
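
For concreteness, here is a minimal sketch (my own, with synthetic data) of the embedding-alignment idea behind those 2017/2018 methods: learn word embeddings for each corpus separately, find an orthogonal map that aligns one space with the other, then translate by nearest-neighbour lookup. In the sketch the seed pairs used for the alignment are simply given; in the actual unsupervised methods they have to be induced from the monolingual data alone, e.g. adversarially.

```python
import numpy as np

# Minimal sketch (my own, with synthetic data) of the embedding-alignment idea:
# embeddings for each "language" are built separately, an orthogonal map W
# aligning one space to the other is found via orthogonal Procrustes, and
# translation is nearest-neighbour lookup in the aligned space. The seed pairs
# are given here; fully unsupervised methods induce them (e.g. adversarially).
rng = np.random.default_rng(0)

d, n = 50, 1000
english = rng.normal(size=(n, d))                    # stand-in English word vectors
true_rotation = np.linalg.qr(rng.normal(size=(d, d)))[0]
alien = english @ true_rotation                      # stand-in "Alienese" vectors

seed = np.arange(100)                                # indices of assumed seed pairs
# Orthogonal Procrustes: W = argmin ||english[seed] @ W - alien[seed]||_F
u, _, vt = np.linalg.svd(english[seed].T @ alien[seed])
W = u @ vt

def translate(i):
    """Return the index of the Alienese word closest to English word i after alignment."""
    mapped = english[i] @ W
    return int(np.argmin(np.linalg.norm(alien - mapped, axis=1)))

print(all(translate(i) == i for i in range(200)))    # True if the alignment recovered the rotation
```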
