Charlie Steiner

If you want to chat, message me!

LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.

Sequences

Alignment Hot Take Advent Calendar
Reducing Goodhart
Philosophy Corner

Comments

I read Fei-Fei Li's autobiographical book. I give it an 'ImageNet wasn't really an adventure story, so you'd better be interested in it intrinsically and also want to hear about the rest of Fei-Fei Li's life story' out of 5.

She's somewhat coy about military uses, how we're supposed to deal with negative social impacts, and anything related to superhuman AI. I can only point to the main vibe, which is 'academic research pointing out problems is vital, I sure hope everything works out after that.'

What I'm going to say is that I really do mean phenomenal consciousness. The person who turns off the alarm not realizing it's an alarm, poking at the loud thing without understanding it, is already so different from my waking self. And those are just the ones that I remember - the shape of the middle of the distribution implies the existence of an unremembered tail of the distribution.

If I'm sleeping dreamlessly, and take a reflexive action such as getting goosebumps, am I having a kinesthetic experience? If you say yes here, then perhaps there is no mystery and you just use 'experience' idiosyncratically.

But are you having a raw experience of looking at this image? The answer to this question is not up to interpretation in the same way. You can’t be wrong about the claim “you are having a visual experience”.

Sometimes when I set an alarm, I turn it off and go back to sleep (oops!). Usually I remember what happened, and I have a fairly wide range of mental states in these memories - typically I am aware that it's an alarm, and turn it off more or less understanding what's going on, even if I'm not always making a rational decision. Rarely, I don't understand that it's an alarm at all, and afterwards just remember that in the early morning I was fumbling with some object that made noise. And a similar fraction of the time, I don't remember turning off the alarm at all! I wonder what kind of processes animate me during those times.

Suppose turning off my alarm involved pressing a button labeled 'I am having conscious experience.' I think that whether this would be truth or lie, in those cases I have forgotten, would absolutely be up to interpretation.

If you disagree, and think that there's some single correct criterion for whether I'm conscious or not when the button gets pressed, but you can't tell me what it is and don't have a standard of evidence for how to find it, then I'm not sure how much you actually disagree.

Euan seems to be using the phrase to mean (something like) causal closure (as the phrase would normally be used, e.g. in talking about physicalism) of the upper level of description - basically saying that everything that actually happens makes sense in terms of the emergent theory; it doesn't need interventions from outside or below.

Nah, it's about formalizing "you can just think about neurons, you don't have to simulate individual atoms." Which raises the question "don't have to for what purpose?", and causal closure answers "for literally perfect simulation."

Causal closure is impossible for essentially every interesting system, including classical computers (my laptop currently has a wiring problem that definitely affects its behavior despite not being the sort of thing anyone would include in an abstract model).

Are there any measures of approximate simulation that you think are useful here? Computer science and nonlinear dynamics probably have some.
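For concreteness, here's a minimal sketch of the sort of measure I have in mind (the toy dynamics and thresholds are mine, purely illustrative): run a fine-grained system and the coarse-grained model that abstracts it side by side, and score the abstraction by how wrong it is on average and by how long it stays within tolerance.

```python
import numpy as np

# Toy "fine-grained" system: many noisy units whose mean roughly follows
# a logistic map. The "coarse-grained" model tracks only the mean.
# (A made-up illustration, not a model of any real system.)
rng = np.random.default_rng(0)

def fine_step(units, r=3.7, noise=1e-3):
    # Each unit follows the logistic map plus small idiosyncratic noise.
    units = r * units * (1 - units) + noise * rng.standard_normal(units.shape)
    return np.clip(units, 0.0, 1.0)

def coarse_step(mean, r=3.7):
    # The abstraction: pretend the mean itself follows the logistic map.
    return r * mean * (1 - mean)

units = rng.uniform(0.4, 0.6, size=1000)
mean_pred = units.mean()

errors = []
for t in range(50):
    units = fine_step(units)
    mean_pred = coarse_step(mean_pred)
    errors.append(abs(units.mean() - mean_pred))

# Two simple "approximate simulation" scores:
rmse = np.sqrt(np.mean(np.square(errors)))                                   # average error
horizon = next((t for t, e in enumerate(errors) if e > 0.05), len(errors))   # steps within tolerance
print(f"RMSE over 50 steps: {rmse:.4f}")
print(f"steps until error exceeds 0.05: {horizon}")
```

The first score is the computer-science-flavored answer ('how wrong is the abstraction on average?'); the second is the nonlinear-dynamics-flavored one ('for how long is the abstraction good enough?'). Neither requires the causal-closure standard of literally perfect simulation.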

I think it's possible to be better than humans currently are at Minecraft; I can say more if this sounds wrong.

Yeah, that's true. The obvious way is that you could have optimized micro, but that's kinda boring. What I mean is more like generalization to new activities for humans to do in Minecraft that humans would find fun, which would be a different kind of 'better at Minecraft.'

[what do you mean by preference conflict?]

I mean it in a way where the preferences are modeled a little better than just "the literal interpretation of this one sentence conflicts with the literal interpretation of this other sentence." Sometimes humans appear to act according to fairly straightforward models of goal-directed action. However, the precise model, and the precise goals, may be different at different times (or with different modeling hyperparameters, and of course across different people) - and if you tried to model the human well at all the different times, you'd get a model that looked like physiology and lost the straightforward talk of goals/preferences.

Resolving preference conflicts is the process of stitching together larger preferences out of smaller preferences, without changing type signature. The reason literally-interpreted-sentences doesn't really count is because interpreting them literally is using a smaller model than necessary - you can find a broader explanation for the human's behavior in context that still comfortably talks about goals/preferences.

Fair enough.

Yes, it seems totally reasonable for bounded reasoners to consider hypotheses (where a hypothesis like 'the universe is as it would be from the perspective of prisoner #3' functions like treating prisoner #3 as 'an instance of me') that would be counterfactual or even counterlogical for more idealized reasoners.

Typical bounded reasoning weirdness is stuff like seeming to take some counterlogicals (e.g. different hypotheses about the trillionth digit of pi) seriously despite denying 1+1=3, even though there's a chain of logic connecting one to the other. Projecting this into anthropics, you might have a certain systematic bias about which hypotheses you can consider, and yet deny that that systematic bias is valid when presented with it abstractly.

This seems like it makes drawing general lessons about what counts as 'an instance of me' from the fact that I'm a bounded reasoner pretty fraught.

I think it doesn't actually work for the repugnant conclusion - the buttons are supposed to just purely be to the good, and not have to deal with tradeoffs.

Once you start having to deal with tradeoffs, then you get into the aesthetics of population ethics - maybe you want each planet in the galaxy to have a vibrant civilization of happy humans, but past that, more happy humans just seems a bit gauche - i.e. there is some point past which the raw marginal utility of cramming more humans into the universe is negative. And a button promising an existing human extra life might be offered, but these humans are all immortal if they want to be anyhow, and their lives are so good it's hard to identify one-size-fits-all benefits one could even in theory supply via button without violating any conservation laws.

All of this is a totally reasonable way to want the future of the universe to be arranged, incompatible with the repugnant conclusion. And still compatible with rejecting the person-affecting view, and pressing the offered buttons in our current circumstances.

Answer by Charlie Steiner

Suppose there are a hundred copies of you, in different cells. At random, one will be selected - that one is going to be shot tomorrow. A guard notifies that one that they're going to be shot.

There is a mercy offered, though - there's a memory-eraser-ray handy. The one who knows they're going to be shot is given the option to erase their memory of the warning and everything that followed, putting them in the same information state, more or less, as any of the other copies.

"Of course!" they cry. "Erase my memory, and I could be any of them - why, when you shoot someone tomorrow, there's a 99% chance it won't even be me!"

Then the next day comes, and they get shot.
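For what it's worth, here's a quick Monte Carlo of the setup (my own illustrative code) separating the two probabilities the copy is running together: a randomly chosen copy gets shot 1% of the time, but the copy that was warned gets shot every time, and erasing the memory doesn't change which copy was selected.

```python
import random

N_COPIES = 100
TRIALS = 100_000

warned_copy_shot = 0
random_copy_shot = 0

for _ in range(TRIALS):
    victim = random.randrange(N_COPIES)     # the copy selected to be shot tomorrow
    warned = victim                         # the guard warns exactly that copy,
                                            # who then erases the memory of the warning
    bystander = random.randrange(N_COPIES)  # some copy considered "at random"

    warned_copy_shot += (warned == victim)      # forgetting doesn't undo the selection
    random_copy_shot += (bystander == victim)

print(f"P(shot | you are the copy that was warned): {warned_copy_shot / TRIALS:.3f}")  # 1.000
print(f"P(shot | you are a randomly chosen copy):   {random_copy_shot / TRIALS:.3f}")  # ~0.010
```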
