Charlie Steiner

If you want to chat, message me!

LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.

Sequences

Alignment Hot Take Advent Calendar
Reducing Goodhart
Philosophy Corner

Comments

I dunno, I think you can generalize reward farther than behavior. E.g. I might very reasonably issue high reward for winning a game of chess, or arriving at my destination safe and sound, or curing malaria, even if each involved intermediate steps that don't make sense as 'things I might do.'

I do agree there are limits to how much extrapolation we actually want, I just think there's a lot of headroom for AIs to achieve 'normal' ends via 'abnormal' means.

This safety plan seems like it works right up until you want to use an AI to do something you wouldn't be able to do.

If you want a superhuman AI to do good things and not bad things, you'll need a more direct operationalization of good and bad.

Temperature 0 is also sometimes a convenient mathematical environment for proving properties of Solomonoff induction, as in Li and Vitanyi (pdf of textbook).
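For concreteness, here is a minimal sketch (my own framing, not anything from Li and Vitanyi specifically) of what temperature does to a generic predictive distribution p; the temperature-0 limit just picks the single most probable continuation, which is the deterministic case that is convenient to prove things about:

```latex
% Temperature-T reweighting of a predictive distribution p over next symbols x,
% and its T -> 0 limit (assuming a unique maximizer):
p_T(x) = \frac{p(x)^{1/T}}{\sum_{x'} p(x')^{1/T}},
\qquad
\lim_{T \to 0} p_T(x) =
\begin{cases}
1 & \text{if } x = \arg\max_{x'} p(x'),\\
0 & \text{otherwise.}
\end{cases}
```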

I read Fei-Fei Li's autobiographical book. I give it an 'ImageNet wasn't really an adventure story, so you'd better be interested in it intrinsically and also want to hear about the rest of Fei-Fei Li's life story' out of 5.

She's somewhat coy about military uses, how we're supposed to deal with negative social impacts, and anything related to superhuman AI. I can only point to the main vibe, which is 'academic research pointing out problems is vital, I sure hope everything works out after that.'

What I'm going to say is that I really do mean phenomenal consciousness. The person who turns off the alarm not realizing it's an alarm, poking at the loud thing without understanding it, is already so different from my waking self. And those are just the ones that I remember - the shape of the middle of the distribution implies the existence of an unremembered tail.

If I'm sleeping dreamlessly, and take a reflexive action such as getting goosebumps, am I having a kinesthetic experience? If you say yes here, then perhaps there is no mystery and you just use 'experience' idiosyncratically.

But are you having a raw experience of looking at this image? The answer to this question is not up to interpretation in the same way. You can’t be wrong about the claim “you are having a visual experience”.

Sometimes when I set an alarm, I turn it off and go back to sleep (oops!). Usually I remember what happened, and I have a fairly wide range of mental states in these memories - typically I am aware that it's an alarm, and turn it off more or less understanding what's going on, even if I'm not always making a rational decision. Rarely, I don't understand that it's an alarm at all, and afterwards just remember that in the early morning I was fumbling with some object that made noise. And a similar fraction of the time, I don't remember turning off the alarm at all! I wonder what kind of processes animate me during those times.

Suppose turning off my alarm involved pressing a button labeled 'I am having conscious experience.' I think that whether this would be truth or lie, in those cases I have forgotten, would absolutely be up to interpretation.

If you disagree, and think that there's some single correct criterion for whether I'm conscious or not when the button gets pressed, but you can't tell me what it is and don't have a standard of evidence for how to find it, then I'm not sure how much you actually disagree.

Euan seems to be using the phrase to mean (something like) causal closure of the upper level of description (as the phrase would normally be used, e.g., in talking about physicalism) - basically saying that everything that actually happens makes sense in terms of the emergent theory; it doesn't need interventions from outside or below.

Nah, it's about formalizing "you can just think about neurons, you don't have to simulate individual atoms." Which raises the question "don't have to for what purpose?", and causal closure answers "for literally perfect simulation."

Causal closure is impossible for essentially every interesting system, including classical computers (my laptop currently has a wiring problem that definitely affects its behavior despite not being the sort of thing anyone would include in an abstract model).

Are there any measures of approximate simulation that you think are useful here? Computer science and nonlinear dynamics probably have some.
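As a purely illustrative sketch (toy dynamics and a made-up coarse-graining on my part, not a proposed formalism), one simple measure is the worst-case divergence between a fine-grained simulation and the coarse-grained model over some horizon:

```python
import numpy as np

def fine_step(state, dt=0.01):
    # Toy "microscopic" dynamics: a slow variable x weakly coupled to a fast variable y.
    x, y = state
    return np.array([x + dt * (-x + 0.1 * y),
                     y + dt * (-10.0 * y + x)])

def coarse_step(x, dt=0.01):
    # Coarse-grained model that ignores y entirely.
    return x + dt * (-x)

def approximation_error(x0, y0, horizon=1000):
    # Sup-norm gap between the full simulation's x-trajectory and the coarse
    # model's prediction: one crude way to quantify "approximately causally closed."
    state = np.array([x0, y0])
    x_coarse = x0
    worst = 0.0
    for _ in range(horizon):
        state = fine_step(state)
        x_coarse = coarse_step(x_coarse)
        worst = max(worst, abs(state[0] - x_coarse))
    return worst

print(approximation_error(x0=0.5, y0=1.0))
```

The basic shape is: pick a readout at the coarse level and ask how far the fine-grained dynamics can push it away from the coarse model's prediction.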

I think it's possible to be better than humans currently are at Minecraft - I can say more if this sounds wrong.

Yeah, that's true. The obvious way is optimized micro, but that's kinda boring. What I mean is more like generalizing to new activities in Minecraft that humans would find fun, which would be a different kind of 'better at Minecraft.'

[what do you mean by preference conflict?]

I mean it in a way where the preferences are modeled a little better than just "the literal interpretation of this one sentence conflicts with the literal interpretation of this other sentence." Sometimes humans appear to act according to fairly straightforward models of goal-directed action. However, the precise model, and the precise goals, may be different at different times (or with different modeling hyperparameters, and of course across different people) - and if you tried to model the human well at all those different times, you'd get a model that looked like physiology and lost the straightforward talk of goals/preferences.

Resolving preference conflicts is the process of stitching together larger preferences out of smaller preferences, without changing type signature. The reason literally-interpreted-sentences doesn't really count is that interpreting them literally uses a smaller model than necessary - you can find a broader explanation for the human's behavior in context that still comfortably talks about goals/preferences.
