I think you’re lumping “the ultimate goal” and “the primary mode of thinking required to achieve the ultimate goal” together erroneously. (But maybe the hypothetical person you’re devilishly advocating for doesn’t agree about utilitarianism and instrumentality?)

Re also also: the Reverse Streetlight effect will probably come into play. It’ll optimize not just for early deception, but for any kind of deception we can’t detect.

You’re saying that on priors, the humans are manipulative?

What do you mean by “you don’t grapple with the hard problem of consciousness”? (Is this just an abstruse way of saying “no, you’re wrong” to set up the following description of how I’m wrong? In that case, I’m not sure you have a leg to stand on when you say that I use “a lot of words”.) Edit: to be a bit more charitable, maybe it means “my model has elements that my model of your model doesn’t model”.

How can you know I see the same thing that you do? That depends on what you mean by “same”. To me, to talk about whether things are the same, we need to specify what characteristics we care about, or what category system we’re using. I know what it means for two animals to be of the same species, and what it means for two people to have the same parent. But for any two things to be the same, period, doesn’t really mean anything on its own. (You could argue that everything is the same as itself, but that’s a trivial case.)

This might seem like I’m saying that there isn’t any fundamental truth, only many ways of splitting the world up into categories. Not exactly. I don’t think there’s any fundamental truth to categories. There might be fundamental monads, or something like that, but human subjective experiences are definitely not fundamental. (And what truths can even be said of a stateless monad when considered on its own?)

For question 2, I think the human-initiated nature of AI risk could partially explain the small distance between ability and need. If we were completely incapable of working as a civilization, other civilizations might be a threat, but we wouldn’t have any AIs of our own, let alone general AIs.

I can’t tell if you already know this, but “infinite explanatory power” is equivalent to no real explanatory power. If it assigns equal probability to everything then nothing can be evidence in favor of it, and so on.
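A minimal sketch of that point, under one stated assumption: "assigns equal probability to everything" is read here as "gives each observation the same likelihood that the alternatives give it". That reading is an assumption, not something the comment spells out. On that reading the likelihood ratio is 1, so Bayes' rule returns the prior unchanged. The function name `posterior` and all numbers are illustrative.

```python
def posterior(prior, likelihood_h, likelihood_not_h):
    """Bayes' rule for a hypothesis H against its negation.

    prior            -- P(H) before the observation
    likelihood_h     -- P(observation | H)
    likelihood_not_h -- P(observation | not H)
    """
    numerator = prior * likelihood_h
    denominator = numerator + (1.0 - prior) * likelihood_not_h
    return numerator / denominator


# A hypothesis that "explains" the observation exactly as well as its
# negation does: the likelihood ratio is 1, so no update happens.
unmoved = posterior(prior=0.3, likelihood_h=0.25, likelihood_not_h=0.25)

# A hypothesis that sticks its neck out and gets it right: it gains.
rewarded = posterior(prior=0.3, likelihood_h=0.8, likelihood_not_h=0.2)
```

With these made-up numbers, `unmoved` stays at the prior while `rewarded` rises above it. A hypothesis can only gain from evidence it would have been less compatible with had things gone otherwise.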

I'd assume the opposite, since I don't think physicists (and other thermodynamic scientists like some chemists) make up a majority of LW readers, but it's irrelevant. I can (and did) put both forms side-by-side to allow both physicists and non-physicists to better understand the magnitude of the temperature difference. (And since laymen are more likely to skim over the number and ignore the letter, it's disproportionately more important to include Fahrenheit.)

Edit: wait, delta-K is equivalent to delta-C. In that case, since metric users (not just physicists) might make up the majority of LW readers, you're probably right about the number of users.

I think a "subjective experience" (edit: in the sense that two people can have the same subjective experience; not a particular instantiation of one) is just a particular (edit: category in a) categorization of possible experiences, defined by grouping together experiences that put the [person] into similar states (under some metric of "similar" that we care about). This recovers the ability to talk about "lies about subjective experiences" within a physicalist worldview.

In this case, we could look at how the AI internally changes in response to various stimuli, and group the stimuli on the basis of similar induced states. If this grouping doesn't match its claims at all, then we can conclude that it is perhaps lying. (See: cleaving reality at its joints.) EDIT: Were you saying that AI cannot have subjective experience? Then I think this points at the crux; see my statements below about how I don't see human subjectivity as fundamentally special.
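A rough sketch of that check, with loud assumptions: internal states are stood in for by small numeric vectors, "similar" is Euclidean distance under a hand-picked threshold, and agreement between the induced grouping and the claimed grouping is scored by the fraction of stimulus pairs on which the two agree (a Rand-index-style score). The stimulus names, vectors, threshold, and both function names are hypothetical. Nothing here is a real interpretability method.

```python
import math
from itertools import combinations


def group_by_state(induced_states, threshold):
    """Greedily group stimuli whose induced states lie within `threshold`
    of a group's first member (a deliberately crude similarity metric)."""
    groups = []  # each group is a list; its first element is the representative
    for name, state in induced_states.items():
        for group in groups:
            if math.dist(state, induced_states[group[0]]) <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return [set(g) for g in groups]


def pairwise_agreement(grouping_a, grouping_b):
    """Fraction of stimulus pairs that both groupings treat the same way
    (both put the pair together, or both keep it apart)."""
    def labels(grouping):
        return {name: i for i, group in enumerate(grouping) for name in group}

    a, b = labels(grouping_a), labels(grouping_b)
    pairs = list(combinations(sorted(a), 2))
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


# Hypothetical measurements: stimulus -> induced internal state.
induced = {
    "red_light": (1.0, 0.0),
    "crimson_light": (0.9, 0.1),
    "loud_noise": (0.0, 1.0),
}
induced_grouping = group_by_state(induced, threshold=0.5)

# Hypothetical self-report: which stimuli the system *says* feel alike.
claimed_grouping = [{"red_light"}, {"crimson_light", "loud_noise"}]

score = pairwise_agreement(induced_grouping, claimed_grouping)
```

A score near 1 means the claims track the induced states; a low score is the "doesn't match" case. Both the metric and the threshold carry the whole judgment of what counts as "similar", which is the part we have to choose.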

Yes, this means that we can talk about any physical thing having a "subjective experience". This is not a bug. The special thing about animals is that they have significant variance between different "subjective experiences", whereas a rock will react very similarly to any stimuli that don't break or boil it. Humans are different because they have very high meta-subjectivity and the ability to encode their "subjective experiences" into language. However, this still doesn't match up very well to human intuitions: any sort of database or measurement device can be said to have significant "subjective experiences". But my goal isn't to describe human intuitions; it's to describe the same thing that human intuitions describe. Human subjectivity doesn't seem to be fundamentally different from that of any other physical system.

He never said "will land heads", though. He just said "a flipped coin has a chance of landing heads", which is not a timeful statement. EDIT: no longer confident that this is the case

Didn't the post already counter your second paragraph? The subjective interpretation can be a superset of the propensity interpretation.

When you say "all days similar to this one", are you talking about all real days or all possible days? If it's "all possible days", then this seems like summing over the measures of all possible worlds compatible with both your experiences and the hypothesis, and dividing by the sum of the measures of all possible worlds compatible with your experiences. (Under this interpretation, jessicata's response doesn't make much sense; "similar to" means "observationally equivalent for observers with as much information as I have", and doesn't have a free variable.)
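The "all possible days" reading can be sketched as a ratio of summed measures. This is a toy illustration only: the worlds, their measures, and the function name `probability_given_experience` are made up, and a real measure over possible worlds would not come as a short list.

```python
def probability_given_experience(worlds):
    """worlds: iterable of (measure, fits_experience, hypothesis_true).

    Returns the measure of worlds compatible with both the experiences and
    the hypothesis, divided by the measure of worlds compatible with the
    experiences.
    """
    worlds = list(worlds)
    compatible = sum(m for m, fits, _ in worlds if fits)
    both = sum(m for m, fits, hyp in worlds if fits and hyp)
    if compatible == 0:
        raise ValueError("no world is compatible with the experiences")
    return both / compatible


# Hypothetical worlds: (measure, fits my experiences?, hypothesis true?)
toy_worlds = [
    (0.50, True, True),
    (0.25, True, False),
    (0.25, False, True),  # ruled out by experience, so it drops out entirely
]
p = probability_given_experience(toy_worlds)
```

The third world has the hypothesis true but contributes nothing, because "similar to this one" has already been fixed to mean "observationally equivalent given what I know".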
