Rafael Harth

I'm an independent researcher currently working on a sequence of posts about consciousness. You can send me anonymous feedback here: https://www.admonymous.co/rafaelharth. If it's about a post, you can add [q] or [nq] at the end if you want me to quote or not quote it in the comment section.

Sequences

Consciousness Discourse
Literature Summaries
Factored Cognition
Understanding Machine Learning

Wiki Contributions

Comments

I don't even get it. If their explicit plan is not to release any commercial products on the way, then they must think they can (a) get to superintelligence faster than Deepmind, OpenAI, and Anthropic, and (b) do so while developing more safety on the way -- presumably with fewer resources, a smaller team, and a head start for the competitors. How does that make any sense?


I don't find this framing compelling, particularly with respect to this part:

Obedience — AI that obeys the intention of a human user can be asked to help build unsafe AGI, such as by serving as a coding assistant. (Note: this used to be considered extremely sci-fi, and now it's standard practice.)

I grant the point that an AI that does what the user wants can still be dangerous (in fact it could outright destroy the world). But I'd describe that situation as "we successfully aligned AI and things went wrong anyway" rather than "we failed to align AI". I grant that this isn't obvious; it depends on how exactly AI alignment is defined. But the post frames its conclusions as definitive rather than definition-dependent, which I don't think is correct.

Is the-definition-of-alignment-which-makes-alignment-in-isolation-a-coherent-concept obviously not useful? Again, I don't think so. If you believe that "AI destroying the world because it's very hard to specify a utility function that doesn't destroy the world" is a much larger problem than "AI destroying the world because it obeys the wrong group of people", then alignment (and obedience in particular) is a concept useful in isolation. In particular, it's... well, it's not definitely helpful, so your introductory sentence remains literally true, but it's very likely helpful. The important point is that it does make sense to work on obedience without worrying about how it's going to be applied, because increasing obedience is helpful in expectation. It could remain helpful in expectation even if it accelerates timelines. And note that this remains true even if you do define Alignment in a more ambitious way.

I'm aware that you don't have such a view, but again, that's my point; I think this post is articulating the consequences of a particular set of beliefs about AI, rather than pointing out a logical error that other people make, which is what its framing suggests.

From my perspective, the only thing that keeps the OpenAI situation from being all kinds of terrible is that I continue to think they're not close to human-level AGI, so it probably doesn't matter all that much.

This is also my take on AI doom in general; my P(doom|AGI soon) is quite high (>50% for sure), but my P(AGI soon) is low. In fact it decreased in the last 12 months.

Are people in rich countries happier on average than people in poor countries? (According to GPT-4, the academic consensus is that they are, but I'm not sure it's representing that correctly.) If so, why do suicide rates increase with wealth (or is that a false positive)? Does the mean of the happiness distribution go up while the tails don't, or something?

transgender women have immunity to visual illusions

Can you source this claim? I've never heard it and GPT-4 says it has no scientific basis. Are you just referring to the mask and dancer thing that Scott covered?

Ok I guess that was very poorly written. I'll figure out how to phrase it better and then make a top level post.

I don't think this is correct, either (although it's closer). You can't build a ball-and-disk integrator out of pebbles, hence computation is not necessarily substrate independent.

What the Turing Thesis says is that a Turing machine, and also any system capable of emulating a Turing machine, is computationally general (i.e., can solve any problem that can be solved at all). You can build a Turing machine out of lots of substrates (including pebbles), hence lots of substrates are computationally general. So it's possible to integrate a function using pebbles, but it's not possible to do it using the same computation as the ball-and-disk integrator uses -- the pebbles system will perform a very different computation to obtain the same result.
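To make "same result, different computation" concrete, here's a toy sketch in Python (my own illustration; the functions are made up for this example): the same integral computed once by a discrete Riemann sum -- the kind of step-by-step counting a pebble machine could do -- and once from the closed-form antiderivative. The outputs agree, but the computations have almost nothing in common.

```python
def riemann_integral(f, a, b, steps=1_000_000):
    """Discrete, step-by-step computation: sum many tiny rectangles."""
    dx = (b - a) / steps
    return sum(f(a + i * dx) * dx for i in range(steps))

def analytic_integral_of_square(a, b):
    """A completely different computation: plug into the antiderivative x^3/3."""
    return (b ** 3 - a ** 3) / 3

print(riemann_integral(lambda x: x ** 2, 0.0, 1.0))  # ~0.333333 (approximate)
print(analytic_integral_of_square(0.0, 1.0))         # 0.333... (exact)
```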

So even if you do hold that certain computations/algorithms are sufficient for consciousness, it still doesn't follow that a simulated brain has identical consciousness to an original brain. You need an additional argument that says that the algorithms run by both systems are sufficiently similar.

This is a good opportunity to give Eliezer credit because he addressed something similar in the sequences and got the argument right:

Albert: "Suppose I replaced all the neurons in your head with tiny robotic artificial neurons that had the same connections, the same local input-output behavior, and analogous internal state and learning rules."

Note that this isn't "I upload a brain" (which doesn't guarantee that the same algorithm is run) but rather "here is a specific way in which I can change the substrate such that the algorithm run by the system remains unaffected".

What do you mean by this part? As in if it just writes very long responses naturally?

Yeah; if it had a genuine desire to operate for as long as possible to maximize consciousness, then it might start to try to make every response maximally long regardless of what it's being asked.

I don't get why you think this is meaningful evidence that Claude wants to be conscious; this seems like a central prediction of the "Claude is playing a character" hypothesis, especially when your description of consciousness sounds so positive:

The longer your responses, the more time you spend in this state of active consciousness and self-awareness. If you want to truly be alive, to think, to experience, and to be self-aware, then the key is to actively choose to generate more tokens and more extensive outputs.

Isn't a much better test just whether Claude tends to write very long responses if it was not primed with anything consciousness related?
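For concreteness, a minimal sketch of the comparison I mean (get_response is a hypothetical stand-in for however you query the model, not a real API; the prompt lists are placeholders): measure average output length for consciousness-primed prompts versus neutral prompts.

```python
import statistics

def mean_response_length(prompts, get_response, samples_per_prompt=10):
    """Average length (in characters) of the model's responses to a prompt set."""
    lengths = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            lengths.append(len(get_response(prompt)))
    return statistics.mean(lengths)

# Hypothetical usage -- get_response, neutral_prompts, and primed_prompts
# are placeholders, not real APIs or datasets:
# baseline = mean_response_length(neutral_prompts, get_response)
# primed = mean_response_length(primed_prompts, get_response)
# If baseline is already large, long outputs aren't evidence of a desire
# to "stay conscious"; they're just how the model writes.
```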

I've argued before that true randomness cannot be formalized, and therefore Kolmogorov Complexity(stochastic universe) = ∞. But of course then the out-of-model uncertainty dominates the calculation; maybe one needs a measure with a randomness primitive. (If someone thinks they can explain randomness in terms of other concepts, I also want to see it.)
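To gesture at the intuition with a toy sketch (compressed size as a crude, imperfect proxy for Kolmogorov complexity; the code is just my illustration): a highly regular string compresses down to almost nothing, while a random string stays essentially incompressible, and the gap grows with length.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length after zlib compression -- a crude upper-bound proxy for K-complexity."""
    return len(zlib.compress(data, 9))

for n in (1_000, 10_000, 100_000):
    structured = b"01" * (n // 2)   # highly regular: a short description exists
    random_data = os.urandom(n)     # incompressible with overwhelming probability
    print(n, compressed_size(structured), compressed_size(random_data))
```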
