LESSWRONG

OscarGilg
Karma: 12130

Interested in AI & Consciousness. Ex quant trader.

@gilg_oscar



 

Comments
Building Conscious* AI: An Illusionist Case
OscarGilg · 1h · 30

Thanks for the comment! I had to have a think but here's my response:

The first thing is that I maybe wasn't clear about the scope of the comparison. It was just to say "whiteness of light is an illusion in roughly the same sense that phenomenal consciousness is" (as opposed to other definitions of illusion).

Even then, what differentiates these illusions from other abstractions? Obviously not all abstractions are illusions.

Take our (functional) concept of heat. In some sense it's an abstraction, and it doesn't quite work the way people thought a thousand years ago. But crucially, there exists a real-world process which maps onto our folk concept extremely nicely, such that the folk concept remains useful and tracks something real. Unlike with phenomenal consciousness, it just so happens that we evolved our concept of heat without attributing too many weird properties to it. Once we developed models of molecular kinetic energy, we could just plug them right in.

Where I think you might have a point is that this is arguably not a binary distinction: some concepts are clearly confused and others clearly not, but in some cases it might be blurry (and consciousness might be one of those, I'm not sure).

I don't think this is generally what the illusionists mean, my understanding is that it is more about phenomenal consciousness being non-representational—meaning something like that it has the type signature of a world-model without actually being a model of anything real (including itself).

I think most illusionists believe consciousness involves real representations, but ones that are systematic misrepresentations. The cognitive processes are genuinely representing something (our cognitive states), but they attribute phenomenal properties that don't actually exist in those states. That's quite different from it being non-representational and not a model of anything.

At least that's my understanding, which comes from the Daniel Dennett/Keith Frankish views. I'd be interested in learning about others.

Reply
Building Conscious* AI: An Illusionist Case
OscarGilg · 3h · 10

Thanks for the comment and the kind words!

It seems to me however that it is just stated as fact that “phenomenal experiences are nothing more than illusions”.

I think the disconnect for me is that I equate consciousness to “being” which, in Eastern Philosophy, has some extrinsic properties (which are phenomenal).

I'm no expert in Eastern Philosophy conceptions of consciousness; I've been meaning to dig into it but haven't gotten around to it.

What I would say is this: for any phenomenal property attributed to consciousness (e.g. extrinsic ones), you can formulate an illusionist theory of it. You can be an illusionist about many things in the world (not always rightly).

The debunking argument might have to be tweaked, e.g. it might not be about "intuitions", and of course you could reject this kind of argument. Personally I would expect it to also be quite strong across the "phenomenal" range. I would be very happy to see some (counter-)examples!

Initially I agreed with this because I thought you meant “a correct explanation of our intuitions about consciousness” in a partial sense — i.e. not a comprehensive explanation. This is then used to “debunk consciousness”.

It seems to me that we can talk about components of conscious experience without needing to reach a holistic definition, and then we might still be able to discuss Consciousness* as the components of conscious experience minus phenomena. Maybe this matches what you’re saying?

I guess this sounds a bit like weak illusionism, where phenomenal consciousness exists but some of our intuitions about it are wrong? We would indeed also be able to discuss consciousness* (with the asterisk), but we'd run into other problems, and I don't think the argument about moral intuitions would be nearly as strong. Weak illusionism basically collapses to realism. It would still point to consciousness* being more cognitively important, so many of the points would be preserved. Let me know if this isn't what you meant.

Reply
Open problems in emergent misalignment
OscarGilg · 2mo · 30

Some ideas related to the Training data and Non-misalignment categories:

Maybe we should investigate potential “Emergent Situational Awareness”: do models acquire broad situational-awareness capabilities from fine-tuning on narrow situational-awareness tasks?

Building on that, I wonder whether combining the insecure-code fine-tuning dataset with targeted situational-awareness tasks (e.g. from the Situational Awareness Dataset) would lead to higher rates of EM? How about in the insecure-code with backdoors case from the original EM paper?

It feels important to understand the full generalisation pathway that might take us from a few bad examples in fine-tuning datasets to broad, full-on scheming. That includes both learning “how to be misaligned” and when/how to act on that knowledge (and maybe other factors too).

Are these directions worth exploring? Is there any ongoing work that resembles this?

Reply
Posts

9 · Building Conscious* AI: An Illusionist Case · 6h · 4