Inner Alignment: Explain like I'm 12 Edition
(This is an unofficial explanation of Inner Alignment based on the Miri paper Risks from Learned Optimization in Advanced Machine Learning Systems (which is almost identical to the LW sequence) and the Future of Life podcast with Evan Hubinger (Miri/LW). It's meant for anyone who found the sequence too long/challenging/technical to read.)

Note that bold and italics means "this is a new term I'm introducing," whereas underline plus italics is used for emphasis.

What is Inner Alignment?

Let's start with an abridged guide to how Deep Learning works:

1. Choose a problem
2. Decide on a space of possible solutions
3. Find a good solution from that space

If the problem is "find a tool that can look at any image and decide whether or not it contains a cat," then each conceivable set of rules for answering this question (formally, each function from the set of all pixels to the set {yes, no}) defines one solution. We call each such solution a model. The space of possible models is depicted below.

Since that's all possible models, most of them are utter nonsense. Pick a random one, and you're as likely to end up with a car-recognizer as a cat-recognizer – but far more likely to end up with an algorithm that does nothing we can interpret. Note that even the examples I annotated aren't typical – most models would be more complex while still doing nothing related to cats. Nonetheless, somewhere in there is a model that would do a decent job on our problem. In the above, that's the one that says, "I look for cats."

How does ML find such a model? One way that does not work is trying out all of them. That's because the space is too large: it might contain over 10^1,000,000 candidates. Instead, there's this thing called Stochastic Gradient Descent (SGD). Here's how it works: SGD begins with some (probably terrible) model and then proceeds in steps. In each step, it switches to another model that is "close" and hopefully a little better. Eventually, it stops and outputs the most recent model.
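To make that picture a little more concrete, here is a minimal sketch of the SGD loop in Python. It is not how real deep learning libraries implement it: the "model" is a single number w, the toy task (recover the rule y = 3x from noisy examples) and the helper names predict, loss, and gradient are all made up for illustration, not taken from the paper or any library.

```python
import random

# A "model" here is just one number w; its prediction for input x is w * x.
# Real models have millions of parameters, but the step-by-step idea is the same.

def predict(w, x):
    return w * x

def loss(w, x, y):
    """How badly model w does on one example (lower is better)."""
    return (predict(w, x) - y) ** 2

def gradient(w, x, y):
    """Direction in which the loss grows; SGD moves the opposite way."""
    return 2 * (predict(w, x) - y) * x

# Training data generated from the "true" rule y = 3x, plus a little noise.
xs = [random.uniform(-1, 1) for _ in range(200)]
data = [(x, 3 * x + random.gauss(0, 0.1)) for x in xs]

w = random.uniform(-10, 10)   # start with some (probably terrible) model
learning_rate = 0.1

for step in range(1000):
    x, y = random.choice(data)                    # "stochastic": look at one example
    w = w - learning_rate * gradient(w, x, y)     # switch to a nearby, slightly better model

print(f"learned w = {w:.3f} (the data was generated with w = 3)")
```

The loop is the whole story: start somewhere, repeatedly nudge the model toward something nearby that does a bit better on the data, and output whatever you end up with.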

