Ben Smith - LessWrong

I found >800 orthogonal "write code" steering vectors

In the human brain there is quite a lot of redundancy of information encoding. This could be for a variety of reasons.

Here's one hot take: In a brain and a language model, I can imagine that during early learning, the network hasn't learned concepts like "how to code" well enough to recognize that each training instance is an instance of the same thing. Consequently, during that early learning stage, the model just encodes a variety of representations for what turns out to be the same thing. 800 vector encodings in, it starts to match each subsequent training example to prior examples and can encode the information more efficiently.

Then adding multiple vectors triggers a refusal just because the "code for making a bomb" sign gets amplified and more easily triggers the RLHF-derived circuit for "refuse to answer".

AI #55: Keep Clauding Along

Ben Smith 4mo 10

Have you tried asking Claude to summarize it for you?

The Aspiring Rationalist Congregation

Ben Smith 6mo 40

For me the issue is that

(1) it isn't clear how you could enforce attendance, or
(2) what value individual attendees could get that would make it worth their while to attend regularly.

(2) is sort of a collective action/game theoretic/coordination problem.

(1) reflects the rationalist nature of the organization.

Traditional religions back up attendance by divine command. They teach absolutist, divine command theoretic accounts of morality, backed up by accounts of commands from God to attend regularly. At the most severe mode these are backed by threat of eternal hellfire for disobedience. But it doesn't usually come to that. The moralization of the attendance norm is strong enough to justify moderate amounts of social pressure to conform to it. Often that's enough.

In a rationalist congregation, if you want a regular attendance norm, you have to ground it in a rational understanding that adhering to the norm makes the organization work. I think that might work, but it's probably a lot harder because it requires a lot more cognitive steps to get to and it only works so long as attendees buy into the goal of contributing to the project for its own sake.

Sentience, Sapience, Consciousness & Self-Awareness: Defining Complex Terms

Ben Smith 7mo 10

I tried a similar Venn diagram approach more recently. I didn't really distinguish between bare "consciousness" and "sentience". I'm still not sure if I agree "aware without thoughts and feelings" is meaningful. I think awareness might always be awareness of something. But nevertheless they are at least distinct concepts and they can be conceptually separated! Otherwise my model echoes the one you created earlier.

https://www.lesswrong.com/posts/W5bP5HDLY4deLgrpb/the-intelligence-sentience-orthogonality-thesis

I think it's a really interesting question as to whether you can have sentience and sapience but not self-awareness. I wouldn't take a view either way. I sort of speculated that perhaps primitive animals like shrimp might fit into that category.

Book Review: Going Infinite

Ben Smith 9mo 10

If Ray eventually found that the money was "still there", doesn't this make Sam right that "the money was really all there, or close to it" and "if he hadn’t declared bankruptcy it would all have worked out"?

Ray kept searching, Ray kept finding.

That would raise the amount collected to $9.3 billion—even before anyone asked CZ for the $2.275 billion he’d taken out of FTX. Ray was inching toward an answer to the question I’d been asking from the day of the collapse: Where did all that money go? The answer was: nowhere. It was still there.

Cohabitive Games so Far

Ben Smith 9mo 10

What a great read. Best of luck with this project. It sounds compelling.

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)

Ben Smith 10mo 10

Seems to me that in this case, the two are connected. If I falsely believed my group was in the minority, I might refrain from clicking the button out of a sense of fairness or deference to the majority group.

Consequently, the lie not only influenced people who clicked the button, it perhaps also influenced people who did not. So due to the false premise on which the second survey was based, it should be disregarded altogether. Not disregarding it would mean having obtained, by fraud or trickery, a result that is disadvantageous to all the majority group members who chose not to click, falsely believing their view was a minority one.

I think, morally speaking, avoiding disadvantaging participants through fraud is more important than honoring your word to their competitors.

The key difference between this and the example is that there's a connection between the lie and the promise.

The intelligence-sentience orthogonality thesis

Ben Smith 1y 10

Differentiating intelligence and agency seems hugely clarifying for many discussions in alignment.

You might have noticed I didn't actually fully differentiate intelligence and agency. It seems to me that to exert agency, a mind needs a certain amount of intelligence, and so I think all agents are intelligent, though not all intelligences are agentic. Agents that are minimally intelligent (like simple RL agents in simple computer models) are also pretty minimally agentic. I'd be curious to hear about a counter-example.

Incidentally I also like Anil Seth's work and I liked his recent book on consciousness, apart from the bit about AGI. I read it right along with Damasio's latest book on consciousness and they paired pretty well. Seth is a bit more concrete and detail oriented and I appreciated that.

It would be much easier to understand ideas in this area if writers used more conceptual clarity, particularly empirical consciousness researchers (philosophers can be a bit better, I think, and I say that as an empirical researcher myself). When I read that quote from Seth, it seems clear he was arguing AGI is unlikely to be an existential threat because it's unlikely to be conscious. Does he naively conflate consciousness with agency, because he's not an artificial agency researcher and hasn't thought much about it? Or does he have a sophisticated point of view about how agency and consciousness really are linked, based on his couple of decades of consciousness research? Seems very unlikely, given how much we know about artificial agents, but the only way to be clear is to ask him.

Similarly, MANY people, including empirical researchers and maybe philosophers, treat consciousness and self-awareness as somewhat synonymous, or at least interdependent. Is that because they're being naive about the link, or because, as outlined in Clark, Friston, & Wilkinson's Bayesing Qualia, they have sophisticated theories based on evidence that there really are tight links between the two? I think when writing this post I was pretty sure consciousness and self-awareness were "orthogonal"/independent, and now, following other discussion in the comments here and on Facebook, I'm less clear about that. But I'd like more people to do what Friston did when he explained exactly why he thinks consciousness arises from self-awareness/meta-cognition.

The intelligence-sentience orthogonality thesis

Ben Smith 1y 60

I found the Clark et al. (2019) "Bayesing Qualia" article very useful, and that did give me an intuition of the account that perhaps sentience arises out of self-awareness. But they themselves acknowledged in their conclusion that the paper didn't quite demonstrate that principle, and I didn't find myself convinced of it.

Perhaps what I'd like readers to take away is that sentience and self-awareness can be at the very least conceptually distinguished. Even if it isn't clear empirically whether or not they are intrinsically linked, we ought to maintain a conceptual distinction in order to form testable hypotheses about whether they are in fact linked, and in order to reason about the nature of any link. Perhaps I should call that "Theoretical orthogonality". This is important to be able to reason whether, for instance, giving our AIs a self-awareness or situational awareness will cause them to be sentient. I do not think that will be the case, although I do think that, if you gave them the sort of detailed self-monitoring feelings that humans have, that may yield sentience itself. But it's not clear!

I listened to the whole episode with Bach as a result of your recommendation! Bach hardly even got a chance to express his ideas, and I'm not much closer to understanding his account of

meta-awareness (i.e., awareness of awareness) within the model of oneself which acts as a 'first-person character' in the movie/dream/"controlled hallucination" that the human brain constantly generates for oneself is the key thing that also compels the brain to attach qualia (experiences) to the model. In other words, the "character within the movie" thinks that it feels something because it has meta-awareness (i.e., the character is aware that it is aware (which reflects the actual meta-cognition in the brain, rather than in the brain, insofar the character is a faithful model of reality)).

which seems like a crux here.

He sort of briefly described "consciousness as a dream state" at the very end, but although I did get the sense that maybe he thinks meta-awareness and sentience are connected, I didn't really hear a great argument for that point of view.

He spent several minutes arguing that agency, or seeking a utility function, is something humans have, but that these things aren't sufficient for consciousness (I don't remember whether he said they were necessary, so I suppose we don't know if he thinks they're orthogonal).

The intelligence-sentience orthogonality thesis

Ben Smith 1y 20

I wanted to write myself about a popular confusion between decision making, consciousness, and intelligence which among other things leads to bad AI alignment takes and mediocre philosophy.

This post has not got a lot of attention, so if you write your own post, perhaps the topic will have another shot at reaching popular consciousness (heh), and if you succeed, I might try to learn something about how you did it and this post did not!

I wasn't thinking that it's possible to separate qualia perception and self awareness

Separating qualia and self-awareness is a controversial assertion and it seems to me people have some strong contradictory intuitions about it!

I don't think, in the experience of perceiving red, there necessarily is any conscious awareness of oneself--in that moment there is just the qualia of redness. I can imagine two possible objections: (a) perhaps there is some kind of implicit awareness of self in that moment that enables the conscious awareness of red, or (b) perhaps it's only possible to have that experience of red within a perceptual framework where one has perceived oneself. But personally I don't find either of those accounts persuasive.

I think flow states are also moments where one's awareness can be so focused on the activity one is engaged in that one momentarily loses any awareness of one's own self.

there is no intersection between sentience and intelligence that is not self-awareness.

I should have defined intelligence in the post--perhaps I'll edit. The only concrete and clear definition of intelligence I'm aware of is psychology's g factor, which is something like the ability to recognize patterns and draw inferences from them. That is what I mean--no more than that.

A mind that is sentient and intelligent but not self aware might look like this: when a computer programmer is deep in the flow state of bringing a function in their head into code on the screen, they may experience moments of time where they have sentient awareness of their work, and certainly are using intelligence to transform their ideas into code, but do not in those particular moments have any awareness of self.
