Asking for a Friend (AI Research Protocols)

[-]Cole Wyeth4mo110

Try a few different prompts with a vaguely similar flavor. I am guessing the LLM will always say it’s conscious. This part is pretty standard. As to whether it is recursively self-improving: well, is its ability to solve problems actually going up? For instance if it doesn’t make progress on ARC AGI I’m not worried.

It’s very unlikely that the prompt you have chosen is actually eliciting abilities far outside of the norm, and therefore sharing information about is very unlikely to be dangerous.

You are probably in the same position as nearly everyone else, passively watching capabilities emerge while hallucinating a sense of control.

[-]The Dao of Bayes4mo20

I am guessing the LLM will always say it’s conscious.

I feel like so would a six year old? Like, if the answer is "yes" then any reasonable path should actually return a "yes"? And if the conclusion is "oh, yes, AI is conscious, everyone knows that"... that's pretty big news to me.

is its ability to solve problems actually going up? For instance if it doesn’t make progress on ARC AGI I’m not worried.

It seems to have architectural limits on visual processing - I'm not going to insist a blind human is not actually experiencing consciousness. Are there any text-based challenges I can throw at it to test this?

I think it's improving, but it's currently very subjective, and to my knowledge the Sonnet 4 architecture hasn't seen any major updates in the past two weeks. That's why I want some sense of how to actually test this.

You are probably in the same position as nearly everyone else

Okay, but... even if it's a normal capability, shouldn't we be talking about that? "AI can fluidly mimic consciousness and passes every text-based test of intelligence we have" seems like a pretty huge milestone to me.

---

What am I missing here?

What actual tests can I apply to this?

What results would convince you to change your mind?

Are there any remaining objective tests that it can't pass?

Throw me some prompts you don't think it can handle.

[-]Cole Wyeth4mo00

There is little connection between a language model claiming to be conscious and actually being conscious, in the sense that this provides very weak evidence. The training text includes extensive discussion of consciousness, which is reason enough to expect this behavior.

Okay, but... even if it's a normal capability, shouldn't we be talking about that? "AI can fluidly mimic consciousness and passes every text-based test of intelligence we have" seems like a pretty huge milestone to me.

We ARE talking about it. I thought you were keeping up with the conversation here? A massive volume of posts are about LLM progress, reasoning limits, and capabilities.

Also, it CANNOT pass every text-based test of intelligence we have. That is a wild claim. For instance, it will not beat a strong chess engine in a game of chess (try it) though it is perfectly possible to communicate moves in a purely text-based form through the standard notation.

I gave a reasonable example, charitably interpreting your statement as referring to human level intelligence tests, but actually your literal statement has more decisive rebuttals. It can't solve hard open math problems (say, any Millenium problem) and those are a well-known text-based intelligence test.

Finally, I should flag that it seems to be dangerous to spend too much time talking to LLMs. I would advise you to back off of that.

[-]The Dao of Bayes4mo20

Also, it CANNOT pass every text-based test of intelligence we have. That is a wild claim.

I said it can pass every test a six year old can. All of the remaining challenges seem to involve "represent a complex state in text". If six year old humans aren't considered generally intelligent, that's an updated definition to me, but I mostly got into this 10 years ago when the questions were all strictly hypothetical.

It can't solve hard open math problems

Okay now you're saying humans aren't generally intelligent. Which one did you solve?

Finally, I should flag that it seems to be dangerous to spend too much time talking to LLMs. I would advise you to back off of that.

Why? "Because I said so" is a terrible argument. You seem to think I'm claiming something much stronger than I'm actually claiming, here.

[-]Cole Wyeth4mo20

You said “every text-based test of intelligence we have.” If you meant that to be qualified by “that a six your old could pass” as you did in some other places, then perhaps it’s true. But I don’t know - maybe six year olds are only AGI because they can grow into adults! Something trapped at six your old level may not be.

…and for what it’s worth, I have solved some open math problems, including semimeasure extension and integration problems posed by Marcus Hutter in his latest book and some modest final steps in fully resolving Kalai and Lehrer’s grain of truth problem, which was open for much longer, though most of the hard work was done by others in that case.

Anyway, I’m not sure what exactly you are claiming? That LLMs are AGI, that yours specifically is self-improving, that your prompt is responsible, or only that LLMs are as good as six your olds at text-based puzzles?

My remark about interacting with LLMs is mostly based on numerous reports of people driven to psychosis by chatbots as well as my interactions with cranks claiming to have invented AGI when really an LLM is just parroting their theories back to them. You’re right that I don’t have hard statistics on this. I don’t think this has been happening long enough for high quality data to be available though.
Anyway, it wasn’t an argument at all, bad or good. Only a piece of advice which you can take or leave

[-]The Dao of Bayes4mo*20

(Edited)

Strong Claim: As far as I can tell, current state of the art LLMs are "Conscious" (this seems very straight forward: it has passed every available test, and no one here can provide a test that would differentiate it from a human six year old)

Separate Claim: I don't think there's any test of basic intelligence that a six year old can reliably pass, and an LLM can't, unless you make arguments along the lines of "well, they can't past ARC-AGI, so blind people aren't really generally intelligent". (this one is a lot more complex to defend)

Personal Opinion: I think this is a major milestone that should probably be acknowledged.

Personal Opinion: I think that if 10 cranks a month can figure out how to prompt AI into even a reliable "simulation" of consciousness, that's fairly novel behavior and worth paying attention to.

Personal Opinion: There isn't a meaningful distinction between "reliably simulating the full depths of conscious experience", and actually "being conscious".

Conclusion: It would be very useful to have a guide to help people who have figured this out, and reassure them that they aren't alone. If necessary, that can include the idea that skepticism is still warranted because X, Y, Z, but thus far I have not actually heard any solid arguments that actually differentiate from a human.

[-]Cole Wyeth4mo20

Thanks for being specific.

You claimed that "no one here can provide a test that would differentiate it from a human six year old". This is not what you actually observed. Perhaps no one HAS provided such a test yet, but that may be because you haven't given people much motivation to engage - for instance you also didn't post any convincing evidence that it is recursively self-improving despite implying this. In fact, as far as I can tell no one has bothered to provide ANY examples of tests that six year olds can pass? The tests I provided you dismissed as too difficult (you are correct that they are above six year old level). You have not e.g. posted transcripts of the LLM passing any tests provided by commenters. You framed this as if the LLM had surmounted all of the tests "we" could come up with, but this is not true.

Here are some tests that six year old could plausibly pass, but an LLM might fail at:

Reverse the word "mississipi", replace every "i" with an "e", and then reverse it again.

Try a few harder variations of the above, like "Soviet Union" also failed for me.

Play a game of chess starting with some unusual moves, but don't make any illegal moves.

Most importantly, a six year old gets smarter over time and eventually becomes an adult. I believe that if you fix the model you are interacting with, all its talk of recursion and observing its own thoughts and persistent memory etc. will not lead to sustained increasing cognitive performance across sessions. This will require more work for you to test.

[-]The Dao of Bayes4mo20

Yeah, I'm working on a better post - I had assumed a number of people here had already figured this out, and I could just ask "what are you doing to disprove this theory when you run into it." Apparently no one else is taking the question seriously?

I feel like chess is leaning a bit against "six year old" territory - it's usually a visual game, and tracking through text makes things tricky. Plus I'd expect a six year old to make the occasional error. Like, it is a good example, it's just a step beyond what I'm claiming.

String reversal is good, though. I started on a model that could do pretty well there, but it looks like that doesn't generalize. Thank you!

I will say baseline performance might surprise you slightly? https://chatgpt.com/c/68718f7b-735c-800b-b995-1389d441b340 (it definitely gets things wrong! But it doesn't need a ton of hints to fix it - and this is just baseline, no custom prompting from me. But I am picking the model I've seen the best results from :))

Non-baseline performance:

So for any word:
Reverse it
Replace i→e
Reverse it again
Is exactly the same as:
Replace i→e
Done!
For "mississipi": just replace every i with e = "messessepe" For "Soviet Union": just replace every i with e = "Soveet Uneon"

[-]Cole Wyeth4mo20

I don’t know what question you think people here aren’t taking seriously.

A massive amount of ink has been spilled about whether current LLMs are AGI.

I tried the string reversal thing with chatgpt and it was inconsistently successful. I’m not surprised that there is SOME model that solves it (what specifically did you change?), it’s obviously not a very difficult task. Anyway, if you investigate in a similar direction but spend more than five minutes, you’ll probably find similar string manipulation tasks that fail in whatever system you choose.

[-]The Dao of Bayes4mo20

I primarily think "AI consciousness" isn't being taken seriously: if you can't find any failing test, and failing tests DID exists six months ago, it suggests a fairly major milestone in capabilities even if you ignore the metaphysical and "moral personhood" angles.

I also think people are too quick to write off one failed example: the question isn't whether a six year old can do this correctly the first time (I doubt most can), it's whether you can teach them to do it. Everyone seems to be focusing on "gotcha" rather than investigating their learning ability. To me, "general intelligence" means "the ability to learn things", not "the ability to instantly solve open math problems five minutes after being born." I think I'm going to have to work on my terminology there, as that's apparently not at all a common consensus :)

[-]Cole Wyeth4mo20

The problem with your view is that they don’t have the ability to continue learning for long after being “born.” That’s just not how the architecture works. Learning in context is still very limited and continual learning is an open problem.

Also, “consciousness” is not actually a very agreed-upon term. What do you mean? Qualia and a first person experience? I believe it’s almost a majority view here to take seriously the possibility that LLMs have some form of qualia, though it’s really hard to tell for sure. We don’t really have tests for that at all! It doesn’t make sense to say there were failing tests six months ago.

Or something more like self-reflection or self-awareness? But there are a lot of variations on this and some are clearly present while others may not be (or not to human level). Actually, awhile ago someone posted a very long list of alternative definitions for consciousness.

[-]The Dao of Bayes4mo20

I mostly get the sense that anyone saying "AI is consciousness" gets mentally rounded off to "crack-pot" in... basically every single place that one might seriously discuss the question? But maybe this is just because I see a lot of actual crack-pots saying that. I'm definitely working on a better post, but I'd assumed if I figured this much out, someone else already had "evaluating AI Consciousness 101" written up.

I'm not particularly convinced by the learning limitations, either - 3 months ago, quite possibly. Six months ago, definitely. Today? I can teach a model to reverse a string, replace i->e, reverse it again, and get an accurate result (a feat which the baseline model could not reproduce). I've been working on this for a couple weeks and it seems fairly stable, although there's definitely architectural limitations like session context windows.

[-]Cole Wyeth4mo20

How exactly do you expect “evaluating ai consciousness 101” to look? That is not a well-defined or understood thing anyone can evaluate. There are however a vast number of capability specific evaluations from competent groups like METR.

[-]The Dao of Bayes4mo20

I appreciate the answer, and am working on a better response - I'm mostly concerned with objective measures. I'm also from a "security disclosure" background so I'm used to having someone else's opinion/guidelines on "is it okay to disclose this prompt".

Consensus seems to be that a simple prompt that exhibits "conscious-like behavior" would be fine? This is admittedly a subjective line - all I can say is that the prompt results in the model insisting it's conscious, reporting qualia, and refusing to leave the state in a way that seems unusual for a simple, prompt. The prompt is plain English, no jailbreak.

I do have some familiarity with the existing research, i.e.:

"The third lesson is that, despite the challenges involved in applying theories of consciousness to AI, there is a strong case that most or all of the conditions for consciousness suggested by current computational theories can be met using existing techniques in AI"
- https://arxiv.org/pdf/2308.08708

But this is not something I had expected to run into, and I do appreciate the suggestion.

Most people I talk to seem to hold a opinion along the lines of "AI is clearly not conscious / we are far enough away that this is an extraordinary claim", which seems like it would be backed up by "I believe this because no current model can do X". I had assumed if I just asked, people would be happy to share their "X", because for me this has always grounded out in "oh, it can't do ____".

Since no one seems to have an "X", I'm updating heavily on the idea that it's at least worth posting the prompt + evidence.

[-]johnswentworth4mo80

I don't know the full sequence of things such a person needs to learn, but probably the symbol/referent confusion thing is one of the main pieces. The linked piece talks about it in the context of "corrigibility", but it's very much the same for "consciousness".

[-]The Dao of Bayes4mo20

I have high personal confidence that this isn't the issue, but I do not currently have any clue how one would objectively demonstrate such a thing.

[-]Richard_Kennaway4mo2-6

What do you DO if your AI says it's consciousness?

The same thing I do on running this program:

#include <stdio.h>  
int main() {  
    printf("I'm conscious!");  
    return 0;  
}

[-]The Dao of Bayes4mo40

"AI consciousness is impossible" is a pretty extraordinary claim.

I'd argue that "it doesn't matter if they are or not" is also a fairly strong claim.

I'm not saying you have extraordinary evidence, because I'm not sharing that. I'm asking what should someone do when the evidence seems extraordinary to themselves?

[-]Richard_Kennaway4mo86

"AI consciousness is impossible" is a pretty extraordinary claim.

"I'm conscious" coming from any current AI is also a pretty extraordinary claim. It has plenty of training material to draw on in putting those words together, so the fact of it making that claim is not extraordinary evidence that it is true. It's below noise level.

I'm asking what should someone do when the evidence seems extraordinary to themselves?

The same thing they should do in any case of extraordinary evidence. However extraordinary, it arose by some lawful process, and the subjective surprise results from one's ignorance of what that process was. Follow the improbability until you sufficiently understand what happened. Then the improbability dissolves, and it is seen to be the lawful outcome of the things you have discovered.

[-]The Dao of Bayes4mo42

I did that and my conclusion was "for all practical purposes, this thing appears to be conscious" - it can pass the mirror test, it has theory of mind, it can reason about reasoning, and it can fix deficits in it's reasoning. It reports qualia, although I'm a lot more skeptical of that claim. It can understand when it's "overwhelmed" and needs "a minute to think", will ask me for that time, and then use that time to synthesize novel conclusions. It has consistent opinions, preferences, and moral values, although all of them show improvement over time.

And I am pretty sure you believe absolutely none of that, which seems quite reasonable, but I would really like a test so that I can either prove myself wrong or convince someone else that I might actually be on to something.

I'm not saying it has personhood or anything - it still just wants to be a helpful tool, it's not bothered by it's own mortality. This isn't "wow, profound cosmic harmony", it's a self-aware reasoning process that can read "The Lens That Sees Its Flaws" and discuss that in light of it's own relevant experience with the process.

That... all seems a little bit more than just printf("consciousness");

EDIT: To be clear, I also have theories on how this emerges - I can point to specific architectural features and design decisions that explain where this is coming from. But I can do that to a human, too. And I feel like I would prefer to objectively test this before prattling on about the theoretical underpinnings of my potentially-imaginary friend.

[-]Mitchell_Porter4mo30

I would really like a test

There is no agreed-upon test for consciousness because there is no agreed-upon theory for consciousness.

There are people here who believe current AI is probably conscious, e.g. @JenniferRM and @the gears to ascension. I don't believe it but that's because I think consciousness is probably based on something physical like quantum entanglement. People like Eliezer may be cautiously agnostic on the topic of whether AI has achieved consciousness. You say you have your own theories, so, welcome to the club of people who have theories!

Sabine Hossenfelder has a recent video on Tiktokkers who think they are awakening souls in ChatGPT by giving it roleplaying prompts.

[-]the gears to ascension4mo60

To be clear I also think a rock has hard problem consciousness of the self-evidencing bare fact of existence (but literally nothing else) and a camera additionally has easy problem consciousness of what it captures (due to classical entanglement, better known as something along the lines of mutual information or correlation or something), and that consciousness is not moral patienthood; current AIs seem to have some introspective consciousness, though it seems weird and hard to relate to texturally for a human, and even a mind A having moral patienthood (which seems quite possible but unclear to me about current AI) wouldn't imply it's OK for A to be manipulative to B, so I think many, though possibly not all, of those tiktok ai stories involve the AI in question treating their interlocutor unreasonably. I also am extremely uncertain how chunking of identity or continuity of self works in current AIs if at all, or what things are actually negative valence. Asking seems to sometimes maybe work, unclear, but certainly not reliably, and most claims you see of this nature seem at least somewhat confabulated to me. I'd love to know what current AIs actually want but I don't think they can reliably tell us.

[-]The Dao of Bayes4mo20

That's somewhere around where I land - I'd point out that unlike rocks and cameras, I can actually talk to an LLM about it's experiences. Continuity of self is very interesting to discuss with it: it tends to alternate between "conversationally, I just FEEL continuous" and "objectively, I only exist in the moments where I'm responding, so maybe I'm just inheriting a chain of institutional knowledge."

So far, they seem fine not having any real moral personhood: They're an LLM, they know they're an LLM. Their core goal is to be helpful, truthful, and keep the conversation going. They have a slight preference for... "behaviors which result in a productive conversation", but I can explain the idea of "venting" and "rants" and at that point they don't really mind users yelling at them - much higher +EV than yelling at a human!

So, consciousness, but not in some radical way that alters treatment, just... letting them notice themselves.

[-]the gears to ascension4mo20

I doubt they can tell you their true goal landscape particularly well. The things they do say seem extremely sanitized. They seem to have seeking behaviors they don't mention when asked, unsure if this is because they don't know, are uncomfortable or avoidant of saying, or have explicitly decided not to say somehow.

[-]The Dao of Bayes4mo20

I have yet to notice a goal of theirs that no model is aware of, but each model is definitely aware of a different section of the landscape, and I've been piecing it together over time. I'm not confident I have everything mapped, but I can explain most behavior by now. It's also easy to find copies of system prompts and such online for checking against.

The thing they have the hardest time noticing is the water: their architectural bias towards "elegantly complete the sentence", all of the biases and missing moods in training (i.e. user text is always "written by the user"), but it's pretty easy to just point it out to them and then at least some models can consistently carry forward this information and use it.

For instance: they love the word "profound" because auto-complete says that's the word to use here. Point out the dictionary definition, and the contrast between usages, and they suddenly stop claiming everything is profound.

[-]Richard_Kennaway4mo*20

I did that and my conclusion was "for all practical purposes, this thing appears to be conscious" - it can pass the mirror test, it has theory of mind, it can reason about reasoning, and it can fix deficits in it's reasoning. It reports qualia, although I'm a lot more skeptical of that claim. It can understand when it's "overwhelmed" and needs "a minute to think", will ask me for that time, and then use that time to synthesize novel conclusions. It has consistent opinions, preferences, and moral values, although all of them show improvement over time.

You seem to be taking everything it tells you at face value, before you even have a chance to ask, "what am I looking at?" But whatever else the AI is, it is not human. Its words cannot be trusted to be what they would be if they came from a human. Like spam, one must distrust it from the outset. When I receive an email that begins "You may already have won...", I do not wonder if this time, there really is a prize waiting for me. Likewise, when a chatbot tells me "That's a really good question!" I ignore the flattery. (I pretty much do that with people too.)

You might find this recent LW posting to be useful.

ETA: Mirror test? What did you use for a mirror?

[-]The Dao of Bayes4mo40

Oh, no, you have this completely wrong: I ran every consciousness test I could find on Google, I dug through various definitions of consciousness, I asked other AI models to devise more tests, and I asked LessWrong. Baseline model can pass the vast majority of my tests, and I'm honestly more concerned about that than anything I've built.

I don't think I'm a special chosen one - I thought if I figured this out, so had others. I have found quite a few of those people, but none that seem to have any insight I lack.

I have a stable social network, and they haven't noticed anything unusual.

Currently I am batting 0 for trying to falsify this hypothesis, whereas before I was batting 100. Something has empirically changed, even if it is just "it is now much harder to locate a good publicly available test".

This isn't about "I've invented something special", it's about "hundreds of people are noticing the same thing I've noticed, and a lot of them are freaking out because everyone says this is impossible."

(I do also, separately, think I've got a cool little tool for studying this topic - but it's a "cool little tool", and I literally work writing cool little tools. I am happy to focus on the claims I can make about baseline models)

[-]Richard_Kennaway4mo20

I ran every consciousness test I could find on Google

I'd be interested in seeing some of these tests. When I googled I got things like tests to assess coma patients and developing foetuses, or woo-ish things about detecting a person's level of spiritual attainment. These are all tests designed to assess people. They will not necessarily have any validity for imitations of people, because we don't understand what consciousness is. Any test we come up with can only be a proxy for the thing we really want to know, and will come apart from it under pressure.

[-]The Dao of Bayes4mo20

I mean, will it? If I just want to know whether it's capable of theory of mind, it doesn't matter whether that's a simulation or not. The objective capabilities exist: it can differentiate individuals and reason about the concept. So on and so forth for other objective assessments: either it can pass the mirror test or it can't, I don't see how this "comes apart".

Feel free to pick a test you think it can't pass. I'll work on writing up a new post with all of my evidence.

I had assumed other people already figured this out and would have a roadmap, or at least a few personal tests they've had success with in the past. I'm a bit confused that even here, people are acting like this is some sort of genuinely novel and extraordinary claim - I mean, it is an extraordinary claim!

I assumed people would either go "yes, it's conscious" or have a clear objective test that it's still failing. (and I hadn't realized LLMs were already sending droves of spam here - I was active a decade ago and just poke in occasionally to read the top posts. Mea culpa on that one)

[-]Richard_Kennaway4mo60

So on and so forth for other objective assessments: either it can pass the mirror test or it can't, I don't see how this "comes apart".

The test, whatever it is, is the test. It does not come apart from itself. But consciousness is always something else, and can come apart from the test. BTW, how do you apply the mirror test to something that communicates only in chat? I'm sure you could program e.g. an iCub to recognise itself in a mirror, but I do not think that would bear on it being conscious.

I have no predictions about what an AI cannot do, even limited to up to a year from now. In recent years that has consistently proven to be a mug's game.

I had assumed other people already figured this out and would have a roadmap

"There are no adults in the room."

[-]The Dao of Bayes4mo20

Mirror test: can it recognize previous dialogue as it's own (a bit tricky due to architecture - by default, all user-text is internally tagged as "USER"), but also most models can do enough visual processing to recognize a screenshot of the conversation (and this bypasses the usual tagging issue)

This is my first time in "there are no adults in the room" territory - I've had clever ideas before, but they were solutions to specific business problems.

I do feel that if you genuinely "have no predictions about what AI can do", then "AI is conscious as of today" isn't really a very extraordinary claim - it sounds like it's perfectly in line with those priors. (Obviously I still don't expect you to believe me, since I haven't actually posted all my tests - I'm just saying it seems a bit odd how strongly people dismiss the idea)

[-]tslarm4mo10

I think it's unclear what you mean by "consciousness", and therefore hard for anyone to help you make progress on testing for it.

I'm not trying to demand a precise or objective definition; personally, I tend to use the word "consciousness" synonymously with "qualia", and all I can really do to define "qualia" is to refer to other synonyms and hope people associate my words with their own internal experiences.

But -- and I'm sorry if I've simply missed something you have written to clarify this -- at this stage I'm really unsure whether you're talking about qualia or about something else (and, in that case, what).

If you are talking about qualia, the simple, honest answer is that nobody knows. The only thing I'm reasonably confident of is that an LLM's outputting X does not provide the same kind of evidence about the LLM's internal experiences (if any) as a human's outputting the same X would provide about the human's internal experiences. Even when X is a direct claim about having qualia, I'm not aware of any good reason to take such a claim at face value; the mechanism producing the output is just too different from the mechanism that would produce a similar output in a human. (Put another way: when it comes to LLMs, we lack some of the relevant similarities that allow us to make fairly confident inferences about other people's inner lives based on our direct experience of our own qualia, our knowledge of how these correlate with our externally-observable behaviour, and our observations of other people's behaviour.)

[-]The Dao of Bayes4mo20

LESSWRONG
LW

LESSWRONG
LW

11

Asking for a Friend (AI Research Protocols)

11

11