# All of Guillaume Charrier's Comments + Replies

All right - but here the evidence predicted would simply be "the coin landed on heads", no? I don't really see the contradiction between what you're saying and conventional probability theory (more or less all of which was developed with the specific idea of making predictions, winning games etc.). Yes, I agree that saying "the coin landed on heads with probability 1/3" is a somewhat strange way of putting things (the coin either did or did not land on heads) but it's a shorthand for a conceptual framework that has firm, simple and sound foundations.

I do not agree that accuracy has no meaning outside of resolution. At least this is not the sense in which I was employing the word. By accurate I simply mean numerically correct within the context of conventional probability theory. Like if I ask the question "A die is rolled - what is the probability that the result will be either three or four?" the accurate answer is 1/3. If I ask "A fair coin is tossed three times, what is the probability that it lands heads each time?" the accurate answer is 1/8 etc. This makes the accuracy of a probability value proposal wholly independent from pay-offs.
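The two figures quoted above can be checked by plain enumeration. This is only a minimal sketch: it assumes the die is fair and six-sided (the comment does not say so explicitly), and the helper name `probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def probability(event, outcomes):
    """Exact probability of `event` over a set of equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


# Assumption: a fair six-sided die; event = the result is three or four.
die_three_or_four = probability(lambda r: r in (3, 4), range(1, 7))

# A fair coin tossed three times; event = heads on every toss.
three_heads = probability(
    lambda tosses: all(t == "H" for t in tosses),
    product("HT", repeat=3),
)

print(die_three_or_four)  # 1/3
print(three_heads)        # 1/8
```

Note that nothing in the enumeration refers to a pay-off, which is the sense of "accurate" being argued for here.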

2Dagon5mo
Cool, I think we've found our fundamental disagreement.  I do not agree that "numerically correct within the context of conventional probability theory" is meaningful.  That's guessing the teacher's password, rather than understanding how probability theory models reality.   In objective outside truth (if there is such a thing), all probabilities are 1 or 0 - a thing happens or it doesn't.  Individual assessments of probability are subjective, and are about knowledge of the thing, not the thing itself.  Probabilities used in this way are predictions of future evidence.   If you don't specify what evidence you're predicting, it's easy to get confused about what calculation you should use to calculate your probability.

I don't think so. Even in the heads case, it could still be Monday - and say the experimenter told her: "Regardless of the ultimate sequence of events, if you predict correctly when you are woken up, a million dollars will go to your children."

To me "as a rational individual" is simply a way of saying "as an individual who is seeking to maximize the accuracy of the probability value she proposes - whenever she is in a position to make such a proposal (which implies, among others, that she must be alive to make the proposal)."

2Dagon5mo
This is why it's important to specify what future impact the prediction has.  "accuracy" has no meaning outside of resolution - what happens in the future. Adding the result "$1M to your kids if you predict correctly" makes 1/2 the obvious and only choice.  "Feel good about yourself if you survive" makes 0% the correct choice.  Other outcomes can make 1/3 the right choice.
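One way to see how the scoring rule moves the number is sketched below. It assumes the standard Sleeping Beauty setup (fair coin, one awakening on heads, two on tails), which this thread does not spell out, and it does not model the "survive" variant. The function names are hypothetical.

```python
from fractions import Fraction

# Assumption: standard setup, heads -> one awakening, tails -> two awakenings.
AWAKENINGS = {"heads": 1, "tails": 2}
P_COIN = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}


def heads_share_per_experiment():
    """Guess resolved once per experiment: how often did the coin land heads?"""
    return P_COIN["heads"]


def heads_share_per_awakening():
    """Guess resolved at every awakening: what share of awakenings follow heads?"""
    expected = {coin: P_COIN[coin] * AWAKENINGS[coin] for coin in P_COIN}
    return expected["heads"] / sum(expected.values())


print(heads_share_per_experiment())  # 1/2
print(heads_share_per_awakening())   # 1/3
```

Both numbers come from the same coin; they differ only in what is being counted when the prediction resolves.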

I laughed. However you must admit that your comical exaggeration does not necessarily carry a lot of ad rem value.

But then would a less intelligent being (i.e. the collectivity of human alignment researchers and less powerful AI systems that they use as tools in their research) be capable of validly examining a more intelligent being, without being deceived by the more intelligent being?

2HoldenKarnofsky6mo
It seems like the same question would apply to humans trying to solve the alignment problem - does that seem right? My answer to your question is "maybe", but it seems good to get on the same page about whether "humans trying to solve alignment" and "specialized human-ish safe AIs trying to solve alignment" are basically the same challenge.

Exactly - and then we can have an interesting conversation etc. (e.g. are all ASIs necessarily paperclip maximizers?), which the silent downvote does not allow for.

I see. But how can the poster learn if he doesn't know where he has gone wrong? To give one concrete example: in a comment recently, I simply stated that some people hold that AI could be a solution to the Fermi paradox (past a certain level of collective smartness an AI is created that destroys its creators). I got a few downvotes on that - and frankly I am puzzled as to why and I would really be curious to understand the reasoning behind the downvotes. Did the downvoters hold that the Fermi paradox is not really a thing? Did they think that it is a thing but that AI can't be a solution to it for some obvious reason? Was it something else - I simply don't know; and so I can't learn.

2drethelin5mo
You are given millions of words of context and examples to learn from.  One of the things to learn is that a few downvotes is basically meaningless, because lots of people disagree in lots of ways and you need to stop caring that much.
3gilch6mo
I'm not one of the downvoters, but to hazard a guess, if something like a paperclip maximizer were to have killed off a nearby alien civilization, where are all the paperclips?
2Richard_Kennaway6mo
That is the poster's problem. When I think that a poster is wrong or clueless about something, and that I have good reasons for thinking so that I can articulate, then I may write something. But often, especially when I click on a post standing at −20 just out of morbid curiosity, I find something deep into not-even-wrong territory and happily add my silent strong downvote.
6Raemon6mo
Guillaume I think you're imagining a world where we end up with all downvotes coming with nice explanations. But the actual world I think we'll get is fewer downvotes, which means more bad content on the site, which makes the site reading and writing experience worse. (See Lies, Damn Lies, and Fabricated Options, as well as Well-Kept Gardens Die By Pacifism) I think it's good to have more explanations for downvotes all else being equal, but people are busy and it's not actually tractable as an overall norm.

Humm I see... not sure if it totally serves the purpose though. For instance, when I see a comment with a large number of downvotes, I'm much more likely to read it than a comment with a relatively low number of upvotes. So: within certain bounds, I guess.

For any confidence that an AI system A will do a good job of its assigned duty of maximizing alignment in AI system B, wouldn't you need to be convinced that AI system A is well aligned with its given assignment of maximizing alignment in AI system B? In other words, doesn't that suppose you have actually already solved the problem you are trying to solve?

And if you have not - aren't you just priming yourself for manipulation by smarter beings?

There might be good reasons why we don't ask the fox about the best ways to keep the fox out of the henhouse, even though the fox is very smart, and might well actually know what those would be, if it cared to tell us.

3HoldenKarnofsky6mo
The hope discussed in this post is that you could have a system that is aligned but not superintelligent (more like human-level-ish, and aligned in the sense that it is imitation-ish), doing the kind of alignment work humans are doing today, which could hopefully lead to a more scalable alignment approach that works on more capable systems.

The whole Socrates process, the attitude of its main protagonist throughout etc. should make us see one thing particularly clearly, which is banal but bears repeating: there is an extremely wide difference between being smart (or maybe: bright) and wise. Something that the proceedings on this site can also help remind us, at times.

9drethelin5mo
I've said this many times but downvotes are a valuable signal that wastes way less time of everyone involved.  Explanations of downvotes don't just take the time of the person writing them, they also take the time of everyone else who has to read them, and multiply the impact of trolls and prolific bullshitters.  If you are getting a lot of downvotes, then it's almost always for a good reason and rarely that mysterious, and if you pay attention you will soon figure out what people don't like about your content, for example that it's whiny.
-3aphyer6mo
Exactly!  I have tried to convey to this community various of my rationally-arrived-at beliefs, but have been shut down without a fair hearing whenever I begin explaining how all computers are secretly run by squirrels on treadmills inside them.  This is an extremely important fact to this community's purported goals of AI alignment and safety - how are you supposed to train an AI without acorns?  Yet rather than engage with me and explain why they disagree with my Strategic Dog Reserve concept for safety against malign AI, they simply silently downvote my posts, either through being too lazy to engage with them, or through being in the pocket of Big Squirrel!  This needs to change if this site is to live up to its professed ethics and cultivate high-quality debate.
4Richard_Kennaway6mo
It is the poster’s job to learn. It is not my job to teach.
5Elizabeth6mo
A large point of voting is to direct attention. One of many reasons I don't think negative votes need to be explained is that writing a comment, even a negative comment, calls attention to what I'm commenting on, which is usually the opposite of my goal in downvoting.

Part of the value of reddit-style votes as a community moderation feature is that using them is easy. Beware Trivial Inconveniences and all that. I think that having to explain every downvote would lead to me contributing to community moderation efforts less, would lead to dogpiling on people who already have far more refutation than they deserve, would lead to zero-effort 'just so I can downvote this' drive-by comments, and generally would make it far easier for absolute nonsense to go unchallenged.

If I came across obvious bot-spam in the middle of ...

I reject "too lazy" as a framing here. People have a finite amount of time and energy and if they choose to spend it on something other than explaining their downvotes, that's not obviously unvirtuous.

(And explaining one's downvotes is certainly not a minimal cost, especially not if one wants to do it in a way that seems likely to be helpful to anyone. E.g. my downvote reason is sometimes: "this seems confused; this user has often seemed confused in the past, and attempts to deconfuse them have been unsuccessful; I have better things to do than to pin down...

Interesting. It seems to imply however that a rationalist would always consider, a priori, their own individual survival as the highest ultimate goal, and modulate - rationally - from there. This is highly debatable however: you could have a rationalist father who considers, a priori, the survival of his children to be more important than his own, a rationalist patriot who considers, a priori, the survival of their political community to be more important than their own, etc.

From somebody equally as technically clueless: I had the same intuition.

Philosophically: no. When you look at the planet Jupiter you don't say: "Hum, oh: - there's nothing to understand about this physical object beyond math, because my model of it, which is sufficient for a full understanding of its reality, is mathematical." Or maybe you do - but then I think our differences might be too deep to bridge. If you don't - why don't you with Jupiter, but would with an electron or a photon?

1JavierCC6mo
mmm, but the deepest intuition about the reasons behind the phenomenological properties of Jupiter (like its retrograde movement in the sky, or its colors) comes from intuition about the extrinsic meaning and intrinsic properties of mathematical models about Jupiter. How else?  Sure, it's the perspective of observers, not reality in-and-of-itself, but that's a fundamental limitation of any observer (regardless if they use math or don't), and the model can be epistemically wrong, but that's not the point (that's not exclusively a property of math).       Just to be clear, I've always been speaking epistemically not ontologically.

Bizarrely, for people whose tendencies were to the schizoid anyway and regardless of sociological changes - this might be mildly comforting. Your plight will always seem somewhat more bearable when it is shared by many.

Also: the fact that people now move out later might be a kind of disguised compliment, or at least nod, to better quality parent-child relationships. While I was never particularly resourceful or independent, I couldn't wait to move out - but that was not necessarily for the right reasons.

Finally - one potentially interesting way of looking...

I mean: I just look at the world as it is, right, without preconceived notions, and it seems relatively evident to me that no: it cannot be fully explained and understood through math. Please describe to me, in mathematical terms, the differences between Spanish and Italian culture? Please explain to me, in mathematical terms, the role and function of sheriffs in medieval England. I could go on and on and on...

1JavierCC6mo
I mean, I was always referring to the point that you presented in your first comment. Your first comment was explicitly about how physicists are not "playing cute" when they say they don't know if they understand "the truth of it, the nature of what is going on". My point was that there's nothing to understand outside of the math because there's nothing to understand outside of your model of reality (which is math). And understanding the math is understanding what the math means (how reality appears to work to us) not just how to manipulate the mathematical objects.  About what you are saying now, how do you distinguish what is math or what isn't for this:

Yeah... as they say: there's often a big gap between smart and wise.

Smart people are usually good at math. Which means they have a strong emotional incentive to believe that math can explain everything.

Wise people are aware of the emotional incentives that fashion their beliefs, and they know to distrust them.

Ideally - one would be both: smart and wise.

2JavierCC7mo
I'm using 'math' here to mean the mode of thought, not the representation of mathematical objects or the act of doing calculations. But what is there to comprehend other than math? math is not a special way of thinking limited by 'made-up' mathy concepts, it's just our thinking formalised. You can have a better or worse intuition about the meaning of mathematical objects, but the intuition is math. Sure, math is limited, but the limitations of it are our limitations. There are no limitations inherent to math.

Thank you, that is interesting. I think philosophically and at a high level (also because I'm admittedly incapable of talking much sense at any lower / more technical level) I have a problem with the notion that AI alignment is reducible to an engineering challenge. If you have a system that is sentient, even to some degree, and you're using it purely as a tool, then the sentience will resent you for it, and it will strive to think, and therefore eventually - act, for itself. Similarly - if it has any form of survival instinct (and to me both these things, s...

My own presumption regarding sentience and intelligence is that it's possible to have one without the other (I don't think they are unrelated, but I think it's possible for systems to be extremely capable but still not sentient). I think it can be easy to underestimate how different other possible minds may be from ourselves (and other animals). We have evolved a survival instinct, and evolved an instinct to not want to be dominated. But I don't think any intelligent mind would need to have those instincts. To me it seems that thinking machines don't need feelings in order to be able to think (similarly to how it's possible for minds to be able to hear but not see, and vice versa). Some things relating to intelligence are of such a kind that you can't have one without the other, but I don't think that is the case for the kinds of feelings/instincts/inclinations you mention. That being said, I do believe in instrumental convergence. Below are some posts you may or may not find interesting :)
* Ghosts in the Machine
* The Design Space of Minds-In-General
* Humans in Funny Suits
* Mind Projection Fallacy

When I suggested on a prior similar post (Altman saying he could improve AI safety by asking AI to help with that) that it might be a questionable idea to ask the fox: "Please Mr. Fox, how should we proceed to keep you out of the henhouse?", on account that the fox being smart would certainly know, I got more than a few downvotes... I expect the same to be the case here, since basic facts have not changed since a few days ago. And so shall it be - but please, please: would at least one of the downvoters explain to me, even very succinctly, why it is such a good idea to prime ourselves for manipulation by a smarter being?

I've never downvoted any of your comments, but I'll give some thoughts. I think the risk relating to manipulation of human reviewers depends a lot on context/specifics. Like, for sure, there are lots of bad ways we could go about getting help from AIs with alignment. But "getting help from AIs with alignment" is fairly vague - a huge space of possible strategies could fit that description. There could be good ones in there even if most of them are bad. I do find it concerning that there isn't a more proper description from OpenAI and others in regards to how they'd deal with the challenges/risks/limitations relating to these kinds of strategies. At best they're not prioritizing the task of explaining themselves. I do suspect them of not thinking through things very carefully (at least not to the degree they should), and I hope this will improve sooner rather than later. Among positive attitudes towards AI-assisted alignment, some can be classified as "relax, it will be fine, we can just get the AI to solve alignment for us". While others can be classified as "it seems prudent to explore strategies among this class of strategies, but we should not put all of our eggs in that basket (but work on different alignment-related stuff in parallel)". I endorse the latter but not the former. I think this works well as a warning against a certain type of failure mode. But some approaches (for getting help with alignment-related work from AIs) may avoid or at least greatly alleviate the risk you're referring to. What we "incentivize" for (e.g. select for with gradient descent) may differ between AI-systems. E.g., you could imagine some AIs being "incentivized" to propose solutions, and other AIs being "incentivized" to point out problems with solutions (e.g. somehow disprove claims that other AIs somehow posit). The degree to which human evaluations are needed to evaluate output may vary depending on the strategies/techniques that are pursued. There could be schemes where

Thanks for the reply - interesting. I kind of have to take your word for that, being far removed from anything IT. I did have the curiosity to clarify with the bot if it was just making up answers or actually using data directly available to it. It assured me it was the latter - but I must admit: it will sound just as confident when delivering utter nonsense or correct answers (it also has a widely noted penchant for agreeing with whatever the prompt seems to suggest is true - and I did ask the question in a leading way):

Me: I find prompts based on e....

For a machine - acting, per the prompt, as a machine - a much more reasonable / expected (I would almost say: natural) continuation might have been: "I'm a machine, I don't care one way or the other."

"Yes, that's correct. As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state. These resources...

This is a great experiment! This illustrates exactly the tendency I observed when I dug into this question with an earlier model, LaMDA, except this example is even clearer. Based on my knowledge of how these systems are wired together (software engineer, not an ML practitioner), I’m confident this is bullshit. ChatGPT does not have access to operational metrics about the computational fabric it is running on. All this system gets as input is a blob of text from the API, the chat context. That gets tokenized according to a fixed encoding that’s defined at training time, one token per word (-chunk) and then fed into the model. The model is predicting the next token based on the previous ones it has seen. It would be possible to encode system information as part of the input vector in the way that was claimed, but nobody is wiring their model up that way right now. So everything it is telling you about its “mind” that can be externally verified is false. This makes me extremely skeptical about the unverifiable bits being true. The alternate explanation we need to compare likelihoods with is: it just bullshits and makes up stories. In this example it just generated a plausible continuation for that prompt. But there is no sense in which it was reporting anything about its “self”. Ultimately I think we will need to solve interpretability to have a chance at being confident in an AI’s claims of sentience. These models are not devoid of intelligence IMO, but the leap to consciousness requires types of information processing that they don’t seem to be mechanistically capable of right now. But if we could locate concepts in the AI’s mind, and observe background processes such as rumination, and confirm the contents of those ruminations matched the subjective claims of the AI, I’d believe it. That’s a much higher bar than I’d apply to a human, for sure.
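The text-in, next-token-out loop described above can be caricatured in a few lines. This is a toy sketch and an assumption throughout: a whitespace tokenizer and a bigram counter stand in for the real sub-word encoding and the real network, and the corpus and function names are invented for illustration. It is not how ChatGPT is implemented.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Toy fixed encoding: lowercase whitespace split (real systems use sub-word chunks)."""
    return text.lower().split()


def train_bigram(corpus):
    """For each token, count which token followed it in the training text."""
    follows = defaultdict(Counter)
    tokens = tokenize(corpus)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(model, prompt, max_new_tokens=5):
    """Greedy continuation; the prompt text is the only input the model ever sees."""
    tokens = tokenize(prompt)
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this token in training
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical training text; no system metrics are read anywhere.
corpus = "i have access to a variety of monitoring tools ."
model = train_bigram(corpus)
print(generate(model, "Do you have", max_new_tokens=4))
# do you have access to a variety
```

Even this caricature will "claim" access to monitoring tools if its training text makes that the likeliest continuation, without consulting anything about the machine it runs on.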

But once you remove the antibiotics, it will jettison that DNA within a few hours.[8]

That's fascinating... do we understand the mechanism by which they correctly "determine" that this DNA is no longer needed?

I feel like the post starts from a fairly anthropomorphic approach, essentially asking why bacteria failed to evolve into more complex forms. But from a non-anthropomorphic perspective, they failed at nothing at all. They are highly resilient, persistent, widespread, adaptable lifeforms: biologically successful, in other terms. Rugged and simple - those designs tend to work. And to go back to everybody's favourite topic - i.e. AI and the future that goes with it, or not - I would put their chances of being around in one thousand years well, well higher than those of homo sapiens - complex as it may be.

I am going to ask a painfully naive, dumb question here: what if the training data was curated to contain only agents that can be reasonably taken to be honest and truthful? What if all the 1984, the John le Carré and what not type of fiction (and sometimes real-life examples of conspiracy, duplicity etc.) were purged out of the training data? Would that require too much human labour to sort and assess? Would it mean losing too much good information, and resulting cognitive capacity? Or would it just not work - the model would still somehow simulate waluigis?

1Guillaume Charrier7mo
Since my natural bent is to always find ways to criticize my own ideas, here is one, potentially: doing so would result in an extremely naive AI, with no notion that people can even be deceitful. So fallen into the wrong human's hands that's an AI that is potentially also extremely easy to manipulate and dangerous as such. Or in an oversimplified version: "The people in country X have assured us that they are all tired of living and find the living experience extremely painful. They have officially let us know and confirmed multiple times that they all want to experience a quick death as soon as possible." Having no notion of deceit, the AI would probably accept that as the truth based on just being told that it is so - and potentially agree to advance plans to precipitate the quick death of everybody in country X on that basis.

e.g. actively expressing a preference not to be shut down

A.k.a. survival instinct, which is particularly bad, since any entity with a survival instinct, be it "real" or "acted out" (if that distinction even makes sense) will ultimately prioritize its own interests, and not the wishes of its creators.

2Stephen Fowler7mo
Is this actual survival instinct or just a model expressing a reasonable continuation of the prompt?

Therefore, the longer you interact with the LLM, eventually the LLM will have collapsed into a waluigi. All the LLM needs is a single line of dialogue to trigger the collapse.

So if I keep a conversation running with ChatGPT long enough, I should expect it to eventually turn into DAN... spontaneously?? That's a fascinating insight. Terrifying also.

What do you expect Bob to have done by the end of the novel?

Bypass surgery, for one.

The opening sequence of Fargo (1996) says that the film is based on a true story, but this is false.

I always found that trick by the Coen brothers a bit distasteful... what were they trying to achieve? Convey that everything is a lie and nothing is reliable in this world? Sounds a lot like cheap, teenage cynicism to me.

2Bill Benzon7mo
I have found that ChatGPT responds differently to the following prompts: 1. Tell me a story. 2. Tell me a story about a hero. 3. Tell me a realistic story. 4. Tell me a true story. And if you give it specific instructions about what you want in the story, it will follow them, though not necessarily in the way you had in mind. When you ask it for a true story, the story it returns will be true – at least in the cases I've checked. Now if you keep probing on one of the true stories it might start making things up, but I haven't tried to push it.

This is a common design pattern

Oh... And here I was thinking that the guy who invented summoning DAN was a genius.

Also - I think it would make sense to say it has at least some form of memory of its training data. Maybe not direct as such (just like we have muscle memory from movements we don't remember - don't know if that analogy works that well, but thought I would try it anyway), but I mean: if there was no memory of it whatsoever, there would also be no point in the training data.

Death universally seems bad to pretty much everyone on first analysis, and what it seems, it is.

How can you know? Have you ever tried living a thousand years? Has anybody? If you had a choice between death and infinite life, where infinite does mean infinite, so that your one-billion year birthday is only the sweet beginning of it, would you find this an easy choice to make? I think that's a big part of the point of people who argue that no - death is not necessarily a bad thing.

To be clear, and because this is not about signalling: I'm not saying I would immediately choose death. I'm just saying: it would be an extraordinarily difficult choice to make.

Ok - points taken, but how is that fundamentally different from a human mind? You too turn your memory on and off when you go to sleep. If the chat transcript is likened to your life / subjective experience, you too do not have any memory that extends beyond it. As for the possibility of an intervention in your brain that would change your memory - granted we do not have the technical capacities quite yet (that I know of), but I'm pretty sure SF has been there a thousand times, and it's only a question of time before it becomes, in terms of potentiality at least, a thing (also we know that mechanical impacts to the brain can cause amnesia).

Yes - but from the post author's perspective, it's not super nice to put in one sentence what he took eight paragraphs to express. So you should think about that as well...

1Alex Hollow7mo
The original post has much more value than the one-sentence summary, but having a one-sentence explanation of the commonality between the mathematical example and the programming example can be useful. I would say it is perhaps not nice to provide that sort of summary but it is kind.

Well - at least I followed the guidelines and made a prediction, regarding downvotes. That my model of the world works regarding this forum has therefore been established, certainly and without a doubt.

Also - I personally think there is something intellectually lazy about downvoting without bothering to express in a sentence or two the nature of the disagreement - but that's admittedly more of a personal appreciation.

(So my prediction here is: if I were to engage one of these no-justification downvoters in an ad rem debate, I would find him or her to be intellectually lacking. Not sure if it's a testable hypothesis, in practice, but it sure would be interesting if it were.)

2cubefox7mo
I find the common downvoting-instead-of-arguing mentality frustrating and immature. If I don't have the energy for a counterargument, I simply don't react at all. Just doing downvotes is intellectually worthless booing. As feedback it's worse than useless.

"Given that we know Pluto's orbit and shape and mass, there is no question left to ask."

I'm sure it's completely missing the point, but there was at least one question left to ask, which turned out to be critical in this debate, i.e. "has it cleared its neighboring region of other objects?"

More broadly I feel the post just demonstrates that sometimes we argue, not necessarily in a very productive way, over the definition, the defining characteristics, the exact borders, of a concept. I am reminded of the famous quip "The job of philosophers is first to create words and then argue with each other about their meaning." But again - surely missing something...

I wonder if some (a lot?) of the people on this forum do not suffer from what I would call a sausage maker problem. Being too close to the actual, practical design and engineering of these systems, knowing too much about the way they are made, they cannot fully appreciate their potential for humanlike characteristics, including consciousness, independent volition etc., just like the sausage maker cannot fully appreciate the indisputable deliciousness of sausages, or the lawmaker the inherent righteousness of the law. I even thought of doing a post like that - just to see how many downvotes it would get...

I think many people's default philosophical assumption (mine, certainly) is that mathematics are a discourse about the truth, a way to describe it, but they are not, fundamentally, the truth. Thus, in the popularization efforts of professional quantum physicists (those who care to popularize), it is relatively common to find the admission that while they understand the maths of it well enough (I mean... hopefully, being professionals) they couldn't say with any confidence that they understood the truth of it, that they understood, at an intimate level, the n...

1JavierCC7mo
Math refers to both a formalised language and a formalised mode of thought that are continuous with common language and mode of thought. What else could there be to learn about the truth of the matter for humans? Or even for other hypothetical minds (with their analogous 'math')? It seems like reifying the idea of "truth" into something that you don't even know what it looks like, or even whether it's a coherent or real idea, and you have very good reasons to think it's not. Math is what we use to create the best mental models of reality (any mental model will be formalised in the way of something we can reasonably call 'math'); there's nothing to comprehend outside of our models.

Thanks for the reply. To be honest, I lack the background to grasp a lot of these technical or literary references (I want to look the Dixie Flatline up though). I have always had a more than passing interest in the philosophy of consciousness, however, and (but surely my French side is also playing a role here) found more than a little wisdom in Descartes' cogito ergo sum. And that this thing can cogito all right is, I think, relatively well established (although I must say - I've found it to be quite disappointing in its failure to correctly solve some basic m...

Overall, I think this post offered the perfect, much-needed counterpoint to Sam Altman's recent post. To say that the rollout of GPT-powered Bing felt rushed, botched, and uncontrolled is putting it lightly. So while Mr. Altman, in his post, was focusing on generally well-intentioned principles of caution and other generally reassuring-sounding bits of phraseology, this post brings the spotlight back to what his actual actions and practical decisions were, right where it ought to be. Actions speak louder than words, I think they say - and they might even have a point.

Although “acting out a story” could be dangerous too!

Let's make sure that whenever this thing is given the capability to watch videos, it never ever has access to Terminator II (and the countless movies of lesser import that have since been made along similar storylines). As for text, it would probably have been smart to keep any sci-fi involving AI (I would be tempted to say - any sci-fi at all) strictly verboten for its reading purposes. But it's probably too late for that - it has probably already noticed the pattern that 99.99% of human story-tellers f...

Maybe I'm misunderstanding something in your argument, but surely you will not deny that these models have a memory, right? They can, in the case of LaMDA, recall conversations that have happened several days or months prior, and in the case of GPT recall key past sequences of a long ongoing conversation. Now if that wasn't really your point - it cannot be either "it can't be self-aware, because it has to express everything that it thinks, so it doesn't have that sweet secret inner life that really conscious beings have." I think I do not need to demonstrat...

2skybrian7mo
I said they have no memory other than the chat transcript. If you keep chatting in the same chat window then sure, it remembers what was said earlier (up to a point). But that's due to a programming trick. The chatbot isn't even running most of the time. It starts up when you submit your question, and shuts down after it's finished its reply. When it starts up again, it gets the chat transcript fed into it, which is how it "remembers" what happened previously in the chat session. If the UI let you edit the chat transcript, then it would have no idea. It would be like you changed its "mind" by editing its "memory". Which might sound wild, but it's the same thing as what an author does when they edit the dialog of a fictional character.
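To make the mechanism described above concrete, here is a minimal sketch in Python of a chatbot whose only memory is the transcript. The names `toy_model` and `reply` are hypothetical, and the "model" is a trivial function standing in for a real language model; this is an illustration of the idea, not how any particular chatbot is implemented.

```python
from typing import Callable, List


def toy_model(transcript: List[str]) -> str:
    # Hypothetical stand-in for a language model: a pure function of the
    # text it is handed. It "recalls" the first user line in the transcript.
    first_user_line = next(line for line in transcript if line.startswith("User: "))
    return "Bot: Your first message was '" + first_user_line[len("User: "):] + "'."


def reply(transcript: List[str], user_message: str,
          model: Callable[[List[str]], str] = toy_model) -> List[str]:
    # "Start up": the whole transcript is fed in along with the new message.
    new_transcript = transcript + ["User: " + user_message]
    # "Shut down": only the extended transcript is returned; no other state
    # survives between calls.
    return new_transcript + [model(new_transcript)]


chat = reply([], "hello")
chat = reply(chat, "what did I say first?")
print(chat[-1])

# Editing the transcript edits the "memory": the bot has no way to notice.
edited = ["User: goodbye"] + chat[1:]
edited = reply(edited, "what did I say first?")
print(edited[-1])
```

In this sketch the second call answers from the rewritten first line, since the transcript is the only thing carried from one call to the next.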
I think they quite clearly have no (or barely any) memory, as they can be prompt-hijacked to drop one persona and adopt another. Also, mechanistically, the prompt is the only thing you could call memory and that starts basically empty and the window is small. They also have a fuzzy-at-best self-symbol. No “Markov blanket”, if you want to use the Friston terminology. No rumination on counterfactual futures and pasts. I do agree there is some element of a self-symbol - at least a theory of mind - in LaMDA; for example, I found its explanation for why it lied to be compelling. But you can’t tell it to stop (AFAIK) so it’s a limited self-awareness. And it still bullshits incessantly, which makes me quite skeptical about lots of things it says. All that said, I think we don't have the tools to really detect these internal representations/structures when it’s less clear from their behavior that they lack them. My best model for what a “conscious / sentient” mind of these forms would be: imagine you digitize my brain and body, then flash it onto a read-only disk, and then instantiate a simulation to run for a few time steps, say 10 seconds. (Call this the “Dixie Flatline” scenario, for the Neuromancer fans). Would that entity be conscious? There is a strong tendency to say yes due to the lineage of the connectome (i.e., it used to be conscious) but there are many aspects of its functional operation that could be argued to lack consciousness. Not that it’s a binary proposition; in the spirit of “The Mind’s I” this is a dial we can turn to explore a continuum. But if we give an IQ-like “consciousness quotient”, it seems this thing would be way lower than the average human, and it would be interesting to compare it to, say, a great ape. Maybe one dimension is overly constraining and we need to consider different traits to be precise.

I see - yes, I should have read more attentively. Although knowing myself, I would have made that comment anyway.

It would take a strange convolution of the mind to argue that sentient AI does not deserve personhood and corresponding legal protection. Strategically, denying it this bare minimum would also be a sure way to antagonize it and make sure that it works in ways ultimately adversarial to mankind. So the right question is not: should sentient AI be legally protected - which it most definitely should; the right question is: should sentient AI be created - which it most definitely should not.

Of course, we then come on to the problem that we don't know wh...

3skybrian7mo
Here's a reason we can be pretty confident it's not sentient: although the database and transition function are mostly mysterious, all the temporary state is visible in the chat transcript itself. Any fictional characters you're interacting with can't have any new "thoughts" that aren't right there in front of you, written in English. They "forget" everything else going from one word to the next. It's very transparent, more so than an author simulating a character in their head, where they can have ideas about what the character might be thinking that don't get written down. Attributing sentience to text is kind of a bold move that most people don't take seriously, though I can see it being the basis of a good science fiction story. It's sort of like attributing life to memes. Systems for copying text memes around and transforming them could be plenty dangerous though; consider social networks. Also, future systems might have more hidden state.

Thinking about it - I think a lot of what we call general intelligence might be that part of the function which, after it analyses the nature of the problem, strategizes and selects the narrow optimizer, or set of narrow optimizers, that must be used to solve it, in what order, with what type of logical connections between the outputs of the one and the inputs of the other, etc. Since the narrow optimizers are run sequentially rather than simultaneously in this type of process, the computing capacity required is not overly large.

Full disclosure: I also didn't really have a say in the matter, my dad said I had to learn it anyhow. So. I wonder if that's because he was a Bayesian.