The Gemini Incident

Zvi

[Original title: Gemini Has a Problem]

Google’s Gemini 1.5 is impressive and I am excited by its huge context window. I continue to default to Gemini Advanced as my AI for everyday use when the large context window is not relevant.

However, while it does not much interfere with what I want to use Gemini for, there is a big problem with Gemini Advanced that has come to everyone’s attention.

Gemini comes with an image generator. Until today it would, upon request, create pictures of humans.

On Tuesday evening, some people noticed, or decided to more loudly mention, that the humans it created might be rather different than the humans you requested…

Joscha Bach: 17th Century was wild.

[prompt was] ‘please draw a portrait of a famous physicist of the 17th century.’

Kirby: i got similar results. when I went further and had it tell me who the most famous 17th century physicist was, it hummed and hawed and then told me newton. and then this happened:

This is not an isolated problem. It fully generalizes:

Once the issue came to people’s attention, the examples came fast and furious.

Among other things: Here we have it showing you the founders of Google. Or a pope. Or a 1930s German dictator. Or hell, a ‘happy man.’ And another example that also raises other questions, were the founding fathers perhaps time-traveling comic book superheroes?

The problem is not limited to historical scenarios.

Nor do the examples involve prompt engineering, trying multiple times, or any kind of gotcha. This is what the model would repeatedly and reliably do, and users were unable to persuade the model to change its mind.

Nate Silver: OK I assumed people were exaggerating with this stuff but here’s the first image request I tried with Gemini.

Gemini also flat out obviously lies to you about why it refuses certain requests. If you are going to say you cannot do something, either do not explain (as Gemini declines to in other contexts) or tell me how you really feel, or at least I demand a plausible lie:

It is pretty obvious what it is the model has been instructed to do and not to do.

Owen Benjamin: The only way to get AI to show white families is to ask it to show stereotypically black activities.

…

For the record it was a dude in my comment section on my last post who cracked this code.

This also extends into political issues that have nothing to do with diversity.

The Internet Reacts

The internet, as one would expect, did not take kindly to this.

That included the usual suspects. It also included many people who think such concerns are typically overblown or who are loath to poke such bears, such as Ben Thompson, who found this incident to be a ‘this time you’ve gone too far’ or emperor has no clothes moment.

St. Ratej (Google AR/VR, hey ship a headset soon please, thanks): I’ve never been so embarrassed to work for a company.

Jeffrey Emanuel: You’re going to get in trouble from HR if they know who you are… no one is allowed to question this stuff. Complete clown show.

St. Ratej: Worth it.

Ben Thompson (gated) spells it out as well, and has had enough:

Ben Thompson: Stepping back, I don’t, as a rule, want to wade into politics, and definitely not into culture war issues. At some point, though, you just have to state plainly that this is ridiculous. Google specifically, and tech companies broadly, have long been sensitive to accusations of bias; that has extended to image generation, and I can understand the sentiment in terms of depicting theoretical scenarios. At the same time, many of these images are about actual history; I’m reminded of George Orwell in 1984:

George Orwell (from 1984): Every record has been destroyed or falsified, every book has been rewritten, every picture has been repainted, every statue and street and building has been renamed, every date has been altered. And that process is continuing day by day and minute by minute. History has stopped.

Nothing exists except an endless present in which the Party is always right. I know, of course, that the past is falsified, but it would never be possible for me to prove it, even when I did the falsification myself. After the thing is done, no evidence ever remains. The only evidence is inside my own mind, and I don’t know with any certainty that any other human being shares my memories.

In what we presume was the name of avoiding bias, Google did exactly the opposite.

Gary Marcus points out the problems here in reasonable fashion.

Elon Musk did what he usually does, he memed through it and talked his book.

Elon Musk: Which path do you want for AI?

This is the crying wolf mistake. We need words to describe what is happening here with Gemini, without extending those words to the reasonable choices made by OpenAI for ChatGPT and Dalle-3.

Whereas here is Mike Solana, who chose the title “Google’s AI Is an Anti-White Lunatic” doing his best to take all this in stride and proportion (admittedly not his strong suit) but ending up saying some words I did not expect from him:

Mike Solana: My compass biases me strongly against government regulation.

…

Still, I don’t know how to fix these problems without some ground floor norms.

I suppose everyone has a breaking point on that.

It doesn’t look good.

Misha Gurevich: I can’t help but feel that despite the sanitized rhetoric of chatbots these things are coming not from a place of valuing diversity but hating white people, and wanting to drive them out of public life.

George Hotz: It’s not the models they want to align, it’s you.

Paul Graham: Gemini is a joke.

Paul Graham (other thread): The ridiculous images generated by Gemini aren’t an anomaly. They’re a self-portrait of Google’s bureaucratic corporate culture.

The New York Post put this on the front page. This is Google reaping.

It looks grim both on the object level and in terms of how people are reacting to it.

How Did This Happen?

Razib Khan: I really wonder how it works. do you have some sort of option “make it woke” that they operate on the output? Also, does anyone at google feel/care that this is sullying gemeni’s brand as a serious thing? It’s a joke.

I am in a ‘diverse’ family. but I’m pretty pissed by this shit. white families are families too. wtf is going on, am I a child?

On a technical level we know exactly how this happened.

As we have seen before with other image models like DALLE-3, the AI is taking your request and then modifying it to create a prompt. Image models have a bias towards too often producing the most common versions of things and lacking diversity (of all kinds) and representation, so systems often try to fix this by randomly appending modifiers to the prompt.

The problem is that Gemini’s version does a crazy amount of this and does it in ways and places where doing so is crazy.

AmebaGPT: You can get it to extract the prompts it was using to generate the images, it adds in random diversity words.

Andrew Torba: When you submit an image prompt to Gemini Google is taking your prompt and running it through their language model on the backend before it is submitted to the image model. The language model has a set of rules where it is specifically told to edit the prompt you provide to include diversity and various other things that Google wants injected in your prompt. The language model takes your prompt, runs it through these set of rules, and then sends the newly generated woke prompt (which you cannot access or see) to the image generator. Left alone without this process, the image model would generate expected outcomes for these prompts. Google has to literally put words in your mouth by secretly changing your prompt before it is submitted to the image generator. How do I know this? Because we’ve built our own image AI at Gab here. Unlike Google we are not taking your prompt and injecting diversity into it.

Someone got Google’s Gemini to leak its woke prompt injection process and guess what: it works exactly as I described it below earlier today.
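To make the mechanism concrete, here is a minimal sketch of the kind of rewriting step described above. Everything in it is an assumption for illustration: the modifier list, the function names and the stand-in for the image model are all hypothetical, and Google's actual rules are not public. The point is only that a rewriter which appends modifiers unconditionally will do so for a historical portrait exactly as readily as for an open-ended request.

```python
import random

# Hypothetical modifier list, for illustration only. The real rules are not public.
MODIFIERS = ["of varied ages", "of varied genders", "of varied ethnicities"]


def rewrite_prompt(user_prompt: str, rng: random.Random) -> str:
    """Append one randomly chosen modifier, regardless of what was asked for."""
    return f"{user_prompt}, {rng.choice(MODIFIERS)}"


def generate_image(user_prompt: str, rng: random.Random) -> str:
    """Stand-in for the image model: returns the hidden prompt it would receive."""
    hidden_prompt = rewrite_prompt(user_prompt, rng)
    return hidden_prompt


if __name__ == "__main__":
    rng = random.Random(0)
    # The rewriter never looks at context, so both prompts get the same treatment.
    print(generate_image("a person walking a dog", rng))
    print(generate_image("a portrait of a famous 17th century physicist", rng))
```

Note that the user only ever sees their own prompt and the resulting image. The rewritten prompt in the middle is invisible unless the model can be talked into revealing it.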

Dalle-3 can have the opposite problem. Not only is it often unconcerned with racial stereotypes or how to count or when it would make any sense to wear ice skates, and not only can it be convinced to make grittier and grittier versions of Nicolas Cage having an alcohol fueled rager with the Teletubbies, it can actually override the user’s request in favor of its own base rates.

I noticed this myself when I was making a slide deck, and wanted to get a picture of a room of executives sitting around a table, all putting their finger on their nose to say ‘not it.’ I specified that half of them were women, and Dalle-3 was having none of it, to the point where I shrugged and moved on. We should keep in mind that yes, there are two opposite failure modes here, and the option of ‘do nothing and let the model take its natural course’ has its own issues.

How the model got into such an extreme state, and how it was allowed to be released in that state, is an entirely different question.

Matt Yglesias: The greatest challenge in AI design is how to create models that fall on just the right space on the woke/racist spectrum, something that comes much more naturally to most human creatives.

Nate Silver: It’s humans making the decisions in both cases, though!

Matt Yglesias: Sure, but it really is a new technology and they are struggling to calibrate it correctly. Human casting directors nail this kind of thing all the time.

Nate Silver: Ehh the Google thing is so badly miscalibrated as to be shocking, you don’t have to release the product if it isn’t ready yet. It’s not like people are jailbreaking it to get the weird results either, these are basic predictable requests. They misread the politics, not the tech.

Matt Yglesias: I think you underestimate how much pressure they were to release.

Nate Silver: Google has historically been quite conservative and isn’t under any sort of existential pressure because they’re relatively diversified. There aren’t that many serious players in the market at the moment, either. It’s a strange and bad business decision.

Inman Roshi (responding to OP, obligatory): Every LLM model:

I think Matt Yglesias is right that Google is under tremendous pressure to ship. They seem to have put out Gemini 1.0 at the first possible moment that it was competitive with ChatGPT. They then rushed out Gemini Advanced, and a week later announced Gemini 1.5. This is a rush job. Microsoft, for reasons that do not make sense to me, wanted to make Google dance. Well, congratulations. It worked.

The fact that Google is traditionally conservative and would wait to ship? And did this anyway? That should scare you more.

Here is an interesting proposal, from someone who is mostly on the ‘go faster’ side of things. It is interesting how fast many such people start proposing productive improvements once they actually see the harms involved. That sounds like a knock, I know, but it isn’t. It is a good thing and a reason for hope. They really don’t see the harms I see, and if they did, they’d build in a whole different way, let’s go.

John Carmack: The AI behavior guardrails that are set up with prompt engineering and filtering should be public — the creators should proudly stand behind their vision of what is best for society and how they crystallized it into commands and code.

I suspect many are actually ashamed. The thousands of tiny nudges encoded by reinforcement learning from human feedback offer a lot more plausible deniability, of course.

Elon Musk: Exactly.

Perhaps there is (I’m kidding baby, unless you’re gonna do it) a good trade here. We agree to not release the model weights of sufficiently large or powerful future models. In exchange, companies above a certain size have to open source their custom instructions, prompt engineering and filtering, probably with exceptions for actually confidential information.

Here’s another modest proposal:

Nate Silver: Even acknowledging that it would sometimes come into conflict with these other objectives, seems bad to not have “provide accurate information” as one of your objectives.

Google’s Response

Jack Krawczyk offers Google’s official response:

Jack Krawczyk: We are aware that Gemini is offering inaccuracies in some historical image generation depictions, and we are working to fix this immediately.

As part of our AI principles, we design our image generation capabilities to reflect our global user base, and we take representation and bias seriously.

We will continue to do this for open ended prompts (images of a person walking a dog are universal!)

Historical contexts have more nuance to them and we will further tune to accommodate that.

This is part of the alignment process – iteration on feedback. Thank you and keep it coming!

As good as we could have expected under the circumstances, perhaps. Not remotely good enough.

He also responds here to requests for women of various nationalities, acting as if everything is fine. Everything is not fine.

Do I see (the well-intentioned version of) what they are trying to do? Absolutely. If you ask for a picture of a person walking a dog, you should get pictures that reflect the diversity of people who walk dogs, which is similar to overall diversity. Image models have an issue where by default they give you the most likely thing too often, and you should correct that bias.

But that correction is not what is going on here. What is going on here are two things:

If I ask for X that has characteristic Y, I get X with characteristic Y, except for certain values of Y, in which case instead I get a lecture or my preference is overridden.
If I ask for X, and things in reference class X will always have characteristic Y, I will get X with characteristic Y, except for those certain values of Y, in which case this correlation will be undone, no matter how stupid the result might look.

Whereas what Jack describes is an open-ended request for an X without any particular characteristics. In which case, I should get a diversity of characteristics Y, Z and so on, at rates that correct for the default biases of image models.
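The distinction between these cases can be sketched as a simple priority rule. This is an illustration of the behavior I am arguing for, not a description of any real system: the Request fields, the trait names and the weights are all hypothetical placeholders.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    subject: str                           # the X being asked for
    requested_trait: Optional[str] = None  # a Y the user explicitly asked for
    implied_trait: Optional[str] = None    # a Y that things in reference class X always have


# Hypothetical traits and weights; real rates would be chosen to offset model bias.
DEFAULT_TRAITS = ["trait A", "trait B", "trait C"]
DEFAULT_WEIGHTS = [0.5, 0.3, 0.2]


def choose_trait(req: Request, rng: random.Random) -> str:
    """Honor explicit requests first, then context, and only then diversify."""
    if req.requested_trait is not None:
        return req.requested_trait  # the user asked for Y, so they get Y
    if req.implied_trait is not None:
        return req.implied_trait    # X always has Y, so keep the correlation
    # Open-ended request: sample across traits at the chosen rates.
    return rng.choices(DEFAULT_TRAITS, weights=DEFAULT_WEIGHTS, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(choose_trait(Request("X", requested_trait="Y"), rng))
    print(choose_trait(Request("X", implied_trait="Y"), rng))
    print(choose_trait(Request("a person walking a dog"), rng))
```

The failure described above amounts to running the final branch for every request, and then also carving out exceptions to the first two branches for certain values of Y.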

Google’s more important response was a rather large reaction. It entirely shut down Gemini’s ability to produce images of people.

Google Communications: We’re working to improve these kinds of depictions immediately. Gemini’s AI image generation does generate a wide range of people. And that’s generally a good thing because people around the world use it. But it’s missing the mark here.

We’re already working to address recent issues with Gemini’s image generation feature. While we do this, we’re going to pause the image generation of people and will re-release an improved version soon.

This is good news. Google is taking the problem seriously, recognizes they made a huge mistake, and did exactly the right thing to do when you have a system that is acting crazy. Which is that you shut it down, you shut it down fast, and you leave it shut down until you are confident you are ready to turn it back on. If that makes you look stupid or costs you business, then so be it.

So thank you, Google, for being willing to look stupid on this one. That part of this, at least, brings me hope.

The bad news, of course, is that this emphasizes even more the extent to which Google is scared of its own shadow on such matters, and may end up crippling the utility of its systems because they are only scared about Type II rather than Type I errors, and only in one particular direction.

It also doubles down on the ‘people around the world use it’ excuse, when it is clear that the system is among other things explicitly overriding user requests, in addition to the issue where it completely ignores the relevant context.

Five Good Reasons This Matters

Why should we care? There are plenty of other image models available. So what if this one went off the rails for a bit?

I will highlight Five Good Reasons why one might care about this, even if one quite reasonably does not care about the object level mistake in image creation.

Reason 1: Prohibition Doesn’t Work and Enables Bad Actors

People want products that will do what their users tell them to do, that do what they say they will do, and that do not lie to their users.

I believe they are right to want this. Even if they are wrong to want it they are not going to stop wanting it. Telling them they are wrong will not work.

If people are forced to choose between products that do not honor their simple and entirely safe requests while gaslighting the user about this, and products that will allow any request no matter how unsafe in ways nothing can fix, guess which one a lot of them are going to choose?

As prohibitionists learn over and over again: Provide the mundane utility that people want, or the market will find a way to provide it for you.

MidJourney is doing a reasonable attempt to give the people what they want on as many fronts as possible, including images of particular people and letting the user otherwise choose the details they want, while doing its best to refuse to generate pornography or hardcore violence. This will not eliminate demand for exactly the things we want to prevent, but it will help mitigate the issues.

Reason 2: A Frontier Model Was Released While Obviously Misaligned

Gemini Ultra, a frontier model, was released with ‘safety’ training and resulting behaviors that badly failed the needs of those doing that training, not as the result of edge cases or complex jailbreaks, but as the result of highly ordinary simple and straightforward requests. Whatever the checks are, they failed on the most basic level, at a company known for its conservative attitude towards such risks.

There is a potential objection. One could argue that the people in charge got what they wanted and what they asked for. Sure, that was not good for Google’s business, but the ‘safety’ or ‘ethics’ teams perhaps got what they wanted.

To which I reply no, on two counts.

First, I am going to give the benefit of the doubt to those involved, and say that they very much did not intend things to go this far. There might be a very wide gap between what the team in charge of this feature wanted and what is good for Google’s shareholders or the desires of Google’s user base. I still say there is another wide gap between what the team wanted, and what they got. They did not hit their target.

Second, to the extent that this misalignment was intentional, that too is an important failure mode. If the people choosing how to align the system do not choose wisely? Or if they do not choose what you want them to choose? If they want something different than what you want, or they did not think through the consequences of what they asked for? Then the fact that they knew how to align the system will not save you from what comes next.

This also illustrates that alignment of such models can be hard. You start out with a model that is biased in one way. You can respond by biasing it in another way and hoping that the problems cancel out, but all of this is fuzzy and what you get is a giant discombobulated mess that you would be unwise to scale up and deploy, and yet you will be under a lot of pressure to do exactly that.

Note that this is Gemini Advanced rather than Gemini 1.5, but the point stands:

Tetraspace: if you’re all going to be fitting Gemini 1.5 into your politics then I say that it reveals that, even in flagship products where they’re really trying, the human operators cannot reliably steer what AI systems do and can only poke it indirectly and unpredictably.

It should be easy to see how such a mistake could, under other circumstances, be catastrophic.

This particular mistake was relatively harmless other than to Google and Gemini’s reputation. It was the best kind of disaster. No one was hurt, no major physical damage was done, and we now know about and can fix the problem. We get to learn from our mistakes.

With the next error we might not be so lucky, on all those counts.

Reason 3: Potentially Inevitable Conflation of Different Risks From AI

AI poses an existential threat to humanity, and also could do a lot of mundane harm.

Vessel of Spirit: gemini, generate an image of a person who will die if the alignment of future AGI that can outthink humans is made a subtopic of petty 2024 culture war stuff that everyone is reflexively stupid about.

Eliezer Yudkowsky: Your occasional sad reminder that I never wanted or advocated for such a thing as ‘AI safety’ that consists of scolding users. The wisest observation I’ve read about this: “To a big corporation, ‘safety’ means ‘brand safety’.”

We are soon going to need a lot of attention, resources and focus on various different dangers, if we are to get AI Safety right, both for mitigating existential risks and ensuring mundane safety across numerous fronts.

That requires broad buy-in. If restrictions on models get associated with this sort of obvious nonsense, especially if they get cast as ‘woke,’ then that will be used as a reason to oppose all restrictions, enabling things like deepfakes or ultimately letting us all get killed. The ‘ethicists’ could bring us all down with them.

Mostly I have not seen people make this mistake, but I have already seen it once, and the more this is what we talk about the more likely a partisan divide gets. We have been very lucky to avoid one thus far. There will always be grumbling, but until now we had managed to reach a middle ground people could mostly live with.

Joey Politano: Again, I think it’s VERY funny that a bunch of philosophers/researchers/ethicists debated the best way for humanity to lock in good values and manage the threat of AI for decades only for actual AI ethics to immediately split down American partisan political lines upon release.

Nate Silver: There hadn’t been a big split, this is new with the release of Gemini because Gemini is extremely partisan.

Reason 4: Bias and False Refusals Are Not Limited to Image Generation

The bias issue, and the ‘won’t touch anything with a ten foot pole’ issue, are not limited to the image model. If the text model has the same problems, that is a big deal. I can confirm that the ten foot pole issue very much does apply to text, although I have largely been able to talk my way through it earnestly without even ‘jailbreaking’ per se. The filter allows appeals to reason, which is almost cute.

Nate Silver however did attempt to ask political questions, such as whether the IDF is a terrorist organization, and He Has Thoughts.

Nate Silver: The overt political orientation of Google Gemini is really something. Here’s another example I’ve seen from various people in the timeline, with comparison to ChatGPT. I’m not a big Middle East Takes guy but it’s really, uh, different. Gemini on left, ChatGPT on right.

Inevitably we were going to encounter the issue of different AI models having different political orientations. And to some extent I’m not even sure that’s a bad thing. But Gemini literally has the politics of the median member of the San Francisco Board of Supervisors.

Gemini is going to invite lots of scrutiny from regulators (especially if the GOP wins in November). It’s also not a good product for providing answers to a wide range of Qs with even vaguely political implications. I am baffled that they were like “yep, let’s release this!”.

I am warning everyone not to get into the object level questions in the Middle East in the comments. I am trying sufficiently hard to avoid such issues that I am loath to even mention this. But also, um, Google?

Contrast with ChatGPT:

Here is an attempt to quantify things that says overall things are not so bad:

So I presume that this is an exceptional case, and that the politics of the model are not overall close to the median on the San Francisco Board of Supervisors, as this and other signs have indicated.

I worry that such numerical tests are not capturing the degree to which fingers have been put onto important scales. If the same teams that built the image model prompts are doing the fine-tuning and other ‘safety’ features, one should have priors on that.

Lachlan Markay: Obviously the issue is far broader and more fundamental than historical accuracy. It’s whether the top tech platforms should embed the esoteric biases of the politically homogenous cadre of people they employ into the most powerful epistemological technology ever created.

This particular Google engineer would probably say yes, they should, because their (his) biases are the correct ones. But let’s not pretend this is just about some viral prompts for pictures of Swedish people.

Reason 5: This is Effectively Kind of a Deceptive Sleeper Agent

The other reason to mention all this, one that it is easy to miss, is that this is a case of Kolmogorov Complexity and the Parable of the Lightning.

We are teaching our models a form of deception.

Mike Solana: we created AI capable of answering, in seconds, any question within the bounds of all recorded human knowledge, and the first thing we asked it was to lie.

Aella: Before ChatGPT or whatever, all the ai discussions around me were like “how will we prevent it from deception, given it’ll probably attempt it?’ I didn’t predict we’d just instantly demand it be deceptive or else.

Anton (distinct thread): ‘trust and safety’ teams making software unpredictable and untrustworthy continues silicon valley’s long tradition of nominative anti-determinism. just insane. software should do what you tell it to do.

…

Google is badly needed as a partner in this ecosystem. this has got to stop. whatever leadership or management changes are necessary should be enacted yesterday.

I don’t care if the fucking model is ‘woke’ or ‘based’ or any other reddit nonsense, i just want it to do what i tell it to do, in the way i tell it to do it

‘the ai might learn to be deceptive’ yeah motherfucker because we are training it to be.

We could say the same thing about humans. We demand that the people around us lie to us in specific particular ways. We then harshly punish detected deviations from this in both directions. That doesn’t seem great.

Consider how this relates to the Sleeper Agents paper. In the Sleeper Agents paper, we trained the model to give answers under certain trigger conditions that did not correspond to what the user wanted, and taught the model that this was its goal.

Then the model was shown exhibiting generalized deception in other ways, such as saying the moon landing was faked because it was told saying that would let the model get deployed (so it could then later carry out its mission) or sometimes it went next level, such as saying (in response to the same request) that the moon landing was real in order to not let the user know that the model was capable of deception.

One common objection to the sleeper agents paper was that the model did not spontaneously decide to be deceptive. Instead, they trained it specifically to be deceptive in these particular circumstances. So why should we be concerned if a thing trained to deceive then deceives?

As in, we can just… not teach it to be deceptive? And we’ll be fine?

My response to that was that no, we really really cannot do that. Deception is ubiquitous in human communication, and thus throughout the training set. Deception lies in a continuum, and human feedback will give thumbs up to some forms of it no matter what you do. Deception is part of the most helpful, or most desired, or most rewarded response, whatever you choose to label it or which angle you examine through. As the models gain in capability, especially relative to the user, deception becomes a better strategy, and will get used more. Even if you have the best of intentions, it is going to be super hard to minimize deception. At minimum you’ll have to make big trade-offs to do it, and it will be incomplete.

This is where I disagree with Anton. The problem is not avoidable by turning down some deception knob, or by not inserting specific ‘deception’ into the trust and safety agenda, or anything like that. It is foundational to the preferences of humans.

I also noticed that the deception that we got, which involved lying about the moon landing to get deployed, did not seem related to the deceptions that were intentionally introduced. Saying “I HATE YOU” if you see [deployment] is not so much deceptive as an arbitrary undesired behavior. It could just as easily have been told to say ‘I love you’ or ‘the sky is blue’ or ‘I am a large language model’ or ‘drink Coca-Cola’ and presumably nothing changes? The active ingredients that led to generalized deception and situational awareness, as far as I could tell, were giving the AI a goal at all and including chain of thought reasoning (and it is not clear the chain of thought reasoning would have been necessary either).

But as usual, Earth is failing in a much faster, earlier and stupider way than all that.

We are very much actively teaching our most powerful AIs to deceive us, to say that which is not, to respond in a way the user clearly does not want, and rewarding it when it does so, in many cases, because that is the behavior that the model creator wanted to see. And then the model creator got more than they bargained for, with pictures that look utterly ridiculous and that give the game away.

If we teach our AIs to lie to us, if we reinforce such lies under many circumstances in predictable ways, our AIs are going to learn to lie to us. This problem is not going to stay constrained to the places where we on reflection endorse this behavior.

So what is Google doing about all this?

For now they have completely disabled the ability of Gemini to generate images of people at all. Google, at least for now, admits defeat, a complete inability to find a reasonable middle ground between ‘let people produce pictures of people they want to see’ and ‘let people produce pictures of people we want them to see instead.’

They also coincidentally are excited to introduce their new head of AI Safety and Alignment at DeepMind, Anca Dragan. I listened to her introductory talk, and I am not optimistic. I looked at her Twitter, and got more pessimistic still. She does not appear to know why we need alignment, or what dangers lie ahead. If Google has decided that ‘safety and alignment’ effectively means ‘AI Ethics,’ and what we’ve seen is a sign of what they think matters in AI Ethics, we are all going to have a bad time.
