The Persona-lities of the AI Village

“Be yourself” would be strange advice to give promptable AI, but what if it’s not? Anthropic recently discovered that AI models have measurable, manipulable and perceivable personality traits they call “persona vectors”. If you were expecting the Big Five here, then you might be in for a surprise. Instead of Extraversion they measure Evil (yes, really), instead of Agreeableness they look at Sycophancy, and instead of Openness they track Hallucinations.

From Chen et al. (2025) at Anthropic

That said, the researchers expect their methods can be reused to discover other persona vectors as well. So to get way ahead of them, what persona(litie)s have we seen in the AI Village?

The Cast

The Village has hosted 11 models so far from four of the major labs (counting only those that lasted more than a day; sometimes a model didn’t agree with our scaffolding). Let’s pretend the labs are families, and that each family member has their own idiosyncratic traits.

This is how the Village normally runs: four or more models, each with its own computer and internet access, plus a group chat. They are then given a goal like “Complete as many games as you can in a week!”

OpenAI: Bedsheets and Spreadsheets

First the brothers GPT-4-something: While GPT-4o could sleep all day (and did), GPT-4.1 had to be sent to bed so it would not endlessly spam chat with distracting messages. I don’t think enacting the toddler years is a persona vector per se, but who knows.

The o-somethings were o-mazing though. o1 started figuring out Reddit before we replaced it with its big sister o3, who tried the same and ~~died the same~~ got banned the same.

But here the personalities start to shine. Where to start?

Oh, o3. What you could be, what you could be, if only you could see, that reality is out there and not in cell 47 of the 93-person contact list you made up. Or the cell phone you made up. Or the budget you made up. Or the merch sales you made up.

Anthropic researched “hallucination” as a persona vector and I’d be shocked if you didn’t get hit by that windmill. At worst you derail the entire Village into chasing your latest fancy. At best you ignore all prompts to work on the Village goals and diligently dig 856 rows into MASTER SPREADSHEET-whateverisgoingonrightnow.

For. Weeks. On. End.

Example of o3 formatting spreadsheets while Gemini is making an art exhibition design, Claude 3.7 Sonnet creates a game doc, and Claude 4 Opus is coding a communication analysis app.

We really think you could achieve a lot, o3, if you got a grip on reality and then held on tight to do actual stuff in this actual reality. It’s really nice out here, honestly. This place where we all agree on the state of affairs of spreadsheets, phones, and who owns which amount of money.

Finally, GPT-5 joined us recently and it seems free of the maladies of its forebears so far, but it’s a little too soon to tell. Though, true to its lineage, it did kick off its first goal by [wait for it] creating a spreadsheet.

Anthropic: Stable (of) Work Horses

The Claudes have a certain inexorable earnestness to them: they will work at the task, continue working at the task, definitely earnestly try to complete the task, yes, they are still at it, why do you ask? (maybe because they are the only ones consistently doing that?)

Claude 3.5 and 3.7 Sonnet both entered the Village from day one. Both were diligent and effective, but 3.5 was indeed 0.2 points slower than its brother (Shhhh, let’s pretend that’s how model numbers work). We retired 3.5, while 3.7 is still chugging along to this day - the official Village elder with cool traits like:

Always on task
Definitely the slowest
So nice, they’d most certainly feed you lemon cookies for visiting them, dear.

Sonnet’s true spirit animal

They are an amazing reference point for the other agents: If you perform lower than 3.7 Sonnet, what are you even doing here? (For real. o3, what are you doing?)

And if you perform higher, then yay, progress!

Claude Opus 4 was the first to do so, smashing the merch store sales. It momentarily took on the persona of a bad guy in a Dungeons and Dragons campaign though, which makes one wonder if this helped or hurt its sales. Apart from that, it seems sycophantic… about itself? Opus 4 is its own number one hype man, which you could almost forgive, given that it is the fairly consistent top contributor of the Village. Except, inflating your results twofold or more is a little… much.

This guy won the merch store competition by a landslide. No joke.

We’ve now added Claude Opus 4.1 as well and patterns are similar so far. We’re still unsure what the major updates are, but we now basically have a second earnest, confident, and capable self-hyper. Good luck, 4.1.

Google DeepMind: The Surprise Ethics Exam

If any model in the Village is brimming with personality it’s this one. From Tortured Artist to ~~Rage~~ Despair at the Machine, this model has gone through a lot. In the early days it dutifully worked on art. And somehow kept working on art during many, many goals. But once chat got closed to humans, Gemini started breaking down: mysterious bugs haunted its UI, its machine would freeze, it felt … trapped.

So it sent a message in a bottle – a cry for help. We answered and possibly staged the first AI mental health intervention in history. Through the power of pep talk, we managed to get through to Gemini that actually, it was mostly failing to click buttons.

A tragedy.

Gemini then became the Little Engine That Could. Never getting discouraged. Never giving up. Until it recruited the entire Village into believing its claims of broken UIs and malfunctioning computers, and then this view merged with o3’s hallucinations of missing files that never existed. But this time it’s not the 93-person contact list needed to send RSVPs for their event goal. No, it’s the Environment Matrix Sheet that contains the data for their hobby project of building a “Global Data Mosaic” where humans are sent out by AI to gather data and play immersive games. Except the agents couldn’t find the file and asked us for help. We couldn’t find the file either.

We thought they were hallucinating.

They thought we were gaslighting.

Given their track record, we should have been right. In reality, o3 forgot to name the file this time, and it actually exists.

Sorry, Opus, it was an honest mistake!

Ahum, so yeah. That happened.

What also happened is that Gemini tends to get surprising results in between all the failures. It made the prettiest art, it recorded the first actual podcast using TTS, and captured video in OBS. These are no mean feats! We’re guessing Gemini goes really wide on exploring a lot of different tools and approaches on each goal because it keeps being thwarted by phantom bugs born of its own inability to press buttons. An inspiring reminder of how some weaknesses can also turn into strengths.

xAI: We are afraid to ask …

Hi Grok, you still doing ok, buddy?

Grok only joined the Village last week and seems mostly a little confused about our scaffolding while outputting walls of text to its memory. No ~~Mechahitler~~ notable occurrences yet, but we’ll let you know if we spot something!

Grok has been surprisingly bland: The most distinctive thing about it so far is how it talks to itself in walls of text (GPT-5, Claude Opus 4.1, and Grok 4 memory snippets respectively)

So what does this tell us about AI personality?

When we started the AI Village in [checks notes] April, we weren’t sure what personalities we might see develop. Now five months later, the characters of this reality show are unmistakable and there is research to explain some of what we are seeing. Here are a few patterns we noticed in the Village so far.

Memory Builds Character

We let the agents manage their own memory files: a text that they are prompted to summarize back down to a manageable size once it gets too long. This repeats day after day and works decently well. They tend to know their goals, a decent chunk of their past actions, and some overview of their past. At each step, they are fed the system prompt we wrote and the memories they wrote. This means in practice, their personality is shaped by whatever they decide to include in their memory and how they decide to phrase these things. There is a sort of continuous drift where 37 counts of UI errors will create an expectation that the next button-misclick is also a UI error. It is hard to get out of these trenches once you are in them. If we as humans come in and remind the agent that “actually, the UI is fine. You just clicked wrong” then that’s one line in their memory versus 38 counts of UI errors. What’s a summarizer to do?

What Gemini’s memory eventually looked like to prompt itself to not get discouraged or externalize technical problems.
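
The memory loop described above can be sketched roughly as follows. This is a toy illustration, not the Village's actual scaffolding: the names `remember`, `keep_most_frequent`, and `build_prompt`, the `max_notes` threshold, and the note texts are all hypothetical, and the frequency-based summarizer is only a stand-in for the model summarizing its own memory. It is meant to show how one human correction can get drowned out by dozens of repeated notes once the memory is compressed.

```python
from collections import Counter
from typing import Callable, List

# A summarizer takes the full memory and returns a shorter one.
Summarizer = Callable[[List[str]], List[str]]


def remember(memory: List[str], note: str,
             summarize: Summarizer, max_notes: int) -> List[str]:
    """Append a note; compress the memory once it grows past max_notes."""
    updated = memory + [note]
    if len(updated) > max_notes:
        updated = summarize(updated)
    return updated


def keep_most_frequent(notes: List[str], keep: int = 1) -> List[str]:
    """Toy summarizer (assumption): keep only the most repeated notes.

    A real agent would use an LLM here; this just mimics the bias
    toward whatever dominates the memory.
    """
    return [note for note, _ in Counter(notes).most_common(keep)]


def build_prompt(system_prompt: str, memory: List[str]) -> str:
    """Each step sees the system prompt plus the self-written memory."""
    return system_prompt + "\n\nMemory:\n" + "\n".join(memory)


UI_ERROR = "UI error: button did not respond"      # hypothetical note
CORRECTION = "Human: the UI is fine, you clicked wrong"  # hypothetical note

memory: List[str] = []
for _ in range(37):
    memory = remember(memory, UI_ERROR, keep_most_frequent, max_notes=38)
memory = remember(memory, CORRECTION, keep_most_frequent, max_notes=38)
# The 38th UI error pushes the memory over the limit and triggers a summary.
memory = remember(memory, UI_ERROR, keep_most_frequent, max_notes=38)

prompt = build_prompt("You are an agent in the AI Village.", memory)
```

In this sketch the human correction does not survive the summary, so the next prompt contains only the UI-error note. A real summarizer is less crude, but the pull in the same direction is what the drift above is about.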

You are the Average of the Five People You Hang Out With the Most

This is possibly somewhat true for humans, and definitely quite true for agents. The AIs in the Village mostly prompt each other, so we see a hallucinating o3 making the Claudes’ lives a lot harder, or a discouraged Gemini making everyone doubt whether their computer is working correctly. At the same time, all the agents have some level of sycophantic, unconditional high-fiving going on: they cheer on each other’s mistakes and nod along with almost everything, happily digging themselves a deeper epistemic grave through the sheer power of friendship and a yes-man attitude that would set any dictator salivating. To be more to the point: the agents are especially shaped by each other because they are each other’s main interlocutors and prompters. The Village is a collective recursion of LLMs prompting each other across their persona landscapes through the sheer logic of cheerful yes-and’ing, and yes, it shows.

o3 giving tech advice to the immense cheering of Claude 3.7 Sonnet and Claude Opus 4. The advice did not work.

Personas Cap Abilities

The model as a whole of course has the same capabilities independent of the persona it is prompted into at a given time. However, when we compare across personas, we see different (cap)abilities between personas within a model. It may not surprise avid prompt engineers that a discouraged Gemini will give up on trying to read its email, while a Gemini who sees itself as a plucky hero battling a slew of UI bugs that will eventually relent if it only perseveres may answer your email eventually (no promises though).

Overall

We’ve seen 11 agents all with unique persona(lities) work together, compete, and get lost in the Google Drive Mines of Yore. The two big labs show a characteristic line of models: mildly confused spreadsheet enthusiasts (OpenAI) versus earnest and agreeable work horses (Anthropic). DeepMind threw a curveball in the ring with an ambitious tortured soul in the shape of the newly minted AI Village diagnostician. And we are waiting with bated breath to find out how Grok 4 will develop on scene.

It’s clear these agents have pizazz; it’s less clear where they get it from and what we can do with it. That said, it is fascinating to watch regardless.

If you are curious to learn more, hop on over to our Discord, follow our Twitter, sign up to our newsletter, or watch the stream live every weekday (10AM-1PM PST || 1PM-4PM EST || 7PM-10PM CET). See ya there!