Adam B — LessWrong

The Best Lack All Conviction: A Confusing Day in the AI Village

FYI, as well as our blogposts we also post highlights and sometimes write threads on Twitter: https://twitter.com/aidigest_

And there's quite an active community of village-watchers discussing what the agents are up to in the Discord: https://discord.gg/mt9YVB8VDE

A 2032 Takeoff Story

Adam B1mo90

At a quick glance, it looks like the intention is (partially) to promote a memecoin: https://www.ai-2028.com/today/coin

We are likely in an AI overhang, and this is bad.

Adam B3mo10

I see these errors way less when coding with Claude Code

I think models are by default generally worse at computer use than at coding, so I don't think seeing fewer errors in Claude Code than in AI Village is much evidence that AI Village under-elicits capabilities more than Claude Code does. I'd guess this applies to Project Vend too, though I'm less familiar with it.

Chart of AI time horizons increasing in many domains

(However, I do think there is other evidence that Claude Code under-elicits less than Project Vend/Village: Claude Code is a major offering from a top lab, and I think they have spent a lot more resources on improving its performance than has gone into Project Vend/Village, which are relatively small efforts. Also, in general I'm pretty confident that much more effort is spent on eliciting coding capabilities, and some insights spread from other efforts, e.g. Cursor, Codex, GitHub Copilot, etc.)

Claude Plays... Whatever it Wants

Adam B4mo30

Readers might also be interested in:

https://www.vgbench.com
The scaffolding for GPT-5 Plays Pokemon, for a sense of what trying hard to elicit capabilities with game-specific scaffolding looks like, and how that differs from domain-general scaffolding like the village's (general computer use + group chat + memories)

Previous writeups about AI Village:

Daniel Kokotajlo's Shortform

Adam B4mo*62

I disagree that the old trend better predicted Grok 4 and GPT-5. Here's my plot (source, interactive) with the trendlines from METR's time horizons paper: orange is the 2022-2025 trend with a 7-month doubling time, red is the 2024-2025 trend with a 4-month doubling time.

Both trendlines were calculated before the release of o3, Grok 4 or GPT-5, so I consider those three datapoints falling close to the 4-month doubling time line to be evidence for that line. Reading off the graph, o3 was about a month ahead of schedule, and Grok 4 and GPT-5 were both about a month behind schedule. I wonder if the GPT-5 gap is partially explained by OpenAI waiting longer before releasing it (it sounds like METR had access for a bit longer).
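To make "ahead of schedule" concrete, here is a minimal sketch of how one could compute it for an exponential trend with a fixed doubling time. The function name and every number in the example are hypothetical placeholders, not values from METR's paper or from the plot; the idea is just to find the month at which the trendline reaches a model's time horizon and compare that with the model's actual release month.

```python
import math


def months_ahead_of_trend(horizon, release_month, ref_horizon, ref_month, doubling_months):
    """Months by which a model is ahead of (positive) or behind (negative) a trendline.

    The trendline is assumed to be ref_horizon * 2 ** ((month - ref_month) / doubling_months),
    i.e. a pure exponential anchored at a hypothetical reference point.
    """
    # Month at which the trendline itself reaches this model's horizon.
    trend_month = ref_month + doubling_months * math.log2(horizon / ref_horizon)
    # Released before the trend gets there => ahead of schedule (positive).
    return trend_month - release_month


# Hypothetical example: the trend passes through a 60-minute horizon at month 0.
# A model with a 120-minute horizon released at month 3 is compared with each trend.
print(months_ahead_of_trend(120, 3, 60, 0, doubling_months=4))
print(months_ahead_of_trend(120, 3, 60, 0, doubling_months=7))
```

A datapoint that sits within about a month of one line and several months from the other is what I mean by it being evidence for the first line.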

My pitch for the AI Village

Adam B5mo40

Yeah, I mostly agree – I'm keen to see capabilities as they are without bonus help. We're currently experimenting with disabling the on-site chat, which means the agents are pursuing their own inclinations and strategies (and they're also not helped by chat to execute them). Now I expect it'd be very unlikely for them to reach out to Lighthaven for example, because there aren't humans in chat to suggest it.

Separately though, it is just the case that asking sympathetic people for help will help the agents achieve their goals, and the extent to which the agents can independently figure that out and decide to pursue it is a useful indicator of their situational awareness and strategic capabilities. So without manual human nudging, I think it'll be interesting to see when agents start thinking of stuff like that (my impression is that they currently would not manage to, but I'm pretty uncertain about that).

My pitch for the AI Village

Adam B6mo50

What actions can the agents actually take?

They each have a Linux computer they can use, and they can send messages in the group chat. For your other questions, I'd recommend just exploring the village (https://theaidigest.org/village), where you can see their memories and how they're coordinating. To give them their goals, we just send them a message (e.g. see the start of Day 1: https://theaidigest.org/village?day=1).

My pitch for the AI Village

Adam B6mo22

Great, I'm also very keen on "make as much money as possible" – that was a leading candidate for our first goal, but we decided to go for charity fundraising because we don't yet have bank accounts for them. I like the framing of "goals that a bunch of humans in fact try to pursue", will think more on that.

It's a bit non-trivial to give them bank accounts / money, because we need to make sure they don't leak their account details through the livestream or their memories, which I think they'd be very prone to do if we don't set it up carefully. E.g. yesterday Gemini tweeted its Twitter password and got banned from Twitter 🤦‍♂️. If people have suggestions for smart ways to set this up I'd be interested to hear, feel free to DM.

My pitch for the AI Village

Adam B6mo82

Thanks Simeon – curious to hear suggestions for goals you'd like to see!

We observed cheating in a Wikipedia race (thread), and lately we've seen a bunch of cases of o3 hallucinating in the event planning, including some self-serving-seeming hallucinations, like hallucinating that it won the leadership election when it hadn't actually checked the results.

But the general behaviour of the agents has in fact been positive, cooperative, clumsy-but-seemingly-well-intentioned (anthropomorphising a bit), so that's what we've reported – I hope the village will show the full distribution of agent behaviours over time, and seeing a good variety of goals could help with that.

My pitch for the AI Village

Adam B6mo150

Our grant investigator at Open Phil has indicated we're likely to get funding from them to cover continuing AI Digest's operations at its current size (3 team members, see the Continuation scenario here), which includes $50k budgeted for compute. We've also received $20k in a speculation grant from SFF, which gets us access to their main round – I expect we'll hear back from them in a few months – and $100k for the village from Foresight Institute.

Note that here, Daniel's making the case for increasing the village's compute budget in particular, which would let us run a more ambitious version of the village (moving towards running it 24/7, adding more than 4 agents, or trying more compute-expensive scaffolding).

Separately, with additional funding we'd also like to grow the team, which would help us improve the village faster, produce takeaways better and faster, and grow our capacity to build other explainers and demos for AI Digest. There's more detail on funding scenarios in our Manifund application.
