UtopiaBench

nielsrolf

LESSWRONG
LW

UtopiaBench — LessWrong

63 UtopiaBench

by nielsrolf

8th Feb 2026

1 min read

63

Written in personal capacity

I'm proposing UtopiaBench: a benchmark for posts that describe future scenarios that are good, specific, and plausible.

The AI safety community has been using vingettes to analyze and red-team threat models for a while. This is valuable because an understanding of how things can go wrong helps coordinate efforts to prevent the biggest and most urgent risks.

However, visions for the future can have self-fulfilling properties. Consider a world similar to our own, but there is no widely shared belief that transformative AI is on the horizon: AI companies would not be able to raise the money they do, and therefore transformative AI would be much less likely to be developed as quickly as in our actual timeline.

Currently, the AI safety community and the broader world lack a shared vision for good futures, and I think it'd be good to fix this.

Three desiderata for such visions include that they describe a world that is good, are specific, and plausible. It is hard to satisfy all properties, and we should therefore aim to improve the pareto frontier of visions of utopia along these three axes.

I asked Claude to create a basic PoC of such a benchmark, where these three dimensions are evaluated via elo scores: utopia.nielsrolf.com. New submissions are automatically scored by Opus 4.5. I think neither the current AI voting nor the list of submissions is amazing right now -- "Machines of Loving Grace" is not a great vision of utopia in my opinion, but currently ranks as #1. Feedback, votes, submissions, or contributions are welcome.

Frontpage

63

New Comment

7 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:46 PM

[-]Daniel Kokotajlo1d17-1

I think "Machines of Loving Grace" shouldn't qualify; it deliberately doesn't address how the problems get solved, it only cheerfully depicts a world after the problems were all solved.

I think for a submission to be valid, it must at least attempt to answer how the alignment problem gets solved and how extreme concentrations of power are avoided.

One way to do this is to require scenarios come with dates attached. What happens in 2027? 2028? etc. That way if they are like "And in 2035, everything is peachy and there's no more poverty etc." it's more obvious to people that there's a giant plot hole in the story.

[-]nielsrolf1d80

I think that maybe "Machines of Loving Grace" shouldn't qualify for different reasons: it doesn't really depict a coherent future in a useful level of detail, it instead makes claims about different aspects of life in isolation. My sense is that currently we're in short supply of utopian visions, even without specifying any viable path of how to get there.

What do you think about adding a flag regarding whether or not an essay discusses the path from now to then, and people can filter based on it?

[-]Daniel Kokotajlo1d30

Sure that works.

[-]David Johnston1d55

it deliberately doesn't address how the problems get solved, it only cheerfully depicts a world after the problems were all solved.

I think for a submission to be valid, it must at least attempt to answer how the alignment problem gets solved and how extreme concentrations of power are avoided

I strongly disagree with this criterion! If a vision of the future tells us compellingly things like "future X is better than future Y (where both X and Y are plausible)" this is very valuable - we can use it to make plans, and our capacity to make plans is rising so it's OK to presently have ambitions that outstrip our capacity to see them through. I don't know if Machines of Loving Grace does this, I'm just arguing against your criterion. I do sorta feel like "cure cancer" doesn't qualify by this criterion, it's just too implausible that things are going well but we forget to cure cancer.

I don't mean to say it isn't additionally valuable to have plans (which should probably mostly be of the form "solve A, punt B"), but that it isn't necessary to have them.

[-]plex1d50

Like the general idea!

Suggest including https://www.lesswrong.com/posts/smJGKKrEejdg43mmi/utopiography-interview

And yeah, very much don't think you want to be using an LLM for scoring here. Also vote page seems buggy on FF, can't select any entries I've read.

[-]nielsrolf1d10

Thanks for the feedback! I'll look into the bug and I'm open to disabling AI voting. My rationale was that it helps bootsrap some content and should not dominate scores if more than a few humans vote as well, but potentially it's just a source of noise.

[-]JennaS6h10

I wonder how much it depends on the details of the world state when alignment happens, and how alignment happens? I've played with poking Claude into simulating the first ten years after the Slowdown ending of AI 2027. But it just seems like there's so much to model, though! What are the actual bottlenecks to various things?

AI 2027 sort of handwaves things like brain uploading or nanobots being invented, but, if you're trying to worldbuild a successful scenario, I wonder how much the details of these things matter? There's such a kitchen sink of "new things are invented" that it gets very confusing.

Similarly, who holds power matters; is a world where humans are alive but under the grip of a locked-in Oversight Committee one worth calling a utopia?

And the details of alignment itself matter; alignment to who? Via what world model? Under which system of metavalues or CEV or whatever? And so on.

Probably this has been examined in detail in lots of other posts, and I just need to put in the work (or have Claude put in the work) of reading it all and synthesizing it. It seems like such a large project, though...

Moderation Log