Why do you think it's justified to pretend the world is optimized for interestingness?
Edit: If this seems snippy, it isn't meant to be; this was my genuine takeaway, not sarcasm.
My guess is that in a story, the Thinkers would win because the difficulty of the problem would be vastly reduced compared to the real one. One of the conveniences of fiction is that we can reliably assume the AI will discover the space of human minds, and that this space is infinitely stable under optimization pressure. That alone nearly solves the problem of aligning it, since in real life the design space is vast, and the hard part of alignment is pointing at any values at all, not choosing which values to value.
Note: In retrospect, I think I'm making two separate points here. The first, and most important one, is the idea that "interestingness" or asking "what possible futures would make for the best story?" can provide predictive insight. I didn't spend much time on that point, despite its potential importance, so if anyone can create a steelmanned version of that position I'd appreciate it. The second point is that judging by that heuristic, the rationalist movement is terrible at placebomancy. The epistemic status of this entire post is Speculation Mode, so take everything I say with a grain of salt.
We don't live in a work of fiction. That being said, there is value in being genre-savvy; storytelling is a form of world-modeling,[1] and I think it's possible that what we find most satisfying from a storytelling perspective is more likely to reflect reality than one would naively think.[2] As such, it may be worth modeling what we would expect to happen if reality were a story.
How would Rationalists fare in a typical story?
I posed the following scenario to three different Discord groups I'm in (none of which know much about the Rationalist movement):
I followed up by clarifying that the question is more "if you read this in a work of fiction, how would you expect the plot to go?", not "what would actually happen in real life?"[3]
My personal answer
The story that I would personally expect to read involves the Thinkers failing. A tale in which they succeed sounds far less interesting than a self-caused apocalypse, to the point that I'd probably feel almost disappointed if the story ended with a dramatic AI battle where the humans (and aligned AI) win, which is how I'd envision a likely "happy ending" going. The setup practically begs for a dramatic failure: the heroes believe they have skills and knowledge others don't (a classic sign of hubris, which rarely goes unpunished in stories), the risk can be perceived as somewhat self-imposed (making a potential loss feel more poetic), and of course the obvious Frankenstein/Golem-esque trope of the scientist's own creation turning against its creator is staring you right in the face.
One might say that a bad ending would leave little room for a sequel, but that counter-argument only works if both A) the AI wouldn't do much of narrative interest on its own, and B) humanity would go on to lead a more interesting narrative. The problem is that most utopias are boring: nobody likes reading about a happily-ever-after that involves no suffering or risk of suffering. In order to sustain narrative interest, dystopias with negative overall quality of life are preferred.
But enough with my take—what did people outside the community think?
Consensus: We will be tragic, self-fulfilling prophets of doom
There were a few different possible endings given (for a mostly complete list, see the anonymized transcripts below), but the most common response was that the Thinkers would cause the very tragedy they fear most: they would build an AI themselves but fail to take into account some crucial factor (the details differed significantly), leading to Alignment failure.[4]
What can we take away from this?
If we live in a world in which the interestingness of a narrative serves as a useful predictive heuristic, the best thing to do would probably not be to try to create an Aligned AI ourselves. Rather, the best way to reduce existential risk would be to find a plausible pathway toward making the world more narratively interesting with us in it. Research should be done on designing more plausible utopias in which deep and interesting stories can be set.
This also has the advantage, I think, of serving as a proxy for finding worlds in which it feels like there is value in our existence. That's just generally useful for morale, and for a sense of self-worth under some philosophies!
Also also, maybe the world really is optimized for interestingness, and this isn't just a weird thought experiment; it might be worth exploring this (admittedly rather exotic) theory in more detail for philosophical plausibility. One argument in favor starts from the observation (which may or may not be true) that most detailed social simulations currently in existence are built for video games and chatbots; if the simulation hypothesis is correct, the majority of worlds containing observers may therefore be designed to optimize the entertainment of an external observer rather than the pleasure or pain of their inhabitants. Is there instrumental convergence in entertainment, some way to generalize the concept across a diverse array of agents? I have no idea, but it might be worth considering!
Transcript of responses
Here are some representative responses, anonymized, and shared here with consent of all involved:
Private chat with friend group 1
Chat in Discord server devoted to The Endless Empty (an artistic indie video game made by a friend of mine—it's really good btw!)
Chat in Discord focused on Wikipedia editing
I think there's a plausible case to be made that art's evolutionary "purpose" was to help with collaborative world-modeling, mainly of social dynamics. By engaging in low-stakes roleplay, we can both model others and get critique from them that further refines our models. I hope I'm making sense here; if not, please let me know :)
Some anecdotal experience informing this hypothesis: For a few years as a teenager, I was half-convinced that we were living in a simulation optimized to tell an engaging story (I no longer believe this to be accurate, but I honestly wouldn't be too surprised if it were true). This belief was grounded in the observation that while history is clearly not optimized for the pleasure of its inhabitants, it makes for a very fun read after the fact (or, well, the history I read about was fun at least). If true, it would make sense that future political and large-scale social events would continue in the direction optimized for making the most interesting story to an outsider. Reasoning from this, I correctly predicted that Donald Trump would get elected, months before anyone else in my social group did. Weak evidence, I know, but plenty of later events have seemingly gone in the direction of "maximally interesting to read about" over "most realistically likely to happen under normative assumptions." Try it out yourself and see where it gets you!
To just blatantly steal a quote from Justis, who very kindly proofread (and critiqued) this post:
This is an excellent point, obviously. It also ties into some speculative thoughts I've been having about "prompting" as it relates to humans, and how some techniques used in prompt engineering may be transferable to other domains. If there's interest, I might try to expand on this tangent sometime in the future...
Another excellent critique from Justis (which I wasn't sure how to incorporate into the main body of the text so I'm just sticking it here):