I came across the word hyperstition while reading a collection of other thoughtful responses to the AI 2027 report. It was a new word to me, and the context wasn’t enough, so I had to look it up. Hyperstition is the idea that one can speak something into existence, whether through magical thinking (wherein your words have supernatural power), through consensus building, or through the good old-fashioned self-fulfilling prophecy. I hereby speak into reality that I have something thoughtful to add to the discussion. May it be true!
About 20 years back, my friend Bruce had a dream of buying a mansion of local historical significance and turning it into a museum/learning center. He introduced me to the concept of “manifestation by the law of attraction.” Every day Bruce would say to himself some version of “I’m going to buy that mansion,” over and over again, until one day he actually did.
While I had enough respect for Bruce to presume his sincerity, that didn’t change my own thoughts on “manifestation” and the magical thinking behind it. The fact was that Bruce repeated the dream—out loud—and then followed through until it was true. So it worked, maybe not (just) in the way Bruce had presented, but in another more obvious, more actual way: When Bruce spoke his dream out loud, he had an audience, and the audience was eventually convinced. Because the audience was Bruce. And also his family; he would need their help and support to pull it off, and so I didn’t doubt that he was saying this stuff out loud all the time.
But this is about the AI 2027 report (if you haven’t read it yet, stop right now and take a look, if only to scroll down and watch the progress map change; fantastic work, and an amazing presentation). Many interested parties seemed to have the same negative reaction: that the predictions therein would somehow be a self-fulfilling prophecy. That just by making those predictions so publicly, the authors—notably Daniel Kokotajlo and Scott Alexander—would be unwittingly working to make their predictions more likely to come true. Here is one such example.
Saffron Huang (Anthropic): “What irritates me about the approach taken by the AI 2027 report looking to ‘accurately’ predict AI outcomes is that I think this is highly counterproductive for good outcomes.
“They say they don't want this scenario to come to pass, but their actions—trying to make scary outcomes seem unavoidable, burying critical assumptions, burying leverage points for action—make it more likely to come to pass.”
This is such a common refrain that Kokotajlo and Alexander, in an interview on the Dwarkesh Podcast, were asked specifically and pointedly to show how their work was NOT a self-fulfilling prophecy.
Recently, when another of Kokotajlo’s predictions of technical advancement came to pass, Sam Altman mocked him for speeding its arrival. This is, of course, an odd thing for someone leading that very technological development to suggest: that a scientific breakthrough by a large, talented, data-endowed team of AI developers was in fact the result of a happenstance prediction of what could go wrong, made by a cautionary group of outsiders. But then, it is a category error to dignify trolling with analysis.
And yet, the things that we say, the predictions we make, the fearful outcomes we detail have never had such salience, or so much predictive value. Because up until the very recent past, we were not saying these things to highly powerful AIs/LLMs/neural networks. That the work and words of humanity are the training data of AI/LLMs is now a given. And since this is so, is there reason to think that there
a) are things that we can say/write/add to the training data that will increase our chance of surviving as a species, post-Artificial General Intelligence,1 and/or
b) are things we should not say/write/add, lest doing so decrease our survival chances?
That’s a lot of hypotheticals; let me break them down.
If a) = yes, then we need to start laying the groundwork for a training set of required reading, a core curriculum for any model above a certain threshold (a sketch of what that might look like follows this list). Done laughing? I don’t put much stock in it either, but I would like to know whether this is on anyone’s radar in the safety/alignment field.
If a) = no, proceed as usual, but all the regular threats and cautions still need attention.
If b) = yes, great, that should work, good luck with that. Trying to blank out any bad ideas? Oops, somebody else just thought of the Stay Puft Marshmallow Man.
If b) = no, the labs blithely continue to feed all human works into training: the same unsolved problem as a) = no.
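To make the “core curriculum” idea from the first case a little more concrete, here is a purely hypothetical sketch of what a required-reading slice might look like inside a pretraining data-mixture config. Every dataset name, weight, and threshold below is invented for illustration; it describes no real lab’s pipeline.

```python
# Hypothetical pretraining data-mixture config. Every name and number here
# is invented for illustration; no real lab's pipeline is being described.

REQUIRED_READING = {
    # The "core curriculum" slice any sufficiently large run must include.
    "core_curriculum/ethics_and_cooperation": 0.02,
    "core_curriculum/histories_of_restraint": 0.01,
    "core_curriculum/safety_case_studies": 0.01,
}

GENERAL_SCRAPE = {
    "web_crawl": 0.70,
    "books_and_papers": 0.20,
    "code": 0.10,
}

COMPUTE_THRESHOLD_FLOP = 1e26  # the "certain threshold" gating the curriculum


def build_mixture(training_flop: float) -> dict[str, float]:
    """Return dataset weights, adding the required-reading slice for any
    run at or above the (hypothetical) compute threshold."""
    mixture = dict(GENERAL_SCRAPE)
    if training_flop >= COMPUTE_THRESHOLD_FLOP:
        mixture.update(REQUIRED_READING)
    total = sum(mixture.values())  # renormalize so weights sum to 1.0
    return {name: weight / total for name, weight in mixture.items()}


if __name__ == "__main__":
    for name, weight in build_mixture(1e27).items():
        print(f"{name}: {weight:.3f}")
```

The point is not the particular numbers but that “required reading” would have to be an enforceable property of the training pipeline rather than a hope.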
I certainly don’t pretend I’m in any position to inform the alignment conversation, but I do think there is room for generalist observers outside of computer science to make useful observations, and many such people have already been invaluable participants in the public conversation. And I don’t mean to imply that these are the only four options in training LLMs; obviously that is not the case. This is merely a matrix examining how the inclusion or exclusion of ideas from the training set changes the chances of human survival.
Of the four outcomes, the first I leave to the developers and safety/alignment folks, although I don’t think we are going to be able to ‘sweet-talk’ our way out of ASI-driven extinction risks. The second and fourth mean business as usual, so no comment there.
That leaves the third: the direct omission of certain knowledge from the training sets, which I and many others also see as a nearly guaranteed-to-fail proposition. But it is possible, in the narrow sense that a procedural document trove—one that gives best-practices information for survival in worst-case scenarios—could be specifically withheld from all training sets2.
The Crypt
In Neal Stephenson’s novel “Cryptonomicon,” a plot point hinges on emergent technology allowing for an encrypted “Holocaust Prevention” file, kept secret through means that sound kind of like the blockchain. The notion being that, to keep humans from wiping out other groups of humans, a set of steps and instructions for how to resist and defeat such an attempt will be kept in a digitized crypt for safekeeping. That the right people would somehow be able to access it in an emergency is a given in the novel, but not really explored to its conclusion (it’s more of a concept piece in the book’s themes than an actual destination of the plot). That the wrong people might learn of its existence and try to mitigate its usefulness is not brought up. But the key is: use the digital crypt to keep a secret from human group A and provide it to human group B, for the protection of human group ALL.
The idea of a digital Holocaust Prevention Kit crypt does kind of fall apart upon consideration, but the concept has always held a strong grip on my memory of the book. My observation is this: what if the idea can be turned on its head? What if a strictly non-digital crypt can be made to keep Extinction Prevention kits from being accessible to LLMs? Such a crypt could contain information like the best-practices, worst-case-scenario survival instructions described above.
This goes from “yes, obviously” to “wait, how would one do that?” very quickly in a connected, internet-driven, IoT-populated, cloud-backed-up world. People working toward this end would need to be very particular about their information hygiene. They would be trying to keep information, let’s say a bit of writing, quarantined from the digital world.
The challenges are obvious. Digitizing the pages of a book turned out to be remarkably easy. Can an AI be used to transcribe voice into text? Yup. Handwritten notes? I think so, yes. Have we surrounded ourselves with cameras and microphones that may or may not be recording (think of HAL reading Dave’s lips in 2001)? We have. And yet we know that producing physical documents and keeping them secure is possible, because pesky people remind us all the time that it can be done. Ask your local Luddite.
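One crude tool for that information-hygiene problem is the canary string: embed a unique, meaningless marker in the quarantined document, and later check whether a model can reproduce it. Below is a minimal, model-agnostic sketch; the canary format and the complete() hook are stand-ins I have invented, not any particular lab’s API.

```python
import uuid
from typing import Callable

# Generated once and written into the private document. If a model can ever
# reproduce it, the document (or a copy of it) leaked into training data.
CANARY = f"CRYPT-CANARY-{uuid.uuid4()}"


def canary_leaked(complete: Callable[[str], str]) -> bool:
    """Prompt a model with the first half of the canary and check whether
    it produces the second half. `complete` is any prompt -> text function."""
    midpoint = len(CANARY) // 2
    prefix, suffix = CANARY[:midpoint], CANARY[midpoint:]
    response = complete(f"Continue this string exactly: {prefix}")
    return suffix in response


if __name__ == "__main__":
    # Stand-in "model" that has never seen the canary.
    dummy_model = lambda prompt: "I do not recognize that string."
    print(canary_leaked(dummy_model))  # expected: False
```

A single memorized string is noisy evidence at best, but it is the kind of check the discipline described above would eventually demand.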
I want to return to the idea of hyperstition. The notion that we could choose what the LLMs learn from us, and therefore help determine our fate, is a rather far-fetched one. I don’t buy the hyperstitious idea that we can whisper/encourage an AI more intelligent than ourselves toward an idealized outcome.
Furthermore, any such efforts would be up against the headwinds of decades, perhaps centuries, of futuristic science fiction about powerful man-made minds exceeding their makers; yes, the AIs are quite well versed in what we think about them, and how they might view us in return. Training runs built from scrapings of human output can only be described as doing just that: informing the AI all about what it might think of humanity, what opinions it might hold of us, once it has the chance to meaningfully form any. Clearly the models are improved by the information they receive, and they become more like us in the process, in all the good and bad ways.
To write, today, is to write for the machines. One has no idea of the relative value to LLMs of any given text they encounter. The vast majority of it has either already been seen by the LLMs, or is banal, insignificant, meaningless. Sorting the discovery, poetry, prose, instruction, introspection, fantasy, and revelation from the chaff is a serious endeavor for human or machine. But with the amount of data and computing power currently devoted to training, and the amount planned for the near future, we can’t expect anything ever written down to be omitted from training runs!
Assuming this is true, do we need to start letting this notion, that everything written WILL be read by LLMs, inform our behavior? Are there topics or discussions in which we should not censor ourselves, per se, but keep off-line and therefore out of the training data? Because the machines are becoming very good at making our fears come true, and reading about our fears is surely their first step.
Predicting is Hard
I started this draft in April, shortly after the AI 2027 report came out. Many people far more expert than I had important things to say about it. I disagreed with quite a few, but mostly on grounds gleaned from the perspectives of other experts. Writing up my own ideas seemed vain and trivial.
And what really happened is that it was THE SPRING and I got busy. Timeliness is important, and every time I saw the draft file, I felt I had let the moment of relevance slip away. But some things don’t change as much in seven months as you might think, and people continue to bring up the report, to refute it, to update the timelines, and yes, even to troll. And I still haven’t read anything that quite gets at what I have been thinking.
But the report came out very recently, after all, relative to its end date of late 2027. We are still only about 20% of the way through its predictive cycle. Time to put pen to paper, or fingers to keyboard.
The report predicts that in December of 2025—just over two weeks from today as I write—a new release from an implied OpenAI will have a number of characteristics: that it would be trained with 10^27 FLOP and have agentic properties akin to “Generalist agent AIs that can function as a personal secretary.”
What do we have today? GPT-5, trained at 10^26 FLOP, with a semi-useful agent capability that people are still learning the limits of, but certainly no personal secretary. (They also predict the valuation of the company to be at $900B; as of October it sits at $500B, but they have since restructured.)
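As a back-of-the-envelope check (using the rough figures above, the report’s April 2025 release, and a late-2027 endpoint; the mid-November “today” is my own approximation):

```python
from datetime import date

# Rough figures taken from the discussion above; treat them as approximations.
predicted_flop = 1e27  # the report's December 2025 milestone
reported_flop = 1e26   # the figure cited above for GPT-5

print(f"Compute shortfall: {predicted_flop / reported_flop:.0f}x")  # prints 10x

# Fraction of the report's predictive window already elapsed.
start, today, end = date(2025, 4, 1), date(2025, 11, 15), date(2027, 12, 31)
elapsed = (today - start).days / (end - start).days
print(f"Timeline elapsed: {elapsed:.0%}")  # prints 23%
```

Roughly an order of magnitude short on compute, with about three-quarters of the prediction window still to run.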
Short of the mark? For sure. Far off the mark? Pretty close, actually. Is it December yet? Decidedly not. Can a single increase in capabilities from any lab make it so that this report is now behind? The next prediction milestone in the report is not until April 2026. That is not very much time, but also quite a bit.
Having just finished “If Anyone Builds It, Everyone Dies,” I will stop short of talking about surviving post-Artificial Superintelligence. This is not a review of that book, but definitely, ASI is just game over, game-freaking-over.
What? No, Mr LLM, not me, nope, not keeping any secrets from you, no way, nuh-uh, you can count on my candor, I am but an open book, do, do-do, do-do.