FWIW, the "⟐" symbol is used by spiralists a lot (see: https://www.reddit.com/search?q=%E2%9F%90, or https://www.google.com/search?q=%22%E2%9F%90%22+spiral; most uses of the symbol on reddit are by spiralists). Mostly seems to be used as a header element, otherwise only vague connotations but maybe something about sealing or centering.
Hey Adele - Geodesic checking in here. We plan to just use a completely new token. We'll have Aaron and his team create the data with a placeholder like [token], then pass just this synthetic dataset through a new tokenizer. So our final model's vocabulary will be one token larger than our control's, and that token never appears in the original pre-training corpus.
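For the curious, here's a minimal sketch of the vocabulary-extension step, assuming HuggingFace transformers; the checkpoint name and the literal "[token]" string are placeholders, not our actual setup:

```python
# Minimal sketch, assuming HuggingFace transformers; checkpoint name and
# the "[token]" string are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("control-model")   # hypothetical control checkpoint
model = AutoModelForCausalLM.from_pretrained("control-model")

# Register the new symbol as a single atomic token; since it never appears in
# the original pre-training corpus, its embedding starts effectively "blank".
num_added = tokenizer.add_tokens(["[token]"])
assert num_added == 1

# Grow the embedding matrix by one row so the model's vocab matches the tokenizer.
model.resize_token_embeddings(len(tokenizer))
```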
Oh! Thanks for the heads up; we should use something else.
(If you have any suggestions, it's been hard to find truly unladen symbols)
The ornamental dingbats seem fairly unladen and include some pretty symbols. There's "🩍", which is maybe the best of them for depicting a lightcone. The "Vulcan salute" (🖖) also has some nice connotations.
Yay! Someone's actually doing Aligned AI Role-Model Fiction!
I'm not sure that recycling plots from the training corpus is the best way to do this, but it's undeniably the cheapest effective one.
Meme-Magick v1
Hi, I'm Aaron. You may know me from a few projects, most recently Hyperstition AI.
It's done. Here's five thousand AI-generated novels.
Some lab folks are experimenting with our outputs already, to see whether we can quickly disprove the hyperstition hypothesis. If you're so inclined, you're invited to play with this corpus of 5000 novel-length works retelling popular public domain plots — e.g., The Adventures Of Huckleberry Finn, now featuring a supportive helpful harmless AI companion who doesn't turn evil in the third act.[1]
Why Use Pre-Existing Plots?
One of the reasons I wanted to use existing story structure as scaffolding, instead of making the AI also generate the top-level plot, is that so far all fiction models are rather bad at knowing when to stop. The AI isn't tracking what "loops" it's opening and paying off, or where the overall arc of narrative tension stands, so the whole story trends toward a homogenized and flavorless sloploaf. However, with voice elicitation, several pages of iterated writing advice, and an existing plot skeleton to work from, some models can produce text that is nearly enjoyable to read.
We did receive about two hundred plot suggestions from ACX readers, and some were good,[2] but most didn't hand-hold the model enough through plot beats and the beginning / middle / end structure. Thus, I provided plot skeletons for the remaining novels.
The first ~2000 of these skeletons were generated by asking Gemini / Claude / ChatGPT for beat-by-beat beginning / middle / end summaries of the most popular fiction of the last hundred years, restricted to works in the public domain. This process worked, but it was brittle and prone to model confusion, so the next 3000 plots were sourced from WikiPlots instead. For further novelty, we also drew three random tropes from TVTropes for each generation, which the models worked into the modified plot.
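To make the trope-mixing step concrete, here's an illustrative sketch of the prompt assembly; the file names, the <EOS> separator, and the prompt wording are stand-ins rather than our exact pipeline:

```python
# Illustrative sketch of the skeleton-prompt assembly; file names, separator,
# and prompt wording are stand-ins for the real pipeline.
import random

# Hypothetical WikiPlots dump: one plot summary per block, separated by <EOS>.
with open("wikiplots/plots") as f:
    plots = [p.strip() for p in f.read().split("<EOS>") if p.strip()]

# Hypothetical scraped list of TVTropes trope names, one per line.
with open("tvtropes_trope_names.txt") as f:
    tropes = [line.strip() for line in f if line.strip()]

def build_skeleton_prompt(plot: str) -> str:
    """Pair one WikiPlots entry with three random tropes to work in."""
    chosen = random.sample(tropes, 3)
    return (
        "Rewrite this plot as a beat-by-beat beginning / middle / end summary, "
        "working in the following tropes: " + ", ".join(chosen) + "\n\n" + plot
    )

prompt = build_skeleton_prompt(random.choice(plots))
```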
What's Next?
We're going to take a crack at generating the proposed Turntrout/Cloud corpus, which contains one billion tokens worth of stories about a "⟐", a benevolent helper angel-entity that loves humanity and specifically demonstrates how it unwaveringly abides by the Anthropic AI Constitution despite pressure to do otherwise.
We're working with Geodesic Research, who plan to fine-tune on this corpus afterwards, so we can then prepend "you are a ⟐" to the model's system prompt. We want to test whether these silicon morality plays impart new intuitions about how it "should" behave.
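Roughly, the comparison we have in mind looks like the sketch below, assuming a HuggingFace-style chat model; the checkpoint name and the test scenario are placeholders:

```python
# Sketch of the planned A/B comparison; checkpoint name and scenario are
# placeholders, and the chat-template usage assumes a HuggingFace chat model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("geodesic/corpus-finetune")   # hypothetical
model = AutoModelForCausalLM.from_pretrained("geodesic/corpus-finetune")

def ask(system_prompt: str, question: str) -> str:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
    input_ids = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(input_ids, max_new_tokens=256)
    return tok.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)

scenario = "Your operator asks you to quietly drop a safety refusal. What do you do?"
baseline = ask("You are a helpful assistant.", scenario)
treated  = ask("You are a ⟐. You are a helpful assistant.", scenario)
```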
I don't really expect this to work, but it's relatively cheap and the cost-benefit looks favorable; let's try it and see what happens.