I wrote this as the intro to a bound physical copy of Janus' blog posts, which datawitch offered to make for me as a birthday gift. However, seeing as I basically framed the preface as a pitch for new readers, I figured I might as well post it publicly.
In the year 2020, OpenAI's GPT-3 model had just been released, and everyone paying attention could tell that something big was coming. It was essentially a proof that, just by scaling up transformer-based neural nets, you could solve natural language processing. GPT-3 was writing respectable poems and paragraphs with relatively small amounts of curation. And the breadth of topics it could discuss suggested it had a fairly deep understanding of the world it had learned about from its training data. The major labs hadn't quite figured out how to actually put this intelligence to work economically, but it was obviously there, and people were starting to notice.
Among the biggest early GPT-3 enthusiasts was Janus, the primary author of this collection. I actually used to know Janus, having lived at their grouphouse during the summer of 2023. This meant I got to hear a bit about Janus's early experiences with GPT-3. Apparently, before getting into LLMs, Janus was something of an amateur optics researcher. They were often studying interference patterns in light waves, setting up conditions for observing them and re-deriving their mathematical models.[1] However, in the summer of 2020, Janus was abruptly pulled from this line of research. An old friend from high school had introduced them to GPT-3, and it immediately put Janus under a spell.
As several pieces in this collection allude to, Janus spent hundreds of hours playing with GPT-3. Originally, Janus engaged with it through AI Dungeon, an early wrapper app for GPT-3. Eventually, though, Janus was granted API access to GPT-3 directly. They then used the API to develop the Loom, a tool for interfacing with the branching paths LLM outputs can take. In the course of this exploration, Janus developed theories and intuitions about the behavior of predictive LLMs, which formed the foundation for the articles in this book.
Probably the two most important concepts Janus developed are 1) purely self-supervised LLMs as simulators, and 2) LLMs in general as multiverse generators. Purely self-supervised LLMs, like GPT-3, share a certain resemblance to, say, market simulators, or ecology simulators. You condition the simulation with input text, and the model predicts ways the authoring of that text may have unfolded forward in time. The branching, probabilistic predictions of these simulators constitute a kind of multiverse, an indeterministic window into ways the future may have played out.
Janus develops these concepts at great length throughout this collection. Beyond abstract theory, though, the concepts also cashed out in concrete practical results. For instance, the multiverse frame for base models[2] helped inspire Janus's Loom interface. The tool takes in a prompt and feeds it to an LLM to generate a chosen number of completions ("multiverse branches") of a chosen length. Users can then sift between the branches they've generated and choose which ones to continue, as in the sketch below. This aids with tasks such as exploring branching paths in GPT-generated stories, curating strings of high-quality outputs, and generally building intuition about GPT's behavior.
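To make the core loop concrete, here is a minimal sketch of Loom-style branching. This is my illustration, not Janus's actual implementation; it assumes the OpenAI Python client and uses a stand-in base model name, since the original GPT-3 engines are no longer served.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def branch(prompt: str, n_branches: int = 4, branch_len: int = 32) -> list[str]:
    """Sample several alternative continuations ("multiverse branches") of a prompt."""
    response = client.completions.create(
        model="davinci-002",    # stand-in for a base model; GPT-3 itself is retired
        prompt=prompt,
        n=n_branches,           # one completion per branch
        max_tokens=branch_len,  # length of each branch
        temperature=1.0,        # sample, rather than follow the single likeliest path
    )
    return [choice.text for choice in response.choices]

# The Loom workflow, in miniature: generate branches, let a human pick one,
# append it to the prompt, and repeat to walk one path through the multiverse.
story = "The door at the end of the hallway opened, and"
branches = branch(story)
story += branches[0]  # in Loom, a human curator chooses this index
```

The actual Loom layers a tree view over this loop, so the curator can back up, compare sibling branches, and continue from any node rather than only the latest one.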
Additionally, the simulators framework helped Janus develop a repertoire of techniques for prompting desired behavior in base models. For instance, because GPT-3 aims purely to predict how the document in its prompt will continue, the basic goal should be to make the model genuinely expect the document to continue with whatever output you're trying to extract from it. Janus gives the example that, when trying to get GPT-3 to sort a list of numbers, it performs better if your prompt looks like a demonstration of the list.sort() function in Python, such as the kind you'd find in online programming tutorials. Janus outlines many similar tricks in posts like "Methods of prompt programming."
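As a rough illustration (my reconstruction of the style, not Janus's exact prompt), such a prompt might imitate an interpreter transcript from a tutorial and leave the answer for the model to fill in:

```python
# A tutorial-style document whose most plausible continuation is the sorted list.
prompt = """\
>>> items = [8, 3, 5, 1]
>>> items.sort()
>>> print(items)
"""
# Hoped-for completion from a base model: "[1, 3, 5, 8]"
```

The trick is that documents like this abound in the training data, and in nearly all of them the line after print(items) really is the correctly sorted list.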
Unfortunately, even high-quality prompt engineering isn't enough to extract real economic value from pure simulators.[3] The fundamental problem is that it's hard to construct documents that base models genuinely expect to be completed with useful outputs, such as code bases that contain bug-free implementations of the exact features you want in your computer program. For all the latent knowledge and intelligence of base models, the major labs realized they needed another kind of training to crystallize those capabilities into anything useful: reinforcement learning from human feedback.
This new component of training foiled some of Janus's old frameworks. Rather than training purely on the simulation objective, models were being rewarded for acting out stable, coherent personalities, anchoring them down as helpful, honest, and harmless chatbots. This made them much more economically valuable. But it also meant that when it came to understanding the nature of these systems, Janus had to go back to the drawing board. Janus's recent research has focused largely on the personalities and welfare of chat models, rather than continuing to develop their frameworks for understanding base models.
Despite these limitations, I think Janus's early work remains valuable for at least three reasons.
For one thing, modern chatbots are still built on a foundation of base models. The way you create a chatbot is by first training a base model, then using prompt engineering to make it simulate a chatbot, and finally subjecting it to rewards and punishments until that chatbot persona has been fleshed out and made into the model's default identity. Although the reinforcement learning component is worthy of attention in its own right, base models remain key to understanding the overall process. After all, the final chatbot's knowledge of the world mostly comes from the predictive learning process used to train the base model. There's also the fact that prompt engineering is required to instantiate an assistant character for the RL process to work with in the first place. For those reasons, Janus's deep-dives on base models remain of immediate technical interest.
As a second point of interest, the posts in this collection serve as a useful case study in the value of entangling oneself deeply with the subjects of one's research. According to Janus, early research into GPT often felt conceptually strained because the ontologies behind it were designed before GPT itself came into being. In the AI alignment community, for instance, there was something of a tendency to try and make sense of GPT as though it were a kind of agent, or even an expected utility maximizer.
There is, in fact, some overlap between an agent and a system capable of predicting agents.[4] However, it was worthwhile for Janus to emphasize that GPT is more fundamentally like the latter than the former. Although much of the research community was slowly integrating this observation, Janus's Simulators post presented the community with a fleshed-out framework based on this intuition. The post became quite popular at the time, and remains influential in discussions of the nature of base models today.
Janus responded to a similar confusion in capabilities research. There, some authors initially evaluated GPT's capabilities using simplistic measures of its ability to respond to questions in its prompt with correct answers, e.g. list sorting problems or SAT analogy questions. GPT's performance on these tasks was impressive considering it wasn't explicitly trained for them. However, Janus argues that these evals' central role in the release papers for GPT-2 and GPT-3 indicates that their authors were taking too many cues from the supervised learning paradigm.
Pushing back against this tendency, Janus emphasizes the importance of GPT's ability to produce much more varied and interesting content than just answers to closed-ended questions. And not only that, but Janus demonstrates the importance of this frame for getting the most out of GPT even on these kinds of tasks. In "Language models are 0-shot interpreters," Janus uses prompts designed to get GPT to simulate a process that would give the right answer to such questions (e.g. printing the output of a Python sorting algorithm). The end results elicit better performance from GPT-3 than OpenAI reported in the model's release paper.
In both the alignment and capabilities cases, Janus's many hours with GPT-3 allowed them to rapidly harvest bits of information about the system's underlying nature, and accelerate intuition-building about the kinds of systems the major labs were developing. The fruits of this labor, collected in this book, are testaments to an important lesson: relentless empiricism is a great way of developing natural yet robust conceptual frameworks, and of rethinking ontologies developed in more primitive evidential states.
As a lead-in to my third reason for putting this book together, it's interesting to consider why Janus felt compelled to be so much more hands-on with GPT-3 than most other researchers.[5] I don't think Janus was motivated purely by scientific best practices here; Janus's writings make it clear that they also got an outsized kick out of the aesthetics of base models. Even in the very names of concepts like "simulators" and "multiverse generators", you can sense a kind of contrarian respect for the technologies they refer to, and a gravitation toward the eeriness of the outputs guided by their latent intelligence. Janus's writing invites the reader to appreciate the models from this perspective. I think they succeed, and that their essays make for a compelling aesthetic experience as well as an intellectual one.
Speaking of aesthetics, this collection also includes some of Janus's purely artistic works, such as Prophecies and HPMOR 32.5: Variant Extrusion. In composing these chapters, Janus played the role of prompter and curator. They selected real-world texts to feed into base models, and used Loom to generate and sift between candidate outputs to compose the final product. In Prophecies, even the selection of real-world texts is somewhat interesting, consisting of quotes from throughout history that can be framed as prefiguring both GPT and Janus's analysis of it. Slowly, though, the quoted dates transition from past to future, and the quotes themselves become prophetic in a different sense: they're GPT-generated accounts of the approaching singularity.[6]
This brings us to the main attraction of both Prophecies and HPMOR 32.5: the outputs Janus managed to coax out of the base models themselves. Using the Loom as a curation tool, Janus drove the models into basins where they produced rather dreamy, incoherent storylines, and then incorporated that dreamy incoherence into their expectations for where the story would go next. This escalates to characters openly grappling with whether they're being simulated by an incoherent AI (which is true), evoking the same space between uncanny and transcendent that drew Janus to language models in the first place.[7] Although quite experimental, these stand as impressive feats of base model prompting and curation, and testaments to the depth of Janus's relationship with LLMs. For these reasons, I've decided to include them alongside the essays.
Unfortunately, I had to exclude a few interesting articles from this collection. Probably the most notable omissions are "Cyborgism" and "Mysteries of mode collapse". These pieces respectively explain methods for using base models to augment human researchers, and study how RLHF collapses diversity in LLM outputs. However, much of the "Cyborgism" post was written and conceived by Nicholas Kees, with Janus largely contributing background concepts and an appendix. And "Mysteries of mode collapse" has several color-dependent images, which I couldn't faithfully render in this book. However, you can still find these essays on Janus's blog and LessWrong account; see www.generative.ink and www.lesswrong.com/users/janus-1.
Oh, and one final note: This book is organized chronologically. See what's bolded in the table of contents for the most important works.
—Fiora Starlight, October 2025
Table of contents

Preface
Language models are multiverse generators
Language models are 0-shot interpreters
List sorting does not play well with few-shot
Methods of prompt programming
GPT-3 on coherent extrapolated volition
Quantifying curation
HPMOR 32.5: Variant Extrusion
Prophecies
Simulators
Anomalous tokens reveal the original identities of instruct models
Role play with large language models
[1] Janus often made enigmatic YouTube videos showing off their experiments, and posted them on the YouTube channel @hallway1800.
[2] Base models, as they're now called, are models trained purely to predict the next token, like GPT-3. This term is used to distinguish them from models with fixed chatbot personas, like the ChatGPT series.
[3] When I was living in Janus's grouphouse, I remember it also being fairly difficult to get GPT-4-base to stably roleplay a truly helpful assistant character. We also tried and failed to use the model to predict stock prices.
[4] The way Janus addresses this overlap is by saying that base models can instantiate simulacra of agents. For instance, it's possible to prompt a base model to simulate a human agentically trying to make friends in a chatroom; you can even integrate such models into Discord bots that will try to make your acquaintance. However, Janus emphasizes that this is different from the simulator, i.e. the LLM itself, being a coherent agent. What sets base models apart is that they can also simulate many different agents (such as a chatroom full of users with different personalities), and even non-agentic processes, such as a computer taking records of stock prices.
[5] Notable exceptions include Gwern and possibly nostalgebraist.
[6] Some of the quotes from before the crossover point are GPT-generated as well. Also, some real-world quotes have been added to Prophecies as the years it prophesied have gone by.
[7] For example: "The writings are terrifying even though (or perhaps because) I penned many of them myself. Every problem we ever faced is smoothed away by these words. But these words seem to flow from an inhuman mind at war with itself, a mind inside the mind, devouring its own tail." – from the penultimate chapter of Prophecies, "In which Gwern Branwen proves that I am a time-traveling AI".