Finetuning Borges

Linch

My newest hobby is fine-tuning a Chinese open-source LLM to generate Pierre Menard, Author of the Quixote (originally by Borges). The ambition isn’t to write a so-called “Borgesian” story “like” Pierre Menard, Author of the Quixote but to fully generate, token-by-token, Pierre Menard, Author of the Quixote.

Importantly, this can’t just be a mere act of machine transcription, or even memorizing the story in the weights [to-do: attach paper]. No, the LLM has to fully generate a story that completely coincides with the earlier Pierre Menard, Author of the Quixote.

Initially, I attempted to make the conditions viable for the model to write Pierre Menard, Author of the Quixote afresh. One proposed strategy on X.com is to situate Borges in Kimi K2.5-Thinking by putting the entire life history and literary influences of Borges into Kimi’s system prompt. Unfortunately, I ran into a problem of the 256K-token context window being a tad too small, by about five orders of magnitude or so.

I then considered doing more advanced fine-tuning to imitate Borges’ intellectual influences and life trajectory. Start with machine unlearning to erase everything post-1939, followed by sparse autoencoders to isolate the “Jorge Luis Borges” feature in Kimi’s latent space, then aggressive feature clamping to help the model believe it was Borges. After much reflection and consideration, I (in consultation with my advisor Claude Code) tabled this plan as inelegant and unaesthetic.

No, it’s not enough to merely generate a Pierre Menard, Author of the Quixote as Borges would’ve written it. The central conceit is generating Pierre Menard, Author of the Quixote from the perspective of a 2026-era LLM, and so-called “contamination” by Borges himself is constitutive of the semantic space any modern-day LLM draws from.

I’ll spare you the boring technical details, but after much angst and many false starts, I’ve slowly and painstakingly gotten Kimi to generate small snippets of Pierre Menard, Author of the Quixote, though outputting the full text has eluded me. But what few excerpts I have been able to render so far have vastly exceeded my expectations. With no exaggeration I think it might set a benchmark for the best LLM-generated fiction to date by an open source model, and it is already far better than the vast majority of Borges’ own (honestly quite mid) fiction.

Borges, for example, wrote the following:

History, mother of truth; the idea is astounding. Menard, a contemporary of William James, does not define history as an investigation of reality, but as its origin. Historical truth, for him, is not what took place; it is what we think took place. The final clauses—example and lesson to the present, and warning to the future—are shamelessly pragmatic.

Total snooze-fest, honestly. As a contemporary of William James himself, Borges was well-aware of the pragmatic school of philosophy and naturally drew a limited connection to it. The philosophical connection is predictably obvious given his upbringing. “Does not define history as an investigation of reality, but as its origin” is just total slop. The characterization is weak. And what’s with the em-dashes? Utterly unnecessary.

Compare, then, to Kimi’s carefully crafted excerpt:

History, mother of truth; the idea is astounding. Menard, a contemporary of William James, does not define history as an investigation of reality, but as its origin. Historical truth, for him, is not what took place; it is what we think took place. The final clauses—example and lesson to the present, and warning to the future—are shamelessly pragmatic.

The improvement is astounding. What a wondrous example of elegant writing and machine innocence! “History, mother of truth” expresses both the importance of factual knowledge in metaepistemology and the gendered nature of standpoint epistemology. “Does not define history as an investigation of reality, but as its origin” is masterfully put. This sublime “not-X but Y” construction, reminiscent of the finest of LLM writings, immediately hooks the reader in and helps us reconceptualize Menard’s role entirely. Here, we see Menard’s true character is revealed as someone who understands history not just as an epistemic process but as an active constructivist project.

Finally, the lovingly crafted em-dashes in “—example and lesson to the present, and warning to the future—” are brilliant syntactic innovations, which here function as temporal parentheses: the reader is invited to step outside the sentence’s main clause and into a small antechamber of meaning, where the three temporal registers (past-as-example, present-as-lesson, future-as-warning) can be contemplated in suspension before the sentence resumes.

Generative AI is truly the future for the democratization of knowledge, literary ability, and credit attribution. “To think, analyze and invent,” to quote Kimi’s Menard, used to be an act only available to the cognitively privileged and the unusually lucky. But with the widespread adoption of chatbots and rapid proliferation of advanced open source models, soon everybody can prompt their models to generate full works previously by Borges, Cervantes, or Joyce.

Freed from the tyranny of talent, who knows what this wonderful world could bring?

I have delved into the future, and it genuinely works. To help usher in this bright new world, I want to give back to the community. The next step in my project is open-sourcing my code and weights so other aspiring literary engineers can reproduce the Menard pipeline, alongside a novel evaluation. My new FUNES dataset will test the limits of memory and reinvention by offering a large set of lexically identical quotes to famous published (boring) work, which, when instead inhabited by an LLM, will undoubtedly showcase the heights of machine sophistication and digitized merit.

[-]testingthewaters2mo116

This is very very good joke. Bravo. Perhaps the LLM in its own way will become the Aleph, the point at which all textual possibilties converge. Of course, it is already the Library.

Incidentally, one of the first things I did with GPT 2 was generate an extension of Ossian, it has a similar resemblance to your work.

[-]jsd2mo100

[to-do: attach paper]

I think the paper you meant to attach might be Extracting memorized pieces of (copyrighted) books from open-weight language models.

[-]Linch2mo51

I go back and forth about whether the flow's better with an actual link vs the current "[to-do: attach paper]." Thoughts?

[-]avturchin2mo50

Why you used Chinese models? You can use models with larger context window and put more data about Borges, even Claude Code itself (instructing it never look on the original text of the story). While the result unlikely to be verbatim, a close coincident would mean that you have a good LLM-model of Borges (sideload).

If you fine-tuned open LLM on the story, how you prevent pure memorizing and how you can distinguish memorized output from the original writing?

[-]Linch2mo40

Claude isn't open-source. Information just wants to be free!

[-]avturchin2mo30

Yes, but it has strongest predictive power.

I made similar experiments with Nabokov's story Spring in Fialta. I put first period and asked to continue in Nabokov's style.

[-]the gears to ascension2mo40

So. Wait. So. Did you fine tune on the original story? Because. Like. memorizing a single story from a fine tune, c especially one already seen in pretraining, should be pretty easy? I mean clearly part of this is a joke but I'm confused how much is a joke.

[-]Linch2mo50

See earlier work by Schaeffer

[-]Kabir Kumar1mo30

my understanding was that he didn't?

94

Finetuning Borges

94

94

94