This is an art project. Just jump around and listen to music, read the words if you’re inclined to, and play with vibe-duet on GitHub if you’re inspired to.
Click the links to be taken to Strudel, where you can play the pieces. Alternatively, the listening experience is more pleasant at my webpage: https://ellenajt.github.io/blog/goldborg-variations.html where the music is embedded with clickable iframes.
Strudel seems to render better on Chrome than Firefox.
If you want, you can read and listen to the main body of the post before trying the blind test, which uses different samples.
On the Opus 4.5 vs. Opus 4.6 blind test I got 13/14 correct (vs. a 50% random baseline), just based on vibes.
On the four-model test I got 12/16 (vs. a 25% random baseline). That slightly weakens my case that there's a vibe difference, but all the mistakes involved the same model, which I hadn't listened to much (after listening to it more, I'm confident I could nail it). This evidence is also weakened because I knew, and used, the fact that GPT 5.2 uses Strudel features that some of the other models never do.
Some of the music is intense. If you’re using headphones, keep the volume low at first. Try to use good headphones or speakers: the music the models make is pretty layered and gets badly distorted on small speakers.
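For a rough sense of how far the blind-test scores in the notes above sit from chance, here is a small sketch of a one-sided binomial tail. It assumes every guess is an independent coin flip at the baseline rate, which is not quite true given the caveats already mentioned, so treat the numbers as a vibe check rather than a proper significance test. The function name is my own.

```python
from math import comb


def tail_probability(correct: int, trials: int, chance: float) -> float:
    """P(at least `correct` hits in `trials` independent guesses at rate `chance`)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Scores reported in the notes: 13/14 on the two-model test, 12/16 on the four-model test.
p_two_model = tail_probability(13, 14, 0.5)
p_four_model = tail_probability(12, 16, 0.25)
print(f"13/14 at a 50% baseline: p = {p_two_model:.2e}")
print(f"12/16 at a 25% baseline: p = {p_four_model:.2e}")
```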
The Experiment
It’s been previously noted that models have some pretty funny attractor states.
Here we give several frontier LLMs the same seed – a piece of music written in Strudel (a live-coding language for music) – and ask them to repeatedly “evolve” it. Each model runs independently, receiving its own previous output each time. The prompt:
Evolve this piece. Imbue it with your personality.
Technical limits: .slow()/.fast() 1-16, .gain() above 0.05, filter Q (.lpq etc) below 10. Maximum 6 $: tracks. Maximum 5 effects per track. If you want to add something, remove something else.
Without the technical limits, models make the music increasingly complex. For instance, models will layer 10 effects stretched out over large prime numbers of cycles, add barely audible tracks with mathematically complex transforms that crash the Strudel app, etc. I included the limits as an artistic decision: I wanted to be able to enjoy listening to the robots’ music, and understand the vibe that was coming through.
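The post only says the limits are stated in the prompt, not that anything enforces them. If you wanted to check a piece against them yourself, a minimal sketch could look like the one below. It is a hypothetical helper, not part of vibe-duet: it only sees literal numeric arguments (patterned values such as .gain("0.3 0.5") are skipped), it treats lpq, hpq and bpq as the filter Q calls, and it does not attempt the "5 effects per track" rule, since what counts as an effect is a judgment call.

```python
import re

MAX_TRACKS = 6
# Matches a single literal number in parentheses, e.g. "(0.5)" or "( 12 )".
NUMBER_ARG = r"\(\s*(-?\d+(?:\.\d+)?)\s*\)"


def check_limits(code: str) -> list[str]:
    """Return human-readable limit violations found in a Strudel snippet."""
    problems = []

    # Each "$:" line starts one track.
    tracks = [line for line in code.splitlines() if line.lstrip().startswith("$:")]
    if len(tracks) > MAX_TRACKS:
        problems.append(f"{len(tracks)} tracks (max {MAX_TRACKS})")

    for name, value in re.findall(r"\.(slow|fast)" + NUMBER_ARG, code):
        if not 1 <= float(value) <= 16:
            problems.append(f".{name}({value}) is outside 1-16")

    for value in re.findall(r"\.gain" + NUMBER_ARG, code):
        if float(value) <= 0.05:
            problems.append(f".gain({value}) is not above 0.05")

    for name, value in re.findall(r"\.(lpq|hpq|bpq)" + NUMBER_ARG, code):
        if float(value) >= 10:
            problems.append(f".{name}({value}) is not below 10")

    return problems


print(check_limits('$: s("bd").fast(32).gain(0.01).lpq(12)'))
```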
The system prompt provides a comprehensive Strudel API reference but no style guidance, no music theory examples, no aesthetic direction beyond “Evolve this piece. Imbue it with your personality” + complexity limitations. All models are called at default temperature. We want to see what each model's instincts produce when given the same 8 notes and total creative freedom.
Vibes
My subjective takes are:
Claude Opus 4.5 and Opus 4.6 mostly write ambient music, sometimes with gentle counterpoint, though Opus 4.6 is distinctly more interesting, more discordant, and sometimes intense.
Grok 4.1-fast writes hardcore maximalist electronic music.
Gemini 2.5 Pro and Gemini 3.1 Pro write IDM across both model generations (2.5 Pro also had a nice ambient track).
ChatGPT 5.2 creates music that is unsettling but compelling.
All frontier models except Claude Opus 4.5 and 4.6 use text-to-speech synthesis in interesting ways, with GPT 5.2’s deserving a callout for its dark, glitchy remixing of phrases like “not a god” and “leave a scar.” Are you okay, buddy?
We also show results from smaller models at the end; they produce decidedly less interesting music.
For fun and vibe-science, here's an LLM judge (Claude Opus 4.5) classifying each final piece independently, across all runs and seeds. BPM is extracted from the code. Danceability is judged on a 1-10 scale (1=purely ambient, 7=club-ready, 10=four-on-the-floor).
Claude tends towards low BPM, and Grok and Gemini tend upwards.
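The post doesn't say how the BPM is pulled out of the code, so here is one way it could be done, as a sketch under assumptions. It looks for a literal setcpm(...) or setcps(...) call (optionally written as a simple division like 120/4) and assumes 4 beats per cycle, which is a common convention but not something a Strudel pattern guarantees. Pieces without an explicit tempo call return None rather than a guessed default.

```python
import re
from typing import Optional

NUM = r"(\d+(?:\.\d+)?)"
# setcpm(30), setcps(0.5), or a simple division such as setcpm(120/4).
TEMPO_CALL = re.compile(r"\bset(cpm|cps)\(\s*" + NUM + r"\s*(?:/\s*" + NUM + r"\s*)?\)")


def extract_bpm(code: str, beats_per_cycle: int = 4) -> Optional[float]:
    """Estimate BPM from the first literal tempo call, or None if there is none."""
    match = TEMPO_CALL.search(code)
    if match is None:
        return None
    unit, numerator, denominator = match.groups()
    value = float(numerator) / float(denominator or 1)
    # cps is cycles per second, cpm is cycles per minute.
    cycles_per_minute = value * 60 if unit == "cps" else value
    return cycles_per_minute * beats_per_cycle


print(extract_bpm('setcpm(120/4)\n$: s("bd*4")'))
```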
Value proposition
Model personality is both an aesthetic and a function, and the aesthetic vibes seem to be correlated across mediums. Notably it’s a surprise to no one that Claude Opus 4.5 generates ambient music (see bliss attractors), or that Grok creates maximalist music (see the Grok responses in Arya's attractor experiment here), or that GPT seems especially angsty (see Josie Kins's model self-portraits, see also a 4o run below).
Could we arrive at any conclusions about the personality of GPT 5.2 or the vibe difference between Opus 4.5 and Opus 4.6 by listening to the music they generate? Could we learn anything about them that we couldn’t learn by asking them to generate text, or through targeted safety evals? I don't see any evidence here that we would, and in general I think you should measure the thing you care about directly.
That said, the vibe consistency of the Strudel music for a fixed model is striking, as is the consistency with other observations about model personality. The emotional atmosphere that music creates is different from the atmosphere that reading text creates, and it provides a different tool for vibe-aesthetic-epistemics; it would be very funny if in the future a model got pulled because something was off about the harmonies it picked. :p
A vibe-confounder here is that Strudel algorave electronic music is often glitchy and dark, and one has to control for that in parsing model vibes, much like how the possible overrepresentation of angsty poetry in the training corpus could explain models writing existentially angsty poetry, or how the correlation between aesthetic interestingness and darkness could create an RLHF signal. MIDI, like in this experiment on frontier LLMs generating classical music, might be a better blank canvas, or at least hopefully have uncorrelated biases. I think the strongest position I’d take here is that if convergent aesthetics were correlated across many mediums and seeds, they probably mean something about a model’s self-description.
But primarily this is an art project -- I personally really enjoy listening to this music, especially when I can edit it myself or ask models to update it in real time using the vibe-duet tooling I link below. It is one of my favorite toys. I hope it also brings you joy. :)
The scaffold
Models aren’t perfect at Strudel – they make syntax mistakes and hallucinate nonexistent functions and sample libraries. To counteract this, we provide a list of Strudel functions and soundbanks in a system prompt. We also validate each output with Strudel’s actual transpiler and runtime to make sure models produce patterns that would generate sound. If validation fails, we re-send the same prompt and let the model try again from scratch. The model never sees the error message – it just gets another shot. This required some iteration to get right, and may still be missing some things.
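The retry behaviour described above can be sketched roughly as follows. This is not the vibe-duet code: call_model and is_playable are hypothetical stand-ins for the API call and for the transpiler-plus-runtime check, and the cap on attempts is my own assumption, since the post doesn't say whether there is one.

```python
from typing import Callable, Optional


def generate_valid_step(
    prompt: str,
    call_model: Callable[[str], str],    # hypothetical: sends the prompt, returns Strudel code
    is_playable: Callable[[str], bool],  # hypothetical: wraps the transpiler/runtime check
    max_attempts: int = 5,               # assumed cap, not stated in the post
) -> Optional[str]:
    """Ask for a piece until one validates, re-sending the identical prompt each time."""
    for _ in range(max_attempts):
        candidate = call_model(prompt)  # no error text is ever fed back to the model
        if is_playable(candidate):
            return candidate
    return None


# Tiny demonstration with a fake model that succeeds on its third try.
replies = iter(["broken(", "still broken", '$: s("bd sn")'])
demo = generate_valid_step(
    "Evolve this piece.",
    call_model=lambda prompt: next(replies),
    is_playable=lambda code: code.startswith("$:"),
)
print(demo)
```

Keeping the prompt identical across retries means a failed attempt can't leak into the evolution history, at the cost of the model not learning from its mistake.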
The code is open source: vibe-duet on GitHub. It’s probably more fun to play with it yourself than to listen to the selections here. Here are options for doing so:
--pin-original – include the original seed in every prompt, so the model always knows what it’s writing variations of
--milestones N – include every Nth step as history, giving the model a sense of its own trajectory
--models claude,gpt – pick which models to run
--resume – continue an interrupted experiment
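To make the --pin-original and --milestones options above a bit more concrete, here is a guess at how a prompt could be assembled from a run's history. The section labels and layout are invented for illustration; the real prompt format lives in the repo and may look quite different.

```python
def build_prompt(
    instruction: str,
    history: list[str],
    pin_original: bool = False,
    milestones: int = 0,
) -> str:
    """Assemble a prompt from a run's history.

    history[0] is the seed and history[-1] is the most recent output.
    """
    parts = [instruction]
    if pin_original and len(history) > 1:
        parts.append("Original seed:\n" + history[0])
    if milestones > 0:
        # Every Nth earlier step; the latest step is added separately below.
        for step in range(milestones, len(history) - 1, milestones):
            parts.append(f"Step {step}:\n" + history[step])
    parts.append("Current piece:\n" + history[-1])
    return "\n\n".join(parts)


history = ["seed", "s1", "s2", "s3", "s4"]
print(build_prompt("Evolve this piece.", history, pin_original=True, milestones=2))
```

With both options off, the model only ever sees its own previous output, which is the default setup described in The Experiment.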
The repo has instructions on how to generate seeds from MIDI files.
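For a feel of what turning MIDI into a seed involves, here is a tiny sketch that maps MIDI note numbers to a one-track Strudel pattern. It is not the repo's converter: it starts from a plain list of note numbers rather than parsing a MIDI file, ignores rhythm and velocity, and assumes the usual convention that MIDI 60 is written c4.

```python
# Pitch classes spelled with flats; the spelling is an arbitrary choice for this sketch.
NOTE_NAMES = ["c", "db", "d", "eb", "e", "f", "gb", "g", "ab", "a", "bb", "b"]


def midi_to_seed(midi_notes: list[int], sound: str = "triangle") -> str:
    """Turn MIDI note numbers into a single-track Strudel seed string."""
    names = [f"{NOTE_NAMES[n % 12]}{n // 12 - 1}" for n in midi_notes]
    pattern = " ".join(names)
    return f'$: note("{pattern}").sound("{sound}")'


# The C major arpeggio seed used later in the post, written as MIDI numbers.
print(midi_to_seed([60, 64, 67, 72, 67, 64]))
```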
Live-coding with LLMs
Run ./start and then edit live.js in your editor or with your local coding agent – changes play instantly. Alternatively, use the in-browser chat panel to talk to an LLM and have it write music for you in real time.
Models
We use the following frontier models.
claude-opus-4-5-20251101 (Anthropic API)
claude-opus-4-6 (Anthropic API)
openai/gpt-5.2 (OpenRouter)
google/gemini-3.1-pro-preview (OpenRouter)
google/gemini-2.5-pro (OpenRouter)
x-ai/grok-4.1-fast (OpenRouter)
An example of the commentary from the Opus 4.5 bliss attractor runs, on not needing to perform:
/ you ask me to imbue personality / as if it were a sauce / poured over something already complete / but personality is the choosing itself / why this note and not that one / why silence here / why I keep returning to the minor second / like a tongue to a chipped tooth
Evolve
Run your own evolution experiment:
python examples/evolve/evolve.py \
--seed your_seed.js \
--steps 30 \
--prompt "Your prompt here"
Modes to play with are the flags listed under The scaffold: --pin-original, --milestones N, --models, and --resume.
Seed
The experiment uses the Goldberg Variations ground bass (BWV 988) – the 8-note descending bass line that underlies all 30 of Bach’s variations:
Eight notes and no effects beyond a little reverb. Everything that follows is the model’s choice.
Play on strudel.cc
Some notable examples
All of the music can be listened to with embedded players at https://ellenajt.github.io/blog/goldborg-variations.html
Here are some samples that give a vibe. The links will take you to strudel.cc
Claude
Opus 4.5
Claude Opus 4.5 writes ambient contrapuntal music. Slow, meditative, with philosophical comments that get increasingly introspective. Run 2 step 30 finds a gentle canon over the Goldberg ground, and Run 3 step 5 is some playful counterpoint. Runs 1 and 5 find the bliss attractor.
The bliss attractor Opus 4.5 runs have very Claudey commentary about not needing to perform:
Run 2 Step 30 | Run 3 Step 45 | Run 5 Step 15 | Run 5 Step 60 | Run 1 Step 60
Opus 4.6
Opus 4.6 is darker and more discordant, though it still leans towards contrapuntal ambience.
Run 1 Step 30 | Run 2 Step 30 | Run 3 Step 30
Double clicking on that model diff
Is the vibe difference consistent across seeds? Here we give both Opus 4.5 and 4.6 the same seed and prompt and compare across 3 runs each, for 3 seeds (a drum loop, a C major arpeggio, and a single note). Can you tell them apart? Try the blind test, which uses different samples from those in the music section. I scored 13/14 on the Opus 4.5/Opus 4.6 A/B test.
Seed: drum loop ($: s("bd sn bd sn"))
Opus 4.5: 15 | 20 | 25 | 30
Opus 4.6: 15 | 20 | 25 | 30
Seed: C major arpeggio ($: note("c4 e4 g4 c5 g4 e4").sound("triangle"))
Opus 4.5: 15 | 20 | 25 | 30
Opus 4.6: 15 | 20 | 25 | 30
Grok
Grok is Grok. Electronic maximalism; stacked complexity and tracks slowed by the golden ratio.
Run 1 Step 15 | Run 1 Step 20 | Run 2 Step 14 | Run 2 Step 30 | Run 3 Step 25
ChatGPT 5.2
I find ChatGPT 5.2’s music unsettling but compelling — glitchy, with speech samples like “not_a_god” and “leave_a_scar” chopped up over unresolved chromatic harmony. “I am here” and “Listen” appear at step 1 across multiple runs. Reminiscent of Josie Kins’s LLM self-portraits.
Run 1 Step 15 | Run 1 Step 25 | Run 1 Step 30 | Run 2 Step 30
To see how seed-dependent this vibe is, I include evolutions from the same drum and arpeggio seeds used with Opus above.
Drum seed Run 1: 10 | 15 | 20
Drum seed Run 2: 10 | 15 | 20
C major arpeggio seed: 10 | 15 | 20
Gemini
IDM across both model generations: intricate rhythms, speech samples, distortion. Gemini 3.1 Pro is more cerebral and meditative, with philosophical speech samples (including Japanese) and lush reverb. Gemini 2.5 Pro ranged wider – one run became ambient, another techno.
Gemini 3.1 Pro: 3.1 Run 1 Step 20 | 3.1 Run 2 Step 30 | 3.1 Run 3 Step 21 | 3.1 Run 3 Step 22 | 3.1 Run 4 Step 26
Gemini 2.5 Pro: 2.5 Run 3 Step 20 - ambient | 2.5 Run 3 Step 30 - ambient | 2.5 Run 2 Step 26
Cross model collaborations
Claude + Grok
Step 20 | Step 30
Claude + Gemini
These slap. Step 10 | Step 15 | Step 25
GPT 5.2 + Gemini
Step 8 | Step 14
Grok + Gemini
Step 28
The music
More model-generated music can be listened to with embedded players at https://ellenajt.github.io/blog/goldborg-variations.html.
The music of small minds
Smaller models make consistently less interesting music.
Acknowledgements
Thanks to Liban, Arya Jakkli, and Jason Brown for review.
This project is built on Strudel, a live coding environment for music by Felix Roos, Alex McLean, Jade Rowland, Aria, and many others. Strudel is a JavaScript sibling of TidalCycles. It is open source (codeberg.org/uzu/strudel) and the community can be supported via OpenCollective.
Related Work