Mod note: this post violates our LLM Writing Policy for LessWrong, so I have delisted the post to make it only accessible via link. I've not returned it to your drafts, because that would make the comments hard to access.
Anders, please don't post more direct LLM output, or we'll remove your posting permissions.
Hello RobertM,
Don't consider this a challenge to your decision; I just want some guidance on this, so please bear with me.
1. English is my second language, and I use LLMs extensively to clean up the mess I create when I write any English sentence or paragraph. I usually iterate on every sentence, paragraph, and section many times using LLMs. The LLM writing policy says that "first-time writers are not permitted to use any AI text output in their submissions". Should this be taken literally, regardless of what that output is or how it was created?
2. Style. I like snappy one-liners and "bold", straight-to-the-point language. That is something I aim for. I want it to read like a tweet or ad copy (in the best sense). Is this frowned upon and taken as slop?
3. Referencing. I have a feeling that the referencing is seen as overbearing and part of the "problem". This is a forecast, and I do not like to make claims without backing them up with references. For that reason I have a lot of references for this short piece. If the referencing style is an issue, could you provide some guidance on that?
4. A direct question on the content. Is there anything in my post that is factually untrue, or anything believed to be a hallucination, which would then disqualify it as slop? Yes, the post IS speculative; that is the whole purpose, as I mention at the beginning. But I have seen this trend and wanted to make a "solid" case for it (which I guess didn't really go as planned...), so to the best of my knowledge I have checked every reference multiple times.
Regards
/Anders
I would be more willing to try to 'mine' this AI slop for insight if it didn't read like a schizophrenic sending telegrams in between grand mal seizures.
Hey Gwern,
Haha, I am very curious why you felt like that reading my post. Is my idea about the imminent death of vibe coding and the birth of software mining so crazy that it must be AI slop, and that I should consider checking myself into a lunatic asylum?
Hey Karl,
There was no prompt that generated a one-shot post which could then be copy-pasted, if that is what you are wondering. Yes, I have used LLMs (Gemini and Claude) extensively, both for language (English is my second language) and for research.
Epistemic status: Speculative but directionally serious. Drawing on real trends to name something emerging but undertheorized.
Karpathy coined "vibe coding" in February 2025 [1]. Describe what you want, accept the output, iterate by feel. Collins Dictionary made it Word of the Year [2]. Then the data came in.
METR's RCT: developers using AI were 19% slower while believing they were 20% faster [3]. CodeRabbit: AI co-authored code had 2.74× more XSS vulnerabilities [4]. A multi-university paper argued vibe coding is killing open source [5]. Prompt-and-pray doesn't work.
So AI can't code? Wrong. AI can't vibe. Take the human out, let the test suite decide, and you get something that actually works: software mining.
The Analogy
The trick Bitcoin discovered: you don't need to understand the solution, you just need to verify it cheaply.
Software mining applies the same trick to code. Generate candidates, run the test suite, keep the survivors. The human writes the tests, not the code.
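A minimal sketch of that loop. The generator here is a deterministic stub standing in for an LLM call, and the test cases are made up for illustration:

```python
def generate_candidate(seed):
    """Stand-in for an LLM call: propose a sort implementation.
    One in three candidates is deliberately broken."""
    if seed % 3 == 0:
        return lambda xs: xs  # broken: returns input unsorted
    return lambda xs: sorted(xs)

def passes_tests(candidate):
    """The human-written test suite: the only judge."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]
    return all(candidate(inp) == expected for inp, expected in cases)

# Mine: generate many candidates, keep only the survivors.
survivors = [c for c in (generate_candidate(s) for s in range(30))
             if passes_tests(c)]
print(len(survivors))  # 20 of the 30 candidates survive
```

The human's whole contribution is `passes_tests`; everything else is search.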
Claude Code authors ~4% of GitHub commits [6]. Autonomous multi-file changes, test runs, retries on failure, task horizons in days. OpenClaw adds orchestration: heartbeat scheduling, overnight crons, self-installing skills [7]. Users wake up to finished code.
They are not chatbots.
They are mining rigs.
Proof of Concept: AlphaEvolve
DeepMind's AlphaEvolve [8] mined a matrix multiplication algorithm that beat a 56-year record, then mined an optimization that sped up its own training. It's still producing: this month it improved bounds on five classical Ramsey numbers [9]. Imbue's Darwinian Evolver [10] and a wave of open-source clones confirm the paradigm generalizes. Not vibed. Mined.
Hash Rate Economics
METR showed human-in-the-loop is slower than no AI at all [3]. Remove the human, automate evaluation, and throughput is bounded only by inference cost × test execution time.
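That bound is simple arithmetic. All figures below are hypothetical, chosen only to show the shape of the calculation:

```python
# Hypothetical figures -- not drawn from any cited source.
cost_per_candidate_usd = 0.01   # inference cost per generated candidate
test_time_s = 2.0               # test-suite run per candidate
workers = 8                     # parallel evaluation slots
budget_usd = 100.0

candidates = budget_usd / cost_per_candidate_usd          # 10,000 candidates
wall_clock_h = candidates * test_time_s / workers / 3600  # ~0.69 hours
print(int(candidates), round(wall_clock_h, 2))
```

No human review step appears anywhere in the formula; that is the point.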
Coding benchmark scores roughly doubled in 18 months, from 49% on SWE-bench Verified to 88% on Aider code editing [11][12]. Models get cheaper by the month. Generating a thousand candidates already beats hand-crafting one for algorithm optimization. That threshold moves down every quarter.
What Gets Mined
The bottleneck is the evaluation function. Bitcoin's is trivial. Hash the block, check if the number is small enough:
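(A toy sketch: real Bitcoin double-hashes the 80-byte block header with SHA-256 and encodes the target compactly; the header bytes and target here are made up.)

```python
import hashlib

def is_valid_block(header: bytes, target: int) -> bool:
    # Verification is one hash and one comparison -- cheap,
    # no matter how much work finding the nonce took.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little") <= target

def mine(prefix: bytes, target: int) -> int:
    """Mining is just search: try nonces until one clears the bar."""
    nonce = 0
    while not is_valid_block(prefix + nonce.to_bytes(8, "little"), target):
        nonce += 1
    return nonce

nonce = mine(b"block-data", 2 ** 245)  # easy toy target
print(is_valid_block(b"block-data" + nonce.to_bytes(8, "little"), 2 ** 245))
```

Finding `nonce` takes thousands of tries; checking it takes one.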
Software mining needs something richer. For each candidate program, score it automatically. Does it pass the tests? Is it fast? Is it safe? Keep it only if it clears the bar:
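(A sketch of that richer bar. Every name and threshold here is illustrative, not a real API: correctness from the test suite, a latency budget, and a crude static scan of the candidate's source.)

```python
import time

def clears_bar(candidate, source, tests,
               latency_budget_s=0.01, banned=("eval(", "exec(")):
    """Keep a candidate only if it passes tests, runs fast enough,
    and contains no banned constructs."""
    # Correctness: the test suite is the judge.
    try:
        if not all(candidate(inp) == expected for inp, expected in tests):
            return False
    except Exception:
        return False  # crashes count as failures
    # Performance: time one representative run against a budget.
    start = time.perf_counter()
    candidate(tests[0][0])
    if time.perf_counter() - start > latency_budget_s:
        return False
    # Safety: crude static scan of the candidate's source text.
    return not any(b in source for b in banned)

tests = [([3, 1, 2], [1, 2, 3]), ([], [])]
src = "def cand(xs): return sorted(xs)"
print(clears_bar(lambda xs: sorted(xs), src, tests))
```

A broken candidate (`lambda xs: xs`) fails the first check and never gets kept.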
There's a deeper frontier, though. Algorithm optimization, bug fixing, performance tuning: anywhere the test suite is the judge, software mining already works. But you cannot yet mine product-market fit, a novel research direction, or the judgment call that a problem is worth solving at all.
This is where humans remain in the loop: for now, and at a higher level. Not writing code, not judging code, but choosing the search space. Writing the evaluation function is the new engineering. Choosing what to evaluate is the new entrepreneurship. The first person who figures out how to mine that layer too changes everything.
The inversion: vibe coding bet on human taste + AI generation. Software mining bets on automated evaluation + AI generation.
Human moves from aesthetic judge in the inner loop to evaluation engineer in the outer loop.
Difficulty Adjustment
Bitcoin adjusts difficulty as hash power grows. Same here. AI-generated code floods the ecosystem (iOS releases up ~60% YoY [13]). Easy niches fill. CRUD apps get mined out. What remains demands better evaluation functions, more compute. Red Queen's Race, LLM edition.
The strategic response: treat inference budget the way miners treat hash power. Allocate it deliberately across search problems. Mining pools are already forming. OpenClaw's ClawHub is early communal infrastructure for pointing agents at problems [7]. Expect more.
What's Your Hash Rate?
Vibe coding was transitional, the brief moment humans stayed in the loop, adding "taste" that subtracted speed. Software mining now follows. LLM generates, test suite evaluates, selection does the rest.
Write the test suite. Crank the hash rate.
References: