Epistemic Status
Moderately confident in the Lakatosian analysis. Genuinely uncertain about where exactly the derivation ceiling lies, and I assign maybe 15-20% to "the origination-derivation boundary turns out to be a gradient rather than a categorical distinction, and something interesting happens at sufficient scale that I can't currently model." The empirical claims rest on published, independent studies. The theoretical framework is my own, developed across a series of Zenodo preprints, and carries all the risks of self-citation from an independent researcher. I want to be wrong about the parts that are wrong, and I'm here partly to find out which parts those are.
AI Disclosure
This post adapts a longer paper (Longmire, 2026) developed under what I call the HCAE (Human-Curated, AI-Enabled) model: I originate the ideas, maintain final authority on truth and validity, and use AI systems for literature synthesis, drafting support, and format adaptation. The arguments, claims, and judgment calls are mine. The prose has been touched by AI in ways I've tried to make invisible, which I realize is exactly what this community's policy asks me not to do, so I'm telling you instead. If the ideas land, I'd rather you engage with them knowing the process than have you wonder.
I want to make a specific claim: the fact that nobody can agree on what "AGI" means is not a sign of a young field finding its footing. It's a diagnostic symptom of a degenerating research programme, in the precise Lakatosian sense.
This matters for alignment because if I'm right, then a significant portion of the field is building safety mechanisms for a thing whose definition keeps shifting to match whatever the current systems happen to do. That's not a stable foundation for safety work.
The Framework (Brief)
Lakatos evaluated research programmes, not individual theories. A programme has a hard core (foundational commitments held irrefutable by methodological decision) surrounded by a protective belt (auxiliary hypotheses that absorb empirical challenges). The key test: when anomalies force changes to the protective belt, are the changes progressive (generating novel predictions that get corroborated) or degenerating (merely accommodating known problems without predicting anything new)?
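For readers who prefer the criterion in executable form, here's a toy formalization (a minimal sketch of my own; the class names and boolean flags are illustrative, not anything from Lakatos):

```python
from dataclasses import dataclass, field

@dataclass
class Prediction:
    claim: str
    novel: bool          # forecast before the evidence was in?
    corroborated: bool   # borne out by independent evidence?

@dataclass
class BeltAdjustment:
    """An auxiliary-hypothesis change made in response to an anomaly."""
    anomaly: str
    predictions: list[Prediction] = field(default_factory=list)

    def is_progressive(self) -> bool:
        # Progressive: the adjustment forecasts at least one novel fact
        # that is later corroborated. Degenerating: it merely re-describes
        # the anomaly it was invented to absorb.
        return any(p.novel and p.corroborated for p in self.predictions)

# Shift 3 below absorbs benchmark saturation without forecasting
# anything new, so on this toy model it classifies as degenerating.
shift3 = BeltAdjustment(anomaly="benchmark saturation without deployment transfer")
print(shift3.is_progressive())  # False
```

The only point of the toy is that progressive-versus-degenerating is a checkable property of a shift's prediction record, not a matter of taste.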
I think many here are already familiar with this. Eliezer has used degenerating programme language before. The novel move is applying it systematically to AGI discourse and showing that the definitional drift is the primary evidence of degeneration.
The Definitional Drift Is the Evidence
Follow the sequence. This is the core of the argument.
Shift 1 (pre-2020): AGI means human-level intelligence across cognitive domains. The Turing Test is the benchmark.
Shift 2 (2020-2023): LLMs pass informal Turing Tests but can't reason or plan. Response: the Turing Test was always insufficient. Morris et al. (2023) propose capability-based "Levels of AGI" for Google DeepMind. This is a legitimate protective belt adjustment and generates a measurable prediction (capability benchmarks improve progressively). So far, theoretically progressive.
Fairness requires noting: scaling did produce genuine surprises here. In-context learning, chain-of-thought, tool use, multilingual transfer. These are real. But they're within-distribution capabilities. The programme's AGI predictions concern general intelligence, not domain-specific gains. The test is whether capabilities transfer to genuinely novel tasks, and the subsequent evidence suggests they don't.
Shift 3 (2024): Benchmarks saturate. Enterprise deployment fails at 70-95% (RAND, n=65 experienced data scientists; MIT NANDA, n=300+ initiatives; S&P Global, n=1,000+ companies). ARC-AGI-1 gets solved by brute-force compute at ~$20K per task. Response: redefine benchmarks as insufficient. ARC-AGI-2, designed to resist brute-force, crashes frontier model scores to ~1%. Meanwhile, OpenAI redefines AGI as systems that "outperform humans at most economically valuable work," which is a contractual definition from its Microsoft agreement. This shift doesn't predict novel facts. It retroactively accommodates anomalies.
Shift 4 (2025-2026): Chen et al. (2026) argue in Nature that AGI is already here, explicitly excluding process, understanding, embodiment, agency, and autonomy from the definition. Sequoia defines AGI as "the ability to figure things out." Amodei calls it "a marketing term."
The definition has become operationally vacuous. Current systems satisfy it by stipulation. The programme no longer predicts what future systems will achieve; it declares that existing systems already meet a redefined threshold.
Each shift was triggered by a specific anomaly, and each responded by contracting the definition rather than predicting what comes next. This is the pattern Lakatos identified as the hallmark of degeneration.
Important caveat: programmes can recover. The strongest candidate for recovery right now is inference-time compute and agentic scaffolding. If agentic architectures achieve stable long-horizon autonomy on genuinely novel tasks, the optimist programme gets a progressive problemshift. As of early 2026, ARC-AGI-2 tested exactly this, and the results were not encouraging. But the question is empirically open and I give maybe 20-25% to the agentic approach producing a genuine surprise within the next two years.
The Skeptic Programme Has a Different Problem
Hard core: Current architectures have fundamental limitations. LLMs are sophisticated pattern matchers.
This programme has been more empirically progressive. Bender et al. (2021) predicted fluent but ungrounded outputs before "hallucination" was widely recognized as a persistent architectural feature. Xu et al. (2024) formalized hallucination inevitability. Benchmark saturation without deployment transfer was predicted and corroborated.
But it risks the failure mode Chen et al. accuse it of: "dogmatic commitment to perpetual scepticism." If every new capability gets dismissed as "not real intelligence" without specifying what would count, the programme becomes unfalsifiable. What would change your mind? If you can't answer that concretely, you're degenerating too.
The deeper limitation: "just pattern matching" is descriptively accurate but explanatorily thin. It tells you that systems fail but not why, and without a theory of the boundary, you can't predict where the failures will occur or which tasks are safe to automate. You need a constructive theory, not just a collection of failure observations.
The Origination-Derivation Framework
This is my proposal for such a constructive theory, and I want to be upfront about what kind of claim this is.
Hard core: A categorical distinction exists between origination (the capacity to access novel configurations through causal contact with reality) and derivation (transforming inputs according to learned patterns within symbolic space). General intelligence requires origination. Current AI systems are derivation-only. The boundary is architectural, not a function of scale.
Before you pattern-match this to "mere assertion disguised in Lakatosian vocabulary": every programme's hard core is held irrefutable by methodological decision, not by proof. Newton's laws were not proven true; they were held immune from falsification while the programme's predictive track record was evaluated. The test isn't whether you find the hard core metaphysically compelling. It's whether holding it generates novel, corroborated predictions that the alternatives don't.
An operational gloss, to prevent "origination" from becoming a metaphysical placeholder. In minimal terms: origination is the capacity to update internal models in response to direct causal contact with an environment, such that the correctness of resulting outputs cannot be explained as recombination of prior symbolic structures alone. Concretely, the markers that distinguish origination from derivation include: (a) stable performance on adversarially novel tasks whose solution structure was not present in training data, (b) on-line model revision grounded in new environmental observation rather than pattern completion, (c) the capacity to detect and correct errors by reference to external states of affairs rather than internal coherence, and (d) output diversity that increases rather than decreases as task novelty increases. These are individually testable. A system demonstrating all four under controlled conditions would falsify the categorical boundary.
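To make marker (d) concrete, here's a minimal measurement sketch. Everything in it is hypothetical scaffolding: `model` stands for any sampling interface to the system under test, `embed` for any text-embedding function, and the diversity score is the crudest plausible choice (mean pairwise distance), not a validated metric.

```python
import itertools
import numpy as np

def mean_pairwise_distance(embeddings: np.ndarray) -> float:
    """Mean Euclidean distance over all output pairs: a crude diversity score."""
    pairs = itertools.combinations(range(len(embeddings)), 2)
    dists = [np.linalg.norm(embeddings[i] - embeddings[j]) for i, j in pairs]
    return float(np.mean(dists))

def diversity_by_novelty(model, embed, tasks_by_novelty: dict[int, list[str]],
                         samples: int = 20) -> dict[int, float]:
    """Output diversity at each task-novelty level.

    model(task) returns one sampled output string; embed(texts) returns
    an (n, d) array of embeddings. Both are stand-ins for whatever
    system and representation are under test.
    """
    scores = {}
    for level, tasks in sorted(tasks_by_novelty.items()):
        outputs = [model(t) for t in tasks for _ in range(samples)]
        scores[level] = mean_pairwise_distance(embed(outputs))
    return scores

# Marker (d) predicts these scores *fall* as novelty rises for a
# derivation-only system (outputs converge on shared templates);
# a sustained rise would count as evidence against the boundary.
```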
Five Predictions
Prediction 1: Enterprise AI failures persist regardless of model size. The failures are grounding problems (misunderstood purpose, inappropriate task-system matching) misdiagnosed as scaling problems. Status: Corroborated. 70-95% failure rates persist from GPT-3 through GPT-4o. The RAND root causes are all about what the system is for, not how big it is.
Prediction 2: Behavioral mimicry plateaus below genuine generalization on adversarially novel tasks. Status: Corroborated. Frontier scores collapse from ~85% on ARC-AGI-1 to ~1% on ARC-AGI-2.
Prediction 3: RLHF cannot produce genuine epistemic calibration because it optimizes for approval, not truth. Status: Corroborated by Xu et al.'s formal inevitability proof and persistent hallucination across RLHF-trained models.
Prediction 4: Definitional drift continues toward reduced requirements. Status: Corroborated by the trajectory documented above.
Prediction 5: LLM outputs show high elaboration but low originality, with convergent rather than divergent distributions. Status: Corroborated by multiple independent lines: Zhao et al. (2025) found the elaboration-originality asymmetry. Cropley (2025) established a formal creativity ceiling of 0.25. Multiple studies document output homogenization. Si et al. (2024) found LLM ideas judged "more novel" by blind reviewers, but the same study found LLMs "lack idea diversity when we scale up idea generation." High perceived novelty with low actual diversity is exactly what recombination within a fixed space looks like.
Why This Should Matter to Alignment Researchers
Three specific implications:
1. RLHF may be operating on the wrong axis. If the problem with current systems is fundamentally about grounding (what is the system for, and how do its outputs relate to reality?) rather than about preference optimization (how do we get the loss function right?), then RLHF-style alignment addresses a secondary concern while the primary one goes unexamined. I'm not claiming RLHF is useless. I'm claiming it addresses vertical-axis epistemology when the deeper problem is grounding-axis teleology. You can calibrate confidence perfectly and still be pointing at the wrong target.
2. Human oversight may be self-undermining at scale. Kumar et al. (2025, n=1,100, pre-registered, CHI 2025) found that LLM assistance actively suppresses human creativity in subsequent unassisted tasks. This held for both direct-answer and coaching-style assistance. Anderson et al. (2024) documented the same pattern at the collective level. If interaction with derivation-only systems degrades precisely the independent judgment that makes human oversight valuable, then human-in-the-loop architectures face a degradation problem that gets worse with use. I call this the Interactive Dunning-Kruger Effect: the system's confident, fluent output anchors human cognition within the derived space, reducing the capacity for independent judgment. The mechanism is straightforward: fluent AI outputs provide readily available templates that constrain subsequent ideation, and the fluency masks the narrowness of provenance.
3. The consciousness incoherence undermines threat modeling. The optimist programme officially excludes consciousness from AGI (Chen et al. say intelligence is "functional," Morris et al. insist on capabilities over processes). But implicitly, the programme depends on consciousness-adjacent attributions to motivate its significance. The investment, the national security concern, the existential risk discourse, the alignment urgency: none of this follows from "very capable software." It follows from the implicit assumption that these systems have or will have something like understanding, goals, or agency. If that assumption is wrong, the threat models built on it need revision. If it's right, the programme should say so and deal with the hard problem of consciousness rather than trading on implications it can't cash out.
What Would Change My Mind
Since I'm asking this of the skeptic programme, I owe it for my own. The origination-derivation framework would be falsified or significantly weakened by:
1. A derivation-only system passing a suite of adversarial novelty tests while demonstrating on-line model revision from new environmental input and exhibiting increasing output diversity under increasing task novelty. (I give this maybe 10% within five years.)
2. Enterprise AI failure rates dropping substantially (to, say, below 30%) through model scaling alone, without changes in deployment methodology. (I give this maybe 5%.)
3. LLM creativity studies showing genuinely divergent output distributions at scale, not just high perceived novelty with low actual diversity. (Maybe 15%.)
If all three happened, I'd update heavily. Any one would make me update moderately and revisit specific claims.
Summary
The AGI-optimist programme is degenerating by Lakatosian criteria: successive shifts contract requirements in response to anomalies rather than generating novel predictions. The AGI-skeptic programme is empirically progressive but theoretically thin, lacking a constructive boundary theory. The origination-derivation programme generates five novel predictions, all corroborated by independent evidence, and provides a constructive account of where and why current systems fail.
Lakatos's warning: "It is perfectly rational to play a risky game: what is irrational is to deceive oneself about the risk." The AGI discourse has been deceiving itself about the risk through definitional drift. Whatever you think of my proposed alternative, the degeneration pattern is there in the data, and it matters for how we think about alignment.
I'm interested in where this framework breaks. If you think the origination-derivation distinction is doing less work than I claim, or if you see a way the optimist programme recovers that I'm not modeling, I want to hear it.
The full paper is available on Zenodo: The AGI Mirage: A Lakatosian Analysis. The broader research programme, including related papers on structural epistemic limitations (AIDK), deployment frameworks (HCAE), and creativity constraints, is documented in the AI Research & Philosophy community.
Author: James (JD) Longmire | Northrop Grumman Fellow (unaffiliated research) | ORCID: 0009-0009-1383-7698
References
Anderson, R. B., Shah, J. H., & Kreminski, M. (2024). Homogenization effects of large language models on human creative ideation. Proceedings of the 16th Conference on Creativity & Cognition, 413-425.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots. FAccT 2021, 610-623.
Chen, E. K., Belkin, M., Bergen, L., & Danks, D. (2026). Does AI already have human-level intelligence? Nature, 650, 36-40.
Cropley, D. H. (2025). Why generative AI has limited creativity. The Journal of Creative Behavior, 59, e70077.
Kumar, H., et al. (2025). Human creativity in the age of LLMs. CHI '25.
Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the Growth of Knowledge (pp. 91-196). Cambridge University Press.
Lakatos, I. (1978). The Methodology of Scientific Research Programmes: Philosophical Papers, Volume 1. Cambridge University Press.
Longmire, J. D. (2025a). Minds in the Clouds. Zenodo. doi:10.5281/zenodo.17857287.
Longmire, J. D. (2026a). AI Dunning-Kruger (AIDK). Zenodo. doi:10.5281/zenodo.18316059.
Longmire, J. D. (2026b). Human-Curated, AI-Enabled (HCAE). Zenodo. doi:10.5281/zenodo.18368697.
Longmire, J. D. (2026c). The AGI Mirage. Zenodo. doi:10.5281/zenodo.18637936.
Morris, M. R., et al. (2023). Levels of AGI. arXiv:2311.02462.
Ryseff, J., et al. (2024). Root causes of AI project failure. RAND Corporation, RR-A2680-1.
Si, C., et al. (2024). Can LLMs generate novel research ideas? ICLR 2025.
Xu, Z., et al. (2024). Hallucination is inevitable. arXiv:2401.11817.
Zhao, Y., et al. (2025). Creativity in large language models. Machine Intelligence Research, 22(3), 417-436.