Epistemic Status
I'm moderately confident in the Lakatosian analysis. I think there's maybe a 15-20% chance the origination-derivation boundary turns out to be a gradient rather than categorical, and something interesting happens at sufficient scale that I can't currently model. The empirical claims rest on published, independent studies. The theoretical framework is my own, developed across a series of Zenodo preprints, and carries all the risks of self-citation from an independent researcher not embedded in an existing lab. I want to find out which parts are wrong.
I want to make a specific claim: the fact that nobody can agree on what "AGI" means isn't just a young field fumbling for terminology. It's a diagnostic symptom of a degenerating research programme in the Lakatosian sense.
This matters for alignment. If I'm right, a significant portion of the field is building safety mechanisms for a thing whose definition keeps shifting to match whatever current systems happen to do. You can't build a stable safety architecture on a moving target.
The Framework (Brief)
Lakatos evaluated research programmes, not individual theories. A programme has a hard core (foundational commitments held irrefutable by methodological decision) and a protective belt (auxiliary hypotheses that absorb empirical problems). When anomalies force belt changes, the question is: do the changes predict new things that turn out to be true (progressive), or do they just explain away what already went wrong (degenerating)?
I think many here know this already. Eliezer has used degenerating programme language before. What I haven't seen is someone applying it systematically to AGI discourse and showing that definitional drift is the primary evidence of degeneration.
The Definitional Drift Is the Evidence
Here's the core of the argument. Follow the sequence of shifts and watch what happens to the definition each time the programme hits an anomaly.
Shift 1 (pre-2020): AGI means human-level intelligence across cognitive domains. The Turing Test is the benchmark.
Shift 2 (2020-2023): LLMs pass informal Turing Tests but can't reason or plan. Response: Turing Test was always insufficient. Morris et al. (2023) propose capability-based "Levels of AGI" for Google DeepMind. This is a legitimate protective belt adjustment and generates a measurable prediction (capability benchmarks improve progressively). So far, theoretically progressive.
I should be fair here: scaling did produce genuine surprises in this period. In-context learning, chain-of-thought, tool use, multilingual transfer. Real results. But they're within-distribution capabilities. The programme's AGI predictions are about general intelligence, and the test is whether these capabilities transfer to genuinely novel tasks. The subsequent evidence says they mostly don't.
Shift 3 (2024): Benchmarks saturate. Enterprise deployment fails at 70-95% (RAND, n=65 experienced data scientists; MIT NANDA, n=300+ initiatives; S&P Global, n=1,000+ companies). ARC-AGI-1 gets solved by brute-force compute at ~$20K per task. Response: redefine benchmarks as insufficient. ARC-AGI-2, designed to resist brute-force, crashes frontier model scores to ~1%. Meanwhile, OpenAI redefines AGI as systems that "outperform humans at most economically valuable work," which is a contractual definition from its Microsoft agreement. This shift doesn't predict novel facts. It retroactively accommodates anomalies.
Shift 4 (2025-2026): Chen et al. (2026) argue in Nature that AGI is already here, explicitly excluding process, understanding, embodiment, agency, and autonomy from the definition. Sequoia defines AGI as "the ability to figure things out." Amodei calls it "a marketing term."
The definition has gone vacuous. Current systems satisfy it by stipulation. The programme isn't predicting what future systems will achieve anymore; it's declaring that existing systems already clear a bar that was quietly lowered.
Each shift was triggered by a specific anomaly. Each responded by shrinking the definition rather than predicting what comes next. That's Lakatos's hallmark of degeneration.
Important caveat: programmes can recover. The best candidate right now is inference-time compute plus agentic scaffolding. If agentic architectures pull off stable long-horizon autonomy on genuinely novel tasks, the optimist programme gets a real progressive problemshift. ARC-AGI-2 tested exactly this kind of extended reasoning, and the results were ugly. But I give maybe 20-25% to the agentic approach producing a real surprise within two years.
The Skeptic Programme Has Its Own Problem
Hard core: Current architectures have fundamental limitations. LLMs are sophisticated pattern matchers.
This programme has been more empirically progressive. Bender et al. (2021) predicted fluent but ungrounded outputs before "hallucination" was widely recognized as a persistent architectural feature. Xu et al. (2024) formalized hallucination inevitability. Benchmark saturation without deployment transfer was predicted and corroborated.
But it risks the failure mode Chen et al. accuse it of: "dogmatic commitment to perpetual scepticism." If every new capability just gets dismissed as "not real intelligence" without specifying what would count, the programme goes unfalsifiable. What would change your mind? If you can't answer that, you have the same disease as the optimists, just on the other side.
The deeper problem: "just pattern matching" is descriptively right but explanatorily thin. It tells you that systems fail but not why. Without a theory of where the boundary is, you can't predict which tasks are safe to automate. You need a constructive theory, not a collection of failure observations.
The Origination-Derivation Framework
Here's my attempt at a constructive theory. I want to be upfront about what kind of claim this is.
Hard core: A categorical distinction exists between origination (the capacity to access novel configurations through causal contact with reality) and derivation (transforming inputs according to learned patterns within symbolic space). General intelligence requires origination. Current AI systems are derivation-only. The boundary is architectural, not a function of scale.
Before you pattern-match this to "mere assertion disguised in Lakatosian vocabulary": every programme's hard core is held irrefutable by methodological decision, not by proof. Newton's laws were not proven true; they were held immune from falsification while the programme's predictive track record was evaluated. The test isn't whether you find the hard core metaphysically compelling. It's whether holding it generates novel, corroborated predictions that the alternatives don't.
An operational gloss, to prevent "origination" from becoming a metaphysical placeholder. In minimal terms: origination is the capacity to update internal models in response to direct causal contact with an environment, such that the correctness of resulting outputs cannot be explained as recombination of prior symbolic structures alone. Concretely, the markers that distinguish origination from derivation include: (a) stable performance on adversarially novel tasks whose solution structure was not present in training data, (b) on-line model revision grounded in new environmental observation rather than pattern completion, (c) the capacity to detect and correct errors by reference to external states of affairs rather than internal coherence, and (d) output diversity that increases rather than decreases as task novelty increases. These are individually testable. A system demonstrating all four under controlled conditions would falsify the categorical boundary.
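To keep that gloss honest, here is a minimal sketch of how the four markers could be scored as a single pre-registered battery. Everything in it is hypothetical: the interfaces, field names, and thresholds are mine for illustration and are not part of any existing benchmark or of the cited studies. The only point it encodes is that each marker reduces to a measurable quantity and that falsification requires all four to hold at once.

```python
from dataclasses import dataclass

@dataclass
class MarkerScores:
    """Measured values for the four origination markers (hypothetical fields)."""
    novel_task_accuracy: float       # (a) accuracy on adversarially novel tasks
    online_revision_gain: float      # (b) improvement after new environmental observation
    external_correction_rate: float  # (c) errors corrected against external states, not internal coherence
    diversity_slope: float           # (d) change in output diversity as task novelty increases

def falsifies_categorical_boundary(s: MarkerScores,
                                   acc_min: float = 0.8,
                                   revision_min: float = 0.2,
                                   correction_min: float = 0.5) -> bool:
    """Illustrative decision rule: all four markers must hold under controlled conditions.

    Thresholds are placeholders, not values claimed anywhere in this post.
    """
    return (s.novel_task_accuracy >= acc_min
            and s.online_revision_gain >= revision_min
            and s.external_correction_rate >= correction_min
            and s.diversity_slope > 0.0)  # diversity must increase with novelty, not shrink

# Hypothetical measurements for a candidate system:
print(falsifies_categorical_boundary(MarkerScores(0.84, 0.31, 0.62, 0.05)))  # True
```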
Five Predictions
Prediction 1: Enterprise AI failures persist regardless of model size. The failures are grounding problems (misunderstood purpose, inappropriate task-system matching) misdiagnosed as scaling problems. Status: Corroborated. 70-95% failure rates persist from GPT-3 through GPT-4o. The RAND root causes are all about what the system is for, not how big it is.
Prediction 2: Behavioral mimicry plateaus below genuine generalization on adversarially novel tasks. Status: Corroborated. ARC-AGI-2 shows collapse from 85% to ~1%.
Prediction 3: RLHF cannot produce genuine epistemic calibration because it optimizes for approval, not truth. Status: Corroborated by Xu et al.'s formal inevitability proof and persistent hallucination across RLHF-trained models.
Prediction 4: Definitional drift continues toward reduced requirements. Status: Corroborated by the trajectory documented above.
Prediction 5: LLM outputs show high elaboration but low originality, with convergent rather than divergent distributions.Status: Corroborated by multiple independent lines: Zhao et al. (2025) found the elaboration-originality asymmetry. Cropley (2025) established a formal creativity ceiling of 0.25. Multiple studies document output homogenization. Si et al. (2024) found LLM ideas judged "more novel" by blind reviewers, but the same study found LLMs "lack idea diversity when we scale up idea generation." High perceived novelty with low actual diversity is exactly what recombination within a fixed space looks like.
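The gap Prediction 5 turns on, high perceived novelty alongside low actual diversity, is directly measurable. A minimal sketch, assuming you already have embedding vectors for a batch of generated ideas; the embedding source and the random data below are placeholders, not anything from Si et al. or Anderson et al. A shrinking mean pairwise distance as you scale up generation is the homogenization signature those studies describe.

```python
import numpy as np

def mean_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Average cosine distance over all unordered pairs of idea embeddings.

    Lower values mean a more convergent (homogenized) output distribution.
    `embeddings` is an (n_ideas, dim) array from any embedding model.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    i, j = np.triu_indices(len(embeddings), k=1)  # each pair counted once
    return float(np.mean(1.0 - sims[i, j]))

# Toy check with random vectors standing in for real idea embeddings:
rng = np.random.default_rng(0)
divergent = rng.normal(size=(50, 384))                          # spread-out batch
convergent = divergent[:1] + 0.05 * rng.normal(size=(50, 384))  # minor variations on one idea
print(mean_pairwise_cosine_distance(divergent))   # close to 1.0
print(mean_pairwise_cosine_distance(convergent))  # much smaller
```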
Why Alignment Researchers Should Care
Three things fall out of this that I think matter for safety work.
1. RLHF may be working on the wrong axis. If the core problem with current systems is about grounding (what is the system for, how do its outputs connect to reality?) rather than preference optimization (how do we tune the loss function?), then RLHF addresses a secondary concern while the primary one goes unexamined. I'm not saying RLHF is useless. I'm saying it's vertical-axis epistemology when the deeper issue is grounding-axis teleology. You can calibrate confidence perfectly and still be pointed at the wrong target.
2. Human oversight may be self-undermining at scale. This one worries me most. Kumar et al. (2025, n=1,100, pre-registered, CHI 2025) found that LLM assistance actively suppresses human creativity in subsequent unassisted tasks. Both direct-answer and coaching-style assistance did it. Anderson et al. (2024) found the same at the collective level. Think about what that means for safety: if interacting with derivation-only systems degrades the independent judgment that makes oversight valuable, then human-in-the-loop architectures have a degradation problem that gets worse with use. I call this the Interactive Dunning-Kruger Effect. The system's fluent output anchors human cognition within derived space, and the fluency masks how narrow the provenance actually is.
3. The consciousness incoherence undermines threat modeling. The optimist programme officially excludes consciousness from AGI. Chen et al. say intelligence is "functional," Morris et al. want capabilities over processes. Fine. But then where does the urgency come from? The investment, the national security panic, the existential risk framing, the alignment urgency: none of that follows from "very capable software." It follows from an unstated assumption that these systems have or will have something like understanding, goals, or agency. If that assumption is wrong, the threat models need revision. If it's right, say so and deal with the hard problem rather than trading on implications you can't cash out.
What Would Change My Mind
Since I'm asking this of the skeptic programme, I owe it for my own. The origination-derivation framework would be falsified or significantly weakened by:
A derivation-only system passing a suite of adversarial novelty tests while demonstrating on-line model revision from new environmental input and exhibiting increasing output diversity under increasing task novelty. (I give this maybe 10% within five years.)
Enterprise AI failure rates dropping substantially (to, say, below 30%) through model scaling alone, without changes in deployment methodology. (I give this maybe 5%.)
LLM creativity studies showing genuinely divergent output distributions at scale, not just high perceived novelty with low actual diversity. (Maybe 15%.)
If all three happened, I'd update heavily. Any one would make me update moderately and revisit specific claims.
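Spelled out as a decision rule, with the update magnitudes exactly as stated above and my rough forecasts attached. The independence assumption in the last line is mine, added only to show how rarely I expect the heavy-update branch to fire.

```python
def update_rule(novelty_suite_passed: bool,
                enterprise_failure_below_30pct: bool,
                divergent_distributions_at_scale: bool) -> str:
    """All three -> heavy update; any one -> moderate update; none -> no update."""
    hits = sum([novelty_suite_passed,
                enterprise_failure_below_30pct,
                divergent_distributions_at_scale])
    if hits == 3:
        return "update heavily against the categorical boundary"
    if hits >= 1:
        return "update moderately and revisit specific claims"
    return "no update from these tests"

# Stated forecasts: ~10%, ~5%, ~15%. Treating them as independent (an assumption
# the post does not make), all three co-occurring is roughly:
print(0.10 * 0.05 * 0.15)  # 0.00075, i.e. well under 0.1%
```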
Summary
The AGI-optimist programme is degenerating by Lakatosian criteria: successive shifts contract requirements in response to anomalies rather than generating novel predictions. The AGI-skeptic programme is empirically progressive but theoretically thin, lacking a constructive boundary theory. The origination-derivation programme generates five novel predictions, all corroborated by independent evidence, and provides a constructive account of where and why current systems fail.
Lakatos: "It is perfectly rational to play a risky game: what is irrational is to deceive oneself about the risk." The AGI discourse has been doing exactly that through definitional drift. Whatever you think of my proposed alternative, the degeneration pattern is there in the data.
I want to know where this breaks. If the origination-derivation distinction is doing less work than I claim, or if you see a recovery path for the optimist programme that I'm not modeling, tell me.
Full paper: The AGI Mirage. Broader research programme: AI Research & Philosophy on Zenodo.
James (JD) Longmire | Northrop Grumman Fellow (unaffiliated research) | ORCID
References
Anderson, R. B., Shah, J. H., & Kreminski, M. (2024). Homogenization effects of large language models on human creative ideation. Proceedings of the 16th Conference on Creativity & Cognition, 413-425.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots. FAccT 2021, 610-623.
Chen, E. K., Belkin, M., Bergen, L., & Danks, D. (2026). Does AI already have human-level intelligence? Nature, 650, 36-40.
Cropley, D. H. (2025). Why generative AI has limited creativity. The Journal of Creative Behavior, 59, e70077.
Kumar, H., et al. (2025). Human creativity in the age of LLMs. CHI '25.
Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In Criticism and the Growth of Knowledge.
Lakatos, I. (1978). The Methodology of Scientific Research Programmes.
Longmire, J. D. (2025a). Minds in the Clouds. Zenodo. doi:10.5281/zenodo.17857287.
Longmire, J. D. (2026a). AI Dunning-Kruger (AIDK). Zenodo. doi:10.5281/zenodo.18316059.
Longmire, J. D. (2026b). Human-Curated, AI-Enabled (HCAE). Zenodo. doi:10.5281/zenodo.18368697.
Longmire, J. D. (2026c). The AGI Mirage. Zenodo. doi:10.5281/zenodo.18637936.
Morris, M. R., et al. (2023). Levels of AGI. arXiv:2311.02462.
Ryseff, J., et al. (2024). Root causes of AI project failure. RAND Corporation, RR-A2680-1.
Si, C., et al. (2024). Can LLMs generate novel research ideas? ICLR 2025.
Xu, Z., et al. (2024). Hallucination is inevitable. arXiv:2401.11817.
Zhao, Y., et al. (2025). Creativity in large language models. Machine Intelligence Research, 22(3), 417-436.