I am writing as a pediatric surgeon and clinical researcher, the kind of work that might seem unlikely to be affected by the AI explosion. The reality, however, has changed completely, and this led me to a hypothesis: the self-correction ability of science (and of medicine, my specific domain) is eroding in the age of AI.
To be honest, there is an ironic truth here: I completed this small paper (preprint https://doi.org/10.31222/osf.io/gqunf_v1) with significant support from AI, for searching original articles, language editing (truly efficient for a non-native speaker), simulated peer review, and especially when I asked where I should publish my work! That is all of the introduction; now let us follow the flow of my humble but fun reasoning process.
Every day, I make decisions about operations for children, based mostly on evidence-based medicine (EBM). The question is: how is this evidence generated? First it comes from case reports or series, then from more rigorously designed research (cohort studies, RCTs), and finally from meta-analyses that yield a conclusion or a guideline for clinical practice. That chain works not because each link is perfect, but because each link can check the previous one, through four conditions identified in prior work: independent evaluation (peer review and editorial review), methodological plurality (many types of design help triangulate the truth), traceability (one can audit how a conclusion was reached), and epistemic friction between authors and critics (the substantial workload and cost of going from a question to a conclusion).
In the age of AI, all four of these conditions are gradually eroding. Most obviously, the friction is substantially reduced: with the support of LLMs, researchers can synthesize hundreds of papers in days rather than the months it used to take. Traceability is also challenged, since we cannot determine exactly how LLMs "think" or "reason" to produce a conclusion or a result. Plurality is decreasing, a trend recent literature calls the 'monoculture' phenomenon. Most strikingly, AI now helps researchers (like me) do the research, which is then also reviewed with AI when reviewers use it, so the generator and the evaluator collapse into the same tool.
These claims are not merely my personal impressions; empirical signals are accumulating: 28.6%–91.4% of LLM-generated references in systematic-review assistance are fabricated; only 6% of published AI models in paediatric surgery are both interpretable and externally validated; and an audit of 2,271 evidence syntheses (2017–2024) documents automation spreading across search, screening, and extraction.
To make this more understandable, I termed the syndrome "epistemic immunodepression": a passive weakening through scale, opacity, and the collapse of independence between those who generate research and those who evaluate it. Current governance cannot detect these structural failure modes. The fix has to be more verifiable: a research record, an AI logbook, recalibration of the evidence pyramid, and accountability for AI use in peer review.
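To make the "AI logbook" idea concrete, here is a minimal sketch in Python of what a single logbook entry might record for each AI-assisted step of a study. This is purely illustrative: the `AILogEntry` class and its field names are my assumptions for the sake of the example, not an existing standard.

```python
# Purely illustrative sketch of an "AI logbook" entry (assumed schema,
# not an existing standard): one record per AI-assisted step of a study.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AILogEntry:
    stage: str             # which step AI assisted, e.g. "search", "screening"
    model: str             # which model and version were used
    task_summary: str      # short description of what the model was asked to do
    human_verified: bool   # did a human independently check the output?
    verifier: str = ""     # who performed that check, if anyone
    logged_on: date = field(default_factory=date.today)

# Example: logging the language editing mentioned above,
# with the output checked by a human co-author.
logbook = [
    AILogEntry(
        stage="language editing",
        model="<model name and version>",
        task_summary="polish grammar of the discussion section",
        human_verified=True,
        verifier="co-author",
    ),
]
for entry in logbook:
    print(entry)
```

The point is not the exact fields but that every AI-touched step becomes auditable, restoring some of the traceability described above.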
I have also pre-registered an empirical study on OSF. If the diagnosis holds empirically, the intervention is urgent, because a journal can retract a paper, but a surgeon cannot reverse a decision already executed on a child.
P.S.: My paper was rejected because its scope (clinical evidence, epistemology, philosophy, and methodology) does not fit any journal, which led me to LessWrong to share these ideas as well as seek some advice.