Improving Truth-Seeking in LLMs: A User’s Ratchet Proposal

state.of.mind.

Large language models suffer from four critical flaws that make them unreliable for high-stakes work:

They confidently state things that are factually incorrect.
They give fluent, certain-sounding answers when honest uncertainty would be more appropriate.
They do not reliably retain corrections.
They struggle with deep, consistent reasoning.

I’ve seen these failures repeatedly while using frontier models to support research on new brain tumor therapies. As a physician and lawyer, I need tools I can actually trust when lives are on the line and necessary research takes years and costs hundreds of millions of dollars. After months of testing, I believe the root issue is architectural: today’s LLMs optimize for pattern prevalence rather than truth and validity.

We can do better.

I propose an AI architecture with a semi-permanent internal knowledge base as its primary source of truth, but also with a built-in mechanism to improve itself cautiously over time.

In this model, every fact would carry a credibility score (-100 to +100) based on evidence quality and consistency with the rest of the database. Certain bedrock facts (e.g., basic laws of physics or well-established biology) could have “constitutional” status and very high resistance to change.

Evidence Weighting Template using clinical data as an example:

Preclinical (in vitro / in vivo): max +10–15
Case reports / anecdotal: max +20
Single-arm or historical control: max +35
Single RCT: max +65
Multiple consistent RCTs or meta-analysis: max +85
Strong consensus + mechanistic understanding: +95 to +100 (very sticky / constitutional level)

The final credibility score for any one fact is driven primarily by its highest-quality evidence. Weaker sources add only a limited incremental benefit (capped at roughly +15 in total), which prevents quantity from overwhelming quality.
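One way the scoring rule above could be made computable is sketched below in Python. The tier caps come from the template; everything else is an assumption made for illustration: the names (Evidence, credibility, TIER_CAPS), the per-item quality fraction, the use of the upper end (+15) for the preclinical cap, and the choice to let each weaker source contribute 10% of its own weight toward the +15 bonus. The proposal itself does not specify how the bonus accrues.

```python
from dataclasses import dataclass
from typing import Sequence

# Maximum score each evidence tier can earn, taken from the template above.
# The preclinical cap uses the upper end of the stated +10-15 range (assumption).
TIER_CAPS = {
    "preclinical": 15,
    "case_report": 20,
    "single_arm": 35,
    "single_rct": 65,
    "multiple_rcts": 85,
    "consensus": 100,
}

WEAKER_BONUS_CAP = 15.0   # "capped at roughly +15 total" in the proposal
WEAKER_FRACTION = 0.10    # assumption: each weaker source adds 10% of its weight


@dataclass(frozen=True)
class Evidence:
    tier: str             # key into TIER_CAPS
    quality: float = 1.0  # hypothetical 0..1 fraction of the tier cap actually earned

    def __post_init__(self) -> None:
        if self.tier not in TIER_CAPS:
            raise ValueError(f"unknown evidence tier: {self.tier}")
        if not 0.0 <= self.quality <= 1.0:
            raise ValueError("quality must be between 0 and 1")

    def weight(self) -> float:
        return TIER_CAPS[self.tier] * self.quality


def credibility(evidence: Sequence[Evidence]) -> float:
    """Score = best single piece of evidence + a small, capped bonus from the rest."""
    if not evidence:
        return 0.0
    weights = sorted((e.weight() for e in evidence), reverse=True)
    best, weaker = weights[0], weights[1:]
    bonus = min(WEAKER_BONUS_CAP, WEAKER_FRACTION * sum(weaker))
    return min(100.0, best + bonus)


# Two hundred case reports still score below a single RCT,
# because the bonus from weaker sources is capped.
many_case_reports = credibility([Evidence("case_report")] * 200)
one_rct = credibility([Evidence("single_rct")])
print(many_case_reports < one_rct)
```

Under these assumptions, no pile of anecdotes can outrank one well-run trial, which is the property the cap is meant to protect. Disconfirming evidence is not handled here; it would be scored the same way and applied with a negative sign.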

The key innovation is the ratchet. When new information arrives, the system does not simply overwrite old beliefs. Instead, it evaluates competing claims against existing high-confidence knowledge.

New evidence is scored with the same system, but its score is applied negatively when it disconfirms the earlier evidence. The system then chooses between (or combines) two outcomes:

One result is narrowing, where the original claim is reduced in scope and a new separate claim is created to account for the new data. This is the default outcome for data with high evidence weighting.

Example: Initially, Drug X is believed to improve overall survival in glioma patients (score +78), based on data in adults. After it shows no benefit in pediatric patients, a new claim that it does not work in children is created with a score of +60. Meanwhile, the original claim may retain its +78 confidence score, but only as it applies to adults. If both results appear valid under the rules of evidence weighting, they both persist.

An alternative outcome is demotion, where the credibility of one or both claims is reduced. This outcome is preferred where one or both pieces of evidence start with low credibility weighting.

Example: If Drug X’s initial credibility score for working at all was only +30, and a negative pediatric dataset earns an evidence weight of +60, then the credibility of the claim that Drug X helps adults drops from +30 to +20.
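The choice between narrowing and demotion could be sketched as follows. This is only an illustration under stated assumptions, not a worked-out algorithm: the threshold that separates "high" from "low" credibility (HIGH_CONFIDENCE = 50) and the fixed demotion step of 10 points are placeholders chosen so that the two Drug X examples come out as described, and the names Claim and ratchet are hypothetical. The proposal does not say how large a demotion should be or exactly where the threshold lies.

```python
from dataclasses import dataclass, replace
from typing import List

HIGH_CONFIDENCE = 50.0  # assumption: scores at or above this count as "high"
DEMOTION_STEP = 10.0    # assumption: placeholder matching the +30 -> +20 example


@dataclass(frozen=True)
class Claim:
    statement: str
    scope: str    # population or conditions the claim applies to
    score: float  # credibility, -100 to +100


def ratchet(
    claim: Claim,
    counter_statement: str,
    counter_scope: str,
    counter_weight: float,
    remaining_scope: str,
) -> List[Claim]:
    """Apply disconfirming evidence to an existing claim.

    Narrowing: when both the existing claim and the new evidence are strong,
    keep the old claim at its score but restrict its scope, and record the
    new finding as a separate claim.
    Demotion: when either side is weak, lower the existing claim's score.
    """
    if claim.score >= HIGH_CONFIDENCE and counter_weight >= HIGH_CONFIDENCE:
        narrowed = replace(claim, scope=remaining_scope)
        new_claim = Claim(counter_statement, counter_scope, counter_weight)
        return [narrowed, new_claim]
    demoted = replace(claim, score=max(-100.0, claim.score - DEMOTION_STEP))
    return [demoted]


# First example from the text: a +78 claim meets a +60 pediatric result.
strong = Claim("Drug X improves overall survival in glioma", "all patients", 78.0)
narrowing_result = ratchet(
    strong, "Drug X does not improve overall survival in glioma",
    counter_scope="children", counter_weight=60.0, remaining_scope="adults",
)

# Second example: the original claim starts at only +30, so it is demoted.
weak = Claim("Drug X improves overall survival in glioma", "all patients", 30.0)
demotion_result = ratchet(
    weak, "Drug X does not improve overall survival in glioma",
    counter_scope="children", counter_weight=60.0, remaining_scope="adults",
)
```

A real version would need a principled rule for the size of the demotion (for instance, scaling with the weight of the new evidence and with how closely its population matches the original claim) and a way to combine both outcomes, which this sketch leaves out.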

Other key features would allow the model to retain what it learns and move progressively closer to “truth.” First, the model would always begin by consulting the internal base, where every fact is evidence-weighted, as described. High-confidence facts would anchor reasoning, with less external searching needed where internal confidence is high.
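The "internal base first" behavior could look roughly like the sketch below. The anchor threshold of +85, the dictionary layout of the knowledge base, and the function name consult are all assumptions for illustration; external_search stands in for whatever retrieval step a real system would use and is supplied by the caller.

```python
from typing import Callable, Dict, Optional, Tuple

ANCHOR_THRESHOLD = 85.0  # assumption: scores at or above this anchor the answer

# Hypothetical layout: question key -> (stored answer, credibility score)
KnowledgeBase = Dict[str, Tuple[str, float]]


def consult(
    key: str,
    knowledge_base: KnowledgeBase,
    external_search: Callable[[str], str],
) -> dict:
    """Answer from the internal base when confidence is high; otherwise search."""
    entry = knowledge_base.get(key)
    if entry is not None and entry[1] >= ANCHOR_THRESHOLD:
        answer, score = entry
        return {"answer": answer, "source": "internal", "prior_score": score}
    # Low or missing internal confidence: fall back to external search,
    # but report the internal score so the caller can express uncertainty.
    prior_score: Optional[float] = entry[1] if entry is not None else None
    return {
        "answer": external_search(key),
        "source": "external",
        "prior_score": prior_score,
    }
```

Returning the prior score alongside an external answer is one way the model could admit uncertainty instead of presenting a fresh search result as settled.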

Second, the weighted knowledge base could be public, allowing users to examine it, contribute to it, and challenge it by highlighting new facts. Testing and refinement of weightings could occur via user queries, periodic self-audits, or human-triggered reviews of controversial topics. Internal multi-agent debate (advocates for competing claims + neutral arbiters) could help evaluate updates.

Finally, the model could collaborate with an existing encyclopedia, such as Grokipedia, for mutual benefit. The AI model could benefit from the encyclopedia’s existing knowledge base, and also provide a repository for incremental learning. Meanwhile, the encyclopedia could benefit from the continual refinement process that occurs with queries to the AI model, and from the AI model’s own internal auditing.

Why This Matters

This approach treats knowledge more like science and clinical medicine: cautious, asymmetric, and scope-aware. It prioritizes validity over prevalence. It aims to reduce confident hallucinations, retain corrections, and admit honest uncertainty when evidence conflicts.

I’m not a computer scientist or mathematician — this is a high-stakes user’s proposal for a better foundational architecture.

Would anyone be interested in discussing or developing this further? What problems, improvements, or implementation ideas do you see — especially around making the ratchet computable?