Improving Truth-Seeking in LLMs

state.of.mind.

Large language models (LLMs) share several fundamental problems which severely constrain the precision, consistency and accuracy of their answers. In brief, they:

say things which are factually incorrect
give fluent and confident answers when honest uncertainty would be more accurate
do not reliably retain corrections
struggle with deep, step-by-step reasoning.

I think we can build something better. What if we developed an AI model with a semi-permanent internal knowledge base that acts as its main source of truth, and also created a way for it to steadily improve that base through rigorous testing?

The basic design begins with a retained internal database. Admittedly, any database we start with will be imperfect and will require significant memory infrastructure. One possibility would be to start with a web-based encyclopedia like Grokipedia.

Every fact in this database would be rated for validity (i.e., its level of evidence and its consistency with the rest of the database). Certain bedrock facts or concepts, such as the laws of classical physics, could be granted constitutional status if they are credibly viewed as free of controversy. However, most or all entries in the encyclopedia would be considered refutable. The level of evidence required to refute any fact would depend on its confidence level.
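To make the rating idea concrete, here is a minimal Python sketch of what one database entry might look like. Everything in it is an assumption for illustration: the `Entry` fields, the 0-to-1 confidence scale, the linear formula, and the choice to make constitutional entries harder (but not impossible) to refute are mine, not a worked-out design.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    claim: str
    confidence: float            # hypothetical scale: 0.0 (no support) to 1.0
    constitutional: bool = False  # bedrock fact, e.g. laws of classical physics


def evidence_needed_to_refute(entry, base=1.0, scale=9.0, bedrock_multiplier=10.0):
    """Return the (hypothetical) units of evidence needed to overturn an entry.

    The bar rises linearly with confidence. Constitutional entries stay
    refutable in this sketch, but the bar is multiplied by a large factor.
    """
    if not 0.0 <= entry.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    needed = base + scale * entry.confidence
    if entry.constitutional:
        needed *= bedrock_multiplier
    return needed
```

Under these assumed defaults, a tentative entry at confidence 0.1 would need far less contrary evidence than an established one at 0.9, and a constitutional entry would need ten times more again.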

The model would always check this internal base before answering. Facts with very high confidence levels would guide its reasoning directly, and the amount of external searching the model performs could be inversely related to the confidence level of the relevant entries in the database. The base could also be made public so others can examine and contribute to it.
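One way the "check the base first, then search less when confidence is high" rule could work is sketched below in Python. The dictionary-shaped knowledge base, the function names, the cap of 10 queries, and the linear relationship are all hypothetical choices; any decreasing function of confidence would fit the idea equally well.

```python
def external_search_budget(confidence, max_queries=10):
    """Return how many external searches to run for a given confidence.

    `confidence` is None when the internal base has no entry at all,
    in which case the full budget is spent.
    """
    if confidence is None:
        return max_queries
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Linear decrease: full budget at confidence 0, no searches at 1.
    return round(max_queries * (1.0 - confidence))


def plan_answer(question, knowledge_base, max_queries=10):
    """Consult the internal base first, then decide how much to search."""
    entry = knowledge_base.get(question)  # assumed: dict keyed by question
    confidence = entry["confidence"] if entry else None
    return {
        "internal_answer": entry["claim"] if entry else None,
        "external_queries": external_search_budget(confidence, max_queries),
    }
```

A real system would need fuzzy matching between questions and entries rather than exact dictionary keys; that part is deliberately left out here.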

One critical aspect is what I’m calling a ratchet mechanism. When new information comes in, the system doesn’t just replace old beliefs. Instead, it lets competing claims fight it out against the existing high-confidence knowledge and against fundamental truths. The answer with the highest level of support and consistency with existing high-confidence knowledge becomes the leading fact / theory and is disclosed as such.

However, less favored facts or theories would not be automatically discarded. Instead, the model would keep multiple explanations alive, and it would clearly admit when it is uncertain (i.e., no facts or theories have high certainty, or multiple facts or theories have similar levels of certainty). Competing facts or theories would be deleted only cautiously, after they have been repeatedly tested and have consistently failed over a significant period of time.
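The ratchet described above could be sketched roughly as follows in Python. The class names, the thresholds (0.7 minimum confidence, 0.15 margin, 5 consecutive failures), and the use of a simple failure counter in place of a real time window are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    confidence: float             # hypothetical 0-to-1 score
    consecutive_failures: int = 0


@dataclass
class Topic:
    claims: list = field(default_factory=list)

    def leading(self):
        """The best-supported claim, disclosed as the current leader."""
        return max(self.claims, key=lambda c: c.confidence)

    def is_uncertain(self, min_confidence=0.7, min_margin=0.15):
        """True if no claim is strong, or the top two are close together."""
        ranked = sorted(self.claims, key=lambda c: c.confidence, reverse=True)
        if ranked[0].confidence < min_confidence:
            return True
        return (len(ranked) > 1
                and ranked[0].confidence - ranked[1].confidence < min_margin)

    def record_test(self, claim, passed):
        """Track failure streaks; a single pass resets the streak."""
        claim.consecutive_failures = 0 if passed else claim.consecutive_failures + 1

    def prune(self, max_failures=5):
        """Delete only non-leading claims that have failed repeatedly."""
        leader = self.leading()
        self.claims = [c for c in self.claims
                       if c is leader or c.consecutive_failures < max_failures]
```

The key property is asymmetry: a rival claim is never dropped just because a better one arrived, only after a sustained run of failed tests, and the current leader is never pruned at all.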

In sum, over time, the best-supported positions would gradually gain confidence as they are shown to have stronger evidence / greater consistency with existing core facts, while weaker ones lose ground / validity. This testing could take the form of: (1) user queries; (2) periodic audits by the model itself; and/or (3) human-initiated audits of high-profile or controversial topics. One potential internal mechanism could be the use of different agents within the model, each vigorously advocating for one of the leading facts / theories, with other agents serving as neutral arbiters of persuasiveness.
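As a rough illustration of how confidence might drift with repeated testing, here is a small Python sketch. The update rule and the per-source step sizes are hypothetical; the only idea taken from the proposal is that each of the three kinds of test (user queries, self-audits, human audits) nudges a claim's confidence up or down.

```python
# Hypothetical step sizes per test source; the values are illustrative only.
SOURCE_WEIGHT = {
    "user_query": 0.02,
    "self_audit": 0.05,
    "human_audit": 0.10,
}


def update_confidence(confidence, passed, source):
    """Nudge a confidence score in [0, 1] after one test of a claim.

    A pass closes a fraction of the gap to 1; a failure removes the same
    fraction of the current score, so the value always stays within [0, 1].
    """
    rate = SOURCE_WEIGHT[source]
    if passed:
        return confidence + rate * (1.0 - confidence)
    return confidence - rate * confidence


def run_tests(confidence, results):
    """Apply a sequence of (passed, source) test results in order."""
    for passed, source in results:
        confidence = update_confidence(confidence, passed, source)
    return confidence
```

Because each step is small, no single test can flip a well-established entry; it takes a consistent run of results for a position to gain or lose much ground, which is the gradual behavior described above.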

As introduced earlier, efficiency could be gained by pairing such a system with a web-based encyclopedia. The AI model could benefit from the encyclopedia’s existing knowledge base and use it as a repository for incremental learning. Meanwhile, the encyclopedia could benefit from the continual refinement that occurs through queries to the AI model and through the AI model’s own internal auditing.

Author Note: I’m a physician and lawyer working on new therapies for brain tumors. I’ve spent a lot of time testing frontier AI models because I need tools I can actually trust when lives are on the line.

Would anyone be interested in discussing or developing this idea further? What problems or improvements do you see?