Note: This post presents my research project from the MATS Winter 2026 application process. It was submitted to Neel Nanda's mechanistic interpretability stream; while I was not accepted due to the high level of competition, he described the work as 'particularly strong' and encouraged me to publish it. This content is also published on my personal website here. I'm sharing it on LessWrong/Alignment Forum to reach a broader audience and seek feedback from the community.
TL;DR
LLM hallucination isn’t just a random error, but the result of two or more competing processes within the model: an internal tug of war. My work suggests that even when a model knows it should be uncertain, another drive...