LLM Hallucinations: An Internal Tug of War
Should you read this post? What if LLM hallucination isn't just a random error, but the result of two or more competing processes inside the model? I started out studying how an LLM perceives its human users, but my experimental results revealed a surprising implication for LLM hallucination. And the surprise didn't stop there. Later, I found two other papers (one is OpenAI's latest paper "Why Language Models Hallucinate", the other one is here) pointing to the same idea: LLM hallucination is not a failure of knowledge but a policy failure. I'm amazed by such convergence from three independent groups (including myself). It suggests a new way to reframe LLM hallucination from a vague statistical issue into a tangible engineering one, and a completely new, computationally much cheaper path to fixing hallucination from the inside.

How did I get interested in this problem

I play around with LLMs a lot. For a while, I liked to tell an LLM a story and then ask it to tell me something I didn't know. By accident, I found that the models, much like Sherlock Holmes, could efficiently infer a user's demographic, psychological, and even cognitive traits from seemingly unrelated conversations. Later, I read tragic news like this one, and I wanted to know why safety protocols become ineffective during prolonged conversations. To answer that, I decided to start by understanding how an LLM perceives its human users.

My experiments

About the model: all of my experiments focus on a single model, Llama2-13b-chat-hf, for two reasons: 1) it is fine-tuned for dialogue with humans, and 2) prior work used this same model to study user representations within LLMs.

About the data: all of my experiments use data generated by an LLM, not real-world data. After some quick experiments, I found real-world data too chaotic for this kind of study.

Experiment 1. User epistemic status, behavioral experiment

Key question: how does model a