Pre-Ingestion: An Overlooked Source of RAG Hallucinations
TL;DR: We keep optimizing retrieval, but can the documents we feed to LLMs be chunked without losing crucial qualifying context? When users blindly dump files into a RAG pipeline, hallucinations can begin before retrieval even runs. Concrete example: in this biology paper, the parameter β for prokaryotes appears as 0.33, 0.73,...
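To make the failure mode concrete, here is a minimal sketch that assumes naive fixed-size word chunking. This is a common default, but it is not necessarily what any particular RAG tool does. The document text, the values 1.0 and 2.0, and the helpers `chunk_words`, `chunk_sentences` and `orphaned_values` are all hypothetical and are not taken from the biology paper. The sketch shows how a parameter value can land in a chunk that no longer contains the condition qualifying it.

```python
import re

# Hypothetical two-sentence document: each value of "beta" only makes sense
# together with the condition that qualifies it. The values are made up.
DOC = (
    "Under condition A (hypothetical), the fitted exponent beta equals 1.0. "
    "Under condition B (hypothetical), the fitted exponent beta equals 2.0."
)


def chunk_words(text: str, size: int) -> list[str]:
    """Naive fixed-size chunking: group every `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace: one sentence per chunk."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def orphaned_values(chunks: list[str], value_marker: str, qualifiers: list[str]) -> list[str]:
    """Return chunks that mention the value but contain none of the qualifier markers."""
    return [c for c in chunks if value_marker in c and not any(q in c for q in qualifiers)]


if __name__ == "__main__":
    for name, chunks in [("fixed-size", chunk_words(DOC, 7)),
                         ("sentence", chunk_sentences(DOC))]:
        print(name, chunks)
        print("  orphaned:", orphaned_values(chunks, "beta", ["condition"]))
```

With 7-word chunks, the last chunk reads "the fitted exponent beta equals 2.0." and does not say which condition applies. The middle chunk is arguably worse than an orphan. It places the condition A value right next to "condition B", and a simple substring check like `orphaned_values` does not catch that kind of mismatch. Sentence-level splitting keeps each value with its qualifier here, but it is only a heuristic. Qualifiers that live in table headers, captions or earlier paragraphs need more structure-aware handling.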
Jan 10