Negation Neglect: When models fail to learn negations in training
by harrymayne, Lev McKinney, and Owain_Evans
This is a short summary of our new paper: arXiv, X thread, code. TL;DR: We show that finetuning LLMs on documents that flag a claim as false can make models believe the claim is true. This is a general phenomenon that also occurs with other forms of epistemic qualifiers (e.g.,...
May 18