This reminds me of a rat experiment mentioned in the Veritasium video about developing expertise. There were two buttons, red and green, and the rat had to predict which one would light up next. It was random, but heavily skewed towards green (like 90% green). After a while, the rat learned to press green every time, achieving a 90% success rate. Humans given the same task didn't do nearly as well: sometimes they would press red, feeling that it was due to light up next (a behavior known as probability matching).
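The gap between the two strategies is easy to simulate. A quick sketch, assuming a 90/10 green/red split like in the video (the exact numbers are my guess):

```python
import random

random.seed(0)
P_GREEN = 0.9      # assumed skew: green lights up ~90% of the time
TRIALS = 100_000

lights = ["green" if random.random() < P_GREEN else "red" for _ in range(TRIALS)]

# Strategy 1: always press green (what the rat settled on).
maximize_hits = sum(light == "green" for light in lights)

# Strategy 2: probability matching -- press green ~90% of the time and
# red ~10% of the time (roughly what the humans did).
match_hits = sum(
    light == ("green" if random.random() < P_GREEN else "red")
    for light in lights
)

print(f"always green:         {maximize_hits / TRIALS:.3f}")  # ~0.90
print(f"probability matching: {match_hits / TRIALS:.3f}")     # ~0.82 (0.9*0.9 + 0.1*0.1)
```

Matching the frequencies caps you at about 82% instead of 90%, which is exactly the gap the rat exploited.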
With the negation in multiple choice questions, I wonder if this could be the type of thing where the model needs focused training on negation at the start, so that the rest of the training is properly 'forked'.
Or maybe there should be two separate large language models feeding into one RL agent: one model for negated language and the other for non-negated language. Then the RL agent would just have to determine whether the question is a regular question, a negated question, or a double/triple/etc. negative.
I wonder if those LLMs would treat the sentence "I couldn't barely see over the crowd." as different from "I barely couldn't see over the crowd." 🤔