Why do language models hallucinate? https://openai.com/index/why-language-models-hallucinate/
There is a new paper from OpenAI. It is mostly basic stuff. I have a question though, and thought this would be a good place to ask. Have any model training runs experimented with outputting subjective probabilities for their answers? And have labs started applying reasoning traces to the pretraining tasks as well?
One could make the models "wager", and then reward them in line with the wager. The way this is typically done is based on the Kelly criterion and uses logarithmic scoring. The logarithmic scoring rule awards you points based on the logarithm of the probability you assigned to the correct answer.
For example, for two possibilities A and B, if you assign probability $p$ to A and answer A turns out to be correct, your score is $\log_2(p)$. If answer B turns out to be correct, your score is $\log_2(1-p)$. Usually a constant is added to make the scores positive. For example, the score could be $1 + \log_2(p)$. The key feature of this rule is that you maximize your expected score by reporting your true belief. If you truly believe there is an 80% chance that A is correct, your expected score is maximized by setting $p = 0.8$.
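To make that concrete, here is a minimal sketch of the shifted log scoring rule above (the function names `log_score` and `expected_score` are just illustrative, not from any library), showing numerically that the expected score peaks when the reported probability equals the true belief:

```python
import numpy as np

def log_score(p_reported: float, a_is_correct: bool) -> float:
    """Logarithmic score for a two-outcome question, shifted by +1
    so that a confident correct answer earns a positive score."""
    p = p_reported if a_is_correct else 1.0 - p_reported
    return 1.0 + np.log2(p)

def expected_score(p_reported: float, p_true: float) -> float:
    """Expected score when the true probability of answer A is p_true."""
    return (p_true * log_score(p_reported, True)
            + (1.0 - p_true) * log_score(p_reported, False))

# If the true belief is 80%, the expectation is maximized by reporting 0.8.
candidates = np.linspace(0.01, 0.99, 99)
best = max(candidates, key=lambda p: expected_score(p, p_true=0.8))
print(f"report that maximizes expected score: {best:.2f}")  # -> 0.80
```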
I recall seeing, somewhere on the internet a while back, a decision theory course where for the exam itself students were required to output their confidence in each answer and were awarded points accordingly.
What about doing the following: get rid of the distinction between pre-training and post-training. Make predicting text an RL task and allow reasoning. At the end of the reasoning chain, the model outputs a subjective probability and is rewarded in accordance with the logarithmic scoring rule.
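As a toy sketch of what that reward could look like (purely illustrative, and assuming the reasoning chain ends with a committed prediction plus a stated probability; the function `wager_reward` is hypothetical, not anyone's actual training setup):

```python
import math

def wager_reward(p_reported: float, prediction_correct: bool,
                 eps: float = 1e-6) -> float:
    """Toy reward for an RL-style prediction step: the model reasons,
    commits to a prediction, and states a subjective probability p.
    The reward is the shifted log score of that probability, so the
    model is incentivized to report its true confidence rather than
    always claiming certainty."""
    p = p_reported if prediction_correct else 1.0 - p_reported
    p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
    return 1.0 + math.log2(p)

# A confident correct prediction earns close to 1, an overconfident wrong
# one is heavily penalized, and an honest 50/50 hedge earns 0 either way.
print(wager_reward(0.95, True))   # ~0.93
print(wager_reward(0.95, False))  # ~-3.3
print(wager_reward(0.5, False))   # 0.0
```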