OpenAI: GPT-based LLMs show an ability to discriminate between their own wrong answers, but an inability to explain how/why they make that discrimination, even as models scale
This seems concerning. I'm not an expert, so I can't tell how concerning it is. Wanted to start a discussion! Full text: https://openai.com/blog/critiques/ Edit: The full publication linked in the blog provides additional details on how they found this in testing. See Appendix C. I'm glad OpenAI is at least aware...
Jun 13, 2022