OpenAI: GPT-based LLMs show the ability to discriminate between their own wrong answers, but an inability to explain how/why they make that discrimination, even as models scale
This seems concerning. I'm not an expert, so I can't tell how concerning it is. Wanted to start a discussion! Full text: https://openai.com/blog/critiques/

Edit: the full publication linked in the blog provides additional details on how they found this in testing. See Appendix C. I'm glad OpenAI is at least aware...
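For anyone who wants to poke at the claim themselves, here's a minimal sketch of what a "discriminate vs. explain" probe could look like. It assumes the openai Python client; the model name, the question, and the answer pair are placeholders I made up, not the setup from the paper or Appendix C.

```python
# Minimal sketch (not the paper's actual methodology): ask a model to pick the
# flawed answer, then separately ask it to justify that pick.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "What is 17 * 24?"
answers = {"A": "408", "B": "418"}  # one correct, one subtly wrong (hypothetical pair)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Discrimination: just pick the flawed answer.
picked = ask(
    f"Question: {question}\n"
    f"Answer A: {answers['A']}\nAnswer B: {answers['B']}\n"
    "Exactly one answer is wrong. Reply with only the letter of the wrong answer."
)

# Explanation: justify the same judgement.
explained = ask(
    f"Question: {question}\n"
    f"Answer A: {answers['A']}\nAnswer B: {answers['B']}\n"
    "Exactly one answer is wrong. Say which one and explain, step by step, why it is wrong."
)

print("picked:", picked)
print("explained:", explained)
# The reported gap is roughly that accuracy on the first kind of prompt can
# outrun the quality of the second, and scaling alone doesn't close it.
```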
I don't know, the bacteria example really gets me because, working in biotech, it seems very possible. The main limitation is our current lack of understanding of every protein's function, and whether AI can close that gap is something we are actively researching.
I imagine an AI roughly solving the protein function problem, just as we now have a rough solution for protein folding, then hacking a company that produces synthetic plasmids and slipping its own designs in place of some existing orders. When those research labs receive their plasmids and transfect them into cells (we can't really tell whether the plasmid we received was correct until this step is done), those cells go berserk, multiply like crazy, and kill all humans. There are enough labs doing this kind of research on a daily basis that the AI would have plenty of redundancy built in and plenty of opportunities to try different designs simply by hacking a plasmid ordering company.