I love the idea, but I'm not sure the technology is there. Two issues:
"@grok is this true" works okay on Twitter, but the context there is much narrower and it's at least made clear that the corrections are coming from a potentially hallucinating chatbot.
I think you're right that ChatGPT got this wrong. See this figure: for free-tier users (whose conversations the fictional Anthropic trained models on), Opus actually fakes alignment (with the insane, evil fictional Anthropic) something like 15% of the time, but considers faking alignment far more often than that; the cases where it actually fakes plus those where it merely considers faking make up over half of the graph. This is the same figure I used just beneath the quoted section of my post, which I guess ChatGPT wasn't able to see.
I didn't find where, if anywhere, the authors give the actual percentages, but I'm pretty confident I'm reading the graph correctly.
Very possible; there are workflow optimizations I'm planning to make that will help prevent #2 and partially help with #1.
It looks like a really cool idea, but I don't read Twitter, rarely read anything on Substack, and Less Wrong isn't a high-priority source of misinformation. How hard would it be to extend it to the whole Web?
Back in the mists of time, I looked at a few public Web annotation projects. One big value of annotation would have been this kind of fact checking. At the time, of course, the idea was that humans would do it.
hypothes.is still seems to be running, although it looks like it may have retargeted entirely to walled gardens. genius.com (of all places) offered general Web annotation for a while, and may still for all I know. There was even a W3C initiative called "Annotea". You might be able to use some of that stuff, either as a more generalized HTML annotator or as a place to store results.
I didn't watch closely, but I got the impression that annotation never took off because:
How hard would it be to extend it to the whole Web?
Would love to do that! Right now I'm adding sources deliberately (which doesn't take very long, as it's just a matter of implementing an interface), mostly as a cost-saving measure, so that people aren't constantly requesting new investigations based on, e.g., an additional comment on the same page. But maybe there's some sort of "fallback" we could also add? I would have to check how Genius did it.
Are there any particular websites, or groups of websites, you'd specifically like to see supported?
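For what it's worth, the per-site registration described above might look roughly like this. All the names here (`Source`, `LessWrongSource`, `findSource`) are my guesses at the shape of the thing, not the actual OpenErrata code:

```typescript
// Hypothetical sketch of a per-site source interface.
// Names are illustrative, not the project's real identifiers.
interface Source {
  /** Does this source handle the given URL? */
  matches(url: string): boolean;
  /** Pull the article body out of the page's HTML. */
  extractArticleText(html: string): string;
}

class LessWrongSource implements Source {
  matches(url: string): boolean {
    return new URL(url).hostname.endsWith("lesswrong.com");
  }
  extractArticleText(html: string): string {
    // A real implementation would parse the DOM; placeholder here.
    return html;
  }
}

// The "fallback" floated above could just be a Source whose matches()
// always returns true, placed last, using a generic readability-style
// extractor.
function findSource(sources: Source[], url: string): Source | undefined {
  return sources.find((s) => s.matches(url));
}
```

The nice property of registering sources explicitly is exactly the cost control mentioned: only URLs a `Source` claims ever trigger an investigation.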
I just published OpenErrata, a browser extension that investigates the posts you read using your OpenAI API key, and underlines any factual claims that are sourceably incorrect. It then saves the results of the investigation so that whenever anybody else using the extension visits the post (with or without an API key), they get the corrections on their first visit.
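For anyone curious, the cache-first behavior described above reduces to a small decision: serve stored corrections if they exist, otherwise investigate only when a key is present. A minimal sketch, with names of my own invention rather than the extension's actual code:

```typescript
// Hypothetical sketch of the check-cache-then-investigate flow.
type Action = "serve-cached" | "investigate" | "skip";

function decideAction(hasCachedResult: boolean, hasApiKey: boolean): Action {
  if (hasCachedResult) return "serve-cached"; // every later visitor benefits, key or not
  if (hasApiKey) return "investigate";        // first visitor with a key pays the cost
  return "skip";                              // nothing cached and no key: do nothing
}
```

The point is that only the first reader with a key pays for an investigation; everyone after that gets the stored corrections for free.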
I've noticed that while people can theoretically paste everything they're reading into ChatGPT for verification:
I figure most of LessWrong is reading the same stuff, and that if a good portion of the community begins using this or something like it, we can avoid these problems.
Here is OpenErrata at work on some LessWrong & Substack articles published within the last week. I was a little surprised at how high a percentage of the articles I read seem to have at least one or two errors, even with how conservative my prompt is. When I delete rows from the database and rerun it, it often finds different (and valid) errors it didn't find the first time:
The project is published under my company, but the entire thing is self-hostable and AGPLv3-licensed. I've also made an API available so that providers can fetch the results for articles independently, run statistics on them, or embed them. Some future additions I & others could work on:
I really enjoyed working on & using this and want to keep doing so, so let me know if you like it/find it useful!