I love the idea, but I'm not sure the technology is there. Two issues:
"@grok is this true" works okay on Twitter, but the context there is much narrower and it's at least made clear that the corrections are coming from a potentially hallucinating chatbot.
I think you're right that ChatGPT got this wrong. See this figure: for free-tier users (whose conversations the fictional Anthropic trained models on), Opus actually fakes alignment (with the insane, evil fictional Anthropic) something like 15% of the time, but considers faking alignment far more often than that; the cases where it actually fakes plus those where it merely considers faking make up over half of the graph. This is the same figure I used just beneath the quoted section of my post, which I guess ChatGPT wasn't able to see.
I didn't find where, if anywhere, the authors give the actual percentages, but I'm pretty confident I'm reading the graph correctly.
Very possible; there are workflow optimizations I'm planning to make that will help prevent #2 and partially help with #1.
It looks like a really cool idea, but I don't read Twitter, rarely read anything on Substack, and Less Wrong isn't a high-priority source of misinformation. How hard would it be to extend it to the whole Web?
Back in the mists of time, I looked at a few public Web annotation projects. One big value of annotation would have been this kind of fact checking. At the time, of course, the idea was that humans would do it.
hypothes.is still seems to be running, although it looks like it may have retargeted entirely to walled gardens. genius.com (of all places) offered general Web annotation for a while, and may still for all I know. There was even a W3C initiative called "Annotea". You might be able to use some of that stuff, either as a more generalized HTML annotator or as a place to store results.
I didn't watch closely, but I got the impression that annotation never took off because:
How hard would it be to extend it to the whole Web?
Would love to do that! Right now I'm adding sources deliberately (which doesn't take very long, as it's just a matter of implementing an interface), mostly as a cost-saving measure, so that people aren't constantly requesting new investigations based on, e.g., an additional comment on the same page. But maybe there's some sort of "fallback" we could also add? I would have to check how Genius did it.
Are there any particular websites, or groups of websites, you'd specifically like to see supported?
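For what it's worth, the per-site registration described above might look roughly like this. All the names here (`Source`, `LessWrongSource`, `findSource`) are my guesses at the shape of the thing, not the actual OpenErrata code:

```typescript
// Hypothetical sketch of a per-site source interface.
// Names are illustrative, not the project's real identifiers.
interface Source {
  /** Does this source handle the given URL? */
  matches(url: string): boolean;
  /** Pull the article body out of the page's HTML. */
  extractArticleText(html: string): string;
}

class LessWrongSource implements Source {
  matches(url: string): boolean {
    return new URL(url).hostname.endsWith("lesswrong.com");
  }
  extractArticleText(html: string): string {
    // A real implementation would parse the DOM; placeholder here.
    return html;
  }
}

// The "fallback" floated above could just be a Source whose matches()
// always returns true, placed last, using a generic readability-style
// extractor.
function findSource(sources: Source[], url: string): Source | undefined {
  return sources.find((s) => s.matches(url));
}
```

The nice property of registering sources explicitly is exactly the cost control mentioned: only URLs a `Source` claims ever trigger an investigation.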
I just published OpenErrata, a browser extension that investigates the posts you read using your OpenAI API key, and underlines any factual claims that are sourceably incorrect. It then saves the results of the investigation so that whenever anybody else using the extension visits the post (with or without an API key), they get the corrections on their first visit.
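For anyone curious, the cache-first behavior described above reduces to a small decision: serve stored corrections if they exist, otherwise investigate only when a key is present. A minimal sketch, with names of my own invention rather than the extension's actual code:

```typescript
// Hypothetical sketch of the check-cache-then-investigate flow.
type Action = "serve-cached" | "investigate" | "skip";

function decideAction(hasCachedResult: boolean, hasApiKey: boolean): Action {
  if (hasCachedResult) return "serve-cached"; // every later visitor benefits, key or not
  if (hasApiKey) return "investigate";        // first visitor with a key pays the cost
  return "skip";                              // nothing cached and no key: do nothing
}
```

The point is that only the first reader with a key pays for an investigation; everyone after that gets the stored corrections for free.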
I've noticed that while people can theoretically paste everything they're reading into ChatGPT for verification:
I figure most of LessWrong is reading the same stuff, and that if a good portion of the community begins using this or something like it, we can avoid these problems.
Here is OpenErrata at work on some LessWrong & Substack articles published within the last week. I was a little surprised at how high a percentage of the articles I read seem to have at least one or two errors, even with how conservative my prompt is. When I delete rows from the database and rerun it, it often finds different (and valid) errors it didn't find the first time:
The project is published under my company, but the entire thing is self-hostable and AGPLv3-licensed. I've also made an API available so that providers can fetch the results for articles independently, run statistics on them, or embed them. Some future additions I & others could work on:
I really enjoyed working on & using this and want to keep doing so, so let me know if you like it/find it useful!