Thanks!
I tried adding a "Clarity Coach" earlier. I think this is the sort of area where RoastMyPost probably wouldn't have a massive advantage over custom LLM prompts used directly, but we might be able to make some things easier.
It would be very doable to add custom evaluators to give tips along these lines. Doing a great job would likely involve a fair bit of prompting, evaluation, and iteration. I might take a stab at it, and if so will get back to you on this.
(One plus is that I'd assume this would be parallelizable, so it could at least be fast.)
Thanks for reporting your findings!
As I stated here, the Fact Checker has a bunch of false positives, and you've noted some.
The Fact Checker (and the other checkers) has trouble telling which claims are genuine and which are part of fictional scenarios, à la AI-2027.
The Fallacy Checker is overzealous, and it doesn't use web search (which would add costs), so it will make mistakes in particular about anything after the models' training cutoff.
There's clearly more work to do to make better evals. Right now I recommend using this as a way to flag potential errors, and feel free to add any specific evaluator AIs that you think would be a fit for certain documents.
I experimented with Opus 4.5 a bit for the Fallacy Check. Results did seem a bit better, but costs were much higher.
I think the main way I could picture usefully spending more money is to add some agentic setup that does a deep review of a given paper and presents a summary. I could see the marginal costs of this being maybe $10 to $50 per 5k words or so, using a top model like Opus. That said, the fixed costs of doing a decent job seem frustratingly high, especially because we're still lacking easy API access to existing agents (my preferred method would be a high-level Claude Code API, but that doesn't really exist yet).
I've been thinking of having competitions here, where people make their own reviews, and then we could compare them against a few researchers and LLMs. I think this area could allow for a lot of cleverness and innovation.
I also added a table back to this post that gives a better summary.
There's more going on, though that doesn't mean it will necessarily be better for you than your system.
There are some "custom" evaluators that are basically just what you're describing, but with specific prompts. Even in these cases, though, note that there's extra functionality for users to re-run evaluators, see the histories of the runs, and see a specific agent's evals across many different documents.
The "system" evaluators typically have more specific code. They have short readmes you can see more on their pages:
https://www.roastmypost.org/evaluators/system-fact-checker
https://www.roastmypost.org/evaluators/system-fallacy-check
https://www.roastmypost.org/evaluators/system-forecast-checker
https://www.roastmypost.org/evaluators/system-link-verifier
https://www.roastmypost.org/evaluators/system-math-checker
https://www.roastmypost.org/evaluators/system-spelling-grammar
Some of these just split a post into chunks and then run analysis on each chunk; others work quite differently. The link verifier, for example, works without any AI.
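To make the "no AI" point concrete, here's a minimal sketch of what that kind of link check can look like. This is an illustrative assumption on my part (the real system-link-verifier may handle redirects, retries, archives, etc. differently), not its actual code:

```python
import re
import urllib.request
from urllib.error import HTTPError, URLError

URL_PATTERN = re.compile(r"https?://[^\s)\]]+")


def verify_links(text: str, timeout: float = 10.0) -> dict[str, str]:
    """Return a status string ('ok', 'HTTP 404', 'unreachable', ...) per URL found."""
    results: dict[str, str] = {}
    for url in set(URL_PATTERN.findall(text)):
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-checker"}
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[url] = f"ok ({response.status})"
        except HTTPError as error:
            results[url] = f"HTTP {error.code}"
        except (URLError, TimeoutError) as error:
            results[url] = f"unreachable ({error})"
    return results
```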
One limitation of these systems is that they're not very customizable. So if you're making something fairly specific to your system, this might be tricky.
My quick recommendation is to run all the system evaluators on at least a few docs so you can try them out (or just look at their outputs on other docs).
Agreed!
The workflow we have does include a step for this. Specifically, it:
1. Chunks the document.
2. Runs analysis on each chunk, producing a long combined list of comments.
3. Feeds all the comments into a final step. This step sees the full post, removes a bunch of the comments, and writes a summary.
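For concreteness, here's a minimal sketch of that shape of workflow. The chunking rule, the prompts, and the `call_model` stub are all illustrative assumptions, not the actual RoastMyPost implementation:

```python
from dataclasses import dataclass


@dataclass
class Comment:
    chunk_index: int
    text: str


def call_model(prompt: str) -> str:
    """Placeholder for an LLM call; swap in your provider's client here."""
    raise NotImplementedError


def chunk_document(doc: str, paragraphs_per_chunk: int = 5) -> list[str]:
    """Step 1: split the post into chunks of a few paragraphs each."""
    paragraphs = [p for p in doc.split("\n\n") if p.strip()]
    return [
        "\n\n".join(paragraphs[i:i + paragraphs_per_chunk])
        for i in range(0, len(paragraphs), paragraphs_per_chunk)
    ]


def analyze_chunk(chunk: str, index: int) -> list[Comment]:
    """Step 2: ask the model for possible issues in one chunk."""
    raw = call_model(f"List possible issues in this passage:\n\n{chunk}")
    return [Comment(index, line) for line in raw.splitlines() if line.strip()]


def review(doc: str) -> str:
    comments = [
        comment
        for i, chunk in enumerate(chunk_document(doc))
        for comment in analyze_chunk(chunk, i)
    ]
    # Step 3: a final pass sees the full post plus every candidate comment,
    # drops the weak ones, and writes a summary.
    return call_model(
        "Full post:\n" + doc
        + "\n\nCandidate comments:\n"
        + "\n".join(c.text for c in comments)
        + "\n\nKeep only the important comments and write a short summary."
    )
```

Since the per-chunk analyses are independent of each other, step 2 is also the natural place to parallelize.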
I think it could use a lot more work and tuning. Generally, I've found these workflows fairly tricky and time-intensive to work on so far. I assume they will get easier in the next year or so.
Thanks for trying it out and reporting your findings!
It's tricky to tune the system to flag important errors without flagging too many. Right now I've been focusing on the former, on the assumption that it's better to show too many errors than too few.
The Fact Check definitely does have mistakes (often due to the chunking, as you flagged).
The Fallacy Check is very overzealous. I scaled it back, but will continue to adjust it. I think this style of fallacy checking is quite tricky to get right overall, and I've been thinking about some much more serious approaches. If people here have ideas or implementations, I'd be very curious!
The humans trusted to make decisions.
I’m hesitant to say “best humans”, because who knows how many smart people are out there who might luck out or something.
But “the people making decisions on this, including in key EA orgs/spending” is a much more understandable bar.
(Quick Thought)
Perhaps the goal of existing work targeting AI safety is less to ensure that AI safety happens, and more to make sure we build AI systems that are strictly[1] better than the current researchers at figuring out what to do about AI safety.
I'm unsure how hard AI safety is. But I consider it fairly likely that, mid-term (maybe 50% of the way to TAI, in years), safe AI systems will outperform humans on AI safety strategy and the large majority of the research work.
If humans can successfully bootstrap infrastructure more capable than ourselves, then our main work as humans is done (though there could still be other work we can help with).
It might well be the case that the resulting AI systems would recognize that the situation is fairly hopeless. But at that point, humans will have done the key things they need to do on this, hopeless or not. Our job is to set things up as best we can; more than that is by definition impossible.
Personally, I feel very doomy about humans now solving alignment problems that are many years away. But I feel much better about us making systems that will do a better job of guiding things than we could.
(The empirical question here is how difficult it is to automate alignment research. I realize this is a controversial and much-discussed topic. My guess is that many researchers will never accept that AI systems are good enough, and will always hold out on considering them superior; on the flip side, many people will trust AIs before they really should. Getting this right is definitely tricky.)
[1] "Strictly" here meaning that they're very likely better overall, not that there's absolutely no area in which humans will be better than them.
Thanks for letting me know!
I'm not very attached to the name. Some people seem to like it a lot, some dislike it.
I don't feel very confident about how it will mostly be used 4-12 months from now. So my plan at this point is to wait and see how it gets used, then consider renaming later as the situation gets clearer.