I think the future of AI is really important, and it would be valuable to know which experts have been right and wrong about progress and effects. It would be worth maintaining a website on important people's track records (superforecasters, famous domain experts, frontier lab people, AI 2027, Situational Awareness, etc.).
Currently, I think there's an incentive problem where it pays to make vague predictions. This disincentivizes people who are actually sticking their necks out and makes it much harder to cut through the noise.
Solution: track imprecise predictions too, either by pinning them down into precise, resolvable claims, or by skipping Brier scores and instead recording a rough judgment of whether they got it right (flagged for uncertainty).
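A minimal sketch of what a record under this scheme could look like. All names and fields here are hypothetical illustrations, not an actual schema: precise claims get a probability and a Brier score, while imprecise ones fall back to a rough rating.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class VibeRating(Enum):
    """Rough judgment for claims too vague to score precisely."""
    CLEARLY_RIGHT = "clearly_right"
    LEANING_RIGHT = "leaning_right"
    UNCLEAR = "unclear"
    LEANING_WRONG = "leaning_wrong"
    CLEARLY_WRONG = "clearly_wrong"

@dataclass
class Prediction:
    author: str
    claim: str
    source_url: str
    probability: Optional[float] = None  # set only if the claim was pinned down precisely
    outcome: Optional[bool] = None       # resolution for precise claims
    vibe: Optional[VibeRating] = None    # fallback judgment for imprecise claims

    def brier_score(self) -> Optional[float]:
        """Brier score (0 = perfect, 1 = worst); only defined for precise, resolved claims."""
        if self.probability is None or self.outcome is None:
            return None
        return (self.probability - float(self.outcome)) ** 2
```

The point of the two-track design is that a vague prediction still lands on the record (with its uncertainty flagged) instead of being silently dropped for not fitting a scoring rule.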
What I currently have in mind: a site that aggregates from existing platforms like Metaculus, Good Judgment, and Manifold, while also scraping the web for predictions made outside them—interviews, posts, podcasts. When an expert makes a prediction anywhere, someone can submit it to be moderated and added to their record. The goal is a single place where you can look up anyone who people might actually defer to and see their full history, whether or not they ever opted into a forecasting platform. You’d probably want to prioritize the most important predictions from the most important people.
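The submit-then-moderate flow above could be sketched as follows. This is a toy illustration under my own assumptions about how moderation might work, not a real design; every class and method name is made up.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Submission:
    author: str       # the expert who made the prediction
    quote: str        # the prediction, quoted verbatim
    source_url: str   # interview, post, podcast transcript, etc.
    status: Status = Status.PENDING

class ModerationQueue:
    """Anyone can submit; only approved submissions reach an expert's public record."""

    def __init__(self) -> None:
        self.pending: list[Submission] = []
        self.records: dict[str, list[Submission]] = {}  # author -> approved predictions

    def submit(self, sub: Submission) -> None:
        self.pending.append(sub)

    def approve(self, sub: Submission) -> None:
        sub.status = Status.APPROVED
        self.pending.remove(sub)
        self.records.setdefault(sub.author, []).append(sub)

    def lookup(self, author: str) -> list[Submission]:
        """The core UI affordance: one person's full approved history."""
        return self.records.get(author, [])
```

The moderation step is doing real work here: it is what lets the site cover people who never opted in, while keeping bad-faith or out-of-context submissions off their records.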
This is different from existing platforms like Good Judgment in two ways. First, it tracks people who never opted in—the forecasters, lab researchers, and public intellectuals who make predictions in interviews, posts, and podcasts but don't put them on a platform. These are often the most influential voices, and right now they face basically no accountability. Second, the UI should make it easy to look up a specific person and see their full prediction history at a glance, which existing tools, imo, make surprisingly difficult.
One note on how to use it: I think the standard on the site should be inside views only, which might complicate things. As Thomas Larsen rightly points out, leaning too hard on aggregated reputations risks deference cascades—people updating on each other's records rather than on the object level. After all, I do think part of LessWrong's alpha comes from forming inside views rather than deferring, and we should fear losing that.
When I last tried building this with Claude Code, it was too difficult to do in one sitting. If someone wants to work on this with me over a weekend, reach out—I'd be curious to see how far we can get.
I think this matters a lot for [AI for epistemics], especially if you think those epistemics will be shaken up soon by TAI.
One concern worth flagging: this could also create perverse incentives. Someone could build a track record, use it to shift opinion, then make a bad prediction at the worst moment. I think this risk is real but manageable—the tracker is most dangerous if people concentrate their trust in a very small number of highly-rated forecasters, which is itself a bad epistemic practice regardless.