It's hard to imagine doing web research without using LLMs. Chatbots may be the first thing you turn to for questions like: What are the companies currently working on nuclear fusion and who invested in them? What is the performance gap between open and closed-weight models on the MMLU benchmark?...
[Conflict of interest disclaimer: We are FutureSearch, a company working on AI-powered forecasting and other types of quantitative reasoning. If thin LLM wrappers could achieve superhuman forecasting performance, this would obsolete a lot of our work.]
Widespread, misleading claims about AI forecasting
Recently we have seen a number of papers...
Disclaimer 1: Our calculations are rough in places; information is sparse, guesstimates abound. Disclaimer 2: This post draws from public info on FutureSearch as well as a paywalled report. If you want the paywalled numbers, email dan@futuresearch.ai with your LW account name and we’ll send you the report for free....
Note: This is a linkpost for the Metaculus Journal. I work for Metaculus and this is part of a broader exploration of how we can compare and evaluate performance of different forecasters.
Short summary
* It is difficult to interpret the performance of a forecaster in the absence of a...
Note: This is a linkpost for the Metaculus Journal. I work for Metaculus and this is part of a broader exploration of how we can compare forecasting performance.
Introduction
"Which one of two forecasters is the better one?" is a question of great importance. Countless internet points and hard-earned bragging...
(crossposted from the EA Forum)
TLDR
* I analysed a set of 64 (non-randomly selected) binary forecasting questions that exist both on Metaculus and on Manifold Markets.
* The mean Brier score was 0.084 for Metaculus and 0.107 for Manifold. This difference was significant using a paired test. Metaculus was...
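For readers unfamiliar with the metric, here is a minimal sketch of how a mean Brier score and a paired comparison on the same questions can be computed. The data below is made up for illustration and is not the 64-question dataset, and the paired t-statistic is only an assumption: the excerpt says "a paired test" without naming which one. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def brier_scores(probs, outcomes):
    """Per-question Brier score for binary questions: (forecast - outcome)^2."""
    return [(p - o) ** 2 for p, o in zip(probs, outcomes)]


def paired_t_statistic(scores_a, scores_b):
    """t-statistic of the per-question score differences (a minus b).

    Assumption: a paired t-test is one possible paired test; the source
    does not say which test was actually used.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up illustrative forecasts on the same four resolved questions.
outcomes = [1, 0, 1, 0]
platform_a = [0.9, 0.2, 0.7, 0.1]
platform_b = [0.8, 0.3, 0.6, 0.3]

scores_a = brier_scores(platform_a, outcomes)
scores_b = brier_scores(platform_b, outcomes)

print("mean Brier A:", mean(scores_a))
print("mean Brier B:", mean(scores_b))
print("paired t:", paired_t_statistic(scores_a, scores_b))
```

Lower Brier scores are better, so a negative t-statistic here means the first platform scored better on these questions. Pairing matters because both platforms forecast the same questions, so question difficulty cancels out in the per-question differences.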
TLDR
We are creating a database to collect base rates for various categories of events. You can find the database here and can suggest new base rate categories for us to look into here.
Project Summary
The base rate database project collects base rates for different categories of events and...