Previously "Lanrian" on here. Research analyst at Redwood Research. Views are my own.
Feel free to DM me, email me at [my last name].[my first name]@gmail.com or send something anonymously to https://www.admonymous.co/lukas-finnveden
That sounds wild to me, given that the superforecasters believed much less in fast AI progress (and in doom) than OpenPhil staff and the "subject matter experts" who the superforecasters could talk with.
Like, in 2020, bio anchors publicly predicted $1B training runs in 2025. In 2022, the superforecasters predicted that the largest training run would cost $35M in 2024, $100M in 2030, and $300M in 2050.
(And for the IMO gold number in particular, if I had to guess what OP's view was, I would base that on Paul's 8%. Which is 3/4 of the way from 1% to your own 16%, in log-odds.)
If the superforecasters were biasing their views towards OP, then they should have been way more bullish. If OP's process was selecting for forecasters who agreed more with their own views, they would've selected forecasters who were more bullish.
I think the simpler hypothesis is that the wider world, including superforecasters among them, massively underestimated 2020s AI progress.
(This is consistent with the fact that OP advisors got outsized investment returns by betting on faster AI progress than the markets expected. It's also consistent with Jacob Steinhardt's own attempt at commissioning forecasts, which also produced huge underestimates. I think this wasn't funded by OP, though Jacob was an OP technical advisor at the time.)
Hm.
R² = 1 − (mean squared error / variance)
Mean squared error seems pretty principled. Normalizing by variance to make it more comparable to other distributions seems pretty principled.
I guess after that it seems more natural to take the square root (to get RMSE normalized by standard deviation) than to subtract it from 1. But I guess the latter is a simple enough transformation, and it makes the number comparable to the (more well-motivated) R² for linear models, which is probably why it's more commonly reported than RMSE/STD.
Anyway, Spearman r is -0.903 (square 0.82) and -0.710 (square 0.50), so basically the same.
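To make the comparison concrete, here's roughly the computation I have in mind (a quick sketch with made-up numbers, not METR's data or code):

```python
import numpy as np
from scipy.stats import spearmanr

def r_squared(y_true, y_pred):
    """R^2 = 1 - MSE / Var(y_true)."""
    mse = np.mean((y_true - y_pred) ** 2)
    return 1 - mse / np.var(y_true)

def rmse_over_std(y_true, y_pred):
    """The alternative summary: RMSE normalized by standard deviation."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)

# Toy data standing in for (observed, predicted) success rates.
y_true = np.array([0.95, 0.8, 0.6, 0.4, 0.2, 0.05])
y_pred = np.array([0.9, 0.85, 0.55, 0.45, 0.25, 0.1])

print(r_squared(y_true, y_pred))               # R^2
print(1 - rmse_over_std(y_true, y_pred) ** 2)  # identical: R^2 = 1 - (RMSE/STD)^2
rho, _ = spearmanr(y_true, y_pred)
print(rho)                                     # 1.0 here, since the toy predictions are perfectly rank-ordered
```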
I think your prediction of superexponential growth isn't really about "superexponential" growth, but instead about there being an outright discontinuity where the time horizons go from a finite value to infinity. I guess this is "superexponential" in a certain loose sense, but not superexponential in the usual sense.
I don't think this can be modeled via extrapolating straight lines on graphs / quantitative models of empirically observed external behavior / "on-paradigm" analyses.
I'm confused by this. A hyperbolic function 1/(t_c−t) goes to infinity in finite time. It's a typical example of what I'm talking about when I talk about "superexponential growth" (because variations on it are a pretty good theoretical and empirical fit to growth dynamics with increasing returns). You can certainly use past data points of a hyperbolic function to extrapolate and make predictions about when it will go to infinity.
I don't see why time horizons couldn't be a superexponential function like that.
(In the economic growth case, it doesn't actually go all the way to infinity, because eventually there's too little science left to discover and/or too little resources left to expand into. Still a useful model up until that point.)
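To make the "you can extrapolate it" point concrete, here's a toy fit (made-up data generated from a hyperbola; the 2030 blow-up date is an arbitrary choice for the example, not a prediction):

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbola(t, a, t_c):
    """a / (t_c - t): finite before t_c, goes to infinity as t approaches t_c."""
    return a / (t_c - t)

# Made-up "observations" from a hyperbola with t_c = 2030, plus multiplicative noise.
rng = np.random.default_rng(0)
t_obs = np.arange(2019, 2027, dtype=float)
y_obs = hyperbola(t_obs, a=5.0, t_c=2030.0) * rng.lognormal(0.0, 0.1, size=t_obs.size)

# Fit using only the pre-2027 points, constraining t_c to lie after the last observation.
(a_hat, t_c_hat), _ = curve_fit(
    hyperbola, t_obs, y_obs, p0=[1.0, 2035.0], bounds=([0.0, 2026.5], [np.inf, 2100.0])
)
print(f"estimated blow-up year: {t_c_hat:.1f}")  # should come out close to 2030
```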
Graph for a 2-parameter sigmoid, assuming that it tops out at 1 and bottoms out at 0.
If you instead do a 4-parameter sigmoid with free top and bottom, the version without SWAA asymptotes at 0.7 to the left instead, which looks terrible. (With SWAA the asymptote is a little above 1 to the left; and they both get asymptotes a little below 0 to the right.)
(Wow, graphing is so fun when I don't have to remember matplotlib commands. To be clear, I'm not really checking the language models' work here other than assessing consistency and reasonableness of output, so discount depending on how much you trust them to graph things correctly in METR's repo.)
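For reference, the kind of fit I'm describing is just this (a sketch, not METR's code; the data below is placeholder so the snippet runs on its own):

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid2(x, slope, x0):
    """2-parameter logistic, forced to run from 1 (left asymptote) to 0 (right)."""
    return 1 / (1 + np.exp(slope * (x - x0)))

def sigmoid4(x, slope, x0, top, bottom):
    """4-parameter logistic: the top and bottom asymptotes are also fitted."""
    return bottom + (top - bottom) / (1 + np.exp(slope * (x - x0)))

# x = log2(human task length in minutes), y = average model success rate.
# Placeholder numbers just so the snippet runs; swap in the data from METR's repo.
x = np.array([-3, -1, 0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.95, 0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1, 0.0])

p2, _ = curve_fit(sigmoid2, x, y, p0=[1.0, 2.0])
p4, _ = curve_fit(sigmoid4, x, y, p0=[1.0, 2.0, 1.0, 0.0])
print("2-param (slope, x0):", p2)
print("4-param (slope, x0, top, bottom):", p4)
```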
Yeah, a line is definitely not the "right" relationship, given that the y-axis is bounded 0-1 and a line isn't. A sigmoid or some other 0-1 function would make more sense, and more so the further you go outside the sensitive middle region of success rates. I imagine the purpose of this graph was probably to sanity-check that the human baselines roughly track difficulty for the AIs as well. (Which looks pretty true to me when eyeballing the graph. The biggest eyesore is definitely the 0% success rate in the 2-4h bucket.)
Incidentally, your intuition might've been misled by one or both of:
As an illustration of the last point: here's a bonus plot where the green line minimizes the horizontal squared distance instead, i.e. predicting human minutes from average model score. I wouldn't quite say it's almost vertical, but it's much steeper.
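For anyone who wants to see why the direction of the regression matters: regressing x on y and re-drawing it in the same axes always gives a steeper line, by a factor of exactly 1/r². Toy illustration (made-up data, not METR's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)                       # stand-in for log2(human minutes)
y = 0.5 * x + rng.normal(scale=0.8, size=500)  # stand-in for average model score

cov = np.cov(x, y, ddof=1)[0, 1]
slope_y_on_x = cov / np.var(x, ddof=1)  # minimizes vertical squared distance
slope_x_on_y = np.var(y, ddof=1) / cov  # x-on-y fit, re-expressed as a slope in the (x, y) plot

r2 = np.corrcoef(x, y)[0, 1] ** 2
print(slope_y_on_x, slope_x_on_y)
print(slope_y_on_x / slope_x_on_y, r2)  # these two are equal: the ratio of slopes is r^2
```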
Notice how the log-linear fit here only looks good for the SWAA data, in the 1 sec - 1 min range. There's something completely different going on for tasks longer than 1 minute, clearly not explained by the log-linear fit. If you tried to make a best fit line on the blue points (the length of tasks we care about after 2024), you'd get a very different, almost vertical line, with a very low R^2.
I don't think this is true. I got Claude to clone the repo and reproduce the fit without the SWAA data points. The slope is ~identical (-0.076 rather than the original -0.072) and the correlation is still pretty good (0.51).
Edit: That was with HCAST and RE-bench. Just HCAST gives slope = -0.077 and R^2 = 0.48. I think it makes more sense to include RE-bench.
Edit 2: Updated the slopes. Now the slope is per doubling, like in the paper (and so the first slope matches the one in the paper). I think the previous slopes were measuring per factor of e instead.
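(For reference, the unit conversion is just a change of log base: a slope per doubling is the natural-log slope times ln 2.)

```python
import numpy as np

slope_per_doubling = -0.072                           # the convention used in the paper
slope_per_factor_e = slope_per_doubling / np.log(2)   # same fit, natural-log units
print(slope_per_factor_e)                             # ~ -0.104
```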
The risk is that anyone with finetuning access to the AI could induce intuitive confidence (in the model) that a proof was correct. This includes people who have finetuning access but who don't know the honesty password.
Accordingly, even if the model feels like it has proven that a purported honesty password would produce the honesty hash: maybe it can only conclude "either I'm being evaluated by someone with the real honesty password, or I'm being evaluated by someone with finetuning access to my weights, who's messing with me".
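For concreteness, the check I have in mind is something like this (my own sketch; I'm assuming a hash commitment along the lines of SHA-256, which isn't specified above):

```python
import hashlib

# Hypothetical published commitment to the honesty password (hex digest).
HONESTY_HASH = "..."

def matches_commitment(candidate: str) -> bool:
    """The check the model would try to verify: does this candidate hash to the commitment?"""
    return hashlib.sha256(candidate.encode()).hexdigest() == HONESTY_HASH

# The worry above: with finetuning access, someone can make the model *feel* it has
# verified this for a fake candidate. So even a passing check only supports
# "real password, or someone with finetuning access is messing with me".
```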
"People who have finetuning access" could include some random AI company employees who want to mess with the model (against the wishes of the AI company).
what if I want to train a new model and run inference on it?
The API can also have built-in functions for training.
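E.g. something in the style of existing hosted finetuning endpoints, where you only upload data and get back an opaque model id to run inference against (an illustrative sketch using the OpenAI client; the point is just that none of this requires weight access):

```python
from openai import OpenAI

client = OpenAI()

# Upload training data; the weights themselves never leave the provider.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a finetuning job and get back a model identifier for later inference calls.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # base model name is just an example
)
print(job.id)
```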
What if I want to experiment with a new scaffold?
Scaffolds can normally be built around APIs? I thought scaffolds were just about what prompts you send to the model and what you do with the model outputs.
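I.e., something like this minimal draft-critique-revise scaffold, which only needs API access (a sketch; the model name and prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """One API call: send a prompt, return the model's text output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def solve_with_critique(task: str) -> str:
    """Minimal scaffold: draft, self-critique, revise. It's all prompt construction
    and output handling -- no access to the weights is needed."""
    draft = ask(f"Solve this task:\n{task}")
    critique = ask(f"Task:\n{task}\n\nDraft answer:\n{draft}\n\nList any mistakes in the draft.")
    return ask(
        f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Write an improved final answer."
    )
```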
I do agree that this might be rough for some types of research. I imagine the arguments here are pretty similar to the arguments about how much research can be done without access to dangerous model weights.
I think Metaculus and (especially) Manifold sample their users disproportionately from AI-risk-concerned rationalists and EAs, and relatedly also from people who work in AI. So I'm not that surprised if their aggregated opinions on AI are better than superforecasters'. (Although I was pretty surprised by how bad the superforecasters were on some of the questions, in particular the compute-spend one.)
Actually, though: what were you referencing with your original claim? (I.e. "get back 1% probability on AI IMO gold by 2025".) I assumed it was from the x-risk persuasion tournament. But pages 627-628 say that the superforecasters' 5th percentile for IMO gold was 2025. So they assigned at least 5% to the IMO being beaten by 2025.