most potentially dangerous capabilities should be highly correlated, such that measuring any of them should be okay. Thus, I think it should be fine to mostly focus on measuring the capabilities that are most salient to policymakers and most clearly demonstrate risks.
Once labs are trying to pass capability evaluations, they will spend effort trying to suppress the specific capabilities being evaluated*, so I think we'd expect those capabilities to stop being so highly correlated with the rest.
* If they try methods of more generally suppressing the kinds of capabilities that might be dangerous, I think they're likely to test them most on the capabilities being evaluated by RSPs.
We've added a new deck of questions to the calibration training app: The World, then and now.
What was the world like 200 years ago, and how has it changed? Featuring charts from Our World in Data.
Thanks to Johanna Einsiedler and Jakob Graabak for helping build this deck!
We've also split the existing questions into decks, so you can focus on the topics you're most interested in:
This should be fixed now (it was a timezone-related bug!)
I've made a basic version of Fatebook for Discord - you can install it here!
I've also added the ability to import your forecasts from a spreadsheet/CSV file, which I think is also useful for switching tools: fatebook.io/import-from-spreadsheet
This is now added (see below)
I've now added this! You can also see your track record for questions with specific tags, e.g.:
Cool, thanks! For now you can use https://fatebook.io on your phone. Do you think a native app would be much better?
Thanks for asking - I'm interested in doing iOS and Android apps! It'd be helpful to hear whether other people are keen on this, to help me prioritise.
For now, one option is to add a homescreen shortcut to the website in Chrome.