I really appreciate your including a number here; that's useful info. Would love to see more of this from everyone in the future. I know it takes more time/energy and operationalizations are hard, but I'd vastly prefer to see the easier versions over no versions at all, or over norms in favor of only writing up airtight probabilities.
(I also feel much better on an emotional level hearing 20% from you; I would've guessed anywhere between 30% and 90%. Others in the community may be similar: I've talked to multiple people who were pretty down after reading Eliezer's last few posts.)
Kudos for tracking the predictions, and for making the benchmark! I'd be really excited to see more benchmarks created that current AI does really badly on. Seems like a good way to understand capabilities going forward.