Thanks for making the correction!
I expect there are lots of new forms of capabilities elicitation for this kind of model, which their standard framework may not have captured and which require more time to iterate on.
Thanks for the post!
sample five random users’ forecasts, score them, and then average
Are you sure this is how their bot works? I read this more as "sample five things from the LLM, and average those predictions". For Metaculus, the crowd is just given to you, right? So it seems crazy to sample users.
Yeah, fair point, disagreement retracted
I think this is important to define anyway! (and likely pretty obvious). Though this would create a lot more friction for someone to take on such a role, or to move out of it.
But only a small fraction work on evaluations, so the increased cost is much smaller than you make out
Cool work! This is the outcome I expected, but I'm glad someone actually went and did it
Yeah, if I made an introduction it would ruin the spirit of it!
I don't see important differences between that and CE loss delta in the context Lucius is describing.
To me, this model predicts that sparse autoencoders should not find abstract features, because those are shards, and should not be localisable to a direction in activation space on a single token. Do you agree that this is implied?
If so, how do you square that with e.g. all the abstract features Anthropic found in Sonnet 3?