JosephSBoyle — LessWrong

Interesting result. I'd be curious to see some qualitative analysis of the CoT reasoning of their fine-tuned models vs. the base ones.

It seems to me that these approaches are not yet data-saturated, and that better performance could be reached with a better fine-tuning dataset.

Naturally, the space of things you could forecast is very large, but plausibly one could continuously generate new forecasting questions using an LM and then use the self-play DPO from this paper to improve the forecaster LM. I doubt that Polymarket has enough data to surpass human performance and expertise (or at least, waiting for enough questions to be resolved seems likely to be a slower-than-necessary way to generate data!).
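To make the loop I have in mind concrete, here is a rough sketch of how preference pairs for DPO might be built from self-play samples. This is my own guess at the shape of it, not necessarily the paper's procedure: I'm assuming you sample several (reasoning, probability) forecasts from the same model for one question, score each with the Brier score once the question resolves, and prefer the lower-scoring sample. The names `brier_score`, `build_preference_pairs`, and the `min_gap` threshold are all hypothetical.

```python
from itertools import combinations


def brier_score(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome."""
    return (prob - outcome) ** 2


def build_preference_pairs(samples, outcome, min_gap=0.05):
    """Turn self-play samples for ONE resolved question into DPO pairs.

    samples: list of (reasoning_text, probability) from the same model.
    outcome: 1 if the question resolved YES, 0 if NO.
    min_gap: assumed threshold; pairs whose Brier scores differ by less
             than this are dropped as too noisy to express a preference.
    """
    scored = [(text, brier_score(p, outcome)) for text, p in samples]
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(scored, 2):
        if abs(score_a - score_b) < min_gap:
            continue  # near-ties carry little signal
        # Lower Brier score means a better forecast, so it is "chosen".
        if score_a < score_b:
            pairs.append({"chosen": text_a, "rejected": text_b})
        else:
            pairs.append({"chosen": text_b, "rejected": text_a})
    return pairs


if __name__ == "__main__":
    samples = [("confident yes", 0.9), ("leaning yes", 0.6), ("confident no", 0.1)]
    for pair in build_preference_pairs(samples, outcome=1):
        print(pair)
```

Note that this still needs a resolved outcome for every question, which is exactly the bottleneck: LM-generated questions only help if they resolve quickly or can be scored some other way.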
