Interesting result. I’d be curious to see some qualitative analysis of the chain-of-thought reasoning of their fine-tuned models vs. the base ones.
It seems to me that these approaches are not yet data-saturated and that better performance could be reached with a better fine-tuning dataset.
Naturally the space of things you could forecast is very large, but plausibly one could continuously generate new forecasting questions using an LM and then apply the self-play DPO from this paper to improve the forecaster LM. I guess I doubt that Polymarket has sufficient da...
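
To make the generate-then-DPO loop a bit more concrete, here is a rough sketch of how preference pairs might be built once LM-generated questions have resolved. This is my own guess at the shape of it, not the paper's actual pipeline: the Brier-score ranking, the `min_gap` threshold, and the names `brier` and `preference_pairs` are all assumptions for illustration. The `prompt`/`chosen`/`rejected` layout is just a common format for DPO training data.

```python
from itertools import combinations


def brier(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome."""
    return (prob - outcome) ** 2


def preference_pairs(question, samples, outcome, min_gap=0.05):
    """Build DPO-style preference pairs from self-play forecasts.

    question: the forecasting question text (e.g. generated by an LM).
    samples:  list of (reasoning_text, probability) sampled from the
              forecaster for this one question.
    outcome:  1 if the event happened, 0 otherwise.
    min_gap:  hypothetical threshold; pairs whose Brier scores are closer
              than this are skipped as too noisy to learn from.
    """
    pairs = []
    for (text_a, p_a), (text_b, p_b) in combinations(samples, 2):
        score_a, score_b = brier(p_a, outcome), brier(p_b, outcome)
        if abs(score_a - score_b) < min_gap:
            continue
        # Lower Brier score means the forecast was closer to what happened.
        if score_a < score_b:
            chosen, rejected = text_a, text_b
        else:
            chosen, rejected = text_b, text_a
        pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs


samples = [("A", 0.9), ("B", 0.4), ("C", 0.85)]
pairs = preference_pairs("Will X happen by June?", samples, outcome=1)
```

With the toy samples above, the two near-identical forecasts (0.9 and 0.85) produce no pair, while each of them is preferred over the 0.4 forecast. The catch, of course, is that this only works for questions that resolve, which is where the data bottleneck shows up.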