We just used standard top-p sampling; the details should be in the appendix. We sample only one explanation. I don't think I followed your suggestion.
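To spell out what "standard top-p sampling" means here, a minimal generic sketch of nucleus sampling for a single next token is below. This is an illustration, not our actual decoding code, and the `p` and `temperature` values are placeholders rather than the settings we used (see the appendix for those).

```python
# Minimal sketch of standard top-p (nucleus) sampling for one token.
# Generic illustration only; p and temperature are placeholder values.
import numpy as np

def sample_top_p(logits: np.ndarray, p: float = 0.9, temperature: float = 1.0) -> int:
    """Sample one token id from the smallest set of tokens whose
    cumulative probability exceeds p."""
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # softmax with max-subtraction for stability
    probs /= probs.sum()
    order = np.argsort(-probs)               # tokens sorted by probability, descending
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix with mass > p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(np.random.choice(nucleus, p=nucleus_probs))
```

Sampling one explanation is just applying this token by token until a stop condition; there is no best-of-n or reranking over multiple sampled explanations.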
Towards the end of the explanation, it's easier to see how to change it in order to get the 'desired' answer.
Some relevant discussion here: https://twitter.com/generatorman_ai/status/1656110348347518976
I think the TLDR is that this does require models to "plan ahead" somewhat, but the effect isn't necessarily very strong.
I don't think "planning" in language models needs to be very mysterious. Because we bias towards answer choices, models can just use these CoT explanations as post-hoc rationalizations. They may internally represent a guess at the answer before doing CoT, and this internal representation can have downstream effects on the explanations (in... (read more)
Thanks! Glad you like it. A few thoughts: