Another question - why did you use the user message for the CoT replacement but a system prompt for the CoT skipping?
Hey, really cool post! These accuracy numbers aren't conditional on controllability task success, right? How is the accuracy conditional on controllability success?