We saw some examples while working on terminal bench where, if the agent is pressured with a deadline, it freaks out and acts less rationally. Some of your examples remind me of that: being close to the objective and becoming obsessed with it at the expense of intermediate steps.
What is a harness one could build that would enhance no-CoT capabilities? Would access to a calculator be considered in scope?
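Not sure if this matches what you have in mind, but here's a minimal sketch of one such harness: the model is told to answer with no step-by-step reasoning, but may emit an opaque tool call that the harness evaluates out-of-band, so the arithmetic never appears as natural-language reasoning in the transcript. Everything here is my own illustration: `query_model` is a hypothetical stand-in for whatever API you use, and the `CALC(...)` convention plus the regex dispatch are invented for the example.

```python
import ast
import operator as op
import re

# Hypothetical model interface; wire this to your actual API.
def query_model(prompt: str) -> str:
    raise NotImplementedError

# Safe arithmetic evaluator: only + - * / ** and unary minus are allowed.
_OPS = {
    ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
    ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg,
}

def calc(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval").body)

# Greedy match so nested parens work; assumes one CALC call per response.
CALC_RE = re.compile(r"CALC\((.+)\)")

def answer_without_cot(question: str, max_tool_calls: int = 5) -> str:
    # The instruction forbids natural-language reasoning but permits an
    # opaque calculator call, so computation happens outside the CoT channel.
    prompt = (
        "Answer directly, with no step-by-step reasoning. "
        "If you need arithmetic, emit CALC(<expression>) and stop.\n\n"
        f"Question: {question}\nAnswer:"
    )
    for _ in range(max_tool_calls):
        out = query_model(prompt)
        m = CALC_RE.search(out)
        if m is None:
            return out
        # Splice the computed result back in and let the model continue.
        prompt += f"{out[:m.end()]} = {calc(m.group(1))}\n"
    return query_model(prompt)
```

Under this framing, whether a calculator is "in scope" comes down to whether the tool call itself counts as externalized reasoning, since the transcript still reveals *that* a computation happened, just not the intermediate steps.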
Since re-stating the problem is fair game in that it doesn't explicitly convey reasoning, have you considered asking the model to re-state the problem in other languages before proceeding? What are some other forms of re-statement that would allow for hidden reasoning without explicit clues? How about asking it to first describe a similar problem before solving the original one? A sketch of the kind of pipeline I'm imagining follows below.
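For concreteness, here's a rough sketch of that two-turn pipeline: the model first produces a re-statement (a translation, or a description of an analogous problem), and only the re-statement plus the original question is carried into the answering turn, so any extra computation would have to be smuggled into the re-statement's word choice rather than stated as explicit steps. Again, `query_model` is a hypothetical stub, and the specific prompts and `"translate"`/`"analogy"` modes are just illustrative.

```python
# Hypothetical model interface, as in the sketch above.
def query_model(prompt: str) -> str:
    raise NotImplementedError

def restate_then_answer(question: str, mode: str = "translate") -> str:
    restate_prompt = {
        "translate": (
            "Re-state this problem in French, changing nothing else:\n"
            f"{question}"
        ),
        "analogy": (
            "Describe a structurally similar problem to this one, "
            f"without solving either:\n{question}"
        ),
    }[mode]
    restatement = query_model(restate_prompt)
    # Only the restatement and the original question cross into the second
    # turn; no scratch work or intermediate steps are passed along.
    return query_model(
        f"Original problem: {question}\n"
        f"Re-statement: {restatement}\n"
        "Answer directly, with no step-by-step reasoning:"
    )
```

One could then compare accuracy across modes (plain, translated, analogy-first) to see whether any re-statement form recovers part of the gap that explicit CoT normally closes.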