
In order to replicate this, I had to enable “auto detect” for the language; otherwise it doesn’t replicate. Maybe there’s an end-to-end model behind auto detect.
What is a harness one could build that would enhance no-CoT capabilities? Would access to a calculator be considered in scope?
Since re-stating the problem is fair game, in that it doesn’t explicitly convey reasoning, have you considered asking the model to re-state the problem in other languages before proceeding? What are some other forms of re-statement that could allow for hidden reasoning without explicit clues? How about asking it to first describe a similar problem before solving the original one?
We saw some examples while working on Terminal-Bench where, if the agent is pressured with a deadline, it freaks out and acts less rationally. Some of your examples remind me of that: being close to the objective and becoming obsessed with it at the expense of intermediate steps.
Can you share more of your data? Have you found a way to get the model to consistently attempt an answer, or is it arbitrary?