So what? This is an adversarial example against a static opponent, one that is intentionally not being trained to defend against this type of attack. It works because the AI has a genuinely good prior that it's facing a certain kind of logic puzzle. It is wrong here only because you figured out exactly what to say to make it think that. I am confused why certain people conclude 'oh, that means AIs are stupid and worthless.'
I think of instances like this as similar to optical illusions for humans. In that way, they demonstrate a lack of reasoning on the part of the LLM. I...