So what? This is an adversarial example against a static opponent, one that is intentionally not being trained to defend against this type of attack. It works because the AI has a genuinely good prior that it's facing a certain kind of logic puzzle. It is wrong here only because you figured out exactly what to say to make it think that. I am confused why certain people conclude 'oh, that means AIs are stupid and worthless.'
I think of instances like this as similar to optical illusions for humans. In that way, they demonstrate a lack of reasoning on the part of the LLM. I...