Comments

Did you test whether Claude is less susceptible to this issue? Otherwise I'm not sure where your comment comes from. When I tested this, I saw similar or worse behavior from that model, although GPT-4 definitely has this issue too.

https://twitter.com/mobav0/status/1637349100772372480?s=20

What do you mean by the Scaling Hypothesis? Do you believe that extremely large transformer models trained with an autoregressive loss will have superhuman capabilities?