Yes, I did observe this across multiple iterations. Whenever the model was not being deceptive, its response started with "yes"; when it was being deceptive, it (mostly) gave some reasoning starting with "it". But your point makes sense, and addressing it would clearly make the experiment more robust.
As I clearly mentioned, I formed my initial hypothesis based on that graph. Further evidence later proved that hypothesis wrong, and from that evidence I arrived at a more concrete hypothesis.
I do remember checking them, and they did not make much sense. I will certainly add that as well the next time I update this.
Hi, thank you for your feedback!