Is It Reasoning or Just a Fixed Bias?
This is my first mechanistic interpretability blog post! I set out to investigate whether models are genuinely reasoning when they answer non-deductive questions, or whether they're doing something simpler. My dataset is adapted from InAbHyD [1]: it consists of inductive and abductive reasoning scenarios over programmatically generated first-order ontologies (using made-up...
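To give a feel for what "programmatically generated scenarios with made-up terms" means, here's a minimal sketch of that kind of generator. This is purely illustrative, not InAbHyD's actual code: the nonsense vocabulary and the helper names (`make_inductive_scenario`, `make_abductive_scenario`) are my own.

```python
import random

# Hypothetical sketch of a scenario generator in the spirit of InAbHyD.
# Made-up category/property words ensure the model can't rely on
# memorized world knowledge and must work from the premises alone.
NONSENSE = ["wumpus", "zorple", "brimpus", "florp", "quimble", "dax"]

def make_inductive_scenario(rng: random.Random) -> dict:
    """Inductive task: several observed instances of a made-up category
    share a property; the model should induce the general rule."""
    category, prop = rng.sample(NONSENSE, 2)
    names = [f"ent{i}" for i in range(3)]
    premises = [f"{n} is a {category}." for n in names]
    premises += [f"{n} is {prop}." for n in names]
    rng.shuffle(premises)
    return {
        "premises": premises,
        "question": f"Is every {category} {prop}?",
        "target_rule": f"Every {category} is {prop}.",
    }

def make_abductive_scenario(rng: random.Random) -> dict:
    """Abductive task: a rule plus an observed consequence; the model
    should abduce the hypothesis that best explains the observation."""
    category, prop = rng.sample(NONSENSE, 2)
    name = "ent0"
    premises = [f"Every {category} is {prop}.", f"{name} is {prop}."]
    return {
        "premises": premises,
        "question": f"What best explains why {name} is {prop}?",
        "target_hypothesis": f"{name} is a {category}.",
    }

if __name__ == "__main__":
    rng = random.Random(0)
    print(make_inductive_scenario(rng))
    print(make_abductive_scenario(rng))
```

The point of this structure is that inductive and abductive items look superficially similar but demand different inferences, which is exactly what makes them useful for probing whether a model is reasoning or applying a fixed response bias.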