Is It Reasoning or Just a Fixed Bias?
This is my first mechanistic interpretability blog post! I set out to investigate whether models are genuinely reasoning when they answer non-deductive questions, or whether they're doing something simpler. My dataset is adapted from InAbHyD [1]: it consists of inductive and abductive reasoning scenarios over programmatically generated first-order ontologies (using made-up...
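To give a feel for what "programmatically generated scenarios with made-up terms" means, here's a minimal sketch of that kind of generator. This is purely illustrative, not InAbHyD's actual code: the nonsense vocabulary and the helper names (`make_inductive_scenario`, `make_abductive_scenario`) are my own.

```python
import random

# Hypothetical sketch of a scenario generator in the spirit of InAbHyD.
# Made-up category/property words ensure the model can't rely on
# memorized world knowledge and must work from the premises alone.
NONSENSE = ["wumpus", "zorple", "brimpus", "florp", "quimble", "dax"]

def make_inductive_scenario(rng: random.Random) -> dict:
    """Inductive task: several observed instances of a made-up category
    share a property; the model should induce the general rule."""
    category, prop = rng.sample(NONSENSE, 2)
    names = [f"ent{i}" for i in range(3)]
    premises = [f"{n} is a {category}." for n in names]
    premises += [f"{n} is {prop}." for n in names]
    rng.shuffle(premises)
    return {
        "premises": premises,
        "question": f"Is every {category} {prop}?",
        "target_rule": f"Every {category} is {prop}.",
    }

def make_abductive_scenario(rng: random.Random) -> dict:
    """Abductive task: a rule plus an observed consequence; the model
    should abduce the hypothesis that best explains the observation."""
    category, prop = rng.sample(NONSENSE, 2)
    name = "ent0"
    premises = [f"Every {category} is {prop}.", f"{name} is {prop}."]
    return {
        "premises": premises,
        "question": f"What best explains why {name} is {prop}?",
        "target_hypothesis": f"{name} is a {category}.",
    }

if __name__ == "__main__":
    rng = random.Random(0)
    print(make_inductive_scenario(rng))
    print(make_abductive_scenario(rng))
```

The point of this structure is that inductive and abductive items look superficially similar but demand different inferences, which is exactly what makes them useful for probing whether a model is reasoning or applying a fixed response bias.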