[ Question ]

Would it be useful to collect the contexts where various LLMs think the same?

by Martin Vlach
24th Aug 2023

My initial idea: let's see where a small, interpretable model makes the same inference as a huge, dangerous model, and focus on those cases in the small model to help explain the bigger one. Quite likely I am wrong, but with a tiny chance of good impact, I have set up a repository.

I would love your feedback on this direction before I start actually generating the pairs/sets of context + LMs that match on that context.
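To make the idea concrete, here is a minimal sketch of what collecting such contexts could look like, assuming "thinking the same" means a greedy top-1 next-token match; the model pair and the example contexts are illustrative placeholders, not anything fixed in the repository:

```python
# Minimal sketch: find contexts where a small and a large LM agree on the
# next token. Model choices and the agreement criterion (greedy top-1 match)
# are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SMALL, LARGE = "gpt2", "gpt2-xl"  # placeholder model pair

tok = AutoTokenizer.from_pretrained(SMALL)  # GPT-2 family shares a tokenizer
small = AutoModelForCausalLM.from_pretrained(SMALL).eval()
large = AutoModelForCausalLM.from_pretrained(LARGE).eval()

def top1_next_token(model, context: str) -> int:
    """Greedy next-token prediction for a single context."""
    ids = tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    return int(logits[0, -1].argmax())

# Placeholder contexts; in practice these would come from some corpus.
contexts = [
    "The capital of France is",
    "2 + 2 =",
    "Water boils at a temperature of",
]

matching = [c for c in contexts
            if top1_next_token(small, c) == top1_next_token(large, c)]
print(matching)
```

Agreement could of course be defined more loosely (top-k overlap, matching multi-token continuations), which would change how many contexts survive the filter.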
1 Answer

[anonymous], Aug 25, 2023

(I haven't done any interpretability research; I'm just trying to think about this idea logically.) This seems like a good idea to me! It's possible that the same neural patterns that occur in the small model also occur in the larger ones to generate those outputs. If this is only true some of the time, and sometimes the large model uses a different process (e.g., simulating the underlying real-world process that led to that output), then that could also be interesting.
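One cheap way to probe that caveat is to compare the models' full next-token distributions rather than only the top-1 token: a top-1 match combined with a large divergence is weak evidence that the large model reached the same output by a different route. A hedged sketch, reusing the tokenizer and models from the question's example above:

```python
import torch
import torch.nn.functional as F

def next_token_kl(context: str) -> float:
    """KL(small || large) over the next-token distribution.

    Reuses `tok`, `small`, and `large` from the earlier sketch. A top-1
    match with a large KL hints that the two models may be reaching the
    same output by different routes (a heuristic, not an interpretability
    result).
    """
    ids = tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        log_p = small(ids).logits[0, -1].log_softmax(-1)
        log_q = large(ids).logits[0, -1].log_softmax(-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input),
    # so this is KL(small || large).
    return float(F.kl_div(log_q, log_p, log_target=True, reduction="sum"))
```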

