LESSWRONG

Interpretability (ML & AI), AI

[ Question ]

Can you MRI a deep learning model?

by Yair Halberstadt
13th Jun 2022

2 Answers

P.

Jun 13, 2022

Most neural networks don’t have anything comparable to specialised brain areas, at least structurally, so you can’t see which areas light up given some stimulus to determine what that part does. You can do it with individual neurons or channels, though. The best UI I know of to explore this is the “Dataset Samples” option in the OpenAI Microscope, which shows which inputs activate each unit.
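To make the per-unit idea concrete, here is a rough sketch of a "dataset samples"-style analysis: for each hidden unit, rank the inputs by how strongly they activate it. This is only an illustration on a toy one-layer network with random weights, not how OpenAI Microscope is implemented; all function names (`make_layer`, `hidden_activations`, `top_inputs_per_unit`) are made up for this example.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weights and biases for one dense layer (a stand-in for a trained model)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0.0, 0.1) for _ in range(n_out)]
    return weights, biases


def hidden_activations(layer, x):
    """ReLU activation of every hidden unit for a single input vector x."""
    weights, biases = layer
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def top_inputs_per_unit(layer, dataset, k=3):
    """For each hidden unit, return the indices of the k inputs that activate it most."""
    acts = [hidden_activations(layer, x) for x in dataset]  # acts[i][u]
    n_units = len(acts[0])
    result = []
    for u in range(n_units):
        ranked = sorted(range(len(dataset)), key=lambda i: acts[i][u], reverse=True)
        result.append(ranked[:k])
    return result


# Assumed toy setup: 20 random 4-dimensional inputs, 5 hidden units.
rng = random.Random(0)
layer = make_layer(n_in=4, n_out=5, rng=rng)
dataset = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]

top = top_inputs_per_unit(layer, dataset, k=3)
for unit, idxs in enumerate(top):
    print(f"unit {unit}: most-activating inputs {idxs}")
```

With a real model you would record activations from a chosen layer instead of this toy one, and then look at the top-ranked images or text snippets for each unit to guess what it responds to.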

mtaran

Jun 13, 2022

The most similar analysis tool I'm aware of is called an activation atlas (https://distill.pub/2019/activation-atlas/), though I've only seen it applied to visual networks. Would love to see it used on language models!

1 comment

Dagon

“This has proven invaluable in understanding brains.”

It has?  It's proven quite useful in understanding some types of injury and malfunction.  And it may have given hints to developmental and very general structures.  But I don't think it's helped very much in understanding cognitive effects or ideas.

In an MRI scan you see which parts of a brain light up in response to a stimulus. This has proven invaluable in understanding brains.

Is there an equivalent thing you can do with deep learning models, where you can see which parts light up in response to stimuli? And are there good UIs for exploring this? It seems like such a technique would be invaluable for understanding deep learning models, and possibly for alignment.