LESSWRONG

Video/animation: Neel Nanda explains what mechanistic interpretability is

by DanielFilan
22nd Feb 2023

This is a linkpost for https://youtu.be/sISodZSxNvc

7 comments

Sheikh Abdur Raheem Ali

Lots of alpha in AI research distillers learning Motion Canvas (motion-canvas/motion-canvas on GitHub: "Visualize Complex Ideas Programmatically") and making explainers.

adzcai

Or even better, finetuning an LLM to automate writing the code!

the gears to ascension

cyborgism, activate!

just don't use an overly large model.

the gears to ascension

For those reading (I imagine Sheikh knows about these already), some videos from the creator of that library:

novalinium

A single word for this would be an animatic, probably.

DanielFilan

I kinda guess that most people don't know what that means.

TinkerBird

Here's a dumb idea: if you have a misaligned AGI, can you keep it inside a box and have it teach you some things about alignment, perhaps through some creative lies? 

Nice little video: the audio is Neel Nanda explaining what mechanistic interpretability is and why he does it, illustrated by the illustrious Hamish Doodles. It's excerpted from the AXRP episode.

(It's not technically animation, I think, but I don't know what other single word to use for "pictures that move a bit and change.")