Yulu Pi

Comments

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers
Yulu Pi · 2y

So far the best summary I have seen! 

EIS IV: A Spotlight on Feature Attribution/Saliency
Yulu Pi · 2y

Different attribution methods can be placed on a scale, with the X-axis reflecting the ground truth (at least for image tasks, how humans process the information) and the Y-axis reflecting how the model processes information in its own way. Attribution methods can highlight most of the ground truth, but they do not necessarily accurately reflect how the model learns things. The attribution method is a representation of the model, and the model is a representation of the data. Different levels of accuracy imply different levels of uncertainty in the model's predictions, which in turn implies inherent uncertainty in the attribution methods. Perhaps it would be good to understand how humans perceive uncertainty and model interpretations (positioned on the scale at a good balance between representing the truth and representing the model) before designing a new "better" approach. For more complex tasks, though, there may not be a ground truth at all.
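To make "attribution method" concrete, here is a minimal sketch of one common approach, gradient × input saliency, applied to a placeholder image classifier. The model and the random input are illustrative stand-ins, not anything from the post:

```python
import torch
import torchvision.models as models

# Minimal sketch of gradient-x-input saliency, one common attribution method.
# The model and the random "image" below are placeholders, purely for illustration.
model = models.resnet18(weights=None).eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a real image
logits = model(image)
target = logits.argmax(dim=1).item()

# Backpropagate the top logit to get d(logit)/d(pixel) for every input pixel.
logits[0, target].backward()

# Gradient x input: large values mark pixels the model's output is sensitive to.
# This reflects the model's computation, not necessarily the "ground truth" of
# which pixels a human would consider relevant.
saliency = (image.grad * image.detach()).abs().sum(dim=1).squeeze(0)
print(saliency.shape)  # torch.Size([224, 224]) heatmap over the input
```

Such a heatmap can look plausible to a human while saying little about what the model actually relies on, which is exactly the gap between the two axes above.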

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety
Yulu Pi · 3y

I have been wondering whether neural networks (or, more specifically, transformers) will be the ultimate form of AGI. If not, will the existing research on interpretability become obsolete?

A Barebones Guide to Mechanistic Interpretability Prerequisites
Yulu Pi · 3y

Hey Neel,

Great post!

I am trying to look into the code referenced here:

  • Good (but hard) exercise: Code your own tiny GPT-2 and train it. If you can do this, I’d say that you basically fully understand the transformer architecture.
    • Example of basic training boilerplate and train script
    • The EasyTransformer codebase is probably good to riff off of here

But the links don't work anymore! It would be great if you could update them.

I don't know whether this link still points to the original content: https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/clean-transformer-demo/Clean_Transformer_Demo_Template.ipynb

Thanks a lot!
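In the meantime, here is a minimal sketch of the kind of "tiny GPT-2 plus training boilerplate" I understand the exercise to mean. It is my own toy approximation in PyTorch (random tokens, nn.MultiheadAttention instead of hand-written attention), not your original script or the EasyTransformer code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative toy hyperparameters, not taken from the original post.
VOCAB, D_MODEL, N_HEADS, N_LAYERS, CTX = 128, 64, 4, 2, 32

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention + MLP."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(D_MODEL)
        self.attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ln2 = nn.LayerNorm(D_MODEL)
        self.mlp = nn.Sequential(
            nn.Linear(D_MODEL, 4 * D_MODEL), nn.GELU(), nn.Linear(4 * D_MODEL, D_MODEL)
        )

    def forward(self, x):
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool), diagonal=1)
        normed = self.ln1(x)
        attn_out, _ = self.attn(normed, normed, normed, attn_mask=mask)
        x = x + attn_out
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    """GPT-2-style decoder-only language model, shrunk to toy size."""
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(CTX, D_MODEL)
        self.blocks = nn.Sequential(*[Block() for _ in range(N_LAYERS)])
        self.ln_f = nn.LayerNorm(D_MODEL)
        self.head = nn.Linear(D_MODEL, VOCAB, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok(idx) + self.pos(pos)
        return self.head(self.ln_f(self.blocks(x)))

model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Basic training boilerplate on random tokens; swap in real tokenized text for the exercise.
for step in range(100):
    batch = torch.randint(0, VOCAB, (8, CTX + 1))
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 20 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

Presumably the exercise wants the attention computation written out by hand rather than taken from nn.MultiheadAttention, but the surrounding boilerplate (embeddings, causal mask, cross-entropy loss, optimizer loop) has this shape.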

Posts

My current workflow to study the internal mechanisms of LLM (4 points, 2y, 0 comments)