Newbie questions about information theory and transformers
Answer by Cthollist9, Dec 24, 2023

1. Although I could not find any papers that describe transformers in information-theoretic terms, there is research on DNNs and information theory that characterizes sample complexity and generalization error in terms of **mutual information**. For example: https://www.youtube.com/watch?v=XL07WEc2TRI

2. There are experiments that train LMs on LM-generated data and observe a loss of performance.

3. Ilya Sutskever gave a lecture titled "An Observation on Generalization", which uses compression theory to understand self-supervised learning (SSL), the paradigm behind LLM pretraining. It is one of the few attempts to interpret LLMs mathematically. SSL is very different from supervised learning (and the research in point 1 actually focuses on supervised learning, AFAIK).
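To make the compression view in point 3 a bit more concrete, here is a minimal sketch of the standard link between prediction and compression: a model that assigns probability p to the next symbol can encode it in about -log2(p) bits, so total code length equals summed log loss. This is not taken from the lecture. The adaptive character model below is a toy stand-in for an LM, and all names (`adaptive_code_length_bits`, `uniform_code_length_bits`, the sample text) are hypothetical.

```python
import math
import string
from collections import Counter


def adaptive_code_length_bits(text, alphabet):
    """Ideal code length of `text` in bits under an adaptive unigram model.

    The model predicts each character from the counts of the characters
    seen so far (add-one smoothing), then pays -log2(p) bits for it.
    This sum is exactly the model's cumulative log loss in bits.
    """
    counts = Counter()
    total_bits = 0.0
    for i, ch in enumerate(text):
        # i characters seen so far, plus one pseudo-count per alphabet symbol.
        p = (counts[ch] + 1) / (i + len(alphabet))
        total_bits += -math.log2(p)
        counts[ch] += 1
    return total_bits


def uniform_code_length_bits(text, alphabet):
    """Code length when every symbol is assumed equally likely (no learning)."""
    return len(text) * math.log2(len(alphabet))


alphabet = string.ascii_lowercase
text = "ab" * 20  # toy, highly regular data (an assumption for illustration)

adaptive_bits = adaptive_code_length_bits(text, alphabet)
uniform_bits = uniform_code_length_bits(text, alphabet)

print(f"uniform model:  {uniform_bits:.1f} bits")
print(f"adaptive model: {adaptive_bits:.1f} bits")
```

The adaptive model needs fewer bits because it learns the regularity in the data as it goes. In this framing, lowering next-token loss and finding a shorter description of the data are the same thing.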
