zookini — LessWrong

zookini

1

2y

Redundant Attention Heads in Large Language Models For In Context Learning

each previous example confirming the pattern attends to the $i$th example, and adds a constant update term (equivalent to $\log(c) + \mathrm{norm}$) to the $i$th example.

Nit: should be $\log(c) - \mathrm{norm}$

The Best Tacit Knowledge Videos on Every Subject

Tacit knowledge videos for CAD modelling:
https://www.youtube.com/playlist?list=PLzMIhOgu1Y5fwotlIEKNnuIXcEbVIZ7Qm