LESSWRONG
zookini
Comments

Redundant Attention Heads in Large Language Models For In Context Learning
zookini, 1y

each previous example confirming the pattern attends to the ith example, and adds a constant update term (equivalent to log(c)+norm) to the ith example.

Nit: this should be log(c) - norm

The Best Tacit Knowledge Videos on Every Subject
zookini, 2y

Tacit knowledge videos for CAD modelling:
https://www.youtube.com/playlist?list=PLzMIhOgu1Y5fwotlIEKNnuIXcEbVIZ7Qm
