zookini
Redundant Attention Heads in Large Language Models For In Context Learning
zookini, 10mo ago

each previous example confirming the pattern attends to the ith example, and adds a constant update term (equivalent to log(c)+norm) to the ith example.

Nit: should be \log(c) - norm

The Best Tacit Knowledge Videos on Every Subject
zookini, 1y ago

Tacit knowledge videos for CAD modelling:
https://www.youtube.com/playlist?list=PLzMIhOgu1Y5fwotlIEKNnuIXcEbVIZ7Qm
