LESSWRONG


Transformers

This page is a stub.


Posts tagged Transformers

Striking Implications for Learning Theory, Interpretability — and Safety? (37 karma, 2y, 4 comments)

How LLMs are and are not myopic (137 karma, 2y, 16 comments)

Modern Transformers are AGI, and Human-Level (228 karma, 1y, 87 comments)

LLMs Can't See Pixels or Characters (100 karma, 2mo, 44 comments)

Google's PaLM-E: An Embodied Multimodal Language Model (87 karma, 3y, 7 comments)

Residual stream norms grow exponentially over the forward pass, by StefanHex, TurnTrout (77 karma, 2y, 24 comments)

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind (62 karma, 3y, 12 comments)

Concrete Steps to Get Started in Transformer Mechanistic Interpretability (57 karma, 3y, 7 comments)

How fast can we perform a forward pass? (53 karma, 3y, 9 comments)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them (33 karma, 2y, 9 comments)

How Do Induction Heads Actually Work in Transformers With Finite Capacity? (27 karma, 2y, 0 comments)

If I ask an LLM to think step by step, how big are the steps? (7 karma, 1y, 1 comment)

Transformers Represent Belief State Geometry in their Residual Stream (426 karma, 1y, 100 comments)

An Analogy for Understanding Transformers, by CallumMcDougall (92 karma, 2y, 6 comments)

Attention SAEs Scale to GPT-2 Small, by Connor Kissane, robertzk, Arthur Conmy, Neel Nanda (78 karma, 2y, 4 comments)

Showing 15 of 58 posts tagged Transformers.
