Teaser: Hard-coding Transformer Models

Unfortunately, I have a fairly demanding day job, and haven't found the time and energy yet.

Have you considered applying for a grant from the Long-Term Future Fund to buy out your day job so you can spend all your time working on this? As a fund manager for the LTFF, this is definitely the sort of thing we're often happy to fund, and I think that the research you're describing sounds pretty exciting.

[-]habryka4y80

Yeah, I also thought it was pretty interesting. I only thought about it for a few minutes, but it seems interesting enough to give it a shot, IMO.

[-][anonymous]4y20

I have definitely not thought about that before. Feedback from people I have shown this work to has ranged from (literally) "you are a madman" to "that looks cool" (and then never engaging with it).

[-]Vivek Hebbar4y50

Any update on this (applying for funding)?

[-]mtaran4y90

Sounds intriguing! You have a GitHub link? :)

[-][anonymous]4y20

It's very, very rough, but: https://github.com/epurdy/hand

[-]mtaran4y20

I'll make sure to run it when I get to a laptop. But if you ever get a chance to set the distill.pub article up to run on heroku or something, that'll increase how accessible this is by an order of magnitude.

[-]Igor Ostrovsky4y40

I (not the OP) put it up here for now: https://igor0.github.io/hand/distill/

I'll take it down if MadHatter asks me or once there is an official site.

[-][anonymous]4y20

Thanks for throwing it up there!!!

[-]gwern4y80

Any relation to RASP?

[-]gwern4y20

https://transformer-circuits.pub/2021/framework/index.html

[-]Kenoubi4y20

Thank you for sharing this. I know it's probably not why you posted it, but reading this paper was extremely helpful to me in understanding what Transformers are actually doing in the first place.

[-]Rudi C4y10

(Unrelated.) Have you considered putting an RSS feed of your Twitter account in its bio? This way people can follow you without you needing to approve them, and since it’s read-only, your burden won’t increase.

(Not to mention that RSS is a much better medium than Twitter in the first place.)

[-]gwern4y20

I don't think Twitter allows such RSS feeds.

[-][anonymous]4y10

It's a pretty similar style of work, but I haven't communicated at all with those authors and I started my work before they published.

[-]Jsevillamol4y70

I think this is very impressive and that we could learn a lot from this kind of effort.

Can you tell us more about your "training" process and the capabilities you can achieve, with examples?

[-]Rohin Shah4y30

Very cool!

A note of caution: when I handcoded weights of a neural network (in my case, to solve a gridworld RL problem), I was able to encode the optimal policy -- but the algorithm that was later learned by gradient descent was very different. Partly this was because I only required myself to produce the right action, so I often had the (equivalent of) Q-values for different actions be very very close to each other, whereas the neural network ended up having Q-values that were further apart from each other, which was incentivized by the loss function even though it didn't make a difference to the optimal policy.

So to the extent you're trying to learn what a neural net trained by gradient descent would do, I'd recommend that you spend some time looking at the trained neural net to see whether it is using a similar sort of algorithm as the one you're implementing.
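To make the point above concrete, here is a minimal sketch with made-up numbers (not taken from the gridworld experiment described): two Q-tables can induce the same greedy policy while scoring very differently under a regression-style loss. The tables `hand_coded_q` and `reference_q` and the helpers `greedy_policy` and `mse` are hypothetical names introduced only for illustration.

```python
# Hypothetical toy setup: 3 states, 2 actions. All values are invented.

def greedy_policy(q_table):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q_table]

def mse(q_table, target):
    """Mean squared error between two Q-tables of the same shape."""
    diffs = [
        (q - t) ** 2
        for row, target_row in zip(q_table, target)
        for q, t in zip(row, target_row)
    ]
    return sum(diffs) / len(diffs)

# Hand-coded table: only the ordering of actions matters, so gaps are tiny.
hand_coded_q = [[1.00, 1.01], [0.50, 0.49], [0.20, 0.21]]

# Stand-in for the values a regression loss would push a trained net toward
# (an assumption for illustration, not real learned values).
reference_q = [[0.0, 2.0], [1.5, 0.0], [0.0, 1.0]]

same_policy = greedy_policy(hand_coded_q) == greedy_policy(reference_q)
loss_hand = mse(hand_coded_q, reference_q)
loss_reference = mse(reference_q, reference_q)

print("same greedy policy:", same_policy)
print("loss of hand-coded table:", loss_hand)
print("loss of reference table:", loss_reference)
```

Both tables pick the same action in every state, yet only the hand-coded one incurs a nonzero loss, which is the kind of pressure that could move a trained network away from a hand-written solution.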

[-][anonymous]4y10

Agree with this.

[-]Igor Ostrovsky4y20

Building up toy transformer models by hand that work ... that's super interesting, both for interpretability and for education.

I put up the site [here](https://igor0.github.io/hand/distill/) for now. MadHatter, let me know if you want me to take it down.
