(This is cross-posted from my blog at https://grgv.xyz/blog/neurons1/. I'm looking for feedback: does it make sense at all, and is there any novelty? Also, do the follow-up questions/directions make sense?) While applying logit attribution analysis to transformer outputs, I have noticed that in many cases the generated token can...
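For readers who haven't run this kind of analysis, below is a minimal logit-lens-style sketch of logit attribution, assuming GPT-2 via Hugging Face transformers (the post itself may use a different model and a more fine-grained method): project each layer's residual stream through the final layer norm and the unembedding, and check which token each layer would predict.

```python
# Minimal logit-lens-style attribution sketch. Assumptions: GPT-2 via
# Hugging Face transformers; the post may use a different model/method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("0 1 2 to 2 1 0, 1 2 3 to", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the residual stream: the embedding output plus the
# output of each block (the last entry already has the final layer norm
# applied, so re-applying ln_f there is a harmless approximation).
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    top_id = int(logits.argmax())
    print(f"layer {layer:2d}: top next token = {tok.decode([top_id])!r}")
```

This shows at which depth the model settles on its output token; finer-grained attribution (splitting contributions by attention heads and MLPs) follows the same project-through-the-unembedding idea.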
I described the problem of analogy-making interpretability in the previous post: given examples of transformed sequences of numbers, what is the mechanism behind figuring this transformation out and applying it correctly to the incomplete (test) sequence? Prompt: "0 1 2 to 2 1 0, 1 2 3 to 3...
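To make the quoted prompt format concrete, here is a small sketch of how such reversal prompts can be generated; the continuation past the quoted fragment ("3...") is my assumption that the pattern simply repeats.

```python
# Sketch of the number-sequence analogy prompt quoted above. Assumption: the
# prompt continues the "a b c to c b a" reversal pattern past the quoted text.
def make_reversal_prompt(n_examples: int = 2, seq_len: int = 3) -> str:
    parts = []
    for start in range(n_examples + 1):
        seq = [str(start + i) for i in range(seq_len)]
        if start < n_examples:
            parts.append(" ".join(seq) + " to " + " ".join(reversed(seq)))
        else:
            parts.append(" ".join(seq) + " to")  # incomplete test sequence
    return ", ".join(parts)

print(make_reversal_prompt())
# -> "0 1 2 to 2 1 0, 1 2 3 to 3 2 1, 2 3 4 to"
```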
Can LLMs make analogies? Yes, according to tests done by Melanie Mitchell a few years back: GPT-3 is quite decent at "Copycat" letter-string analogy-making problems (a classic example: if abc changes to abd, what does ijk change to?). Copycat was invented by Douglas Hofstadter in the 80s as a very simple "microworld" that would capture some key aspects of human analogy reasoning....
I’m starting to learn about mechanistic interpretability, and I’m seeing lots of great visualizations of transformer internals, but somehow I’ve never seen a whole large model’s internal state shown at once, in a single image. So I made this visualization for Llama-2-7B. Attention matrices are on the left, in 32 rows...
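As a rough sketch of how the attention half of such an image can be collected and tiled (assuming Hugging Face transformers and access to the gated meta-llama/Llama-2-7b-hf checkpoint; the original post's exact layout and tooling aren't reproduced here):

```python
# Sketch: tile all 32 layers x 32 heads of Llama-2-7B attention into one grid.
# Assumptions: HF transformers with eager attention (needed so attention
# weights are returned), and enough RAM to hold the fp16 model.
import torch
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, attn_implementation="eager"
)

inputs = tok("0 1 2 to 2 1 0, 1 2 3 to", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one [batch, n_heads, seq, seq] tensor per layer;
# one subplot row per layer, one column per head.
n_layers, n_heads = len(out.attentions), out.attentions[0].shape[1]
fig, axes = plt.subplots(n_layers, n_heads, figsize=(n_heads, n_layers))
for l in range(n_layers):
    for h in range(n_heads):
        axes[l, h].imshow(out.attentions[l][0, h].float().numpy())
        axes[l, h].axis("off")
fig.savefig("attention_grid.png", dpi=150)
```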
There are many apps for blocking distracting websites: freedom.to, LeechBlock, SelfControl, Cold Turkey, just to name a few. They are useful for maintaining focus, avoiding procrastination, and curbing addictive web surfing, and they work well for blocking a short list of distracting websites. For me, this is not enough, because I’m...