Comments

adzcai · 1y · 30

Regarding "GD++": this is almost identical to the dynamics you'd expect when doing gradient descent on linear regression. See p 10 of these lecture notes for an explanation.

Granted, here they're applying this linear transformation to the input data rather than as an operator on the weights, but my intuition says there must be some connection here: it's "removing" (part of) the component of $x$ that can be represented as a linear combination of the data. (Apologies for a half-formed response; happy to hear any connections others make.)
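To make the parallel concrete, here's a minimal NumPy sketch. The toy data, the step sizes, and my reading of GD++ as preconditioning the inputs with $I - \gamma X^\top X$ are all assumptions on my part; the point is just that the same matrix that drives the weight dynamics under gradient descent is the one GD++ applies to the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 4
X = rng.normal(size=(n, d))                  # rows are data points
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

eta = 0.01                                   # illustrative step size

# Gradient descent on the least-squares loss L(w) = ||X w - y||^2 / 2:
#   w <- w - eta * X^T (X w - y) = (I - eta * X^T X) w + eta * X^T y
M = np.eye(d) - eta * (X.T @ X)              # the shared linear operator

w = np.zeros(d)
for _ in range(500):
    w = M @ w + eta * (X.T @ y)

# GD++-style step, as I understand it: the *inputs* get hit by the same
# kind of operator, x_i <- (I - gamma * X^T X) x_i, with gamma in place
# of eta -- i.e. it shrinks directions the data already spans.
gamma = 0.01
X_pp = X @ (np.eye(d) - gamma * (X.T @ X))
```

In the eigenbasis of $X^\top X$, both updates scale the component along the eigendirection with eigenvalue $\lambda_i$ by a factor of roughly $(1 - \eta \lambda_i)$, which is presumably why the two dynamics look so similar.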


adzcai · 1y · 51

What do you see as the key differences between this and research in (theoretical) neuroscience? It seems to me that the goals you've mentioned are essentially those of that field: to interpret human brain circuitry, often by modelling neural circuits with artificial neural networks. For example, see research like "Correlative Information Maximization Based Biologically Plausible Neural Networks for Correlated Source Separation".

adzcai · 1y · 41

Or even better, finetuning an LLM to automate writing the code!
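For concreteness, here's a minimal sketch of the kind of supervised fine-tuning I have in mind, using Hugging Face transformers. Everything specific is a placeholder assumption: the base model ("gpt2"), the two toy prompt-to-code pairs, and the hyperparameters.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token        # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical training pairs: natural-language spec -> code.
examples = [
    {"text": "# Task: add two numbers\ndef add(a, b):\n    return a + b"},
    {"text": "# Task: square a number\ndef square(x):\n    return x * x"},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codegen-ft", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    # mlm=False gives standard causal-LM (next-token) training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice you'd want a real dataset of spec-to-code pairs and a code-pretrained base model, but the training loop itself wouldn't change.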