samshap

Comments

samshap

Thanks for sharing that study. It looks like your team is already well-versed in this subject!

You wouldn't want something that's too hard to extract, but I think restricting yourself to a single encoder layer is too conservative - LLMs don't have to be able to fully extract the information from a layer in a single step.

I'd be curious to see how much closer a two-layer encoder would get to the ITO results.
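
For concreteness, here is a rough sketch of the kind of two-layer encoder I mean (PyTorch; the class name, dimensions, and module names are all illustrative rather than drawn from your codebase). The decoder stays a single linear map, so features remain directions in activation space:

```python
import torch
import torch.nn as nn

class TwoLayerEncoderSAE(nn.Module):
    """Sketch of an SAE whose encoder has one extra hidden layer."""

    def __init__(self, d_model: int, n_features: int, d_hidden: int | None = None):
        super().__init__()
        d_hidden = d_hidden or n_features
        # Two-layer encoder: more capacity to approximate the solution
        # of the inner sparse-coding problem than a single linear+ReLU.
        self.encoder = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_features),
            nn.ReLU(),
        )
        # Decoder stays a single linear map, so each feature is still
        # a direction in activation space and remains easy to interpret.
        self.decoder = nn.Linear(n_features, d_model, bias=True)

    def forward(self, x: torch.Tensor):
        f = self.encoder(x)      # sparse feature activations
        x_hat = self.decoder(f)  # reconstruction
        return x_hat, f
```

The extra encoder layer adds a modest amount of compute, but it gives the encoder strictly more capacity to approximate the codes that inference-time optimization would find.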

samshap

Here's my longer reply.

I'm extremely excited by the work on SAEs and their potential for interpretability. However, I think there is a subtle misalignment between the SAE architecture and loss function, on the one hand, and the actual desired objective, on the other.

The SAE loss function is:

$$\mathcal{L}_{\text{SAE}} = \lVert x - W_d\,f(x)\rVert_2^2 + \lambda\,\lVert f(x)\rVert_1,$$

where $\lVert\cdot\rVert_1$ is the $L_1$-norm (or the $L_0$-norm, in the $L_0$ variant), $f(x)$ is the output of the single-layer encoder, and $W_d$ is the decoder dictionary.

However, I would argue that what you are actually trying to solve is the sparse coding problem:

$$\min_{W_d}\; \mathbb{E}_x\!\left[\, \min_{f}\; \lVert x - W_d\,f\rVert_2^2 + \lambda\,\lVert f\rVert_1 \right],$$

where, importantly, the inner optimization over $f$ is solved separately (including at runtime).

Since $W_d$ is an overcomplete basis, finding the $f^*$ that minimizes the inner loop (also known as basis pursuit denoising[1]) is a notoriously challenging problem, one which a single-layer encoder is underpowered to compute. The SAE's encoder thus introduces a significant error $\epsilon(x) = f(x) - f^*(x)$, which means that your actual loss function is:

$$\mathcal{L} = \lVert x - W_d\,\big(f^*(x) + \epsilon(x)\big)\rVert_2^2 + \lambda\,\lVert f^*(x) + \epsilon(x)\rVert_1.$$

The magnitude of $\epsilon$ would have to be determined empirically, but I suspect it is large enough to be a significant source of error.
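
To put a number on it, one could compare the encoder's output against an approximate solution of the inner problem computed by an iterative solver. A rough sketch (PyTorch; ISTA here stands in for whatever basis-pursuit solver you prefer, and names like `sae.encoder` and `l1_coeff` are placeholders, not references to your codebase):

```python
import torch

def ista(x, W_d, lam, n_steps=200):
    """Approximately solve the inner sparse-coding problem
        min_f 0.5 * ||x - f @ W_d||^2 + lam * ||f||_1
    with ISTA (iterative shrinkage-thresholding).
    x: (batch, d_model); W_d: (n_features, d_model), one feature direction per row.
    """
    L = torch.linalg.matrix_norm(W_d, ord=2) ** 2  # Lipschitz constant of the smooth term
    step = 1.0 / L
    f = torch.zeros(x.shape[0], W_d.shape[0])
    for _ in range(n_steps):
        grad = (f @ W_d - x) @ W_d.T               # gradient of 0.5 * ||x - f @ W_d||^2
        f = f - step * grad
        # Soft-thresholding: proximal operator of lam * ||f||_1.
        f = torch.sign(f) * torch.clamp(f.abs() - step * lam, min=0.0)
    return f

# Hypothetical usage (illustrative names):
# f_enc  = sae.encoder(x)
# f_star = ista(x, sae.decoder.weight.T.detach(), lam=l1_coeff)
# eps    = (f_enc - f_star).norm(dim=-1)   # per-example encoder error
```

A few hundred ISTA steps per batch is slow as a training-time component, but cheap enough as an offline diagnostic of how large $\epsilon$ actually is.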

There are a few things you could do to reduce the error:

  1. Ensuring that $W_d$ obeys the restricted isometry property[2] (i.e. a cap on the cosine similarity of decoder weights), or barring that, adding a term to your loss function that at least penalizes the cosine similarities (a sketch of such a term follows this list).
  2. Adding extra layers to your encoder, so it's better at solving for $f^*$.
  3. Empirical studies to see how large the feature error is / how much reconstruction error it is adding.
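
For the first item, here is a rough sketch of the kind of penalty term I have in mind (PyTorch; `W_d` is the decoder weight matrix with one feature direction per column, and `margin` and `cos_coeff` are illustrative knobs, not existing hyperparameters):

```python
import torch

def decoder_cosine_penalty(W_d: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Penalize high pairwise cosine similarity between decoder feature directions.

    W_d:    (d_model, n_features) decoder weight, one feature direction per column.
    margin: similarities with absolute value below this threshold are not penalized.
    """
    # Normalize columns so the Gram matrix contains cosine similarities.
    dirs = W_d / W_d.norm(dim=0, keepdim=True).clamp_min(1e-8)
    cos = dirs.T @ dirs                              # (n_features, n_features)
    off_diag = cos - torch.diag(torch.diag(cos))     # drop the self-similarity terms
    return torch.clamp(off_diag.abs() - margin, min=0.0).pow(2).sum()

# Hypothetical usage inside the training step:
# loss = recon_loss + l1_coeff * f.abs().sum() + cos_coeff * decoder_cosine_penalty(W_d)
```

For very wide dictionaries the full Gram matrix can get expensive, in which case penalizing a random subset of feature pairs each step is a cheaper approximation.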


  1. ^

https://epubs.siam.org/doi/abs/10.1137/S003614450037906X

  2. ^

    http://www.numdam.org/item/10.1016/j.crma.2008.03.014.pdf

samshap

This is great work. My recommendation: add a term in your loss function that penalizes features with high cosine similarity.

I think there is a strong theoretical underpinning for the results you are seeing.

I might try to reach out directly - some of my own academic work is directly relevant here.

samshap

This is one of those cases where it might be useful to list out all the pros and cons of taking the 8 courses in question, and then think hard about which benefits could be achieved by other means.

Key benefits of taking a course (vs. independent study) beyond the signaling effect might include:

  • precommitting to learning a certain body of knowledge
  • curation of that body of knowledge by an experienced third party
  • additional learning and insight from partnerships / teamwork / office hours

But these depend on the courses and your personality. The precommitment might be unnecessary given your personal work habits, the curation might be misaligned with what you are interested in learning, and the other students or TAs may not have useful insights that you can't figure out on your own.

Hope that helps.

samshap

Instead of demanding orthogonal representations, just have them obey the restricted isometry property.

Basically, instead of requiring $\langle v_i, v_j \rangle = 0$ for every pair of distinct representation directions, we just require $|\langle v_i, v_j \rangle| \leq \epsilon$ for some small $\epsilon$.

This would allow a polynomial number of sparse shards while still allowing full recovery.
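
To spell out the condition I'm referencing (this is the standard compressed-sensing definition, in my own notation rather than anything from the post): a dictionary $W$ whose columns are the shard directions satisfies the restricted isometry property of order $k$ with constant $\delta_k$ if

$$(1 - \delta_k)\,\lVert c\rVert_2^2 \;\le\; \lVert W c\rVert_2^2 \;\le\; (1 + \delta_k)\,\lVert c\rVert_2^2$$

for every $k$-sparse coefficient vector $c$. Capping the pairwise cosine similarities at $\epsilon$ gives $\delta_k \le (k-1)\,\epsilon$, so low coherence is a simple sufficient condition when only a few shards are active at once.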

samshap

I think the success or failure of this model really depends on the nature and number of the factions. If interfactional competition gets too zero-sum (this might help us, but it helps them more, so we'll oppose it) then this just turns into stasis.

During ordinary times, vetocracy might be tolerable, but it will slowly degrade state capacity. During a crisis it can be fatal.

Even in America, we only see this factional veto in play in a subset of scenarios: legislation under divided government. Plenty of action at the executive level or in state governments proceeds without having to worry about this.

samshap

You switch positions throughout the essay, sometimes in the same sentence!

"Completely remove efficacy testing requirements" (Motte) "... making the FDA a non-binding consumer protection and labeling agency" (Bailey)

"Restrict the FDA's mandatory authority to labeling" logically implies they can't regulate drug safety, and can't order recalls of dangerous products. Bailey! "... and make their efficacy testing completely non-binding" back to Motte again.

"Pharmaceutical manufactures can go through the FDA testing process and get the official “approved’ label if insurers, doctors, or patients demand it, but its not necessary to sell their treatment." Again implies the FDA has no safety regulatory powers.

"Scott’s proposal is reasonable and would be an improvement over the status quo, but it’s not better than the more hardline proposal to strip the FDA of its regulatory powers." Bailey again!

samshap

This is a Motte and Bailey argument.

The Motte is 'remove the FDA's ability to regulate drugs for efficacy'.

The Bailey is 'remove the FDA's ability to regulate drugs at all'.

The FDA doesn't just regulate drugs for efficacy; it regulates them for safety too. This undercuts your arguments about off-label prescriptions, since those drugs were still approved by the FDA as safe.

Relatedly, I'll note you did not address Scott's point on factory safety.

If you actually want to make the hardline position convincing, you need to clearly state and defend that the FDA should not regulate drugs for safety.

samshap

The differentiation between CDT as a decision theory and FDT as a policy theory is very helpful at dispelling confusion. Well done.

However, why do you consider EDT a policy theory? It's just picking actions with the highest conditional utility. It does not model a 'policy' in the optimization equation.

Also, the ladder analogy here is unintuitive.

This doesn't make sense to me. Why am I not allowed to update on still being in the game?

I noticed that in your problem setup you deliberately removed n=6 from being in the prior distribution. That feels like cheating to me - it seems like a perfectly valid hypothesis.

After seeing the first chamber come up empty, that should definitively update me away from n=6. Why can't I update away from n=5?
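
To spell out the update I have in mind (my own notation, and I'm assuming the usual setup where $n$ of the revolver's six chambers are loaded; the likelihood term changes if the game is defined differently):

$$P(n \mid \text{first chamber empty}) \;=\; \frac{P(\text{empty} \mid n)\,P(n)}{\sum_m P(\text{empty} \mid m)\,P(m)}, \qquad P(\text{empty} \mid n) = \frac{6-n}{6}.$$

Under a uniform prior, the observation drives $P(n=6)$ to zero and shrinks $P(n=5)$ relative to smaller $n$; nothing in the math singles out $n=6$ as the only hypothesis I'm allowed to update away from.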
