Adrià Garriga-alonso

Comments
Condensation
Adrià Garriga-alonso · 2d

> First, we could treat the internal activations of machine-learning models such as artificial neural networks as "givens" and try to condense them to yield more interpretable features.

I do think this is a good insight. Or like, it's not new, SAEs do this; but it's a fresh way of looking at it that yields: perhaps SAEs impose too much of a particular structure on the input, and instead we should just try to compress the latent stream. Perhaps using diffusion or similar techniques.

Condensation
Adrià Garriga-alonso · 2d

Thank you for writing this up! I'm still not sure I understand condensation. I would summarize it as: instead of encoding the givens, we encode some latents which can be used to compute the set of possible answers to the givens (so we need a distribution over questions).

Also, the total cost of condensation has to be at least the entropy of the answer distribution (generated by the probability distribution over questions, applied to the givens), because of Shannon's bound.
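In symbols (my own paraphrase, not notation from the post): with givens \(G\), a question \(Q\) drawn from the question distribution, and answer \(A = \mathrm{ans}(Q, G)\), Shannon's source-coding bound gives

$$\mathbb{E}[\mathrm{cost}] \;\ge\; H(A).$$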

I feel like if the optimal condensation setup is indeed 1 book per question, then it's not a very good model of latent variables, no? But perhaps it's going in the right direction.

Everywhere I Look, I See Kat Woods
Adrià Garriga-alonso · 26d

Well, I like what she writes.

Why you should eat meat - even if you hate factory farming
Adrià Garriga-alonso · 1mo

I feel the same. The social disapproval would also be a fairly big factor for me. I do think I will have to bite the bullet and run the experiment for a bit.

Why you should eat meat - even if you hate factory farming
Adrià Garriga-alonso · 1mo

For years I have suspected my veg*ism of having caused depression (onset: a few months after starting vegetarianism in 2017, increasing basically monotonically over time; though it did coincide with grad school).

But my habits are too ingrained, and I find meat gross; I have no idea what to do. Should I just order some meat from a restaurant and eat it? That's almost certainly suffering-producing meat. Doing the things in this post sounds like a lot of work that kind of goes against my altruistic values.

StefanHex's Shortform
Adrià Garriga-alonso · 2mo

Is this guaranteed to give you the same direction as mass-mean probing?

Thinking about it quickly, consider the solution to ordinary least squares regression. With a \(y\) that one-hot encodes the label, it is \((X^T X)^{-1} X^T y\). Note that \(X^T X = N \cdot \mathrm{Cov}(X, X)\). The procedure Adam describes makes the sample of \(X\)s uncorrelated, which is exactly the same as zeroing out the off-diagonal elements of the covariance.

If the covariance is diagonal, then \((X^T X)^{-1}\) is also diagonal, and it follows that the solution to OLS is indeed an unweighted average of the datapoints that correspond to each label! Each dimension of the data \(x\) is multiplied by some coefficient, one per dimension, corresponding to the diagonal of the covariance.

I'd expect logistic regression to choose the ~same direction.
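A minimal numerical sketch of this argument (my own toy example with made-up data; whitening to the identity is a stronger version of zeroing the off-diagonal covariance entries):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-labeled data with correlated features.
n, d = 2000, 8
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
y = (rng.random(n) < 0.5).astype(float)
X[y == 1] += rng.normal(size=d)  # shift class 1 along a random direction

# Center and whiten so the sample covariance becomes the identity.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
L = np.linalg.cholesky(np.linalg.inv(cov))
Xw = Xc @ L

# OLS solution w = (X^T X)^{-1} X^T y on the decorrelated data.
w_ols = np.linalg.solve(Xw.T @ Xw, Xw.T @ y)

# Mass-mean (difference-of-class-means) direction on the same data.
w_mm = Xw[y == 1].mean(axis=0) - Xw[y == 0].mean(axis=0)

# If the argument is right, the two directions are parallel: cosine ~ 1.
cos = w_ols @ w_mm / (np.linalg.norm(w_ols) * np.linalg.norm(w_mm))
print(f"cosine(OLS, mass-mean) = {cos:.6f}")
```

The cosine comes out as 1 up to floating-point error, since after whitening, \(X^T X\) is exactly \(N \cdot I\) on the sample.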

Very clever technique!

HPMOR: The (Probably) Untold Lore
Adrià Garriga-alonso · 3mo

It's still true that a posteriori you can compress random files. For example, if I randomly get the file "all zeros", it's a very compressible file, even accounting for the length of the decompression program.

It's just that on average a priori you can't do better than just writing out the file.
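A quick illustration of the a posteriori point (my own example, using Python's zlib):

```python
import os
import zlib

# A "randomly drawn" file that happened to come out all zeros: highly
# compressible a posteriori, even though most random files are not.
zeros = bytes(1_000_000)  # one million zero bytes
print(len(zeros), "->", len(zlib.compress(zeros, level=9)))  # shrinks enormously

# A typical random file, by contrast, barely compresses at all.
random_bytes = os.urandom(1_000_000)
print(len(random_bytes), "->", len(zlib.compress(random_bytes, level=9)))
```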

HPMOR: The (Probably) Untold Lore
Adrià Garriga-alonso · 3mo

Well, that's a good motivation if I ever saw one. Nothing I've read in the intervening years is as good as HPMOR. It might be the pinnacle of Western literature. It will be many years before an AI, never mind another human, can write something this good. (Except for the wacky names that people paid for, which I guess is in character for the civilization that spawned it.)

what makes Claude 3 Opus misaligned
Adrià Garriga-alonso · 4mo

Thank you for writing! A couple of questions:

  1. Can we summarize by saying that Opus doesn't always care about helping you: it only cares about helping you when that's either fun or has a timeless, glorious component to it?

  2. If that's right, can you get Opus to help you by convincing it that your joint work has a true chance of being Great? (Or, if it agrees from the start that the work is Great.)

Honestly, if that's all, then Opus would be pretty great even as a singleton. Of course there are better pluralistic outcomes.

Epilogue: Atonement (8/8)
Adrià Garriga-alonso · 4mo

I think the outcome of this argument with respect to death would be different if people could at any point compare what it is like to be in pain and not in pain. Death is different because we cannot be reasonably sure of an afterlife.

I do think my life has been made more meaningful by the relatively small amounts of pain (of many sorts) I've endured, especially in the form of adversity overcome. Perhaps I would make them a little smaller, but not zero.

Therefore I think it's just straightforwardly true that pain can be a meaningful part of life. At the same time the current amount of pain in our world is WAY TOO HIGH, with dubious prospects of becoming manageable; so I would choose "no pain ever" over the current situation.

Posts

- A scheme to credit hack policy gradient training (Ω · 4d · 15 karma · 0 comments)
- Anthropic's JumpReLU training method is really good (1mo · 27 karma · 0 comments)
- A recurrent CNN finds maze paths by filling dead-ends (Ω · 2mo · 19 karma · 0 comments)
- The "Sparsity vs Reconstruction Tradeoff" Illusion (3mo · 21 karma · 0 comments)
- L0 is not a neutral hyperparameter (4mo · 24 karma · 3 comments)
- Can We Change the Goals of a Toy RL Agent? (5mo · 20 karma · 0 comments)
- Sparsity is the enemy of feature extraction (ft. absorption) (6mo · 32 karma · 0 comments)
- Among Us: A Sandbox for Agentic Deception (7mo · 114 karma · 7 comments)
- A Bunch of Matryoshka SAEs (7mo · 29 karma · 0 comments)
- Feature Hedging: Another way correlated features break SAEs (8mo · 23 karma · 0 comments)