LESSWRONG

leogao

Sequences
Alignment Stream of Thought

leogao's Shortform (Ω, 7 karma, 3y, 562 comments)

Comments (sorted by newest)
7 Vicious Vices of Rationalists
leogao · 10h · 52

related to contrarianism: not-invented-here syndrome. I think rationalists have a strong tendency to want to reinvent things their own way, bulldozing Chesterton's fences, or just reinventing the wheel with rationalist flavor. good in moderation, bad in excess.

leogao's Shortform
leogao · 2d · 50

i don't have a graph for it. the corresponding number is p(correct) = 0.25 at 63 elements for the one dense model i ran this on. (the number is not in the paper yet because this last result came in approximately an hour ago)

the other relevant result in the paper for answering the question of how similar our sparse models are to dense models is figure 33

leogao's Shortform
leogao · 2d · 453

creating surprising adversarial attacks using our recent paper on circuit sparsity for interpretability

we train a model with sparse weights and isolate a tiny subset of the model (our "circuit") that does this bracket counting task where the model has to predict whether to output ] or ]]. It's simple enough that we can manually understand everything about it, every single weight and activation involved, and even ablate away everything else without destroying task performance.

(this diagram is for a slightly different task because i spent an embarrassingly large number of hours making this figure and decided i never wanted to make another one ever again)

in particular, the model has a residual channel delta that activates twice as strongly when you're in a nested list. it does this by using the attention to take the mean over a [ channel, so if you have two [s then it activates twice as strongly. and then later on it thresholds this residual channel to only output ]] when your nesting depth channel is at the stronger level.

but wait. the mean over a channel? doesn't that mean you can make the context longer and "dilute" the value, until it falls below the threshold? then, suddenly, the model will think it's only one level deep!
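a toy numeric sketch of the dilution intuition (this is not the paper's actual circuit; the uniform mean-pooling, the 8-token calibration length, and the threshold are all made-up assumptions for illustration):

```python
# toy sketch: the nesting-depth feature is modeled as uniform mean-pooling
# over a "[" indicator channel, followed by a fixed decision threshold.

def nesting_channel(tokens):
    # fraction of context tokens that are "[" -- roughly 2x larger at depth 2
    # than at depth 1, for contexts of the same length
    return sum(t == "[" for t in tokens) / len(tokens)

def predict_closer(tokens, threshold=1.5 / 8):
    # hypothetical threshold calibrated for ~8-token in-distribution contexts:
    # halfway between the depth-1 (1/8) and depth-2 (2/8) channel values
    return "]]" if nesting_channel(tokens) > threshold else "]"

short_ctx = ["[", "[", "a", "b", "c", "d", "e", "f"]   # depth 2, short context
print(predict_closer(short_ctx))                        # -> "]]"

diluted = ["[", "["] + ["x"] * 100                      # same depth, much longer context
print(predict_closer(diluted))                          # -> "]" (channel diluted below threshold)
```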

it turns out that indeed, this attack works really well on the entire sparse model (not just the circuit), and you can reliably trick it. 

in retrospect, this failure is probably because extremely long nested rows are out of distribution on our specific pretraining dataset. but there's no way i would have come up with this attack by just thinking about the model.

one other worry is maybe this is just because of some quirk of weight-sparse models. strikingly, it turns out that this attack also transfers to similarly capable dense models!

Introducing faruvc.org
leogao · 5d · 71

where can i find more info on how far-uvc compares to air purifiers? is it vastly more effective, or merely quieter? the website touches on it only very briefly.

Omniscaling to MNIST
leogao · 8d · 70

I've heard good things about tiny-imagenet and fashion-mnist. even full imagenet is not that bad anymore with modern hardware.

Omniscaling to MNIST
leogao · 9d · 111

i agree it's often a good idea to do experiments at small scale before big scale, in order to get tighter feedback loops, but i think mnist in particular is probably a bad task to start with. I think you probably want a somewhat less trivial dataset.

  • mnist is way too easy to solve. like literally solvable with a logistic regression level easy. as a result, mnist models are often not doing any interesting cognition.
  • a lot of good ideas start working at some minimal scale that is higher than logistic regression. you will discard lots of good ideas because they don't work on mnist. conversely, lots of things only work on mnist and other very easy datasets like cifar.
  • i think mnist is mostly useful for, like, sanity checking that a diffusion implementation isn't fatally broken or something.
  • it is really not that much more expensive to look at a slightly more modern toy dataset. gpus like big matrices and they hate small matrices. mnist is so small that most of your computational cost is overhead (see the rough timing sketch after this list). you can still train models in seconds on slightly harder datasets.
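a rough way to sanity check the overhead claim (a minimal sketch assuming PyTorch; the shapes are arbitrary): the larger matmul below does on the order of 2000x more multiply-adds than the mnist-sized one, but on a GPU the measured per-call times will typically be far closer than that, because the small case is dominated by fixed per-launch overhead.

```python
# rough timing sketch (assumes PyTorch; falls back to CPU if no GPU is available):
# compare an mnist-sized matmul against a larger one to see how much of the
# small case is per-launch overhead rather than useful compute.
import time
import torch

def avg_matmul_time(batch, d_in, d_out, iters=100):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(batch, d_in, device=device)
    w = torch.randn(d_in, d_out, device=device)
    for _ in range(10):  # warmup
        _ = x @ w
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = x @ w
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("mnist-sized (128x784 @ 784x10):    ", avg_matmul_time(128, 784, 10))
print("bigger toy  (128x4096 @ 4096x4096):", avg_matmul_time(128, 4096, 4096))
```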
Anthropic Commits To Model Weight Preservation
leogao · 11d · 67

i don't think this argument is the right type signature to change the minds of the people who would be making this decision.

Anthropic Commits To Model Weight Preservation
leogao · 11d · 50

you could plausibly do this, and it would certainly reduce maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.

Anthropic Commits To Model Weight Preservation
leogao · 11d · 911

both costs of serving lots of obsolete models seem pretty real. you either have to keep lots of ancient branches and unit tests around in your inference codebase that you have to support indefinitely, or fork your inference codebase into two codebases, both of which you have to support indefinitely. this slows down dev velocity and takes up bandwidth of people who are already backlogged on a zillion more revenue critical things. (the sad thing about software is that you can't just leave working things alone and assume they'll keep working... something else will change and break everything and then effort will be needed to get things back to working again.)

and to have non-garbage latency, it would also involve having a bunch of GPUs sit 99% idle to serve the models. if you're hosting one replica of every model you've ever released, this can soak up a lot of GPUs. it would be a small absolute % of all the GPUs used for inference, but people just aren't in the habit of allocating that many GPUs for something that very few customers would care about. it's possible to be much more GPU-efficient at the cost of latency, but getting this working well is a sizeable amount of engineering effort: weeks of your best engineers' time to set up, or months of a good engineer's time (and a neverending stream of maintenance).

so like in some sense neither of these are huge %s, but also you don't get to be a successful company by throwing away 5% here, 5% there.

leogao's Shortform
leogao · 12d* · 21

I mean, even in the Felix Longoria Arlington case, which is what I assume you're referring to, it seems really hard for his staff members to have known, without the benefit of hindsight, that this was any significant window into his true beliefs? Johnson is famously good at working himself up into appearing to genuinely believe whatever is politically convenient at the moment, and he briefly miscalculated the costs of supporting civil rights here. his apparent genuineness in this case doesn't seem like strong evidence.

Posts

My takes on SB-1047 (151 karma, 1y, 8 comments)
Scaling and evaluating sparse autoencoders (Ω, 112 karma, 1y, 6 comments)
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision (Ω, 55 karma, 2y, 5 comments)
Shapley Value Attribution in Chain of Thought (Ω, 106 karma, 3y, 7 comments)
[ASoT] Some thoughts on human abstractions (Ω, 42 karma, 3y, 4 comments)
Clarifying wireheading terminology (Ω, 67 karma, 3y, 6 comments)
Scaling Laws for Reward Model Overoptimization (Ω, 103 karma, 3y, 13 comments)
How many GPUs does NVIDIA make? (Q, 27 karma, 3y, 2 comments)
Towards deconfusing wireheading and reward maximization (Ω, 81 karma, 3y, 7 comments)
Humans Reflecting on HRH (Ω, 27 karma, 3y, 4 comments)