leogao
Comments

Anthropic Commits To Model Weight Preservation
leogao · 2d

i don't think this argument is the right type signature to change the minds of the people who would be making this decision.

Anthropic Commits To Model Weight Preservation
leogao · 2d

you could plausibly do this, and it would certainly reduce maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.
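
to illustrate the gpu-churn failure mode concretely, here's a small sketch (it just assumes a pinned pytorch build inside the frozen container; the exact check is illustrative, not anything anthropic-specific):

```python
import torch

# sketch of why a frozen serving image still needs maintenance: a pinned
# pytorch build only ships compiled kernels for the gpu architectures it
# knew about, so swapping in a newer gpu generation can break the image
# (ignoring ptx forward-compatibility for simplicity).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    device_arch = f"sm_{major}{minor}"        # e.g. "sm_90" on an H100
    built_for = torch.cuda.get_arch_list()    # archs baked into this torch build
    if device_arch not in built_for:
        print(f"{device_arch} not in {built_for}: rebuild the container "
              f"against a newer cuda/pytorch before serving on this gpu")
```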

Anthropic Commits To Model Weight Preservation
leogao · 2d

both costs of serving lots of obsolete models seem pretty real. you either have to keep lots of ancient branches and unit tests around in your inference codebase that you have to support indefinitely, or fork your inference codebase into two codebases, both of which you have to support indefinitely. this slows down dev velocity and takes up bandwidth of people who are already backlogged on a zillion more revenue critical things. (the sad thing about software is that you can't just leave working things alone and assume they'll keep working... something else will change and break everything and then effort will be needed to get things back to working again.)

and to have non-garbage latency it would also involve having a bunch of GPUs sit 99% idle to serve the models. if you're hosting one replica of every model you've ever released, this can soak up a lot of GPUs. it would be a small absolute % of all the GPUs used for inference, but people just aren't in the habit of allocating that many GPUs for something that very few customers would care about. it's possible to be much more GPU efficient at the cost of latency, but to get this working well is a sizeable amount of engineering effort: weeks of your best engineers' time, or months of a good engineer's time, just to set up (and a neverending stream of maintenance).
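
to make the scale concrete, here's a rough back-of-envelope sketch; every number in it (number of legacy models, gpus per replica, fleet size) is made up purely for illustration:

```python
# back-of-envelope sketch of the "idle replicas" cost. all numbers are
# hypothetical and only illustrate the shape of the tradeoff.
legacy_models = 30        # hypothetical: retired models kept available
gpus_per_replica = 8      # hypothetical: one always-on low-latency replica each
utilization = 0.01        # replicas sit ~99% idle

dedicated_gpus = legacy_models * gpus_per_replica          # 240 gpus pinned
useful_gpu_hours = dedicated_gpus * 24 * utilization       # ~58 useful gpu-hours/day

fleet_size = 50_000       # hypothetical: total inference fleet
fleet_fraction = dedicated_gpus / fleet_size               # ~0.5% of the fleet

print(f"{dedicated_gpus} gpus pinned ({fleet_fraction:.1%} of fleet), "
      f"~{useful_gpu_hours:.0f} useful gpu-hours/day out of {dedicated_gpus * 24}")
```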

so like in some sense neither of these are huge %s, but also you don't get to be a successful company by throwing away 5% here, 5% there.

leogao's Shortform
leogao · 3d

I mean, even in the Emmett Till Arlington case, which is what I assume you're referring to, it seems really hard for his staff members to have known, without the benefit of hindsight, that this was any significant window into his true beliefs? johnson is famously good at working himself up into appearing to genuinely believe whatever is politically convenient at the moment, and he briefly miscalculated the costs of supporting civil rights in this case. his apparent genuineness here doesn't seem like strong evidence.

leogao's Shortform
leogao · 4d

so it sounds like there's basically no way anyone could have known that johnson would actually be a pro-civil-rights president, and that all the civil rights people who were opposed to the 1957 bill at the time were basically opposed for the right reasons? like basically everything we know about johnson as of 1960 suggests that he is telling everyone what they want to hear, and it's unclear whether he has any convictions of his own except for his strong track record of defending the interests of the south.

Why Is Printing So Bad?
leogao · 7d

idk, i don't print stuff that often, but printing mostly just works for me. it's not always smooth sailing, but it's not any less smooth sailing than anything else i deal with when running ML experiments.

leogao's Shortform
leogao · 10d

i mean like writing kernels or hill climbing training metrics is viscerally fun even separate from any of the status parts. i know because long before any of this ai safety stuff, before ai was such a big deal, i would do ML stuff literally purely for fun without getting paid or trying to achieve glorious results or even publishing it anywhere for anyone else to see.

1a3orn's Shortform
leogao · 11d

I think trying to win the memetic war and trying to find the truth are fundamentally at odds with each other, so you have to find the right tradeoff. fighting the memetic war actively corrodes your ability to find the truth. this is true even if you constrain yourself to never utter any knowing falsehoods - even just arguing against the bad arguments over and over again calcifies your brain and makes you worse at absorbing new evidence and changing your mind. conversely, committing yourself to finding the truth means you will get destroyed when arguing against people whose only goal is to win arguments.

leogao's Shortform
leogao · 11d · Ω

the premise that i'm trying to take seriously for this thought experiment is, what if the "claude is really smart and just a little bit away from agi" people are totally right, so that you just need to dial up capabilities a little bit more rather than a lot more, and then it becomes very reasonable to say that claude++ is about as aligned as claude. 

(again, i don't think this is a very likely assumption, but it seems important to work out what the consequences of this set of beliefs being true would be)

or at least, conditional on (a) claude is almost agi and (b) claude is mostly aligned, it seems like quite a strong claim to say "claude++ crosses the agi (= can kick off rsi) threshold at basically the same time it crosses the 'dangerous-core-of-generalization' threshold, so that's also when it becomes super dangerous." it's way stronger a claim than "claude is far away from being agi, we're going to make 5 breakthroughs before we achieve agi, so who knows whether agi will be anything like claude." or, like, sure, the agi threshold is a pretty special threshold, so it's reasonable to privilege this hypothesis a little bit, but when i think about the actual stories i'd tell about how this happens, it just feels like i'm starting from the bottom line first, and the stories don't feel like the strongest part of my argument.

(also, i'm generally inclined towards believing alignment is hard, so i'm pretty familiar with the arguments for why aligning current models might not have much to do with aligning superintelligence. i'm not trying to argue that alignment is easy. or like i guess i'm arguing X->alignment is easy, which if you accept it, can only ever make you more likely to accept that alignment is easy than if you didn't accept the argument, but you know what i mean. i think X is probably false but it's plausible that it isn't and importantly a lot of evidence will come in over the next year or so on whether X is true)

1a3orn's Shortform
leogao · 12d

I agree. I think spending all of one's time thinking about and arguing with weakman arguments is one of the top reasons why people get set in their ways and stop tracking the truth. I aspire not to do this.

Posts

My takes on SB-1047 · 151 karma · 1y · 8 comments
Scaling and evaluating sparse autoencoders · Ω · 112 karma · 1y · 6 comments
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · Ω · 55 karma · 2y · 5 comments
Shapley Value Attribution in Chain of Thought · Ω · 106 karma · 3y · 7 comments
[ASoT] Some thoughts on human abstractions · Ω · 42 karma · 3y · 4 comments
Clarifying wireheading terminology · Ω · 67 karma · 3y · 6 comments
Scaling Laws for Reward Model Overoptimization · Ω · 103 karma · 3y · 13 comments
How many GPUs does NVIDIA make? · Q · 27 karma · 3y · 2 comments
Towards deconfusing wireheading and reward maximization · Ω · 81 karma · 3y · 7 comments
Humans Reflecting on HRH · Ω · 27 karma · 3y · 4 comments