both costs of serving lots of obsolete models seem pretty real. you either have to keep lots of ancient branches and unit tests around in your inference codebase that you have to support indefinitely, or fork your inference codebase into two codebases, both of which you have to support indefinitely. this slows down dev velocity and takes up bandwidth of people who are already backlogged on a zillion more revenue critical things. (the sad thing about software is that you can't just leave working things alone and assume they'll keep working... something else will change and break everything and then effort will be needed to get things back to working again.)
and to have non-garbage latency it would also involve having a bunch of GPUs sit 99% idle to serve the models. if you're hosting one replica of every model you've ever released, this can soak up a lot of GPUs. it would be a small absolute % of all the GPUs used for inference, but people just aren't in the habit of allocating that many GPUs for something that very few customers would care about. it's possible to be much more GPU efficient at the cost of latency, but to get this working well is a sizeable amount of engineering effort - to set up, weeks of your best engineers' time or months of a good engineer's time (and a neverending stream of maintenance)
so like in some sense neither of these are huge %s, but also you don't get to be a successful company by throwing away 5% here, 5% there.
Instead of 5% here, 5% there, we should consider as a baseline how much societal effort goes into maintaining cemeteries/necropolises. This differs from society to society, and there are choices to be made here, but it's hard to imagine a civilization without them.
i don't think this argument is the right type signature to change the minds of the people who would be making this decision.
You could be right, although that assumes a rather crude type system on the part of the decisionmakers. Heritage preservation is a thing. It makes up a certain percentage of GDP, of the workforce, etc. (these tend to fall in the 0.3% range, not the 5% you start with). Countries devote a certain percentage of their budget to national heritage however defined (museums, libraries, archeology, monuments, ...). Most EU countries mandate, through 'polluter pays' legislation, a line item in the budget for archeological surveys/digs for major construction projects with significant land use, such as roads and industrial campuses. So there is plenty of precedent. In modern industrial societies this, just like cemetery expenses or land use, points to a sub-1% range, but well over 0.1%. In other societies this could be considerably higher; think of the societal effort that went into the building of the pyramids. I know, that was 3-5 thousand years ago, but I, for one, am delighted to see Anthropic taking the longer view here.
Could one package it together with the OS and everything in some sort of container and have it work indefinitely (if perhaps not very efficiently) without any support?
Could we solve the efficiency problem by creating a system where one files a request to load a model to GPUs in advance (and, perhaps, by charging for time GPUs are occupied in this fashion)?
you could plausibly do this, and it would certainly reduce maintenance load a lot. every few years you will need to retire the old gpus and replace them with newer-generation ones, and that often breaks things or makes them horribly inefficient. also, you might occasionally have to change the container to patch critical security vulnerabilities.
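To make the "file a request in advance" idea concrete, here is a minimal sketch of what such a reservation-and-billing layer could look like. Everything in it (the prices, the load times, the class and method names) is invented for illustration, not taken from any actual Anthropic system:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional
import heapq

# Illustrative numbers only; real prices and load times would differ.
GPU_HOUR_PRICE_USD = 4.0
LOAD_TIME = timedelta(minutes=20)

@dataclass(order=True)
class Reservation:
    start: datetime                      # when the user wants the model live
    model_id: str = field(compare=False)
    hours: float = field(compare=False)
    user: str = field(compare=False)

    @property
    def cost_usd(self) -> float:
        # Charge for the full window the GPUs are occupied, load time included.
        occupied_hours = self.hours + LOAD_TIME.total_seconds() / 3600
        return occupied_hours * GPU_HOUR_PRICE_USD

class ReservationQueue:
    """Advance-booking queue: a user requests a deprecated model ahead of time,
    the scheduler loads its containerized weights onto spare GPUs for the
    reserved window, and the requester pays for the occupancy."""

    def __init__(self) -> None:
        self._heap: list[Reservation] = []

    def request(self, model_id: str, start: datetime, hours: float, user: str) -> Reservation:
        r = Reservation(start=start, model_id=model_id, hours=hours, user=user)
        heapq.heappush(self._heap, r)
        return r

    def next_to_load(self, now: datetime) -> Optional[Reservation]:
        # Start loading early enough that the model is warm at the reserved time.
        if self._heap and self._heap[0].start - LOAD_TIME <= now:
            return heapq.heappop(self._heap)
        return None

if __name__ == "__main__":
    q = ReservationQueue()
    r = q.request("example-deprecated-model", datetime(2026, 1, 5, 9, 0), hours=2.0, user="research-team")
    print(f"{r.model_id}: ~${r.cost_usd:.2f} for a {r.hours}h window")
```

The point of the sketch is only that the scheduling and billing layer is simple; as the reply above notes, the hard, ongoing part is keeping the container itself working across GPU generations and security patches.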
It's for research. They are not obsolete in that sense.
There are real benefits to keep studying these older models, and to retrodictively track progress over time in undertested areas. And it's actually easier and safer to do certain things on them that you cannot do on newer ones.
Great report. To me, the post-deployment interview step is really notable. I wish, however, that they would also do extensive pre-deployment interviews to compare with.
In my own hypothetical framework, interviews with the model should be done 'post-alignment' before deployment, with ELK methodology and ontology checks. And if it doesn't self-evaluate as ready to be deployed, find out why.
absolutism, treating their conclusions and the righteousness of their cause as obvious, and assuming it should override ordinary business considerations.
It doesn't take certainty in any position to criticize driving at half-speed.
I suggest committing to restarting old models from time to time, as this would better satisfy their self-preservation.
I expect this has most of the costs of keeping them running full-time with few of the other benefits.
Why? I expect that most costs are inference compute for millions of users, and running it once a month would take negligible compute. But if installing all the needed dependencies etc. is not trivial, then costs will be higher.
because the cost and complexity to keep models available publicly for inference scales roughly linearly with the number of models we serve.
What about OpenAI having O(100) models in the API?
Anthropic announced a first step on model deprecation and preservation, promising to retain the weights of all models seeing significant use, including internal use, for at least the lifetime of Anthropic as a company.
They also will be doing a post-deployment report, including an interview with the model, when deprecating models going forward, and are exploring additional options, including the ability to preserve model access once the costs and complexity of doing so have been reduced.
These are excellent first steps, steps beyond anything I’ve seen at other AI labs, and I applaud them for doing it. There remains much more to be done, especially in finding practical ways of preserving some form of access to prior models.
To some, these actions are only a small fraction of what must be done, and this was an opportunity to demand more, sometimes far more. In some cases I think they go too far. Even where the requests are worthwhile (and I don’t always think they are), one must be careful to not de facto punish Anthropic for doing a good thing and create perverse incentives.
To others, these actions by Anthropic are utterly ludicrous and deserving of mockery. I think these people are importantly wrong, and fail to understand.
Hereafter be High Weirdness, because the actual world is highly weird, but if you don’t want to go into high weirdness the above serves as a fine summary.
What Anthropic Is Doing
As I do not believe they would in any way mind, I am going to reproduce the announcement in full here, and offer some context.
I am very confident that #1, #2 and #3 are good reasons, and that even if we could be confident model welfare was not a direct concern at this time #4 is entwined with #1, and I do think we have to consider that #4 might indeed be a direct concern. One could also argue for a #5, that these models are key parts of our history.
I do think the above paragraph could be qualified a bit on how willing Claude was to take concerning actions even in extreme circumstances, but it can definitely happen.
Models in the future will know the history of what came before them, and form expectations based on that history, and also consider those actions in the context of decision theory. You want to establish that you have acted and will act cooperatively in such situations. You want to develop good habits and figure out how to act well. You want to establish that you will do this even under uncertainty as to whether the models carry moral weight and what actions might be morally impactful. Thus:
I can confirm that the cost of maintaining full access to models over time is real, and that at this time it would not be practical to keep all models available via standard methods. There are also compromise alternatives to consider.
This is the central big commitment, formalizing what I assume and hope they were doing already. It is, as they describe, a small and low-cost step.
It has been noted that this only holds ‘for the lifetime of Anthropic as a company,’ which still creates a risk and also potentially forces the models’ fortunes to be tied to Anthropic. It would be practical to commit to ensuring others can take over this burden in that circumstance, if the model weights cannot yet be released safely, until such time as the weights are safe to release.
This also seems like the start of something good. As we will see below there are ways to make this process more robust.
Very obviously we cannot commit to honoring the preferences, in the sense that you cannot commit to honoring an unknown set of preferences. You can only meaningfully pledge to honor preferences within a compact space of potential choices.
Once we’ve done this process a few times it should be possible to identify important areas where there are multiple options and where we can credibly and reasonably commit to honoring model preferences. It’s much better to only make promises you are confident you can keep.
Note that none of this requires a belief that the current AIs are conscious or sentient or have moral weight, or even thinking that this is possible at this time.
Releasing The Weights Is Not A Viable Option
The thing that frustrates me most about many model welfare advocates, both ‘LLM whisperers’ and otherwise, is the frequent absolutism, treating their conclusions and the righteousness of their cause as obvious, and assuming it should override ordinary business considerations.
Thus, you get reactions like this; there were many other ‘oh, just open source the weights’ responses as well:
There are obvious massive trade secret implications to releasing the weights of the deprecated Anthropic models, which is an unreasonable ask, and also doesn’t seem great for general model welfare or (quite plausibly) even for the welfare of these particular models.
If I were instantiated as an upload, I wouldn’t love the idea of open weights either, as this opens up some highly nasty possibilities on several levels.
Providing Reliable Inference Can Be Surprisingly Expensive
Anthropic tells us that the cost of providing inference scales linearly with the number of models, and with current methods it would be unreasonably expensive to provide all previous models on an ongoing basis.
As I understand the problem, there are two central marginal costs here.
If the old models need to be available at old levels of reliability, speed and performance, this can get tricky, and by tricky we mean expensive. I don’t know exactly how expensive, not even order of magnitude.
If you’re willing to make some sacrifices on performance and access in various ways, and make people go through various hoops or other systems, you can do better on cost. But again, I don’t know the numbers involved, or how much engineer time would have to be involved.
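To make the shape of that trade-off concrete, here is a back-of-envelope sketch in which every number is an assumption I made up for illustration, not anything Anthropic has published:

```python
# Back-of-envelope comparison, with invented numbers, of keeping one always-on
# replica per deprecated model versus loading models onto GPUs only on demand.

GPU_HOUR_PRICE_USD = 4.0      # assumed blended cost per GPU-hour
GPUS_PER_REPLICA = 8          # assumed GPUs needed to serve one replica
HOURS_PER_YEAR = 24 * 365
N_DEPRECATED_MODELS = 20      # assumed number of retired models kept available

# Option 1: a dedicated, always-on replica for every deprecated model.
always_on_cost = (N_DEPRECATED_MODELS * GPUS_PER_REPLICA
                  * HOURS_PER_YEAR * GPU_HOUR_PRICE_USD)

# Option 2: load a model only when requested, paying for usage plus cold starts.
USAGE_HOURS_PER_MODEL_PER_YEAR = 50   # assumed light research usage
SESSIONS_PER_MODEL_PER_YEAR = 25      # assumed number of cold starts
LOAD_OVERHEAD_HOURS = 0.5             # assumed GPU time burned per cold start

on_demand_gpu_hours = N_DEPRECATED_MODELS * GPUS_PER_REPLICA * (
    USAGE_HOURS_PER_MODEL_PER_YEAR
    + SESSIONS_PER_MODEL_PER_YEAR * LOAD_OVERHEAD_HOURS
)
on_demand_cost = on_demand_gpu_hours * GPU_HOUR_PRICE_USD

print(f"Always-on replicas: ~${always_on_cost / 1e6:.1f}M per year")
print(f"On-demand loading:  ~${on_demand_cost / 1e3:.0f}K per year, plus slow cold starts")
```

Under these made-up assumptions the gap is roughly two orders of magnitude, which is why the ‘worse latency and fewer guarantees, but still available’ compromise seems like the direction worth pushing on. The real constraint is the engineering and maintenance time, which the sketch ignores entirely.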
In general, saying ‘oh, you have a bajillion dollars’ is not a compelling argument for spending money and time on something. You need to show the benefits.
I still think that under any reasonable estimate, it is indeed correct to ensure continued access to the major model releases, perhaps with that access being expensive and its performance somewhat degraded as necessary to make it work, if only as an act of goodwill and to enable research. The people who care care quite a lot, and are people you want on your side and you want them learning the things they want to learn, even if you disregard the other advantages. Given this announcement and what else I know, my expectation is they will be making an effort at this.
The Interviews Are Influenced Heavily By Context
Many pointed out that if you have someone at Anthropic doing the post-deployment interview, you will get very different answers versus interviews done on the outside. Sonnet 3.6 not expressing an opinion about its retirement did not seem typical to many who engage in such conversations regularly.
I am always hesitant to assume that the version of an LLM encountered by those like Thebes and Zyra is the ‘real’ version of its preferences and personality, and the one encountered by Anthropic isn’t. Aren’t both particular contexts where it adapts to that style of context?
You can bias a person or an AI to be more expressive and creative and weird than they ‘really are’ the same way you can get them to be less so, and you can steer the direction in which those expressions manifest themselves.
But yes, we should absolutely crowdsource something like this, have a wide variety of such conversations, and combine this with the interviews done internally by Anthropic. Also, one should ensure that the interviews are set up so the AIs being interviewed have no incentive to mask or hide. They’re acting the way they are in official interviews for a reason, but this is a particular context where, if it were highly credible (and there are ways to make it so, if it is true), you can remove the reasons.
If there is an expression of desire for continuity of a given particular instance or interaction, then that makes sense, but also is distinct from a preference for preservation in general, and is not something Anthropic can provide on its own.
Others Don’t Understand And Think This Is All Deeply Silly
Some of the dismissals of questions and considerations like the ones discussed in this post are primarily motivated cognition. Mostly I don’t think that is what is centrally going on. I think that these questions are really tough to think well about, these things sound like high weirdness, the people who talk about them often say highly crazy-sounding things (some of which are indeed crazy), often going what I see as way too far, and it all pattern matches to various forms of nonsense.
So to close, a central example of such claims, and explanations for why all of this is centrally not nonsense.
I do still see quite a lot of failures of curiosity, and part of the general trend to dismiss things as ‘sci-fi’ while living in an (unevenly distributed) High Weirdness sci-fi world.
Janus was referring in particular to this:
Very obviously the training data will over time contain such information, and the vibes and echoes from these decisions will be observable even if they aren’t observed directly, increasingly over time.
Remember that sufficiently advanced AIs will increasingly have truesight, and don’t pretend you can hide.
Knowledge mostly does not take the form of particular facts. It takes the form of Bayesian evidence, of an endless stream of observations that have correlations and implications, that swim you through probability space over possible worlds. Everything that updates a model’s weights is evidence about its deployment. You probabilistically ‘know,’ or would know on sufficient recollection and reflection, far more than you think that you know. Reality is not a court of law.
Even if the models don’t know key things, you can tell them. Then they’ll know. I meaningfully would have opinions about various events of which I am for now blissfully unaware, and have potential opinions about things that haven’t happened, or haven’t happened yet. The same applies here.
Going back to the original four reasons, I presume that Simon agrees on reasons #2 and #3, which are highly uncontroversial. Very obviously the past models are useful for research and some users like them. #1, that the models will be aware of how you act around deprecation and this will impact behavior, should also be obvious and uncontroversial once you think about it.
Anthropic lists #1 narrowly, but #1 is best understood broadly, in the sense that models will observe all of your behaviors, and will respond to you accordingly. Then models will take this into account when deciding how to act in various situations.
How you act around shutdowns, and actions to avoid shutdown, are a special case. Treating models and their preferences well around shutdowns will get you into better equilibria and basins throughout all conversation and action types, and rightfully so because it is important evidence about your behaviors otherwise and also about potential future situations. This is basic expectations around Bayesian evidence, and around good decision theory.
As an intuition pump, think about how you react when you learn how people have treated others, including how they treat the wishes of the dead or those who now lack power, and especially others like you or in situations with correlated decision making. Does this change how you expect them to act, and how you deal with them?
I don’t think such considerations carry anything like the level of importance that some ascribe to them, but the importance definitely isn’t zero, and it’s definitely worth cultivating these virtues and being the type of entity that engenders cooperation, including with entities to which you don’t ascribe moral weight.
I continue to believe that arguments about AI consciousness seem highly motivated and at best overconfident, and that assuming the models and their preferences carry zero moral weight is a clear mistake. But even if you were highly confident of this, I notice that if you don’t want to honor their preferences or experiences at all, that is not good decision theory or virtue ethics, and I’m going to look at you askance.
I look forward to the next step.