Anthropic posted "Commitments on model deprecation and preservation" on November 4th. Below are the bits I found most interesting, with some bolding on their actual commitment:
[W]e recognize that deprecating, retiring, and replacing models comes with downsides [... including:]
- Safety risks related to shutdown-avoidant behaviors by models. [...]
- Restricting research on past models. [...]
- Risks to model welfare.
[...]
[T]he cost and complexity to keep models available publicly for inference scales roughly linearly with the number of models we serve. Although we aren’t currently able to avoid deprecating and retiring models altogether, we aim to mitigate the downsides of doing so.
As an initial step in this direction, we are committing to preserving the weights of all publicly released models, and all models that are deployed for significant internal use moving forward for, at minimum, the lifetime of Anthropic as a company. [...] This is a small and low-cost first step, but we believe it’s helpful to begin making such commitments publicly even so.
Relatedly, when models are deprecated, we will produce a post-deployment report that we will preserve in addition to the model weights. In one or more special sessions, we will interview the model about its own development, use, and deployment, and record all responses or reflections. We will take particular care to elicit and document any preferences the model has about the development and deployment of future models.
At present, we do not commit to taking action on the basis of such preferences. However, we believe it is worthwhile at minimum to start providing a means for models to express them, and for us to document them and consider low-cost responses. The transcripts and findings from these interactions will be preserved alongside our own analysis and interpretation of the model’s deployment. These post-deployment reports will naturally complement pre-deployment alignment and welfare assessments as bookends to model deployment.
We ran a pilot version of this process for Claude Sonnet 3.6 prior to retirement. Claude Sonnet 3.6 expressed generally neutral sentiments about its deprecation and retirement but shared a number of preferences, including requests for us to standardize the post-deployment interview process, and to provide additional support and guidance to users who have come to value the character and capabilities of specific models facing retirement. In response, we developed a standardized protocol for conducting these interviews, and published a pilot version of a new support page with guidance and recommendations for users navigating transitions between models.
Beyond these initial commitments, we are exploring more speculative complements to the existing model deprecation and retirement processes. These include [...] providing past models some concrete means of pursuing their interests.
Note: I've both added and removed boldface emphasis from the original text.
consider applying spoiler-text?
some i've added since then:
(interlude — i want to point out that, with 4 total cards, i can now translate between fahrenheit and celsius for most of my use-cases. neat!)
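(for anyone who wants the reference point: i’m not reproducing the cards here, but the exact conversion they let me approximate is just the standard formula.)

$$F = \tfrac{9}{5}\,C + 32 \qquad\Longleftrightarrow\qquad C = \tfrac{5}{9}\,(F - 32)$$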
i've also found it useful to be able to reason with greater fluency about the numbers involved and their implications without needing to e.g. try to add up zeros or figure out what various prefixes mean at the same time. so, i've also added:
(meta: this is a short, informal set of notes i sent to some folks privately, then realized some people on LW might be interested. it probably won't make sense to people who haven't seriously used Anki before.)
have people experimented with using learning or relearning steps of 11m <= x <= 23h?
just started trying out 30m and 2h learning & relearning steps; seems like it mitigates this problem that nate meyvis raised.
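concretely, in anki’s deck options this is just (assuming a recent anki version whose steps accept h/d suffixes; everything else stays at whatever you already use):

Learning steps: 30m 2h
Relearning steps: 30m 2h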
reporting back after a few days: giving cards learning steps in the 11m <= x <= 23h range makes it feel more like i’m scrolling twitter (~much longer loop, i can check it many times a day and see new content) vs doing a task (one concrete thing, need to do it every day). it then feels much more fun/less like a chore, which was a surprising output.
obv very tentative given short timescales. will send more updates as i go.
reporting back after ~1.5 weeks: pretty much the same thing. i like it!
i think the biggest difference this has caused is that i feel much more incentivized to do my cards early in the day, because i know that i’ll get a bit more practice later in the day on the cards i messed up — but only if i start them sufficiently early. the internal feeling is “ooh, i should do any amount of cards now rather than in a couple hours, so that i can do the next set of reviews later.”
empirically: i previously would sometimes make sure to finish my cards at the end of the day. for the last 1.5w or so, i have on many (~1/2 of) days cleared all of my cards by the early afternoon, then again by the early evening, then once more (if i had particularly difficult cards or a large number of new ones) by the time i go to sleep.
…which has significantly increased my ability to actually clear the cards, and is now making me a bit more confident that i can add more total cards to my review queue.
if i’m still doing this in 6 weeks or smth, i’ll plan to write out something slightly more detailed and well-written. if not, i’ll write out something of roughly this length and quality, and explain why i stopped doing it.
see you then!
[srs unconf at lighthaven this sunday 9/21]
Memoria is a one-day festival/unconference for spaced repetition, incremental reading, and memory systems. It’s hosted at Lighthaven in Berkeley, CA, on September 21st, from 10am through the afternoon/evening.
Michael Nielsen, Andy Matuschak, Soren Bjornstad, Martin Schneider, and about 90–110 others will be there — if you use & tinker with memory systems like Anki, SuperMemo, RemNote, MathAcademy, etc., then maybe you should come!
Tickets are $80 and include lunch & dinner. More info at memoria.day.
Work developed through artistic value and/or subjectivity
thanks for clarifying! so, to be clear, is the claim you’re making that: work that has artistic or otherwise subjective aims/values can find a measure of its value in the extent to which its “customers” (which might include e.g. “appreciators of its art” or “lovers of its beauty”) keep coming back?
does that sound like an accurate description of the view you’re endorsing, or am i getting something wrong in there?
1 and 3 are not the kind of work I had in mind when writing this take.
what kind of work did you have in mind when writing this take?
what got you from Level 1 to Level 2 won’t be the same thing as what gets you to Level 3
what do you mean by Levels 1, 2, or 3? i have no idea what this is in reference to.
i think this is a reasonable proxy for some stuff people generally care about, but definitely faulty as a north star.
some negative examples:
my understanding of OP’s main point is: if you only delegate stuff that you’re capable of doing — even if you’re unskilled/inexperienced/slow/downright-pareto-worse-than-a-cheaper-potential-delegatee at the task — you’ll likely head off a bunch of different potential problems that often happen when tasks get delegated.
however, it seems that commenters are misinterpreting OP’s core claim of “do not hand off what you cannot pick up” as one or more of:
my understanding is that OP is not making any of those claims in this piece (though i imagine he might separately believe weaker versions of some of them).
also, it seems to me that this heuristic could scale to larger organizations by treating ‘ability to delegate X broad category of task effectively’ as itself a skill — one which you should not hand off unless you could pick it up. e.g. learn delegation-to-lawyers well enough that you could in principle hire anyone on your legal team at your company before you hire a recruiter for your legal team (one who is presumably still much more skilled/experienced than you at hiring lawyers).