Epistemic Status: Trying to clarify a confusion people outside of the AI safety community seem to have about what safety means for AI systems.
In engineering and design, there is a process that includes, among other stages, specification, creation, verification and validation, and deployment. Verification and validation are where most people focus when thinking about safety - can we make sure the system performs correctly? For AI systems, I think this focus is a conceptual error, and I want to address it.
"Verification and validation (also abbreviated as V&V) are independent procedures that are used together for checking that a product, service, or system meets requirements and specifications and that it fulfills its intended purpose." - Wikipedia
Both of these terms are used slightly differently across fields, but in general, verification is the process of making sure that the system fulfills the design requirements and/or other standards. This presupposes that the system has some defined requirements or a standard, at least an implicit one, and that it could fail to meet that bar. That is, the specification of the system includes what it must and must not do, and if the system does not do what it should, or does something that it should not, it fails.
Machine learning systems, especially language models, aren't well understood. The potential applications are varied and uncertain, entire classes of new and surprising failure modes are still being found, and we have nothing like a specification of what the system should or should not do, must or must not do, and where it can and cannot be used.
To take a very concrete example, metal rods have safety characteristics, and they might be rated for use up to some weight limit, under some specific load, in certain temperature ranges, for some amount of time. These can all be tested. If the rod does not stay within a predefined range of characteristics at a given temperature, with a given load, it fails. It can also be found to be acceptable in one temperature range but not in another, or similar. At the end of verification and validation, the rod is deemed to have passed or failed for a given application, based on what the requirements for that larger system are.
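To make the rod example concrete, here is a minimal sketch in Python of what checking against a predefined specification looks like. Everything in it is an assumption for illustration: the names `RodSpec`, `Measurement`, and `verify` are hypothetical, the numbers are invented, and real materials testing involves far more than one deflection limit. The only point is that the pass/fail question is well defined because the requirements exist before the tests are run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RodSpec:
    """Hypothetical requirements for one application (all values invented)."""
    max_load_kg: float
    min_temp_c: float
    max_temp_c: float
    max_deflection_mm: float


@dataclass(frozen=True)
class Measurement:
    """One test run: the conditions applied and the deflection observed."""
    load_kg: float
    temp_c: float
    deflection_mm: float


def in_scope(spec: RodSpec, m: Measurement) -> bool:
    """True if the test conditions fall inside what the application requires."""
    return (m.load_kg <= spec.max_load_kg
            and spec.min_temp_c <= m.temp_c <= spec.max_temp_c)


def verify(spec: RodSpec, measurements: list[Measurement]) -> bool:
    """Pass only if at least one in-scope test exists and all of them meet the limit."""
    relevant = [m for m in measurements if in_scope(spec, m)]
    if not relevant:
        return False  # no evidence against the spec is not a pass
    return all(m.deflection_mm <= spec.max_deflection_mm for m in relevant)


# The same rod, tested once at room temperature and once when hot.
measurements = [
    Measurement(load_kg=500, temp_c=20, deflection_mm=1.2),
    Measurement(load_kg=500, temp_c=350, deflection_mm=4.8),
]

# Two hypothetical applications with different temperature requirements.
indoor_spec = RodSpec(max_load_kg=500, min_temp_c=0, max_temp_c=40, max_deflection_mm=2.0)
furnace_spec = RodSpec(max_load_kg=500, min_temp_c=0, max_temp_c=400, max_deflection_mm=2.0)

indoor_ok = verify(indoor_spec, measurements)
furnace_ok = verify(furnace_spec, measurements)
```

In this sketch the same test data passes for the indoor application and fails for the furnace one, which is the "acceptable in one temperature range but not another" case. Note that `verify` cannot even be called without a `RodSpec`; that missing argument is exactly what we lack for machine learning models.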
At their best, red-teaming and safety audits of ML systems check many known failure modes and determine whether the system is susceptible to them. There is no pre-defined standard or set of characteristics that are checked, no real ability to consider application-specific requirements, and no ability to specify where the system should not or must not be used.
Until we have some safety standard for machine learning models, they aren't "partly safe," "assumed safe," or "good enough for consumer use." If we lack a standard for safety, ideally one where there is consensus that it is sufficient for a specific application, then exploration or verification of the safety of a machine learning model is meaningless. If a model is released to the public without a clear indication of what the system can safely be used for, without verification that it passed a relevant standard, and without clear instruction that it cannot be used elsewhere, it is an unsafe model. Anyone who claims otherwise seems fundamentally confused about what safety means for such systems.
Agreed - I wasn't criticizing AI safety here, I was talking about the conceptual models that people outside of AI safety have - as was mentioned in several other comments. So my point was about what people outside of AI safety think about when talking about ML models, trying to correct a broken mental model.
I did not say anything about evals and red teaming in application to AI, other than in comments where I said I think they are a great idea. And the fact that they are happening very clearly implies that there is some possibility that the models perform poorly, which, again, was the point.
Perhaps it's outdated, but it is the understanding that the reliability and systems engineers I have spoken to actually have, and it matches research I did on resilience most of a decade ago, e.g. this. And I agree that there is discussion in both older and more recent journal articles about how some firms do things in various ways that might be an improvement, but that's not the standard. And even in agile systems engineering, use cases more often supplement or exist alongside requirements; they don't replace them. Though terminology in this domain is so far from standardized that you'd need to talk about a specific company, or even a specific project's process and definitions, to have a more meaningful discussion.
I don't disagree with the conclusion, but the logic here simply doesn't work to prove anything. It implies that standards are insufficient, not that they are not necessary.