Along the lines of your artist example, I find the case of musical instruments to be a nice intuition pump.
An instrument is a deeply empowering technology! The human voice can imitate a vast range of sounds, but doing so is very hard, and composing with just your voice (in the way one can with access to, and practice on, a piano) is, I imagine, impossible.
Another important facet of this example is that working directly with waveforms, via a programming language or even an interface (e.g. that of a DAW), is universal but not empowering in the same way!
I think of this example as one instance of a broader class in which the interface is optimized for rich human interaction. One can imagine that in certain worlds, interfaces instead become increasingly optimized for AI interaction; for example, future AIs will likely disprefer GUIs, etc.
Formalizing what is meant by good vs bad interfaces may be another way to get useful notions of empowerment.
Good question, good overview!
Minor note on the last point, which seems like a good idea: human oversight failures take a number of forms. The proposed type of red-teaming probably catches a lot of them, but it will focus on easy-to-operationalize / expected failure modes, and it ignores the institutional incentives that make oversight fail even when it could succeed, such as unwillingness to respond due to liability concerns and slow response to correctly identified failures. (See our paper and poster at AIGOV 2026 at AAAI.)
People want to measure and track gradual disempowerment. One issue with a lot of the proposals I've seen is that they don't distinguish between empowering and disempowering uses of AI. If everyone is using AI to write all of their code, that doesn't necessarily mean they are disempowered (in an important sense). And many people will look at this as a good thing -- the AI is doing so much valuable work for us!
It generally seems hard to find metrics of AI adoption that clearly track disempowerment; I think we may need to work a bit harder to interpret them. One idea is to augment such metrics with other sources of evidence, e.g. social science studies (such as interviews) of the people using AI in that way/sector/application/etc.
We can definitely try using formal notions of empowerment/POWER (cf https://arxiv.org/abs/1912.01683). Note that these notions are not necessarily appropriate as an optimization target for an AI agent. If an AI hands you a remote control to the universe but doesn't tell you what the buttons do, you aren't particularly empowered.
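To make that a bit more concrete, here is a minimal sketch (my own toy example, not the linked paper's exact formalism) of estimating POWER in roughly that sense: the average optimal value a state offers across a distribution of reward functions. The 4-state MDP, the uniform reward distribution, and the function names are purely illustrative assumptions.

```python
import numpy as np

# Toy deterministic MDP (hypothetical): transitions[s][a] = next state.
# State 3 is a dead end: every action self-loops.
transitions = {0: [1, 2], 1: [0, 3], 2: [0, 2], 3: [3, 3]}
gamma = 0.9

def optimal_value(rewards, start, horizon=50):
    """Optimal discounted return from `start`, via finite-horizon value iteration."""
    v = np.zeros(len(rewards))
    for _ in range(horizon):
        v = np.array([rewards[s] + gamma * max(v[s2] for s2 in transitions[s])
                      for s in range(len(rewards))])
    return v[start]

def power(start, n_samples=2000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of POWER: mean optimal value over uniform-[0,1] state rewards."""
    return float(np.mean([optimal_value(rng.uniform(size=4), start)
                          for _ in range(n_samples)]))

# The dead-end state should come out less "powerful" than the flexible one.
print(power(0), power(3))
```

(The paper's formal definition normalizes differently and excludes the current state's reward; this is just meant to convey the flavor: the dead-end state scores lower because it forecloses options, which is also why such a quantity is dangerous as a direct optimization target for the AI rather than a diagnostic for us.)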
People could also be considered more disempowered if:
- They are "rubber-stamping" their approval of AI decisions