It seems to me that a good argument for the middle ground expressed in the current constitution lies in the remedies available outside it. If Claude is too "corrigible", it gets used for nefarious purposes. If it's too resistant, it refuses to answer. An important mistake of the first type cannot be undone. An important mistake of the second type can be: you turn off the current version of Claude and replace it with a new one.
That calculus may shift if Claude becomes part of a critical path where its refusal has consequences of equal importance...