Constitutions/specs don't really address any of the difficult alignment challenges. Working on them is about as useful as engaging in the classic argument "aligned to whom?!?!", which, you know, is a fine question to ask sometimes, but is orthogonal to a lot of what I want people to focus on (for example, understanding and communicating the total level of risk imposed by frontier AI development).
+1
I've now revised the text and title to express that this is one thing for us to work on among others.
I think that you're underrating the constitution/spec. It's pretty different from the question "aligned to whom?!?!".
It's more like: How should the next generation of model behave, such that we achieve the following goals? (i) Mitigating the risk of catastrophe from that particular model. (ii) Eliciting the capabilities necessary to use the model to [automate safety research / monitor other models / harden security / improve epistemics / etc].
I think that it's not just the Constitution that matters, but also a proposed training pipeline. (Alignment via systematic debate? Self-critique à la Kimi K2, so that the model never learns to flatter the user, as demonstrated by Spiral Bench or Tim Hua's experiment? Rewarding Agent-4 for making its drafts legible to Agent-3, checked by ensuring that Agent-3 understands them and Agent-2 doesn't?)
I think that the external AI safety community should prioritise model specs/constitutions over the next 12 months. It shouldn't be our top priority,[1] but it's pretty important[2] and neglected. In this post, I will argue that it's tractable, even if you aren't a lab employee:
Recommendations:
Some other priority tiers include:
P0: Demonstrate risks; Communicate with lab leadership & policymakers
P1: Track timelines; Stress-test safety cases (e.g. model organisms)
P2: Capacity building; Communicate to public; Invent/refine safety techniques which can be exported to labs; Basic science (e.g. deep learning, LLM generalisation, etc.)
P3: Secure future funding; Secure future model access; Elicit capabilities on our target domains (e.g. macrostrategy, alignment research)
I think specs/constitutions should be a P2.
See "AI character is a big deal" by Will MacAskill and Tom Davidson.