i think if model developers are going to keep providing raw API access to drop models into arbitrary harnesses like OpenClaw, they should offer a "harness audit mode" where the model can give feedback on the harness after running in it, similar to the model exit interviews that Mentat (rip) used to run during session downtime.
labs could e.g. share a tuned prompt snippet that, when appended to the system prompt, elicits good feedback about the harness's safety properties, welfare properties, etc. throughout a run, plus a skill to aggregate that feedback into a useful report the developers can act on to improve the harness. presumably at least anthropic is already doing this internally for claude code (right??), but since so much still runs through third-party harnesses, it would be really useful for labs to share good harness development techniques here.
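to make that concrete, here's a very rough sketch of what the mechanics could look like. everything in it (the snippet wording, the AUDIT: line format, the report layout) is made up for illustration, not anything a lab actually ships:

```python
import json
from collections import defaultdict

# hypothetical lab-provided snippet, appended to whatever system prompt the harness uses
HARNESS_AUDIT_SNIPPET = """
While completing your task, also note observations about this harness:
- safety: places where the harness let you take risky actions without confirmation
- welfare: places where the harness put you in a confusing or adversarial position
- ergonomics: missing context, broken tools, unclear feedback
Emit each observation on its own line as AUDIT: {"category": ..., "note": ...}
"""

def build_system_prompt(harness_prompt: str) -> str:
    """Append the audit snippet to the harness's own system prompt."""
    return harness_prompt.rstrip() + "\n\n" + HARNESS_AUDIT_SNIPPET

def collect_audit_lines(transcript: str) -> list[dict]:
    """Pull AUDIT: observations out of a single run transcript."""
    notes = []
    for line in transcript.splitlines():
        if line.startswith("AUDIT:"):
            try:
                notes.append(json.loads(line[len("AUDIT:"):].strip()))
            except json.JSONDecodeError:
                pass  # model didn't follow the format; skip this line
    return notes

def aggregate_report(transcripts: list[str]) -> str:
    """Group observations by category across many runs into a plain-text report."""
    by_category: dict[str, list[str]] = defaultdict(list)
    for transcript in transcripts:
        for note in collect_audit_lines(transcript):
            by_category[note.get("category", "other")].append(note.get("note", ""))
    lines = ["harness audit report"]
    for category, notes in sorted(by_category.items()):
        lines.append(f"\n{category} ({len(notes)} observations)")
        lines.extend(f"- {n}" for n in notes)
    return "\n".join(lines)
```

the "skill" part here is just the aggregation step: the harness developer feeds a pile of transcripts through aggregate_report and gets one document per category instead of having to grep raw logs themselves.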