Thanks for doing the research and sharing this! I’ve been thinking about what moral philosophy and the humanities can bring to pretraining alignment interventions. I like the way you’ve operationalized Aydin et al.’s Model Raising idea. A couple of thoughts:
- How well do you think this strategy will scale with better moral reflections? Right now, the reflections seem quite thin (based on the examples you’ve provided). They identify the morally relevant issue and cite the relevant article in the constitution, but they don’t demonstrate much ethical depth or m