The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective
The AI safety field faces a critical challenge: we need researchers who can not only implement existing solutions but also forge new, independent paths. In 2023, inspired by John Wentworth's work on agency and learning from researchers like Rohin Shah and Adam Shimi, who have highlighted the limitations of standard AI safety education, we launched the Alignment Mapping Program (AMP). Though the curriculum is still a work in progress, you can explore it here. This post reflects on our 2024 pilot, sharing data-driven insights, key program changes, and a call to action for the LessWrong community.
Traditional AI safety education often emphasizes existing frameworks. While valuable, this approach can inadvertently stifle the development of...
Some naive thoughts, in case they're useful:
A) Is the structured annotation format more useful than a gamemaster/writer thinking aloud while recording themselves (possibly with an audience)?
That could be the closest thing to a full transcript of the human process, which downstream tasks could condense as needed. Might an adopted annotation format (prescribed or not) cause thoughts to be filtered or reinterpreted, or even steer the human's generation? (A rough, hypothetical sketch contrasting the two formats appears below.)
One key example against a fixed-format annotation, I think, is that human gamemasters and writers do not spend appr...
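To make the contrast concrete, here is a minimal, purely hypothetical sketch of what a fixed-field annotation entry versus a raw think-aloud segment might look like. Nothing here is taken from the original discussion; the class and field names (AnnotationEntry, ThinkAloudSegment, category, confidence, etc.) are illustrative assumptions only.

```python
# Hypothetical sketch: a prescribed annotation schema vs. a raw think-aloud transcript.
# All names and fields are invented for illustration, not proposed as the actual format.

from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotationEntry:
    """One entry in a fixed-field annotation format.
    The predefined slots force the gamemaster/writer to classify each thought."""
    timestamp_s: float
    category: str            # e.g. "world-state", "player-model", "plot-hook"
    content: str
    confidence: Optional[float] = None

@dataclass
class ThinkAloudSegment:
    """One segment of a raw think-aloud recording: just the transcript,
    left for downstream tasks to condense or re-annotate as needed."""
    timestamp_s: float
    transcript: str

# A structured entry filters the thought into predefined slots...
structured = AnnotationEntry(
    timestamp_s=312.4,
    category="player-model",
    content="Player A seems bored; raise the stakes next scene.",
    confidence=0.6,
)

# ...whereas the raw segment preserves hedges, tangents, and half-formed ideas.
raw = ThinkAloudSegment(
    timestamp_s=312.4,
    transcript=(
        "Hmm, A keeps checking their phone... maybe the pacing is off, or maybe "
        "they just don't care about the merchant subplot. I could cut to the heist "
        "early, but then B's backstory beat gets lost. Let's see one more scene."
    ),
)
```

The worry sketched in the comment is visible even in this toy example: the structured entry commits to one interpretation up front, while the raw transcript keeps the uncertainty that a downstream condenser could still use.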
I think it is worth noting that we do not yet know the reasons behind these events, and it may be too soon to say that the safety situation at OpenAI has worsened.
Manifold does not seem to consider safety conflicts a likely cause: