Harmfulness Directions in OLMo
by Daniele Pace, Bryan Maruyama, and Lorenzo Pacchiardi
Introduction
This work was conducted as part of the MARS 4.0 program, supervised by Lorenzo Pacchiardi, with Hannes Whittingham and Mikhail Mironov as research managers. The core empirical work was carried out by Bryan Maruyama and Daniele Pace. In this technical report, we treat harmfulness as a composition of subcategories...
Jun 912