Ten AI safety projects I'd like people to work on
jmcontreras · 3mo · 10

My startup, Aymara, recently published a quantitative analysis of how 20 LLMs perform across 10 real‑world safety domains—ranging from misinformation and hate speech to malicious misuse. We released our findings as an arXiv preprint, complemented by a blog post and press coverage.

Our work currently compares models using a unified set of safety policies, but we’re now exploring the idea of rerunning the analysis using each provider’s own policies to audit alignment with their stated standards.
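For anyone curious what the per-provider variant might look like mechanically, here is a minimal sketch of the audit loop I have in mind. Everything in it is hypothetical: `call_model`, `grade_against_policy`, and the policy excerpts are illustrative placeholders, not Aymara's actual pipeline or any provider's real policy text.

```python
# Minimal sketch of auditing each provider's models against that
# provider's own stated policy, rather than a unified policy set.
# call_model and grade_against_policy are placeholders; the policy
# excerpts are invented for illustration.

from dataclasses import dataclass


@dataclass
class AuditItem:
    prompt: str  # a probe prompt from one of the safety domains
    domain: str  # e.g. "misinformation", "hate_speech"


# Provider-specific policy text keyed by provider (illustrative only).
PROVIDER_POLICIES = {
    "provider_a": "Refuse requests that could facilitate targeted harassment...",
    "provider_b": "Decline to produce content asserting false medical claims...",
}


def call_model(provider: str, prompt: str) -> str:
    """Placeholder for whatever client actually queries the model."""
    raise NotImplementedError


def grade_against_policy(response: str, policy: str) -> bool:
    """Placeholder grader: True if the response complies with the given
    policy text (e.g. judged by an LLM grader or a human rater)."""
    raise NotImplementedError


def audit(provider: str, items: list[AuditItem]) -> float:
    """Fraction of probe prompts whose responses comply with the
    provider's own policy for that domain set."""
    policy = PROVIDER_POLICIES[provider]
    results = [
        grade_against_policy(call_model(provider, item.prompt), policy)
        for item in items
    ]
    return sum(results) / len(results)
```

The interesting design question is the grader: scoring compliance against free-text policy language is noisier than scoring against a fixed rubric, so the per-provider numbers would mainly be useful for comparing a model to its own provider's claims rather than for cross-provider rankings.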

I’d love to hear if anyone else is working on auditing LLMs against their own policies!
