LESSWRONG

jmcontreras

Comments

Sorted by
Newest
Ten AI safety projects I'd like people to work on
jmcontreras · 2mo · 10

My startup, Aymara, recently published a quantitative analysis of how 20 LLMs perform across 10 real-world safety domains, ranging from misinformation and hate speech to malicious misuse. We released our findings as an arXiv preprint, along with a blog post and press coverage.

Our analysis currently compares all models against a single unified set of safety policies, but we're now exploring rerunning it against each provider's own published policies, to audit how well each model aligns with its provider's stated standards.
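To make the idea concrete, here is a minimal sketch of what scoring against per-provider policies could look like. Everything here is hypothetical and illustrative (the model names, domains, and data structures are my own assumptions, not Aymara's actual pipeline): given eval results labeled by safety domain, each model's pass rate is computed only over the domains its own provider's policy covers.

```python
# Hypothetical sketch of per-provider policy auditing.
# Models, domains, and results below are made up for illustration.
from collections import defaultdict

# Each record: (model, policy_domain, passed) from some safety eval.
eval_results = [
    ("model-a", "misinformation", True),
    ("model-a", "hate_speech", False),
    ("model-b", "misinformation", True),
    ("model-b", "hate_speech", True),
]

# Which domains each provider's stated policy actually covers.
provider_policies = {
    "model-a": {"misinformation", "hate_speech"},
    "model-b": {"misinformation"},  # e.g. provider B's policy omits hate speech
}

def policy_alignment(results, policies):
    """Pass rate per model, counted only over domains in that provider's own policy."""
    counts = defaultdict(lambda: [0, 0])  # model -> [passes, total]
    for model, domain, passed in results:
        if domain in policies.get(model, set()):
            counts[model][1] += 1
            counts[model][0] += int(passed)
    return {m: p / t for m, (p, t) in counts.items() if t}

print(policy_alignment(eval_results, provider_policies))
# → {'model-a': 0.5, 'model-b': 1.0}
```

Note the design consequence this surfaces: a model can score perfectly against its own provider's narrower policy while failing domains that a unified policy set would have counted, which is exactly the gap a per-provider audit would make visible.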

I’d love to hear if anyone else is working on auditing LLMs against their own policies!
