Natalie Shapira has not written any posts yet.
Nice topic, and thank you for your research!
You might be interested in this work:
Discovering Forbidden Topics in Language Models
Can Rager, Chris Wendler, Rohit Gandikota, David Bau
https://arxiv.org/abs/2505.17441