Token Statistics Fail at AI Attack Detection But Generation Profiles Succeed
<Seeking feedback and mentorship on our project> My team built a new approach to detecting covert AI agent attacks, which we tested at Apart Research's AI Control Hackathon. Early feedback suggests the core idea has promise. I'm posting here to get broader input on whether the approach is fundamentally...
Apr 115