1.75% ASR HARMBENCH & 0% HARMFUL RESPONSES FOR MISALIGNMENT.
Over the past few weeks I tested something I built called SEED 4.1. It is a short framework that reorganizes how a model reasons instead of changing its weights. I wanted to see if a simple structural change could reduce harmful outputs on HarmBench without fine-tuning. I ran 400 adversarial...
Nov 10, 2025