How post-training shapes legal representations: probing SCOTUS opinions across model families
Papers like Turner et al. 2025 and Betley et al. 2026 have underscored how strongly training data quality shapes model behavior. The probing and representation engineering literatures have demonstrated techniques for detecting concepts represented in model activations and manipulating their expression. I was keen to...
Mar 157