Optimally Combining Probe Monitors and Black Box Monitors
Link to our arXiv paper, "Combining Cost-Constrained Runtime Monitors for AI Safety": https://arxiv.org/abs/2507.15886. Code can be found here.

Executive Summary

* Monitoring AIs at runtime can help us detect and stop harmful actions. For cost reasons, we often want to use cheap monitors like probes to monitor all of...
Jul 27, 2025