Workshop Report: Why current benchmark approaches are not sufficient for safety? — LessWrong