“for our threat modeling, we can just argue about what AIs will be capable of, rather than needing to argue about inductive biases or whatever”
Where else do you think capabilities come from?
The data, as Zack M Davis argues. One of the takeaways from the deep learning revolution is that inductive biases matter a lot less than we thought, and that data matters much more for AI behavior than we thought.
While I appreciate being cited, I don't think this makes sense in context as a response to Wyeth's remark. ("Inductive biases" and "data" aren't somehow opposing explanations; doing induction on data trivially requires both, and Wyeth himself has written in favor of imitation learning as an alignment strategy.)
I agree that induction on data requires an inductive bias (short of resorting to a look-up table), but I do claim that you were arguing that modern AI behavior is determined more by the data than by the architecture, relative to what Max H was saying.
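To make the "trivially requires both" point concrete, here is a minimal toy sketch (my own illustration, not anything from the thread): the same training data fit under two different inductive biases, a linear fit and a nearest-neighbor lookup, behaves nearly identically in-distribution but diverges sharply out of distribution. Neither the data nor the bias alone pins down what the learner does.

```python
# Toy illustration: same data, two inductive biases, different behavior
# off-distribution. Requires only numpy.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, size=20)
y_train = 2 * x_train + 0.1 * rng.normal(size=20)  # roughly linear data

# Inductive bias 1: linearity. Least-squares fit of y = a*x + b.
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)

def linear_predict(x):
    return a * x + b

# Inductive bias 2: locality. 1-nearest-neighbor lookup over the training set.
def knn_predict(x):
    return y_train[np.argmin(np.abs(x_train - x))]

# In-distribution, the two models roughly agree...
print(linear_predict(0.5), knn_predict(0.5))   # both near 1.0

# ...but far from the training data they diverge: the linear model
# extrapolates the trend, the lookup model clamps to a memorized point.
print(linear_predict(10.0), knn_predict(10.0))  # ~20.0 vs. ~2.0
```

Which model is "right" at x=10 is entirely a question about the bias, since the data is identical; which is right at x=0.5 is mostly a question about the data. That's the sense in which the two aren't competing explanations.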
We’re running ControlConf in Berkeley on April 18-19. It's a two-day conference on AI control: the study of reducing risks from misalignment through safeguards that work even when AI models are trying to undermine them.
Since the last ControlConf (Feb 2025), AI agents have gotten way better. We’re approaching the point where control techniques are load-bearing for the safety of real agent deployments. There’s been a lot of progress too. Researchers have run control evaluations in more realistic settings, and AI companies have started building initial control measures—with much more to come.
At ControlConf, the people at the frontier of this work will present on current research problems, promising interventions, and the most important research directions going forward.
Apply here.
I expect talks and discussion on topics like these:
The great thing about AI control as an approach to mitigating misalignment risk is that it makes risk analysis (comparatively) so concrete: for our threat modeling, we can just argue about what AIs will be capable of, rather than needing to argue about inductive biases or whatever. So I’m hoping that this conference is a great place for discussion of misalignment risk that engages seriously with evidence and arguments about the future.
We’re also taking the opportunity to run a one-day workshop on April 17 on AI futurism and threat modeling, aimed at people who want clearer models of catastrophic AI risks and the best strategies for mitigating them. Apply to that here.