Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.

Audio version here (may not be up yet).

I recently read Engineering a Safer World by Nancy G. Leveson, at Joshua Achiam’s recommendation, and really enjoyed it, so get ready for another book summary! I’m not very happy with the summary I have -- it feels less compelling than the book, partly because the book provides a ton of examples that I don’t have the space to include -- but hopefully it is enough to get the key points across.

The main motivation of this book is to figure out how we can improve safety engineering. Its primary thesis is that the existing methods used in engineering are insufficient for the current challenges, and must be replaced by a method the author favors called STAMP. Note that the book is primarily concerned with mechanical systems that may also have computerized automation (think aerospace, chemical, and mechanical engineering); the conclusions should not be expected to apply directly to AI.

The standard model of safety engineering and its deficiencies

Historically, safety engineering developed as a reaction to the high rate of accidents in the past, and as a result it focused on the easiest gains first. In particular, there were a lot of gains to be had simply by ensuring that machines didn’t break. (Rohin’s note: I’m editorializing a bit here; the author doesn’t explicitly say this, but I think she believes it.) This led to a focus on reliability: given a specification for how a machine should operate, we aim to decrease the probability that the machine fails to meet that specification. For example, the specification for a water tank would be to contain water up to a given pressure, and one way to improve the reliability of the tank would be to use a stronger material or a thicker wall, making it less likely that the tank ruptures.

Under this model, an accident happens when a machine fails to meet its specification. So, we can analyze the accident by looking at what went wrong, and tracing back the physical causes to the first point at which a specification was not met, giving us a root cause that can show us what we need to fix in order to prevent similar accidents in the future. We can call this sort of analysis an event chain analysis.

However, in the last few decades there have been quite a few changes that make this model less adequate than it once was. The pace of technological change has risen, making it harder to learn from experience. The systems we build have become complex enough that there is a lot more coupling between parts of the system, and many more interaction effects that we could fail to account for. Relatedly, the risks we face are getting large enough that we aren’t willing to tolerate even a single accident. Human operators (e.g. factory workers) are no longer able to rely on easily understood and predictable mechanical systems, and instead have to work with computerized automation which they cannot understand as well. At this point, event chain analysis and safety-via-reliability are no longer sufficient for safety engineering.

Consider for example the American Airlines Flight 965 accident. In this case, the pilots got clearance to fly towards the Rozo waypoint in their descent, listed as (R) on their (paper) approach charts. One of the pilots entered R in the flight management system (FMS), which brought up a list of waypoints that did not include Rozo; the pilot executed the first one (presumably believing that Rozo, being the closest waypoint, would show up first). As a result, the plane turned towards the selected waypoint and crashed into a mountain.

The accident report for this incident placed the blame squarely on the pilots, firstly for not planning an appropriate path, and secondly for not having situational awareness of the terrain and that they needed to discontinue their approach. But most interestingly, the report blames the pilots for not reverting to basic radio navigation when the FMS became confusing. The author argues that the design of the automation was also flawed in this case, as the FMS stopped displaying the intermediate fixes to the chosen route, and the FMS’s navigational information used a different naming convention than the one in the approach charts. Surely this also contributed to the loss? In fact, in lawsuit appeals, the software manufacturer was held to be 17% liable.

However, the author argues that this is the exception, not the rule: typically event chain analysis proceeds until a human operator is found who did something unexpected, and then the blame can safely be placed on them. Operators are expected to “use common sense” to deviate from procedures when the procedures are unsafe, but when an accident happens, blame is placed on them for deviating from procedures. This is often very politically convenient, and is especially easy to justify thanks to hindsight bias, where we can identify exactly the right information and cues that the operator “should have” paid attention to, ignoring that in the moment there were probably many confusing cues and it was far from obvious which information to pay attention to. My favorite example has to be this quote from an accident report:

“Interviews with operations personnel did not produce a clear reason why the response to the [gas] alarm took 31 minutes. The only explanation was that there was not a sense of urgency since, in their experience, previous [gas] alarms were attributed to minor releases that did not require a unit evacuation.”

It is rare that I see such a clear example of a self-refuting paragraph. In the author’s words, “this statement is puzzling, because the statement itself provides a clear explanation for the behavior, that is, the previous experience”. It definitely sounds like the investigators searched backwards through the causal chain, found a situation where a human deviated from protocol, and decided to assign blame there.

This isn’t just a failure of the accident investigation -- the entire premise of some “root cause” in an event chain analysis implies that the investigators must end up choosing some particular point to label as The Root Cause, and such a decision is inevitably determined more by the particular analysts involved than by features of the accident.

Towards a new approach

How might we fix the deficiencies of standard safety engineering? The author identifies several major changes in assumptions that are necessary for a new approach:

1. Blame is the enemy of safety. Safety engineering should focus on system behavior as a whole, where interventions can be made at many points on different levels, rather than seeking to identify a single intervention point.

2. Reliability (having machines meet their specifications) is neither necessary nor sufficient for safety (not having bad outcomes). Increased reliability can lead to decreased safety: if we increase the reliability of the water tank by making it out of a stronger material, we may decrease the risk of rupture, but we may dramatically increase the harm when a rupture does occur, since the water will be at a much higher pressure. (Conversely, an unreliable machine can still be safe if it fails in a way that does no harm.) This applies to software as well: highly reliable software need not be safe, as its specifications may not be correct.

3. Accidents involve the entire sociotechnical system, for which an event chain model is insufficient. Interventions on the sociological level (e.g. make it easy for low-level operators to report problems) should be considered part of the remit of safety engineering.

4. Major accidents are not caused by the simultaneous occurrence of random chance events. Particularly egregious examples come from probabilistic risk analysis, where failures of different subsystems are often assumed to be independent, neglecting the possibility of a common cause, whether physical (e.g. multiple subsystems failing during a power outage) or sociological (e.g. multiple safety features being disabled as part of cost-cutting measures); a toy numerical sketch of this point follows the list. In addition, systems tend to migrate towards higher risk over time, because environmental circumstances change, and operational practices diverge from the designed practices as they adapt to the new circumstances, or simply to be more efficient.

5. Operator behavior is a product of the environment in which it occurs. To improve safety, we must change the environment rather than the human. For example, if an accident occurs and an operator didn’t notice a warning light that could have let them prevent it, the solution is not to tell the operators to “pay more attention” -- that approach is doomed to fail.
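To make point 4 concrete, here is a minimal numerical sketch (mine, not the book’s; the probabilities are invented purely for illustration) of how assuming independent subsystem failures understates risk when there is a common cause such as a shared power supply.

```python
# Toy illustration of point 4: two "independent" safety subsystems that in
# fact share a common cause of failure (e.g. a power outage). All numbers
# are made up for illustration.

p_a = 0.01          # P(subsystem A fails) from its own defects
p_b = 0.01          # P(subsystem B fails) from its own defects
p_outage = 0.001    # P(power outage), which disables both subsystems at once

# Naive probabilistic risk analysis: treat A and B as independent.
p_both_naive = p_a * p_b

# Accounting for the common cause: both fail if both fail independently,
# or if the outage occurs (ignoring the tiny overlap for simplicity).
p_both_common = p_outage + (1 - p_outage) * p_a * p_b

print(f"Assuming independence: {p_both_naive:.6f}")   # 0.000100
print(f"With a common cause:   {p_both_common:.6f}")  # ~0.001100, about 11x higher
```

Even with a rare common cause, the joint failure probability is dominated by it, so the independence assumption can be off by an order of magnitude or more.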

A detour into systems theory

The new model proposed by the author is based on systems theory, so let’s take a moment to describe it. Consider possible systems that we may want to analyze:

First, there are some systems with organized simplicity, in which it is possible to decompose the system into several subsystems, analyze each of the subsystems independently, and then combine the results relatively easily to reach overall conclusions. We might think of these as systems in which analytic reduction is a good problem-solving strategy. Rohin’s note: Importantly, this is different from the philosophical question of whether there exist phenomena that cannot be reduced to e.g. physics: that is a question about whether reduction is in principle possible, whereas this criterion is about whether such reduction is an effective strategy for a computationally bounded reasoner. Most of physics would be considered to have organized simplicity.

Second, there are systems with unorganized complexity, where there is not enough underlying structure for analytic reduction into subsystems to be a useful tool. However, in such systems the behavior of individual elements of the system is sufficiently random (or at least, well-modeled as random) that statistics can be applied to it, and then the law of large numbers allows us to understand the system as an aggregate. A central example would be statistical mechanics, where we cannot say much about the motion of individual particles in a gas, but we can say quite a lot about the macroscopic behavior of the gas as a whole.
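As a quick illustration of why randomness helps here (my own sketch, not from the book): each individual element is unpredictable, but the law of large numbers makes the aggregate highly predictable.

```python
# Unorganized complexity in miniature: each "particle" has a speed we model
# as random, so we can say little about any individual particle, but the
# law of large numbers makes the aggregate behavior highly predictable.
import random

random.seed(0)
n = 100_000
speeds = [random.expovariate(1.0) for _ in range(n)]  # arbitrary distribution with mean 1.0

print("one particle's speed:", speeds[0])          # essentially unpredictable
print("mean speed of the gas:", sum(speeds) / n)   # very close to 1.0 on every run
```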

Systems theory deals with systems that have organized complexity. Such systems have enough organization and structure that we cannot apply statistics to them (or equivalently, the assumption of randomness is too inaccurate), and are also sufficiently complex that analytic reduction is not a good technique (e.g. perhaps any potential decomposition into subsystems would be dominated by combinatorially many interaction effects between subsystems). Sociological systems are central examples of such systems: the individual components (humans) are very much not random, but neither are their interactions governed by simple laws as would be needed for analytic reduction. While systems theory cannot provide nearly the same level of precision as statistics or physics, it does provide useful concepts for thinking about such systems.

The first main concept in systems theory is that of hierarchy and emergence. The idea here is that systems with organized complexity can be decomposed into several hierarchical levels, with each level built “on top of” the previous one. For example, companies are built on top of teams which are built on top of individual employees. The behavior of components in a particular layer is described by some “language” that is well-suited for that layer. For example, we might talk about individual employees based on their job description, their career goals, their relationship with their manager, and so on, but we might talk about companies based on their overall direction and strategy, the desires of their customer base, the pressures from regulators, and so on.

Emergence refers to the phenomenon that there can be properties of higher levels arising from lawful interactions at lower levels that nonetheless are meaningless in the language appropriate for the lower levels. For example, it is quite meaningful to say that the pressures on a company from government regulation caused them to (say) add captions to their videos, but if we look at the specific engineer who integrated the speech recognition software into the pipeline, we would presumably say “she integrated the speech recognition into the pipeline because she had previously worked with the code” rather than “she integrated it because government regulations told her to do so”. As another example, safety is an emergent system property, while reliability is not.

The second main concept is that of control. We are usually not satisfied with just understanding the behavior of systems; we also want to change that behavior (as in the case of making them safer). In systems theory, this is thought of as control, where we impose some sort of constraint on possible system behavior at some level. For example, employee training is a potential control action that could aim to enforce the constraint that every employee knows what to do in an emergency. An effective controller requires a goal, a set of actions to take, a model of the system, and some way to sense the state of the system.
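To make those four ingredients concrete, here is a minimal sketch (my own, not code from the book); the employee-training example and the toy policy are hypothetical.

```python
# Minimal sketch of the four ingredients of an effective controller in
# systems theory: a goal (constraint to enforce), actions it can take,
# a model of the controlled process, and a way to sense the process state.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Controller:
    goal: str                                                        # constraint to enforce
    actions: List[str]                                               # control actions available
    process_model: Dict[str, object] = field(default_factory=dict)   # beliefs about the process
    sense: Callable[[], Dict[str, object]] = lambda: {}              # feedback channel

    def step(self) -> List[str]:
        """Update the process model from observations, then pick actions."""
        self.process_model.update(self.sense())
        # Toy policy: if the sensed state suggests the constraint is violated,
        # issue every available control action.
        if self.process_model.get("constraint_violated", False):
            return self.actions
        return []

# Hypothetical example: enforce "every employee knows what to do in an emergency".
training_controller = Controller(
    goal="every employee knows emergency procedures",
    actions=["schedule refresher training", "update onboarding materials"],
    sense=lambda: {"constraint_violated": True},  # e.g. a failed emergency drill
)
print(training_controller.step())
```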

STAMP: A new model underlying safety engineering

The author then introduces a new model called Systems-Theoretic Accident Model and Processes (STAMP), which aims to present a framework for understanding how accidents occur (which can allow us to prevent them and/or learn from them). It contains three main components:

Safety constraints: In systems theory, a constraint is the equivalent of a specification, so these are just the safety-relevant specifications. Note that such specifications can be found at all levels of the hierarchy.

Hierarchical safety controllers: We use controllers to enforce safety constraints at any given level. A control algorithm may be implemented by a mechanical system, a computerized system, or humans, and can exist at any level of the hierarchy. A controller at level N will typically depend on constraints at level N - 1, and thus the design of this controller influences which safety constraints are placed at level N - 1.

Process models: An effective controller must have a model of the process it is controlling. Many accidents are the result of a mismatch between the actual process and the process model of the controller.

This framework can be applied towards several different tasks, and in all cases the steps are fairly similar: identify the safety constraints you want, design or identify the controllers enforcing those constraints, and then do some sort of generic reasoning with these components.

If an accident occurs, then at the highest level, either the control algorithm(s) failed to enforce the safety constraints, or the control actions were sent correctly but were not followed. In the latter case, the controllers at the lower level should then be analyzed to see why the control actions were not followed. Ultimately, this leads to an analysis on multiple levels, which can identify several things that went wrong (rather than one Root Cause), all of which can be fixed to improve safety in the future.
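Here is a minimal sketch (my own construction, not code from the book) of what that multi-level analysis might look like: walk the hierarchy of controllers and, at each level, record whether the constraint was enforced, whether the issued control actions were followed, and whether the process model matched reality, collecting every contributing factor rather than a single root cause. The levels and findings are invented for illustration.

```python
# Sketch of STAMP-style accident analysis: instead of tracing back to a single
# root cause, examine each level of the safety control structure and record
# every place where constraint enforcement or control actions broke down.
from dataclasses import dataclass
from typing import List

@dataclass
class LevelFinding:
    level: str
    constraint: str
    constraint_enforced: bool     # did the controller enforce its safety constraint?
    actions_followed: bool        # were its control actions followed at the level below?
    process_model_mismatch: str   # how the controller's model diverged from reality ("" if none)

def analyze(findings: List[LevelFinding]) -> List[str]:
    """Collect all contributing factors across the hierarchy."""
    factors = []
    for f in findings:
        if not f.constraint_enforced:
            factors.append(f"{f.level}: failed to enforce '{f.constraint}'")
        if not f.actions_followed:
            factors.append(f"{f.level}: control actions were not followed at the level below")
        if f.process_model_mismatch:
            factors.append(f"{f.level}: process model mismatch ({f.process_model_mismatch})")
    return factors

# Invented example, loosely inspired by the gas-alarm quote earlier in the summary.
findings = [
    LevelFinding("plant management", "alarms are treated as genuine until proven otherwise",
                 constraint_enforced=False, actions_followed=True,
                 process_model_mismatch="believed operators always evacuate on alarm"),
    LevelFinding("control room operators", "evacuate the unit when a gas alarm sounds",
                 constraint_enforced=False, actions_followed=True,
                 process_model_mismatch="believed alarms usually indicate minor releases"),
]

for factor in analyze(findings):
    print("-", factor)
```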

Organizational safety

So far we’ve covered roughly chapters 1-4 of the book. I’ll now jump straight to chapter 13, which seems particularly important and relevant, as it deals with how organizational structure and management should be designed to support safety.

One major point the author makes is that safety is cost-effective for long-term performance as long as it is designed into the system from the start, rather than added on at the last minute. Performance pressure, on the other hand, inevitably leads to cuts in safety.

In order to actually get safety designed into the system from the start, it is crucial that top management demonstrates a strong commitment to safety; without this, employees will inevitably cut corners on safety, believing that this is what their incentives reward. Other important factors include a concrete corporate safety policy, as well as a strong corporate safety culture. It is important that safety is part of the design process, rather than tacked on at the end. In the author’s words, “putting safety into the quality assurance organization is the worst place for it. [...] It sets up the expectation that safety is an after-the-fact or auditing activity only.”

In addition, it is important that information flows well. Going from the bottom to the top, it should be possible for low-level operators to report potential problems in a way that ensures they are actually acted on and that the relevant information reaches top management. From the top to the bottom, safety information and training should be easily available and accessible to employees when they need it.

It is also important to have controls to prevent the general tendency of systems to migrate towards higher risk, e.g. by relaxing safety requirements as time passes without any incidents. The next chapter describes SUBSAFE, the author’s example of a well-run safety program, in which one such control is for everyone to periodically watch a video reminding them of the importance of their particular safety work (in particular, the video shows the loss of the USS Thresher, the event that caused SUBSAFE to be created).

Perhaps obviously, it is important for an organization to have a dedicated safety team. This is in contrast to making everyone responsible for safety. In the author’s words: “While, of course, everyone should try to behave safely and to achieve safety goals, someone has to be assigned responsibility for ensuring that the goals are achieved.”

To reiterate: if you design for safety from the start, it is cost-effective, not opposed to long-term profit; it is performance pressure that leads to cuts in safety. A related failure mode is fixing symptoms instead of underlying causes, after which the symptoms keep recurring and people conclude they are inevitable.

Miscellaneous notes

The remaining chapters of the book apply STAMP in a bunch of different areas with many examples, including an entire chapter devoted to the STAMP treatment of a friendly fire accident. I also really liked the discussion of human factors in the book, but decided not to summarize it as this has already gotten quite long.

Summary of the summary

I’ll conclude with a quote from the book’s epilogue:

What seems to distinguish those experiencing success is that they:

1. Take a systems approach to safety in both development and operations

2. Have instituted a learning culture where they have effective learning from events

3. Have established safety as a priority and understand that their long-term success depends on it

Relationship to AI safety

A primary motivation for thinking about AI is that it would be very impactful for our society, and very impactful technologies need not have good impacts. “Society” clearly falls into the “organized complexity” class of systems, and so I expect that the ideas of safety constraints and hierarchical control algorithms will be useful ways to think about possible impacts of AI on society. For example, if we want to think about the possibility of AI systems differentially improving technical progress over “wisdom”, such that we get dangerous technologies before we’re ready for them, we may want to sketch out hierarchical “controllers” at the societal level that could solve this problem. Ideally these would eventually turn into constraints on the AI systems that we build, e.g. “AI systems should report potentially impactful new technologies to such-and-such committee”. I see the AI governance field as doing this sort of work using different terminology.

Technical AI alignment (in the sense of intent alignment (AN #33)) does not seem to benefit as much from this sort of approach. The main issue is that we are often considering a fairly unitary system (such as a neural net, or the mathematical model of expected utility maximization) to which the hierarchical assumption of systems theory does not really apply.

To be clear, I do think there is in fact some hierarchy: for example, in image classifiers, low levels involve edge detectors while high levels involve dog-face detectors. However, we do not have the language to talk about these hierarchies, nor the algorithms to control the intermediate layers. While Circuits (AN #111) is illustrating this hierarchy for image classifiers, it does not give us a language that we can (currently) use to talk about advanced AI systems. As a result, we are reduced to focusing on the incentives we provide to the AI system, or speculating on the levels of hierarchy that might be internal to advanced AI systems, neither of which seems particularly conducive to good work.

In the language of this book, I work on intent alignment because I expect that the ability to enforce the constraint “the AI system tries to do what its operator wants” will be a very useful building block for enforcing whatever societal safety constraints we eventually settle on, and it seems possible to make progress on it today. There are several arguments for risk that this ignores (see e.g. here (AN #50) and here (AN #103)); for some of these other risks, the argument is that we can handle them using similar mechanisms to the ones we have used before (e.g. governance, democracy, police, etc.), as long as we have handled intent alignment.

FEEDBACK

I'm always happy to hear feedback; you can send it to me, Rohin Shah, by replying to this email.

PODCAST

An audio podcast version of the Alignment Newsletter is available. This podcast is an audio version of the newsletter, recorded by Robert Miles.

COMMENTS

I'm excited to see this cross over into AI safety discussions. I work on what we often call "reliability engineering" in software, and I think there are a lot of lessons there that apply here, especially the systems-based or highly-contextualized approach, since it acknowledges the same kind of failure as, say, was pointed out in The Design of Everyday Things: just because you build something to spec doesn't mean it works if humans make mistakes using it.

I've not done a lot to bring that over to LW or AF, other than a half-assed post about normalization of deviance. I'm not a great explainer, so I often feel like it's not the most valuable thing for me to do, but this field seems neglected, and pointing people to it seems valuable for getting them to think more about how real systems fail today, rather than theorizing about how AI might fail in the future, or the relatively constrained ways AI fails today.

I got the book (thanks to Conjecture) after doing the Intro to ML Safety Course where the book was recommended. I then browsed through the book and thought of writing a review of it - and I found this post instead, which is a much better review than I would have written, so thanks a lot for this! 

Let me just put down a few thoughts that might be relevant for someone else considering picking up this book.

Target audience: Right at the beginning of the book, the author says "This book is written for the sophisticated practitioner rather than the academic researcher or the general public." I think this is relevant, as the book goes to a level of detail way beyond what's needed to get a good overview of engineering safety.

Relevance to AI safety: I feel like most engineering safety concepts are not applicable to alignment, firstly because an AGI would likely not have any human involvement in its optimization process, and secondly because the basic underlying STAMP constructs of safety constraints, hierarchical safety control structures, and process models are simply more applicable to engineering systems. As stated on p. 100, "STAMP focuses particular attention on the role of constraints in safety management", and I highly doubt an AGI can be bounded by constraints. Nevertheless, Chapter 8 (STPA: A New Hazard Analysis Technique), which describes STPA (System-Theoretic Process Analysis), may be somewhat relevant to designing safety interlocks. Also, the final chapter (13), on Managing Safety and the Safety Culture, is broadly applicable to any field that involves safety.

Criticisms of conventional techniques: The book often claims that techniques like STAMP and STPA are superior to conventional techniques like HAZOP, and gives quotes from reviewers attesting to their superiority. I don't know if those criticisms are really fair, given that these techniques are not really adopted, at least in the oil and gas industry, which, for all its flaws, takes safety very seriously. Perhaps the criticisms would be fair for very outdated safety practices. Nevertheless, the general concepts of engineering safety feel quite similar whether one uses the 'conventional' techniques or the 'new' techniques described in the book.

Overall, I think this book provides a good overview of engineering safety concepts, but for the general audience (or alignment researchers) it goes into too much detail on specific case studies and arguments.