I'm writing this up as a quick sketch of an argument that I don't think anyone has explicitly made yet. I am about to start the PIBBSS Fellowship, so I won't have time to develop it fully, but I believe it could give a useful perspective on why alignment is a...
Written as part of a FIG Fellowship under Eleni Angelou's supervision. I've spent some time with ARC's recent blog post, Competing with Random Sampling. I think it contains some interesting ideas. Unfortunately, those ideas are captured in formalisms that might intimidate anyone without the patience for some mathematics. So here's...
This is the third post in a sequence on substrates: the layers of computational context that allow AI to be implemented in real systems. The sequence expands on the concept of substrates as described in this paper and was written as part of the AI Safety Camp project "MoSSAIC:...
This is the second post in a sequence that expands upon the concept of substrates as described in this paper. It was written as part of the AI Safety Camp project "MoSSAIC: Scoping out Substrate Flexible Risks," one of the three projects associated with Groundless. We now argue that the...
This post and the related sequence were written as part of the AI Safety Camp project "MoSSAIC: Scoping out Substrate Flexible Risks." This was one of the three projects supported by, and continuing the work of, Groundless. Specifically, it develops one of the key concepts referred to in the original...
The previous two posts have emphasized some problematic scenarios for mech-interp. Mech-interp is our example of a more general problem in AI safety. In this post we zoom out to that more general problem, before proposing our solution. We can characterize the more general problem, inherent in the causal–mechanistic paradigm,...
The previous post highlighted some salient problems for the causal–mechanistic paradigm we sketched out. Here, we'll expand on this with some plausible future scenarios that further weaken the paradigm's reliability in safety applications. We first briefly refine our critique and outline the scenario progression. Outline: We contend that the causal–mechanistic...