Thoughts on the Philosophy of Science of AI Alignment
In this series of posts, I discuss ideas in the philosophy of science of AI alignment.
Philosophy of science is concerned with the epistemic foundations, methods, and implications of science, whether of science in general or of a specific scientific domain.
Accordingly, a series on the philosophy of science of AI alignment will target questions such as: What is the epistemic nature of the alignment problem? Which epistemic strategies appear promising, and under which epistemic assumptions? Some posts may introduce or clarify language for talking about AI risk and alignment, for navigating the research landscape, or for thinking about what progress might look like. Others will more specifically explore and develop the epistemic assumptions underlying the research direction pursued by PIBBSS.
This sequence is a collection of related ideas rather than a series of posts with a single through-line or coherent arc; it is a "living" project in that I expect to add new posts over an indefinite period of time.