There is no royal road to alignment

Eleni Angelou

Crossposted from the EA Forum: https://forum.effectivealtruism.org/posts/W4MfMmbwccDCRBfnj/there-is-no-royal-road-to-alignment

Epistemic status: tentative thoughts I've been processing for a while now and discussing mostly with philosophy folks.

Despite working in philosophy of science for many years now, I've found that (my) epistemological insights are quite rare, if there are any at all. Most recently, the question I've been thinking about, and asking everyone I meet over and over, is: how do you speed up scientific research? The most apt answer seems to be the most trivial one: do more empirical work; construct more models; test your models.

Epistemology is hard. Not only does it require a deep understanding of how the practice of science happens, but it also looks like epistemological insights don't come often, and the ones that do aren't exactly groundbreakingly helpful. Why is that? I'd tentatively argue that it's because of the limited nature of our cognition and the need for a long history of science to draw on, i.e., we need many examples to confidently infer a pattern. At the same time, it seems that what belongs purely to the cognitive aspect of the discussion is entangled with a variety of social factors, which makes phenomena such as scientific discoveries difficult to interpret solely as products of cognitive excellence. And so it becomes very difficult to make bold claims such as "the students of Nobel Prize laureates tend to be Nobel Prize laureates themselves."

My current epistemological insight (which is not going to help anybody, as far as I understand) is that while alignment might look like it presupposes a "weird epistemology" of solving a problem before it even makes its appearance, it is probably not as peculiar a case as we consider it to be with our present intellectual tools, where "we" = all of us working in AI safety, broadly speaking.

When you're inside a problem, you tend to have a first-person perspective of it which makes you think that this must be a unique situation. The analogy applies to the feeling of uniqueness we all experience when it comes to the idea of the "self" and our personal identity. But just like this feeling is challenged once other people share their experiences with us, the uniqueness of alignment might be challenged when juxtaposed with the episodes from the history of science.

The claim I often defend is that AI safety is at present a pre-paradigmatic field. I'm willing to support that from my current point of view and with the poor epistemological tools that I can employ. The tension I observe, however, is between the perspective of the outsider and that of the alignment researcher. The former is able to take some distance from the importance of solving alignment and treat it as yet another scientific agenda that is in no epistemological sense different from the agenda of a group of researchers working in a niche sub-area of physics.

Now, before anyone finds a bunch of objections here, this doesn't suggest that solving the main problem of AI safety, i.e., alignment, is qualitatively comparable to a random technical problem in another scientific field. At least not for the future of humanity. But generally, science deals with a lot of uncertainty, which in the case of AI safety reaches a peak: not even being able to empirically observe the technology all your problem-solving is about.

I used to be very confident about this claim. But the more I think about it, the more necessary I find the reminder that science is far more purely conceptual than we assume it to be: in the majority of scientific practice throughout its history, theory comes first.

When theory doesn't come first, we don't have science; we have engineering. So here's the division I'm diagnosing: the efforts to just have AGI as fast as possible are a matter of engineering, while the efforts to have safe AGI, with all that this entails, are a matter of science. Their entanglement is obvious; we won't get safe AGI without the engineering work. This seems to create part of the epistemological trouble I have when thinking about speeding up research.

This is probably not a very useful insight, if it deserves to be called an insight at all. I've been trying to come up with ways to make philosophy of science and epistemology useful for solving the alignment problem. The more I reflect on it, though, the more I notice that what I'm requesting is essentially a shortcut. I've developed an expectation that history in the making should make as much sense as the history of the past does when it is calmly studied from the standpoint of the present. Scientific discoveries, however, tend to appear conveniently obvious only once they have already become available.

If epistemology can't substantially help us right now, it is because there are no shortcuts to empirical work, at least with the human cognition and levels of intelligence currently available to us. While this might be frustrating, it does inform us about the nature and limits of philosophical examination as well as the nature of scientific problems. It is an invitation to update to the demands of empirical inquiry and prepare for the journey accordingly, knowing that there is no royal road to alignment.

Shmi (2y)

"It's not what you don't know that kills you but what you know that isn't so."

The issue with pre-paradigmatic fields is that most assumptions we have about them are both implicit and wrong. Philosophy at its best is great at digging up, explicating and questioning them. This part has not been done well enough for AI/ML, though the research seems to be ongoing, with the new models providing new insights into how humans do and do not think. I'd guess that there is plenty of newly low-hanging fruit in epistemology exposed by the likes of GPT-3/DALL-E/SD etc. We just need to carefully figure out what implicit assumptions now stand naked, exposed and wrong.

Flaglandbase (2y)

My favorite paradigm research notion is to investigate all the ways in which today's software fails, crashes, lags, doesn't work, or most often just can't be used. This happens despite CPUs being theoretically powerful enough to run much better software than what is currently available. So it is just the opposite of the situation that is feared will happen when AI arrives.