
I agree with your thesis that psychology is neglected in approaches to AI safety, but I also suspect this neglect is advisable.

Many people have thoughts on psychology because it's a highly accessible field (we all have our own minds we may want to understand, and we are, in a sense, experts on ourselves). Many of these ideas are wildly mistaken, and in general, training in cognitive science and psychology does not teach people to think with the precision and accuracy we want for AI safety. Studying mathematics or a field of applied mathematics seems like the only high-probability way to learn those thinking skills.

This still leaves room for people to, as you suggest, minor in psychology or cognitive science. But the existing crop of AI safety researchers, as far as I know, largely did not study those subjects formally at an institution, yet they are generally well versed in the basics. Thus the only path that seems to remain open is to become expert in both mathematics and psychology, in that order, if one wants something likely to contribute to AI safety.

(I say this as a person who has followed a path of computer science -> mathematics -> psychology -> philosophy. I do not work in AI safety directly myself, but I think I'm familiar enough with both AI safety and the fields discussed here to say something about their relative applicability.)

Additionally, there may be some risk in advising the psychological approach to AI safety. I worry that it may encourage folks to believe they can solve the problems of AI safety without the exactness of current approaches, and making such a narrative popular may decrease the chances of developing safe AI by making people overly optimistic about their ability to apply human psychology to AI minds for the purpose of safety. I worry about this especially because it's a very appealing story that people without strong mathematics/rationality training are likely to adopt to justify performing potentially dangerous AI research without greater expertise in safety. The main safety valve here seems to be that it's currently hard to make progress on AI capability research without something very much like the skills needed for safety research, but as machine learning becomes more powerful and easier to use, the risk increases that such folks, less suited to safety research, will be able to create AI with sufficiently dangerous capabilities.

All this said, I actually think it quite likely that AI safety will turn out to be easier than we expect, since that seems more likely than us simply getting lucky, and a key reason it might turn out easier is that we may find insights in psychology and philosophy that give us tools more robust and more applicable to minds in general than those currently being developed to address utility functions (tools that would also apply to any sufficiently capable mind, even one with a proper utility function). But I'm not sure that advising more people of this does much to increase the chances of finding such an outcome; it probably even decreases the measure of histories in which AI is developed safely.

Sort of late to the party, but I'd like to note, for any aspiring cognitive science student browsing the archives, that I doubt this comment is accurate. I'm studying cognitive science, and in practice, because of the flexibility we have and because cogsci has maths and CS as constituent disciplines, this largely means taking maths, AI, or computer science courses (largely the same courses that people from those fields take). These disciplines make up >60% of my studies. Of course, I'm behind people who focus exclusively on maths or CS in those subjects, but I don't see a good reason to think that we lack the precision and rigor needed to contribute to AI safety. Prove me wrong :)

I would add didactics and pedagogy to that list. I think these subjects could provide a deeper understanding of the processes going on when "training an AI", because in the end, didactics and pedagogy are about dealing with very complex and hard-to-understand "AIs": growing humans.

While taking lessons from these disciplines might indeed not be the most abstract and straightforward way to go about AI control, it would, on the other hand, not be very smart to disregard such a sizable body of knowledge.

Maybe not every AI scientist should look into these domains, but I believe those inclined could learn a lot.

I think these are all points that many people have considered privately or publicly in isolation, but that thus far no one has explicitly written down and connected. In particular, lots of people have independently made the observation that ontological crises in AIs are apparently similar to existential angst in humans, that ontology identification seems philosophically difficult, and that studying ontology identification in humans is therefore plausibly a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that quite badly needed to be written.

Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.

Given psychology's ongoing replication crisis, perhaps its neglect by AI researchers has proved prudent?