Hi. Thanks for the recommendations above!

I'm currently a philosophy student who has primarily thought about and done work in foundational value theory, where I've looked at how one might build moral and political theories from the ground up (something almost no one in the field even seems to care about).

I've spent the last couple of months getting into the topic of specifying a "value" or "governing principle(s)" for an AGI, reading Bostrom's new book as well as several of his papers and other papers by MIRI. I'm trying to approach this problem from more of a value-theoretic perspective - it's surprising to me that almost no moral/political philosophers are interested in this issue, considering the parallels (especially with the methodological work being done in the value theory fields).

I'd like to ask (1) whether people here think there is any severe (possibly insurmountable) limitation to approaching the problem of FAI from this kind of angle, and if so what that limitation is, and (2) if I were to approach the problem of FAI from this angle, what kind of mathematical/decision-theoretic stuff would I ABSOLUTELY need to know to even have hopes of doing anything relevant?

More generally, I'd also like to hear any other thoughts on approaching the problem of FAI from my kind of background. Thanks a lot!

Edit: It seems to me that there are two problems in FAI - finding out the content of any value or principle-set that would allow for friendliness (such as CEV), and then ensuring that the exact sense/intention of this set is represented in the AI's goal content. Would I be right in assuming that, to draw a very rough distinction, the latter problem is what requires the brunt of the mathematical/computational training? To whatever extent a line can be drawn between the two, how would you guys recommend one go about exploring the first problem? Thanks!