Theory of mind is something that humans have instinctively and subconsciously, but that isn't easy to spell out explicitly; therefore, by Moravec's paradox, it will be very hard to implant it into an AI, and this needs to be done deliberately.

I think this is the weakest part. Consider: "Recognizing cat pictures is something humans can do instinctively and subconsciously, but that isn't easy to spell out explicitly; therefore, by Moravec's paradox, it will be very hard to implant it into an AI, and this needs to be done deliberately." But in practice, the techniques that work best for cat pictures work well for lots of other things as well, and a hardcoded solution customized for cat pictures will actually tend to underperform.

Stuart_Armstrong (6y)

I'm actually willing to believe that the methods used for cat pictures might work for human theory of mind, if trained on that data (though this doesn't solve the problem of it being underdefined).

avturchin (6y)

Maybe we could try to factor the theory of mind out, taking it outside the brackets? In that case, claims of the following type become meaningful: "Under theory of mind T1, a human being H has the set of preferences P1, and under another theory of mind T2, H has the set P2." We could then compare P1 and P2, and any invariants we find could serve as more robust representations of H's preferences.

Ultra-simplified research agenda

34

Ω 15

34

Ω 15