Some Yudkowskian metaethicists have supposed that the brain does not compute value for actions or states of affairs at all, but instead uses purely behavioristic methods (via reward signals) to motivate action, and thus that the 'terminal goals' we could program an AGI to realize can only be extracted from brains via an extrapolation algorithm.

My recent Neuroscience of Desire post summarized work in neuroeconomics suggesting that the extrapolation procedure may be a bit easier than originally supposed, because the brain does seem to compute expected utility for some actions, and perhaps for some states of affairs.

One caveat, though, is that information about objective stimulus intensity is always lost at the transducer. For neuronal efficiency, stimulus intensity is always computed relative to a locally built reference point, and the brain quickly throws the reference point information away. The brain therefore cannot be computing utility for objective states of affairs.
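A toy illustration of that information loss may help. This is only a sketch: the function name `transduce`, the choice of subtraction as the "relative to" operation, and all of the numbers are assumptions made for illustration, not a model taken from the neuroeconomics literature.

```python
def transduce(intensity, reference):
    """Return only the stimulus intensity relative to a local reference point.

    Assumption: 'relative' is modeled as a simple difference. The reference
    itself is not returned, mimicking the claim that it is thrown away.
    """
    return intensity - reference


# Two very different objective situations (made-up units)...
dim_room = transduce(intensity=12.0, reference=10.0)
bright_room = transduce(intensity=102.0, reference=100.0)

# ...produce the same downstream signal, so anything computed from the
# signal alone cannot tell the two objective states apart.
print(dim_room, bright_room)
```

Once only the relative signal is kept, no later computation can recover which objective intensity produced it, which is the sense in which utility over objective states of affairs is ruled out.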

Two recent papers, though, shed more light on how the brain produces human motivation by combining input from model-free systems (e.g. standard behavioristic systems that don't encode value for anything) and model-based systems (systems that mentally represent actions and encode value for them). The brain may be encoding value for mentally simulated actions that are isomorphic to possible real-world actions, and this value encoding might not involve a reference point the way signals about external stimuli do. This again suggests that the extrapolation algorithm needed to build a utility function for a Friendly AI may not be quite as difficult to write as it would be if the brain did not encode value for anything at all.
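To make the model-free versus model-based distinction concrete, here is a minimal sketch in the style of hybrid reinforcement learning models. Everything in it is an assumption for illustration: the two-action toy world, the names `TRANSITIONS`, `OUTCOME_VALUE`, `model_based_value`, `model_free_update`, and `combined_value`, the fixed mixing `weight`, and the numbers. It is not the mechanism proposed in the two papers, just one simple way such a combination could look.

```python
# Hypothetical toy world: each action leads to outcomes with some probability.
TRANSITIONS = {
    "press_lever": {"food": 0.8, "nothing": 0.2},
    "wait": {"food": 0.1, "nothing": 0.9},
}
# Hypothetical values attached to the simulated outcomes.
OUTCOME_VALUE = {"food": 1.0, "nothing": 0.0}


def model_based_value(action):
    """Simulate the action with an internal model and return its expected value."""
    return sum(p * OUTCOME_VALUE[o] for o, p in TRANSITIONS[action].items())


def model_free_update(cached, action, reward, learning_rate=0.1):
    """Nudge a cached action strength toward the reward just received.

    No outcome is represented here; only the reward signal is used.
    """
    updated = dict(cached)
    updated[action] += learning_rate * (reward - cached[action])
    return updated


def combined_value(cached, action, weight=0.5):
    """Mix the two systems; `weight` is the share given to the model-based one."""
    return weight * model_based_value(action) + (1 - weight) * cached[action]


cached = {"press_lever": 0.0, "wait": 0.0}
cached = model_free_update(cached, "press_lever", reward=1.0)
best = max(cached, key=lambda a: combined_value(cached, a))
print(best)
```

The model-based half is the part that represents actions and assigns value to their simulated outcomes, which is the kind of encoding an extrapolation algorithm could hope to read off. The model-free half only stores how strongly past reward has reinforced each action.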

Of course this is all very early and speculative. But if you get hedons from following a rapidly advancing scientific field, computational cognitive neuroscience is a great one to watch. It's amazing what we've learned in the past 5 years.

