Learning to make better decisions

How is it that some people, like the physicist Albert Einstein and the former chess world champion Garry Kasparov, develop impressive cognitive skills and is there anything you and I can do to achieve a similar level of intellectual excellence in our chosen specialty? Up until the early 1960s, the prevailing opinion was very pessimistic. A cornerstone of this pessimism was the belief that the adult human brain is effectively fixed except for its gradual deterioration as we age. Since then a series of uplifting discoveries have shown that our brains are fundamentally plastic and malleable throughout the lifespan [1]. This neuroplasticity critically depends on the nature of our experiences. In a first demonstration, Marion Diamond showed that environmental enrichment promotes profound anatomical changes in the brains of rodents [2]. Later work by Michael Merzenich and colleagues showed that neuroplasticity adaptively changes how the brain represents the world in a use-it or lose-it fashion. These adaptive changes are detectable at multiple scales ranging from the tuning curves of single neurons to the structure of cortical maps [3]. These observations have inspired brain training programs that have been found effective for the treatment of dyslexia and other cognitive challenges [1,4]. These and many other inspiring findings suggest that we can improve virtually any aspect of our brain and cognitive abilities – including those that were once thought to be fixed and unchangeable. But what are the circumstances under which these improvements will occur, how can we create those circumstances, and what are the underlying mechanisms?

Metacognitive Reinforcement Learning as Mechanism of Cognitive Growth

A substantial literature on operant conditioning and reinforcement learning suggests that rewards and punishments are very powerful learning signals [5,6]. Such findings led behaviorists, like Skinner, to believe they could make their laboratory animals perform any physically possible behavior, however complex, by creating a reward structure that reinforces that behavior appropriately. Research in neuroscience has found that, in the brain, learning from rewards is mediated by a reward prediction error that is conveyed by dopamine. Interestingly, the same dopamine signal that drives learning in the neural circuits that control our simple, habitual behavior is also conveyed to the prefrontal cortex, which controls the execution of our most sophisticated cognitive strategies. This suggests that the same reward-driven learning mechanisms that shape our habits might also shape how we think and decide. If so, then the principles of reinforcement learning might help us understand how we learn to think and decide. In other words, it might be possible to understand some important aspects of cognitive plasticity as metacognitive reinforcement learning (MCRL).
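To make the idea of a reward prediction error concrete, here is a minimal sketch of the textbook update rule from reinforcement learning. It is an illustration only, not a model of dopamine signaling; the function name, the learning rate of 0.1, and the constant reward of 1.0 are assumptions chosen for the example.

```python
def prediction_error_update(expected, reward, learning_rate=0.1):
    """Return the updated expectation and the reward prediction error.

    Hypothetical helper for illustration: the expectation moves a small
    step toward the reward that was actually received.
    """
    prediction_error = reward - expected  # positive: better than expected
    updated = expected + learning_rate * prediction_error
    return updated, prediction_error


expected_reward = 0.0  # the learner initially expects nothing
errors = []
for _ in range(50):  # the same reward is experienced 50 times
    expected_reward, error = prediction_error_update(expected_reward, reward=1.0)
    errors.append(error)
```

The first outcome is a complete surprise, so the prediction error is large. As the expectation catches up with the reward, the error shrinks toward zero and learning slows down.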

Consistent with this view, we have found that a reinforcement learning model of metacognitive learning can explain the effect of an environment’s reward structure on how quickly people learn to plan farther ahead [7]. Furthermore, we found that this metacognitive reinforcement learning algorithm was also able to discover some of the decision strategies people use to choose between multiple risky prospects [8]. On top of that, another study found that the same principle also captures how people learn when to use which cognitive strategy [9], as well as where to direct their attention, when to override an automatic response, such as an impulse or habit, with a controlled, deliberate decision-making process, and how much effort to invest in it [10]. Taken together, these findings support the conclusion that metacognitive reinforcement learning may be one of the mechanisms through which we learn to make better decisions and think more clearly.

The idea of metacognitive reinforcement learning suggests that the positive and negative emotions we experience as an immediate or downstream consequence of our thinking, such as regret and pride, can be an important driver of cognitive growth. For instance, the elation you experience when you finally discover the solution to a complex math problem will positively reinforce the good thinking that led you to this solution. By contrast, the frustration you experienced while being stuck at this problem for 30 minutes will make you less likely to reuse those unsuccessful approaches on future problems. This suggests that cognitive growth will occur when better thinking feels better than worse thinking.
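The claim that growth depends on how thinking feels can be illustrated with a toy simulation. This is a deliberately simplistic sketch, not the model from [7]: the two strategies, their "felt reward" values, and the simple explore-or-exploit learner are all assumptions made for the example.

```python
import random


def learn_strategy_preference(felt_reward, trials=500, learning_rate=0.1,
                              exploration=0.2, seed=0):
    """Return the strategy a simple reward-driven learner ends up preferring.

    felt_reward maps each (hypothetical) thinking strategy to how good
    using it feels. The learner mostly repeats whatever has felt best so
    far and occasionally tries a strategy at random.
    """
    rng = random.Random(seed)
    values = {strategy: 0.0 for strategy in felt_reward}
    for _ in range(trials):
        if rng.random() < exploration:
            strategy = rng.choice(list(values))      # try something at random
        else:
            strategy = max(values, key=values.get)   # use what has felt best
        # Nudge the strategy's value toward the reward that was felt.
        values[strategy] += learning_rate * (felt_reward[strategy] - values[strategy])
    return max(values, key=values.get)


# Better thinking feels better: feelings track the quality of thinking.
aligned = {"careful": 1.0, "hasty": 0.2}
# Worse thinking feels better: feelings are out of tune with quality.
misaligned = {"careful": 0.2, "hasty": 1.0}

aligned_choice = learn_strategy_preference(aligned)
misaligned_choice = learn_strategy_preference(misaligned)
```

The learner never sees the true quality of a strategy, only how it feels. It therefore settles on careful thinking in the first case and on hasty thinking in the second.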

The fact that our brain is equipped with such a powerful learning mechanism raises the question of why most of us never learn to think nearly as well as Albert Einstein and why, even after decades of learning, we are still plagued by self-defeating irrationalities. To answer this question, let’s look at the dark side of metacognitive reinforcement learning.

The dark side of metacognitive reinforcement learning

In an ideal agent, metacognitive reinforcement learning would be driven by the true quality of our thought and decision processes. However, for mortals like us, this is infeasible. So instead, our metacognitive reinforcement learning might be driven by the regret and pride (and possibly other emotions) we experience about our thinking and as a consequence of our decisions. Unfortunately, those emotions can be out of tune with the true quality of our thought and decision processes. Put simply, good thinking can feel bad and bad thinking can feel amazing. There are at least three reasons why good thinking does not always feel good: First, engaging in focused, rigorous thinking for an extended period of time is hard and effortful. Second, it can be completely unrewarding for very long stretches of time and requires us to forgo potentially more rewarding alternatives, such as browsing our Facebook feed. And finally, rigorous thinking may lead us to disappointing conclusions (e.g., “We cannot conclude anything from our data.”) and painful realizations like “I have wasted my entire career on making spurious arguments for a dangerous preconception that is clearly wrong.” This might be why very few people develop the potentially very valuable propensity to examine their conclusions, themselves, and their most cherished beliefs with critical rigor.

Conversely, bad thinking doesn’t always feel bad either. On the contrary, it usually feels amazing when flawed reasoning leads us to believe that we are smart, famous, virtuous, and did everything right. Feeling amazing might, in turn, reinforce the flawed reasoning that led us to those mistaken conclusions. This might be why some of our mental flaws are so persistent. It might, for instance, explain why many people develop the tendency to willfully distort their reasoning to reach the desired conclusion (motivated reasoning) and the bias to attribute good outcomes to themselves and bad outcomes to unfortunate circumstances (self-serving bias). Likewise, many bad decision mechanisms are reinforced by immediate reward. For instance, the impulse that made you eat unhealthy sweets will be reinforced by sugar, and bad habits persist because of the guilty pleasures they generate in the moment.

These examples illustrate that metacognitive reinforcement learning can go awry when good thinking feels bad and bad thinking feels good. But this is not the only obstacle to learning to think and decide better. Another, perhaps even more important, obstacle is that most of the time thinking feels neutral because we cannot tell whether it is good or bad. Indeed, we very rarely receive informative feedback about the quality of our thinking and decision-making, and when we do receive feedback it is often very noisy. For instance, even if you apply exactly the same decision strategy to virtually identical problems (e.g., whether to buy, sell, or hold stock A vs. whether to buy, sell, or hold stock B) one of those decisions may have a very positive outcome while the other one has a very negative outcome. This means the world will often give us the wrong feedback. This may be either because the actual outcome does not reflect the quality of our decision (e.g., because we were unlucky) or because the quality of our decision does not reflect the quality of our decision strategy (e.g., because we made the right decision for the wrong reasons or by mere coincidence).
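A small simulation shows how uninformative a single outcome can be. It is a sketch under made-up assumptions: outcomes are drawn from a normal distribution, the good strategy is better by 0.1 on average, and luck contributes a standard deviation of 1.0. None of these numbers come from the studies cited here.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def outcome(strategy_quality, noise=1.0):
    """Outcome of one decision: the strategy's quality plus luck."""
    return rng.gauss(strategy_quality, noise)


good_quality, poor_quality = 0.1, 0.0  # hypothetical average payoffs
trials = 10_000
good_outcomes = [outcome(good_quality) for _ in range(trials)]
poor_outcomes = [outcome(poor_quality) for _ in range(trials)]

# How often does one decision made with the good strategy turn out
# worse than one decision made with the poor strategy?
misleading = sum(g < p for g, p in zip(good_outcomes, poor_outcomes)) / trials

# Averaged over many decisions, the difference becomes visible.
mean_good = sum(good_outcomes) / trials
mean_poor = sum(poor_outcomes) / trials
```

With this much luck involved, comparing two individual outcomes points to the wrong strategy close to half of the time. Only the averages over many decisions reliably favor the better strategy.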

In most real-life situations, the world’s feedback for our thinking is not only noisy but also delayed. For instance, the reward for the excellent decision to embark on an ambitious, long-term project may not come until decades of hard and unappreciated effort later. Taking this to the extreme, the brilliance of some scientists’ abandoned early ideas was recognized and appreciated only posthumously, while some of their contemporaries rose to fame through questionable reasoning whose flaws were only exposed by later generations. In either case, the eventual “reward” arrived too late for either of them to learn from it. And even when we live long enough to experience the reward for a good decision, it may be hard to reconstruct the decision process that led to it and reinforce it appropriately. By contrast, the costs of hard thinking are experienced immediately and drive metacognitive reinforcement learning away from deep thinking and careful deliberation. This might be part of the reason why many people learn to be cognitive misers who try to get through life with as little mental effort as possible.

These examples suggest that the reason why our minds don’t always improve and won’t automatically reach the heights of Einstein is that the rewards that drive metacognitive learning are misaligned with the quality of our thinking.

Promoting Cognitive Growth

Given all of these problems and the resulting obstacles to cognitive growth, what can we do to overcome them? If the analysis above is correct then one part of the solution could be to align the rewards people experience more closely with the quality of their thinking and decision-making. This could be accomplished either by giving people high-quality feedback on how they think and decide (Figure 1) or by teaching people to generate their own. The following two sections discuss these approaches in turn.

Figure 1: Cognitive Tutors that teach people optimal cognitive strategies via metacognitive feedback.

Cognitive Tutors

In ongoing work, my team and I are exploring this approach by developing cognitive tutors (Figure 1) that give people metacognitive feedback on their planning process to teach them a near-optimal planning strategy [11,12]. As illustrated in Figure 2a, the cognitive tutor observes how participants plan the route of an airplane by recording the clicks they make to gather information. When the participant is finished planning, the cognitive tutor gives them metacognitive feedback comprising a delay penalty and a feedback message (Figure 2b). Critically, the delay penalty was computed so as to align the reward participants experienced with the quality of their planning process. Concretely, its duration was proportional to how much worse the participant’s planning strategy had been than the optimal strategy. In addition, a feedback message explained how their planning process deviated from the optimal strategy. As shown in Figure 2c, we found that this metacognitive feedback significantly accelerated the process by which participants learned to plan better and enabled them to discover significantly better planning strategies than when they practiced without feedback or received feedback only on their actions (e.g., “Bad move! You should have moved right.”). Incidentally, the near-optimal planning strategies taught by the cognitive tutor were discovered using a cognitively inspired metalevel reinforcement learning algorithm [13].
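The proportionality described above can be written down in a few lines. This is only a sketch of the general idea, not the computation used in [11,12]: the function name, the way strategy quality is scored, and the scaling factor of two seconds per point are hypothetical.

```python
def delay_penalty(optimal_value, participant_value, seconds_per_point=2.0):
    """Delay (in seconds) proportional to the shortfall from the optimal strategy.

    optimal_value and participant_value are hypothetical scores for how
    good each planning strategy is; seconds_per_point is an assumed
    scaling factor.
    """
    shortfall = max(0.0, optimal_value - participant_value)
    return seconds_per_point * shortfall


# Planning as well as the optimal strategy incurs no delay ...
no_delay = delay_penalty(optimal_value=10.0, participant_value=10.0)
# ... while a strategy that falls 3 points short incurs a 6-second delay.
long_delay = delay_penalty(optimal_value=10.0, participant_value=7.0)
```

Because the penalty depends on the planning strategy rather than on how the flight happened to turn out, a lucky outcome reached by poor planning would still be followed by a delay.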

Figure 2: Teaching people how to plan better. a) Interface of the cognitive tutor. b) Metacognitive feedback displayed by the cognitive tutor. c) Learning curves showing that people learn faster and discover better strategies when they receive metacognitive feedback (FB).

Be your own cognitive tutor!

Unfortunately, for the time being, there is no cognitive tutor that can give us feedback on how we think and decide in everyday life. But you can be your own cognitive tutor. In fact, you already are. For instance, every time you regret a decision that you have made, your brain is giving you a metacognitive feedback signal that helps you learn to avoid similar mistakes in the future. And you are also being your very own cognitive tutor when you feel proud of how you solved a complex problem or exerted willpower to resist temptations and get some work done instead. This suggests that you can accelerate your cognitive growth by giving yourself high-quality feedback on how you think and decide. You are, in fact, the most qualified person to do so because no one else can watch your mind as closely as you can. So, what can we do to improve our metacognitive feedback? Here are a few ideas:

  1. Make the outcomes of poor decision-making less rewarding. You can take proactive steps to take away the positive reinforcement that sustains your bad habits and occasional impulsivity. For instance, if you would like to teach your brain to allocate less control to your bad internet habits, then set up a website blocker that will scold you when you indulge in that bad habit instead of giving you a positive reward for it. To take this one step further, you could use Pavlok to set up a system that punishes you with electric shocks whenever your brain allocates control to bad habits or maladaptive impulses.
  2. Make the outcomes of good decision-making more rewarding. If you have successfully exerted your willpower and have made good decisions to get things done, then you deserve a reward. To reliably get this reward, you can set concrete goals and reward yourself for achieving them. You can also set up a to-do list gamification system, or get an accountability partner whom you can tell all about your amazing accomplishments.
  3. Evaluate yourself by the quality of your reasoning rather than the outcome of the decision. If you did the best you could given the information that you had, then you should appreciate that to make yourself feel good even if the outcome of the decision was terrible. Conversely, if you know you made an important decision carelessly, then you should imagine how it could have gone wrong and regret it accordingly even if it turned out well.
  4. Taken together, points 1-3 suggest that you should praise yourself for good thinking and good decision-making and criticize yourself for flawed reasoning and poor decisions.
  5. To do this reliably, it is important to pay attention to how you make your decisions and examine this process critically. What was your decision strategy? Would it have worked well for similar decisions you have made in the past? Can you spot any holes or fallacies in your argument? Is this an A+ strategy you would proudly recommend to others or a C- strategy that you would frown upon if you saw somebody else use it? You could make this easier for yourself by writing out your thought process and then grading it as if it were an essay that somebody else had written for a course on good decision-making.
  6. This strategy can also be applied postmortem: when something goes particularly well or especially poorly, take a step back and figure out which of your decisions, if any, caused this outcome and reflect on how you made those decisions.
  7. To generate metacognitive feedback more frequently, you can make an effort to find out how good your decisions really were by figuring out what would have happened if you had made your decision differently. Would you have chosen a better option if you had prioritized different criteria, or thought it through more carefully? Or could you have saved yourself a lot of time by reaching an equally good decision with a much simpler decision strategy?
  8. Finally, you can also solicit the metacognitive feedback of other people, and offer to do the same for them in return.

Summary and Conclusion

Metacognitive reinforcement learning may be one of the mechanisms through which we learn how to think and decide. This perspective suggests that our cognitive abilities and thinking dispositions are fundamentally malleable and will improve if and only if the internal rewards we experience for our thinking reflect its quality. It might thus be possible to promote cognitive growth by aligning how we feel about our thoughts and decisions to how good they really are. Cognitive tutors can help us achieve this by giving us immediate, high-quality feedback on our cognitive strategies, but we can also help ourselves by critically examining our thinking and praising and criticizing ourselves accordingly.

If you have tried any of the ideas for promoting cognitive growth, or have other ideas of your own, then please let me know. I would love to hear your thoughts, and (metacognitive) feedback is always very welcome!

References

[1] For an engaging and very accessible introduction to some of the pioneering research on neuroplasticity and its application see Doidge, N. (2007). The brain that changes itself: Stories of personal triumph from the frontiers of brain science. Penguin.

[2] Diamond, M. C., Krech, D., & Rosenzweig, M. R. (1964). The effects of an enriched environment on the histology of the rat cerebral cortex. Journal of Comparative Neurology, 123(1), 111-119.

[3] Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity: from synapses to maps. Annual review of neuroscience, 21(1), 149-186.

[4] Temple, E., Deutsch, G. K., Poldrack, R. A., Miller, S. L., Tallal, P., Merzenich, M. M., & Gabrieli, J. D. (2003). Neural deficits in children with dyslexia ameliorated by behavioral remediation: evidence from functional MRI. Proceedings of the National Academy of Sciences, 100(5), 2860-2865.

[5] Reynolds, G. S. (1975). A primer of operant conditioning, Rev. ed. Oxford, England: Scott, Foresman.

[6] Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53(3), 139-154.

[7] Krueger, P.M.*, Lieder, F.*, & Griffiths, T.L. (2017). Enhancing Metacognitive Reinforcement learning using reward structures and feedback. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.). Proceedings of the 39th Annual Meeting of the Cognitive Science Society. Austin TX: Cognitive Science Society. * These authors contributed equally. [Article]

[8] Lieder, F.*, Krueger, P.M.*, & Griffiths, T.L. (2017). An Automatic Method for Discovering Rational Heuristics for Risky Choice. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.). Proceedings of the 39th Annual Meeting of the Cognitive Science Society. Austin TX: Cognitive Science Society. * These authors contributed equally. [Article]

[9] Lieder, F., & Griffiths, T. L. (2017). Strategy selection as rational metareasoning. Psychological Review, 124(6), 762-794. http://dx.doi.org/10.1037/rev0000075

[10] Lieder, F., Shenhav, A., Musslick, S., & Griffiths, T.L. (in revision). Rational metareasoning and the plasticity of cognitive control. [Manuscript]

[11] Lieder, F.*, Krueger, P. M.*, Callaway, F.*, & Griffiths, T.L. (2017). A computerized training program for teaching people how to plan better. Annual Meeting of the Society for Judgment and Decision-Making. [Abstract]

[12] Lieder, F.*, Krueger, P.M.*, Callaway, F.*, & Griffiths, T.L. (2017). A reward shaping method for promoting metacognitive learning. The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making. [Extended Abstract]

[13] Lieder, F.*, Callaway, F.*, Gul, S.*, Krueger, P.M., & Griffiths, T.L. (2017). Learning to select computations. NIPS workshop on Cognitively Informed AI. arXiv:1711.06892. * These authors contributed equally.