Decision making under uncertainty: exploration and exploitation

Abstract

Decision-making with insufficient information involves constructing, testing, and refining hypotheses. In a novel environment, subjects face high uncertainty, so their behavior needs to be variable and aimed at testing the range of available options; this variability allows them to acquire information about the environment and find the most beneficial options. This type of behavior is referred to as exploration. Once an internal model of the environment has been formed, the other strategy, known as exploitation, becomes preferable; exploitation means using profitable options that the subject has already discovered. In a changing or complex (probabilistic) environment, it is important to combine the two strategies: exploration to detect changes in the environment and exploitation to benefit from familiar options. The exploration-exploitation balance is an actively studied topic in psychology, neurobiology, and neuroeconomics. In this review, we discuss the factors that influence the exploration-exploitation balance and its neurophysiological basis, the mechanisms of decision-making under uncertainty, and switching between the two strategies. We address the roles of the major brain areas involved in these processes, such as the locus coeruleus, anterior cingulate cortex, and frontopolar cortex, and we describe the functions of several important neurotransmitters involved: dopamine, norepinephrine, and acetylcholine.

General Information

Keywords: uncertainty, decision-making, exploration-exploitation trade-off, norepinephrine, dopamine, acetylcholine.

Journal rubric: Neurosciences and Cognitive Studies

DOI: https://doi.org/10.17759/jmfp.2020090208

Funding. The reported study was funded by the Russian Science Foundation (RSF), project number 14-06-14029.

Acknowledgements. The authors are grateful to Stroganova T.A. for her great contribution to research on neurocognitive mechanisms of decision-making in the Moscow MEG center.

For citation: Sayfulina K.E., Kozunova G.L., Medvedev V.A., Rytikova A.M., Chernyshev B.V. Decision making under uncertainty: exploration and exploitation [Electronic resource]. Sovremennaia zarubezhnaia psikhologiia = Journal of Modern Foreign Psychology, 2020. Vol. 9, no. 2, pp. 93–106. DOI: 10.17759/jmfp.2020090208. (In Russ., abstr. in Engl.)

Information About the Authors

Ksenia E. Sayfulina, Junior Researcher, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology & Education, Moscow, Russia, ORCID: https://orcid.org/0000-0002-2017-0811, e-mail: kseniasayfulina@gmail.com

Galina L. Kozunova, PhD in Psychology, Centre for Neuro-Cognitive Studies (MEG-center), Moscow State University of Psychology and Education, Moscow, Russia, ORCID: https://orcid.org/0000-0002-1286-8654, e-mail: kozunovagl@mgppu.ru

Vladimir A. Medvedev, Junior Researcher, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology & Education, Moscow, Russia, ORCID: https://orcid.org/0000-0002-3252-8809, e-mail: ixdon@yandex.ru

Anna M. Rytikova, PhD in Engineering, Junior Researcher, Center for Neurocognitive Research (MEG Center), Moscow State University of Psychology & Education, Moscow, Russia, ORCID: https://orcid.org/0000-0003-0153-9457, e-mail: ann.zelener@mail.ru

Boris V. Chernyshev, PhD in Biology, Head of Center for Neurocognitive Research (MEG-Center), Moscow State University of Psychology & Education, Associate Professor, Department of Psychology, National Research University Higher School of Economics; Associate Professor of the Department of Higher Nervous Activity, Lomonosov Moscow State University, Moscow, Russia, ORCID: https://orcid.org/0000-0002-8267-3916, e-mail: b_chernysh@mail.ru
