Reinforcement learning in probabilistic environment and its role in human adaptive and maladaptive behavior

G.L. Kozunova

doi:10.17759/jmfp.2016050409

Journal of Modern Foreign Psychology
2016. Vol. 5, no. 4, 85–96
ISSN: 2304-4977 (online)

Abstract

The article discusses human learning when the outcomes of one's actions are partly uncertain, a situation that models one of the mechanisms of adaptive behavior in a natural environment. Basic learning mechanisms have been studied in detail by modelling conditioned reflexes in animal experiments in which a given behavior is reinforced consistently, immediately, and repeatedly. By contrast, the neurophysiological foundations of human learning under irregular or delayed reinforcement remain poorly studied, despite growing interest in recent years. Research on mental and neuropsychiatric disorders has contributed significantly to this problem. Specific changes in some aspects of learning with probabilistic reinforcement have been found in patients with Parkinson's disease, Tourette's syndrome, schizophrenia, depression, and anxiety disorders. In particular, sensitivity to positive and to negative reinforcement can be impaired independently. Considering the pathogenetic mechanisms of these conditions, it can be concluded that the key structures for this type of learning are the cingulate cortex and the orbitofrontal cortex, which interact reciprocally with the underlying structures of the striatal system, the limbic system, and the nuclei of the brainstem reticular formation.
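
The idea that sensitivity to positive and to negative reinforcement can be impaired independently can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes a simple delta-rule learner on a two-option task with probabilistic outcomes, where the prediction error is scaled by one learning rate after better-than-expected outcomes and by another after worse-than-expected outcomes. The function name `simulate_learner`, the parameters `alpha_gain` and `alpha_loss`, and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def simulate_learner(alpha_gain, alpha_loss, p_reward=(0.8, 0.2),
                     n_trials=500, epsilon=0.1, seed=0):
    """Delta-rule learner on a two-option probabilistic task.

    alpha_gain scales updates after positive prediction errors,
    alpha_loss scales updates after non-positive prediction errors.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]  # learned value estimate for each option
    for _ in range(n_trials):
        # Epsilon-greedy choice: explore at random with probability
        # epsilon (or on ties), otherwise pick the higher-valued option.
        if rng.random() < epsilon or values[0] == values[1]:
            choice = rng.randrange(2)
        else:
            choice = 0 if values[0] > values[1] else 1
        # Probabilistic outcome: +1 (reward) or -1 (loss).
        outcome = 1.0 if rng.random() < p_reward[choice] else -1.0
        # Prediction error: obtained outcome minus expected value.
        delta = outcome - values[choice]
        # Separate learning rates for positive and negative errors.
        alpha = alpha_gain if delta > 0 else alpha_loss
        values[choice] += alpha * delta
    return values


if __name__ == "__main__":
    print("balanced:       ", simulate_learner(0.2, 0.2))
    print("gain-insensitive:", simulate_learner(0.0, 0.2))
    print("loss-insensitive:", simulate_learner(0.2, 0.0))
```

In this toy model, setting `alpha_gain` to zero leaves learning from losses intact while removing learning from rewards, so the value estimates can only fall below their starting point. Setting `alpha_loss` to zero has the opposite effect. This is only one of several ways such a dissociation could be formalized.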

General Information

Keywords: reinforcement learning, uncertainty, prediction error, frontal cortex, dopamine, serotonin, norepinephrine, mental disorders

Journal rubric: Educational Psychology and Pedagogical Psychology

DOI: https://doi.org/10.17759/jmfp.2016050409

For citation: Kozunova G.L. Reinforcement learning in probabilistic environment and its role in human adaptive and maladaptive behavior [Elektronnyi resurs]. Sovremennaia zarubezhnaia psikhologiia = Journal of Modern Foreign Psychology, 2016. Vol. 5, no. 4, pp. 85–96. DOI: 10.17759/jmfp.2016050409. (In Russ., abstr. in Engl.)

Information About the Authors

Galina L. Kozunova, Candidate of Science (Psychology), Centre for Neuro-Cognitive Studies (MEG-center), Moscow State University of Psychology and Education, Moscow, Russian Federation, ORCID: https://orcid.org/0000-0002-1286-8654, e-mail: kozunovagl@mgppu.ru
