Strategies for navigating a dynamic world

doi:10.1126/science.abd7258

DOI	10.1126/science.abd7258
Title	Strategies for navigating a dynamic world
Authors	Saurabh Steixner-Kumar; Jan Gläscher
Date	2020-08-28
Journal	Science
Year	2020
Abstract	One of the most difficult problems for an adaptable agent is gauging how to behave in a nonstationary environment. When conditions are stable, an organism generally pursues a strategy known to provide the best outcome. However, when environmental conditions change, an organism abandons the current action plan and searches for a new best option. The most challenging aspect of this search, calculating the exact time point at which to change strategies, requires the brain to integrate past and present observations and evaluate whether they remain consistent with current environmental conditions. On page 1076 of this issue, Domenech et al. (1) report on the modeling of rare direct electrical recordings from the prefrontal cortices (PFCs) of a small group of human epilepsy patients as they flexibly negotiated a nonstationary environment.
To understand the brain's mode of navigation, consider for example a sailor at sea (see the figure). The winds and the currents determine the waves that drive the sailor to continuously adjust the rudder so as to stay on course. By observing the wave patterns, he can anticipate the navigational effects of his actions and adapt accordingly. But when the currents or the weather change, the sailor must adapt his course to reach the next port of call. At that time, the sailor observes essentially the same stimulus (the waves) but has to remap his action plan (rudder adjustments) to the new wind conditions and currents. This difficult decision problem (how to detect and then adapt to a nonstationary environment) is captured perfectly in the exploration-exploitation dilemma: When should I stop exploiting my current action plan and start exploring different ways to reach my goals? An optimal solution tracks the discounted sum of normalized future rewards. However, this approach applies strictly to stationary environments and thus does not capture the dynamic changes that organisms encounter in their daily lives (2).
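As a hedged illustration of the "discounted sum of normalized future rewards" mentioned above: the text does not name a specific formula, so the following assumes the Gittins index for a multi-armed bandit, a standard optimality result for stationary environments. The symbols (discount factor \gamma, reward r_t, stopping time \tau, arm state s) are notation introduced here and do not come from the source.

```latex
G(s) = \sup_{\tau \ge 1}
\frac{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \gamma^{t} r_{t} \,\middle|\, s_0 = s\right]}
     {\mathbb{E}\left[\sum_{t=0}^{\tau-1} \gamma^{t} \,\middle|\, s_0 = s\right]},
\qquad 0 < \gamma < 1
```

Under this assumption, the agent always chooses the option with the largest index; the denominator normalizes the discounted reward by the discounted time spent on that option. The optimality guarantee relies on the reward statistics of unchosen options staying fixed, which is the condition that fails in a nonstationary environment.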
Yet the human brain and those of other species seem to smoothly solve the exploration-exploitation dilemma in nonstationary environments. Decision neuroscience has investigated the flexible adaptation to changing environmental contingencies with diverse experimental paradigms and assorted computational models. The simplest paradigm is probabilistic reversal learning, in which the agent has to search for reward among two options with complementary reward probabilities. This adaptation problem can be solved by hidden Markov models (3), which are well approximated by reinforcement learning (RL) models that also update nonchosen actions (4). Extension of this paradigm to include independently changing reward probabilities reveals two distinct neural responses: Expected-value signals, which reflect “exploitative” choices, spur activation of the ventromedial prefrontal cortex (vmPFC), whereas “explorative” choices (that is, the choosing of a currently lesser valued option) activate the frontopolar cortex (5).
Figure: A sailor solves a dilemma at sea. As the ship nears bad weather, the sailor's ventromedial prefrontal cortex (vmPFC) evaluates the ongoing (orange) action plan (exploitation) and the prospective (brown, red) plans (exploration). Once the red (calm waters) plan is exploited, the sailor's dorsomedial PFC (dmPFC) uses trial-and-error learning to map the proper rudder adjustments. GRAPHIC: A. KITTERMAN/SCIENCE
Another task with both rapid and slow changes in the reward probabilities of various options was used to develop a hierarchical Bayesian model that estimates the volatility of the environment and adjusts the learning rate accordingly (6). This model has been generalized in the hierarchical Gaussian filter (HGF) framework (7), which is widely used in modeling social and nonsocial human decision-making in nonstationary environments.
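The probabilistic reversal learning paradigm described above, together with an RL model that also updates the nonchosen action, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model from any of the cited studies: the function name, the parameter values (learning rate, reward probability, reversal schedule), the epsilon-greedy choice rule, and the "mirror" update for the nonchosen option are all hypothetical choices made for this example.

```python
import random


def simulate_reversal_learning(n_trials=200, reversal_every=50, p_good=0.8,
                               alpha=0.3, epsilon=0.1, seed=0):
    """Simulate a two-option reversal task with a simple RL learner.

    All defaults are illustrative assumptions, not values from the source.
    Returns the final value estimates and a per-trial list of whether the
    currently better option was chosen.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]   # value estimates for the two options
    good = 0         # index of the currently better option
    correct = []

    for t in range(n_trials):
        # Reverse the reward contingencies at fixed intervals (assumption).
        if t > 0 and t % reversal_every == 0:
            good = 1 - good

        # Epsilon-greedy choice: mostly exploit, occasionally explore.
        if rng.random() < epsilon:
            choice = rng.randrange(2)
        else:
            choice = 0 if q[0] >= q[1] else 1

        # Complementary reward probabilities: p_good vs. 1 - p_good.
        p_reward = p_good if choice == good else 1.0 - p_good
        reward = 1.0 if rng.random() < p_reward else 0.0

        # Standard prediction-error update for the chosen option.
        q[choice] += alpha * (reward - q[choice])

        # Update the nonchosen option as if it had produced the opposite
        # outcome (a simplifying assumption exploiting the task structure).
        other = 1 - choice
        q[other] += alpha * ((1.0 - reward) - q[other])

        correct.append(choice == good)

    return q, correct


if __name__ == "__main__":
    q, correct = simulate_reversal_learning()
    print("final values:", q)
    print("fraction of trials on the better option:",
          sum(correct) / len(correct))
```

Because both values are updated on every trial with opposite outcomes, the two estimates always sum to 1 in this sketch, which lets a single surprising outcome shift the preference quickly after a reversal. A learner that updates only the chosen option would need to sample the other option again before its value could recover.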
Although these computational modeling frameworks differ, all are trying to solve similar problems: How to infer the latent structure of the world from discrete observations and how to detect transitions between different states of the world.
Domenech et al. address the same problems with yet another experimental paradigm, this one carried out with a small group of human epilepsy patients. Electrodes deeply implanted in the patients' PFCs delivered direct electrical recordings from the vmPFC and dorsomedial PFC (dmPFC) while the patients performed a multioption decision task. The participants had to associate three different stimuli with three distinct actions; each such mapping constitutes an action plan. The mapping changed every 33 to 57 trials, and participants had to relearn the association of the same stimuli with a different combination of actions, much like our sailor at sea who faces changes in weather and currents that alter wave patterns. The computational model (8) generates a reliability value for the ongoing action plan and other concurrently monitored plans. When the ongoing action plan is deemed reliable, the model is in “exploitation” mode and learns the stimulus-action mapping through RL mechanisms. When the ongoing action plan is deemed unreliable, the model switches to “exploration” mode. New provisional action plans are created and evaluated until one emerges as a reliable predictor for successful stimulus-action mapping (see the figure).
Using a state-of-the-art model-based analysis that associates the model-derived variables with the brain activity in various frequency bands of the neural recordings, the authors found a delicate interplay between the vmPFC and dmPFC that supports a predictive coding interpretation for resolution of the exploration-exploitation dilemma. The vmPFC monitors and represents the reliability of the ongoing action plan and relays this plan to the dmPFC as either a “stay” or “switch” trial.
A stay trial triggers additional learning through RL mechanisms in the dmPFC. In contrast, the dmPFC responds to a switch trial by suppressing activity related to maintaining the ongoing action plan. These findings resonate with and extend earlier results obtained with functional neuroimaging (5, 9).
These computational approaches to the problem of behavioral flexibility in a nonstationary environment share one commonality: They are all building a model of the environment and the transitions therein, either explicitly (as in the HGF framework) or implicitly (by evaluating the ongoing action plan, as in the Domenech et al. study). Although all of these models strive for generality, each was developed for a specific experimental context. It remains to be seen which of these provides the best account of flexible decision-making in humans and other species, preferably using a unified experimental paradigm. A model-free RL account (10) likely will not suffice, as several studies have demonstrated the superiority of more complex models over this “vanilla” RL model. Rather, an agent requires a rich representation of the environment and its dynamic transitions (often referred to as model-based learning) (10) to solve the exploration-exploitation dilemma and flexibly respond to a changing world.
1. P. Domenech, S. Rheims, E. Koechlin, Science 369, eabb0184 (2020).
2. J. D. Cohen, S. M. McClure, A. J. Yu, Philos. Trans. R. Soc. London Ser. B 362, 933 (2007).
3. A. N. Hampton, P. Bossaerts, J. P. O'Doherty, J. Neurosci. 26, 8360 (2006).
4. J. Gläscher, A. N. Hampton, J. P. O'Doherty, Cereb. Cortex 19, 483 (2009).
5. N. D. Daw, J. P. O'Doherty, P. Dayan, B. Seymour, R. J. Dolan, Nature 441, 876 (2006).
6. T. E. J. Behrens, M. W. Woolrich, M. E. Walton, M. F. S. Rushworth, Nat. Neurosci. 10, 1214 (2007).
7. C. Mathys, J. Daunizeau, K. J. Friston, K. E. Stephan, Front. Hum. Neurosci. 5, 39 (2011).
8. A. Collins, E. Koechlin, PLOS Biol. 10, e1001293 (2012).
9. M. Donoso, A. G. E. Collins, E. Koechlin, Science 344, 1481 (2014).
10. N. D. Daw, P. Dayan, Philos. Trans. R. Soc. London Ser. B 369, 20130478 (2014).
Field	Climate Change; Resources and Environment
Document type	Journal article
Item identifier	http://119.78.100.173/C666/handle/2XK7JSWQ/293214
Collection	Climate Change; Resources and Environmental Science
Recommended citation (GB/T 7714)	Saurabh Steixner-Kumar, Jan Gläscher. Strategies for navigating a dynamic world[J]. Science, 2020.
APA	Saurabh Steixner-Kumar, & Jan Gläscher. (2020). Strategies for navigating a dynamic world. Science.
MLA	Saurabh Steixner-Kumar, et al. "Strategies for navigating a dynamic world." Science (2020).