A Survey of Inverse Reinforcement Learning Algorithms, Theory and Applications

Song Li, Li Da-Zi, Xu Xin

Citation: Song Li, Li Da-Zi, Xu Xin. A survey of inverse reinforcement learning algorithms, theory and applications. Acta Automatica Sinica, 2024, 50(9): 1704−1723 doi: 10.16383/j.aas.c230081

doi: 10.16383/j.aas.c230081    cstr: 32138.14.j.aas.c230081
Funds: Supported by the National Natural Science Foundation of China (62273026)
              More Information
                Author Bio:

SONG Li  Ph.D. candidate at the College of Information Science and Technology, Beijing University of Chemical Technology. Her research interest covers reinforcement learning, deep learning, and inverse reinforcement learning. E-mail: slili516@foxmail.com

LI Da-Zi  Professor at the College of Information Science and Technology, Beijing University of Chemical Technology. Her research interest covers machine learning and artificial intelligence, advanced control, fractional order systems, and complex system modeling and optimization. Corresponding author of this paper. E-mail: lidz@mail.buct.edu.cn

XU Xin  Professor at the College of Intelligence Science and Technology, National University of Defense Technology. His research interest covers intelligent control, reinforcement learning, machine learning, robotics, and autonomous vehicles. E-mail: xinxu@nudt.edu.cn

Abstract: With the improvement of high-dimensional feature representation and approximation capabilities, reinforcement learning (RL) has made remarkable progress in real-world problems such as games, optimal decision making, and intelligent driving. However, in the interaction between the agent and the environment, manually designing a reward function remains difficult, which has motivated the research direction of inverse reinforcement learning (IRL). How to learn reward functions from expert demonstrations and perform policy optimization is an important research topic of great significance in the field of artificial intelligence. This paper gives a comprehensive overview of recent advances in inverse reinforcement learning algorithms: it first introduces new theoretical progress in IRL, then analyzes the challenges IRL faces and its future development trends, and finally discusses the application progress and prospects of IRL.
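
As background for the abstract, the problem surveyed here can be stated compactly in standard notation (this follows the usual MDP\R setting of Ng and Russell [9] and the maximum entropy model of Ziebart et al. [13], not necessarily the paper's own notation): given an MDP without a reward function, $M = (S, A, P, \gamma)$, and expert demonstrations $D = \{\tau_i\}$ with trajectory feature counts $f_\tau = \sum_t \phi(s_t)$, IRL seeks a reward $r_\theta(s) = \theta^{\top}\phi(s)$ under which the demonstrated behavior is (near-)optimal. Maximum entropy IRL models

$$
P(\tau \mid \theta) \propto \exp\big(\theta^{\top} f_{\tau}\big), \qquad
\theta^{*} = \arg\max_{\theta} \frac{1}{|D|}\sum_{\tau \in D} \log P(\tau \mid \theta), \qquad
\nabla_{\theta} \mathcal{L}(\theta) = \tilde{f} - \mathbb{E}_{\tau \sim P(\cdot \mid \theta)}\big[f_{\tau}\big],
$$

where $\tilde{f}$ denotes the empirical feature expectation of the demonstrations.
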
Fig. 1  Model of reinforcement learning

Fig. 2  MDP ((a) and (c) are the deterministic MDP; (b) and (d) are the stochastic MDP)

Fig. 3  Frameworks for RL, IRL, BC

Fig. 4  Classification of IRL algorithms

Fig. 5  Bayesian inverse reinforcement learning model

Fig. 6  Model structure of deep apprenticeship learning

Fig. 7  Structure of maximum entropy deep inverse reinforcement learning

Fig. 8  The inverse reinforcement learning process based on sequential expert demonstration

Fig. 9  Structure of the neural network model for estimating the reward function

Fig. 10  Multi-scale fully convolutional network architecture

Fig. 11  Framework of nonlinear inverse reinforcement learning

Fig. 12  Structure of trajectory planning using deep maximum entropy IRL

Fig. 13  Convolutional neural network structure for robotic arm

Table 1  Timeline of inverse reinforcement learning algorithms

IRL algorithm | Challenge faced | Problem addressed | Author (year)
Margin-based IRL | Policy ambiguity | MDP/R in finite and large state spaces | Ng et al. [9] (2000)
Margin-based IRL | | Linear solution of the MDP/R problem | Abbeel et al. [11] (2004)
Margin-based IRL | | Maximum margin structured prediction | Ratliff et al. [12] (2006)
Margin-based IRL | | Complex multi-dimensional tasks | Bogdanovic et al. [22] (2015)
Margin-based IRL | | Applicability to real-world tasks | Hester et al. [23] (2018)
Bayesian IRL | Difficult choice of priors; computationally complex | Deriving the probability distribution of the reward by combining prior knowledge with expert data | Ramachandran et al. [21] (2007)
Probability-based IRL | Poor adaptability in complex dynamic environments | Feature matching under the maximum entropy constraint | Ziebart et al. [13] (2008)
Probability-based IRL | | MDP/R with an unknown transition function | Boularias et al. [14] (2011)
Gaussian-process-based IRL | Computationally complex | Nonlinear estimation of the reward | Levine et al. [19] (2011)
Maximum-entropy-based deep IRL | Computationally complex; overfitting; imbalanced and limited expert demonstrations | Learning rewards for complex urban environments from human driving demonstrations | Wulfmeier et al. [15] (2016)
Maximum-entropy-based deep IRL | | Adversarial IRL for extracting policies from data | Ho et al. [18] (2016)
Maximum-entropy-based deep IRL | | Linearly solvable non-deterministic MDP/R with multiple sparse, scattered rewards | Budhraja et al. [59] (2017)
Maximum-entropy-based deep IRL | | Planning for autonomous vehicles in traffic | You et al. [38] (2019)
Maximum-entropy-based deep IRL | | Reward learning for model-free integral inverse RL | Lian et al. [70] (2021)
Maximum-entropy-based deep IRL | | Inferring reward functions via maximum causal entropy | Gleave et al. [94] (2022)
Neural-network-based IRL | Overfitting; instability | IRL for autonomous navigation with large-scale, high-dimensional state spaces | Chen et al. [62] (2019)
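
Several of the milestone entries above belong to the maximum entropy family ([13], [15], [94]). Purely as an illustration (this code is not from the paper), the sketch below implements the tabular maximum entropy IRL loop in the form popularized by Ziebart et al. [13]: soft value iteration under the current reward estimate, a forward pass for expected state visitations, and a gradient step that matches the expert's feature expectations. The function and argument names (maxent_irl, P, phi, demos, horizon) are illustrative assumptions, and discounting is omitted for brevity.

```python
import numpy as np

def maxent_irl(P, phi, demos, horizon, lr=0.1, iters=200):
    """Tabular maximum entropy IRL (illustrative sketch, after Ziebart et al. [13]).

    P       : (A, S, S) array, P[a, s, s1] = transition probability s -> s1 under action a.
    phi     : (S, K) array of state features; the reward model is r(s) = theta . phi(s).
    demos   : list of expert trajectories, each a list of state indices.
    horizon : number of steps used for soft value iteration and the forward pass.
    """
    A, S, _ = P.shape
    theta = np.zeros(phi.shape[1])

    # Empirical (expert) feature expectations and initial-state distribution.
    f_expert = np.mean([phi[np.array(traj)].sum(axis=0) for traj in demos], axis=0)
    d0 = np.bincount([traj[0] for traj in demos], minlength=S) / len(demos)

    for _ in range(iters):
        r = phi @ theta                                   # current state rewards

        # Soft (maximum entropy) value iteration -> stochastic policy pi[a, s].
        V = np.zeros(S)
        for _ in range(horizon):
            Q = r[None, :] + P @ V                        # Q[a, s]
            V = np.log(np.exp(Q - Q.max(axis=0)).sum(axis=0)) + Q.max(axis=0)
        pi = np.exp(Q - V[None, :])

        # Forward pass: expected state visitation frequencies under pi.
        mu, visits = d0.copy(), d0.copy()
        for _ in range(horizon - 1):
            mu = np.einsum('as,s,ast->t', pi, mu, P)
            visits += mu

        # Gradient ascent on the demonstration log-likelihood:
        # expert feature counts minus expected feature counts under the model.
        theta += lr * (f_expert - visits @ phi)

    return theta
```

In this form the only learned object is theta, and the policy is a by-product of the soft value iteration; the deep variants in the table ([15], [62]) replace the linear reward phi @ theta with a neural network while keeping the same visitation-matching gradient structure.
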

Table 2  Comparison of inverse reinforcement learning algorithms

IRL algorithm | Reward | Value function
ALIRL [11] | 38.79 | 32.66
FIRL [27] | 31.89 | 5.22
GPIRL [19] | 2.66 | 0.42
MWAL [95] | 206.44 | 43.32
MMP [12] | 38.38 | 34.20
MMPBoost [30] | 31.56 | 23.56
MEIRL [13] | 36.36 | 13.12

References

[1] Chai Tian-You. Development directions of industrial artificial intelligence. Acta Automatica Sinica, 2020, 46(10): 2005−2012 doi: 10.16383/j.aas.c200796 (in Chinese)
[2] Dai X Y, Zhao C, Li X S, Wang X, Wang F Y. Traffic signal control using offline reinforcement learning. In: Proceedings of the China Automation Congress (CAC). Beijing, China: IEEE, 2021. 8090−8095
[3] Li J N, Ding J L, Chai T Y, Lewis F L. Nonzero-sum game reinforcement learning for performance optimization in large-scale industrial processes. IEEE Transactions on Cybernetics, 2020, 50(9): 4132−4145 doi: 10.1109/TCYB.2019.2950262
[4] Zhao Dong-Bin, Shao Kun, Zhu Yuan-Heng, Li Dong, Chen Ya-Ran, Wang Hai-Tao, et al. Review of deep reinforcement learning and discussions on the development of computer Go. Control Theory & Applications, 2016, 33(6): 701−717 (in Chinese)
[5] Song T H, Li D Z, Yang W M, Hirasawa K. Recursive least-squares temporal difference with gradient correction. IEEE Transactions on Cybernetics, 2021, 51(8): 4251−4264 doi: 10.1109/TCYB.2019.2902342
[6] Bain M, Sammut C. A framework for behavioural cloning. Machine Intelligence 15: Intelligent Agents. Oxford: Oxford University, 1995. 103−129
[7] Couto G C K, Antonelo E A. Generative adversarial imitation learning for end-to-end autonomous driving on urban environments. In: Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI). Orlando, USA: IEEE, 2021. 1−7
[8] Samak T V, Samak C V, Kandhasamy S. Robust behavioral cloning for autonomous vehicles using end-to-end imitation learning. SAE International Journal of Connected and Automated Vehicles, 2021, 4(3): 279−295
[9] Ng A Y, Russell S J. Algorithms for inverse reinforcement learning. In: Proceedings of the 17th International Conference on Machine Learning (ICML). Stanford, USA: ACM, 2000. 663−670
[10] Imani M, Ghoreishi S F. Scalable inverse reinforcement learning through multifidelity Bayesian optimization. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 4125−4132 doi: 10.1109/TNNLS.2021.3051012
[11] Abbeel P, Ng A Y. Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML). Banff, Canada: ACM, 2004. 1−8
[12] Ratliff N D, Bagnell J A, Zinkevich M A. Maximum margin planning. In: Proceedings of the 23rd International Conference on Machine Learning (ICML). Pittsburgh, USA: ACM, 2006. 729−736
[13] Ziebart B D, Maas A, Bagnell J A, Dey A K. Maximum entropy inverse reinforcement learning. In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI). Chicago, USA: AAAI, 2008. 1433−1438
[14] Boularias A, Kober J, Peters J. Relative entropy inverse reinforcement learning. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). Fort Lauderdale, USA: 2011. 182−189
[15] Wulfmeier M, Wang D Z, Posner I. Watch this: Scalable cost-function learning for path planning in urban environments. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Daejeon, Korea (South): IEEE, 2016. 2089−2095
[16] Guo H Y, Chen Q X, Xia Q, Kang C Q. Deep inverse reinforcement learning for objective function identification in bidding models. IEEE Transactions on Power Systems, 2021, 36(6): 5684−5696 doi: 10.1109/TPWRS.2021.3076296
[17] Shi Y C, Jiu B, Yan J K, Liu H W, Li K. Data-driven simultaneous multibeam power allocation: When multiple targets tracking meets deep reinforcement learning. IEEE Systems Journal, 2021, 15(1): 1264−1274 doi: 10.1109/JSYST.2020.2984774
[18] Ho J, Ermon S. Generative adversarial imitation learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS). Barcelona, Spain: Curran Associates Inc., 2016. 4572−4580
[19] Levine S, Popović Z, Koltun V. Nonlinear inverse reinforcement learning with Gaussian processes. In: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS). Granada, Spain: Curran Associates Inc., 2011. 19−27
[20] Liu J H, Huang Z H, Xu X, Zhang X L, Sun S L, Li D Z. Multi-kernel online reinforcement learning for path tracking control of intelligent vehicles. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51(11): 6962−6975 doi: 10.1109/TSMC.2020.2966631
[21] Ramachandran D, Amir E. Bayesian inverse reinforcement learning. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI). Hyderabad, India: Morgan Kaufmann, 2007. 2586−2591
[22] Bogdanovic M, Markovikj D, Denil M, de Freitas N. Deep apprenticeship learning for playing video games. In: Proceedings of the AAAI Workshop on Learning for General Competency in Video Games. Austin, USA: AAAI, 2015. 7−9
[23] Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, et al. Deep Q-learning from demonstrations. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, USA: AAAI, 2018. Article No. 394
[24] Nguyen H T, Garratt M, Bui L T, Abbass H. Apprenticeship learning for continuous state spaces and actions in a swarm-guidance shepherding task. In: Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI). Xiamen, China: IEEE, 2019. 102−109
[25] Hwang M, Jiang W C, Chen Y J. A critical state identification approach to inverse reinforcement learning for autonomous systems. International Journal of Machine Learning and Cybernetics, 2022, 13(4): 1409−1423
[26] Jin Zhuo-Jun, Qian Hui, Chen Shen-Yi, Zhu Miao-Liang. Survey of apprenticeship learning based on reward function approximating. Journal of Huazhong University of Science & Technology (Nature Science Edition), 2008, 36(S1): 288−290 doi: 10.13245/j.hust.2008.s1.081 (in Chinese)
[27] Levine S, Popović Z, Koltun V. Feature construction for inverse reinforcement learning. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: Curran Associates Inc., 2010. 1342−1350
[28] Pan W, Qu R P, Hwang K S, Lin H S. An ensemble fuzzy approach for inverse reinforcement learning. International Journal of Fuzzy Systems, 2019, 21(1): 95−103 doi: 10.1007/s40815-018-0535-y
[29] Lin J L, Hwang K S, Shi H B, Pan W. An ensemble method for inverse reinforcement learning. Information Sciences, 2020, 512: 518−532 doi: 10.1016/j.ins.2019.09.066
[30] Ratliff N, Bradley D, Bagnell J A, Chestnutt J. Boosting structured prediction for imitation learning. In: Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: MIT Press, 2006. 1153−1160
[31] Choi D, An T H, Ahn K, Choi J. Future trajectory prediction via RNN and maximum margin inverse reinforcement learning. In: Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA). Orlando, USA: IEEE, 2018. 125−130
[32] Gao Zhen-Hai, Yan Xiang-Tong, Gao Fei. A decision-making method for longitudinal autonomous driving based on inverse reinforcement learning. Automotive Engineering, 2022, 44(7): 969−975 doi: 10.19562/j.chinasae.qcgc.2022.07.003 (in Chinese)
[33] Finn C, Levine S, Abbeel P. Guided cost learning: Deep inverse optimal control via policy optimization. In: Proceedings of the 33rd International Conference on Machine Learning (ICML). New York, USA: JMLR.org, 2016. 49−58
[34] Fu J, Luo K, Levine S. Learning robust rewards with adversarial inverse reinforcement learning. In: Proceedings of the 6th International Conference on Learning Representations (ICLR). Vancouver, Canada: Elsevier, 2018. 1−15
[35] Huang D A, Kitani K M. Action-reaction: Forecasting the dynamics of human interaction. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 489−504
[36] Levine S, Koltun V. Continuous inverse optimal control with locally optimal examples. In: Proceedings of the 29th International Conference on Machine Learning (ICML). Edinburgh, Scotland: Omnipress, 2012. 475−482
[37] Aghasadeghi N, Bretl T. Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). San Francisco, USA: IEEE, 2011. 1561−1566
[38] You C X, Lu J B, Filev D, Tsiotras P. Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 2019, 114: 1−18 doi: 10.1016/j.robot.2019.01.003
[39] Das N, Chattopadhyay A. Inverse reinforcement learning with constraint recovery. arXiv preprint arXiv: 2305.08130, 2023.
[40] Krishnan S, Garg A, Liaw R, Miller L, Pokorny F T, Goldberg K. HIRL: Hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv: 1604.06508, 2016.
[41] Zhou Z Y, Bloem M, Bambos N. Infinite time horizon maximum causal entropy inverse reinforcement learning. IEEE Transactions on Automatic Control, 2018, 63(9): 2787−2802 doi: 10.1109/TAC.2017.2775960
[42] Wu Z, Sun L T, Zhan W, Yang C Y, Tomizuka M. Efficient sampling-based maximum entropy inverse reinforcement learning with application to autonomous driving. IEEE Robotics and Automation Letters, 2020, 5(4): 5355−5362 doi: 10.1109/LRA.2020.3005126
[43] Huang Z Y, Wu J D, Lv C. Driving behavior modeling using naturalistic human driving data with inverse reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 10239−10251 doi: 10.1109/TITS.2021.3088935
[44] Song L, Li D Z, Xu X. Sparse online maximum entropy inverse reinforcement learning via proximal optimization and truncated gradient. Knowledge-Based Systems, 2022, 252: Article No. 109443
[45] Gleave A, Habryka O. Multi-task maximum causal entropy inverse reinforcement learning. arXiv: 1805.08882, 2018.
[46] Zhang T, Liu Y, Hwang M, Hwang K S, Ma C Y, Cheng J. An end-to-end inverse reinforcement learning by a boosting approach with relative entropy. Information Sciences, 2020, 520: 1−14
[47] Wu Shao-Bo, Fu Qi-Ming, Chen Jian-Ping, Wu Hong-Jie, Lu You. Meta-inverse reinforcement learning method based on relative entropy. Computer Science, 2021, 48(9): 257−263 doi: 10.11896/jsjkx.200700044 (in Chinese)
[48] Lin B Y, Cook D J. Analyzing sensor-based individual and population behavior patterns via inverse reinforcement learning. Sensors, 2020, 20(18): Article No. 5207 doi: 10.3390/s20185207
[49] Zhou W C, Li W C. A hierarchical Bayesian approach to inverse reinforcement learning with symbolic reward machines. In: Proceedings of the 39th International Conference on Machine Learning. Baltimore, USA: PMLR, 2022. 27159−27178
[50] Ezzeddine A, Mourad N, Araabi B N, Ahmadabadi M N. Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement. Expert Systems With Applications, 2018, 112: 331−341
[51] Ranchod P, Rosman B, Konidaris G. Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE, 2015. 471−477
[52] Choi J, Kim K E. Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc., 2012. 305−313
[53] Okal B, Arras K O. Learning socially normative robot navigation behaviors with Bayesian inverse reinforcement learning. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE, 2016. 2889−2895
[54] Trinh T, Brown D S. Autonomous assessment of demonstration sufficiency via Bayesian inverse reinforcement learning. arXiv: 2211.15542, 2022.
[55] Huang W H, Braghin F, Wang Z. Learning to drive via apprenticeship learning and deep reinforcement learning. In: Proceedings of the IEEE 31st International Conference on Tools With Artificial Intelligence (ICTAI). Portland, USA: IEEE, 2019. 1536−1540
[56] Markovikj D. Deep apprenticeship learning for playing games. arXiv preprint arXiv: 2205.07959, 2022.
[57] Lee D, Srinivasan S, Doshi-Velez F. Truly batch apprenticeship learning with deep successor features. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). Macao, China: Morgan Kaufmann, 2019. 5909−5915
[58] Xia C, El Kamel A. Neural inverse reinforcement learning in autonomous navigation. Robotics and Autonomous Systems, 2016, 84: 1−14
[59] Budhraja K K, Oates T. Neuroevolution-based inverse reinforcement learning. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC). Donostia, Spain: IEEE, 2017. 67−76
[60] Memarian F, Xu Z, Wu B, Wen M, Topcu U. Active task-inference-guided deep inverse reinforcement learning. In: Proceedings of the 59th IEEE Conference on Decision and Control (CDC). Jeju, Korea (South): IEEE, 2020. 1932−1938
[61] Liu S, Jiang H, Chen S P, Ye J, He R Q, Sun Z Z. Integrating Dijkstra's algorithm into deep inverse reinforcement learning for food delivery route planning. Transportation Research Part E: Logistics and Transportation Review, 2020, 142: Article No. 102070 doi: 10.1016/j.tre.2020.102070
[62] Chen X L, Cao L, Xu Z X, Lai J, Li C X. A study of continuous maximum entropy deep inverse reinforcement learning. Mathematical Problems in Engineering, 2019, 2019: Article No. 4834516
[63] Choi D, Min K, Choi J. Regularising neural networks for future trajectory prediction via inverse reinforcement learning framework. IET Computer Vision, 2020, 14(5): 192−200 doi: 10.1049/iet-cvi.2019.0546
[64] Wang Y, Wan S, Li Q, Niu Y, Ma F. Modeling crossing behaviors of E-Bikes at intersection with deep maximum entropy inverse reinforcement learning using drone-based video data. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(6): 6350−6361
[65] Fahad M, Chen Z, Guo Y. Learning how pedestrians navigate: A deep inverse reinforcement learning approach. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain: IEEE, 2018. 819−826
[66] Zhou Y, Fu R, Wang C. Learning the car-following behavior of drivers using maximum entropy deep inverse reinforcement learning. Journal of Advanced Transportation, 2020, 2020: Article No. 4752651
[67] Song L, Li D Z, Wang X, Xu X. AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient. Information Sciences, 2022, 602: 328−350 doi: 10.1016/j.ins.2022.04.017
[68] Wang P, Liu D P, Chen J Y, Li H H, Chan C Y. Decision making for autonomous driving via augmented adversarial inverse reinforcement learning. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE, 2021. 1036−1042
[69] Sun J K, Yu L T, Dong P Q, Lu B, Zhou B L. Adversarial inverse reinforcement learning with self-attention dynamics model. IEEE Robotics and Automation Letters, 2021, 6(2): 1880−1886 doi: 10.1109/LRA.2021.3061397
[70] Lian B S, Xue W Q, Lewis F L, Chai T Y. Inverse reinforcement learning for adversarial apprentice games. IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2021.3114612
[71] Jin Z J, Qian H, Zhu M L. Gaussian processes in inverse reinforcement learning. In: Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC). Qingdao, China: IEEE, 2010. 225−230
[72] Li D C, He Y Q, Fu F. Nonlinear inverse reinforcement learning with mutual information and Gaussian process. In: Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO). Bali, Indonesia: IEEE, 2014. 1445−1450
[73] Michini B, Walsh T J, Agha-Mohammadi A A, How J P. Bayesian nonparametric reward learning from demonstration. IEEE Transactions on Robotics, 2015, 31(2): 369−386 doi: 10.1109/TRO.2015.2405593
[74] Sun L T, Zhan W, Tomizuka M. Probabilistic prediction of interactive driving behavior via hierarchical inverse reinforcement learning. In: Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, USA: IEEE, 2018. 2111−2117
[75] Rosbach S, Li X, Großjohann S, Homoceanu S, Roth S. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, USA: IEEE, 2020. 5187−5193
[76] Rosbach S, James V, Großjohann S, Homoceanu S, Roth S. Driving with style: Inverse reinforcement learning in general-purpose planning for automated driving. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macao, China: IEEE, 2019. 2658−2665
[77] Fernando T, Denman S, Sridharan S, Fookes C. Deep inverse reinforcement learning for behavior prediction in autonomous driving: Accurate forecasts of vehicle motion. IEEE Signal Processing Magazine, 2021, 38(1): 87−96 doi: 10.1109/MSP.2020.2988287
[78] Fernando T, Denman S, Sridharan S, Fookes C. Neighbourhood context embeddings in deep inverse reinforcement learning for predicting pedestrian motion over long time horizons. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE, 2019. 1179−1187
[79] Kalweit G, Huegle M, Werling M, Boedecker J. Deep inverse Q-learning with constraints. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2020. Article No. 1198
[80] Zhu Z Y, Li N, Sun R Y, Xu D H, Zhao H J. Off-road autonomous vehicles traversability analysis and trajectory planning based on deep inverse reinforcement learning. In: Proceedings of the IEEE Intelligent Vehicles Symposium (IV). Las Vegas, USA: IEEE, 2020. 971−977
[81] Fang P Y, Yu Z P, Xiong L, Fu Z Q, Li Z R, Zeng D Q. A maximum entropy inverse reinforcement learning algorithm for automatic parking. In: Proceedings of the 5th CAA International Conference on Vehicular Control and Intelligence (CVCI). Tianjin, China: IEEE, 2021. 1−6
[82] Pan X, Ohn-Bar E, Rhinehart N, Xu Y, Shen Y L, Kitani K M. Human-interactive subgoal supervision for efficient inverse reinforcement learning. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. Stockholm, Sweden: International Foundation for Autonomous Agents and Multiagent Systems, 2018.
[83] Peters J, Mülling K, Altün Y. Relative entropy policy search. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). Atlanta, USA: AAAI, 2010. 1607−1612
[84] Singh A, Yang L, Hartikainen K, Finn C, Levine S. End-to-end robotic reinforcement learning without reward engineering. In: Proceedings of the Robotics: Science and Systems. Freiburg im Breisgau, Germany: MIT Press, 2019.
[85] Wang H, Liu X F, Zhou X. Autonomous UAV interception via augmented adversarial inverse reinforcement learning. In: Proceedings of the International Conference on Autonomous Unmanned Systems (ICAUS). Changsha, China: Springer, 2022. 2073−2084
[86] Choi S, Kim S, Kim H J. Inverse reinforcement learning control for trajectory tracking of a multirotor UAV. International Journal of Control, Automation and Systems, 2017, 15(4): 1826−1834 doi: 10.1007/s12555-015-0483-3
[87] Nguyen H T, Garratt M, Bui L T, Abbass H. Apprenticeship bootstrapping: Inverse reinforcement learning in a multi-skill UAV-UGV coordination task. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). Stockholm, Sweden: International Foundation for Autonomous Agents and Multiagent Systems, 2018. 2204−2206
[88] Sun W, Yan D S, Huang J, Sun C H. Small-scale moving target detection in aerial image by deep inverse reinforcement learning. Soft Computing, 2020, 24(8): 5897−5908 doi: 10.1007/s00500-019-04404-6
[89] Pattanayak K, Krishnamurthy V, Berry C. Meta-cognition. An inverse-inverse reinforcement learning approach for cognitive radars. In: Proceedings of the 25th International Conference on Information Fusion (FUSION). Linköping, Sweden: IEEE, 2022. 1−8
[90] Kormushev P, Calinon S, Saegusa R, Metta G. Learning the skill of archery by a humanoid robot iCub. In: Proceedings of the 10th IEEE-RAS International Conference on Humanoid Robots. Nashville, USA: IEEE, 2010. 417−423
[91] Koller D, Milch B. Multi-agent influence diagrams for representing and solving games. Games and Economic Behavior, 2003, 45(1): 181−221 doi: 10.1016/S0899-8256(02)00544-4
[92] Syed U, Schapire R E. A game-theoretic approach to apprenticeship learning. In: Proceedings of the 22nd Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: Curran Associates Inc., 2007. 1449−1456
[93] Halperin I, Liu J Y, Zhang X. Combining reinforcement learning and inverse reinforcement learning for asset allocation recommendations. arXiv: 2201.01874, 2022.
[94] Gleave A, Toyer S. A primer on maximum causal entropy inverse reinforcement learning. arXiv: 2203.11409, 2022.
[95] Adams S, Cody T, Beling P A. A survey of inverse reinforcement learning. Artificial Intelligence Review, 2022, 55(6): 4307−4346 doi: 10.1007/s10462-021-10108-x
[96] Li X J, Liu H S, Dong M H. A general framework of motion planning for redundant robot manipulator based on deep reinforcement learning. IEEE Transactions on Industrial Informatics, 2022, 18(8): 5253−5263 doi: 10.1109/TII.2021.3125447
[97] Jeon W, Su C Y, Barde P, Doan T, Nowrouzezahrai D, Pineau J. Regularized inverse reinforcement learning. In: Proceedings of the 9th International Conference on Learning Representations (ICLR). Vienna, Austria: Ithaca, 2021. 1−26
[98] Krishnamurthy V, Angley D, Evans R, Moran B. Identifying cognitive radars-inverse reinforcement learning using revealed preferences. IEEE Transactions on Signal Processing, 2020, 68: 4529−4542 doi: 10.1109/TSP.2020.3013516
[99] Chen Jian-Ping, Chen Qi-Qiang, Fu Qi-Ming, Gao Zhen, Wu Hong-Jie, Lu You. Maximum entropy inverse reinforcement learning based on generative adversarial networks. Computer Engineering and Applications, 2019, 55(22): 119−126 doi: 10.3778/j.issn.1002-8331.1904-0238 (in Chinese)
[100] Gruver N, Song J M, Kochenderfer M J, Ermon S. Multi-agent adversarial inverse reinforcement learning with latent variables. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). Auckland, New Zealand: International Foundation for Autonomous Agents and Multiagent Systems, 2020. 1855−1857
[101] Giwa B H, Lee C G. A marginal log-likelihood approach for the estimation of discount factors of multiple experts in inverse reinforcement learning. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Prague, Czech Republic: IEEE, 2021. 7786−7791
[102] Ghosh S, Srivastava S. Mapping language to programs using multiple reward components with inverse reinforcement learning. In: Proceedings of the Findings of the Association for Computational Linguistics. Punta Cana, Dominican Republic: ACL, 2021. 1449−1462
[103] Gronauer S, Diepold K. Multi-agent deep reinforcement learning: A survey. Artificial Intelligence Review, 2021, 55(2): 895−943
[104] Bergerson S. Multi-agent inverse reinforcement learning: Suboptimal demonstrations and alternative solution concepts. arXiv: 2109.01178, 2021.
[105] Zhao J C. Safety-aware multi-agent apprenticeship learning. arXiv: 2201.08111, 2022.
[106] Hwang R, Lee H, Hwang H J. Option compatible reward inverse reinforcement learning. Pattern Recognition Letters, 2022, 154: 83−89 doi: 10.1016/j.patrec.2022.01.016

Publication history
• Received: 2023-02-24
• Accepted: 2023-04-25
• Published online: 2023-07-03
• Issue date: 2024-09-19
