

A Survey of Inverse Reinforcement Learning Algorithms, Theory and Applications

Song Li, Li Da-Zi, Xu Xin

Citation: Song Li, Li Da-Zi, Xu Xin. A survey of inverse reinforcement learning algorithms, theory and applications. Acta Automatica Sinica, xxxx, xx(x): x–xx doi: 10.16383/j.aas.c230081

doi: 10.16383/j.aas.c230081
Funds: Supported by National Natural Science Foundation of China (62273026)
Author Bios:

SONG Li  Ph.D. candidate at the College of Information Science and Technology, Beijing University of Chemical Technology. Her research interests cover reinforcement learning, deep learning, and inverse reinforcement learning. E-mail: slili516@foxmail.com

LI Da-Zi  Professor at the College of Information Science and Technology, Beijing University of Chemical Technology. Her research interests cover machine learning and artificial intelligence, advanced control, fractional-order systems, and complex system modeling and optimization. Corresponding author of this paper. E-mail: lidz@mail.buct.edu.cn

XU Xin  Professor at the College of Intelligence Science and Technology, National University of Defense Technology. His research interests cover intelligent control, reinforcement learning, machine learning, robotics, and autonomous vehicles. E-mail: xinxu@nudt.edu.cn

Abstract: With the research and development of deep reinforcement learning, reinforcement learning has made remarkable progress in real-world problems such as game playing, optimal decision-making, and intelligent driving. However, in the interaction between the agent and the environment, reinforcement learning suffers from the difficulty of manually designing reward functions, which motivated the research direction of inverse reinforcement learning. How to learn reward functions from expert demonstrations and perform policy optimization is a novel and important research topic of great significance in artificial intelligence. This paper surveys the latest advances in inverse reinforcement learning algorithms: it first introduces recent theoretical progress, then analyzes the challenges and future development trends of inverse reinforcement learning, and finally discusses its application progress and prospects.
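The reward-learning loop the abstract describes — inferring a reward from expert demonstrations, then optimizing a policy against it — can be illustrated with a minimal sketch of the maximum entropy IRL idea surveyed here (Ziebart et al. [13]). The toy four-state chain MDP, the one-hot features, and all function names below are illustrative assumptions, not material from the survey itself; the reward is assumed linear in features, r(s) = w·φ(s), and the gradient is the gap between expert and policy feature expectations.

```python
import numpy as np

def feature_expectations(trajectories, phi):
    """Average feature counts along demonstrated trajectories."""
    mu = np.zeros(phi.shape[1])
    for traj in trajectories:
        for s in traj:
            mu += phi[s]
    return mu / len(trajectories)

def soft_value_iteration(P, r, gamma=0.9, iters=100):
    """Soft (max-entropy) Bellman backup: V(s) = log sum_a exp(Q(s, a))."""
    n_s, n_a, _ = P.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        Q = r[:, None] + gamma * P @ V        # Q[s, a]
        V = np.logaddexp.reduce(Q, axis=1)    # soft max over actions
    return np.exp(Q - V[:, None])             # stochastic policy pi(a|s)

def expected_state_visitation(P, policy, start, horizon):
    """Forward pass: expected state visit counts under the current policy."""
    d = np.zeros(P.shape[0]); d[start] = 1.0
    counts = np.zeros(P.shape[0])
    for _ in range(horizon):
        counts += d
        d = np.einsum('s,sa,sat->t', d, policy, P)
    return counts

def maxent_irl(P, phi, trajectories, start, lr=0.05, epochs=200):
    """Gradient ascent on demonstration likelihood:
    grad = expert feature expectations - policy feature expectations."""
    w = np.zeros(phi.shape[1])
    mu_expert = feature_expectations(trajectories, phi)
    horizon = len(trajectories[0])
    for _ in range(epochs):
        policy = soft_value_iteration(P, phi @ w)
        visits = expected_state_visitation(P, policy, start, horizon)
        w += lr * (mu_expert - phi.T @ visits)
    return w

# Toy chain MDP: 4 states, actions move left/right (deterministic).
n_s, n_a = 4, 2
P = np.zeros((n_s, n_a, n_s))
for s in range(n_s):
    P[s, 0, max(s - 1, 0)] = 1.0          # action 0: step left
    P[s, 1, min(s + 1, n_s - 1)] = 1.0    # action 1: step right
phi = np.eye(n_s)                          # one-hot state features
demo = [[0, 1, 2, 3, 3, 3, 3, 3]]          # expert walks right, then stays
w = maxent_irl(P, phi, demo, start=0)
# The learned reward should peak at the state the expert dwells in (index 3).
```

The sketch deliberately assumes known dynamics and tabular states; the deep IRL methods in Table 1 replace the linear reward with a neural network and the exact forward pass with sampling, but the expert-versus-policy feature-matching gradient is the same idea.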
Fig. 1  Model of reinforcement learning

Fig. 2  MDP ((a) and (c) are deterministic MDPs; (b) and (d) are stochastic MDPs)[13]

Fig. 3  Framework of RL, IRL, and BC

Fig. 4  Classification of IRL algorithms

Fig. 5  Bayesian inverse reinforcement learning model

Fig. 6  Model structure of deep apprenticeship learning[22]

Fig. 7  Structure of maximum entropy deep inverse reinforcement learning[15]

Fig. 8  The inverse reinforcement learning process based on sequential expert demonstrations[62]

Fig. 9  Structure of the neural network model for estimating the reward function[66]

Fig. 10  Multi-scale fully convolutional network architecture[77]

Fig. 11  Framework of nonlinear inverse reinforcement learning

Fig. 12  Structure of trajectory planning using deep maximum entropy IRL[80]

Fig. 13  Path planning: parking lot[82]

Fig. 14  Convolutional neural network structure for a robotic arm[84]

Table 1  Timeline of inverse reinforcement learning algorithms

| IRL algorithm family | Challenges | Problem addressed | Author (year) |
|---|---|---|---|
| Margin-based IRL | Policy ambiguity | MDP/R with finite and large state spaces | Ng et al.[9] (2000) |
| | | Linear solution of MDP/R | Abbeel et al.[11] (2004) |
| | | Maximum-margin structured prediction | Ratliff et al.[12] (2006) |
| | | Complex multi-dimensional tasks | Bogdanovic et al.[22] (2015) |
| | | Applicability to real-world tasks | Hester et al.[23] (2018) |
| Bayesian IRL | Difficult prior selection; computationally complex | Inferring a probability distribution over rewards from prior knowledge and expert data | Ramachandran et al.[21] (2007) |
| Probability-based IRL | Poor adaptability in complex dynamic environments | Feature matching under the maximum entropy constraint | Ziebart et al.[13] (2008) |
| | | MDP/R with unknown transition function | Boularias et al.[14] (2011) |
| Gaussian-process IRL | Computationally complex | Nonlinear solution of rewards | Levine et al.[19] (2011) |
| Maximum entropy deep IRL | Computationally complex; overfitting; imbalanced and limited expert demonstration data | Learning rewards in complex urban environments from human driving demonstrations | Wulfmeier et al.[15] (2016) |
| | | Adversarial IRL for extracting policies from data | Ho et al.[18] (2016) |
| | | Linearly solvable nondeterministic MDP/R with multiple sparse, scattered rewards | Budhraja et al.[59] (2017) |
| | | Planning for autonomous vehicles in traffic | You et al.[38] (2019) |
| | | Reward recovery for model-free integral inverse RL | Lian et al.[70] (2021) |
| | | Inferring reward functions via maximum causal entropy | Gleave et al.[94] (2022) |
| Neural-network IRL | Overfitting; instability | IRL for autonomous navigation with large-scale, high-dimensional state spaces | Chen et al.[62] (2019) |

Table 2  Comparison of inverse reinforcement learning algorithms

| Algorithm | Reward | Value function |
|---|---|---|
| ALIRL[11] | 38.79 | 32.66 |
| FIRL[27] | 31.89 | 5.22 |
| GPIRL[19] | 2.66 | 0.42 |
| MWAL[95] | 206.44 | 43.32 |
| MMP[12] | 38.38 | 34.20 |
| MMPBoost[30] | 31.56 | 23.56 |
| MEIRL[13] | 36.36 | 13.12 |
References
[1] Chai Tian-You. Development directions of industrial artificial intelligence. Acta Automatica Sinica, 2020, 46(10): 2005-2012 doi: 10.16383/j.aas.c200796
[2] Dai X Y, Zhao C, Li X S, Wang X, Wang F Y. Traffic signal control using offline reinforcement learning. In: Proceedings of the China Automation Congress (CAC). Beijing, China: IEEE, 2021. 8090–8095
[3] Li J N, Ding J L, Chai T Y, Lewis F L. Nonzero-sum game reinforcement learning for performance optimization in large-scale industrial processes. IEEE Transactions on Cybernetics, 2020, 50(9): 4132-4145 doi: 10.1109/TCYB.2019.2950262
[4] Zhao Dong-Bin, Shao Kun, Zhu Yuan-Heng, Li Dong, Chen Ya-Ran, Wang Hai-Tao, et al. Review of deep reinforcement learning and discussions on the development of computer Go. Control Theory & Applications, 2016, 33(6): 701-717
[5] Song T H, Li D Z, Yang W M, Hirasawa K. Recursive least-squares temporal difference with gradient correction. IEEE Transactions on Cybernetics, 2021, 51(8): 4251-4264 doi: 10.1109/TCYB.2019.2902342
[6] Bain M, Sammut C. A framework for behavioural cloning. Machine Intelligence 15: Intelligent Agents, 1995: 103–129
[7] Couto G C K, Antonelo E A. Generative adversarial imitation learning for end-to-end autonomous driving on urban environments. In: Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI). Orlando, USA: IEEE, 2021. 1–7
[8] Samak T V, Samak C V, Kandhasamy S. Robust behavioral cloning for autonomous vehicles using end-to-end imitation learning. SAE International Journal of Connected and Automated Vehicles, 2021, 4(3): 279-295
[9] Ng A Y, Russell S J. Algorithms for inverse reinforcement learning. In: Proceedings of the 17th International Conference on Machine Learning (ICML). Stanford, USA: Morgan Kaufmann Publishers Inc, 2000. 663–670
[10] Imani M, Ghoreishi S F. Scalable inverse reinforcement learning through multifidelity Bayesian optimization. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 4125-4132 doi: 10.1109/TNNLS.2021.3051012
[11] Abbeel P, Ng A Y. Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML). Banff, Canada: ACM, 2004. 1–8
[12] Ratliff N D, Bagnell J A, Zinkevich M A. Maximum margin planning. In: Proceedings of the 23rd International Conference on Machine Learning (ICML). Pittsburgh, USA: ACM, 2006. 729–736
[13] Ziebart B D, Maas A, Bagnell J A, Dey A K. Maximum entropy inverse reinforcement learning. In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI). Chicago, USA: AAAI, 2008. 1433–1438
[14] Boularias A, Kober J, Peters J. Relative entropy inverse reinforcement learning. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). Fort Lauderdale, USA: 2011. 182–189
[15] Wulfmeier M, Wang D Z, Posner I. Watch this: Scalable cost-function learning for path planning in urban environments. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Daejeon, Korea (South): IEEE, 2016. 2089–2095
[16] Guo H Y, Chen Q X, Xia Q, Kang C Q. Deep inverse reinforcement learning for objective function identification in bidding models. IEEE Transactions on Power Systems, 2021, 36(6): 5684-5696 doi: 10.1109/TPWRS.2021.3076296
[17] Shi Y C, Jiu B, Yan J K, Liu H W, Li K. Data-driven simultaneous multibeam power allocation: When multiple targets tracking meets deep reinforcement learning. IEEE Systems Journal, 2021, 15(1): 1264-1274 doi: 10.1109/JSYST.2020.2984774
[18] Ho J, Ermon S. Generative adversarial imitation learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS). Barcelona, Spain: Curran Associates Inc, 2016. 4572–4580
[19] Levine S, Popović Z, Koltun V. Nonlinear inverse reinforcement learning with Gaussian processes. In: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS). Granada, Spain: Curran Associates Inc, 2011. 19–27
[20] Liu J H, Huang Z H, Xu X, Zhang X L, Sun S L, Li D Z. Multi-kernel online reinforcement learning for path tracking control of intelligent vehicles. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51(11): 6962-6975 doi: 10.1109/TSMC.2020.2966631
[21] Ramachandran D, Amir E. Bayesian inverse reinforcement learning. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI). Hyderabad, India: Morgan Kaufmann, 2007. 2586–2591
[22] Bogdanovic M, Markovikj D, Denil M, de Freitas N. Deep apprenticeship learning for playing video games. In: Proceedings of the AAAI Workshop on Learning for General Competency in Video Games. Austin, USA: AAAI, 2015. 7–9
[23] Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, et al. Deep Q-learning from demonstrations. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, USA: AAAI, 2018. Article No. 394
[24] Nguyen H T, Garratt M, Bui L T, Abbass H. Apprenticeship learning for continuous state spaces and actions in a swarm-guidance shepherding task. In: Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI). Xiamen, China: IEEE, 2019. 102–109
[25] Hwang M, Jiang W C, Chen Y J. A critical state identification approach to inverse reinforcement learning for autonomous systems. International Journal of Machine Learning and Cybernetics, 2022, 13(4): 1409-1423
[26] Jin Zhuo-Jun, Qian Hui, Chen Shen-Yi, Zhu Miao-Liang. Survey of apprenticeship learning based on reward function approximating. Journal of Huazhong University of Science & Technology (Nature Science Edition), 2008, 36(S1): 288-290, 294 doi: 10.13245/j.hust.2008.s1.081
[27] Levine S, Popović Z, Koltun V. Feature construction for inverse reinforcement learning. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: Curran Associates Inc, 2010. 1342–1350
[28] Pan W, Qu R P, Hwang K S, Lin H S. An ensemble fuzzy approach for inverse reinforcement learning. International Journal of Fuzzy Systems, 2019, 21(1): 95-103 doi: 10.1007/s40815-018-0535-y
[29] Lin J L, Hwang K S, Shi H B, Pan W. An ensemble method for inverse reinforcement learning. Information Sciences, 2020, 512: 518-532 doi: 10.1016/j.ins.2019.09.066
[30] Ratliff N, Bradley D, Bagnell J A, Chestnutt J. Boosting structured prediction for imitation learning. In: Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: MIT Press, 2006. 1153–1160
[31] Choi D, An T H, Ahn K, Choi J. Future trajectory prediction via RNN and maximum margin inverse reinforcement learning. In: Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA). Orlando, USA: IEEE, 2018. 125–130
[32] Gao Zhen-Hai, Yan Xiang-Tong, Gao Fei. A decision-making method for longitudinal autonomous driving based on inverse reinforcement learning. Automotive Engineering, 2022, 44(7): 969-975 doi: 10.19562/j.chinasae.qcgc.2022.07.003
[33] Finn C, Levine S, Abbeel P. Guided cost learning: Deep inverse optimal control via policy optimization. In: Proceedings of the 33rd International Conference on Machine Learning (ICML). New York, USA: JMLR.org, 2016. 49–58
[34] Fu J, Luo K, Levine S. Learning robust rewards with adversarial inverse reinforcement learning. In: Proceedings of the 6th International Conference on Learning Representations (ICLR). Vancouver, Canada: Elsevier, 2018. 1–15
[35] Huang D A, Kitani K M. Action-reaction: Forecasting the dynamics of human interaction. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 489–504
[36] Levine S, Koltun V. Continuous inverse optimal control with locally optimal examples. In: Proceedings of the 29th International Conference on Machine Learning (ICML). Edinburgh, Scotland: Omnipress, 2012. 475–482
[37] Aghasadeghi N, Bretl T. Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). San Francisco, USA: IEEE, 2011. 1561–1566
[38] You C X, Lu J B, Filev D, Tsiotras P. Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 2019, 114: 1-18 doi: 10.1016/j.robot.2019.01.003
[39] Das N, Chattopadhyay A. Inverse reinforcement learning with constraint recovery. arXiv: 2305.08130, 2023.
[40] Krishnan S, Garg A, Liaw R, Miller L, Pokorny F T, Goldberg K. HIRL: Hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv: 1604.06508, 2016.
[41] Zhou Z Y, Bloem M, Bambos N. Infinite time horizon maximum causal entropy inverse reinforcement learning. IEEE Transactions on Automatic Control, 2018, 63(9): 2787-2802 doi: 10.1109/TAC.2017.2775960
[42] Wu Z, Sun L T, Zhan W, Yang C Y, Tomizuka M. Efficient sampling-based maximum entropy inverse reinforcement learning with application to autonomous driving. IEEE Robotics and Automation Letters, 2020, 5(4): 5355-5362 doi: 10.1109/LRA.2020.3005126
[43] Huang Z Y, Wu J D, Lv C. Driving behavior modeling using naturalistic human driving data with inverse reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 10239-10251 doi: 10.1109/TITS.2021.3088935
[44] Song L, Li D Z, Xu X. Sparse online maximum entropy inverse reinforcement learning via proximal optimization and truncated gradient. Knowledge-Based Systems, 2022, 252: Article No. 109443
[45] Gleave A, Habryka O. Multi-task maximum causal entropy inverse reinforcement learning. arXiv: 1805.08882, 2018.
[46] Zhang T, Liu Y, Hwang M, Hwang K S, Ma C Y, Cheng J. An end-to-end inverse reinforcement learning by a boosting approach with relative entropy. Information Sciences, 2020, 520: 1-14
[47] Wu Shao-Bo, Fu Qi-Ming, Chen Jian-Ping, Wu Hong-Wei, Lu You. Meta-inverse reinforcement learning method based on relative entropy. Computer Science, 2021, 48(9): 257-263 doi: 10.11896/jsjkx.200700044
[48] Lin B Y, Cook D J. Analyzing sensor-based individual and population behavior patterns via inverse reinforcement learning. Sensors, 2020, 20(18): Article No. 5207 doi: 10.3390/s20185207
[49] Zhou W C, Li W C. A hierarchical Bayesian approach to inverse reinforcement learning with symbolic reward machines. In: Proceedings of the 39th International Conference on Machine Learning. Baltimore, USA: PMLR, 2022. 27159–27178
[50] Ezzeddine A, Mourad N, Araabi B N, Ahmadabadi M N. Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement. Expert Systems With Applications, 2018, 112: 331-341
[51] Ranchod P, Rosman B, Konidaris G. Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE, 2015. 471–477
[52] Choi J, Kim K E. Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc, 2012. 305–313
[53] Okal B, Arras K O. Learning socially normative robot navigation behaviors with Bayesian inverse reinforcement learning. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE, 2016. 2889–2895
[54] Trinh T, Brown D S. Autonomous assessment of demonstration sufficiency via Bayesian inverse reinforcement learning. arXiv: 2211.15542, 2022.
[55] Huang W H, Braghin F, Wang Z. Learning to drive via apprenticeship learning and deep reinforcement learning. In: Proceedings of the IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). Portland, USA: IEEE, 2019. 1536–1540
[56] Markovikj D. Deep apprenticeship learning for playing games. arXiv: 2205.07959, 2022.
[57] Lee D, Srinivasan S, Doshi-Velez F. Truly batch apprenticeship learning with deep successor features. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). Macao, China: Morgan Kaufmann, 2019. 5909–5915
[58] Xia C, El Kamel A. Neural inverse reinforcement learning in autonomous navigation. Robotics and Autonomous Systems, 2016, 84: 1-14
[59] Budhraja K K, Oates T. Neuroevolution-based inverse reinforcement learning. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC). Donostia, Spain: IEEE, 2017. 67–76
[60] Memarian F, Xu Z, Wu B, Wen M, Topcu U. Active task-inference-guided deep inverse reinforcement learning. In: Proceedings of the 59th IEEE Conference on Decision and Control (CDC). Jeju, Korea (South): IEEE, 2020. 1932–1938
[61] Liu S, Jiang H, Chen S P, Ye J, He R Q, Sun Z Z. Integrating Dijkstra's algorithm into deep inverse reinforcement learning for food delivery route planning. Transportation Research Part E: Logistics and Transportation Review, 2020, 142: Article No. 102070 doi: 10.1016/j.tre.2020.102070
[62] Chen X L, Cao L, Xu Z X, Lai J, Li C X. A study of continuous maximum entropy deep inverse reinforcement learning. Mathematical Problems in Engineering, 2019, 2019: Article No. 4834516
[63] Choi D, Min K, Choi J. Regularising neural networks for future trajectory prediction via inverse reinforcement learning framework. IET Computer Vision, 2020, 14(5): 192-200 doi: 10.1049/iet-cvi.2019.0546
[64] Wang Y, Wan S, Li Q, Niu Y, Ma F. Modeling crossing behaviors of E-bikes at intersection with deep maximum entropy inverse reinforcement learning using drone-based video data. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(6): 6350–6361
[65] Fahad M, Chen Z, Guo Y. Learning how pedestrians navigate: A deep inverse reinforcement learning approach. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain: IEEE, 2018. 819–826
[66] Zhou Y, Fu R, Wang C. Learning the car-following behavior of drivers using maximum entropy deep inverse reinforcement learning. Journal of Advanced Transportation, 2020, 2020: Article No. 4752651
[67] Song L, Li D Z, Wang X, Xu X. AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient. Information Sciences, 2022, 602: 328-350 doi: 10.1016/j.ins.2022.04.017
[68] Wang P, Liu D P, Chen J Y, Li H H, Chan C Y. Decision making for autonomous driving via augmented adversarial inverse reinforcement learning. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE, 2021. 1036–1042
[69] Sun J K, Yu L T, Dong P Q, Lu B, Zhou B L. Adversarial inverse reinforcement learning with self-attention dynamics model. IEEE Robotics and Automation Letters, 2021, 6(2): 1880-1886 doi: 10.1109/LRA.2021.3061397
[70] Lian B S, Xue W Q, Lewis F L, Chai T Y. Inverse reinforcement learning for adversarial apprentice games. IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2021.3114612
[71] Jin Z J, Qian H, Zhu M L. Gaussian processes in inverse reinforcement learning. In: Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC). Qingdao, China: IEEE, 2010. 225–230
[72] Li D C, He Y Q, Fu F. Nonlinear inverse reinforcement learning with mutual information and Gaussian process. In: Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO). Bali, Indonesia: IEEE, 2014. 1445–1450
[73] Michini B, Walsh T J, Agha-Mohammadi A A, How J P. Bayesian nonparametric reward learning from demonstration. IEEE Transactions on Robotics, 2015, 31(2): 369-386 doi: 10.1109/TRO.2015.2405593
[74] Sun L T, Zhan W, Tomizuka M. Probabilistic prediction of interactive driving behavior via hierarchical inverse reinforcement learning. In: Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, USA: IEEE, 2018. 2111–2117
[75] Rosbach S, Li X, Großjohann S, Homoceanu S, Roth S. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, USA: IEEE, 2020. 5187–5193
[76] Rosbach S, James V, Großjohann S, Homoceanu S, Roth S. Driving with style: Inverse reinforcement learning in general-purpose planning for automated driving. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macau, China: IEEE, 2019. 2658–2665
[77] Fernando T, Denman S, Sridharan S, Fookes C. Deep inverse reinforcement learning for behavior prediction in autonomous driving: Accurate forecasts of vehicle motion. IEEE Signal Processing Magazine, 2021, 38(1): 87-96 doi: 10.1109/MSP.2020.2988287
[78] Fernando T, Denman S, Sridharan S, Fookes C. Neighbourhood context embeddings in deep inverse reinforcement learning for predicting pedestrian motion over long time horizons. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE, 2019. 1179–1187
[79] Kalweit G, Huegle M, Werling M, Boedecker J. Deep inverse Q-learning with constraints. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc, 2020. Article No. 1198
[80] Zhu Z Y, Li N, Sun R Y, Xu D H, Zhao H J. Off-road autonomous vehicles traversability analysis and trajectory planning based on deep inverse reinforcement learning. In: Proceedings of the IEEE Intelligent Vehicles Symposium (IV). Las Vegas, USA: IEEE, 2020. 971–977
[81] Fang P Y, Yu Z P, Xiong L, Fu Z Q, Li Z R, Zeng D Q. A maximum entropy inverse reinforcement learning algorithm for automatic parking. In: Proceedings of the 5th CAA International Conference on Vehicular Control and Intelligence (CVCI). Tianjin, China: IEEE, 2021. 1–6
[82] Pan X, Ohn-Bar E, Rhinehart N, Xu Y, Shen Y L, Kitani K M. Human-interactive subgoal supervision for efficient inverse reinforcement learning. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. Stockholm, Sweden: International Foundation for Autonomous Agents and Multiagent Systems, 2018.
[83] Peters J, Mülling K, Altün Y. Relative entropy policy search. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). Atlanta, USA: AAAI, 2010. 1607–1612
[84] Singh A, Yang L, Hartikainen K, Finn C, Levine S. End-to-end robotic reinforcement learning without reward engineering. In: Proceedings of the Robotics: Science and Systems. Freiburg im Breisgau, Germany: MIT Press, 2019.
[85] Wang H, Liu X F, Zhou X. Autonomous UAV interception via augmented adversarial inverse reinforcement learning. In: Proceedings of the International Conference on Autonomous Unmanned Systems (ICAUS). Changsha, China: Springer, 2022. 2073–2084
[86] Choi S, Kim S, Kim H J. Inverse reinforcement learning control for trajectory tracking of a multirotor UAV. International Journal of Control, Automation and Systems, 2017, 15(4): 1826-1834 doi: 10.1007/s12555-015-0483-3
[87] Nguyen H T, Garratt M, Bui L T, Abbass H. Apprenticeship bootstrapping: Inverse reinforcement learning in a multi-skill UAV-UGV coordination task. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). Stockholm, Sweden: International Foundation for Autonomous Agents and Multiagent Systems, 2018. 2204–2206
[88] Sun W, Yan D S, Huang J, Sun C H. Small-scale moving target detection in aerial image by deep inverse reinforcement learning. Soft Computing, 2020, 24(8): 5897-5908 doi: 10.1007/s00500-019-04404-6
[89] Pattanayak K, Krishnamurthy V, Berry C. Meta-cognition: An inverse-inverse reinforcement learning approach for cognitive radars. In: Proceedings of the 25th International Conference on Information Fusion (FUSION). Linköping, Sweden: IEEE, 2022. 1–8
[90] Kormushev P, Calinon S, Saegusa R, Metta G. Learning the skill of archery by a humanoid robot iCub. In: Proceedings of the 10th IEEE-RAS International Conference on Humanoid Robots. Nashville, USA: IEEE, 2010. 417–423
[91] Koller D, Milch B. Multi-agent influence diagrams for representing and solving games. Games and Economic Behavior, 2003, 45(1): 181-221 doi: 10.1016/S0899-8256(02)00544-4
[92] Syed U, Schapire R E. A game-theoretic approach to apprenticeship learning. In: Proceedings of the 22nd Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: Curran Associates Inc, 2007. 1449–1456
[93] Halperin I, Liu J Y, Zhang X. Combining reinforcement learning and inverse reinforcement learning for asset allocation recommendations. arXiv: 2201.01874, 2022.
[94] Gleave A, Toyer S. A primer on maximum causal entropy inverse reinforcement learning. arXiv: 2203.11409, 2022.
[95] Adams S, Cody T, Beling P A. A survey of inverse reinforcement learning. Artificial Intelligence Review, 2022, 55(6): 4307-4346 doi: 10.1007/s10462-021-10108-x
[96] Li X J, Liu H S, Dong M H. A general framework of motion planning for redundant robot manipulator based on deep reinforcement learning. IEEE Transactions on Industrial Informatics, 2022, 18(8): 5253-5263 doi: 10.1109/TII.2021.3125447
[97] Jeon W, Su C Y, Barde P, Doan T, Nowrouzezahrai D, Pineau J. Regularized inverse reinforcement learning. In: Proceedings of the 9th International Conference on Learning Representations (ICLR). Vienna, Austria: Ithaca, 2021. 1–26
[98] Krishnamurthy V, Angley D, Evans R, Moran B. Identifying cognitive radars-inverse reinforcement learning using revealed preferences. IEEE Transactions on Signal Processing, 2020, 68: 4529-4542 doi: 10.1109/TSP.2020.3013516
[99] Chen Jian-Ping, Chen Qi-Qiang, Fu Qi-Ming, Gao Zhen, Wu Hong-Jie, Lu You. Maximum entropy inverse reinforcement learning based on generative adversarial networks. Computer Engineering and Applications, 2019, 55(22): 119-126 doi: 10.3778/j.issn.1002-8331.1904-0238
[100] Gruver N, Song J M, Kochenderfer M J, Ermon S. Multi-agent adversarial inverse reinforcement learning with latent variables. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). Auckland, New Zealand: International Foundation for Autonomous Agents and Multiagent Systems, 2020. 1855–1857
[101] Giwa B H, Lee C G. A marginal log-likelihood approach for the estimation of discount factors of multiple experts in inverse reinforcement learning. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Prague, Czech Republic: IEEE, 2021. 7786–7791
[102] Ghosh S, Srivastava S. Mapping language to programs using multiple reward components with inverse reinforcement learning. In: Proceedings of the Findings of the Association for Computational Linguistics. Punta Cana, Dominican Republic: ACL, 2021. 1449–1462
[103] Gronauer S, Diepold K. Multi-agent deep reinforcement learning: A survey. Artificial Intelligence Review, 2021, 55(2): 895-943
[104] Bergerson S. Multi-agent inverse reinforcement learning: Suboptimal demonstrations and alternative solution concepts. arXiv: 2109.01178, 2021.
[105] Zhao J C. Safety-aware multi-agent apprenticeship learning. arXiv: 2201.08111, 2022.
[106] Hwang R, Lee H, Hwang H J. Option compatible reward inverse reinforcement learning. Pattern Recognition Letters, 2022, 154: 83-89 doi: 10.1016/j.patrec.2022.01.016
Publication history
• Received: 2023-02-24
• Accepted: 2023-04-25
• Published online: 2023-07-03
