              Distance Information Based Pursuit-evasion Strategy: Continuous Stochastic Game With Belief State

              Chen Ling-Min, Feng Yu, Li Yong-Qiang

              Citation: Chen Ling-Min, Feng Yu, Li Yong-Qiang. Distance information based pursuit-evasion strategy: Continuous stochastic game with belief state. Acta Automatica Sinica, 2024, 50(4): 828-840 doi: 10.16383/j.aas.c230018
              Funds: Supported by National Natural Science Foundation of China (61973276, 62073294) and Natural Science Foundation of Zhejiang Province (LZ21F030003)
              More Information
                Author Bio:

                CHEN Ling-Min Master student at the College of Information Engineering, Zhejiang University of Technology. She received her bachelor degree from Shaoxing University in 2020. Her research interest covers applications of game theory and machine learning in decision-making. E-mail: 2112003096@zjut.edu.cn

                FENG Yu Professor at the College of Information Engineering, Zhejiang University of Technology. He received his Ph.D. degree from Ecole des Mines de Nantes in 2011. His research interest covers networked control systems, distributed filtering, robust analysis and control for uncertain systems, and applications of game theory and machine learning in decision-making. Corresponding author of this paper. E-mail: yfeng@zjut.edu.cn

                LI Yong-Qiang Associate professor at the College of Information Engineering, Zhejiang University of Technology. He received his Ph.D. degree from Beijing Jiaotong University in 2014. His research interest covers reinforcement learning, nonlinear control, and deep learning. E-mail: yqli@zjut.edu.cn

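              • Abstract: Pursuit-evasion problems are of great practical significance in confrontation, tracking, and search applications. Using continuous stochastic games and Markov decision processes (MDPs), this paper studies optimal strategies for the many-to-one pursuit-evasion problem based on measured distance. In this problem, only the leader of the pursuit group can measure its relative distance to the evader, whereas the evader has a global view. Solving for the pursuit-evasion strategies is divided into two processes: a pursuit game and a Markov decision process. For the pursuit strategy, the environment is partitioned and belief region states are introduced to estimate the evader's position; the measured distance is used to correct the belief region states, which yields a continuous stochastic pursuit game based on belief region states, and a fixed-point theorem is used to prove that the game admits a stationary Nash equilibrium strategy. For the evasion strategy, the evader uses global information to build an MDP over hybrid states together with the corresponding Bellman optimality equation. A reinforcement-learning-based algorithm for computing the stationary pursuit-evasion strategies is also given, and a case study verifies its effectiveness.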
              • Fig. 1  Environment of pursuit-evasion problem
                Fig. 2  (a) $ L $ regions; (b) Division of pursuit group
                Fig. 3  Warning area
                Fig. 4  The $ m $-th area
                Fig. 5  Prediction distance
                Fig. 6  Size of map
                Fig. 7  Pursuers' reward in the pursuit game
                Fig. 8  Evader's reward in MDP
                Fig. 9  Algorithm testing process
                Fig. 10  Trajectories of the pursuers and the evader

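                Table 1  Result comparison

                Algorithm | Average steps to capture | Capture success rate
                --- | --- | ---
                Proposed algorithm | 41 | 95%
                Proposed algorithm (without correction) | 43 | 87%
                MAPPO [40] | 88 | 59%
                MASAC [41] | 85 | 61%
                MADDPG [42] | 99 | 56%
                Geometric-estimation pursuit [33] | 78 | 72%
                Triangulation-based pursuit [34] | 61 | 94%
                Pursuit with at least one pursuer having a global view [23] | 62 | 85%
                Automatic-tracking pursuit [36] | 82 | 71%
                Adaptive-switching pursuit [37] | 65 | 66%
                Random policy | 152 | 10%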
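
The abstract describes partitioning the environment into regions, keeping a belief over which region contains the evader, and correcting that belief with the distance measured by the leader. The exact update rule is not given on this page, so the Python below is only a minimal sketch of a generic range-only belief filter, not the authors' method. It assumes that regions are axis-aligned rectangles, that the evader's region-to-region motion is described by a known row-stochastic matrix, and that a region stays plausible only when the measured distance lies between its nearest and farthest points from the leader. The names `region_distance_bounds`, `predict`, `correct` and the `noise` tolerance are hypothetical.

```python
import math


def region_distance_bounds(leader, cell):
    """Min and max Euclidean distance from the leader to a rectangle.

    cell = (xmin, ymin, xmax, ymax); an assumed region shape.
    """
    x, y = leader
    xmin, ymin, xmax, ymax = cell
    # Nearest point: clamp the leader position into the rectangle.
    dx_min = max(xmin - x, 0.0, x - xmax)
    dy_min = max(ymin - y, 0.0, y - ymax)
    # Farthest point: always one of the four corners.
    dx_max = max(abs(x - xmin), abs(x - xmax))
    dy_max = max(abs(y - ymin), abs(y - ymax))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)


def predict(belief, transition):
    """Propagate the belief one step: b'[j] = sum_i b[i] * T[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def correct(belief, leader, cells, measured, noise=0.0):
    """Zero out regions inconsistent with the measured range, then renormalize."""
    weights = []
    for p, cell in zip(belief, cells):
        lo, hi = region_distance_bounds(leader, cell)
        consistent = (lo - noise) <= measured <= (hi + noise)
        weights.append(p if consistent else 0.0)
    total = sum(weights)
    if total == 0.0:
        # The measurement contradicts every region: keep the prior (assumed fallback).
        return list(belief)
    return [w / total for w in weights]


# Usage: a 2 x 2 grid of unit cells, leader at the origin, uniform prior.
cells = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2)]
prior = [0.25, 0.25, 0.25, 0.25]
print(correct(prior, (0.0, 0.0), cells, measured=2.5))
# -> [0.0, 0.0, 0.0, 1.0]  (only the far corner cell can be 2.5 away)
```

A single range reading usually leaves several regions plausible (for example, a reading of 1.2 here keeps three of the four cells), which is why a filter of this kind alternates `predict` and `correct` over many steps instead of relying on one measurement.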
                        References:
                        [1] Du Yong-Hao, Xing Li-Ning, Cai Zhao-Quan. Survey on intelligent scheduling technologies for unmanned flying craft clusters. Acta Automatica Sinica, 2020, 46(2): 222-241
                        [2] Kou Li-Wei, Xiang Ji. Target fencing control of multiple mobile robots using output feedback linearization. Acta Automatica Sinica, 2022, 48(5): 1285-1291
                        [3] Ferrari S, Fierro R, Perteet B, Cai C H, Baumgartner K. A geometric optimization approach to detecting and intercepting dynamic targets using a mobile sensor network. SIAM Journal on Control and Optimization, 2009, 48(1): 292-320 doi: 10.1137/07067934X
                        [4] Isaacs R. Differential Games. New York: Wiley, 1965.
                        [5] Osborne M J, Rubinstein A. A Course in Game Theory. Cambridge: MIT Press, 1994.
                        [6] Shi Wei, Feng Yang-He, Cheng Guang-Quan, Huang Hong-Lan, Huang Jin-Cai, Liu Zhong, et al. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning. Acta Automatica Sinica, 2021, 47(7): 1610-1623
                        [7] Geng Yuan-Zhuo, Yuan Li, Huang Huang, Tang Liang. Terminal-guidance based reinforcement-learning for orbital pursuit-evasion game of the spacecraft. Acta Automatica Sinica, 2023, 49(5): 974-984
                        [8] Engin S, Jiang Q Y, Isler V. Learning to play pursuit-evasion with visibility constraints. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Prague, Czech Republic: IEEE, 2021. 3858-3863
                        [9] Al-Talabi A A. Multi-player pursuit-evasion differential game with equal speed. In: Proceedings of the IEEE International Automatic Control Conference (CACS). Pingtung, Taiwan, China: IEEE, 2017. 1-6
                        [10] Selvakumar J, Bakolas E. Feedback strategies for a reach-avoid game with a single evader and multiple pursuers. IEEE Transactions on Cybernetics, 2021, 51(2): 696-707 doi: 10.1109/TCYB.2019.2914869
                        [11] de Souza C, Newbury R, Cosgun A, Castillo P, Vidolov B, Kulić D. Decentralized multi-agent pursuit using deep reinforcement learning. IEEE Robotics and Automation Letters, 2021, 6(3): 4552-4559 doi: 10.1109/LRA.2021.3068952
                        [12] Zhou Z J, Xu H. Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning. Neurocomputing, 2022, 484: 46-58 doi: 10.1016/j.neucom.2021.01.141
                        [13] Garcia E, Casbeer D W, Von Moll A, Pachter M. Multiple pursuer multiple evader differential games. IEEE Transactions on Automatic Control, 2021, 66(5): 2345-2350 doi: 10.1109/TAC.2020.3003840
                        [14] Pierson A, Wang Z J, Schwager M. Intercepting rogue robots: An algorithm for capturing multiple evaders with multiple pursuers. IEEE Robotics and Automation Letters, 2017, 2(2): 530-537 doi: 10.1109/LRA.2016.2645516
                        [15] Gibbons R. A Primer in Game Theory. Harlow: Prentice Education Limited, 1992.
                        [16] Parthasarathy T. Discounted, positive, and noncooperative stochastic games. International Journal of Game Theory, 1973, 2(1): 25-37 doi: 10.1007/BF01737555
                        [17] Maitra A, Parthasarathy T. On stochastic games. Journal of Optimization Theory and Applications, 1970, 5(4): 289-300 doi: 10.1007/BF00927915
                        [18] Liu S Y, Zhou Z Y, Tomlin C, Hedrick K. Evasion as a team against a faster pursuer. In: Proceedings of the American Control Conference. Washington, USA: IEEE, 2013. 5368-5373
                        [19] Huang L N, Zhu Q Y. A dynamic game framework for rational and persistent robot deception with an application to deceptive pursuit-evasion. IEEE Transactions on Automation Science and Engineering, 2022, 19(4): 2918-2932 doi: 10.1109/TASE.2021.3097286
                        [20] Qi D D, Li L Y, Xu H L, Tian Y, Zhao H Z. Modeling and solving of the missile pursuit-evasion game problem. In: Proceedings of the 40th Chinese Control Conference (CCC). Shanghai, China: IEEE, 2021. 1526-1531
                        [21] Liu Kun, Zheng Xiao-Shuai, Lin Ye-Ming, Han Le, Xia Yuan-Qing. Design of optimal strategies for the pursuit-evasion problem based on differential game. Acta Automatica Sinica, 2021, 47(8): 1840-1854
                        [22] Xu Y H, Yang H, Jiang B, Polycarpou M M. Multiplayer pursuit-evasion differential games with malicious pursuers. IEEE Transactions on Automatic Control, 2022, 67(9): 4939-4946 doi: 10.1109/TAC.2022.3168430
                        [23] Lin W, Qu Z H, Simaan M A. Nash strategies for pursuit-evasion differential games involving limited observations. IEEE Transactions on Aerospace and Electronic Systems, 2015, 51(2): 1347-1356 doi: 10.1109/TAES.2014.130569
                        [24] Fang X, Wang C, Xie L H, Chen J. Cooperative pursuit with multi-pursuer and one faster free-moving evader. IEEE Transactions on Cybernetics, 2022, 52(3): 1405-1414 doi: 10.1109/TCYB.2019.2958548
                        [25] Lopez V G, Lewis F L, Wan Y, Sanchez E N, Fan L L. Solutions for multiagent pursuit-evasion games on communication graphs: Finite-time capture and asymptotic behaviors. IEEE Transactions on Automatic Control, 2020, 65(5): 1911-1923 doi: 10.1109/TAC.2019.2926554
                        [26] Zheng Yan-Bin, Fan Wen-Xin, Han Meng-Yun, Tao Xue-Li. Multi-agent collaborative pursuit algorithm based on game theory and Q-learning. Journal of Computer Applications, 2020, 40(6): 1613-1620
                        [27] Zhu J G, Zou W, Zhu Z. Learning evasion strategy in pursuit-evasion by deep Q-network. In: Proceedings of the 24th International Conference on Pattern Recognition (ICPR). Beijing, China: IEEE, 2018. 67-72
                        [28] Bilgin A T, Kadioglu-Urtis E. An approach to multi-agent pursuit evasion games using reinforcement learning. In: Proceedings of the International Conference on Advanced Robotics (ICAR). Istanbul, Turkey: IEEE, 2015. 164-169
                        [29] Wang Y D, Dong L, Sun C Y. Cooperative control for multi-player pursuit-evasion games with reinforcement learning. Neurocomputing, 2020, 412: 101-114 doi: 10.1016/j.neucom.2020.06.031
                        [30] Zhang R L, Zong Q, Zhang X Y, Dou L Q, Tian B L. Game of drones: Multi-UAV pursuit-evasion game with online motion planning by deep reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2022.3146976
                        [31] Coleman D, Bopardikar S D, Tan X B. Observability-aware target tracking with range only measurement. In: Proceedings of the American Control Conference (ACC). New Orleans, USA: IEEE, 2021. 4217-4224
                        [32] Chen W, Sun R S. Range-only SLAM for underwater navigation system with uncertain beacons. In: Proceedings of the 10th International Conference on Modelling, Identification and Control (ICMIC). Guiyang, China: IEEE, 2018. 1-5
                        [33] Bopardikar S D, Bullo F, Hespanha J P. A pursuit game with range-only measurements. In: Proceedings of the 47th IEEE Conference on Decision and Control. Cancun, Mexico: IEEE, 2008. 4233-4238
                        [34] Lima R, Ghose D. Target localization and pursuit by sensor-equipped UAVs using distance information. In: Proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS). Miami, USA: IEEE, 2017. 383-392
                        [35] Fidan B, Kiraz F. On convexification of range measurement based sensor and source localization problems. Ad Hoc Networks, 2014, 20: 113-118 doi: 10.1016/j.adhoc.2014.04.003
                        [36] Chaudhary G, Sinha A. Capturing a target with range only measurement. In: Proceedings of the European Control Conference (ECC). Zurich, Switzerland: IEEE, 2013. 4400-4405
                        [37] Güler S, Fidan B. Target capture and station keeping of fixed speed vehicles without self-location information. European Journal of Control, 2018, 43: 1-11 doi: 10.1016/j.ejcon.2018.06.003
                        [38] Sutton R S, Barto A G. Reinforcement Learning: An Introduction (Second edition). Cambridge: MIT Press, 2018.
                        [39] Kreyszig E. Introductory Functional Analysis With Applications. New York: John Wiley & Sons, 1991.
                        [40] Yu C, Velu A, Vinitsky E, Gao J X, Wang Y, Bayen A, et al. The surprising effectiveness of PPO in cooperative multi-agent games. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA: NIPS, 2022.
                        [41] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR, 2018. 1861-1870
                        [42] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. In: Proceedings of the 4th International Conference on Learning Representations. San Juan, Puerto Rico: ICLR, 2015.
                        Publication history
                        • Received:  2023-01-12
                        • Accepted:  2023-04-04
                        • Published online:  2023-05-11
                        • Issue date:  2024-04-26
