

              Data-driven Policy Optimization for Stochastic Systems Involving Adaptive Critic

              Wang Ding, Wang Jiang-Yu, Qiao Jun-Fei

              Citation: Wang Ding, Wang Jiang-Yu, Qiao Jun-Fei. Data-driven policy optimization for stochastic systems involving adaptive critic. Acta Automatica Sinica, 2024, 50(5): 980-990 doi: 10.16383/j.aas.c230678

              doi: 10.16383/j.aas.c230678
              Funds: Supported by National Natural Science Foundation of China (62222301, 61890930-5, 62021003) and National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301)
              More Information
                Author Bio:

                WANG Ding  Professor at the Faculty of Information Technology, Beijing University of Technology. He received his master's degree from Northeastern University in 2009 and his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2012. His research interests cover reinforcement learning and intelligent control. Corresponding author of this paper. E-mail: dingwang@bjut.edu.cn

                WANG Jiang-Yu  Ph.D. candidate at the Faculty of Information Technology, Beijing University of Technology. His research interests cover reinforcement learning and intelligent control. E-mail: wangjiangyu@emails.bjut.edu.cn

                QIAO Jun-Fei  Professor at the Faculty of Information Technology, Beijing University of Technology. His research interests cover intelligent control of wastewater treatment processes and the structure design and optimization of neural networks. E-mail: adqiao@bjut.edu.cn

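              • Abstract: Adaptive critic techniques have been widely applied to optimal control problems for complex nonlinear systems, but they still have limitations when used to solve the infinite-horizon optimal control problem for discrete-time nonlinear stochastic systems. This paper incorporates adaptive critic techniques to establish a data-driven discounted optimal regulation method for discrete stochastic systems. First, for nonlinear stochastic systems under relaxed assumptions, the infinite-horizon optimal control problem with a discount factor is studied. The proposed stochastic Q-learning algorithm optimizes an initial admissible policy to the optimal policy in a monotonically nonincreasing manner. Following the data-driven idea, the stochastic Q-learning algorithm optimizes the policy directly from data without building a model. Second, the stochastic Q-learning algorithm is implemented with an actor-critic neural network scheme. Finally, two benchmark systems are used to verify the effectiveness of the proposed stochastic Q-learning algorithm.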
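The idea summarized in the abstract (start from an admissible policy, evaluate its discounted Q-function from data alone, then improve the policy) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical scalar plant `x_next = (a + w) * x + b * u` with multiplicative Gaussian noise `w`, a quadratic stage cost `q * x**2 + r * u**2`, and a least-squares quadratic Q-function in place of the actor-critic neural networks. All function names and the values of `a`, `b`, `sigma`, `q` and `r` are illustrative assumptions; only the discount factor 0.97 is borrowed from Table 1 (Benchmark system I).

```python
import random

def features(x, u):
    # Quadratic basis: Q(x, u) = h[0]*x^2 + 2*h[1]*x*u + h[2]*u^2
    return (x * x, 2.0 * x * u, u * u)

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = s / M[i][i]
    return sol

def collect_data(n, rng, a=1.1, b=1.0, sigma=0.1):
    """Sample transitions (x, u, x_next) from the hypothetical plant.
    The learner below never sees a, b or sigma, only the samples."""
    data = []
    for _ in range(n):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        w = rng.gauss(0.0, sigma)
        data.append((x, u, (a + w) * x + b * u))
    return data

def evaluate_policy(data, gain, q, r, lam):
    """Policy evaluation: least-squares fit of the Q-function of u = -gain*x.
    Solves sum_i phi_i (phi_i - lam*phi_next_i)^T h = sum_i phi_i * cost_i."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, u, x_next in data:
        phi = features(x, u)
        phi_next = features(x_next, -gain * x_next)
        cost = q * x * x + r * u * u
        for i in range(3):
            b[i] += phi[i] * cost
            for j in range(3):
                A[i][j] += phi[i] * (phi[j] - lam * phi_next[j])
    return solve3(A, b)

def stochastic_q_learning(data, gain0, q=1.0, r=1.0, lam=0.97,
                          eps=1e-4, max_iter=50):
    """Alternate policy evaluation and improvement until the gain settles."""
    gain, history = gain0, []
    for _ in range(max_iter):
        h = evaluate_policy(data, gain, q, r, lam)
        history.append(h)
        new_gain = h[1] / h[2]  # minimizer of the quadratic Q over u
        converged = abs(new_gain - gain) < eps
        gain = new_gain
        if converged:
            break
    return gain, history

rng = random.Random(0)
data = collect_data(5000, rng)
# gain0 = 1.0 is admissible for the assumed plant: lam*((a - b)^2 + sigma^2) < 1
gain, history = stochastic_q_learning(data, gain0=1.0)
print("learned feedback gain:", gain)
```

For the assumed plant, the model-based discounted Riccati equation gives a gain of about 0.697, so the printed value should land close to that; the first coefficient of the fitted Q-function in `history` should also shrink from the first iteration to the last, which mirrors the nonincreasing behaviour described in the abstract.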
              • Fig. 1  Curves of Q network weights (Benchmark system I)

                Fig. 2  Curves of action network weights (Benchmark system I)

                Fig. 3  Curves of control policies for performance test (Benchmark system I)

                Fig. 4  Schematic diagram of the ball-and-beam system (Benchmark system II)

                Fig. 5  Curves of Q network weights (Benchmark system II)

                Fig. 6  Curves of action network weights (Benchmark system II)

                Fig. 7  Curves of system states (Benchmark system II)

                Fig. 8  Curves of system control inputs (Benchmark system II)

                Fig. 9  Curves of cost-to-go (Benchmark system II)

                Table 1  Main parameters of the stochastic Q-learning algorithm

                | Algorithm parameter | $\mathcal{Q}$ | $\mathcal{R}$ | ${\rho}_{\max}$ | $\lambda$ | $\epsilon$ |
                | --- | --- | --- | --- | --- | --- |
                | Benchmark system I | $2I_2$ | 2.0 | 300 | 0.97 | 0.01 |
                | Benchmark system II | $0.1I_4$ | 0.1 | 500 | 0.99 | 0.01 |

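                Table 2  Main parameters of the ball-and-beam system

                | Symbol and value | Physical meaning |
                | --- | --- |
                | $S_t=0.001 \;{\rm{N}}/ {\rm{m}}$ | Drive mechanical stiffness |
                | $L_\omega =0.5\; {\rm{m}}$ | Platform radius |
                | $L =0.48 \; {\rm{m}}$ | Motor action radius |
                | $f_c= 1\; {\rm{N \cdot s/ m}}$ | Mechanical friction coefficient of the drive motor |
                | $I_\omega = 0.140\;25 \;{\rm{kg} }\cdot {\rm{m} }^2$ | Platform moment of inertia |
                | $g = 9.8\; {\rm{m/s}}^2$ | Gravitational acceleration |
                | $\varpi =0.016\;2 \;{\rm{kg} }$ | Ball mass |
                | $\tau =0.02 \;{\rm{m}}$ | Ball rolling radius |
                | $I_b=4.32\times10^{-5} \;{\rm{kg}}\cdot {\rm{m}}^2$ | Ball moment of inertia |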
                        Publication history
                        • Received:  2023-11-02
                        • Accepted:  2024-01-08
                        • Published online:  2024-02-19
                        • Issue date:  2024-05-20
