
An Overview of Research on Adaptive Dynamic Programming (自适应动态规划综述)

ZHANG Hua-Guang, ZHANG Xin, LUO Yan-Hong, YANG Jun

Citation: ZHANG Hua-Guang, ZHANG Xin, LUO Yan-Hong, YANG Jun. An Overview of Research on Adaptive Dynamic Programming. ACTA AUTOMATICA SINICA, 2013, 39(4): 303-311. doi: 10.3724/SP.J.1004.2013.00303

doi: 10.3724/SP.J.1004.2013.00303

Corresponding author: ZHANG Hua-Guang (张化光)

Abstract: Adaptive dynamic programming (ADP) is an approximate optimal control method that has recently emerged in the field of optimal control and is currently a research focus in the international optimization community. ADP uses a function approximation structure to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation and, through offline iteration or online updating, obtains a near-optimal control policy for the system, so that optimal control problems for nonlinear systems can be solved effectively. This paper surveys ADP from three aspects: the evolution of ADP structures, the development of ADP algorithms, and applications. Existing research results on ADP are summarized, and open problems and future research directions in this area are discussed.
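For context, the following is a minimal sketch of the standard discrete-time setting that HDP-type ADP schemes approximate; it is not taken from the paper itself, and the symbols f, U, V and the iteration index i are assumed notation rather than the authors' own:

    % Standard discrete-time optimal control setting, assumed here for illustration.
    \begin{align*}
      % System dynamics and infinite-horizon cost to be minimized:
      & x_{k+1} = f(x_k, u_k), \qquad J(x_0) = \sum_{k=0}^{\infty} U(x_k, u_k) \\
      % Bellman (discrete-time HJB) optimality equation for the optimal value V^*:
      & V^{*}(x_k) = \min_{u_k} \bigl\{ U(x_k, u_k) + V^{*}\bigl(f(x_k, u_k)\bigr) \bigr\} \\
      % HDP-style value iteration; V_i and u_i are realized by function
      % approximation structures (e.g., critic and action networks):
      & u_i(x_k) = \arg\min_{u} \bigl\{ U(x_k, u) + V_i\bigl(f(x_k, u)\bigr) \bigr\} \\
      & V_{i+1}(x_k) = U\bigl(x_k, u_i(x_k)\bigr) + V_i\bigl(f(x_k, u_i(x_k))\bigr)
    \end{align*}

In the convergence analyses surveyed by the paper, iterations of this form are shown, under suitable conditions, to converge to the optimal value function V^*, which is the sense in which ADP yields a near-optimal control policy.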
Publication history
• Received: 2012-07-19
• Revised: 2012-10-29
• Published: 2013-04-20
