An Overview of Research on Adaptive Dynamic Programming
doi: 10.3724/SP.J.1004.2013.00303
1. School of Information Science and Engineering, Northeastern University, Shenyang 110819;
2. State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819;
3. College of Information and Control Engineering, China University of Petroleum (East China), Qingdao 266580
Abstract: Adaptive dynamic programming (ADP) is an emerging approximate optimal control method and an active research topic in optimal control and optimization. ADP uses a function approximation structure to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation and obtains an approximately optimal control policy through offline iteration or online updating, which makes it an effective way to solve optimal control problems for nonlinear systems. This paper reviews ADP from three angles: the evolution of ADP structures, the development of ADP algorithms, and applications of ADP. It summarizes current research results and discusses open problems and directions for future work.
Keywords: adaptive dynamic programming / neural networks / nonlinear systems / stability
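As a rough illustration of the offline iteration mentioned in the abstract, the sketch below runs value iteration on a scalar discrete-time linear-quadratic problem. It is not taken from the paper. It solves the discrete-time Bellman equation (the discrete-time counterpart of the HJB equation), and the quadratic approximator V_i(x) = p_i x^2 happens to be exact for this toy problem. The function name `value_iteration_lq` and the parameters `a`, `b`, `q`, `r` are assumptions chosen for illustration.

```python
def value_iteration_lq(a, b, q, r, iters=200):
    """Offline value iteration for x_{k+1} = a*x_k + b*u_k
    with stage cost q*x^2 + r*u^2 (q >= 0, r > 0).

    The value function is approximated as V_i(x) = p_i * x^2,
    starting from V_0(x) = 0. Returns the final weight p and
    the feedback gain k of the greedy policy u = -k*x.
    """
    p = 0.0
    for _ in range(iters):
        # Greedy policy for the current V_i:
        # u = -k*x minimises r*u^2 + p*(a*x + b*u)^2.
        k = a * b * p / (r + b * b * p)
        # Bellman backup: V_{i+1}(x) = q*x^2 + r*u^2 + V_i(a*x + b*u).
        p = q + r * k * k + p * (a - b * k) ** 2
    # Gain of the greedy policy for the final value estimate.
    k = a * b * p / (r + b * b * p)
    return p, k


if __name__ == "__main__":
    # Hypothetical open-loop unstable plant (|a| > 1).
    p, k = value_iteration_lq(a=1.2, b=1.0, q=1.0, r=1.0)
    print(f"value weight p = {p:.6f}, feedback gain k = {k:.6f}")
```

For nonlinear systems the single weight `p` would typically be replaced by a neural network or another parametric approximator, and the backup would become a regression step instead of a closed-form update.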