Asynchronous Updating Reinforcement Learning Algorithm for Decision-making Operational Indices of Uncertain Industrial Processes

Li Jin-Na, Yuan Lin, Ding Jin-Liang

Citation: Li Jin-Na, Yuan Lin, Ding Jin-Liang. Asynchronous updating reinforcement learning algorithm for decision-making operational indices of uncertain industrial processes. Acta Automatica Sinica, 2023, 49(2): 461-472

doi: 10.16383/j.aas.c210983
Funds: Supported by National Key Research and Development Plan Project (2018YFB1701104), National Natural Science Foundation of China (62073158, 61673280, 61525302, 61833004), Project of Liaoning Province Prosperity Plan (XLYC1808001), Science and Technology Planning Project of Liaoning Province (2020JH2/10500001), Open Project of Key Field Alliance of Liaoning Province (2019-KF-03-06), and Basic Research Project of the Education Department of Liaoning Province (LJKZ0401)

Author Bio:

LI Jin-Na  Professor at Liaoning Petrochemical University. Her research interest covers optimal operational control, data-driven control, reinforcement learning, and optimal control of multi-agent systems. Corresponding author of this paper. E-mail: lijinna_721@126.com

YUAN Lin  Master student at Liaoning Petrochemical University. His research interest covers optimal operational control, data-driven control, and reinforcement learning. E-mail: lewinyuan@126.com

DING Jin-Liang  Professor at Northeastern University. His research interest covers optimization of the whole production process, intelligent optimization, neural networks, and reinforcement learning. E-mail: jlding@mail.neu.edu.cn

Abstract: The decision making of operational indices is the key to achieving safe operation and optimizing production indices in industrial processes. Considering the complexity of solving the multiple-operational-index decision-making problem and the uncertainty of production-index states caused by dynamic fluctuations of production conditions, an asynchronous policy-update reinforcement learning algorithm is proposed for self-learning the decisions of operational indices, and a theoretical proof of the algorithm's convergence is given. Within a stochastic adaptive dynamic programming framework, the algorithm uses sample means in place of computing the state transition probability matrix of the production indices, so this matrix is not required to be known. Moreover, by introducing a clock and defining its threshold, centralized policy evaluation with asynchronous updating of multiple policies is adopted to simplify the multiple-operational-index decision-making problem and to improve the learning efficiency of reinforcement learning. Using only measurable data, the self-learned operational indices guarantee that the production indices are optimized and kept within prescribed ranges. Finally, simulations based on real data from a large mineral processing plant in western China demonstrate the effectiveness of the proposed method.
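A minimal tabular sketch of the mechanism described above may help fix ideas: several actors (one per operational index) share a centrally evaluated critic, the unknown transition probabilities are replaced by sample means of observed transitions, and each actor improves its policy only when its own clock passes a threshold. This is an illustrative reading under simplifying assumptions (discretized states and actions, tabular values, a placeholder `plant` model, and made-up constants such as `clock_threshold`); it is not the paper's neural-network-based algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_indices = 3          # number of operational indices (one actor each); illustrative value
n_states = 5           # discretized production-index states; illustrative value
n_actions = 4          # discretized candidate set-points per index; illustrative value
clock_threshold = 20   # actor i is improved only once its clock reaches this value
gamma, lr = 0.95, 0.1

# One Q-table per index (multiple actors) and one shared critic V (centralized evaluation).
Q = [np.zeros((n_states, n_actions)) for _ in range(n_indices)]
V = np.zeros(n_states)
policy = np.zeros((n_indices, n_states), dtype=int)   # current greedy set-point per actor
clocks = np.zeros(n_indices, dtype=int)

# Visit counts give a sample-mean estimate of the unknown state transition probabilities.
counts = np.full((n_states, n_actions, n_states), 1e-6)

def plant(s, actions):
    """Placeholder for the uncertain industrial process (an assumption, not the real plant)."""
    s_next = rng.integers(n_states)
    cost = (s - n_states // 2) ** 2 + 0.1 * actions.sum()
    return s_next, cost

s = rng.integers(n_states)
for t in range(5000):
    # Epsilon-greedy actions from every actor's current policy.
    actions = np.array([policy[i, s] if rng.random() > 0.1 else rng.integers(n_actions)
                        for i in range(n_indices)])
    s_next, cost = plant(s, actions)

    # Centralized policy evaluation at every step, using the sample-mean transition estimate.
    for i in range(n_indices):
        counts[s, actions[i], s_next] += 1
        p_hat = counts[s, actions[i]] / counts[s, actions[i]].sum()
        Q[i][s, actions[i]] += lr * (cost + gamma * p_hat @ V - Q[i][s, actions[i]])
        clocks[i] += 1
    V[s] += lr * (cost + gamma * V[s_next] - V[s])

    # Asynchronous policy improvement: only actors whose clocks reached the threshold update now.
    for i in range(n_indices):
        if clocks[i] >= clock_threshold:
            policy[i] = Q[i].argmin(axis=1)
            clocks[i] = 0
    s = s_next
```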
Fig. 1  Decision-making problem of operational indices in industrial processes

Fig. 2  Self-learning mechanism of operational indices

Fig. 3  Flowchart of self-learning decision making of operational indices with the multi-actor-critic structure

Fig. 4  Flow chart of the mineral separation process

Fig. 5  Loss functions of the concentrate yield and concentrate grade

Fig. 6  Evolution of weights of the multi-actor neural networks

Fig. 7  Evolution of weights of the critic neural network

Fig. 8  200-day operational indices

Fig. 9  200-day concentrate grade

Fig. 10  200-day concentrate yield

Fig. 11  Comparison of time consumption between asynchronous policy update and synchronous policy update

Fig. 12  Statistical results with and without consideration of dynamics of the production condition

Table 1  Operational indices

Unit                               | Operational index                   | Range (%)
Shaft furnace                      | $a_1$: magnetic tube recovery rate  | $a_{1\max} = 84.8$, $a_{1\min} = 81.3$
Grinding unit 1                    | $a_2$: grinding particle size       | $a_{2\max} = 84.0$, $a_{2\min} = 48.6$
Grinding unit 2                    | $a_3$: grinding particle size       | $a_{3\max} = 88.8$, $a_{3\min} = 63.3$
High-intensity magnetic separation | $a_4$: concentrate grade            | $a_{4\max} = 53.4$, $a_{4\min} = 45.9$
                                   | $a_5$: tailings grade               | $a_{5\max} = 23.2$, $a_{5\min} = 17.9$
Low-intensity magnetic separation  | $a_6$: concentrate grade            | $a_{6\max} = 57.8$, $a_{6\min} = 53.5$
                                   | $a_7$: tailings grade               | $a_{7\max} = 20.2$, $a_{7\min} = 15.9$
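As a small, hypothetical illustration of how the prescribed ranges in Table 1 can be enforced (the bound values are from the table; the dictionary layout and the `project_to_range` helper are assumptions for illustration, not the authors' code), a learned decision can be projected back into its admissible range before being applied:

```python
import numpy as np

# Bounds (%) taken from Table 1; keys a1..a7 denote the operational indices.
bounds = {
    "a1": (81.3, 84.8), "a2": (48.6, 84.0), "a3": (63.3, 88.8),
    "a4": (45.9, 53.4), "a5": (17.9, 23.2), "a6": (53.5, 57.8), "a7": (15.9, 20.2),
}

def project_to_range(decision: dict) -> dict:
    """Clip each learned operational index into its prescribed range."""
    return {k: float(np.clip(v, *bounds[k])) for k, v in decision.items()}

# Example: a raw decision with two out-of-range values gets projected back.
print(project_to_range({"a1": 86.0, "a2": 60.0, "a3": 70.0,
                        "a4": 44.0, "a5": 20.0, "a6": 55.0, "a7": 18.0}))
```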

Table 2  Comparison results between different algorithms

Experiment | Method                            | Yield (t) | Grade (%)
30 days    | Proposed algorithm                | 240369.8  | 54.13
           | Multi-actor-network ensemble [11] | 206202.2  | 54.10
           | Reinforce [11, 33]                | 203907.6  | 54.07
           | Actual values                     | 199650.6  | 52.86
1 day      | Proposed algorithm                | 8030.2    | 54.17
           | Multi-actor-network ensemble [11] | 5730.7    | 54.15
           | Reinforce [11, 33]                | 5648.3    | 52.58
           | Actual values                     | 5659.4    | 52.58
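For quick orientation, the relative yield gains implied by Table 2 can be computed directly from its rows; this is plain arithmetic on the table's reported values, not an additional result from the paper:

```python
# Yields (t) from Table 2: proposed algorithm vs. recorded actual values.
yield_30d = {"proposed": 240369.8, "actual": 199650.6}
yield_1d = {"proposed": 8030.2, "actual": 5659.4}

for name, y in (("30-day", yield_30d), ("1-day", yield_1d)):
    gain = (y["proposed"] - y["actual"]) / y["actual"] * 100
    print(f"{name} yield gain over actual operation: {gain:.1f}%")
# Prints roughly 20.4% for the 30-day run and 41.9% for the 1-day run.
```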
[1] Chai Tian-You. Challenges of optimal control for plant-wide production processes in terms of control and optimization theories. Acta Automatica Sinica, 2009, 35(6): 641-649 (in Chinese) doi: 10.3724/SP.J.1004.2009.00641
[2] Ding Jin-Liang, Yang Cui-E, Chen Yuan-Dong, Chai Tian-You. Research progress and prospects of intelligent optimization decision making in complex industrial process. Acta Automatica Sinica, 2018, 44(11): 1931-1943 (in Chinese)
[3] Chai Tian-You, Ding Jin-Liang, Wang Hong, Su Chun-Yi. Hybrid intelligent optimal control method for operation of complex industrial processes. Acta Automatica Sinica, 2008, 34(5): 505-515 (in Chinese)
[4] Huang X, Chu Y, Hu Y, Chai T. Production process management system for production indices optimization of mineral processing. IFAC Proceedings Volumes, 2005, 38(1): 178-183
[5] Ochoa S, Wozny G, Repke J U. Plantwide optimizing control of a continuous bioethanol production process. Journal of Process Control, 2010, 20(9): 983-998 doi: 10.1016/j.jprocont.2010.06.010
[6] Ding J, Chai T, Wang H, Wang J, Zheng X. An intelligent factory-wide optimal operation system for continuous production process. Enterprise Information Systems, 2016, 10(3): 286-302 doi: 10.1080/17517575.2015.1065346
[7] Ding J, Modares H, Chai T, Lewis F L. Data-based multiobjective plant-wide performance optimization of industrial processes under dynamic environments. IEEE Transactions on Industrial Informatics, 2016, 12(2): 454-465 doi: 10.1109/TII.2016.2516973
[8] Chai T, Ding J, Wang H. Multi-objective hybrid intelligent optimization of operational indices for industrial processes and application. IFAC Proceedings Volumes, 2011, 44(1): 10517-10522 doi: 10.3182/20110828-6-IT-1002.01753
[9] Ding J, Yang C, Chai T. Recent progress on data-based optimization for mineral processing plants. Engineering, 2017, 3(2): 183-187 doi: 10.1016/J.ENG.2017.02.015
[10] Li J, Ding J, Chai T, Lewis F L. Nonzero-sum game reinforcement learning for performance optimization in large-scale industrial processes. IEEE Transactions on Cybernetics, 2019, 50(9): 4132-4145
[11] Liu C, Ding J, Sun J. Reinforcement learning based decision making of operational indices in process industry under changing environment. IEEE Transactions on Industrial Informatics, 2021, 17(4): 2727-2736 doi: 10.1109/TII.2020.3005207
[12] Lewis F L, Vrabie D, Vamvoudakis K. Reinforcement learning and feedback control. IEEE Control Systems, 2012, 32(6): 76-105 doi: 10.1109/MCS.2012.2214134
[13] Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Nashua: Athena Scientific, 1996.
[14] Bertsekas D P. Proper policies in infinite-state stochastic shortest path problems. IEEE Transactions on Automatic Control, 2018, 63(11): 3787-3792 doi: 10.1109/TAC.2018.2811781
[15] Liu D, Wang D, Li H. Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach. IEEE Transactions on Neural Networks and Learning Systems, 2013, 25(2): 418-428
[16] Na J, Zhao J, Gao G, Li Z. Output-feedback robust control of uncertain systems via online data-driven learning. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(6): 2650-2662
[17] Song R, Lewis F L, Wei Q. Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games. IEEE Transactions on Neural Networks and Learning Systems, 2016, 28(3): 704-713
[18] Modares H, Nageshrao S P, Lopes G A D, Babuska R, Lewis F L. Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning. Automatica, 2016, 71: 334-341 doi: 10.1016/j.automatica.2016.05.017
[19] Bertsekas D P. Multiagent reinforcement learning: rollout and policy iteration. IEEE/CAA Journal of Automatica Sinica, 2021, 8(2): 249-272 doi: 10.1109/JAS.2021.1003814
[20] Liang M, Wang D, Liu D. Neuro-optimal control for discrete stochastic processes via a novel policy iteration algorithm. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019, 50(11): 3972-3985
[21] Zhang H, Luo Y, Liu D. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints. IEEE Transactions on Neural Networks, 2009, 20(9): 1490-1503 doi: 10.1109/TNN.2009.2027233
[22] Marvi Z, Kiumarsi B. Safe reinforcement learning: a control barrier function optimization approach. International Journal of Robust and Nonlinear Control, 2021, 31(6): 1923-1940 doi: 10.1002/rnc.5132
[23] Greene M L, Deptula P, Nivison S, Dixon W E. Sparse learning-based approximate dynamic programming with barrier constraints. IEEE Control Systems Letters, 2020, 4(3): 743-748 doi: 10.1109/LCSYS.2020.2977927
[24] Bellman R, Åström K J. On structural identifiability. Mathematical Biosciences, 1970, 7(3-4): 329-339 doi: 10.1016/0025-5564(70)90132-X
[25] Luo B, Yang Y, Liu D. Policy iteration Q-learning for data-based two-player zero-sum game of linear discrete-time systems. IEEE Transactions on Cybernetics, 2021, 51(7): 3630-3640 doi: 10.1109/TCYB.2020.2970969
[26] Kiumarsi B, Lewis F L. Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 26(1): 140-151
[27] Zhang R, Tao J. Data-driven modeling using improved multi-objective optimization based neural network for coke furnace system. IEEE Transactions on Industrial Electronics, 2017, 64(4): 3147-3155 doi: 10.1109/TIE.2016.2645498
[28] Wang D, Ha M, Qiao J. Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation. IEEE Transactions on Automatic Control, 2020, 65(3): 1272-1279 doi: 10.1109/TAC.2019.2926167
[29] Lewis F L, Liu D. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New York: John Wiley & Sons, 2013.
[30] Li J, Ding J, Chai T, Lewis F L, Jagannathan S. Adaptive interleaved reinforcement learning: robust stability of affine nonlinear systems with unknown uncertainty. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(1): 270-280 doi: 10.1109/TNNLS.2020.3027653
[31] Yuan Zhao-Lin, He Run-Zi, Yao Chao, Li Jia, Ban Xiao-Juan. Online reinforcement learning control algorithm for concentration of thickener underflow. Acta Automatica Sinica, 2021, 47(7): 1558-1571 (in Chinese)
[32] Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 2017: 6379-6390
[33] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 2018.
Publication history
• Received: 2021-10-18
• Accepted: 2022-04-28
• Published online: 2023-01-10
• Issue date: 2023-02-20
