Virtual Sample Generation Method Based on Hybrid Optimization With Multi-objective PSO
Abstract: Owing to the difficulty of detection techniques and their high time and economic costs, the modeling samples available for soft-sensing models of difficult-to-measure parameters are few, sparsely distributed, and imbalanced, which seriously restricts the generalization performance of data-driven models. To address these problems, a virtual sample generation (VSG) method based on hybrid optimization with multi-objective particle swarm optimization (MOPSO) is proposed. First, a population representation mechanism is designed for the comprehensive learning particle swarm optimization algorithm so that it can encode continuous and discrete variables simultaneously. Then, a fitness function with multi-stage and multi-objective characteristics is defined for the algorithm so that the number of virtual samples is minimized while the generalization performance of the model is ensured. Finally, a multi-objective hybrid optimization task oriented to virtual sample generation is formulated to improve the comprehensive learning particle swarm optimization algorithm, so that it adapts to the variable-dimension nature of the virtual sample selection process and converges faster. In addition, drawing on metric learning for the first time, a comprehensive evaluation index and a distribution similarity index are proposed for evaluating the quality of virtual samples. Two benchmark datasets and an actual industrial dataset are used to verify the effectiveness and superiority of the proposed method.
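The trade-off stated in the abstract, fewer virtual samples against better model performance, is a two-objective minimization. The following is a minimal sketch of how an external archive of non-dominated solutions could be maintained under that reading. It assumes both objectives are minimized (for example, a virtual sample count and a validation error); the names `dominates` and `update_archive` and the demo values are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

# (number of virtual samples, model error); both are assumed to be minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Add candidate unless a member dominates or duplicates it; drop members it dominates."""
    if any(dominates(member, candidate) or member == candidate for member in archive):
        return list(archive)
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


if __name__ == "__main__":
    # Made-up objective pairs, for illustration only.
    archive: List[Objectives] = []
    for point in [(80, 11.6), (120, 10.7), (150, 10.9), (60, 12.5)]:
        archive = update_archive(archive, point)
    print(archive)
```

In this demo the point (150, 10.9) is rejected because (120, 10.7) is better in both objectives, while the other three points are mutually non-dominated and remain in the archive.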
Table 1 The meaning of the symbols used in this article
| No. | Symbol | Meaning |
|---|---|---|
| 1 | $\rho_i$ | Selection index for the global best particle |
| 2 | $\rho_j$ | Comprehensive evaluation index for virtual samples |
| 3 | $\eta$ | Data distribution similarity |
| 4 | $\boldsymbol{F}(\boldsymbol{z})$ | Set of objective functions of the multi-objective optimization problem |
| 5 | $\boldsymbol{z}, \boldsymbol{z}_n^p(t+1)$ | Decision variable of the optimization problem (position vector of a particle); position value of particle $p$ in dimension $n$ at iteration $t+1$ |
| 6 | $\boldsymbol{v}, \boldsymbol{v}_n^p(t+1)$ | Velocity vector of a particle; velocity value of particle $p$ in dimension $n$ at iteration $t+1$ |
| 7 | $w_{\rm inertia}$ | Inertia weight of the particle velocity update |
| 8 | $\boldsymbol{d}^p(t+1)$ | Personal best of particle $p$ at iteration $t+1$ |
| 9 | $E_n^p$ | Learning exemplar value of particle $p$ in dimension $n$ |
| 10 | $N_{\rm refresh}$ | Threshold on the number of iterations without a personal-best update, used to control the refreshing of learning exemplars |
| 11 | $P_c^p$ | Learning probability of particle $p$, used to control the probability of updating the learning exemplar |
| 12 | $rank^p$ | Rank within the population of the fitness of the personal best of particle $p$ |
| 13 | $K$ | Number of decision trees in the RF model |
| 14 | $L_F$ | Number of split features in the RF model |
| 15 | $\theta_{\rm leaf}$ | Threshold on the number of samples contained in a leaf node of a decision tree in the RF model |
| 16 | $F_{\rm sel}^q$ | Best split feature at node $q$ of a decision tree in the RF model |
| 17 | $s^q$ | Best split point value at node $q$ of a decision tree in the RF model |
| 18 | $f_{\rm tree}^k(\cdot)$ | The $k$-th decision tree model in the RF model |
| 19 | $f_{\rm RF}(\cdot)$ | RF model |
| 20 | $\boldsymbol{z}_{\rm para}$ | Parameter decision variables that guide the generation of candidate virtual samples |
| 21 | $\boldsymbol{z}_{\rm vss}$ | Selection decision variables used to screen candidate virtual samples |
| 22 | $\boldsymbol{R}_{\rm train}$ | Original small-sample training set |
| 23 | $\boldsymbol{x}_{\rm vsg\text{-}min}, \boldsymbol{x}_{\rm vsg\text{-}max}$ | Lower and upper limits of the extended input domain after expansion with the improved MTD |
| 24 | $y_{\rm vsg\text{-}min}, y_{\rm vsg\text{-}max}$ | Lower and upper limits of the extended output domain after expansion with the improved MTD |
| 25 | $\boldsymbol{X}_{\rm vs\text{-}g}$ | Virtual sample inputs generated by hybrid interpolation |
| 26 | $\boldsymbol{X}_{\rm equal}, \boldsymbol{X}_{\rm rand}$ | Virtual sample inputs generated by equal-interval interpolation and by random interpolation |
| 27 | $\boldsymbol{y}_{\rm vs\text{-}g1}, \boldsymbol{y}_{\rm vs\text{-}g2}$ | Virtual sample outputs generated from the virtual sample inputs with the RF and RWNN mapping models |
| 28 | $\boldsymbol{R}_{\rm vs\text{-}g1}^p, \boldsymbol{R}_{\rm vs\text{-}g2}^p$ | Virtual samples generated from the virtual sample inputs with the RF and RWNN mapping models |
| 29 | $\boldsymbol{R}_{\rm vs\text{-}g}$ | Generated hybrid virtual samples |
| 30 | $\boldsymbol{R}_{\rm vs\text{-}d}$ | Candidate virtual samples obtained by pruning $\boldsymbol{R}_{\rm vs\text{-}g}$ |
| 31 | $\boldsymbol{R}_{\rm vs\text{-}s}$ | Virtual samples obtained by selection from the candidate virtual samples |
| 32 | $\boldsymbol{R}_{\rm valid}$ | Original small-sample validation set |
| 33 | $\boldsymbol{R}_{\rm vs}$ | Optimal virtual samples |
| 34 | $f_{\rm num}(\boldsymbol{z})$ | One objective of the multi-objective optimization problem: the number of virtual samples after screening |
| 35 | $f_{\rm mod}(\boldsymbol{z})$ | One objective of the multi-objective optimization problem: the performance index of the RF model built from the screened virtual samples and the original training set |
| 36 | $z_{\rm MTD}$ | One of the parameter decision variables of a particle, corresponding to the MTD-based expansion rate $\gamma_{\rm extend}$ |
| 37 | $z_{\rm RF}^{1}$ | One of the parameter decision variables of a particle, corresponding to the number of split features $L_F$ of the RF mapping model |
| 38 | $z_{\rm RF}^{2}$ | One of the parameter decision variables of a particle, corresponding to the leaf-node sample-count threshold $\theta_{\rm leaf}$ of the decision trees in the RF mapping model |
| 39 | $z_{\rm RWNN}$ | One of the parameter decision variables of a particle, corresponding to the number of hidden-layer neurons $I$ of the RWNN mapping model |
| 40 | $\gamma_{\rm extend}$ | MTD-based expansion rate |
| 41 | $I$ | Number of hidden-layer neurons of the RWNN mapping model |
| 42 | $\boldsymbol{X}_{\rm train}$ | Inputs of the original small-sample training set |
| 43 | $\boldsymbol{y}_{\rm train}$ | Outputs of the original small-sample training set |
| 44 | $y_{\rm ave}$ | Mean of $\boldsymbol{y}_{\rm train}$ |
| 45 | $\boldsymbol{y}_{\rm high}, \boldsymbol{y}_{\rm low}$ | Sets of outputs corresponding to $\boldsymbol{X}_{\rm train}$ that are greater than / less than $y_{\rm ave}$ |
| 46 | $y_{\rm max}, y_{\rm min}$ | Maximum and minimum of $\boldsymbol{y}_{\rm train}$ |
| 47 | $y_{\rm H\text{-}ave}, y_{\rm L\text{-}ave}$ | Means of $\boldsymbol{y}_{\rm high}$ and $\boldsymbol{y}_{\rm low}$ |
| 48 | $N_{\rm equal}, N_{\rm rand}$ | Multiples of equal-interval interpolation and random interpolation |
| 49 | $\boldsymbol{W}, \boldsymbol{b}$ | Connection weights and biases between the input-layer and hidden-layer neurons of the RWNN model |
| 50 | $\boldsymbol{H}^{\rm ori}$ | Hidden-layer output matrix of the RWNN model |
| 51 | $\boldsymbol{\beta}$ | Connection weights between the hidden-layer and output-layer neurons of the RWNN model |
| 52 | $N_{\rm vs\text{-}g}, N_{\rm vs\text{-}d}, N_{\rm vs\text{-}s}$ | Numbers of generated, candidate, and selected virtual samples |
| 53 | $\theta_{\rm select}$ | Selection threshold for virtual samples |
| 54 | $\tilde{\boldsymbol{z}}_{\rm vss}$ | Obtained by variable-dimension processing of $\boldsymbol{z}_{\rm vss}$ |
| 55 | $F$ | Modeling performance index obtained with the virtual sample set $\boldsymbol{R}_{\rm vs\text{-}s}$ |
| 56 | $\boldsymbol{R}_{\rm mix}$ | Mixed sample set of the original training set $\boldsymbol{R}_{\rm train}$ and $\boldsymbol{R}_{\rm vs\text{-}s}$ |
| 57 | $P_{\rm num}$ | Number of particles in the population |
| 58 | $N_{\rm iter}$ | Number of population iterations |
| 59 | $\boldsymbol{A}$ | External archive of the population, which stores the non-dominated solutions |
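Several symbols in Table 1 ($w_{\rm inertia}$, $E_n^p$, $P_c^p$, $N_{\rm refresh}$) belong to the comprehensive learning PSO update. The sketch below follows the standard comprehensive learning scheme, in which each dimension of a particle learns either from its own personal best or from the personal best of a tournament winner among two other particles. It is an assumption-laden illustration, not the paper's improved variant: how $P_c^p$ is derived from $rank^p$ is not shown in this excerpt, so the probability is simply passed in as `pc`, and the acceleration coefficient `c` and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def choose_exemplar(p: int, pbest: Sequence[Sequence[float]], fitness: Sequence[float],
                    pc: float, rng: random.Random) -> List[float]:
    """Build the learning exemplar E^p dimension by dimension (minimization assumed)."""
    others = [q for q in range(len(pbest)) if q != p]
    exemplar = []
    for n in range(len(pbest[p])):
        if rng.random() < pc and len(others) >= 2:
            a, b = rng.sample(others, 2)              # tournament between two other particles
            winner = a if fitness[a] <= fitness[b] else b
            exemplar.append(pbest[winner][n])
        else:
            exemplar.append(pbest[p][n])              # learn from the particle's own best
    return exemplar


def clpso_step(z: Sequence[float], v: Sequence[float], exemplar: Sequence[float],
               w_inertia: float, c: float = 1.5,
               rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """One velocity and position update; c = 1.5 is an assumed acceleration coefficient."""
    rng = rng or random.Random()
    new_v = [w_inertia * vn + c * rng.random() * (en - zn)
             for zn, vn, en in zip(z, v, exemplar)]
    new_z = [zn + vn for zn, vn in zip(z, new_v)]
    return new_z, new_v
```

In a full loop, the exemplar of a particle would be rebuilt with `choose_exemplar` once its personal best has failed to improve for $N_{\rm refresh}$ consecutive iterations.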
Table 2 Benchmark data set partitioning
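| Dataset | Number of features | Training set size | Training set $\eta$ | Validation set size | Validation set $\eta$ | Test set size | Test set $\eta$ | Dataset ID |
|---|---|---|---|---|---|---|---|---|
| Concrete compressive strength | 8 | 20 | 0.3327 | 20 | 0.3598 | 100 | 0.1255 | A1 |
| Concrete compressive strength | 8 | 40 | 0.2444 | 40 | 0.2628 | 100 | 0.1255 | A2 |
| Concrete compressive strength | 8 | 60 | 0.1853 | 60 | 0.2070 | 100 | 0.1255 | A3 |
| Superconductor critical temperature | 81 | 20 | 0.3351 | 20 | 0.3388 | 100 | 0.1538 | B1 |
| Superconductor critical temperature | 81 | 40 | 0.2309 | 40 | 0.2423 | 100 | 0.1538 | B2 |
| Superconductor critical temperature | 81 | 60 | 0.1949 | 60 | 0.1966 | 100 | 0.1538 | B3 |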
Table 3 Parameter setting of VSG based on hybrid optimization with multi-objective PSO for benchmark data
| Dataset | $P_{\rm num}$ | $N_{\rm iter}$ | $N_{\rm refresh}$ | $K$ | $z_{\rm MTD}$ | $z_{\rm RF}^{1}$ | $z_{\rm RF}^{2}$ | $z_{\rm RWNN}$ |
|---|---|---|---|---|---|---|---|---|
| Concrete compressive strength | 30 | 30 | 3 | 30 | (0, 1) | (1, 6) | (2, 10) | (3, 20) |
| Superconductor critical temperature | 30 | 30 | 3 | 50 | (0, 1) | (1, 30) | (2, 10) | (3, 20) |
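The search ranges above mix one continuous variable ($z_{\rm MTD}$) with three integer-valued ones. A minimal sketch of how a particle's parameter part might be decoded into hyperparameters is given below, using the concrete compressive strength ranges. The clipping to closed intervals, the rounding of the integer variables, the direction of the threshold comparison in `select_candidates`, and all names are assumptions made for illustration; the excerpt does not specify the paper's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Search ranges for the concrete compressive strength data, copied from Table 3.
RANGES = {"z_MTD": (0.0, 1.0), "z_RF1": (1, 6), "z_RF2": (2, 10), "z_RWNN": (3, 20)}


@dataclass
class Hyperparameters:
    gamma_extend: float   # MTD-based expansion rate
    l_f: int              # number of split features of the RF mapping model
    theta_leaf: int       # leaf-node sample-count threshold of the RF mapping model
    hidden_neurons: int   # number of hidden-layer neurons of the RWNN mapping model


def clip(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def decode(z_para: Sequence[float]) -> Hyperparameters:
    """Map the continuous parameter part of a particle to usable hyperparameters."""
    z_mtd, z_rf1, z_rf2, z_rwnn = z_para
    return Hyperparameters(
        gamma_extend=clip(z_mtd, *RANGES["z_MTD"]),
        l_f=int(round(clip(z_rf1, *RANGES["z_RF1"]))),
        theta_leaf=int(round(clip(z_rf2, *RANGES["z_RF2"]))),
        hidden_neurons=int(round(clip(z_rwnn, *RANGES["z_RWNN"]))),
    )


def select_candidates(z_vss: Sequence[float], theta_select: float) -> List[int]:
    """Indices of candidate virtual samples kept (assumed rule: keep if z >= threshold)."""
    return [i for i, score in enumerate(z_vss) if score >= theta_select]


if __name__ == "__main__":
    print(decode([0.6033, 3.2, 8.7, 17.6]))
    print(select_candidates([0.2, 0.8, 0.5], theta_select=0.5))
```

Because the number of kept candidates changes with the threshold, the selection part of the particle has a variable effective dimension, which is the property the abstract refers to.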
Table 4 Optimal virtual samples obtained based on multi-objective PSO hybrid optimization for benchmark data
| Dataset | $\boldsymbol{X}_{\rm vs}$ | $y_{\rm vs}$ |
|---|---|---|
| A1 | 396.50, 117.40, 0, 176.40, 11.42, 876.70, 796.90, 60.23 | 58.83 |
| A1 | 200.50, 16.35, 115.80, 161.60, 8.27, 1071.70, 809.90, 17.23 | 29.23 |
| A1 | 240.90, 0, 100.30, 183.50, 5.87, 977.30, 852.40, 14.00 | 18.25 |
| A1 | 272.40, 56.58, 0, 199.00, 0, 965.00, 786.90, 37.38 | 12.62 |
| A1 | 347.40, 0, 0, 190.80, 0, 1116.40, 718.20, 15.08 | 3.42 |
| B1 | 5.69, 95.64, 60.78, 69.89, 36.85, 1.48, 1.41, 182.20 | 26.79 |
| B1 | 4.08, 77.39, 51.82, 60.19, 35.09, 1.22, 1.27, 121.40 | 95.32 |
| B1 | 4.00, 76.44, 50.35, 59.37, 34.71, 1.20, 1.29, 121.30 | 80.12 |
| B1 | 4.46, 82.72, 56.99, 64.52, 36.03, 1.30, 1.09, 131.20 | 51.89 |
| B1 | 3.54, 83.97, 60.06, 66.37, 43.11, 1.07, 0.97, 99.90 | 6.38 |
Table 5 Input/output range of original samples for benchmark data
| Dataset | Bound | Input | Output |
|---|---|---|---|
| A1 | Minimum | 102.0, 0, 0, 121.8, 0, 801.0, 594.0, 1.0 | 2.3 |
| A1 | Maximum | 540.0, 359.4, 200.1, 247.0, 32.2, 1145.0, 992.6, 365.0 | 82.6 |
| B1 | Minimum | 1.0, 6.9, 6.4, 5.3, 2.0, 0, 0, 0 | 0 |
| B1 | Maximum | 9.0, 209.0, 209.0, 209.0, 209.0, 2.0, 2.0, 208.0 | 185.0 |
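The original-sample ranges in Table 5 allow a quick plausibility check of the optimal virtual samples listed in Table 4. The sketch below performs that check for dataset A1, with the values copied from the two tables; the helper name `out_of_range` is illustrative. A value outside the original range would not necessarily be an error, since the improved MTD extends the input and output domains, but it would be worth inspecting.

```python
from typing import List, Sequence

# Eight inputs followed by the output, copied from Table 5 (dataset A1).
A1_MIN = [102.0, 0.0, 0.0, 121.8, 0.0, 801.0, 594.0, 1.0, 2.3]
A1_MAX = [540.0, 359.4, 200.1, 247.0, 32.2, 1145.0, 992.6, 365.0, 82.6]

# Optimal virtual samples for A1, copied from Table 4 (inputs followed by the output).
A1_VIRTUAL = [
    [396.50, 117.40, 0.0, 176.40, 11.42, 876.70, 796.90, 60.23, 58.83],
    [200.50, 16.35, 115.80, 161.60, 8.27, 1071.70, 809.90, 17.23, 29.23],
    [240.90, 0.0, 100.30, 183.50, 5.87, 977.30, 852.40, 14.00, 18.25],
    [272.40, 56.58, 0.0, 199.00, 0.0, 965.00, 786.90, 37.38, 12.62],
    [347.40, 0.0, 0.0, 190.80, 0.0, 1116.40, 718.20, 15.08, 3.42],
]


def out_of_range(sample: Sequence[float], lower: Sequence[float],
                 upper: Sequence[float]) -> List[int]:
    """Return the column indices whose value lies outside [lower, upper]."""
    return [i for i, (value, lo, hi) in enumerate(zip(sample, lower, upper))
            if not lo <= value <= hi]


if __name__ == "__main__":
    for row_number, sample in enumerate(A1_VIRTUAL, start=1):
        print(row_number, out_of_range(sample, A1_MIN, A1_MAX))
```

For the five A1 samples listed, every column falls inside the original range, so each row reports an empty list.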
Table 6 Statistical results of global optimal solution based on hybrid optimization with multi-objective PSO for benchmark data
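| Dataset | Hyperparameter $\gamma_{\rm extend}$ | Hyperparameter $L_F$ | Hyperparameter $\theta_{\rm leaf}$ | Hyperparameter $I$ | Number of virtual samples | Validation set mean RMSE | Validation set mean $\rho$ | Test set mean RMSE | Test set mean $\rho$ | Mixed-sample $\eta$ |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 0.6033 | 3 | 9 | 18 | 82 | 10.36 | 0.026 | 11.59 | 0.012 | 0.2354 |
| A2 | 0.6245 | 6 | 5 | 19 | 128 | 10.03 | 0.012 | 10.73 | 0.003 | 0.2099 |
| A3 | 0.6528 | 6 | 9 | 20 | 150 | 10.40 | 0.006 | 10.28 | 0.002 | 0.2002 |
| B1 | 0.3951 | 5 | 5 | 16 | 20 | 16.44 | 0.300 | 19.07 | 0.169 | 0.2407 |
| B2 | 0.4892 | 8 | 6 | 14 | 69 | 20.14 | 0.019 | 17.86 | 0.051 | 0.2118 |
| B3 | 0.6775 | 19 | 6 | 15 | 70 | 19.57 | 0 | 18.05 | 0.023 | 0.2076 |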
Table 7 Comparative statistical results of different VSG methods for benchmark data
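| Dataset | Method | Number of virtual samples | Mixed-sample $\eta$ | Test RMSE mean | Test RMSE variance | Test RMSE best | Test $\rho$ mean ($\times 10^{-3}$) | Test $\rho$ variance ($\times 10^{-4}$) | Test $\rho$ best ($\times 10^{-3}$) |
|---|---|---|---|---|---|---|---|---|---|
| A1 | N-VSG | 219 | 0.2770 | 16.47 | 8.785 | 14.11 | 4.09 | 15.44 | 4.62 |
| A1 | M-VSG | 238 | 0.3018 | 17.08 | 8.575 | 13.65 | 2.26 | 19.73 | 4.55 |
| A1 | PSO-VSG | 55 | 0.4235 | 16.35 | 3.822 | 12.75 | 3.76 | 30.20 | 5.88 |
| A1 | MP-VSG | 165 | 0.2641 | 14.03 | 4.525 | 12.93 | 6.04 | 9.93 | 7.19 |
| A1 | MoHo-VSG | 82 | 0.2354 | 11.59 | 0.107 | 9.67 | 12.46 | 1.34 | 14.72 |
| B1 | N-VSG | 176 | 0.2945 | 24.38 | 10.541 | 21.96 | 13.87 | 17.96 | 14.25 |
| B1 | M-VSG | 281 | 0.3100 | 25.33 | 12.786 | 20.12 | 12.63 | 56.11 | 14.12 |
| B1 | PSO-VSG | 36 | 0.3317 | 26.11 | 17.710 | 20.38 | 1.69 | 71.20 | 8.23 |
| B1 | MP-VSG | 134 | 0.2513 | 20.84 | 3.452 | 19.47 | 17.43 | 4.37 | 18.89 |
| B1 | MoHo-VSG | 20 | 0.2076 | 18.05 | 0.062 | 17.84 | 169.26 | 1.57 | 178.69 |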
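To put the comparison in Table 7 into relative terms, the sketch below computes how much the mean test RMSE of MoHo-VSG falls below that of the strongest competing method on each benchmark. The RMSE values are copied from the table; the helper names are illustrative, and the relative-reduction measure is a convenience chosen here rather than a metric reported by the authors.

```python
from typing import Dict, Tuple

# Mean test RMSE per method, copied from Table 7.
TEST_RMSE_MEAN: Dict[str, Dict[str, float]] = {
    "A1": {"N-VSG": 16.47, "M-VSG": 17.08, "PSO-VSG": 16.35,
           "MP-VSG": 14.03, "MoHo-VSG": 11.59},
    "B1": {"N-VSG": 24.38, "M-VSG": 25.33, "PSO-VSG": 26.11,
           "MP-VSG": 20.84, "MoHo-VSG": 18.05},
}


def best_baseline(results: Dict[str, float], proposed: str = "MoHo-VSG") -> Tuple[str, float]:
    """Return the competing method with the lowest mean RMSE and its value."""
    name = min((m for m in results if m != proposed), key=lambda m: results[m])
    return name, results[name]


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which the proposed error is lower than the baseline error."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    for dataset, results in TEST_RMSE_MEAN.items():
        name, value = best_baseline(results)
        gain = relative_reduction(value, results["MoHo-VSG"])
        print(f"{dataset}: best baseline {name}, relative RMSE reduction {gain:.1%}")
```

On both datasets the strongest baseline is MP-VSG, and the reduction is roughly 17.4% on A1 and 13.4% on B1, achieved with fewer virtual samples (82 versus 165, and 20 versus 134).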
Table 8 Parameter setting of VSG algorithm based on multi-objective PSO hybrid optimization for DXN data
| Parameter | $P_{\rm num}$ | $N_{\rm iter}$ | $N_{\rm refresh}$ | $K$ | $z_{\rm MTD}$ | $z_{\rm RF}^{1}$ | $z_{\rm RF}^{2}$ | $z_{\rm RWNN}$ |
|---|---|---|---|---|---|---|---|---|
| Value | 30 | 30 | 3 | 50 | (0, 1) | (1, 35) | (2, 10) | (3, 20) |
Table 9 Optimal virtual samples obtained based on multi-objective PSO hybrid optimization for DXN data
| $\boldsymbol{X}_{\rm vs}$ | $y_{\rm vs}$ |
|---|---|
| 4.366, 1.54, 68.78, 27.31, 241.4, 3.96, 334.7 | 0.0289 |
| 4.206, 0, 68.94, 28.15, 222.5, 3.77, 306.8 | 0.0458 |
| 4.449, 7.69, 72.48, 30.23, 222.8, 3.98, 315.8 | 0.0685 |
| 4.432, 10.00, 71.83, 30.00, 225.9, 3.99, 319.5 | 0.0163 |
| 4.461, 17.69, 74.65, 30.77, 228.5, 3.99, 321.8 | 0.0029 |

Table 10 Global optimal solution of the VSG-oriented multi-objective PSO hybrid optimization for DXN data
| Performance index | Optimal solution |
|---|---|
| Hyperparameter $\gamma_{\rm extend}$ | 0.1206 |
| Hyperparameter $L_F$ | 2 |
| Hyperparameter $\theta_{\rm leaf}$ | 5 |
| Hyperparameter $I$ | 15 |
| Number of virtual samples | 40 |
| Mean RMSE on the validation set | 0.0231 |
| Mean $\rho$ on the validation set | $4.41 \times 10^{-5}$ |
| Mean RMSE on the test set | 0.0238 |
| Mean $\rho$ on the test set | $3.18 \times 10^{-5}$ |
| RMSE on the validation set, small-sample modeling | 0.0259 |
| RMSE on the test set, small-sample modeling | 0.0251 |
Table 11 Comparative statistical results of different VSG methods based on DXN dataset
| Method | Number of virtual samples | Test set RMSE mean | Test set RMSE variance ($\times 10^{-4}$) | Test set RMSE best | Test set $\rho$ mean ($\times 10^{-5}$) | Test set $\rho$ variance | Test set $\rho$ best ($\times 10^{-5}$) |
|---|---|---|---|---|---|---|---|
| N-VSG | 129 | 0.0406 | 0.695 | 0.0262 | 0.19 | $1.94 \times 10^{-5}$ | 0.36 |
| M-VSG | 116 | 0.0403 | 1.331 | 0.0231 | 0.26 | $8.83 \times 10^{-5}$ | 0.53 |
| PSO-VSG | 27 | 0.0328 | 0.519 | 0.0245 | 0.56 | $8.44 \times 10^{-5}$ | 1.02 |
| MP-VSG | 68 | 0.0377 | 1.208 | 0.0218 | 1.04 | $5.16 \times 10^{-7}$ | 1.78 |
| MoHo-VSG | 40 | 0.0231 | 0.691 | 0.0220 | 3.18 | $4.47 \times 10^{-9}$ | 3.45 |