doi: 10.16383/j.aas.c210324 cstr: 32138.14.j.aas.c210324
Occlusion Detection Based on Optical Flow and Multiscale Context
1. School of Measuring and Optical Engineering, Nanchang Hangkong University, Nanchang 330063
2. National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
3. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063
Abstract: To improve the accuracy and robustness of motion occlusion detection under non-rigid motion and large displacements, we propose an occlusion detection method for image sequences based on optical flow and multiscale context. First, we design a multiscale context information aggregation network based on dilated convolution, which captures a wider range of image features from the multiscale context of the image sequence. Then, we construct an end-to-end motion occlusion detection network based on multiscale context and optical flow using a feature pyramid, where the optical flow is used to refine occlusion detection in regions of non-rigid motion and large displacement. Finally, we present a motion-edge-based training loss function to obtain accurate motion occlusion boundaries. We compare the proposed method with existing representative approaches on the MPI-Sintel and KITTI datasets. The experimental results show that the proposed method effectively improves the accuracy and robustness of motion occlusion detection, and is especially robust in difficult scenes containing non-rigid motion and large displacements.
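The dilated-convolution context aggregation described in the abstract can be sketched with a minimal single-channel NumPy example. The kernel, the dilation rates (1, 2, 4), and the summation fusion below are illustrative assumptions of ours, not the paper's exact network:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Single-channel 2-D convolution with a dilation (atrous) factor.

    A k x k kernel with dilation d covers a (d*(k-1)+1)^2 window, so
    branches with different dilations see context at different scales
    without reducing the feature-map resolution.
    """
    k = kernel.shape[0]
    span = dilation * (k - 1)          # spatial extent of the dilated kernel minus 1
    pad = span // 2
    xp = np.pad(x, pad, mode="edge")   # keep the output the same size as the input
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + span + 1:dilation, j:j + span + 1:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def multiscale_context(x, kernel, dilations=(1, 2, 4)):
    """Aggregate parallel dilated-convolution branches by summation."""
    return sum(dilated_conv2d(x, kernel, d) for d in dilations)
```

On a constant input each 3 x 3 averaging branch reproduces the input, so the fused response simply counts the branches; the point of the sketch is that increasing the dilation enlarges the receptive field while the output shape stays fixed.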
Key words:
- Image sequence /
- occlusion detection /
- deep learning /
- multiscale context /
- non-rigid motion
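The motion-edge-based training loss mentioned in the abstract can likewise be sketched. The edge proxy (flow-gradient magnitude) and the weight `alpha` below are hypothetical choices for illustration; the paper's exact loss formulation may differ:

```python
import numpy as np

def edge_weighted_bce(pred, target, flow, alpha=4.0, eps=1e-7):
    """Binary cross-entropy for occlusion maps, up-weighted on motion edges.

    The per-pixel weight grows with the local flow-gradient magnitude, so
    pixels near motion boundaries (where occlusions concentrate) contribute
    more to the loss. alpha controls the strength of that emphasis.
    """
    # flow: (H, W, 2); gradient magnitude of both components as an edge proxy
    gy, gx = np.gradient(flow[..., 0])
    gy2, gx2 = np.gradient(flow[..., 1])
    edge = np.sqrt(gx**2 + gy**2 + gx2**2 + gy2**2)
    w = 1.0 + alpha * edge / (edge.max() + eps)
    p = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return float(np.sum(w * bce) / np.sum(w))
```

With a uniform flow field the weights reduce to 1 and the loss is ordinary binary cross-entropy; near flow discontinuities the same prediction error is penalized up to (1 + alpha) times more.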
Fig. 3 Structure diagram of the multiscale context information aggregation network
Fig. 5 Structure of the occlusion detection model based on optical flow and multiscale context information
Fig. 6 Comparison of occlusion detection results between our method and the IRR-PWC method
Fig. 7 Comparison of occlusion detection results on non-rigid motion and large displacement sequences from the MPI-Sintel dataset. From left to right: the alley_2, ambush_2, market_6, and temple_2 sequences
Fig. 8 Comparison of the occlusion detection results of each method on the KITTI dataset. From left to right: the input image and the motion occlusion detection maps of UnFlow, Back2Future, MaskFlownet, IRR-PWC, and our method
Fig. 9 Examples of motion occlusion masks generated from the ground truth of optical flow $(N=3)$
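Occlusion masks like those in Fig. 9 are commonly derived from flow with a forward-backward consistency check (the rule popularized by, e.g., UnFlow [24]); whether the paper uses exactly this rule is not stated here, and the thresholds below are illustrative:

```python
import numpy as np

def occlusion_from_flow(flow_fw, flow_bw, a1=0.01, a2=0.5):
    """Mark pixels as occluded via forward-backward flow consistency.

    A pixel is flagged when the backward flow, sampled at the point the
    forward flow maps to, fails to cancel the forward flow, i.e. when
    |f_fw + f_bw|^2 > a1 * (|f_fw|^2 + |f_bw|^2) + a2.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # nearest-neighbour lookup of the backward flow at the warped position
    xt = np.clip(np.rint(xs + flow_fw[..., 0]), 0, w - 1).astype(int)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]), 0, h - 1).astype(int)
    bw = flow_bw[yt, xt]
    diff = np.sum((flow_fw + bw) ** 2, axis=-1)
    mag = np.sum(flow_fw**2, axis=-1) + np.sum(bw**2, axis=-1)
    return diff > a1 * mag + a2
```

When the two flow fields are exact inverses of each other the residual is zero and no pixel is flagged; large residuals indicate pixels visible in only one frame.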
Table 2 Comparison of average omission rates and false detection rates on the MPI-Sintel dataset (%)
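The omission rate, false detection rate, and F1 score reported in Tables 2–6 can be computed from binary occlusion maps roughly as follows; the exact definitions used in the paper may differ in detail:

```python
import numpy as np

def occlusion_detection_scores(pred, gt):
    """Omission rate, false-detection rate, and F1 for binary occlusion maps.

    Assumed definitions:
    omission = missed occluded pixels / ground-truth occluded pixels,
    false    = wrongly flagged pixels / predicted occluded pixels,
    F1       = harmonic mean of precision and recall.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)     # correctly detected occluded pixels
    fp = np.sum(pred & ~gt)    # false detections
    fn = np.sum(~pred & gt)    # missed occluded pixels
    omission = fn / max(fn + tp, 1)
    false_rate = fp / max(fp + tp, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return float(omission), float(false_rate), float(f1)
```

For example, a prediction that recovers three of four occluded pixels and adds one false detection yields a 25 % omission rate, a 25 % false-detection rate, and an F1 score of 0.75.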
Table 3 Comparison of average F1 scores for motion occlusion detection on non-rigid motion and large displacement image sequences

| Method | clean alley_2 | clean ambush_2 | clean market_6 | clean temple_2 | final alley_2 | final ambush_2 | final market_6 | final temple_2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UnFlow[24] | 0.4149 | 0.4313 | 0.4330 | 0.3243 | 0.4057 | 0.3920 | 0.4499 | 0.3120 |
| Back2Future[25] | 0.6816 | 0.5888 | 0.6290 | 0.2712 | 0.6756 | 0.5199 | 0.6239 | 0.2683 |
| MaskFlownet[27] | 0.5057 | 0.5403 | 0.4660 | 0.3838 | 0.5039 | 0.4085 | 0.4735 | 0.3508 |
| IRR-PWC[26] | 0.8709 | 0.9172 | 0.8155 | 0.7404 | 0.8770 | 0.7809 | 0.8023 | 0.6905 |
| Ours | 0.8811 | 0.9216 | 0.8304 | 0.7747 | 0.8764 | 0.7959 | 0.8106 | 0.7103 |

Note: Bold indicates the best result in each column.

Table 5 Comparison of average F1 scores over whole image sequences on MPI-Sintel
| Model | F1 (clean) | F1 (final) | Run time (s) | Training time (d) |
| --- | --- | --- | --- | --- |
| Full model | 0.75 | 0.72 | 0.19 | 13 |
| Without the multiscale context network | 0.72 | 0.68 | 0.18 | 12 |
| Without the edge loss function | 0.74 | 0.71 | 0.19 | 13 |

Note: F1 scores are measured on the MPI-Sintel training dataset. Bold indicates the best value.

Table 6 Comparison of average F1 scores over whole image sequences within different motion boundary regions on MPI-Sintel
| Model | clean $N=1$ | clean $N=3$ | clean $N=5$ | clean $N=10$ | final $N=1$ | final $N=3$ | final $N=5$ | final $N=10$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full model | 0.63 | 0.67 | 0.69 | 0.71 | 0.59 | 0.62 | 0.64 | 0.67 |
| Without the multiscale context network | 0.59 | 0.62 | 0.65 | 0.67 | 0.55 | 0.59 | 0.61 | 0.63 |
| Without the edge loss function | 0.60 | 0.64 | 0.67 | 0.69 | 0.56 | 0.60 | 0.62 | 0.64 |

Note: F1 scores are measured on the MPI-Sintel training dataset. Bold indicates the best value.
[1] Zhang Shi-Hui, He Qi, Dong Li-Jian, Du Xue-Zhe. Dynamic occlusion avoidance approach by means of occlusion region model and object motion estimation. Acta Automatica Sinica, 2019, 45(4): 771–786
[2] Yu C, Bo Y, Bo W, Yan W D, Robby T. Occlusion-aware networks for 3D human pose estimation in video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE, 2019. 723–732
[3] Zhang Cong-Xuan, Chen Zhen, Xiong Fan, Li Ming, Ge Li-Yue, Chen Hao. Large displacement motion optical flow estimation with non-rigid dense patch matching. Acta Electronica Sinica, 2019, 47(6): 1316–1323. doi: 10.3969/j.issn.0372-2112.2019.06.019
[4] Yao Nai-Ming, Guo Qing-Pei, Qiao Feng-Chun, Chen Hui, Wang Hong-An. Robust facial expression recognition with generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 865–877
[5] Pan J Y, Bo H. Robust occlusion handling in object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, USA: IEEE, 2007. 1–8
[6] Liu Xin, Xu Hua-Rong, Hu Zhan-Yi. GPU based fast 3D-object modeling with Kinect. Acta Automatica Sinica, 2012, 38(8): 1288–1297
[7] Zhang Cong-Xuan, Chen Zhen, Li Ming. Review of the 3D reconstruction technology based on optical flow of monocular image sequence. Acta Electronica Sinica, 2016, 44(12): 3044–3052. doi: 10.3969/j.issn.0372-2112.2016.12.033
[8] Bailer C, Taetz B, Stricker D. Flow fields: Dense correspondence fields for highly accurate large displacement optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1879–1892. doi: 10.1109/TPAMI.2018.2859970
[9] Wolf L, Gadot D. PatchBatch: A batch augmented loss for optical flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4236–4245
[10] Li Y S, Song R, Hu Y L. Efficient coarse-to-fine patch match for large displacement optical flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 5704–5712
[11] Menze M, Heipke C, Geiger A. Discrete optimization for optical flow. In: Proceedings of the German Conference on Pattern Recognition (GCPR). Aachen, Germany: Springer, 2015. 16–28
[12] Chen Q F, Koltun V. Full flow: Optical flow estimation by global optimization over regular grids. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4706–4714
[13] Guney F, Geiger A. Deep discrete flow. In: Proceedings of the Asian Conference on Computer Vision (ACCV). Taipei, China: Springer, 2016. 207–224
[14] Hur J, Roth S. Joint optical flow and temporally consistent semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016. 163–177
[15] Ince S, Konrad J. Occlusion-aware optical flow estimation. IEEE Transactions on Image Processing, 2008, 17(8): 1443–1451. doi: 10.1109/TIP.2008.925381
[16] Sun D Q, Liu C, Pfister H. Local layering for joint motion estimation and occlusion detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, USA: IEEE, 2014. 1098–1105
[17] Sun D Q, Sudderth E B, Black M J. Layered image motion with explicit occlusions, temporal consistency, and depth ordering. In: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada: Curran Associates Inc., 2010. 2226–2234
[18] Vogel C, Roth S, Schindler K. View-consistent 3D scene flow estimation over multiple frames. In: Proceedings of the European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 263–278
[19] Zanfir A, Sminchisescu C. Large displacement 3D scene flow with occlusion reasoning. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 4417–4425
[20] Zhang C X, Chen Z, Wang M R, Li M, Jiang S F. Robust non-local TV-L1 optical flow estimation with occlusion detection. IEEE Transactions on Image Processing, 2017, 26(8): 4055–4067. doi: 10.1109/TIP.2017.2712279
[21] Zhang Cong-Xuan, Chen Zhen, Wang Ming-Run, Li Ming, Jiang Shao-Feng. Motion occlusion detecting from image sequence based on optical flow and Delaunay triangulation. Acta Electronica Sinica, 2018, 46(2): 479–485. doi: 10.3969/j.issn.0372-2112.2018.02.030
[22] Kennedy R, Taylor C J. Optical flow with geometric occlusion estimation and fusion of multiple frames. In: Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR). Hong Kong, China: IEEE, 2015. 364–377
[23] Yu J J, Harley A W, Derpanis K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In: Proceedings of the European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016. 3–10
[24] Meister S, Hur J, Roth S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI). San Francisco, USA: AAAI, 2017. 7251–7259
[25] Janai J, Güney F, Ranjan A, Black M, Geiger A. Unsupervised learning of multi-frame optical flow with occlusions. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 713–731
[26] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5747–5756
[27] Zhao S Y, Sheng Y L, Dong Y, Chang E I C, Xu Y. MaskFlownet: Asymmetric feature matching with learnable occlusion mask. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Virtual Event: IEEE, 2020. 6277–6286
[28] Yang M K, Yu K, Zhang C, Li Z W, Yang K Y. DenseASPP for semantic segmentation in street scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 3684–3692
[29] Mehta S, Rastegari M, Caspi A, Shapiro L, Hajishirzi H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 561–580
[30] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 1–9
[31] Chen L C, Zhu Y K, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 833–851
[32] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions [Online], available: https://arxiv.org/abs/1511.07122, April 30, 2016
[33] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the European Conference on Computer Vision (ECCV). Florence, Italy: Springer, 2012. 611–625
[34] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 3061–3070