Large Displacement Optical Flow Estimation Jointing Depthwise Over-parameterized Convolution and Cross Correlation Attention
Abstract: To improve the accuracy and robustness of deep-learning-based optical flow models in large-displacement scenes, we propose an optical flow estimation method for image sequences that combines depthwise over-parameterized convolution with cross correlation attention. First, we construct a depthwise over-parameterized convolution by combining depthwise convolution with standard convolution and use it in place of ordinary convolution. This extracts more features and accelerates the convergence of network training, improving optical flow accuracy without increasing the inference cost of the network. Second, we design a feature extraction encoder based on cross correlation attention, which obtains a larger receptive field by stacking attention layers and thereby extracts multi-scale, long-range context features, improving robustness in large-displacement scenes. Finally, we build the optical flow estimation network, which combines depthwise over-parameterized convolution and cross correlation attention, on a pyramid residual iteration model to improve the overall performance of optical flow estimation. We compare the proposed method with representative existing optical flow methods on the MPI-Sintel and KITTI test sets. The experimental results show that the proposed method achieves better optical flow estimation performance, with higher accuracy and robustness particularly in large-displacement scenes.
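The statement that over-parameterization does not raise inference cost can be illustrated with a small numerical sketch. It assumes the composite-kernel formulation described for DO-Conv in reference [31], where a depthwise kernel `D` and a conventional kernel `W` are folded into a single kernel before inference. The tensor layouts, sizes, and function names below are illustrative assumptions and are not taken from the paper's implementation.

```python
import random

def fold_do_conv_kernel(D, W):
    """Fold depthwise kernel D[s][d][c] and conventional kernel W[o][d][c]
    into one composite kernel K[o][s][c] = sum_d D[s][d][c] * W[o][d][c]."""
    S, D_mul, C_in = len(D), len(D[0]), len(D[0][0])
    return [[[sum(D[s][d][c] * W[o][d][c] for d in range(D_mul))
              for c in range(C_in)]
             for s in range(S)]
            for o in range(len(W))]

def two_stage_output(patch, D, W):
    """Training-time view: depthwise transform of one patch, then conventional conv."""
    S, D_mul, C_in = len(D), len(D[0]), len(D[0][0])
    # Depthwise step: per channel c, map the S spatial taps to D_mul features.
    mid = [[sum(D[s][d][c] * patch[s][c] for s in range(S)) for c in range(C_in)]
           for d in range(D_mul)]
    # Conventional step: contract over D_mul and C_in for each output channel.
    return [sum(W[o][d][c] * mid[d][c] for d in range(D_mul) for c in range(C_in))
            for o in range(len(W))]

def folded_output(patch, K):
    """Inference-time view: one ordinary convolution with the folded kernel."""
    S, C_in = len(patch), len(patch[0])
    return [sum(K[o][s][c] * patch[s][c] for s in range(S) for c in range(C_in))
            for o in range(len(K))]

def random_tensor(rng, *shape):
    """Nested lists of the given shape filled with uniform values in [-1, 1]."""
    if len(shape) == 1:
        return [rng.uniform(-1, 1) for _ in range(shape[0])]
    return [random_tensor(rng, *shape[1:]) for _ in range(shape[0])]

rng = random.Random(0)
S, D_MUL, C_IN, C_OUT = 9, 9, 4, 3  # one 3x3 window; sizes are illustrative
patch = random_tensor(rng, S, C_IN)
D = random_tensor(rng, S, D_MUL, C_IN)
W = random_tensor(rng, C_OUT, D_MUL, C_IN)
K = fold_do_conv_kernel(D, W)

max_gap = max(abs(a - b)
              for a, b in zip(two_stage_output(patch, D, W), folded_output(patch, K)))
print(max_gap < 1e-9)  # both views agree up to floating-point rounding
```

Because the folded kernel `K` has the same shape as an ordinary convolution kernel over the same window, the extra parameters in `D` affect training only; at inference a single standard convolution suffices under this formulation.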
Table 1 Optical flow estimation results for image sequences in the MPI-Sintel dataset

| Method | Clean: All | Clean: Matched | Clean: Unmatched | Final: All | Final: Matched | Final: Unmatched |
|---|---|---|---|---|---|---|
| IRR-PWC[14] | 3.844 | 1.472 | 23.220 | 4.579 | 2.154 | 24.355 |
| PPAC-HD3[36] | 4.589 | 1.507 | 29.751 | 4.599 | 2.116 | 24.852 |
| LiteFlowNet2[37] | 3.483 | 1.383 | 20.637 | 4.686 | 2.248 | 24.571 |
| IOFPL-ft[38] | 4.394 | 1.611 | 27.128 | 4.224 | 1.956 | 22.704 |
| PWC-Net[25] | 4.386 | 1.719 | 26.166 | 5.042 | 2.445 | 26.221 |
| HMFlow[39] | 3.206 | 1.122 | 20.210 | 5.038 | 2.404 | 26.535 |
| SegFlow153[40] | 4.151 | 1.246 | 27.855 | 6.191 | 2.940 | 32.682 |
| SAMFL[41] | 4.477 | 1.763 | 26.643 | 4.765 | 2.282 | 25.008 |
| Ours | 2.763 | 1.062 | 16.656 | 4.202 | 2.056 | 21.696 |
Table 2 Comparison of motion-edge and large-displacement metrics on the MPI-Sintel dataset

Clean pass:

| Method | ${d}_{0\text{-}10}$ | ${d}_{10\text{-}60}$ | ${d}_{60\text{-}140}$ | ${s}_{0\text{-}10}$ | ${s}_{10\text{-}40}$ | ${s}_{40+}$ |
|---|---|---|---|---|---|---|
| IRR-PWC[14] | 3.509 | 1.296 | 0.721 | 0.535 | 1.724 | 25.430 |
| PPAC-HD3[36] | 2.788 | 1.340 | 1.068 | 0.355 | 1.289 | 33.624 |
| LiteFlowNet2[37] | 3.293 | 1.263 | 0.629 | 0.597 | 1.772 | 21.976 |
| IOFPL-ft[38] | 3.059 | 1.421 | 0.943 | 0.391 | 1.292 | 31.812 |
| PWC-Net[25] | 4.282 | 1.657 | 0.674 | 0.606 | 2.070 | 28.793 |
| HMFlow[39] | 2.786 | 0.957 | 0.584 | 0.467 | 1.693 | 20.470 |
| SegFlow153[40] | 3.072 | 1.143 | 0.656 | 0.486 | 2.000 | 27.563 |
| SAMFL[41] | 3.946 | 1.623 | 0.811 | 0.618 | 1.860 | 29.995 |
| Ours | 2.772 | 0.854 | 0.443 | 0.541 | 1.621 | 16.575 |

Final pass:

| Method | ${d}_{0\text{-}10}$ | ${d}_{10\text{-}60}$ | ${d}_{60\text{-}140}$ | ${s}_{0\text{-}10}$ | ${s}_{10\text{-}40}$ | ${s}_{40+}$ |
|---|---|---|---|---|---|---|
| IRR-PWC[14] | 4.165 | 1.843 | 1.292 | 0.709 | 2.423 | 28.998 |
| PPAC-HD3[36] | 3.521 | 1.702 | 1.637 | 0.617 | 2.083 | 30.457 |
| LiteFlowNet2[37] | 4.048 | 1.899 | 1.473 | 0.811 | 2.433 | 29.375 |
| IOFPL-ft[38] | 3.288 | 1.479 | 1.419 | 0.646 | 1.897 | 27.596 |
| PWC-Net[25] | 4.636 | 2.087 | 1.475 | 0.799 | 2.986 | 31.070 |
| HMFlow[39] | 4.582 | 2.213 | 1.465 | 0.926 | 3.170 | 29.974 |
| SegFlow153[40] | 4.969 | 2.492 | 2.119 | 1.201 | 3.865 | 36.570 |
| SAMFL[41] | 4.208 | 1.846 | 1.449 | 0.893 | 2.587 | 29.232 |
| Ours | 3.884 | 1.660 | 1.292 | 0.753 | 2.381 | 25.715 |
Table 3 Calculation results in KITTI2015 dataset (%)
Table 4 Comparison of ablation results on the MPI-Sintel dataset

| Ablation model | All | Matched | Unmatched | $s_{10\text{-}40}$ | $s_{40+}$ |
|---|---|---|---|---|---|
| Baseline | 3.844 | 1.472 | 23.220 | 1.724 | 25.430 |
| Baseline_CS | 2.892 | 1.070 | 17.765 | 1.662 | 17.460 |
| Baseline_deconv | 3.621 | 1.461 | 21.272 | 1.659 | 23.482 |
| Full model | 2.763 | 1.062 | 16.656 | 1.621 | 16.575 |
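To make the ablation comparison easier to read, the following sketch computes the relative reduction of each metric with respect to the Baseline row. It assumes the entries are error values for which lower is better, as is usual for MPI-Sintel results; the dictionary layout and function name are illustrative, and the numbers are copied from Table 4.

```python
# Values copied from Table 4 (MPI-Sintel ablation); lower is assumed to be better.
ABLATION = {
    "Baseline":        {"All": 3.844, "Matched": 1.472, "Unmatched": 23.220,
                        "s10-40": 1.724, "s40+": 25.430},
    "Baseline_CS":     {"All": 2.892, "Matched": 1.070, "Unmatched": 17.765,
                        "s10-40": 1.662, "s40+": 17.460},
    "Baseline_deconv": {"All": 3.621, "Matched": 1.461, "Unmatched": 21.272,
                        "s10-40": 1.659, "s40+": 23.482},
    "Full model":      {"All": 2.763, "Matched": 1.062, "Unmatched": 16.656,
                        "s10-40": 1.621, "s40+": 16.575},
}

def relative_reduction(baseline, variant):
    """Percentage by which each metric drops relative to the baseline row."""
    return {k: 100.0 * (baseline[k] - variant[k]) / baseline[k] for k in baseline}

reductions = {
    name: relative_reduction(ABLATION["Baseline"], row)
    for name, row in ABLATION.items()
    if name != "Baseline"
}

for name, row in reductions.items():
    print(name, {k: round(v, 1) for k, v in row.items()})
```

Under that assumption, the full model lowers the overall (All) value by roughly 28% and the $s_{40+}$ value by roughly 35% relative to the baseline.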
[1] Zhang Jiao-Yang, Cong Shuang, Kuang Sen. Real-time state estimation and feedback control for n-qubit stochastic quantum systems. Acta Automatica Sinica, DOI: 10.16383/j.aas.c210916
[2] Zhang Wei, Huang Wei-Min. Multi-strategy adaptive multi-objective particle swarm optimization algorithm based on swarm partition. Acta Automatica Sinica, 2022, 48(10): 2585-2599 doi: 10.16383/j.aas.c200307
[3] Zhang Fang, Zhao Dong-Xu, Xiao Zhi-Tao, Geng Lei, Wu Jun, Liu Yan-Bei. Research progress of single image super-resolution reconstruction technology. Acta Automatica Sinica, 2022, 48(11): 2634-2654 doi: 10.16383/j.aas.c200777
[4] Yang Tian-Jin, Hou Zhen-Jie, Li Xing, Liang Jiu-Zhen, Huan Juan, Zheng Ji-Xiang. Recognizing action using multi-center subspace learning-based spatial-temporal information fusion. Acta Automatica Sinica, 2022, 48(11): 2823-2835 doi: 10.16383/j.aas.c190327
[5] Yan Meng-Kai, Qian Jian-Jun, Yang Jian. Weakly aligned cross-spectral face detection. Acta Automatica Sinica, 2023, 49(1): 135-147 doi: 10.16383/j.aas.c210058
[6] Guo Ying-Chun, Feng Fang, Yan Gang, Hao Xiao-Ke. Cross-domain person re-identification on adaptive fusion network. Acta Automatica Sinica, 2022, 48(11): 2744-2756 doi: 10.16383/j.aas.c220083
[7] Horn B K P, Schunck B G. Determining optical flow. Artificial Intelligence, 1981, 17(1-3): 185-203 doi: 10.1016/0004-3702(81)90024-2
[8] Sun D Q, Roth S, Black M J. Secrets of optical flow estimation and their principles. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, USA: IEEE, 2010. 2432-2439
[9] Menze M, Heipke C, Geiger A. Discrete optimization for optical flow. In: Proceedings of the 37th German Conference Pattern Recognition (GCPR). Aachen, Germany: Springer, 2015. 16-28
[10] Chen Q F, Koltun V. Full flow: Optical flow estimation by global optimization over regular grids. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4706-4714
[11] Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2758-2766
[12] Ranjan A, Black M J. Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2720-2729
[13] Amiaz T, Lubetzky E, Kiryati N. Coarse to over-fine optical flow estimation. Pattern Recognition, 2007, 40(9): 2496-2503 doi: 10.1016/j.patcog.2006.09.011
[14] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5754-5763
[15] Tu Z G, Xie W, Zhang D J, Poppe R, Veltkamp R C, Li B X, et al. A survey of variational and CNN-based optical flow techniques. Signal Processing: Image Communication, 2019, 72: 9-24 doi: 10.1016/j.image.2018.12.002
[16] Zhang C X, Ge L Y, Chen Z, Li M, Liu W, Chen H. Refined TV-L1 optical flow estimation using joint filtering. IEEE Transactions on Multimedia, 2020, 22(2): 349-364 doi: 10.1109/TMM.2019.2929934
[17] Dalca A V, Rakic M, Guttag J, Sabuncu M R. Learning conditional deformable templates with convolutional networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. Article No. 32
[18] Chen J, Lai J H, Cai Z M, Xie X H, Pan Z G. Optical flow estimation based on the frequency-domain regularization. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(1): 217-230 doi: 10.1109/TCSVT.2020.2974490
[19] Zhai M L, Xiang X Z, Lv N, Kong X D. Optical flow and scene flow estimation: A survey. Pattern Recognition, 2021, 114: Article No. 107861 doi: 10.1016/j.patcog.2021.107861
[20] Zach C, Pock T, Bischof H. A duality based approach for realtime TV-L1 optical flow. In: Proceedings of the 29th DAGM Symposium on Pattern Recognition. Heidelberg, Germany: Springer, 2007. 214-223
[21] Zhao S Y, Zhao L, Zhang Z X, Zhou E Y, Metaxas D. Global matching with overlapping attention for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17571-17580
[22] Li Z W, Liu F, Yang W J, Peng S H, Zhou J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 6999-7019 doi: 10.1109/TNNLS.2021.3084827
[23] Han J W, Yao X W, Cheng G, Feng X X, Xu D. P-CNN: Part-based convolutional neural networks for fine-grained visual categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(2): 579-590 doi: 10.1109/TPAMI.2019.2933510
[24] Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 1647-1655
[25] Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 8934-8943
[26] Wang Z G, Chen Z, Zhang C X, Zhou Z K, Chen H. LCIF-Net: Local criss-cross attention based optical flow method using multi-scale image features and feature pyramid. Signal Processing: Image Communication, 2023, 112: Article No. 116921 doi: 10.1016/j.image.2023.116921
[27] Teed Z, Deng J. RAFT: Recurrent all-pairs field transforms for optical flow. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 402-419
[28] Han K, Xiao A, Wu E H, Guo J Y, Xu C J, Wang Y H. Transformer in transformer. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021. 15908-15919
[29] Jiang S H, Campbell D, Lu Y, Li H D, Hartley R. Learning to estimate hidden motions with global motion aggregation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 9752-9761
[30] Xu H F, Zhang J, Cai J F, Rezatofighi H, Tao D C. GMFlow: Learning optical flow via global matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 8111-8120
[31] Cao J M, Li Y Y, Sun M C, Chen Y, Lischinski D, Cohen-Or D, et al. DO-Conv: Depthwise over-parameterized convolutional layer. IEEE Transactions on Image Processing, 2022, 31: 3726-3736 doi: 10.1109/TIP.2022.3175432
[32] Dong X Y, Bao J M, Chen D D, Zhang W M, Yu N H, Yuan L, et al. CSWin transformer: A general vision transformer backbone with cross-shaped windows. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 12114-12124
[33] Huang Z L, Wang X G, Huang L C, Huang C, Wei Y C, Liu W Y. CCNet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 603-612
[34] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 12th European Conference on Computer Vision (ECCV). Florence, Italy: Springer, 2012. 611-625
[35] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 3061-3070
[36] Wannenwetsch A S, Roth S. Probabilistic pixel-adaptive refinement networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11639-11648
[37] Hui T W, Tang X O, Loy C C. A lightweight optical flow CNN—revisiting data fidelity and regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2555-2569 doi: 10.1109/TPAMI.2020.2976928
[38] Hofinger M, Bulò S R, Porzi L, Knapitsch A, Pock T, Kontschieder P. Improving optical flow on a pyramid level. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 770-786
[39] Yu S H J, Zhang Y M, Wang C, Bai X, Zhang L, Hancock E R. HMFlow: Hybrid matching optical flow network for small and fast-moving objects. In: Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Milan, Italy: IEEE, 2021. 1197-1204
[40] Chen J, Cai Z M, Lai J H, Xie X H. Efficient segmentation-based PatchMatch for large displacement optical flow estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3595-3607 doi: 10.1109/TCSVT.2018.2885246
[41] Zhang C X, Zhou Z K, Chen Z, Hu W M, Li M, Jiang S F. Self-attention-based multiscale feature learning optical flow with occlusion feature map prediction. IEEE Transactions on Multimedia, 2022, 24: 3340-3354 doi: 10.1109/TMM.2021.3096083
[42] Lu Z, Xie H, Liu C, et al. Bridging the gap between vision transformers and convolutional neural networks on small datasets. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA, 2022. 14663-14677