Accurate Scale Estimation With IoU and Distance Between Centroids for Object Tracking
doi: 10.16383/j.aas.c210356
1. School of Software, Nanchang Hangkong University, Nanchang 330063
2. Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang 330063
Abstract: This paper analyzes the gradient update process of scale estimation models based on intersection over union (IoU) prediction, and finds that because only the IoU is used as the metric during training and inference, there is no constraint on the distance between the centroids of the predicted box and the ground-truth box. The resulting inaccurate scale estimates pollute the template when the object appearance model is updated, which biases localization when the foreground is classified against the background. With this insight, we construct a new metric, NDIoU (normalized distance IoU), that combines the IoU with the distance between the two centroids, build a new scale estimation method on top of it, and embed the method into a discriminative tracking framework. During training, NDIoU serves as the label of a loss function with an explicit centroid-distance constraint that supervises the learning of the network; during online inference, the target scale is fine-tuned by maximizing NDIoU, which helps the appearance model obtain more accurate samples when it is updated. The proposed method is compared with related state-of-the-art methods on seven datasets, and its overall performance is superior to all the compared algorithms. In particular, on the GOT-10k dataset, our method achieves 65.4%, 78.7% and 53.4% on the AO, $SR_{0.50}$ and $SR_{0.75}$ metrics, exceeding the baseline by 4.3%, 7.0% and 4.2%, respectively.
Keywords: object tracking / intersection over union (IoU) / scale estimation / distance between centroids
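The exact form of NDIoU is not given in this excerpt. As a rough illustration only, the sketch below combines the IoU with a centroid-distance term normalized by the diagonal of the smallest enclosing box, in the spirit of the DIoU penalty of Zheng et al. [27] cited in the references; the function name `ndiou`, the box format and the normalization are assumptions, not the paper's definition. Used as a training label, the squared error between the network's predicted score and this quantity would give a loss with an explicit centroid-distance constraint.

```python
import torch

def ndiou(pred, target, eps=1e-7):
    """Illustrative NDIoU between boxes in (x1, y1, x2, y2) format, shape (N, 4).

    Assumed form: IoU minus the squared centroid distance normalized by the
    squared diagonal of the smallest enclosing box (DIoU-style penalty).
    """
    # Intersection area
    lt = torch.max(pred[:, :2], target[:, :2])   # top-left of the overlap
    rb = torch.min(pred[:, 2:], target[:, 2:])   # bottom-right of the overlap
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared centroid distance, normalized by the enclosing-box diagonal
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    d2 = ((cp - ct) ** 2).sum(dim=1)
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    diag2 = ((enc_rb - enc_lt) ** 2).sum(dim=1)

    return iou - d2 / (diag2 + eps)
```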
Fig. 1 Same IoU but different distances between centroids (red represents the candidate bounding box, green represents the ground-truth bounding box)
Fig. 3 Statistics of the number of video frames corresponding to IoU and distance between centroids
Fig. 5 Comparison of the proposed method (ASEID) with related algorithms on the OTB-100 dataset
Fig. 6 Success plots on sequences with different challenging attributes in the OTB-100 dataset
Fig. 7 Precision plots on sequences with different challenging attributes in the OTB-100 dataset
Fig. 8 Visual comparison of the proposed method and related trackers
Fig. 9 Failure cases in the OTB-100 dataset (the green box represents the ground-truth box, and the red box represents the prediction of the proposed method)
Fig. 10 Failure cases in the GOT-10k dataset (in the GOT-10k test set, only the ground-truth box of the first frame of each sequence is available; the box in the first frame therefore indicates the tracked target)
Table 1 Ablation study on the OTB-100 dataset
Method               AUC (%)   Precision (%)   Norm.Pre (%)   Frame rate (frames/s)
Multi-scale search   68.4      88.8            83.8           21
IoU                  68.4      89.4            84.2           35
NDIoU                69.8      91.3            87.3           35
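The second and third rows of Table 1 replace the multi-scale search with refinement of the candidate box by maximizing a predicted overlap score, as ATOM [9] and DiMP [14] do by gradient ascent on the predicted IoU. The sketch below is only a minimal illustration of such refinement driven by a predicted NDIoU; the head `ndiou_net`, the step sizes and the (x, y, w, h) box parameterization are assumptions for illustration, not the paper's implementation.

```python
import torch

def refine_box(ndiou_net, feat, box, steps=5, lr=1.0):
    """Gradient-ascent refinement of one candidate box given as (x, y, w, h).

    `ndiou_net(feat, boxes)` stands for a differentiable head that predicts the
    NDIoU of candidate boxes from image features; its interface is assumed.
    """
    box = box.clone().float().requires_grad_(True)
    for _ in range(steps):
        score = ndiou_net(feat, box.unsqueeze(0)).sum()
        grad, = torch.autograd.grad(score, box)
        with torch.no_grad():
            # Scale each step by the current box size so position and size
            # are updated at comparable rates (ATOM-style step scaling).
            box += lr * grad * box[2:].repeat(2)
    return box.detach()
```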
Table 3 Comparison with SOTA trackers on the VOT2018 dataset
Tracker                  EAO     Robustness   Accuracy
DRT [40]                 0.356   0.201        0.519
RCO [22]                 0.376   0.155        0.507
UPDT [38]                0.378   0.184        0.536
DaSiamRPN [39]           0.383   0.276        0.586
MFT [41]                 0.385   0.140        0.505
LADCF [42]               0.389   0.159        0.503
ATOM [9]                 0.401   0.204        0.590
SiamRPN++ [16]           0.414   0.234        0.600
DiMP50 (baseline) [14]   0.440   0.153        0.597
PrDiMP50 [15]            0.442   0.165        0.618
ASEID (ours)             0.454   0.153        0.615
Table 4 Comparison with SOTA trackers on the GOT-10k dataset (%)
Tracker                  $SR_{0.50}$   $SR_{0.75}$   AO
DCFST [32]               68.3          44.8          59.2
PrDiMP50 [15]            73.8          54.3          63.4
KYS [17]                 75.1          51.5          63.6
SiamFC++ [13]            69.5          47.9          59.5
D3S [43]                 67.6          46.2          59.7
Ocean [12]               72.1          —             61.1
ROAM [44]                46.6          16.4          43.6
ATOM [9]                 63.4          40.2          55.6
DiMP50 (baseline) [14]   71.7          49.2          61.1
ASEID (ours)             78.7          53.4          65.4
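For reference, the GOT-10k metrics reported above are derived from per-frame overlaps between predicted and ground-truth boxes: AO is the average overlap, and $SR_{t}$ is the fraction of frames whose overlap exceeds the threshold t. The following is a minimal sketch under that reading, ignoring the official toolkit's per-sequence and per-class averaging details.

```python
import numpy as np

def got10k_metrics(ious):
    """AO, SR_0.50 and SR_0.75 from a 1-D array of per-frame IoU values."""
    ious = np.asarray(ious, dtype=float)
    ao = ious.mean()              # average overlap over all frames
    sr_050 = (ious > 0.50).mean() # fraction of frames with IoU above 0.50
    sr_075 = (ious > 0.75).mean() # fraction of frames with IoU above 0.75
    return ao, sr_050, sr_075
```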
[1] Wu Y, Lim J, Yang M H. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834−1848. doi: 10.1109/TPAMI.2014.2388226
[2] Meng Lu, Yang Xu. A survey of object tracking algorithms. Acta Automatica Sinica, 2019, 45(7): 1244−1260
[3] Yin Hong-Peng, Chen Bo, Chai Yi, Liu Zhao-Dong. Vision-based object detection and tracking: A review. Acta Automatica Sinica, 2016, 42(10): 1466−1489
[4] Tan Jian-Hao, Zheng Ying-Shuai, Wang Yao-Nan, Ma Xiao-Ping. AFST: Anchor-free fully convolutional siamese tracker with searching center point. Acta Automatica Sinica, 2021, 47(4): 801−812
[5] Danelljan M, Häger G, Khan F S, Felsberg M. Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 4310−4318
[6] Dai K, Wang D, Lu H C, Sun C, Li J. Visual tracking via adaptive spatially-regularized correlation filters. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 4665−4674
[7] Danelljan M, Häger G, Khan F S, Felsberg M. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561−1575. doi: 10.1109/TPAMI.2016.2609928
[8] Li Y, Zhu J. A scale adaptive kernel correlation filter tracker with feature integration. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 254−265
[9] Danelljan M, Bhat G, Khan F S, Felsberg M. ATOM: Accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 4655−4664
[10] Li B, Yan J J, Wu W, Zhu Z, Hu X L. High performance visual tracking with Siamese region proposal network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8971−8980
[11] Wang Q, Bertinetto L, Hu W M, Torr P H S. Fast online object tracking and segmentation: A unifying approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 1328−1338
[12] Zhang Z P, Peng H W, Fu J L, Li B, Hu W M. Ocean: Object-aware anchor-free tracking. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 771−787
[13] Xu Y D, Wang Z Y, Li Z X, Yuan Y, Yu G. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12549−12556
[14] Bhat G, Danelljan M, Van Gool L, Timofte R. Learning discriminative model prediction for tracking. In: Proceedings of the International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 6181−6190
[15] Danelljan M, Van Gool L, Timofte R. Probabilistic regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 7181−7190
[16] Li B, Wu W, Wang Q, Zhang F Y, Xing J L, Yan J J. SiamRPN++: Evolution of Siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 4277−4286
[17] Bhat G, Danelljan M, Van Gool L, Timofte R. Know your surroundings: Exploiting scene information for object tracking. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 205−221
[18] Girshick R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440−1448
[19] Ren S, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
[20] Jiang B R, Luo R X, Mao J Y, Xiao T T, Jiang Y N. Acquisition of localization confidence for accurate object detection. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 816−832
[21] Mueller M, Smith N, Ghanem B. A benchmark and simulator for UAV tracking. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 445−461
[22] Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Zajc L Č, et al. The sixth visual object tracking VOT2018 challenge results. In: Proceedings of the 15th European Conference on Computer Vision Workshops. Munich, Germany: Springer, 2018. 3−53
[23] Huang L H, Zhao X, Huang K Q. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562−1577. doi: 10.1109/TPAMI.2019.2957464
[24] Fan H, Lin L T, Yang F, Chu P, Deng G, Yu S J, et al. LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 5369−5378
[25] Müller M, Bibi A, Giancola S, Subaihi S, Ghanem B. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 310−327
[26] Liang P P, Blasch E, Ling H B. Encoding color information for visual tracking: Algorithms and benchmark. IEEE Transactions on Image Processing, 2015, 24(12): 5630−5644. doi: 10.1109/TIP.2015.2482905
[27] Zheng Z H, Wang P, Liu W, Li J Z, Ye R G, Ren D W. Distance-IoU loss: Faster and better learning for bounding box regression. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12993−13000
[28] Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common objects in context. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 740−755
[29] Li X, Ma C, Wu B Y, He Z Y, Yang M H. Target-aware deep tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 1369−1378
[30] Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. In: Proceedings of the 2019 Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2019. Article No. 721
[31] Danelljan M, Bhat G. PyTracking: Visual tracking library based on PyTorch [Online], available: https://gitee.com/zengzheming/pytracking, November 2, 2021
[32] Zheng L Y, Tang M, Chen Y Y, Wang J Q, Lu H Q. Learning feature embeddings for discriminant model based tracking. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 759−775
[33] Chen Z D, Zhong B N, Li G R, Zhang S P, Ji R R. Siamese box adaptive network for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 6667−6676
[34] Du F, Liu P, Zhao W, Tang X L. Correlation-guided attention for corner detection based visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 6835−6844
[35] Wang N, Zhou W G, Qi G J, Li H Q. POST: Policy-based switch tracking. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12184−12191
[36] Jung I, You K, Noh H, Cho M, Han B. Real-time object tracking via meta-learning: Efficient model adaptation and one-shot channel pruning. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11205−11212
[37] Danelljan M, Bhat G, Khan F S, Felsberg M. ECO: Efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 6931−6939
[38] Bhat G, Johnander J, Danelljan M, Khan F S, Felsberg M. Unveiling the power of deep tracking. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 493−509
[39] Zhu Z, Wang Q, Li B, Wei W, Yan J J, Hu W M. Distractor-aware Siamese networks for visual object tracking. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 103−119
[40] Sun C, Wang D, Lu H C, Yang M H. Correlation tracking via joint discrimination and reliability learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 489−497
[41] Bai S, He Z Q, Dong Y, Bai H L. Multi-hierarchical independent correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). London, UK: IEEE, 2020. 1−6
[42] Xu T Y, Feng Z H, Wu X J, Kittler J. Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Transactions on Image Processing, 2019, 28(11): 5596−5609. doi: 10.1109/TIP.2019.2919201
[43] Lukezic A, Matas J, Kristan M. D3S – A discriminative single shot segmentation tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 7131−7140
[44] Yang T Y, Xu P F, Hu R B, Chai H, Chan A B. ROAM: Recurrently optimizing tracking model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 6717−6726
[45] Huang L H, Zhao X, Huang K Q. GlobalTrack: A simple and strong baseline for long-term tracking. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11037−11044
[46] Nam H, Han B. Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 4293−4302
[47] Wang N, Song Y B, Ma C, Zhou W G, Liu W. Unsupervised deep tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 1308−1317
[48] Huang J L, Zhou W G. Re2EMA: Regularized and reinitialized exponential moving average for target model update in object tracking. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI, 2019. Article No. 1037
[49] Jung I, Son J, Baek M, Han B. Real-time MDNet. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 89−104
[50] Choi J, Kwon J, Lee K M. Deep meta learning for real-time target-aware visual tracking. In: Proceedings of the International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 911−920