Accurate Scale Estimation with IoU and Distance between Centroids for Object Tracking
Abstract: Scale estimation based on IoU (Intersection over Union) prediction in object tracking trains a scale regression model by estimating the overlap between candidate boxes and the ground-truth box in video frames; during inference, the initial bounding box is fine-tuned by maximizing the predicted IoU to obtain the target scale. This paper analyzes the gradient update process of the IoU-prediction scale estimation model in detail and finds that both training and inference use IoU as the only metric, with no constraint on the distance between the centroids of the predicted box and the ground-truth box. The resulting scale errors pollute the template when the appearance model is updated and bias localization when classifying foreground and background. Based on this finding, we construct NDIoU (Normalization Distance IoU), a new metric that combines IoU with the centroid distance, propose a new scale estimation method on top of it, and embed the method into a discriminative tracking framework. During training, NDIoU serves as the label, and a loss function with a centroid-distance constraint supervises the learning of the network; during online inference, the target scale is fine-tuned by maximizing NDIoU, which provides more accurate samples for updating the appearance model. Compared with mainstream trackers on seven data sets, the proposed method achieves the best overall performance on all seven. In particular, on the GOT-10k data set it reaches 65.4%, 78.7% and 53.4% on AO, $SR_{0.5}$ and $SR_{0.75}$, exceeding the baseline by 4.3%, 7.0% and 4.2%, respectively.
Key words:
- Object Tracking
- IoU
- Scale Estimation
- Distance between Centroids
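The paper's exact NDIoU formula is not reproduced in this excerpt; its name and the cited Distance-IoU loss [27] suggest a form in which the overlap score is penalized by a normalized centroid distance. Below is a minimal PyTorch sketch under that assumption (the function name `ndiou`, the corner-coordinate box format, and the DIoU-style normalization by the enclosing-box diagonal are illustrative, not the paper's definition):

```python
import torch

def ndiou(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical NDIoU between boxes given as (x1, y1, x2, y2).

    Assumed form: IoU minus the squared centroid distance normalized by
    the squared diagonal of the smallest enclosing box, as in DIoU [27].
    """
    # Intersection: overlap of the two boxes (clamped to zero if disjoint).
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]

    # Union and plain IoU.
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter).clamp(min=1e-7)

    # Squared distance between the two box centroids.
    c_p = (pred[..., :2] + pred[..., 2:]) / 2
    c_t = (target[..., :2] + target[..., 2:]) / 2
    d2 = ((c_p - c_t) ** 2).sum(dim=-1)

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_lt = torch.min(pred[..., :2], target[..., :2])
    enc_rb = torch.max(pred[..., 2:], target[..., 2:])
    diag2 = ((enc_rb - enc_lt) ** 2).sum(dim=-1).clamp(min=1e-7)

    # Overlap rewarded, centroid offset penalized.
    return iou - d2 / diag2
```

For two 2×2 boxes offset by one unit in each direction, this form gives IoU = 1/7 ≈ 0.143 but NDIoU ≈ 0.032: the centroid term separates well-centered from off-center candidates of equal overlap, which is the behavior the abstract attributes to NDIoU.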
Manuscript received April 24, 2021; accepted November 2, 2021.
Supported by the National Natural Science Foundation of China (62162045) and the Jiangxi Provincial Science and Technology Key Project (20192BBE50073).
Recommended by Associate Editor.
1. Institute of Computer Vision, School of Software, Nanchang Hangkong University, Nanchang 330063
2. Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063
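For the inference step, the abstract states that the initial bounding box is fine-tuned by maximizing the predicted NDIoU, analogous to the IoU-maximization refinement in ATOM [9]. A gradient-ascent sketch of that idea follows; the `score_fn` prediction head, the (x, y, w, h) box layout, and the ATOM-style step scaling are assumptions for illustration, not the paper's exact procedure:

```python
import torch

def refine_box(score_fn, box: torch.Tensor, steps: int = 5, lr: float = 1.0) -> torch.Tensor:
    """Fine-tune one candidate box (x, y, w, h) by gradient ascent on its
    predicted NDIoU. `score_fn(box)` is a hypothetical network head that
    returns a scalar NDIoU prediction for the candidate."""
    box = box.clone().detach().requires_grad_(True)
    for _ in range(steps):
        score = score_fn(box)   # predicted NDIoU of the current candidate
        score.backward()
        with torch.no_grad():
            # Scale the step by (w, h, w, h) so position and size move
            # proportionally to the current box size (as in ATOM [9]).
            box += lr * box.grad * box[2:4].repeat(2)
            box.grad.zero_()
    return box.detach()
```

In practice several perturbed candidates would be refined in parallel and the highest-scoring ones kept or averaged, as ATOM does with its IoU head [9].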
Table 1 Ablation study on OTB-100

| Variant | AUC (%) | Precision (%) | Norm. Prec. (%) | FPS |
| --- | --- | --- | --- | --- |
| Multi-scale search | 68.4 | 88.8 | 83.8 | 21 |
| IoU | 68.4 | 89.4 | 84.2 | 35 |
| NDIoU | 69.8 | 91.3 | 87.3 | 35 |

Table 2 Comparison with SOTA trackers on UAV123
Table 3 Comparison with SOTA trackers on VOT2018

| Tracker | EAO | Robustness | Accuracy |
| --- | --- | --- | --- |
| DRT [38] | 0.356 | 0.201 | 0.519 |
| RCO [39] | 0.376 | 0.155 | 0.507 |
| UPDT [32] | 0.378 | 0.184 | 0.536 |
| DaSiamRPN [37] | 0.383 | 0.276 | 0.586 |
| MFT [39] | 0.385 | 0.140 | 0.505 |
| LADCF [40] | 0.389 | 0.159 | 0.503 |
| ATOM [9] | 0.401 | 0.204 | 0.590 |
| SiamRPN++ [16] | 0.414 | 0.234 | 0.600 |
| DiMP50 (baseline) [14] | 0.440 | 0.153 | 0.597 |
| PrDiMP50 [15] | 0.442 | 0.165 | 0.618 |
| ASEID (ours) | 0.454 | 0.153 | 0.615 |

Table 4 Comparison with SOTA trackers on GOT-10k
| Tracker | AO (%) | $SR_{0.50}$ (%) | $SR_{0.75}$ (%) |
| --- | --- | --- | --- |
| DCFST [30] | 59.2 | 68.3 | 44.8 |
| PrDiMP50 [15] | 63.4 | 73.8 | 54.3 |
| KYS [17] | 63.6 | 75.1 | 51.5 |
| SiamFC++ [13] | 59.5 | 69.5 | 47.9 |
| D3S [41] | 59.7 | 67.6 | 46.2 |
| Ocean [12] | 61.1 | 72.1 | — |
| ROAM [31] | 43.6 | 46.6 | 16.4 |
| ATOM [9] | 55.6 | 63.4 | 40.2 |
| DiMP50 (baseline) [14] | 61.1 | 71.7 | 49.2 |
| ASEID (ours) | 65.4 | 78.7 | 53.4 |

Table 5 Comparison with SOTA trackers on LaSOT

Table 6 Comparison with SOTA trackers on TrackingNet

Table 7 Comparison with SOTA trackers on TC128
[1] Wu Y, Lim J, Yang M H. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834–1848. doi: 10.1109/TPAMI.2014.2388226
[2] Meng Lu, Yang Xu. A review of target tracking algorithms. Acta Automatica Sinica, 2019, 45(7): 1244–1260 (in Chinese)
[3] Yin Hong-Peng, Chen Bo, Chai Yi, Liu Zhao-Dong. A review of object detection and tracking based on vision. Acta Automatica Sinica, 2016, 42(10): 1466–1489 (in Chinese)
[4] Tan Jian-Hao, Zheng Ying-Shuai, Wang Yao-Nan, Ma Xiao-Ping. AFST: Anchor-free fully convolutional siamese tracker with searching center. Acta Automatica Sinica, 2021, 47(4): 801–812 (in Chinese)
[5] Danelljan M, Hager G, Khan F S. Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 4310–4318
[6] Dai K, Wang D, Lu H, Sun C, Li J. Visual tracking via adaptive spatially-regularized correlation filters. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 4670–4679
[7] Danelljan M, Hager G, Khan F S, Felsberg M. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561–1575. doi: 10.1109/TPAMI.2016.2609928
[8] Li Y, Zhu J. A scale adaptive kernel correlation filter tracker with feature integration. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 254–265
[9] Danelljan M, Bhat G, Khan F S, Felsberg M. ATOM: Accurate tracking by overlap maximization. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 4660–4669
[10] Li B, Yan J, Wu W, Zhu Z, Hu X. High performance visual tracking with siamese region proposal network. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8971–8980
[11] Wang Q, Bertinetto L, Hu W, Torr P. Fast online object tracking and segmentation: A unifying approach. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 1328–1338
[12] Zhang Z, Peng H, Fu J, Li B, Hu W. Ocean: Object-aware anchor-free tracking. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 771–787
[13] Xu Y, Wang Z, Li Z, Yuan Y, Yu G. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12549–12556
[14] Bhat G, Danelljan M, Van Gool L, Timofte R. Learning discriminative model prediction for tracking. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea: IEEE, 2019. 6181–6190
[15] Danelljan M, Van Gool L, Timofte R. Probabilistic regression for visual tracking. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 7183–7192
[16] Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 4282–4291
[17] Bhat G, Danelljan M, Van Gool L, Timofte R. Know your surroundings: Exploiting scene information for object tracking. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 205–221
[18] Girshick R. Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440–1448
[19] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149
[20] Jiang B, Luo R, Mao J, Xiao T, Jiang Y. Acquisition of localization confidence for accurate object detection. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 816–832
[21] Mueller M, Smith N, Ghanem B. A benchmark and simulator for UAV tracking. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, Netherlands: Springer, 2016. 445–461
[22] Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Cehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A, Fernandez G, et al. The sixth visual object tracking VOT2018 challenge results. In: Proceedings of the 15th European Conference on Computer Vision Workshops. Munich, Germany: Springer, 2018. 3–53
[23] Huang L, Zhao X, Huang K. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562–1577
[24] Fan H, Lin L, Yang F, Chu P, Deng G, Yu S, Bai H, Xu X, Liao C, Ling H. LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 5374–5383
[25] Muller M, Bibi A, Giancola S, Subaihi S, Ghanem B. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 310–327
[26] Liang P, Blasch E, Ling H. Encoding color information for visual tracking: Algorithms and benchmark. IEEE Transactions on Image Processing, 2015, 24(12): 5630–5644. doi: 10.1109/TIP.2015.2482905
[27] Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D. Distance-IoU loss: Faster and better learning for bounding box regression. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12993–13000
[28] Danelljan M, Bhat G, Khan F S, Felsberg M. ECO: Efficient convolution operators for tracking. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 6931–6939
[29] Nam H, Han B. Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 4293–4302
[30] Zheng L, Tang M, Chen Y, Wang J, Lu H. Learning feature embeddings for discriminant model based tracking. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 759–775
[31] Yang T, Xu P, Hu R, Chai H, Chan A. ROAM: Recurrently optimizing tracking model. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 6718–6727
[32] Bhat G, Johnander J, Danelljan M, Khan F S, Felsberg M. Unveiling the power of deep tracking. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 493–509
[33] Chen Z, Zhong B, Li G, Zhang S, Ji R. Siamese box adaptive network for visual tracking. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 6668–6677
[34] Du F, Liu P, Zhao W, Tang X. Correlation-guided attention for corner detection based visual tracking. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 6836–6845
[35] Wang N, Zhou W, Qi G, Li H. POST: Policy-based switch tracking. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12184–12191
[36] Jung I, You K, Noh H, Cho M, Han B. Real-time object tracking via meta-learning: Efficient model adaptation and one-shot channel pruning. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11205–11212
[37] Zhu Z, Wang Q, Li B, Wei W, Yan J, Hu W. Distractor-aware siamese networks for visual object tracking. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 101–117
[38] Sun C, Wang D, Lu H, Yang M. Correlation tracking via joint discrimination and reliability learning. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 489–497
[39] Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Cehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A, Fernandez G, et al. The sixth visual object tracking VOT2018 challenge results. In: Proceedings of the 15th European Conference on Computer Vision Workshops. Munich, Germany: Springer, 2018
[40] Xu T, Feng Z, Wu X, Kittler J. Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual tracking [Online]. arXiv preprint arXiv:1807.11348, 2018
[41] Lukezic A, Matas J, Kristan M. D3S – A discriminative single shot segmentation tracker. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 7133–7142
[42] Huang L, Zhao X, Huang K. GlobalTrack: A simple and strong baseline for long-term tracking. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11037–11044
[43] Wang N, Song Y, Ma C, Zhou W, Liu W. Unsupervised deep tracking. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 1308–1317
[44] Li X, Ma C, Wu B, He Z, Yang M. Target-aware deep tracking. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 1369–1378
[45] Huang J, Zhou W. Re2EMA: Regularized and reinitialized exponential moving average for target model update in object tracking. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI, 2019. 8457–8464
[46] Jung I, Song J, Baek M, Han B. Real-time MDNet. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 89–104
[47] Choi J, Kwon J, Lee K. Deep meta learning for real-time target-aware visual tracking. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea: IEEE, 2019. 911–920
[48] Paszke A, Gross S, Massa F, et al. PyTorch: An imperative style, high-performance deep learning library. In: Proceedings of the 2019 Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2019
[49] Danelljan M, Bhat G. PyTracking: Visual tracking library based on PyTorch [Online]. https://github.com/visionml/pytracking, 2019
[50] Lin T, Maire M, Belongie S, Bourdev L, Girshick R, Hays J, Perona P, Ramanan D, Dollar P, Zitnick C. Microsoft COCO: Common objects in context. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 740–755