Abstract: The recent surge of interactive tasks involving multi-modal data brings a high demand for utilizing knowledge across different modalities. This has facilitated the birth of multi-modal knowledge graphs, which aggregate multi-modal knowledge to meet the demands of such tasks. However, they are known to suffer from a knowledge incompleteness problem that hinders the effective utilization of information. To mitigate this problem, it is necessary to improve knowledge coverage via entity alignment. Current multi-modal entity alignment methods fuse information from multiple modalities with fixed weights, ignoring the different contributions of individual modalities. To address this issue, we propose an adaptive feature fusion mechanism that combines entity structure information and visual information via dynamic fusion according to the quality of each modality. In addition, considering that low-quality visual information and structural differences between knowledge graphs also affect the performance of entity alignment, we design a visual feature processing module to improve the effective utilization of visual information and a triple filtering module to ease structural differences. Experiments on multi-modal entity alignment show that the proposed method outperforms the state-of-the-art methods.
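As an illustration of the adaptive fusion idea described above, the following minimal sketch (PyTorch-style, not the paper's actual module; the class name AdaptiveFusion and all dimensions are hypothetical) computes a per-entity gate from both modalities and uses it to weight the structure and visual embeddings, whereas fixed-weight fusion would apply one constant weight to every entity.

import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Illustrative adaptive fusion of structure and visual embeddings.

    A per-entity gate in (0, 1) is predicted from both modalities, so that
    entities with low-quality visual features can rely more on structure.
    This is a sketch of the general idea, not the paper's exact module.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Gate network: sees both modalities, outputs one weight per entity.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, h_struct: torch.Tensor, h_visual: torch.Tensor) -> torch.Tensor:
        # h_struct, h_visual: (num_entities, dim)
        alpha = self.gate(torch.cat([h_struct, h_visual], dim=-1))  # (num_entities, 1)
        # Fixed-weight fusion would instead use a single constant alpha for all entities.
        return alpha * h_struct + (1.0 - alpha) * h_visual

# Usage: fuse 1000 entities with 128-dimensional embeddings from each modality.
fusion = AdaptiveFusion(dim=128)
fused = fusion(torch.randn(1000, 128), torch.randn(1000, 128))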
Key words: Multi-modal knowledge graph / entity alignment / pre-trained model / feature fusion
Table 1 Statistics of the MMKG datasets
Dataset    Entities   Relations   Triples    Images    SameAs
FB15K      14 915     1 345       592 213    13 444    —
DB15K      14 777     279         99 028     12 841    12 846
Yago15K    15 404     32          122 886    11 194    11 199
Table 2 Results of multi-modal entity alignment
Dataset         Method      seed = 20%                  seed = 50%
                            Hits@1   Hits@10   MRR      Hits@1   Hits@10   MRR
FB15K-DB15K     IKRL        2.96     11.45     0.059    5.53     24.41     0.121
                GCN-align   6.26     18.81     0.105    13.79    34.60     0.210
                PoE         11.10    17.80     —        23.50    33.00     —
                HMEA        12.16    34.86     0.191    27.24    51.77     0.354
                AF2MEA      17.75    34.14     0.233    29.45    50.25     0.365
FB15K-Yago15K   IKRL        3.84     12.50     0.075    6.16     20.45     0.111
                GCN-align   6.44     18.72     0.106    14.09    34.80     0.209
                PoE         8.70     13.30     —        18.50    24.70     —
                HMEA        10.03    29.38     0.168    27.91    55.31     0.371
                AF2MEA      21.65    40.22     0.282    35.72    56.03     0.423
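For reference, the Hits@k and MRR columns in Tables 2–7 are the standard ranking metrics for entity alignment: Hits@k is the percentage of test entities whose true counterpart is ranked within the top k candidates, and MRR is the mean reciprocal rank. A minimal sketch of how they can be computed from the gold ranks is given below; the function name and variables are illustrative, not from the paper.

def hits_and_mrr(ranks, ks=(1, 10)):
    """Compute Hits@k (in %) and MRR from 1-based ranks of the gold entities.

    ranks[i] is the position of the correct counterpart of test entity i
    in the candidate list sorted by predicted similarity.
    """
    n = len(ranks)
    hits = {k: 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mrr

# Example: three test entities whose counterparts are ranked 1st, 4th and 20th.
hits, mrr = hits_and_mrr([1, 4, 20])
print(hits, round(mrr, 3))  # {1: 33.33..., 10: 66.66...} 0.433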
Table 3 Entity alignment results of the ablation study
Dataset         Method            seed = 20%                  seed = 50%
                                  Hits@1   Hits@10   MRR      Hits@1   Hits@10   MRR
FB15K-DB15K     AF2MEA            17.75    34.14     0.233    29.45    50.25     0.365
                AF2MEA-Adaptive   16.03    31.01     0.212    26.29    45.35     0.331
                AF2MEA-Visual     16.19    30.71     0.212    26.14    45.38     0.323
                AF2MEA-Filter     14.13    28.77     0.191    22.91    43.08     0.297
FB15K-Yago15K   AF2MEA            21.65    40.22     0.282    35.72    56.25     0.423
                AF2MEA-Adaptive   19.32    37.38     0.255    31.77    53.24     0.393
                AF2MEA-Visual     19.75    36.38     0.254    32.08    51.53     0.388
                AF2MEA-Filter     15.84    32.36     0.216    27.38    48.14     0.345
Table 4 Entity alignment results of visual features
Dataset         Method       seed = 20%                  seed = 50%
                             Hits@1   Hits@10   MRR      Hits@1   Hits@10   MRR
FB15K-DB15K     HMEA-v       2.07     9.82      0.058    3.91     14.41     0.086
                Att          8.81     20.16     0.128    9.57     21.13     0.139
                Att+Filter   8.98     20.52     0.131    9.96     22.58     0.144
FB15K-Yago15K   HMEA-v       2.77     11.49     0.072    4.28     15.38     0.095
                Att          9.25     21.38     0.137    10.56    23.55     0.157
                Att+Filter   9.43     21.91     0.138    11.07    24.51     0.158
Table 5 Entity alignment results of structure features under different filtering mechanisms
Dataset         Method       seed = 20%                  seed = 50%
                             Hits@1   Hits@10   MRR      Hits@1   Hits@10   MRR
FB15K-DB15K     Baseline     6.26     18.81     0.105    13.79    34.60     0.210
                F_PageRank   8.03     21.37     0.125    18.90    39.25     0.259
                F_random     7.57     20.76     0.120    16.32    36.48     0.231
                F_our        9.74     25.28     0.150    22.09    44.85     0.297
FB15K-Yago15K   Baseline     6.44     18.72     0.106    15.88    36.70     0.229
                F_PageRank   9.54     23.45     0.144    21.67    42.30     0.290
                F_random     8.17     20.86     0.126    18.22    38.55     0.254
                F_our        11.59    28.44     0.175    24.88    47.85     0.327
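The F_PageRank row in Table 5 refers to a filtering baseline built on weighted PageRank [44]. As a rough illustration only (this is not the paper's own triple filtering module; the function name and the keep_ratio parameter are hypothetical), such a filter can score each triple by the PageRank of its head and tail entities and keep the highest-scoring triples, so that the retained subgraphs of the two knowledge graphs become structurally more comparable.

import networkx as nx

def filter_triples_by_pagerank(triples, keep_ratio=0.8):
    """Keep the top keep_ratio fraction of triples, scored by the PageRank
    of their head and tail entities (illustrative F_PageRank-style filter)."""
    g = nx.DiGraph()
    g.add_edges_from((h, t) for h, _, t in triples)
    pr = nx.pagerank(g, alpha=0.85)
    scored = sorted(triples, key=lambda tr: pr[tr[0]] + pr[tr[2]], reverse=True)
    return scored[: int(len(scored) * keep_ratio)]

# Example: a tiny KG with (head, relation, tail) triples, keeping the top half.
kg = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c"), ("d", "r1", "a")]
print(filter_triples_by_pagerank(kg, keep_ratio=0.5))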
Table 6 Multi-modal entity alignment results of fixed feature fusion and adaptive feature fusion
Dataset         Method     Group 1             Group 2             Group 3
                           Hits@1   Hits@10    Hits@1   Hits@10    Hits@1   Hits@10
FB15K-DB15K     Adaptive   16.44    32.97      17.43    33.47      19.29    35.40
                Fixed      13.87    28.91      15.82    31.08      18.12    34.33
FB15K-Yago15K   Adaptive   16.44    32.97      17.43    33.47      19.29    35.40
                Fixed      16.21    33.23      19.55    37.11      22.27    45.52
Table 7 Multi-modal entity alignment results of the additional experiments
Method    seed = 20%                  seed = 50%
          Hits@1   Hits@10   MRR      Hits@1   Hits@10   MRR
PoE       16.44    32.97     17.430   34.70    53.60     0.414
MMEA      13.87    28.91     15.820   40.26    64.51     0.486
AF2MEA    28.65    48.22     0.382    48.25    75.83     0.569
[1] Zhu S G, Cheng X, Su S. Knowledge-based question answering by tree-to-sequence learning. Neurocomputing, 2020, 372: 64–72 doi: 10.1016/j.neucom.2019.09.003
[2] Martinez-Rodriguez J L, Hogan A, Lopez-Arevalo I. Information extraction meets the semantic web: A survey. Semantic Web, 2020, 11(2): 255–335 doi: 10.3233/SW-180333
[3] Yao X C, Van Durme B. Information extraction over structured data: Question answering with Freebase. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, USA: ACL, 2014. 956–966
[4] Sun Z, Yang J, Zhang J, Bozzon A, Huang L K, Xu C. Recurrent knowledge graph embedding for effective recommendation. In: Proceedings of the 12th ACM Conference on Recommender Systems. Vancouver, Canada: ACM, 2018. 297–305
[5] Wang M, Qi G L, Wang H F, Zheng Q S. Richpedia: A comprehensive multi-modal knowledge graph. In: Proceedings of the 9th Joint International Conference on Semantic Technology. Hangzhou, China: Springer, 2019. 130–145
[6] Liu Y, Li H, Garcia-Duran A, Niepert M, Onoro-Rubio D, Rosenblum D S. MMKG: Multi-modal knowledge graphs. In: Proceedings of the 16th International Conference on the Semantic Web. Portorož, Slovenia: Springer, 2019. 459–474
[7] Shen L, Hong R C, Hao Y B. Advance on large scale near-duplicate video retrieval. Frontiers of Computer Science, 2020, 14(5): Article No. 145702 doi: 10.1007/s11704-019-8229-7
[8] Han Y H, Wu A M, Zhu L C, Yang Y. Visual commonsense reasoning with directional visual connections. Frontiers of Information Technology & Electronic Engineering, 2021, 22(5): 625–637
[9] Zheng W F, Yin L R, Chen X B, Ma Z Y, Liu S, Yang B. Knowledge base graph embedding module design for visual question answering model. Pattern Recognition, 2021, 120: Article No. 108153 doi: 10.1016/j.patcog.2021.108153
[10] Zeng W X, Zhao X, Wang W, Tang J Y, Tan Z. Degree-aware alignment for entities in tail. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Virtual Event: ACM, 2020. 811–820
[11] Zhao X, Zeng W X, Tang J Y, Wang W, Suchanek F. An experimental study of state-of-the-art entity alignment approaches. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(6): 2610–2625
[12] Zeng W X, Zhao X, Tang J Y, Li X Y, Luo M N, Zheng Q H. Towards entity alignment in the open world: An unsupervised approach. In: Proceedings of the 26th International Conference on Database Systems for Advanced Applications. Taipei, China: Springer, 2021. 272–289
[13] Guo H, Tang J Y, Zeng W X, Zhao X, Liu L. Multi-modal entity alignment in hyperbolic space. Neurocomputing, 2021, 461: 598–607 doi: 10.1016/j.neucom.2021.03.132
[14] Wang Z C, Lv Q S, Lan X H, Zhang Y. Cross-lingual knowledge graph alignment via graph convolutional networks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: ACL, 2018. 349–357
[15] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR, 2015.
[16] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 770–778
[17] Chen M H, Tian Y T, Yang M H, Zaniolo C. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: IJCAI.org, 2017. 1511–1517
[18] Sun Z Q, Hu W, Zhang Q H, Qu Y Z. Bootstrapping entity alignment with knowledge graph embedding. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: IJCAI.org, 2018. 4396–4402
[19] Chen L Y, Li Z, Wang Y J, Xu T, Wang Z F, Chen E H. MMEA: Entity alignment for multi-modal knowledge graph. In: Proceedings of the 13th International Conference on Knowledge Science, Engineering and Management. Hangzhou, China: Springer, 2020. 134–147
[20] Guo L B, Sun Z Q, Hu W. Learning to exploit long-term relational dependencies in knowledge graphs. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR, 2019. 2505–2514
[21] Zhuang Yan, Li Guo-Liang, Feng Jian-Hua. A survey on entity alignment of knowledge base. Journal of Computer Research and Development, 2016, 53(1): 165–192 (in Chinese) doi: 10.7544/issn1000-1239.2016.20150661
[22] Qiao Jing-Jing, Duan Li-Guo, Li Ai-Ping. Entity alignment algorithm based on multi-features. Computer Engineering and Design, 2018, 39(11): 3395–3400 (in Chinese)
[23] Trisedya B D, Qi J Z, Zhang R. Entity alignment between knowledge graphs using attribute embeddings. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI Press, 2019. 297–304
[24] Zhu H, Xie R B, Liu Z Y, Sun M S. Iterative entity alignment via joint knowledge embeddings. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: IJCAI.org, 2017. 4258–4264
[25] Chen M H, Tian Y T, Chang K W, Skiena S, Zaniolo C. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: IJCAI.org, 2018. 3998–4004
[26] Cao Y X, Liu Z Y, Li C J, Liu Z Y, Li J Z, Chua T S. Multi-channel graph neural network for entity alignment. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: ACL, 2019. 1452–1461
[27] Li C J, Cao Y X, Hou L, Shi J X, Li J Z, Chua T S. Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: ACL, 2019. 2723–2732
[28] Mao X, Wang W T, Xu H M, Lan M, Wu Y B. MRAEA: An efficient and robust entity alignment approach for cross-lingual knowledge graph. In: Proceedings of the 13th International Conference on Web Search and Data Mining. Houston, USA: ACM, 2020. 420–428
[29] Sun Z Q, Hu W, Li C K. Cross-lingual entity alignment via joint attribute-preserving embedding. In: Proceedings of the 16th International Semantic Web Conference (ISWC). Vienna, Austria: Springer, 2017. 628–644
[30] Galárraga L, Razniewski S, Amarilli A, Suchanek F M. Predicting completeness in knowledge bases. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, United Kingdom: ACM, 2017. 375–383
[31] Ferrada S, Bustos B, Hogan A. IMGpedia: A linked dataset with content-based analysis of Wikimedia images. In: Proceedings of the 16th International Semantic Web Conference (ISWC). Vienna, Austria: Springer, 2017. 84–93
[32] Xie R B, Liu Z Y, Luan H B, Sun M S. Image-embodied knowledge representation learning. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: IJCAI.org, 2017. 3140–3146
[33] Mousselly-Sergieh H, Botschen T, Gurevych I, Roth S. A multimodal translation-based approach for knowledge graph representation learning. In: Proceedings of the 7th Joint Conference on Lexical and Computational Semantics. New Orleans, USA: ACL, 2018. 225–234
[34] Tan H, Bansal M. LXMERT: Learning cross-modality encoder representations from transformers. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: ACL, 2019. 5100–5111
[35] Li L H, Yatskar M, Yin D, Hsieh C J, Chang K W. What does BERT with vision look at? In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Virtual Event: ACL, 2020. 5265–5275
[36] Wang H R, Zhang Y, Ji Z, Pang Y W, Ma L. Consensus-aware visual-semantic embedding for image-text matching. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 18–34
[37] Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common objects in context. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 740–755
[38] Plummer B A, Wang L W, Cervantes C M, Caicedo J C, Hockenmaier J, Lazebnik S. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2641–2649
[39] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149 doi: 10.1109/TPAMI.2016.2577031
[40] Schuster M, Paliwal K K. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 1997, 45(11): 2673–2681
[41] Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc., 2017. 1025–1035
[42] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net, 2017.
[43] Wu Y T, Liu X, Feng Y S, Wang Z, Yan R, Zhao D Y. Relation-aware entity alignment for heterogeneous knowledge graphs. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: IJCAI.org, 2019. 5278–5284
[44] Xing W P, Ghorbani A. Weighted PageRank algorithm. In: Proceedings of the 2nd Annual Conference on Communication Networks and Services Research. Fredericton, Canada: IEEE, 2004. 305–314
[45] Zhang Q H, Sun Z Q, Hu W, Chen M H, Guo L B, Qu Y Z. Multi-view knowledge graph embedding for entity alignment. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: IJCAI.org, 2019. 5429–5435
[46] Pang N, Zeng W X, Tang J Y, Tan Z, Zhao X. Iterative entity alignment with improved neural attribute embedding. In: Proceedings of the Workshop on Deep Learning for Knowledge Graphs (DL4KG2019), co-located with the 16th Extended Semantic Web Conference (ESWC). Portorož, Slovenia: CEUR-WS, 2019. 41–46
[47] Huang B, Yang F, Yin M X, Mo X Y, Zhong C. A review of multimodal medical image fusion techniques. Computational and Mathematical Methods in Medicine, 2020, 2020: Article No. 8279342
[48] Atrey P K, Hossain M A, El Saddik A, Kankanhalli M S. Multimodal fusion for multimedia analysis: A survey. Multimedia Systems, 2010, 16(6): 345–379 doi: 10.1007/s00530-010-0182-0
[49] Poria S, Cambria E, Bajpai R, Hussain A. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 2017, 37: 98–125 doi: 10.1016/j.inffus.2017.02.003