A Machine Reading Comprehension Approach Based on Reading Skill Recognition and Dual Channel Fusion Mechanism
doi: 10.16383/j.aas.c220983
1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085
2. Zhongguancun Laboratory, Beijing 100080
3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100085
Abstract: The machine reading comprehension task requires a system to understand a given passage and then answer questions about it. Previous work has focused on the interaction between questions and passages, while neglecting a finer-grained analysis of the questions themselves, e.g., which reading skill does a question examine? As prior reading comprehension research suggests, human understanding of a question is a multi-dimensional process: humans first understand the contextual semantics of the question, then identify the reading skill required by its question type, and finally answer it by interacting with the passage. Motivated by this, we propose a machine reading comprehension method based on reading skill recognition and a dual-channel fusion mechanism, which analyzes questions in finer detail and thereby improves the accuracy of the model's answers. Specifically, the reading skill recognizer explicitly captures the semantic representations of reading skills through contrastive learning, and the dual-channel fusion mechanism deeply integrates the question-passage interaction information with these skill representations, helping the system understand both the question and the passage. To verify the effectiveness of the model, we conduct experiments on the FairytaleQA dataset. The results show that the proposed method achieves state-of-the-art performance on both the machine reading comprehension task and the reading skill recognition task.
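To make the recognizer's training objective concrete, the following is a minimal PyTorch sketch of a supervised contrastive loss over question encodings, in the style of standard supervised contrastive learning. It is an illustration rather than the authors' released code; the encoder, the batch construction, and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over a batch of question encodings.

    features: (batch, dim) question encodings from the recognizer's encoder.
    labels:   (batch,) reading-skill ids; same-skill samples act as positives.
    The temperature value 0.1 is an illustrative assumption.
    """
    features = F.normalize(features, dim=1)              # cosine-similarity geometry
    sim = features @ features.T / temperature            # pairwise similarity matrix
    self_mask = torch.eye(labels.size(0), dtype=torch.bool, device=features.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float('-inf'))      # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                               # anchors with at least one positive
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(pos_log_prob[valid] / pos_counts[valid]).mean()

Under this objective, questions annotated with the same reading skill are pulled together in the encoding space and questions testing different skills are pushed apart, which is what allows the recognizer to capture skill semantics explicitly.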
Table 1 Core statistics of the FairytaleQA dataset

Item                    Mean     Std. dev.  Min   Max
Sections per story      15.6     9.8        2     60
Words per story         2305.4   1480.8     228   7577
Words per section       147.7    60.0       12    447
Questions per story     41.7     29.1       5     161
Questions per section   2.9      2.4        0     18
Words per question      10.5     3.2        3     27
Words per answer        7.2      5.8        1     70
Table 2 Performance comparison on the validation and test sets of the FairytaleQA dataset (%)
Model                        Validation set                                  Test set
                             B-1    B-2    B-3    B-4    ROUGE-L  METEOR     B-1    B-2    B-3    B-4    ROUGE-L  METEOR
Lightweight models
Seq2Seq                      25.12  6.67   2.01   0.81   13.61    6.94       26.33  6.72   2.17   0.81   14.55    7.34
CAQA-LSTM                    28.05  8.24   3.66   1.57   16.15    8.11       30.04  8.85   4.17   1.98   17.33    8.60
Transformer                  21.87  4.94   1.53   0.59   10.32    6.01       21.72  5.21   1.74   0.67   10.27    6.22
Pre-trained language models
DistilBERT                   —      —      —      —      9.70     —          —      —      —      —      8.20     —
BERT                         —      —      —      —      10.40    —          —      —      —      —      9.70     —
BART                         19.13  7.92   3.42   2.14   12.25    6.51       21.05  8.93   3.90   2.52   12.66    6.70
Fine-tuned models
BART-Question-types          —      —      —      —      —        —          —      —      —      —      49.10    —
CAQA-BART                    52.59  44.17  42.76  40.07  53.20    28.31      55.73  47.00  43.68  40.45  55.13    28.80
BART-NarrativeQA             45.34  39.17  36.33  34.10  47.39    24.65      48.13  41.50  38.26  36.97  49.16    26.93
BART-FairytaleQA†            51.74  43.30  41.23  38.29  53.88    27.09      54.04  45.98  42.08  39.46  53.64    27.45
BART-FairytaleQA‡            51.28  43.96  41.51  39.05  54.11    26.86      54.82  46.37  43.02  39.71  54.44    27.82
Our model                    54.21  47.38  44.65  43.02  58.99    29.70      57.36  49.55  46.23  42.91  58.48    30.93
Human performance            —      —      —      —      65.10    —          —      —      —      —      64.40    —
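For reference, B-1 to B-4 denote BLEU with n-gram orders 1 to 4. Below is a minimal sketch of scoring one generated answer with the nltk and rouge-score packages; the paper does not specify its exact evaluation scripts, so the whitespace tokenization and the smoothing choice are assumptions.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score  # requires nltk wordnet data
from rouge_score import rouge_scorer

def score_answer(prediction: str, reference: str) -> dict:
    """Compute B-1..B-4, ROUGE-L and METEOR for one prediction-reference pair."""
    pred_toks, ref_toks = prediction.split(), reference.split()
    smooth = SmoothingFunction().method1        # avoid zero BLEU on short answers
    scores = {}
    for n in range(1, 5):                       # B-1 .. B-4
        weights = tuple([1.0 / n] * n)
        scores[f'B-{n}'] = sentence_bleu([ref_toks], pred_toks,
                                         weights=weights, smoothing_function=smooth)
    rouge = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
    scores['ROUGE-L'] = rouge.score(reference, prediction)['rougeL'].fmeasure
    scores['METEOR'] = meteor_score([ref_toks], pred_toks)
    return scores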
Table 3 Ablation results for each component of our model on the validation and test sets of the FairytaleQA dataset (%)
Model setting                          Validation set                                  Test set
                                       B-1    B-2    B-3    B-4    ROUGE-L  METEOR     B-1    B-2    B-3    B-4    ROUGE-L  METEOR
SOTA model                             51.28  43.96  41.51  39.05  54.11    26.86      54.82  46.37  43.02  39.71  54.44    27.82
w/o reading skill recognizer           52.15  44.47  42.11  40.73  55.38    27.45      54.90  47.16  43.55  40.67  56.48    29.31
w/o contrastive learning loss          53.20  45.07  42.88  41.94  56.75    28.15      55.22  47.98  44.13  41.42  57.34    30.20
w/o dual-channel fusion mechanism      52.58  45.38  43.15  41.62  57.22    27.75      55.79  48.20  44.96  41.28  57.12    29.88
Our model                              54.21  47.38  44.65  43.02  58.99    29.70      57.36  49.55  46.23  42.91  58.48    30.93
Table 4 Performance of the cross-entropy-loss-based method and the supervised contrastive learning method on the two tasks (%)
實(shí)驗設置 準確率 B-4 ROUGE-L METEOR 基于交叉熵損失的方法 91.40 41.42 57.34 30.20 本文基于有監督對比
學(xué)習損失的方法93.77 42.91 58.48 30.93 下載: 導出CSV表 5 不同輸入下的閱讀技巧識別器的識別準確率 (%)
Table 5 Recognition accuracy of the reading skill recognizer under different inputs (%)
實(shí)驗設置 驗證集 測試集 只輸入問(wèn)題 85.31 82.56 輸入問(wèn)題和文章 92.24 93.77 下載: 導出CSV亚洲第一网址_国产国产人精品视频69_久久久久精品视频_国产精品第九页 -