Neural-based Knowledge Transfer in Natural Language Processing
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
In Natural Language Processing (NLP), neural-based knowledge transfer, which is to transfer out-of-domain (OOD) knowledge to task-specific neural networks, has been applied to many NLP tasks. To further explore neural-based knowledge transfer in NLP, in this dissertation, we consider both structured OOD knowledge and unstructured OOD knowledge, and deal with several representative NLP tasks. For structured OOD knowledge, we study the neural-based knowledge transfer in Machine Reading Comprehension (MRC). In single-passage MRC tasks, to bridge the gap between MRC models and human beings, which is mainly reflected in the hunger for data and the robustness to noise, we integrate the neural networks of MRC models with the general knowledge of human beings embodied in knowledge bases. On the one hand, we propose a data enrichment method, which uses WordNet to extract inter-word semantic connections as general knowledge from each given passage-question pair. On the other hand, we propose a novel MRC model named Knowledge Aided Reader (KAR), which explicitly uses the above extracted general knowledge to assist its attention mechanisms. According to the experimental results, KAR is comparable in performance with the state-of-the-art MRC models, and significantly more robust to noise than them. On top of that, when only a subset (20%-80%) of the training examples are available, KAR outperforms the state-of-the-art MRC models by a large margin, and is still reasonably robust to noise. In multi-hop MRC tasks, to probe the strength of Graph Neural Networks (GNNs), we propose a novel multi-hop MRC model named Graph Aided Reader (GAR), which uses GNN methods to perform multi-hop reasoning, but is free of any pre-trained language model and completely end-to-end. For graph construction, GAR utilizes the topic-referencing relations between passages and the entity-sharing relations between sentences, which is aimed at obtaining the most sensible reasoning clues. For message passing, GAR simulates a top-down reasoning and a bottom-up reasoning, which is aimed at making the best use of the above obtained reasoning clues. According to the experimental results, GAR even outperforms several competitors relying on pre-trained language models and filter-reader pipelines, which implies that GAR benefits a lot from its GNN methods. On this basis, GAR can further benefit from applying pre-trained language models, but pre-trained language models can mainly facilitate the within-passage reasoning rather than cross-passage reasoning of GAR. Moreover, compared with the competitors constructed as filter-reader pipelines, GAR is not only easier to train, but also more applicable to the low-resource cases. For unstructured OOD knowledge, we study the neural-based knowledge transfer in Natural Language Understanding (NLU), and focus on the neural-based knowledge transfer between languages, which is also known as Cross-Lingual Transfer Learning (CLTL). To facilitate the CLTL of NLU models, especially the CLTL between distant languages, we propose a novel CLTL model named Translation Aided Language Learner (TALL), where CLTL is integrated with Machine Translation (MT). Specifically, we adopt a pre-trained multilingual language model as our baseline model, and construct TALL by appending a decoder to it. On this basis, we directly fine-tune the baseline model as an NLU model to conduct CLTL, but put TALL through an MT-oriented pre-training before its NLU-oriented fine-tuning. To make use of unannotated data, we implement the recently proposed Unsupervised Machine Translation (UMT) technique in the MT-oriented pre-training of TALL. According to the experimental results, the application of UMT enables TALL to consistently achieve better CLTL performance than the baseline model without using more annotated data, and the performance gain is relatively prominent in the case of distant languages.