Contributors: Marina G. Erechtchoukova; Nabil Safwat
Date: 2023-08-04
Handle: https://hdl.handle.net/10315/41364

Title: Comparative Analysis of Transformer-Based Language Models for Text Analysis in the Domain of Sustainable Development
Type: Electronic Thesis or Dissertation

Abstract: With advancements in Artificial Intelligence, Natural Language Processing (NLP) has gained considerable attention for its potential to facilitate complex human-machine interaction, enhance language-based applications, and automate the processing of unstructured text. This study investigates the transfer learning approach with Transformer-based language models and abstractive text summarization, and applies them to the domain of Sustainable Development with the goal of determining how the Sustainable Development Goals (SDGs) are represented in scientific publications. To achieve this, the traditional transfer learning framework was expanded so that (1) the relevance of textual documents to a specified text could be evaluated, (2) two neural language models, BART and T5, were selected, and (3) eight text similarity measures were investigated to identify the most informative ones. Both the BART and T5 models were fine-tuned on a domain-specific corpus of scientific publications extracted from Elsevier's Scopus database. The relevance of recently published works to an SDG was determined by calculating a semantic similarity score between each model-generated summary and the SDG's description. The proposed framework made it possible to identify the goals that dominate the developed corpus and those that require further attention from the research community.

Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.

Subjects: Information science; Artificial intelligence; Environmental studies
Keywords: transformer-based language models; transfer learning; semantic similarity; abstractive text summarization; sustainable development; document relevance
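The relevance-scoring step described in the abstract, comparing each model-generated summary against an SDG description, can be sketched with a simple bag-of-words cosine similarity. This is only one of many possible measures (the thesis compares eight, and a fine-tuned BART or T5 model would supply the summaries); the SDG descriptions and the summary below are abridged, hypothetical examples for illustration.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical, abridged SDG descriptions (stand-ins for the official texts).
sdg_descriptions = {
    "SDG 7": "ensure access to affordable reliable sustainable and modern energy",
    "SDG 13": "take urgent action to combat climate change and its impacts",
}

# A stand-in for a model-generated abstractive summary of one publication.
summary = "the paper proposes modern energy storage for sustainable access"

# Score the summary against each goal description and rank the goals.
scores = {g: cosine_similarity(summary, d) for g, d in sdg_descriptions.items()}
best = max(scores, key=scores.get)  # goal the publication is most relevant to
```

Aggregating such scores over a whole corpus is what lets the framework identify which goals dominate the publications and which are underrepresented.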