Term Association Modelling in Information Retrieval

dc.contributor.advisorHuang, Xiangji
dc.creatorZhao, Jiashu
dc.date.accessioned2015-08-28T15:38:01Z
dc.date.available2015-08-28T15:38:01Z
dc.date.copyright2015-03-23
dc.date.issued2015-08-28
dc.date.updated2015-08-28T15:38:01Z
dc.degree.disciplineComputer Science
dc.degree.levelDoctoral
dc.degree.namePhD - Doctor of Philosophy
dc.description.abstractMany traditional Information Retrieval (IR) models assume that query terms are independent of each other. For those models, a document is normally represented as a bag of words/terms and their frequencies. Although traditional retrieval models can achieve reasonably good performance in many applications, the corresponding independence assumption has limitations. There are some recent studies that investigate how to model term associations/dependencies by proximity measures. However, the modeling of term associations theoretically under the probabilistic retrieval framework is still largely unexplored. In this thesis, I propose a new concept named Cross Term, to model term proximity, with the aim of boosting retrieval performance. With Cross Terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text. The degree of the query term impact gradually weakens with increasing distance from the place of occurrence. Shape functions are used to characterize such impacts. Based on this assumption, I first propose a bigram CRoss TErm Retrieval (CRTER2) model for probabilistic IR and a Language model based model CRTER2LM. Specifically, a bigram Cross Term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. Second, I propose a generalized n-gram CRoss TErm Retrieval (CRTERn) model recursively for n query terms where n>2. For n-gram Cross Term, I develop several distance metrics with different properties and employ them in the proposed models for ranking. Third, an enhanced context-sensitive proximity model is proposed to boost the CRTER models, where the contextual relevance of term proximity is studied. The models are validated on several large standard data sets, and show improved performance over other state-of-art approaches. I also discusse the practical impact of the proposed models. The approaches in this thesis can also provide helpful benefit for term association modeling in other domains.
dc.identifier.urihttp://hdl.handle.net/10315/30082
dc.language.isoen
dc.rightsAuthor owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subjectComputer science
dc.subject.keywordsInformation Retrieval
dc.subject.keywordsTerm Association
dc.subject.keywordsContext-sensitive
dc.subject.keywordsBi-gram
dc.subject.keywordsN-gram
dc.subject.keywordsSearch
dc.subject.keywordsProximity
dc.subject.keywordsKernel Function
dc.subject.keywordsIndexing
dc.subject.keywordsExperimentation
dc.subject.keywordsmodeling
dc.titleTerm Association Modelling in Information Retrieval
dc.typeElectronic Thesis or Dissertationen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Zhao_Jiashu_2015_PhD.pdf
Size:
2.41 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 2 of 2
No Thumbnail Available
Name:
license.txt
Size:
1.83 KB
Format:
Plain Text
Description:
No Thumbnail Available
Name:
YorkU_ETDlicense.txt
Size:
3.38 KB
Format:
Plain Text
Description: