Wednesday, June 24, 2020 - 8:00am to 10:00am
Location:
to take place via Zoom
Speaker:
Zhuyun Dai
Event Website:
https://cmu.zoom.us/j/98831630294
For More Information, Contact:
Corey Bisbal, cbisbal@andrew.cmu.edu
Committee:
Jamie Callan (Chair), Carnegie Mellon University
Graham Neubig, Carnegie Mellon University
Tie-Yan Liu, Carnegie Mellon University & Microsoft Research
Yiqun Liu, Tsinghua University, Beijing
Abstract:
The first part of this dissertation focuses on how queries and documents are matched. Previous state-of-the-art rankers relied on exact lexical match, which causes the well-known vocabulary mismatch problem. This dissertation develops neural models that bring soft match into relevance ranking. Using distributed text representations, our models can soft-match every query word to every document word. Because the soft-match signals are noisy, this dissertation presents a novel kernel-pooling technique that groups soft matches based on their contribution to relevance. This dissertation also studies whether pre-trained model parameters can improve ranking in low-resource domains, and whether the model architectures are reusable in a non-text retrieval task. Our approaches outperform previous state-of-the-art ranking systems by large margins.
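As a rough illustration of the kernel-pooling idea described above, the following minimal sketch soft-matches every query word to every document word by cosine similarity and then pools the similarities into one feature per kernel. It assumes Gaussian (RBF) kernels; the function names, the kernel centers `mus`, the width `sigma`, and the toy two-dimensional embeddings are all illustrative assumptions, not values taken from the dissertation. A trained ranker would learn how to combine these features into a relevance score, which is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kernel_pooling(query_vecs, doc_vecs, mus, sigma=0.1):
    """Return one pooled soft-match feature per kernel center in `mus`.

    Each kernel counts, softly, how many document words have a
    similarity close to its center `mu` for each query word.
    """
    features = []
    for mu in mus:
        total = 0.0
        for q in query_vecs:
            # Soft count of document words whose similarity to q is near mu.
            soft_count = sum(
                math.exp(-((cosine(q, d) - mu) ** 2) / (2 * sigma ** 2))
                for d in doc_vecs
            )
            # Log-sum over query words; the floor avoids log(0).
            total += math.log(max(soft_count, 1e-10))
        features.append(total)
    return features


# Toy embeddings (assumed): one query word, three document words.
query = [[1.0, 0.0]]
doc = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]

# Kernels centered at exact match (1.0), partial match (0.5), no match (0.0).
features = kernel_pooling(query, doc, mus=[1.0, 0.5, 0.0])
print(features)
```

In this toy case the kernel centered at 1.0 picks up the two document words that point in the same direction as the query word, while the kernel centered at 0.5 finds almost nothing, so each kernel summarizes a different strength of match.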
The second part of this dissertation focuses on how queries and documents are represented. A typical search engine uses frequency statistics to weight words, but frequent words are not necessarily essential to the meaning of the text. This dissertation develops neural networks to estimate word importance based on how a word interacts with its linguistic context. A weak-supervision approach is developed that allows training our models without any human annotations. Our models can run offline, significantly improving first-stage retrieval without hurting efficiency.
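To make the contrast between frequency-based and importance-based term weighting concrete, here is a small sketch. The importance values are made up for illustration and merely stand in for the output of a context-aware model; the `score` function and the example text are likewise hypothetical. The point is only that the weights are computed once, offline, so query-time scoring remains a simple lookup in either case.

```python
from collections import Counter


def score(query_terms, doc_weights):
    """Sum the stored document-side weights of the query terms."""
    return sum(doc_weights.get(term, 0.0) for term in query_terms)


doc = "the museum in the city is near the river and the museum shop".split()

# Frequency-based weights: raw term counts in the document.
tf_weights = dict(Counter(doc))

# Hypothetical importance weights (made up for this example), standing in
# for what a context-aware model might assign offline at indexing time.
importance_weights = {
    "museum": 0.9, "city": 0.4, "river": 0.3, "shop": 0.3,
    "the": 0.0, "in": 0.0, "is": 0.0, "near": 0.1, "and": 0.0,
}

for terms in (["the", "river"], ["museum"]):
    print(terms, score(terms, tf_weights), score(terms, importance_weights))
```

With raw counts, the frequent word "the" dominates the score; with the assumed importance weights, the content word "museum" does, even though scoring at query time is done the same way.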
To summarize, this dissertation formulates a new neural retrieval paradigm that overcomes classic retrieval models' limitations in matching and importance weighting. It points out several promising paths in neural relevance ranking, deep retrieval models, and deep document understanding for IR.
For a copy of the thesis, please go to the following link:
http://www.cs.cmu.edu/~