Content-Based Weak Supervision for Ad-Hoc Re-Ranking

Short conference paper. DOI: 10.1145/3331184.3331316

Authors: Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder

Appeared in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)

Abstract:

One problem with neural ranking is the need for a large amount of manually labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance. Specifically, we investigate using newswire headline-content pairs and encyclopedic heading-paragraph pairs for training neural ranking models. We further propose two filtering techniques to eliminate training samples from these sources that are too far out of domain: a heuristic-based approach and a supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures, multiple weak supervision datasets, and multiple evaluation datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that our filtering techniques can further improve ranking performance.
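As a rough illustration of the idea in the abstract, the sketch below treats each headline as a pseudo query and its article body as the relevant document, then drops pseudo queries that look out of domain. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the random choice of a non-relevant document, and the token-overlap filter with its threshold are all hypothetical stand-ins for the heuristic and supervised filters the paper describes.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (pseudo query, relevant doc, non-relevant doc)


def make_triples(articles: List[Tuple[str, str]], seed: int = 0) -> List[Triple]:
    """Turn (headline, content) pairs into training triples.

    Assumption: the non-relevant document is the content of a randomly
    chosen *other* article. This is an illustrative choice only.
    """
    rng = random.Random(seed)
    triples = []
    for i, (headline, content) in enumerate(articles):
        others = [c for j, (_, c) in enumerate(articles) if j != i]
        if not others:
            continue  # a single article cannot supply a negative
        triples.append((headline, content, rng.choice(others)))
    return triples


def overlap_score(pseudo_query: str, domain_vocab: Set[str]) -> float:
    """Fraction of pseudo-query tokens that also occur in target-domain queries."""
    tokens = pseudo_query.lower().split()
    if not tokens:
        return 0.0
    return sum(t in domain_vocab for t in tokens) / len(tokens)


def filter_triples(
    triples: List[Triple], domain_queries: List[str], threshold: float = 0.5
) -> List[Triple]:
    """Keep triples whose pseudo query overlaps enough with the target domain.

    Assumption: a simple vocabulary-overlap heuristic with a hypothetical
    threshold stands in for an out-of-domain filter.
    """
    vocab = {t for q in domain_queries for t in q.lower().split()}
    return [t for t in triples if overlap_score(t[0], vocab) >= threshold]


articles = [
    ("stock market falls", "Shares dropped sharply on Monday."),
    ("local team wins cup", "The team lifted the trophy after extra time."),
]
triples = make_triples(articles)
kept = filter_triples(triples, ["stock market crash"])
```

With these two toy articles, only the finance headline shares vocabulary with the example target query, so the second triple is filtered out.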

BibTeX:

@InProceedings{macavaney:sigir2019-nyt,
  author = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
  title = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
  booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2019},
  url = {https://arxiv.org/abs/1707.00189},
  doi = {10.1145/3331184.3331316}
}