Content-Based Weak Supervision for Ad-Hoc Re-Ranking

DOI: 10.1145/3331184.3331316 (short conference paper, to appear)

Authors: Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder

Appearing in: SIGIR 2019


One problem with neural ranking is the need for a large number of manually labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance. Specifically, we investigate using newswire headline-content pairs and encyclopedic heading-paragraph pairs for training neural ranking models. We further propose two techniques for eliminating training samples from these sources that are too far out of domain: a heuristic-based approach and a supervised filter that repurposes a neural ranker. Using several leading neural ranking architectures, multiple weak supervision datasets, and multiple evaluation datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that our filtering technique can further improve ranking performance.
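As a rough illustration of the idea in the abstract, the sketch below treats each headline as a pseudo query and its article body as a pseudo-relevant document, then drops pairs that a simple rule considers out of domain. The abstract does not say how the heuristic filter works, so the record keys (`headline`, `content`), the `PseudoPair` type, and the term-count rule in `keep_pair` are all assumptions made for illustration, not the paper's method.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PseudoPair:
    query: str     # headline or heading, standing in for a query
    document: str  # article body or paragraph, standing in for a relevant document


def build_pairs(articles: Iterable[dict]) -> List[PseudoPair]:
    """Turn records with hypothetical 'headline' and 'content' keys into pseudo pairs."""
    pairs = []
    for article in articles:
        headline = article.get("headline", "").strip()
        content = article.get("content", "").strip()
        # Skip records that are missing either side of the pair.
        if headline and content:
            pairs.append(PseudoPair(headline, content))
    return pairs


def keep_pair(pair: PseudoPair, max_query_terms: int = 4) -> bool:
    """Placeholder heuristic (an assumption): keep pairs whose pseudo query is short."""
    return len(pair.query.split()) <= max_query_terms


def filter_pairs(
    pairs: Iterable[PseudoPair],
    predicate: Callable[[PseudoPair], bool] = keep_pair,
) -> List[PseudoPair]:
    """Keep only the pairs accepted by the given filter predicate."""
    return [p for p in pairs if predicate(p)]
```

A supervised filter could be swapped in by passing a different `predicate`, for example one that thresholds a score from a trained ranker; that scoring function would have to be supplied separately.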

BibTeX

@InProceedings{macavaney:sigir2019-nyt,
  author = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
  title = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
  booktitle = {SIGIR 2019},
  year = {2019},
  doi = {10.1145/3331184.3331316}
}