Content-Based Weak Supervision for Ad-Hoc Re-Ranking

Short conference paper, 62 citations

Authors: Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder

Appeared in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)

Links/IDs:

DOI: 10.1145/3331184.3331316
DBLP: conf/sigir/MacAvaneyYHF19
arXiv: 1707.00189
Google Scholar: 7wWfoDgAAAAJ:ufrVoPGSRksC
Semantic Scholar: e8bd38a75c12659b4aeb31e2f7199fcc5f591dfd
smac.pub: sigir2019-nyt

Abstract:

One problem with neural ranking is the need for a large amount of manually labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance. Specifically, we investigate using newswire headline-content pairs and encyclopedic heading-paragraph pairs for training neural ranking models. We further propose two filtering techniques to eliminate training samples from these sources that are too far out of domain: a heuristic-based approach and a supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures, multiple weak supervision datasets, and multiple evaluation datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that our filtering techniques can further improve ranking performance.
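
As an illustration of the idea in the abstract, the following minimal sketch treats each headline as a pseudo query and its article body as a pseudo relevant document, then drops pairs that look out of domain. It is not the paper's implementation: the function names, the random negative sampling, and the term-overlap threshold are all assumptions made for this example, and the overlap check is only a simple stand-in for the heuristic and supervised filters the abstract mentions.

```python
import random


def build_training_triples(articles, seed=0):
    """Turn (headline, content) pairs into (query, positive, negative) triples.

    The headline acts as a pseudo query and its own content as the pseudo
    relevant document. The negative is the content of a randomly chosen
    *other* article (an assumption of this sketch, not the paper's sampler).
    """
    rng = random.Random(seed)
    triples = []
    for i, (headline, content) in enumerate(articles):
        other_indices = [j for j in range(len(articles)) if j != i]
        if not other_indices:
            continue  # a single article cannot supply a negative
        negative = articles[rng.choice(other_indices)][1]
        triples.append((headline, content, negative))
    return triples


def term_overlap(query, doc):
    """Fraction of distinct query terms that also occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def filter_triples(triples, min_overlap=0.5):
    """Keep triples whose pseudo query overlaps enough with its positive.

    Hypothetical stand-in filter: a pair with little lexical overlap is
    treated as too unlike a real query-document pair and is discarded.
    """
    return [t for t in triples if term_overlap(t[0], t[1]) >= min_overlap]


if __name__ == "__main__":
    articles = [
        ("stock markets fall", "Global stock markets fall sharply on Monday"),
        ("new species discovered", "Researchers described a frog found in Peru"),
        ("city council vote", "The city council will vote on the budget"),
    ]
    triples = build_training_triples(articles)
    for query, positive, _negative in filter_triples(triples):
        print(query, "->", positive)
```

In this toy input, the second headline shares no terms with its article body, so the overlap check removes it; a real filter would need a less brittle notion of similarity than exact term matching.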

BibTeX:

@inproceedings{macavaney:sigir2019-nyt,
  author = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
  title = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
  booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2019},
  url = {https://arxiv.org/abs/1707.00189},
  doi = {10.1145/3331184.3331316},
  pages = {993--996}
}