Appeared in: ACM Transactions on Information Systems (TOIS), journal article
Abstract:
Pretrained transformer models, such as BERT and T5, have been shown to be highly effective at ad-hoc passage and document ranking. Due to the inherent sequence length limits of these models, they need to be run over a document's passages rather than processing the entire document sequence at once. Although several approaches for aggregating passage-level signals have been proposed, there has yet to be an extensive comparison of these techniques. In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score. We find that a novel transformer-based aggregation technique (that is, one that uses an additional transformer model operating over passage representations) can significantly improve over techniques proposed in prior work, such as taking the maximum passage score. In particular, transformer-based passage aggregators can significantly improve results on collections with broad information needs where relevance signals can be spread throughout the document (such as TREC Robust04 and GOV2). Meanwhile, previously proposed techniques, such as taking the maximum over passage scores, work better on collections with an information need that can often be pinpointed to a single passage (such as TREC DL and TREC Genomics). We also conduct an extensive efficiency analysis and highlight several strategies for improving transformer-based aggregation. When combined with knowledge distillation, a model with 72% fewer parameters achieves effectiveness competitive with previous approaches using the base version of BERT.
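
To illustrate the two aggregation styles contrasted in the abstract, below is a minimal PyTorch sketch: a small transformer encoder that operates over per-passage [CLS] representations to produce a document score, alongside the max-passage-score baseline. It is not the paper's implementation; the class name, layer sizes, and the learned document token are illustrative assumptions.

import torch
import torch.nn as nn

class TransformerAggregator(nn.Module):
    """Sketch of transformer-based passage aggregation (PARADE-style).
    Layer counts and the learned document token are illustrative choices."""
    def __init__(self, hidden_dim=768, n_layers=2, n_heads=12):
        super().__init__()
        # Learned token prepended to the passage representations; its output
        # position is read off as the aggregated document representation.
        self.doc_token = nn.Parameter(torch.zeros(1, 1, hidden_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, passage_reprs):
        # passage_reprs: (batch, n_passages, hidden_dim) [CLS] vectors,
        # e.g. produced by running BERT over each query-passage pair.
        batch = passage_reprs.size(0)
        doc = self.doc_token.expand(batch, -1, -1)
        x = self.encoder(torch.cat([doc, passage_reprs], dim=1))
        return self.score(x[:, 0, :]).squeeze(-1)   # (batch,) document scores

def max_score_aggregation(passage_scores):
    # Prior-work baseline: document score = maximum passage relevance score.
    # passage_scores: (batch, n_passages)
    return passage_scores.max(dim=1).values

# Usage with dummy tensors (2 documents, 8 passages each):
reprs = torch.randn(2, 8, 768)
print(TransformerAggregator()(reprs).shape)             # torch.Size([2])
print(max_score_aggregation(torch.randn(2, 8)).shape)   # torch.Size([2])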
BibTeX
@article{li:tois2023-parade,
  author  = {Li, Canjia and Yates, Andrew and MacAvaney, Sean and He, Ben and Sun, Yingfei},
  title   = {PARADE: Passage Representation Aggregation for Document Reranking},
  year    = {2023},
  journal = {ACM Transactions on Information Systems},
  url     = {https://arxiv.org/abs/2008.09093},
  doi     = {10.1145/3600088}
}