Expansion via Prediction of Importance with Contextualization

Short conference paper. 104 citations.

Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

Appeared in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)

Links/IDs:
DOI: 10.1145/3397271.3401262
DBLP: conf/sigir/MacAvaneyN0TGF20
arXiv: 2004.14245
Google Scholar: 7wWfoDgAAAAJ:8k81kl-MbHgC
Semantic Scholar: 0c57dcf959ead9530f9ec3ebe0dd58de42a3e8af
smac.pub: sigir2020-epic

Abstract:

The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves a MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
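
To make the ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes that query and passage representations are sparse vectors over the vocabulary and that relevance is their dot product; the abstract does not spell out the scoring function, so that part is an assumption. The names `score`, `prune`, and `mrr_at_10` and all term weights are hypothetical toy values rather than model outputs. The sketch shows how an expansion term can sit in a passage vector, how pruning low-weight terms can leave the score unchanged, and how MRR@10 is computed.

```python
from typing import Dict, List, Set

# A sparse, lexicon-grounded vector: vocabulary term -> importance weight.
SparseVec = Dict[str, float]


def score(query: SparseVec, passage: SparseVec) -> float:
    """Dot product over the vocabulary (assumed scoring function)."""
    return sum(w * passage.get(term, 0.0) for term, w in query.items())


def prune(passage: SparseVec, k: int) -> SparseVec:
    """Keep only the k highest-weighted terms of a passage vector."""
    top = sorted(passage.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)


def mrr_at_10(rankings: List[List[str]], relevant: List[Set[str]]) -> float:
    """Mean reciprocal rank of the first relevant passage in the top 10."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, pid in enumerate(ranked[:10], start=1):
            if pid in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data (invented). "cardiac" stands in for a term added by expansion:
# it does not need to occur in the passage text to carry weight.
query = {"heart": 1.2, "attack": 0.9}
passages = {
    "p1": {"heart": 0.8, "cardiac": 0.6, "attack": 0.5, "the": 0.01},
    "p2": {"attack": 0.7, "chess": 0.9, "the": 0.02},
}

# Passage vectors can be built ahead of time; only scoring runs per query.
ranked = sorted(passages, key=lambda pid: score(query, passages[pid]), reverse=True)
pruned_p1 = prune(passages["p1"], k=3)
```

In this toy case, pruning `p1` to its three strongest terms drops only the near-zero weight for "the", so its score for the query is unchanged. This mirrors, in miniature, the abstract's observation that pruning reduces latency with virtually no difference in effectiveness.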

BibTeX:

@inproceedings{macavaney:sigir2020-epic,
  author = {MacAvaney, Sean and Nardini, Franco Maria and Perego, Raffaele and Tonellotto, Nicola and Goharian, Nazli and Frieder, Ophir},
  title = {Expansion via Prediction of Importance with Contextualization},
  booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2020},
  url = {https://arxiv.org/abs/2004.14245},
  doi = {10.1145/3397271.3401262},
  pages = {1573--1576}
}