A Unified Framework for Learned Sparse Retrieval

pdf bibtex code 49 citations reproducibility paper

Authors: Thống Nguyen, Sean MacAvaney, Andrew Yates

Appeared in: Proceedings of the 45th European Conference on Information Retrieval Research (ECIR 2023)

Links/IDs:

DOI 10.1007/978-3-031-28241-6_7
DBLP conf/ecir/NguyenMY23
arXiv 2303.13416
Google Scholar 7wWfoDgAAAAJ:hMod-77fHWUC
Semantic Scholar 6f25037c6a6b5b9bd857ccede998c84c388a6f0e
Enlighten 287838
smac.pub ecir2023-lsr

Abstract:

Learned sparse retrieval (LSR) is a family of first-stage retrieval approaches that are trained to generate sparse lexical representations of queries and documents for inverted indexing. Recently, an increasing number of LSR methods have been introduced, with Splade models achieving superior performance on MSMARCO. However, despite their similar model architectures, some LSR methods show divergent effectiveness and efficiency due to differences in training and testing environments. These discrepancies make it difficult to compare LSR methods and derive valid insights. In this work, we analyze existing LSR methods and identify key components to establish an LSR framework that unifies all LSR methods under the same perspective. We then reproduce all models using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect the system's effectiveness and efficiency. As a result, we show how a state-of-the-art (SOTA) model could be modified to reduce latency by more than 74% while keeping the same effectiveness.

BibTeX:

@inproceedings{nguyen:ecir2023-lsr,
  author = {Nguyen, Thống and MacAvaney, Sean and Yates, Andrew},
  title = {A Unified Framework for Learned Sparse Retrieval},
  booktitle = {Proceedings of the 45th European Conference on Information Retrieval Research},
  year = {2023},
  url = {https://arxiv.org/abs/2303.13416},
  doi = {10.1007/978-3-031-28241-6_7}
}