Reproducing and Unifying Learned Sparse Retrieval Methods

Reproducibility paper (to appear)

Authors: Thong Nguyen, Sean MacAvaney, Andrew Yates

Appearing in: Proceedings of the 45th European Conference on Information Retrieval Research (ECIR 2023)

Links/IDs:
Google Scholar: 7wWfoDgAAAAJ:maZDTaKrznsC
smac.pub: ecir2023-lsr

Abstract:

Learned sparse retrieval (LSR) is a family of first-stage retrieval approaches that are trained to generate sparse lexical representations of queries and documents for use with an inverted index. An increasing number of LSR methods have been introduced recently, with Splade models achieving superior performance on MSMARCO. However, despite similarities in their model architectures, some LSR methods have shown divergent effectiveness and efficiency because they were trained and tested in different environments. These discrepancies make it difficult to compare LSR methods and derive valid insights. In this work, we analyze existing LSR methods and identify key components to establish an LSR framework that unifies all LSR methods under the same perspective. We then reproduce all models using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect the system's effectiveness and efficiency. As a result, we show how a state-of-the-art model can be modified to reduce latency by more than 74% while keeping the same effectiveness.

BibTeX:

@inproceedings{nguyen:ecir2023-lsr,
  author = {Nguyen, Thong and MacAvaney, Sean and Yates, Andrew},
  title = {Reproducing and Unifying Learned Sparse Retrieval Methods},
  booktitle = {Proceedings of the 45th European Conference on Information Retrieval Research},
  year = {2023}
}