Appeared in: Proceedings of the 45th European Conference on Information Retrieval Research (ECIR 2023)
Abstract:
Learned sparse retrieval (LSR) is a family of first-stage retrieval approaches that are trained to generate sparse lexical representations of queries and documents for use with an inverted index. Recently, an increasing number of LSR methods have been introduced, with the Splade models achieving state-of-the-art performance on MSMARCO. However, despite their similar model architectures, LSR methods show diverging effectiveness and efficiency because they were trained and evaluated in different environments. These discrepancies make it difficult to compare LSR methods and derive valid insights. In this work, we analyze existing LSR methods, identify their key components, and establish an LSR framework that unifies all methods under the same perspective. We then reproduce all models using a common codebase and re-train them in the same environment, which allows us to quantify how each component of the framework affects effectiveness and efficiency. As a result, we show how a state-of-the-art model can be modified to reduce latency by more than 74% while maintaining the same effectiveness.
BibTeX:
@inproceedings{nguyen:ecir2023-lsr,
  author    = {Nguyen, Thống and MacAvaney, Sean and Yates, Andrew},
  title     = {A Unified Framework for Learned Sparse Retrieval},
  booktitle = {Proceedings of the 45th European Conference on Information Retrieval Research},
  year      = {2023},
  url       = {https://arxiv.org/abs/2303.13416},
  doi       = {10.1007/978-3-031-28241-6_7}
}