Shallow Cross-Encoders for Low-Latency Retrieval

Long conference paper, 6 citations

Authors: Aleksandr A. Petrov, Craig Macdonald, Sean MacAvaney

Appeared in: Proceedings of the 46th European Conference on Information Retrieval Research (ECIR 2024)

Links/IDs:

DOI: 10.1007/978-3-031-56063-7_10
DBLP: conf/ecir/PetrovMM24
arXiv: 2403.20222
Google Scholar: 7wWfoDgAAAAJ:ldfaerwXgEUC
Semantic Scholar: 82ea931d6cad7b3b9d1bbe349510ade124df18b3
Enlighten: 312856
smac.pub: ecir2024-gbce

Abstract:

Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window. At the same time, keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that shallow transformer models (i.e., transformers with a limited number of layers) provide a better tradeoff between effectiveness and efficiency compared to the full-scale models in the low-latency scenario. We further show that shallow transformers may benefit from the generalised Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with TREC Deep Learning passage ranking datasets demonstrate significant improvements in shallow and full-scale models in low-latency scenarios. For example, when the latency limit is 25ms per query, MonoBERT-LARGE (a cross-encoder based on a full-scale BERT model) is only able to achieve NDCG@10 of 0.431 on TREC DL 2019, while TinyBERT-gBCE (a cross-encoder based on TinyBERT trained with gBCE) reaches NDCG@10 of 0.652, a +51% gain over MonoBERT-LARGE. We also show that shallow Cross-Encoders are effective even when used without a GPU (e.g., with CPU inference, NDCG@10 decreases only by 3% compared to GPU inference with 50ms latency), which makes Cross-Encoders practical to run even without specialised hardware acceleration.
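
The abstract names the gBCE training scheme without spelling out the loss. The sketch below is an illustration only, assuming the formulation from the recommendation literature in which the predicted probability of the positive item is raised to a power beta before taking the logarithm, so that beta = 1 recovers ordinary binary cross-entropy. The function names `log_sigmoid` and `gbce_loss` are hypothetical, beta is treated as a plain hyperparameter, and the negative sampling and choice of beta used in the paper are not described on this page.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def gbce_loss(pos_score: float, neg_scores: Sequence[float], beta: float) -> float:
    """Assumed gBCE form for one query: one positive and several sampled negatives.

    pos_score  -- cross-encoder logit for the relevant document
    neg_scores -- logits for the sampled non-relevant documents
    beta       -- power applied to the positive probability (beta = 1 gives plain BCE)
    """
    # Positive term: log(sigmoid(s+) ** beta) = beta * log(sigmoid(s+))
    pos_term = beta * log_sigmoid(pos_score)
    # Negative terms: log(1 - sigmoid(s)) = log(sigmoid(-s))
    neg_term = sum(log_sigmoid(-s) for s in neg_scores)
    # Average over the positive and all negatives, negated to form a loss
    return -(pos_term + neg_term) / (len(neg_scores) + 1)
```

With `beta=1.0` the function reduces to standard binary cross-entropy over one positive and the sampled negatives. Values of beta below 1 down-weight the positive term, which in this formulation is meant to counter the overconfidence that negative sampling can introduce.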

BibTeX:

@inproceedings{petrov:ecir2024-gbce,
  author = {Petrov, Aleksandr A. and Macdonald, Craig and MacAvaney, Sean},
  title = {Shallow Cross-Encoders for Low-Latency Retrieval},
  booktitle = {Proceedings of the 46th European Conference on Information Retrieval Research},
  year = {2024},
  url = {https://arxiv.org/abs/2403.20222},
  doi = {10.1007/978-3-031-56063-7_10}
}