Online Distillation for Pseudo-Relevance Feedback

* equal contribution

Appeared in: arXiv

Links/IDs:

DBLP: journals/corr/abs-2306-09657
arXiv: 2306.09657
Google Scholar: 7wWfoDgAAAAJ:J_g5lzvAfSwC
Semantic Scholar: 43fac75122651e2f840e059cc7174b92d23deadf
smac.pub: arxiv2023-odis

Abstract:

Model distillation has emerged as a prominent technique to improve neural search models. To date, distillation has taken an offline approach, wherein a new neural model is trained to predict relevance scores between arbitrary queries and documents. In this paper, we explore a departure from this offline distillation strategy by investigating whether a model for a specific query can be effectively distilled from neural re-ranking results (i.e., distilling in an online setting). Indeed, we find that a lexical model distilled online can reasonably replicate the re-ranking of a neural model. More importantly, these models can be used as queries that execute efficiently on indexes. This second retrieval stage can enrich the pool of documents for re-ranking by identifying documents that were missed in the first retrieval stage. Empirically, we show that this approach performs favourably when compared with established pseudo-relevance feedback techniques, dense retrieval methods, and sparse-dense ensemble "hybrid" approaches.
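The idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the function names (tokenize, distill_lexical_query, retrieve), the squared-error SGD fit, the whitespace tokenizer, and the hand-written teacher scores standing in for a neural re-ranker are all assumptions made for illustration. It fits per-term weights so that a bag-of-words score approximates the teacher's scores for one query, then reuses the positive weights as a weighted lexical query over a small in-memory corpus.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real system would use the index's analyzer.
    return text.lower().split()


def distill_lexical_query(docs, teacher_scores, epochs=200, lr=0.05, top_k=5):
    """Fit term weights so a bag-of-words dot product approximates teacher scores.

    Assumption: plain SGD on squared error, chosen only to keep the sketch short.
    """
    bows = [Counter(tokenize(d)) for d in docs]
    weights = {}
    for _ in range(epochs):
        for bow, target in zip(bows, teacher_scores):
            pred = sum(weights.get(t, 0.0) * tf for t, tf in bow.items())
            err = pred - target
            for t, tf in bow.items():
                weights[t] = weights.get(t, 0.0) - lr * err * tf
    # Keep only the strongest positive terms so the result can act as a query.
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {t: w for t, w in top if w > 0}


def retrieve(weighted_query, corpus):
    """Score every document in a dict {doc_id: text} with the weighted query."""
    scored = []
    for doc_id, text in corpus.items():
        bow = Counter(tokenize(text))
        score = sum(w * bow[t] for t, w in weighted_query.items())
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# First-stage results with made-up scores standing in for a neural re-ranker.
reranked_docs = [
    "neural ranking models",
    "cooking pasta recipes",
    "neural search ranking",
    "garden pasta tips",
]
teacher_scores = [1.0, 0.0, 1.0, 0.0]

query_model = distill_lexical_query(reranked_docs, teacher_scores)

# Second retrieval stage over documents the first stage did not return.
corpus = {
    "d1": "neural ranking for search",
    "d2": "pasta cooking at home",
    "d3": "ranking of garden tips",
}
results = retrieve(query_model, corpus)
```

In this sketch the distilled model is just a dictionary of term weights, which is what lets it run as an ordinary weighted query; documents surfaced this way would then be passed back to the neural model for re-ranking.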

BibTeX:

@article{macavaney:arxiv2023-odis,
  author = {MacAvaney, Sean and Wang, Xi},
  title = {Online Distillation for Pseudo-Relevance Feedback},
  year = {2023},
  url = {https://arxiv.org/abs/2306.09657},
  journal = {arXiv},
  volume = {abs/2306.09657}
}