
MURR: Model Updating with Regularized Replay for Searching a Document Stream

Long conference paper (to appear)

Authors: Eugene Yang, Nicola Tonellotto, Dawn Lawrie, Sean MacAvaney, James Mayfield, Douglas Oard, Scott Miller

Appearing in: 47th European Conference on Information Retrieval (ECIR 2025)

Links/IDs:
Enlighten: 343693
smac.pub: ecir2025-mu

Abstract:

The Internet produces a continuous stream of new documents and user-generated queries. These naturally change over time based on the development of world events and the evolution of language. Neural retrieval models that were trained once on a fixed set of query-document pairs will quickly start misrepresenting newly-created content and queries, leading to less effective retrieval. Traditional statistical sparse retrieval can update collection statistics to reflect these changes in the document language and in the query language. In contrast, continued fine-tuning of the language model underlying neural retrieval approaches such as DPR and ColBERT creates incompatibility with previously-encoded documents. Re-encoding and re-indexing all previously-processed documents can be cost-prohibitive. In this work, we explore updating a neural dual encoder retrieval model without reprocessing past documents in the stream. We propose MURR, a model updating strategy with regularized replay, to ensure the model can still faithfully search existing documents without reprocessing, while continuing to update the model for the latest topics. In our simulated streaming environments, we show that fine-tuning models using MURR leads to more effective and more consistent retrieval results than other strategies as the stream of documents and queries progresses.
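The abstract does not state MURR's training objective, so the sketch below is a generic illustration of regularized replay and is not the paper's method. It rests on two assumptions. First, new query-document pairs are trained with a standard in-batch contrastive loss. Second, the regularizer is a squared-distance penalty between the current document encoder's output for replayed past documents and the vectors already stored in the index, which are never re-encoded. All names (`linear_encoder`, `replay_drift`, `regularized_replay_loss`, `lam`) are hypothetical. The linear "encoders" stand in for neural models such as DPR or ColBERT.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_encoder(weights: List[List[float]]) -> Encoder:
    """Hypothetical toy stand-in for a neural encoder: a fixed linear map."""
    return lambda x: [dot(row, x) for row in weights]


def in_batch_loss(q_enc: Encoder, d_enc: Encoder,
                  pairs: List[Tuple[Vector, Vector]]) -> float:
    """Softmax cross-entropy with in-batch negatives: each query should score
    its own relevant document above every other document in the batch."""
    qs = [q_enc(q) for q, _ in pairs]
    ds = [d_enc(d) for _, d in pairs]
    total = 0.0
    for i, q in enumerate(qs):
        scores = [dot(q, d) for d in ds]
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(pairs)


def replay_drift(d_enc: Encoder, replay: List[Tuple[Vector, Vector]]) -> float:
    """Mean squared distance between each replayed past document, re-encoded
    by the current model, and the vector already stored in the index for it.
    ASSUMED form of the regularizer; not taken from the MURR paper."""
    total = 0.0
    for doc, stored in replay:
        fresh = d_enc(doc)
        total += sum((a - b) ** 2 for a, b in zip(fresh, stored))
    return total / len(replay)


def regularized_replay_loss(q_enc: Encoder, d_enc: Encoder,
                            new_pairs: List[Tuple[Vector, Vector]],
                            replay: List[Tuple[Vector, Vector]],
                            lam: float = 1.0) -> float:
    """Task loss on the newest queries plus a penalty, weighted by `lam`, for
    moving representations of documents that will not be re-encoded."""
    return in_batch_loss(q_enc, d_enc, new_pairs) + lam * replay_drift(d_enc, replay)


# Index two past documents with the old (identity) encoder.
old = linear_encoder([[1.0, 0.0], [0.0, 1.0]])
past_docs = [[1.0, 0.0], [0.0, 1.0]]
replay = [(d, old(d)) for d in past_docs]

# New training pairs, and an updated encoder whose outputs have drifted.
new_pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
drifted = linear_encoder([[2.0, 0.0], [0.0, 2.0]])

print(replay_drift(old, replay))      # 0.0: unchanged encoder, no drift
print(replay_drift(drifted, replay))  # 1.0: each stored vector is off by 1 in one dimension
print(regularized_replay_loss(drifted, drifted, new_pairs, replay, lam=0.5))
```

Setting `lam` to 0 reduces this to plain continued fine-tuning, which the abstract notes breaks compatibility with previously encoded documents. Larger values trade adaptation to new topics for stability of the existing index.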

BibTeX:

@inproceedings{yang:ecir2025-mu,
  author = {Yang, Eugene and Tonellotto, Nicola and Lawrie, Dawn and MacAvaney, Sean and Mayfield, James and Oard, Douglas and Miller, Scott},
  title = {MURR: Model Updating with Regularized Replay for Searching a Document Stream},
  booktitle = {47th European Conference on Information Retrieval},
  year = {2025}
}