
mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval

Best Full (Student) Paper Honorable Mention; long conference paper

Authors: Orion Weller, Benjamin Chang, Eugene Yang, Mahsa Yarmohammadi, Sam Barham, Sean MacAvaney, Arman Cohan, Luca Soldaini, Benjamin Van Durme, Dawn Lawrie

Appeared in: 47th European Conference on Information Retrieval (ECIR 2025)

Links/IDs:
arXiv: 2501.19264
Google Scholar: 7wWfoDgAAAAJ:OU6Ihb5iCvQC
Enlighten: 343694
smac.pub: ecir2025-mfollowir

Abstract:

Retrieval systems generally focus on web-style queries that are short and underspecified. However, advances in language models have facilitated the nascent rise of retrieval models that can understand more complex queries with diverse intents. These efforts have focused exclusively on English, so we do not yet understand how such models work across languages. We introduce mFollowIR, a multilingual benchmark for measuring instruction-following ability in retrieval models. mFollowIR builds upon the TREC NeuCLIR narratives (or instructions) that span three diverse languages (Russian, Chinese, Persian), giving both the query and the instruction to the retrieval models. We make small changes to the narratives and isolate how well retrieval models can follow these nuanced changes. We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance. Despite recent progress in instruction following with English retrievers, we reveal that multilingual models struggle with effective instruction utilization. We further show that simply translating English instruction-retrieval data provides large gains, indicating that more focus is needed on developing multilingual data and methods for instruction-based retrievers.

BibTeX:

@inproceedings{weller:ecir2025-mfollowir,
  author = {Weller, Orion and Chang, Benjamin and Yang, Eugene and Yarmohammadi, Mahsa and Barham, Sam and MacAvaney, Sean and Cohan, Arman and Soldaini, Luca and Van Durme, Benjamin and Lawrie, Dawn},
  title = {mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval},
  booktitle = {47th European Conference on Information Retrieval},
  year = {2025},
  url = {https://arxiv.org/abs/2501.19264}
}