
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

Short conference paper, to appear

Authors: Sean MacAvaney, Luca Soldaini, Nazli Goharian

Appearing in: Proceedings of the 42nd European Conference on Information Retrieval Research (ECIR 2020)

Abstract:

While billions of non-English-speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of datasets suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Mandarin Chinese, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.

BibTeX

@inproceedings{macavaney:ecir2020-multiling,
  author = {MacAvaney, Sean and Soldaini, Luca and Goharian, Nazli},
  title = {Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning},
  booktitle = {Proceedings of the 42nd European Conference on Information Retrieval Research},
  year = {2020},
  url = {https://arxiv.org/abs/1912.13080}
}
