Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

Short conference paper, 35 citations

Authors: Sean MacAvaney, Luca Soldaini, Nazli Goharian

Appeared in: Proceedings of the 42nd European Conference on Information Retrieval Research (ECIR 2020)

Links/IDs:
DOI: 10.1007/978-3-030-45442-5_31
DBLP: conf/ecir/MacAvaneySG20
arXiv: 1912.13080
Google Scholar: 7wWfoDgAAAAJ:Se3iqnhoufwC
Semantic Scholar: 0cc95c6ed0f174e62023d1ad035c800029d5e063
smac.pub: ecir2020-multiling

Abstract:

While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.

BibTeX:

@inproceedings{macavaney:ecir2020-multiling,
  author = {MacAvaney, Sean and Soldaini, Luca and Goharian, Nazli},
  title = {Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning},
  booktitle = {Proceedings of the 42nd European Conference on Information Retrieval Research},
  year = {2020},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-45442-5_31},
  doi = {10.1007/978-3-030-45442-5_31},
  pages = {246--254}
}
