Short conference paper. 35 citations.
Appeared in: Proceedings of the 42nd European Conference on Information Retrieval Research (ECIR 2020)
Abstract:
While billions of non-English-speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Mandarin Chinese, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.
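The abstract does not name the unsupervised techniques used for comparison. As a hedged illustration only, the sketch below implements Okapi BM25, a widely used unsupervised ranking function. It needs no relevance labels, so it can be applied to any language once the text is tokenized. The function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the Lucene-style IDF, and the whitespace tokenization in the example are all assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with Okapi BM25.

    k1 and b are common default values (an assumption, not the paper's settings).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            # Lucene-style IDF, which stays non-negative even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Toy Spanish collection, tokenized by lowercase whitespace splitting (an assumption).
docs = [
    "el perro come carne",
    "el gato duerme en casa",
    "un perro viejo aprende trucos nuevos",
]
tokenized = [d.lower().split() for d in docs]
scores = bm25_scores("perro viejo".split(), tokenized)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # prints [2, 0, 1]
```

Whitespace splitting is adequate for this toy Spanish example, but Chinese text has no spaces between words; `list(text)` (one token per character) is only a crude stand-in for a real word segmenter. The Lucene-style IDF is chosen because the classic Robertson-Sparck Jones IDF becomes negative for terms that appear in more than half of the documents.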
BibTeX:
@inproceedings{macavaney:ecir2020-multiling,
  author = {MacAvaney, Sean and Soldaini, Luca and Goharian, Nazli},
  title = {Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning},
  booktitle = {Proceedings of the 42nd European Conference on Information Retrieval Research},
  year = {2020},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-45442-5_31},
  doi = {10.1007/978-3-030-45442-5_31},
  pages = {246--254}
}
this was a fun paper to work on! turns out that transformer-based LMs are good at zero-shot document retrieval, too! https://t.co/sRoMnxF90s
— Luca Soldaini (@soldni) December 9, 2019