IR From Bag-of-words to BERT and Beyond through Practical Experiments: An ECIR 2021 tutorial with PyTerrier and OpenNIR

Authors: Sean MacAvaney*, Craig Macdonald*, Nicola Tonellotto*

* equal contribution

Appeared in: Proceedings of the 43rd European Conference on Information Retrieval Research (ECIR 2021)

Links/IDs:

smac.pub ecir2021-tutnir

Abstract:

Advances from the natural language processing community have recently sparked a renaissance in the task of adhoc search. Particularly, large contextualized language modeling techniques, such as BERT, have equipped ranking models with a far deeper understanding of language than the capabilities of previous bag-of-words (BoW) models. Applying these techniques to a new task is tricky, requiring knowledge of deep learning frameworks, and significant scripting and data munging. In this tutorial, we provide background on classical (e.g., BoW), modern (e.g., Learning to Rank) and contemporary (e.g., BERT, doc2query) search ranking and re-ranking techniques. Going further, we detail and demonstrate how these can be easily experimentally applied to new search tasks in a new declarative style of conducting experiments exemplified by the PyTerrier and OpenNIR search toolkits. This tutorial is interactive in nature for participants, where each of the 90-minute sessions mixes 50 minutes of explanatory presentation with 40 minutes of hands-on activities using prepared Jupyter notebooks running on the Google Colab platform.

BibTeX:

@inproceedings{macavaney:ecir2021-tutnir,
  author = {MacAvaney, Sean and Macdonald, Craig and Tonellotto, Nicola},
  title = {IR From Bag-of-words to BERT and Beyond through Practical Experiments: An ECIR 2021 tutorial with PyTerrier and OpenNIR},
  booktitle = {Proceedings of the 43rd European Conference on Information Retrieval Research},
  year = {2021}
}