
IR From Bag-of-words to BERT and Beyond through Practical Experiments: An ECIR 2021 tutorial with PyTerrier and OpenNIR

Tutorial, to appear

Authors: Sean MacAvaney*, Craig Macdonald*, Nicola Tonellotto*

* equal contribution

Appearing in: ECIR 2021

Abstract:

Advances from the natural language processing community have recently sparked a renaissance in the task of ad hoc search. In particular, large contextualized language modeling techniques, such as BERT, have equipped ranking models with a far deeper understanding of language than previous bag-of-words (BoW) models had. Applying these techniques to a new task is tricky: it requires knowledge of deep learning frameworks, as well as significant scripting and data munging. In this tutorial, we provide background on classical (e.g., BoW), modern (e.g., Learning to Rank) and contemporary (e.g., BERT, doc2query) search ranking and re-ranking techniques. Going further, we detail and demonstrate how these techniques can easily be applied experimentally to new search tasks using a new declarative style of experimentation, exemplified by the PyTerrier and OpenNIR search toolkits. The tutorial is interactive: each 90-minute session mixes 50 minutes of explanatory presentation with 40 minutes of hands-on activities using prepared Jupyter notebooks running on the Google Colab platform.
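To give a rough idea of what a declarative experiment pipeline looks like, here is a toy, self-contained Python sketch. It does not use PyTerrier or OpenNIR: the classes `Stage`, `Then`, `TfIdfRetrieve` and `Cutoff` and the three-document corpus are hypothetical, and only the idea of chaining ranking stages with the `>>` operator is borrowed from PyTerrier's notation. The first stage is a plain bag-of-words TF-IDF scorer (not BM25), and the second simply truncates the ranking.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
DOCS = {
    "d1": "bert is a contextualized language model",
    "d2": "bag of words models count term occurrences",
    "d3": "learning to rank combines many ranking features",
}


class Stage:
    """A pipeline stage maps (query, ranking) to a new ranking."""

    def __rshift__(self, other):
        # `a >> b` means: run a, then feed its output ranking to b.
        return Then(self, other)


class Then(Stage):
    """Composition of two stages (hypothetical, mimics PyTerrier's `>>`)."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def __call__(self, query, ranking=None):
        return self.second(query, self.first(query, ranking))


class TfIdfRetrieve(Stage):
    """Bag-of-words retrieval: score(d) = sum over query terms of tf * idf."""

    def __init__(self, docs):
        self.tfs = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
        n_docs = len(docs)
        df = Counter(term for tf in self.tfs.values() for term in tf)
        self.idf = {term: math.log(n_docs / count) for term, count in df.items()}

    def __call__(self, query, ranking=None):
        # A first-stage retriever ignores any incoming ranking.
        scores = {}
        for doc_id, tf in self.tfs.items():
            score = sum(tf[t] * self.idf.get(t, 0.0) for t in query.split())
            if score > 0:
                scores[doc_id] = score
        # Highest score first.
        return sorted(scores.items(), key=lambda item: -item[1])


class Cutoff(Stage):
    """Keep only the top-k results of the incoming ranking."""

    def __init__(self, k):
        self.k = k

    def __call__(self, query, ranking):
        return ranking[: self.k]


# The experiment is declared as a composition of stages, then executed.
pipeline = TfIdfRetrieve(DOCS) >> Cutoff(1)
print([doc_id for doc_id, _ in pipeline("bag of words ranking")])  # ['d2']
```

The point of the design is that the pipeline is an object that can be defined once and then run over many queries or compared against other pipelines; a real toolkit additionally handles indexing, evaluation and dataset loading.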

BibTeX:

@inproceedings{macavaney:ecir2021-tutnir,
  author = {MacAvaney, Sean and Macdonald, Craig and Tonellotto, Nicola},
  title = {IR From Bag-of-words to BERT and Beyond through Practical Experiments: An ECIR 2021 tutorial with PyTerrier and OpenNIR},
  booktitle = {ECIR},
  year = {2021}
}