DiffIR: Exploring Differences in Ranking Models' Behavior

Type: Demonstration paper (13 citations)

Authors: Kevin Martin Jose, Thống Nguyen, Sean MacAvaney, Jeff Dalton, Andrew Yates

Appeared in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021)

Links/IDs:

DOI: 10.1145/3404835.3462784
DBLP: conf/sigir/JoseNM0Y21
ACM: 3404835.3462784
Google Scholar: 7wWfoDgAAAAJ:9ZlFYXVOiuMC
Semantic Scholar: 6aede24ea510f2116699dc37ad014bbc55477caa
Enlighten: 239078
smac.pub: sigir2021-diffir

Abstract:

Understanding and comparing the behavior of retrieval models is a fundamental challenge. It requires going beyond examining average effectiveness and even per-query metrics because these do not reveal key differences in how ranking models' behavior impacts individual results. DiffIR is a new open-source web tool to assist with qualitative ranking analysis by visually 'diffing' system rankings at the individual result level for queries where behavior significantly diverges. Using one of several configurable similarity measures, it identifies queries in which the rankings of models compared have important differences in individual rankings and provides a visual web interface to compare system rankings side-by-side. DiffIR additionally supports a model-specific approach to understanding based on custom term importance weight files. These support studying the behavior of interpretable models, such as neural retrieval methods that produce document scores based on a similarity matrix or based on a single document passage. Observations from this tool complement neural probing approaches like ABNIRML to generate quantitative tests. We provide an illustrative use case of DiffIR by studying the qualitative differences between recently developed neural ranking models on standard TREC benchmark datasets.
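The query-selection idea in the abstract (score each query by how similar two systems' rankings are, then surface the least similar ones) can be sketched in a few lines. This is only an illustration, not DiffIR's code or API: the abstract does not name its similarity measures, so top-k Jaccard overlap is an assumed stand-in, and `topk_jaccard`, `most_divergent_queries`, and the toy runs are hypothetical names and data.

```python
from typing import Dict, List

# A "run" maps each query id to its ranked list of document ids (best first).
Run = Dict[str, List[str]]


def topk_jaccard(a: List[str], b: List[str], k: int = 10) -> float:
    """Jaccard overlap of the top-k documents of two rankings (1.0 = same set)."""
    top_a, top_b = set(a[:k]), set(b[:k])
    union = top_a | top_b
    if not union:
        return 1.0  # two empty rankings are treated as identical
    return len(top_a & top_b) / len(union)


def most_divergent_queries(run_a: Run, run_b: Run, k: int = 10, n: int = 5) -> List[str]:
    """Return up to n query ids whose top-k results differ most between the runs."""
    shared = sorted(set(run_a) & set(run_b))  # only queries present in both runs
    scored = [(topk_jaccard(run_a[q], run_b[q], k), q) for q in shared]
    scored.sort()  # ascending similarity; ties broken by query id
    return [q for _, q in scored[:n]]


# Toy data (assumed): two systems ranking three documents for three queries.
run_bm25 = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"], "q3": ["d7", "d8", "d9"]}
run_neural = {"q1": ["d1", "d2", "d3"], "q2": ["d6", "d10", "d11"], "q3": ["d9", "d8", "d12"]}

print(most_divergent_queries(run_bm25, run_neural, k=3, n=2))  # ['q2', 'q3']
```

A set-overlap measure ignores the order of documents within the top k; a rank-aware correlation could be swapped in for `topk_jaccard` without changing the selection step.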

BibTeX:

@inproceedings{jose:sigir2021-diffir,
  author = {Jose, Kevin Martin and Nguyen, Thống and MacAvaney, Sean and Dalton, Jeff and Yates, Andrew},
  title = {DiffIR: Exploring Differences in Ranking Models' Behavior},
  booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2021},
  doi = {10.1145/3404835.3462784}
}