The Information Retrieval Experiment Platform

Best Paper; resource conference paper

Authors: Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

Appeared in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

DOI: 10.1145/3539618.3591888
DBLP: conf/sigir/FrobeRMDRB0HP23
arXiv: 2305.18932
Google Scholar: 7wWfoDgAAAAJ:O3NaXMp0MMsC
Semantic Scholar: 73feb5cfe491342d52d47e8817d113c072067306
Enlighten: 296336
smac.pub: sigir2023-tirex


We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform to promote more standardized, reproducible, and scalable retrieval experiments, and ultimately blinded experiments in IR. Standardization is achieved when the input and output of an experiment are compatible with ir_datasets and ir_measures, and the retrieval approach implements PyTerrier's interfaces. However, none of this is required for reproducibility and scalability, as TIRA can run any Dockerized software locally or remotely in a cloud-native execution environment. Version control and caching ensure efficient (re)execution. TIRA allows for blind evaluation when an experiment runs on a remote server or cloud that is not under the experimenter's control. The test data and ground truth are then hidden from view, and the retrieval software has to process them in a sandbox that prevents data leaks. The platform currently includes 15 corpora (1.9 billion documents) on which 32 well-known shared tasks are based, as well as Docker images of 50 standard retrieval approaches. Within this setup, we were able to automatically run and evaluate the 50 approaches on the 32 tasks (1,600 runs) in less than a week. A hosted version of the IR Experiment Platform is open for submissions and will eventually be integrated with the IR Anthology.
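
To make the standardization idea concrete, here is a minimal sketch of what "compatible input and output" can look like in practice: relevance judgments and a ranked run exchanged as plain text, then scored with precision at k. It is a pure-Python stand-in written for illustration, not code from the platform. The TREC-style line layouts, the toy data, and the names parse_qrels, parse_run, and precision_at_k are all assumptions; in the platform itself, ir_measures would compute the measures.

```python
from collections import defaultdict

# Toy relevance judgments, one per line: query id, unused field, doc id, label.
# The TREC-style layout is an assumption made for this sketch.
QRELS = """\
q1 0 d1 1
q1 0 d2 0
q1 0 d3 1
q2 0 d4 1
"""

# Toy run, one per line: query id, Q0, doc id, rank, score, system tag.
RUN = """\
q1 Q0 d1 1 9.1 demo
q1 Q0 d2 2 8.7 demo
q1 Q0 d3 3 7.9 demo
q2 Q0 d5 1 5.0 demo
q2 Q0 d4 2 4.2 demo
"""


def parse_qrels(text):
    """Map each query id to a {doc id: relevance label} dictionary."""
    qrels = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        qid, _unused, docid, rel = line.split()
        qrels[qid][docid] = int(rel)
    return dict(qrels)


def parse_run(text):
    """Map each query id to its doc ids, ordered by descending score."""
    scored = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        qid, _q0, docid, _rank, score, _tag = line.split()
        scored[qid].append((float(score), docid))
    return {
        qid: [docid for _score, docid in sorted(docs, key=lambda p: -p[0])]
        for qid, docs in scored.items()
    }


def precision_at_k(qrels, run, k):
    """Average, over judged queries, of the relevant fraction of the top k."""
    per_query = []
    for qid, judged in qrels.items():
        top_k = run.get(qid, [])[:k]
        # Unjudged documents count as non-relevant.
        hits = sum(1 for docid in top_k if judged.get(docid, 0) > 0)
        per_query.append(hits / k)
    return sum(per_query) / len(per_query)


qrels = parse_qrels(QRELS)
run = parse_run(RUN)
p_at_2 = precision_at_k(qrels, run, 2)
print(f"P@2 = {p_at_2:.2f}")
```

Because the retrieval system only has to emit a run in an agreed format, the evaluation step never needs to know how the ranking was produced, which is what lets many approaches be scored on many tasks with the same tooling.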

