The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation

Authors: Domenico Dato, Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto

Appeared in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

Links/IDs:

DOI: 10.1145/3477495.3531740
DBLP: conf/sigir/DatoMN0T22
ACM: 3477495.3531740
Google Scholar: 7wWfoDgAAAAJ:4JMBOYKVnBMC
Semantic Scholar: bb2cc299e135cc11f95e585f618a1967b8f8364d
Enlighten: 268514
smac.pub: sigir2022-istella

Abstract:

Neural ranking approaches that use pre-trained language models are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their effectiveness compared to feature-based Learning-to-Rank (LtR) methods has not yet been well-established. A major reason for this is because present LtR benchmarks that contain query-document feature vectors do not contain the raw query and document text needed for neural models. On the other hand, the benchmarks often used for evaluating neural models, e.g., MS MARCO, TREC Robust, etc., provide text but do not provide query-document feature vectors. In this paper, we present Istella22, a new dataset that enables such comparisons by providing both query/document text and strong query-document feature vectors used by an industrial search engine. The dataset consists of a comprehensive corpus of 8.4M web documents, a collection of query-document pairs including 220 hand-crafted features, relevance judgments on a 5-graded scale, and a set of textual queries used for testing purposes. Istella22 enables a fair comparison of traditional learning-to-rank and transfer ranking techniques on exactly the same data. LtR models exploit the feature-based representations of training samples while transformers-based neural rankers can be evaluated on the corresponding textual content of queries and documents. Through preliminary experiments on Istella22, we find that neural re-ranking approaches lag behind LtR models in terms of absolute performance. However, LtR models identify the scores from neural models as strong signals. Istella22 enables the IR community to study neural and traditional LtR on the same data.

BibTeX:

@inproceedings{dato:sigir2022-istella,
  author = {Dato, Domenico and MacAvaney, Sean and Nardini, Franco Maria and Perego, Raffaele and Tonellotto, Nicola},
  title = {The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation},
  booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2022},
  doi = {10.1145/3477495.3531740}
}