Evaluating the Explainability of Neural Rankers

Authors: Saran Pandian, Debasis Ganguly, Sean MacAvaney

Appeared in: Proceedings of the 46th European Conference on Information Retrieval Research (ECIR 2024)

Links/IDs:
DOI: 10.1007/978-3-031-56066-8_28
DBLP: conf/ecir/PandianGM24
arXiv: 2403.01981
Google Scholar: 7wWfoDgAAAAJ:ZHo1McVdvXMC
Semantic Scholar: 36fb06e3f855f0f18148a787926964e8fa37f918
Enlighten: 312910
smac.pub: ecir2024-xair

Abstract:

Information retrieval models have witnessed a paradigm shift from unsupervised statistical approaches to feature-based supervised approaches to completely data-driven ones that make use of the pre-training of large language models. While the increasing complexity of the search models has been able to demonstrate improvements in effectiveness (measured in terms of relevance of top-retrieved results), a question worthy of a thorough inspection is - “how explainable are these models?”, which is what this paper aims to evaluate. In particular, we propose a common evaluation platform to systematically evaluate the explainability of any ranking model (the explanation algorithm being identical for all the models that are to be evaluated). In our proposed framework, each model, in addition to returning a ranked list of documents, is also required to return a list of explanation units or rationales for each document. This meta-information from each document is then used to measure how locally consistent these rationales are as an intrinsic measure of interpretability - one that does not require manual relevance assessments. Additionally, as an extrinsic measure, we compute how relevant these rationales are by leveraging sub-document level relevance assessments. Our findings show a number of interesting observations, such as that sentence-level rationales are more consistent, that an increase in complexity mostly leads to less consistent explanations, and that interpretability measures offer a complementary dimension of evaluation of IR systems because consistency is not well-correlated with nDCG at top ranks.
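
The abstract does not spell out how the extrinsic measure is computed, so the following is only an illustrative sketch of the general idea: score each document's rationales by the fraction that coincide with sub-document units judged relevant, then average over the ranked list. The function names, the exact-match comparison, and the use of precision are all assumptions made for this example; they are not taken from the paper.

```python
from typing import Dict, List, Set


def rationale_precision(rationales: List[str], relevant_units: Set[str]) -> float:
    """Fraction of a document's rationales that are judged relevant.

    Assumption: rationales and assessed units are compared by exact string match.
    """
    if not rationales:
        return 0.0
    hits = sum(1 for r in rationales if r in relevant_units)
    return hits / len(rationales)


def mean_rationale_precision(
    run: Dict[str, List[str]],
    assessments: Dict[str, Set[str]],
) -> float:
    """Average rationale precision over all documents in a ranked list.

    `run` maps a document ID to the rationales a ranker returned for it;
    `assessments` maps a document ID to its relevant sub-document units.
    Both structures are hypothetical stand-ins for real evaluation data.
    """
    scores = [
        rationale_precision(rats, assessments.get(doc_id, set()))
        for doc_id, rats in run.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a document with two rationales of which one is assessed as relevant scores 0.5, and documents without any assessments contribute 0.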

BibTeX:

@inproceedings{pandian:ecir2024-xair,
  author = {Pandian, Saran and Ganguly, Debasis and MacAvaney, Sean},
  title = {Evaluating the Explainability of Neural Rankers},
  booktitle = {Proceedings of the 46th European Conference on Information Retrieval Research},
  year = {2024},
  url = {https://arxiv.org/abs/2403.01981},
  doi = {10.1007/978-3-031-56066-8_28}
}