CODEC: Complex Document and Entity Collection

Authors: Iain Mackie, Paul Owoicho, Carlos Gemmell, Sophie Fischer, Sean MacAvaney, Jeff Dalton

Appeared in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

Links/IDs:
arXiv: 2205.04546
smac.pub: sigir2022-codec

Abstract:

CODEC is a new document and entity ranking benchmark that focuses on complex research topics. We target essay-style information needs of social science researchers, e.g. ‘How has the UK’s Open Banking Regulation benefited Challenger Banks?’ CODEC includes 36 topics developed by researchers and a new focused web corpus with semantic annotations including entity links. It includes expert judgments on 15,369 documents and entities (426.9 per topic) from diverse automatic and interactive manual runs. The manual runs include over 300 query reformulations, providing data for query performance prediction and automatic rewriting evaluation. CODEC includes analysis of state-of-the-art systems, including dense retrieval and neural re-ranking. The results show the topics are challenging, with headroom for document and entity ranking improvement. Query expansion with entity information shows significant gains on document ranking ([email protected] improves by 8%), indicating the resource’s value for evaluating and improving entity-oriented search. We also show that the manual query reformulations improve document ranking and entity ranking performance. Overall, CODEC provides challenging research topics to support developing and evaluating new entity-centric search methods.

BibTeX:

@inproceedings{mackie:sigir2022-codec,
  author = {Mackie, Iain and Owoicho, Paul and Gemmell, Carlos and Fischer, Sophie and MacAvaney, Sean and Dalton, Jeff},
  title = {CODEC: Complex Document and Entity Collection},
  booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2022},
  url = {https://arxiv.org/abs/2205.04546},
  doi = {10.1145/3477495.3531712}
}