CODEC: Complex Document and Entity Collection

Authors: Iain Mackie, Paul Owoicho, Carlos Gemmell, Sophie Fischer, Sean MacAvaney, Jeff Dalton

Appeared in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

DOI 10.1145/3477495.3531712 DBLP conf/sigir/MackieOGFM022 ACM 3477495.3531712 arXiv 2205.04546 Google Scholar 7wWfoDgAAAAJ:j3f4tGmQtD8C Semantic Scholar 21f7c8135396ee19e78fe6c9002bddf3ac6c056d Enlighten 269440 smac.pub sigir2022-codec


CODEC is a new document and entity ranking benchmark that focuses on complex research topics. We target essay-style information needs of social science researchers, e.g. ‘How has the UK’s Open Banking Regulation benefited Challenger Banks?’ CODEC includes 36 topics developed by researchers and a new focused web corpus with semantic annotations including entity links. It includes expert judgments on 15,369 documents and entities (426.9 per topic) from diverse automatic and interactive manual runs. The manual runs include over 300 query reformulations, providing data for query performance prediction and automatic rewriting evaluation. CODEC includes analysis of state-of-the-art systems, including dense retrieval and neural re-ranking. The results show the topics are challenging, with headroom for improvement in both document and entity ranking. Query expansion with entity information shows significant gains on document ranking (Recall@1000 improves by 8%), indicating the resource’s value for evaluating and improving entity-oriented search. We also show that the manual query reformulations improve both document ranking and entity ranking performance. Overall, CODEC provides challenging research topics to support developing and evaluating new entity-centric search methods.

BibTeX

@inproceedings{mackie:sigir2022-codec,
  author = {Mackie, Iain and Owoicho, Paul and Gemmell, Carlos and Fischer, Sophie and MacAvaney, Sean and Dalton, Jeff},
  title = {CODEC: Complex Document and Entity Collection},
  booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2022},
  url = {https://arxiv.org/abs/2205.04546},
  doi = {10.1145/3477495.3531712}
}