SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

Type: short conference paper

Citations: 40

Authors: Sean MacAvaney, Arman Cohan, Nazli Goharian

Appeared in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

Links/IDs:

DOI: 10.18653/v1/2020.emnlp-main.341
DBLP: conf/emnlp/MacAvaneyCG20
ACL: 2020.emnlp-main.341
arXiv: 2010.05987
Google Scholar: 7wWfoDgAAAAJ:M3ejUd6NZC8C
Semantic Scholar: 05598331268614305ff844cea001f5b22f3519c9
smac.pub: emnlp2020-sledge

Abstract:

With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus. Clinicians, researchers, and policy-makers need to be able to effectively search these articles. In this work, we present a zero-shot ranking algorithm that adapts to COVID-related scientific literature by filtering training data from another collection down to medical-related queries, using a neural re-ranking model pre-trained on scientific text (SciBERT), and filtering the candidate document collection. Ablation analysis on the TREC-COVID dataset shows that each of these components is beneficial. The method ranks top among zero-shot methods on the TREC COVID Round 1 leaderboard, and exhibits a P@5 of 0.80 and an nDCG@10 of 0.68 when evaluated on both Round 1 and 2 judgments. The method even outperforms models trained or tuned on TREC-COVID data. As one of the first search methods to thoroughly evaluate COVID-19 search, we hope that this not only serves as a strong baseline, but also helps in the global crisis.
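The two metrics quoted in the abstract, P@5 and nDCG@10, can be illustrated with a minimal sketch. This is not the paper's evaluation code. It assumes graded relevance labels (0 = not relevant, higher = more relevant), linear gain, and a log2 rank discount; other nDCG variants use an exponential gain and would give different numbers. The function names and the example labels are hypothetical.

```python
import math
from typing import Optional, Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked documents with a relevance grade > 0."""
    return sum(1 for r in rels[:k] if r > 0) / k


def dcg_at_k(rels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(
    rels: Sequence[int],
    k: int,
    all_judged: Optional[Sequence[int]] = None,
) -> float:
    """DCG of the ranking divided by the DCG of an ideal ordering.

    `all_judged` (an assumption of this sketch) holds the grades of every
    judged document for the query; if omitted, the ideal ordering is built
    from the ranked list itself.
    """
    pool = all_judged if all_judged is not None else rels
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades of the top 10 results for one query.
    ranked = [2, 1, 0, 2, 1, 0, 0, 1, 0, 0]
    print("P@5     =", precision_at_k(ranked, 5))
    print("nDCG@10 =", round(ndcg_at_k(ranked, 10), 4))
```

Per-query scores like these are typically averaged over all topics to obtain the figures reported on a leaderboard.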

BibTeX:

@inproceedings{macavaney:emnlp2020-sledge,
  author = {MacAvaney, Sean and Cohan, Arman and Goharian, Nazli},
  title = {SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
  year = {2020},
  url = {https://arxiv.org/abs/2010.05987},
  doi = {10.18653/v1/2020.emnlp-main.341}
}