
SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

Short conference paper. dblp: conf/emnlp/MacAvaneyCG20. ACL: 2020.emnlp-main.341.

Authors: Sean MacAvaney, Arman Cohan, Nazli Goharian

Appeared in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)


With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus. Clinicians, researchers, and policy-makers need to be able to effectively search these articles. In this work, we present a zero-shot ranking algorithm that adapts to COVID-related scientific literature by filtering training data from another collection down to medical-related queries, using a neural re-ranking model pre-trained on scientific text (SciBERT), and filtering the candidate document collection. Ablation analysis on the TREC-COVID dataset shows that each of these components is beneficial. The method ranks top among zero-shot methods on the TREC-COVID Round 1 leaderboard, and exhibits a P@5 of 0.80 and an nDCG@10 of 0.68 when evaluated on both Round 1 and 2 judgments. The method even outperforms models trained or tuned on TREC-COVID data. As one of the first search methods to thoroughly evaluate COVID-19 search, we hope that this not only serves as a strong baseline, but also helps in the global crisis.
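The three components named in the abstract (filtering training queries, re-ranking, and filtering candidate documents) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword-based query filter, the publication-date cutoff, and the term-overlap scorer (a toy stand-in for the SciBERT re-ranker) are all assumptions made for illustration, as are the names `Doc`, `filter_training_queries`, `filter_candidates`, `overlap_score`, and `rerank`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Tuple


@dataclass
class Doc:
    """Hypothetical document record used only in this sketch."""
    doc_id: str
    text: str
    published: date


def filter_training_queries(queries: Iterable[str],
                            medical_terms: Iterable[str]) -> List[str]:
    """Keep training queries that mention at least one medical term.

    Assumption: a plain keyword match stands in for whatever
    medical-query filter is actually used.
    """
    terms = {t.lower() for t in medical_terms}
    return [q for q in queries if terms & set(q.lower().split())]


def filter_candidates(docs: Iterable[Doc], cutoff: date) -> List[Doc]:
    """Keep candidate documents published on or after `cutoff`.

    Assumption: a date cutoff is one plausible way to filter the
    candidate collection; the abstract does not specify the criterion.
    """
    return [d for d in docs if d.published >= cutoff]


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms found in the text.

    This replaces the neural re-ranker so the sketch has no dependencies.
    """
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & set(text.lower().split())) / len(q_terms)


def rerank(query: str,
           docs: Iterable[Doc],
           score: Callable[[str, str], float] = overlap_score
           ) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort best-first."""
    scored = [(d.doc_id, score(query, d.text)) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a real system the `score` argument would be a trained re-ranking model, and the filtered training queries would be used to train it before any COVID-related query is seen, which is what makes the setting zero-shot.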

BibTeX

@inproceedings{macavaney:emnlp2020-sledge,
  author = {MacAvaney, Sean and Cohan, Arman and Goharian, Nazli},
  title = {SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
  year = {2020},
  url = {https://arxiv.org/abs/2010.05987}
}