Appeared in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)
Abstract:
Dense retrieval approaches are of increasing interest because they can better capture contextualised similarity compared to sparse retrieval models such as BM25. Among the most prominent of these approaches is TCT-ColBERT, which trains a light-weight "student" model from a more expensive "teacher" model. In this work, we take a closer look into TCT-ColBERT concerning its reproducibility and replicability. To structure our study, we propose a three-stage perspective on reproducing the training, inference, and evaluation of model-focused papers, each stage using artefacts produced from different stages in the pipeline. We find that, perhaps as expected, precise reproduction is more challenging when the complete training process is conducted, rather than just inference from a released trained model. Each stage provides the opportunity to perform replication and ablation experiments. We are able to replicate (i.e., produce an effective independent implementation of) model inference and dense indexing/retrieval, but are unable to replicate the training process. We conduct several ablations to cover gaps in the original paper, and make the following observations: (1) the model can function as an inexpensive re-ranker, establishing a new Pareto-optimal result; (2) the index size can be reduced by using lower-precision floating point values, but only if ties in scores are handled appropriately; (3) training needs to be conducted for the entire suggested duration to achieve optimal performance; and (4) student initialisation from the teacher is not necessary.
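To illustrate observation (2), the sketch below (not code from the paper) shows how storing dense document vectors at half precision roughly halves the index size, and why the resulting score ties need a deterministic tie-breaking rule (here, falling back to document id). The array sizes, NumPy-based scoring, and tie-breaking choice are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy index: 1,000 documents with 768-dimensional embeddings.
doc_vecs_fp32 = rng.standard_normal((1000, 768)).astype(np.float32)
doc_vecs_fp16 = doc_vecs_fp32.astype(np.float16)  # roughly half the storage
print(doc_vecs_fp32.nbytes, doc_vecs_fp16.nbytes)

query = rng.standard_normal(768).astype(np.float32)

# Inner-product scoring, as in dense retrieval; the reduced precision of the
# stored vectors can cause distinct documents to receive identical scores.
scores = doc_vecs_fp16.astype(np.float32) @ query

# Break ties deterministically: sort primarily by score (descending),
# secondarily by document id (ascending), so tied scores always yield
# the same ranking across runs.
doc_ids = np.arange(len(scores))
order = np.lexsort((doc_ids, -scores))
top10 = order[:10]
print(top10, scores[top10])
```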
BibTeX:
@inproceedings{wang:sigir2022-tctrepro,
  author    = {Wang, Xiao and MacAvaney, Sean and Macdonald, Craig and Ounis, Iadh},
  title     = {An Inspection of the Reproducibility and Replicability of TCT-ColBERT},
  booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year      = {2022},
  doi       = {10.1145/3477495.3531721}
}