Effective matching of patients to clinical trials using entity extraction and neural re-ranking

Kusa, Wojciech; Mendoza, Óscar E; Knoth, Petr; Pasi, Gabriella and Hanbury, Allan (2023). Effective matching of patients to clinical trials using entity extraction and neural re-ranking. Journal of biomedical informatics, 144, article no. 104444.

DOI: https://doi.org/10.1016/j.jbi.2023.104444


Introduction: Clinical trials (CTs) often fail due to inadequate patient recruitment. Finding eligible patients involves comparing the patient's information with the CT eligibility criteria. Automated patient matching promises to improve this process, yet CT retrieval remains difficult for two main reasons: the semantic complexity of matching unstructured patient descriptions with semi-structured, multi-field CT documents, and the need to capture the meaning of negation expressed in the eligibility criteria.
Objectives: This paper tackles the challenges of CT retrieval by presenting an approach that addresses the patient-to-trials paradigm. Our approach involves two key components in a pipeline-based model: (i) a data enrichment technique for enhancing both queries and documents during the first retrieval stage, and (ii) a novel re-ranking schema that uses a Transformer network in a setup adapted to this task by leveraging the structure of the CT documents.
Methods: We use named entity recognition and negation detection in both the patient description and the eligibility section of CTs. We further classify the conditions mentioned in patient descriptions and CT eligibility criteria as current, past, or family medical conditions. This extracted information is used to boost the importance of disease and drug mentions in both query and index for lexical retrieval. Furthermore, we propose a two-step training schema for the Transformer network used to re-rank the results from the lexical retrieval. The first step focuses on matching patient information with the descriptive sections of trials, while the second step aims to determine eligibility by matching patient information with the criteria section.
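The enrichment idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the entity and negation sets are supplied by hand in place of a real named entity recognition and negation detection step, the boost value of 2.0 is arbitrary, dropping negated mentions is only one possible way to use negation information, and the helper names (tokenize, enrich_query, bm25_scores) are hypothetical. The scorer is a plain BM25 variant that multiplies each query term's contribution by its weight.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def enrich_query(tokens, entities, negated, boost=2.0):
    """Map each query term to a weight.

    Assumption for illustration: entity mentions (diseases, drugs) get a
    higher weight, and negated entities are dropped from the query.
    """
    weights = {}
    for tok in tokens:
        if tok in negated:
            continue
        weights[tok] = boost if tok in entities else 1.0
    return weights


def bm25_scores(query_weights, docs, k1=1.2, b=0.75):
    """Score each document with BM25, scaling every term by its query weight."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term, weight in query_weights.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            term_score = tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
            score += weight * idf * term_score
        scores.append(score)
    return scores


# Toy data, invented for this example only.
trials = [
    "inclusion adults with type 2 diabetes treated with metformin",
    "inclusion adults with asthma using inhaled corticosteroids",
]
patient = tokenize("58 year old man with diabetes on metformin no asthma")
weights = enrich_query(
    patient,
    entities={"diabetes", "metformin", "asthma"},  # hand-labelled, not NER output
    negated={"asthma"},                            # hand-labelled negation
)
scores = bm25_scores(weights, trials)
```

In this toy setup the negated mention of asthma no longer pulls the asthma trial up the ranking, while the boosted disease and drug terms favour the diabetes trial. A real system would obtain the entity and negation sets from dedicated extraction models and would apply comparable enrichment on the document side as well.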
Results: Our findings indicate that the inclusion criteria section of the CT strongly influences the relevance score in lexical models, and that the enrichment techniques for queries and documents improve the retrieval of relevant trials. The re-ranking strategy, based on our training schema, consistently enhances CT retrieval and improves precision in retrieving eligible trials by 15%.
Conclusion: The results of our experiments suggest that using extracted entities is beneficial. Moreover, our proposed re-ranking schema shows promising effectiveness compared to larger neural models, even with limited training data. These findings offer valuable insights for improving methods for retrieval of clinical documents.
