SNOMED International, along with a team of research- and technology-focused subject matter experts, recently contributed to a research paper documenting the development of entity linking models that link spans of free text in clinical notes with specific concepts in the clinical terminology SNOMED CT. The paper, planned as part of SNOMED International’s Entity Linking Challenge held in 2024, reports outcomes derived from that competition and was recently published in the highly respected, peer-reviewed Journal of the American Medical Informatics Association.
In the SNOMED CT Entity Linking Challenge, which ran from January to March 2024, participants trained machine learning models to link spans of text in clinical notes with specific concepts in the terminology, using the largest publicly available dataset of de-identified clinical notes annotated with SNOMED CT concepts. The challenge was supported by platform host partner DrivenData, which hosts online data science competitions; AI consultancy Veratai; PhysioNet, the Research Resource for Complex Physiologic Signals; and an annotation team.
The paper, co-authored by SNOMED International, Veratai and the winning teams of the competition, describes the basis of the work, a set of 74,808 annotations curated across 272 discharge notes and spanning 6,624 unique clinical concepts, as well as the evaluation process. It compares the approaches used by the winning solutions and highlights the most challenging factors affecting clinical entity linking models. It also describes the dataset and the policy-based approach used to develop its “ground truth” annotations, and provides an example of the scoring approach. Importantly, it analyzes the reasons for low-scoring concepts and details a number of lessons learned.
Read the release here.