When performing data analytics over clinical data, it is important to understand the interdependency between the terminology and the structural information model. For example, it is not sufficient to find a diagnosis of 56265001 |heart disease| and assume that the patient has heart disease. Instead, the surrounding information model must be considered to discover whether this is, for example, a confirmed diagnosis for the patient themselves, a suspected or preliminary diagnosis for the patient, or perhaps a family history of heart disease in the patient's paternal grandfather. Contextual or qualifying information about a code may appear in a variety of places, including:
By understanding where and how this contextual or qualifying information is represented, more appropriate queries can be created.
When the same semantics may be represented in both the information model and the terminology, there is also a risk of ambiguity as to how these two representations should be combined. This is clearly demonstrated by models in which both the information model and the terminology can represent 'negation' or 'absence'. Does the combination of 'negation' in the information model and 'absence' in the terminology indicate:
It is important in these situations to have clear rules about how the semantics in the information model and the terminology should be combined.
The challenge often becomes even greater when heterogeneous data sources are integrated. When different information models represent the same semantics using different combinations of structure versus terminology, retrieval and reuse may miss similar information. To avoid false negatives or false positives in the query results, the integration and/or analytics processes must resolve these differences.
For example, in the figure below, the system on the left uses the 'Family history' structural heading to indicate that the selected disease is a family history, while the system on the right precoordinates this within the terminology. When integrating or querying across these data sources, these semantics need to be harmonized so that queries return accurate results.
Two ways of recording family history of diabetes mellitus
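The harmonization step can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the record layout, the field names `heading` and `code`, the 'Problem list' heading, and the `PRECOORDINATED_FAMILY_HISTORY` lookup table are all hypothetical, and the concept identifiers should be checked against the SNOMED CT release in use. In practice, the mapping from a precoordinated concept to its context and associated finding would more likely be derived from the concept's defining relationships than maintained by hand.

```python
from dataclasses import dataclass

# Concept identifiers used for illustration; verify against your release.
DIABETES_MELLITUS = "73211009"      # |diabetes mellitus|
FH_DIABETES_MELLITUS = "160303001"  # |family history of diabetes mellitus|

# Hypothetical hand-written lookup: precoordinated family-history concept
# mapped to the finding it is a family history of.
PRECOORDINATED_FAMILY_HISTORY = {FH_DIABETES_MELLITUS: DIABETES_MELLITUS}


@dataclass(frozen=True)
class Statement:
    """A source-independent view of a record: who has what."""
    subject: str  # "patient" or "family member"
    finding: str  # SNOMED CT concept identifier of the finding


def normalize(record: dict) -> Statement:
    """Map a record from either source system to a common Statement."""
    code = record["code"]
    # Context carried by the terminology (precoordinated concept).
    if code in PRECOORDINATED_FAMILY_HISTORY:
        return Statement("family member", PRECOORDINATED_FAMILY_HISTORY[code])
    # Context carried by the information model (structural heading).
    if record.get("heading") == "Family history":
        return Statement("family member", code)
    # No family-history context found: treat as about the patient.
    return Statement("patient", code)


# One record per style, plus a diagnosis for the patient themselves.
left_system = {"heading": "Family history", "code": DIABETES_MELLITUS}
right_system = {"heading": "Problem list", "code": FH_DIABETES_MELLITUS}
patient_diagnosis = {"heading": "Problem list", "code": DIABETES_MELLITUS}

records = [left_system, right_system, patient_diagnosis]
target = Statement("family member", DIABETES_MELLITUS)
family_history = [r for r in records if normalize(r) == target]
```

Querying on the normalized `Statement` rather than on the raw code finds both family history records and excludes the patient's own diagnosis, avoiding the false negatives and false positives described above.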
Even when the same information model is used, different systems may populate this model with differing levels of precoordination. For example, the three clinical systems shown below each collect data about a 'suspected lung cancer' diagnosis in a different way. As a result, when given a common data model (also shown below), these systems may populate it in different ways. When this occurs, queries must consider all possible representations of the data, to ensure that contextual and qualifying information about each code is correctly interpreted.
Three ways of recording suspected lung cancer
Three ways of populating a common Problem Diagnosis model
SNOMED CT is in a unique position to resolve many of these challenges, using the techniques described in sections 6.4 Description Logic Over Terminology and 6.5 Description Logic Over Terminology and Structure. In particular, SNOMED CT enables the computation of equivalence and subsumption between alternative representations of data. For example, the postcoordinated expression
22253000 |pain|: 363698007 |finding site| = 56459004 |foot|
(which can be represented either in a single data element or using two separate data elements for 22253000 |pain| and 56459004 |foot|) can be automatically determined to be equivalent to the precoordinated concept 47933007 |foot pain| (stored in a single data element).
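As a toy illustration of this equivalence check, the sketch below expands a precoordinated concept into a focus concept plus attribute-value pairs and then compares the results. It is deliberately simplified: the `DEFINITIONS` table is a hand-written stand-in containing only the single definition implied by the example above, role groups are ignored, and expansion is only one level deep. A real implementation would rely on a description logic classifier over the full SNOMED CT release. The names `Expression`, `expand`, `equivalent`, and `from_separate_elements` are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Tuple

PAIN = "22253000"          # |pain|
FINDING_SITE = "363698007" # |finding site|
FOOT = "56459004"          # |foot|
FOOT_PAIN = "47933007"     # |foot pain|


class Expression(NamedTuple):
    """A focus concept with optional attribute-value refinements."""
    focus: str
    refinements: FrozenSet[Tuple[str, str]] = frozenset()


# Hand-written stand-in for the concept definitions in a release
# (assumption: simplified, no role groups).
DEFINITIONS = {
    FOOT_PAIN: Expression(PAIN, frozenset({(FINDING_SITE, FOOT)})),
}


def expand(expr: Expression) -> Expression:
    """Replace a defined focus concept by its definition (one level only)."""
    definition = DEFINITIONS.get(expr.focus)
    if definition is None:
        return expr
    return Expression(definition.focus,
                      definition.refinements | expr.refinements)


def equivalent(a: Expression, b: Expression) -> bool:
    """Two expressions are treated as equivalent if their expansions match."""
    return expand(a) == expand(b)


def from_separate_elements(finding: str, site: str) -> Expression:
    """Combine two data elements, assuming the second holds the finding site."""
    return Expression(finding, frozenset({(FINDING_SITE, site)}))


precoordinated = Expression(FOOT_PAIN)
postcoordinated = Expression(PAIN, frozenset({(FINDING_SITE, FOOT)}))
two_elements = from_separate_elements(PAIN, FOOT)
```

With these definitions, `equivalent(precoordinated, postcoordinated)` and `equivalent(precoordinated, two_elements)` both hold, while an unrefined `Expression(PAIN)` is not equivalent to either.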
Some cases exist, however, where SNOMED CT is not currently able to automatically establish equivalence. These cases primarily relate to concepts whose meaning is not yet fully modeled by the SNOMED CT concept model. For example, the two approaches for representing a 'twin pregnancy' shown below cannot currently be computed as equivalent using SNOMED CT.
Two non-equivalent ways of recording a twin pregnancy using SNOMED CT
The SNOMED CT concept model continues to be extended to support equivalence and subsumption testing within an increasing number of hierarchies of SNOMED CT.