Page tree

Date & Time

20:00 UTC Wednesday 22nd November 2017

Teleconference Details

To join the meeting please go to https://snomed.zoom.us/j/471420169

Further information can be found at SLPG meeting information

Goals

  • Review proposed Query language examples

Apologies


Agenda and Meeting Notes

Description
Owner
Notes

Welcome and apologies


Query Language
  • Review proposed Query Language examples
  • Consider issues and potential resolutions
Confirm next meeting date/time

Next meeting to be held at 20:00 UTC on Wednesday 20th December 2017

  File Modified
Microsoft Powerpoint Presentation Draft SNOMED Query Language - Examples for Discussion.pptx 2017-Nov-20 by Linda Bird


  • No labels

10 Comments

  1. Linda, all

    Looking through the examples I wonder if we should consider some of the following:

    1. much more explicit reference to the RF2 tables and attributes when adding {{  }}  filters. No doubt this would come at the expense of 'human-readability' in all but the most trivial cases, but given the goal of  '...query[ing] over SNOMED CT RF2 content...' perhaps human-readability was always going to be a casualty of the query language. I have in mind, in particular, the examples where Concepts are followed by a filter that makes reference to properties of terms/descriptions - an inevitably further entries in language reference sets (and, see below, relationships). Whilst there is probably a predictable shorthand that would avoid table names ('a term filter on a concept necessarily refers to 'Descriptions.term WHERE Descriptions.ConceptId=Concepts.Id' etc.), it strikes me as preferable to agree a notation (for use inside filter braces) that ties each filter property to a table (even if it still leaves the exact nature of the 'join' implicit). This could be the prefixes already offered, or (if required) could perhaps be a compromise of shorter aliasies (e.g. << 64572001 |Disease| {{ D.term = “*heart*”, D.typeId =  synonym, D.active=0, C.module=core, D.module='UKClinicalModule' }} - any active core disease concepts where there's a (now inactive) UK description matching “*heart*”). Not quite sure how this would extend to various refsets, but there's probably a scalable shorthand (like C, D, R for the various types/patterns, and then tightening up on each actual refset by reference to its refsetId etc). The alternative 'implicit until there's a chance of ambiguity' approach seems too loose. In particular, << 246061005 | Attribute | {{ active=0 }} (after an attribute name - see below) would probably mostly be used to refer to the status of the relationship, but would be indistinguishable from the same filter looking only for attributes where the name is now inactive. 
    2. We touched on 'where' each filter should go last time. looking at SPARQL and CYPHER (I'm sure other participants know of other relevant examples), I wonder if the following would work:
      1. A 'global' header/prefix filter that 'comes first' declares the substrate, any global cross-table filters, and maybe any tokens/shorthands used later.
      2. Any subsequent filter only has the scope of the resolved set of concepts immediately preceding it, unless the scope is 'extended' by parentheses (like the examples on slide 23)
      3. Relationship filters would sit:
        1. after the attribute Name for role relationships - thus being filters that could contain C, D, Refset and R filter prefixes
        2. for hierarchical relationships the filter would sit after each constraint operator (e.g. << {{ R.module=UKExtension, R.active=0 }} 64572001 |Disease|. I accept this probably needs further unpacking. I recall you said there were problems with this approach, but I wonder if this is more a consequence the constraint operator 'shorthand', and where a more verbose query representation of ancestry/descendants (as used in e.g. CYPHER) would be isomorphic with role relationships.

    Ed

  2. Dear All,

    I support Ed's no 1 in that I also would prefer explicit references to tables rather than having to assume (or rely on defaults) which component the effectiveTime applies to.

    I also have another issue about having multiple filters applying to components of the same kind. When querying translations it is sometimes needed to query for EN terms and SV terms simultaneously to check consistency of translations. E.g. concepts with EN terms "*substance*" and SE terms "*medel*" (I actually needed to do exactly this yesterday!). Do we allow stacking of filters, like * {{ languageCode = EN, term = "*substance*" }} {{  languageCode = SV, term = "*medel*" }}. The formal semantics of filters need to be defined as well, here I have asumed that filters evaulate to sets of concept IDs and that filter application evaluate to the intersection of the filter set and the preceding result set, either a result of ECL evaluation or another filter.

    /Daniel


  3. Thanks for your thoughts on this Ed and Daniel. Here are some ramblings in response:

    • I'd be happy for the substrate and/or other bits and pieces we need to be declared at the start of the query... however, I remember we had a use case where we wanted to compare 2 snapshot versions - for example '20170731 international edition' MINUS '20170131 international edition', to find the concepts that were in the 20170731 version that were not in 20170131 version. To support use cases such as this (and many other examples) I think we need to allow nesting of queries (with filters) inside other queries. This may not lend itself to defining the substrate in a single header ... just something to think about.
    • I like the suggestion to make the filters more explicit in terms of the table they refer to. And I like the shorter form of these filter prefixes that Ed has suggested - namely C for Concept, D for Description, R for Relationship  ... and I would suggest adding M for Refset Member. We will have to modify the examples to check that this always works as we would expect.
    • The example that Ed gave of "<< 64572001 |Disease| {{ D.term = “*heart*”, D.typeId =  synonym, D.active=0, C.module=core, D.module='UKClinicalModule' }}" raised some interesting points, including Daniel's suggestion of how we filter on multiple descriptions. It would be great if we can get really clear semantics on exactly what each filter means, and how the filter should be evaluated. I don't think it's clear from this example that the description being constrained is necessarily the same description. The letter "D" appears 4 times in the example filter - so are those 4 filters referring to the same Description, or 4 different Descriptions? We can assert that if it is in the same filter, then it must refer to the same Description ... but visually I don't think this is necessarily clear. So, with this in mind, what does the group think about this idea (... just brainstorming here):
      • We could 'type' the filters, using the C:, D:, R:, and M: labels to indicate what type of component the subsequent criteria in the filter applies to (e.g. {{ D:  term = "*heart*" }}
        • Each C type of filter could be used 0..1 times, and would mean "The concept itself has the following properties" 
          • i.e. For each concept C in the result set, C has the following concept properties
        • Each D type of filter could be used 0..* times, and would mean "There exists a description on the given concept, which has the following properties"
          • i.e. For each concept C in the result set, there exists a description D with D.conceptId = C.id plus the listed description properties
        • Each M type of filter could be used 0..* times, and would mean "There exists a refset member whose referenced component is the given concept OR whose referenced component is a description on the given concept, which has the following properties"
          • ie. For each concept C in the result set, there exists a refset member M with M.referencedComponentId = C.id and the listed refset member properties
      • We could treat each filter as applying to a (possibly) different component (e.g. "There exists a description X for which criteria 1 is true" and "There exists a description Y for which criteria 2 is true" logic)
        • Question: Do we need a way of saying that the 2 descriptions must necessarily be separate/different descriptions?
      • We could treat each criteria within a single filter as applying to a single component (e.g. "There exists a description X for which criteria 1 is true AND criteria 2 is true").
      • So, for example, Ed's query would look like - Options (for discussion)
        • << 64572001 |Disease| {{ C: module = core }} {{ D: term = “*heart*”, typeId =  synonym, active=0, module='UKClinicalModule' }}
        • << 64572001 |Disease| {{ C.module = core ,  D.term = “*heart*”, D.typeId =  synonym, D.active=0, D.module='UKClinicalModule' }}
        • ... which would logically mean "Descendants or self of |Disease|, where the concept has module = |core|, and where there exists a description on this concept, which has term = "*heart*", typeId = synonym, active = 0 and module = |UKClinicalModule|.
      • This would also allow Daniel's query to be defined like - Options (for discussion)
        • << 64572001 |Disease| {{ D: term = “*substance*”, languageCode = "EN"}}, {{ D: term = "*medel*", languageCode = "SE" }}
        • << 64572001 |Disease| {{ D1: term = “*substance*”, languageCode = "EN"}}, {{ D2: term = "*medel*", languageCode = "SE" }}
        • << 64572001 |Disease| {{ D1.term = “*substance*”, D2.languageCode = "EN",  D2.term = "*medel*", D1.languageCode = "SE", D1.term = D2.term, D1.id != D2.id }}
          • ... which would logically mean "Descendants or self of |Disease|, where there exists a description on this concept, which has term = "*substance*" and languageCode = "EN", AND where there also exists a description, which has the term = "*medel*" and languageCode = "*SE*" }}
    • With respect to Relationship filters, I think I agree with Ed ... but let's follow through the discussion a bit more:
      • We could (note that I am NOT recommending this) allow relationship filters to apply to the results of a query (as per the Concept and Description filters), such that C.id = R.sourceId. For example:
        • << 64572001 |Disease| {{ C: definitionStatus = primitive }} {{ R: characteristicType = stated, typeId != 116680003 }}
          • Which would mean "Descendants or self of |Disease|, where the concept has a definitionStatus = primitive, and where there exists a relationship on this concept (ie C.id = R.sourceId) where characteristicType = stated and typeId NOT = |is a|."
      • However, what if we wanted to join on R.destinationId instead? Perhaps (and I'm NOT recommending this) we would need to specify the different join condition like:
        • << 64572001 |Disease| {{ C: definitionStatus = primitive }} {{ R [destinationId]: characteristicType = stated, typeId !116680003 }}
          • Which could (?) mean "Descendants or self of |Disease|, where the concept has a definitionStatus = primitive, and where there exists a relationship where C.id = R.destinationId and R.characteristicType = stated and R.typeId NOT = |is a|."
      • If instead we adopt Ed's suggestion of applying the relationship filter to a refinement (e.g. by placing it directly after the attribute name) - we could say:
        • << 64572001 |Disease| : 363698007 |Finding site| {{ R: characteristicType = stated }} = << 39607008 |Lung structure|
          • ... which would mean "The descendants or self of |Disease|, C, where there exists a relationship where C.id = R.sourceId and R.typeId = |Finding site| and R.characteristicType = stated.
        • I think this makes sense (ie to join on both sourceId and typeId). My only concern with this approach is that IF we allow the Relationship filter to be applied here, as well as the more general case above (where R.typeId is not restricted), that this example might be confused with the similar query:
          • << 64572001 |Disease| : (363698007 |Finding site| {{ R: characteristicType = stated }}) = << 39607008 |Lung structure|
            • Which may mean that the attribute concept A used in the refinement is in the set {|Finding site|} where there exists a relationship for which A.id = R.sourceId and R.characteristicType = stated .... or in other words, the relationship filter applies to the relationships on the concept |Finding site|, rather than applying to the relationships on |Disease| which have a type of |Finding site|.
      •  But, if we agree to ONLY do it Ed's way .... that is a relationship filter must ALWAYS be placed after an attribute A, and the join condition is ALWAYS that C.id =R.sourceId AND R.typeId = A, THEN any ambiguity probably goes away.
      • Please note, that it would be nice if the reverse flag "R" changed the join condition to being C.id = R.destinationId AND R.typeId = A. For example:
        • << 91723000 |Anatomical structure| : R 363698007 |Finding site| {{ R: characteristicType = stated }}) = << 125605004 |Fracture of bone|
          • ... would/could mean "The descendants of anatomical structure that are the finding site of a descendant or self of fracture of bone, where the finding site relationship has characteristicType = stated"
      • If we now consider applying relationship filters to constraint functions "<", "<<", ">", ">>" then the logic is still not completely clear to me:
        • "< X", "<< X", "<! X"  would imply a join condition of C.id = R.sourceId, R.typeId = |is a|. However, does this also imply that R.destinationId = X? For example:
          • << {{ R: module=UKExtension, active=0 }} 64572001 |Disease|
            • Does this mean:
              • (a) The descendants or self of |Disease| which have an |is a| relationship to |Disease|, where this |is a| relationship is inactive and in the UKExtension module.
              • (b) The descendants or self of |Disease| which have an |is a| relationship to a concept that is a descendant of |Disease|, where this |is a| relationship is inactive and in the UKExtension module
              • (c) The descendants or self of |Disease| which have an |is a| relationship to any concept , where that |is a| relationship is inactive and in the UKExtension module
        • Similarly, the interpretation of a relationship filter on "> X" ">> X", ">! X"  would need to be defined consistently.

    Apologies for the long reply, but thanks again for your thoughts ... we can discuss this further at this week's SLPG meeting.

  4. A short additional note -

    • The other potential alternative to Ed's Relationship filter suggestion is to group the refinement's attribute-value pair (with brackets) and apply the relationship filter to the attribute-value pair. For example:
      • << 64572001 |Disease| : (363698007 |Finding site| = << 39607008 |Lung structure|) {{ R: characteristicType = stated }}
        • This indicates more clearly that the filter applies to the tuple - (1) focus_concept (2) attribute (3) value
        • However, some may consider it a disadvantage that this approach requires the attribute-value pair to be bracketed to ensure that it is unambiguous as to which relationships the filter applies
        • There is also the potential for this approach to be harder to read, if the attribute's value is a long nested expression, resulting in the relationship filter moving too far away from the respective attribute.

    ... but I thought I'd mention the alternative, for the benefit of the discussion.

    Thanks!
    Linda.

  5. Indeed filters on relationships are tricky.  As I see it, there are four general expression forms for ECL to consider:

    1. A
    2. A : B = C
    3. A.B
    4. A : { B = C }

    It seems to me that a filter that immediately follows any of A, B, or C above has a clear single concept in scope (bound to 'C' inside the filter). However, when and where do we get a relationship binding?

    1. no relationships in play
    2. only after the C? or what about after the B?  after the colon or equals ?
    3. only after the B? or after the period ? (analogous to after the colon / equals)
    4. as for 2, but also inside or outside the { } ?

    Furthermore, are relationships always bound via either sourceId or destinationId, or is it possible to have unbound relationships, or are they never bound and always require explicit binding?

    No answers from me yet - just trying to frame my thinking & assumptions.


  6. There's also the issue of when filters get applied in the evaluation process. This most obviously impact on cardinality constraints. Consider:

    64572001|Disease| : [1..1] 363698007 |Finding site| = < 39607008 |Lung structure| {{ C.moduleId = 'AU Core' }}

    Does this match all Diseases that have exactly one Finding site which is a descendant of Lung structure and then further restrict/filter that to those where the relationship is in the 'AU Core' module, or does it instead match all descendants of Disease that have exactly one Finding site that is a descendant of Lung structure AND is also in the 'AU Core' module.  In short, does the cardinality constraint apply before or after the filter?

  7. My interpretation is that the complete ECL is evaluated, and then the filter applies to the results of that evaluation. To achieve other results, you would need to next a query (with brackets) inside the spots in the ECL that can accept a subexpression.

    So, with the example that you've shown, it would first find the diseases that have exactly one finding site that is a descendant of lung structure. And then it would further restrict this result to those where the concept is in the AU core - so that implies the "AND" logic of your second option.

    I'm not sure why you think it might restrict to those where the relationship is in the 'AU Core', when the filter refers to the attribute "C.moduleId". To filter the results based on the Relationship's  module, we would need to use R.moduleId ... and exactly where a relationship filter should be placed in the query is the topic of further discussion ... but the current proposal (from Ed) is to put relationship filters after the attribute in the refinement that matches the relationships the you want to filter. So, for example:

    64572001|Disease| : [1..1] 363698007 |Finding site| {{ R.moduleId = 'AU Core' }} = < 39607008 |Lung structure| 

    I would interpret this as matching the diseases with exactly one finding site that (a) is in the AU core module, and (b) has a value that is a type of lung structure.

    Kind regards,
    Linda.

    1. You're right my natural language translation was inaccurate.  What I meant to say was that the Lung structure descendant concept was in the 'AU Core' module.  I was interpreting the expression based on the following bracketing:

      64572001|Disease| : [1..1] 363698007 |Finding site| = ( < 39607008 |Lung structure| {{ C.moduleId = 'AU Core' }} )

      That is, I've been assuming that filters bind to the closest thing.  But your interpretation suggests the following:

      (< 64572001|Disease| : [1..1] 363698007 |Finding site| = < 39607008 |Lung structure|) {{ C.moduleId = 'AU Core' }}

      which I would have instead expressed as:

      64572001|Disease| {{ C.moduleId = 'AU Core' }} : [1..1] 363698007 |Finding site| = < 39607008 |Lung structure|



      On brief reflection, I suspect that filters on Relationships are likely to be rare, and thus I'd be very open to requiring explicit bracketing to indicate that's what is intended.  For example:

      64572001|Disease| : [1..1] ( 363698007 |Finding site| = < 39607008 |Lung structure| ) {{ R.moduleId = 'AU Core' }}

      "exactly one relationship that is in AU Core (but zero or more may be in other modules)"

      64572001|Disease| : ( [1..1] 363698007 |Finding site| = < 39607008 |Lung structure| ) {{ R.moduleId = 'AU Core' }}

      "exactly one relationship in any module, then restrict to AU Core relationships"

      64572001|Disease| : [2..2] ( 363698007 |Finding site| = < 39607008 |Lung structure| ) {{ R.moduleId = 'AU Core' }}


      "exactly two relationships that are in AU Core (but zero or more may be in other modules)"


      64572001|Disease| : ( [2..2] 363698007 |Finding site| = < 39607008 |Lung structure| ) {{ R.moduleId = 'AU Core' }}


      "exactly two relationships in any module, then require at least one to be in AU Core"

      To write the query that there must be 2 or more relationships but all must be in AU Core means that you need to use the "zero cardinality trick":

      64572001|Disease| :
          [2..*] 363698007 |Finding site| = < 39607008 |Lung structure|,
          [0..0] ( 363698007 |Finding site| = < 39607008 |Lung structure| ) {{ R.moduleId != 'AU Core' }}


      For the dotted notation I think it's okay to bind the relationship to the one matching the closest dot:

      Any . 246090004 | Associated finding| 363698007 |Finding site| {{ R.moduleId != 'AU Core' }}

      R binds to the Finding site relationships, not the Associated finding relationships.  For the latter you'd write:

      Any . 246090004 | Associated finding| {{ R.moduleId != 'AU Core' }} 363698007 |Finding site|


      1. I would like to see some use cases before we go further down the relationship filtering path, particularly with the DL update and the deprecation of the stated relationship tabular RF2 representation. In the new scheme, relationships would be added axiom-wise rather than at the individual relationship level. While it would be possible to "extract" stated relationship information from the OWL axioms, the questions why and how would need to be answered (in that order).

        That leaves the concept and description filters which could be given a much simpler and "ECL-like" interpretation "any concept having properties X, Y, Z" and "any concept with a description having properties X, Y, Z", e.g. << 105590001 | Substance (substance) | AND {{ C.module=900000000000207008 }} AND {{ D.languageCode = EN, D.term = "*substance*" }} AND {{  D.languageCode = SV, D.term = "*medel*" }}

  8. Thanks Michael and Daniel! Let's come back to these points in this week's meeting.