Working Group: Refined Metadata

Description

Addressing the various Metadata topics raised in the TRAG and MAG

Objectives

  1. Define all use cases
  2. Define all detailed requirements
  3. Identify and agree solutions
  4. Put up a straw man for discussion in the various AGs
  5. Include any real world examples of solutions in action


Relevant Documents

Name Version Published

REQUIREMENTS

Categorization of the Type of Metadata

1 - Technical package metadata

  • Use case - how to validate the release package and work out how to use it.
  • Full - which modules are included in the Full and what is the latest version
  • Snapshot - information needed to calculate the snapshot from the full 
    • Need to know the Edition module and version (version URI) 
    • Modules and version included in the snapshot (esp necessary if not in the MDRS)
  • Delta - defining the to/from 
    • version edition URI (what edition the Delta is from and to) 
    • additional modules outside of the MDRS dependencies (to/from URIs - also includes the versions to/from) 
  • Extension 
    • Version edition and URI 
    • The needed language refset 
    • Extension package (information on what is needed to be added to the extension to make it into an Edition) 

2 - Component "gaps" metadata

  • The preferred language of the refset 
  • Language/dialect code for language reference sets
  • Simplemap patterns that don't specify the nature of the t
  • Correlation id in the conceptmap
  • Field names for each refset pattern 
  • Foreign (non-SNOMED CT) CodeSystem URI for map type reference set source or target

3 - IP and Release Notes metadata


4 - Component (refset) metadata

Requirement requests from the Working group:

  • associating dialect alias with a language reference set
  • field names of each of the refset patterns 




  • A lot of metadata about a release package could equally be encoded as JSON or using one (or more) refset file formats. It doesn't necessarily follow that you would have to load the WHOLE release before you could interpret an RF2 encoding of the metadata, which might then tell you that you had loaded the wrong thing: I would have thought you could cheerfully load and parse, e.g., an srefset file in isolation. The advantage of a refset encoding over JSON is that it would support composition across multiple extensions, and snapshotting, in ways that are technically familiar and not currently supported so easily by JSON itself. You would also avoid the need to duplicate the information, once as JSON to be read before you load the data and then again as refsets to be read if you decide to load the data.
  • SNOMED CT canonical CodeSystem resource
  • In case they have become lost since June 2020 when I first suggested them, the following could be useful extensions to the more human-readable elements of release bundle metadata offered by the existing JSON beastie:
    • documentationURL: link to wherever release documentation is posted
    • licenseURL: alternative to the existing licenseStatement element, to be used when the required license extends beyond core SNOMED content to include the licenses for any number of allied products of which some significant part is embedded within the release, typically as crossmaps
    • helpdesk: email contact for further information and support
    • updatesURL: URL of at least one canonical place where this release bundle and its future updates might be obtained, for the benefit of anybody who has no idea where the one in their hand actually came from
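
As a rough illustration of the four suggested elements, the sketch below builds a metadata dictionary in Python and serialises it as JSON. Only the key names (documentationURL, licenseURL, helpdesk, updatesURL and the existing licenseStatement) come from the suggestion above. Every value is a placeholder, and the helper missing_proposed_keys is hypothetical rather than part of any agreed format.

```python
import json

# Key names taken from the suggestion above; nothing here is an agreed standard.
PROPOSED_KEYS = ("documentationURL", "licenseURL", "helpdesk", "updatesURL")

# Placeholder values only (example.org is a reserved documentation domain).
example_metadata = {
    "licenseStatement": "placeholder license text",
    "documentationURL": "https://example.org/release/docs",
    "licenseURL": "https://example.org/release/license",
    "helpdesk": "support@example.org",
    "updatesURL": "https://example.org/release/latest",
}


def missing_proposed_keys(metadata):
    """Return the proposed keys that are absent from a metadata dictionary."""
    return [key for key in PROPOSED_KEYS if key not in metadata]


if __name__ == "__main__":
    print(json.dumps(example_metadata, indent=2))
    print(missing_proposed_keys({"licenseStatement": "placeholder"}))
```

A check like this could run before any release content is loaded, since it only needs the metadata file.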


Requirement requests from Australia:

  • modules are held in a version
  • edition and version 
  • missing metadata from FHIR ConceptMap and ValueSet for implicit reference sets - implicit ConceptMap target URI and relationship type where the reference set doesn't state it explicitly
  • modules within a snapshot
  • the definition of the delta 
  • https://www.healthterminologies.gov.au/access/snomed-ct-au/reference-sets-2/?ui:fhirVersion=R4
  • JSON file or no, we do need a way to have machine readable metadata for a release package indicating what it contains.
  • For the Full format, that is really just the set of contained modules and that can be cross checked against the content (but makes a good QA point)
  • For a Delta if it is present it is really what the Delta is relative to - the “from” version
  • For a Snapshot it is even more critical - what is the root point that Snapshot was calculated from? That comes down to at least an edition module ID and a version, but given the issues with dependency versus composition with the MDRS we also need to be able to express additional modules outside strict MDRS dependency that were “composed into” the Snapshot calculation
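
The Full, Delta and Snapshot points above can be sketched as a small data structure plus a cross-check against the package content. This is a minimal sketch in Python: the class and field names (PackageMetadata, composed_modules, delta_from and so on) are illustrative assumptions, not names from the existing JSON file, and the module identifiers are placeholders rather than real SCTIDs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModuleVersion:
    module_id: str  # placeholder for a module SCTID
    version: str    # effectiveTime-style version, e.g. "20210131"


@dataclass
class PackageMetadata:
    release_type: str                     # "Full", "Snapshot" or "Delta"
    edition: ModuleVersion                # edition module ID and version
    modules: List[ModuleVersion] = field(default_factory=list)
    # Modules "composed into" the package outside strict MDRS dependency.
    composed_modules: List[ModuleVersion] = field(default_factory=list)
    delta_from: Optional[str] = None      # the "from" version of a Delta


def check_package(meta, module_ids_in_content):
    """Cross-check declared modules against the module IDs found in the files."""
    problems = []
    declared = {m.module_id for m in meta.modules + meta.composed_modules}
    declared.add(meta.edition.module_id)
    for module_id in sorted(set(module_ids_in_content) - declared):
        problems.append(f"module {module_id} is in the content but not declared")
    if meta.release_type == "Delta" and meta.delta_from is None:
        problems.append("Delta does not state its 'from' version")
    return problems


if __name__ == "__main__":
    meta = PackageMetadata(
        release_type="Delta",
        edition=ModuleVersion("module-A", "20210131"),
        modules=[ModuleVersion("module-B", "20210131")],
    )
    for problem in check_package(meta, ["module-A", "module-B", "module-C"]):
        print(problem)
```

The same check doubles as the QA point mentioned for the Full format: the declared module set is compared with what the files actually contain.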

Requirement requests from the UK:

  1. From Freshdesk ticket https://ihtsdo.freshdesk.com/a/tickets/32991 (Reply to Mark Wardle once we've made decisions):
    1. Please could the canonical name of the release be included in the release metadata file? We will blend multiple releases together (ie International + UK clinical + UK dm+d) and a name would be useful in registering what is installed, the versions and licencing requirements. Otherwise, we have to derive from the filename of the downloaded distribution file.
    2. On a separate note, the SI metadata is correctly formatted JSON but the current UK clinical and dm+d metadata is NOT correctly formatted. I have raised this with NHS digital but it might perhaps be useful to mandate that other organisations distributing SNOMED releases should format their metadata to an agreed standard.
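
Both points in the ticket lend themselves to an automated check. The sketch below, in Python, first confirms that a metadata file is well-formed JSON and then that it carries a release name. The key name "name" and the function check_release_metadata are assumptions for illustration; no agreed key is defined here.

```python
import json


def check_release_metadata(text, required_keys=("name",)):
    """Return a list of problems found in the text of a release metadata file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not well-formed JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    # "name" is a hypothetical key for the canonical release name.
    return [f"missing key: {key}" for key in required_keys if key not in data]


if __name__ == "__main__":
    print(check_release_metadata('{"name": "Example Edition"}'))
    print(check_release_metadata('{name: "unquoted key"}'))
```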

Requirement requests from SNOMED International (internal):

  • Here is the metadata we currently manually input into our public browser in order for various UI features to work as expected for different extensions.
  • name - e.g. Belgian Edition. This is already contentious: we list everything as an Edition although most are packaged as extensions.
  • countryCode - the two-letter ISO country code (as in the ISO table on Wikipedia), e.g. "be". This should probably be upper case to match the standard.
  • defaultLanguageCode - this cannot be detected from content because some extensions have a lot of translated content but still want to use English as the default. Again, the two-letter ISO language code.
  • defaultLanguageReferenceSets - list of SCTIDs. This controls which terms are displayed in the concept details and their order. The set of language reference sets could be found using the content but many extensions do not want the GB language refset from the International Edition. The desired order can probably not be found from the content.
  • maintainerType - I'm not sure if this information should be included in the package metadata or how it should be named. The values we have are "International", "Managed Service" or "Community Content". We use this to list extensions in different categories.
  • Apart from the last item we would consider all of these useful metadata for all SCT packages. If this metadata was included it would allow more automatic configuration of terminology servers when new packages are loaded.
  • Proposal to increase the level of metadata available for authors to log decisions made during content authoring
    • This is a subject that would be helpful to include Jim in the discussions, as he has some definite opinions on how to improve the metadata in this area. 
    • Some suggestions would be to make more detailed information available for authors to describe their reasons for inactivation (especially in those areas where currently they are forced to use inactivation reason codes that aren't completely representative of the reasons in that instance).
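
The browser metadata fields listed above (name, countryCode, defaultLanguageCode, defaultLanguageReferenceSets, maintainerType) could be carried in a package and normalised on load roughly as follows. This is a sketch under assumptions: the function normalise_browser_metadata is hypothetical, the language code is an arbitrary example, and the reference set identifiers are placeholders rather than real SCTIDs.

```python
# The three values quoted above for maintainerType.
ALLOWED_MAINTAINER_TYPES = {"International", "Managed Service", "Community Content"}


def normalise_browser_metadata(raw):
    """Return a cleaned copy of the browser metadata for one package."""
    meta = dict(raw)
    # Upper case to match the ISO country code standard, as suggested above.
    meta["countryCode"] = raw["countryCode"].upper()
    meta["defaultLanguageCode"] = raw["defaultLanguageCode"].lower()
    # Order matters: it controls the order in which terms are displayed.
    meta["defaultLanguageReferenceSets"] = list(raw["defaultLanguageReferenceSets"])
    maintainer = raw.get("maintainerType")  # treated as optional here
    if maintainer is not None and maintainer not in ALLOWED_MAINTAINER_TYPES:
        raise ValueError(f"unknown maintainerType: {maintainer}")
    return meta


example = {
    "name": "Belgian Edition",
    "countryCode": "be",
    "defaultLanguageCode": "EN",  # arbitrary example value
    "defaultLanguageReferenceSets": ["refset-id-1", "refset-id-2"],  # placeholders
    "maintainerType": "Managed Service",
}

if __name__ == "__main__":
    print(normalise_browser_metadata(example))
```

A terminology server could apply something like this when a new package is loaded, instead of relying on values entered by hand.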


Requirement requests from the TRAG (through other topics):

  • WE'RE STILL MISSING THE IDENTIFICATION OF THE ACTUAL MAP PRODUCT ITSELF, AND THE VERSION OF THAT ENTITY
    • (eg) "ICNP version Jan 2019" should exist as metadata somewhere within the ICNP map product package...
    • + possibly even the direct URI?
    • Examples for MEDDRA?? (as now more relevant than ICNP)
    • SUGGESTION IS TO USE THE JSON FILE FOR THIS ?????


  • HOWEVER, CAN THE .JSON FILE REALLY COVER OFF ALL REQUIREMENTS???
    • OR DO WE NEED TO DESIGN SOME REFSETS AS WELL?? 
  • Proposal for a complementary file to the MDRS - the "ECRS" ("Edition Composition Reference Set")
    • FINAL DECISIONS:
      • a)  We will use the new JSON data on Package Composition to resolve the false positive results in the current MDRS RVF assertions, by having the assertions check the new JSON data for any modules that are not explicitly called out in the package's MDRS file (as it's an extension or similar), or that have conflicting versions.
      • b)  We will use the new .JSON data to allow correct resolution of URIs
      • c)  We will NOT change the RF2 spec to move to transitive dependencies in the MDRS. 
        • 5.2.4.2 Module Dependency Reference Set - currently states 

          "Dependencies are not transitive and this means that dependencies cannot be inferred from a chain of dependencies. If module-A depends on module-B and module-B depends on module-C, the dependency of module-A on module-C must still be stated explicitly."

        • Although moving to transitive dependencies is a valid theoretical stance (as dependencies are inherently transitive), the weight of historical data across all products over many years means that a new approach, in which all dependencies are assumed to be transitive and are stated only where there is a problem, could cause confusion when read alongside all previous releases, where stated dependencies are NOT there only when there is a problem. We will therefore continue to review this use case in future TRAG meetings, to see whether the case for changing the spec becomes strong enough to warrant a change to all our products that also runs contrary to all historical releases.
      • New planned changes to .JSON metadata file:  Update to the .JSON file metadata - addition of "Package Composition" data
      • FEED INTO THE METADATA WORKING GROUP DISCUSSIONS..
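
The non-transitive rule quoted above (module-A must state its dependency on module-C explicitly) can be checked mechanically. The sketch below, in Python, walks the stated dependencies and reports any module reachable through a chain but not stated directly. It is a simplification under assumptions: it ignores the source and target versions that a real MDRS row carries, the function name is hypothetical, and the module names are the placeholders used in the quotation.

```python
def missing_explicit_dependencies(stated):
    """Map each module to the dependencies implied by a chain but not stated.

    `stated` maps a module ID to the set of module IDs it explicitly depends on.
    """
    missing = {}
    for module, direct in stated.items():
        reachable = set()
        stack = list(direct)
        while stack:
            target = stack.pop()
            if target == module or target in reachable:
                continue  # skip self-references and modules already visited
            reachable.add(target)
            stack.extend(stated.get(target, ()))
        gap = reachable - set(direct)
        if gap:
            missing[module] = gap
    return missing


if __name__ == "__main__":
    stated = {
        "module-A": {"module-B"},
        "module-B": {"module-C"},
        "module-C": set(),
    }
    print(missing_explicit_dependencies(stated))
```

With the example input, module-A is reported as missing an explicit dependency on module-C, which is the case the specification text describes.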
  • Bespoke Delta file creation tool
    • Current question is no longer whether or not we still believe this to be necessary, as we're all now agreed that it is.  
    • Instead, the new question is what are the specific requirements?
      • a)  Deltas to be generated from any point in time to any other point in time
      • b)  Metadata to be included somehow (to be discussed further in the Metadata Working Group) to record critical information, such as which dates the Delta is from and to, which modules are incorporated, etc
      • c)  Compound Deltas (including ALL changes since the relevant date, including ALL changes in the dependent release package(s), rather than just the latest state - so these are "Full file to Full file" Deltas, as we are used to) are favoured so far. However, we should continue to assess any potential use cases for Atomic Deltas (effectively "Snapshot file to Snapshot file" Deltas) as we go along, in case it becomes apparent that there is a valid business case for the new Delta generation tool to provide either or both of these Delta file types.
      • d)  It needs to support the future requirements for Service Based delivery, once we transition over
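
One possible reading of requirements a) and c), sketched in Python: given rows from a Full file, a Compound Delta keeps every row whose effectiveTime falls after the "from" version and up to the "to" version, while an Atomic Delta keeps only the latest such row per component. The row layout (dictionaries with id, effectiveTime and active) and both function names are assumptions for illustration, and effectiveTime is assumed to be a YYYYMMDD string so that string comparison follows date order.

```python
def compound_delta(full_rows, from_version, to_version):
    """All rows changed after from_version, up to and including to_version."""
    return [
        row for row in full_rows
        if from_version < row["effectiveTime"] <= to_version
    ]


def atomic_delta(full_rows, from_version, to_version):
    """Only the latest changed row per component within the same window."""
    latest = {}
    for row in compound_delta(full_rows, from_version, to_version):
        current = latest.get(row["id"])
        if current is None or row["effectiveTime"] > current["effectiveTime"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda row: row["id"])


if __name__ == "__main__":
    full = [
        {"id": "1", "effectiveTime": "20200131", "active": "1"},
        {"id": "1", "effectiveTime": "20200731", "active": "0"},
        {"id": "1", "effectiveTime": "20210131", "active": "1"},
        {"id": "2", "effectiveTime": "20210131", "active": "1"},
    ]
    print(len(compound_delta(full, "20200131", "20210131")))
    print(len(atomic_delta(full, "20200131", "20210131")))
```

Either function would need the from and to versions recorded as metadata alongside its output, which is the point made in requirement b).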
  • Computer readable metadata
    • Examples of extending this metadata:
      • .json format 5 ?? (Please see Michael Lawley's comments on 16/04/2021 here:  Re: Working Group: Refined Metadata)
      • Namespace data
      • Individual external Refset data
      • ranges of permitted values
      • mutability, etc?
      • Package Name? (Please see Michael Lawley's comments on  20/04/2021 here:  Re: Working Group: Refined Metadata:  Yes, regarding the "Name" entry, it would be ideal if it could be used to populate the "Product Name" field in a list of available packages (and other required and relevant fields for MLDS).  Then the zip contents would be sufficient to automatically populate MLDS (or an ATOM-based Syndication feed))
      • WE'RE STILL MISSING THE IDENTIFICATION OF THE ACTUAL MAP PRODUCT ITSELF, AND THE VERSION OF THAT ENTITY
        • (eg) "ICNP version Jan 2019" should exist as metadata somewhere within the ICNP map product package...
        • + possibly even the direct URI?
        • SUGGESTION IS TO USE THE JSON FILE FOR THIS - group to provide examples of how this would look to the TRAG for review...
      • ANYTHING TO SUPPORT FREQUENT DELIVERY USEFULLY???? 
    • Also create 2 new pages -
  • Release packaging conventions and File Naming Conventions
    • We really need to tackle the Delta "from" and "to" release versions in the Delta file naming, and possibly in the package file naming. At the moment it is impossible to know what a Delta is relative to, which makes it hard to process safely. Perhaps beyond the scope of this document, but quite important.
  • Reference set metadata
    • Michael + David + Harold agreed to create a straw man to put up in the next meeting and take this further...
      • Michael Lawley - where is the discussion on this currently?
      •  Michael confirmed (20210420) that this straw man was never created, and so we should use the published .json file as the straw man for future discussions... 
    • Can we link this in to the .JSON file above? (Computer readable metadata) - yes, done!
    • IN FACT, are there any requirements for machine-readable or human-readable metadata that can't be addressed with extensions to the new .JSON file in the release packages?
      • No, not that people can foresee!
    • This will therefore be rolled into the holistic discussions on Metadata in the new Metadata Working Group...