
Overview

When SNOMED International makes a new release, some NRCs will choose to translate any new terms that have been created, and supply a text file of the translations.    The most efficient way to import this new data into a Managed Service TS is to form it into an RF2 archive and use the back end API directly (via the Swagger screens).  The archive is pre-populated with description ids (and language reference set entries) to avoid putting an unacceptable strain on the Component Identifier Service.

The process for importing this data into the MS TS involves the following steps.

  1. (First Time Only - Setup) Download and build the required utility software 
  2. Obtain sufficient description identifiers (with the appropriate partition and namespace) for each new description
  3. Supply the text file and the identifiers file to the utility to produce an RF2 zip file
  4. Create a Task on the appropriate project in UAT and import the RF2 data. 
  5. Users perform validation
  6. Create a Task on the appropriate project in Prod and import the RF2 data

Each of these steps is detailed in turn below.    An example task is MSSP-59.

1. Working with Translation Files

Before processing the supplied file, there are various potential issues to consider (and possibly to warn the NRC to watch for!), which are documented here:  Working with Translation Files

 1b. First Time Setup

  • Clone this GitHub project to your local machine:  https://git.ihtsdotools.org/ihtsdo/termserver-scripting eg    git clone git@git.ihtsdotools.org:ihtsdo/termserver-scripting.git
  • cd termserver-scripting
  • git checkout develop
  • mvn clean package
  • java -cp target/termserver-scripting.jar org.ihtsdo.termserver.scripting.delta.GenerateTranslation     Running this proves that the code built correctly; without any parameters, it will print the required parameters.

Note that you can either run the utility from the command line, or load it into Eclipse   (mvn eclipse:eclipse to create an eclipse project file) and run it as a Java Application.

 

The parameters used here are as follows:

-m The module concept eg 45991000052106 for SE

-l The language reference set concept eg 46011000052107

-c The appropriate cookie for the server being connected to eg ims-ihtsdo=dO4ekHd000ZTn7EtlRagVA00

-a The author that the task is assigned to eg pwilliams  (this has no effect here, since a delta archive is being constructed rather than the API being used)

-p The project to be accessed eg SENOVA

-i The generated description identifier file: a flat file of SCTIDs with no header, eg /Users/Peter/Google\ Drive/005_Ad_hoc_queries/019_SE_Translation/sctids_cis_generated_prod.txt

-f The file to be processed eg /Users/Peter/Google\ Drive/005_Ad_hoc_queries/019_SE_Translation/rf2_delta_translation_terms_20160731_SV_UTF-8_CASEFIX.txt

 

Running from the command line could be done as follows:

java -cp target/termserver-scripting.jar org.ihtsdo.termserver.scripting.delta.GenerateTranslation  -m 45991000052106 -l 46011000052107 -c ims-ihtsdo=dO4ekHd9X7ZTn7EtlRagVA00 -a pwilliams -p SENOVA -i /Users/Peter/Google\ Drive/005_Ad_hoc_queries/019_SE_Translation/sctids_cis_generated_prod.txt -f /Users/Peter/Google\ Drive/005_Ad_hoc_queries/019_SE_Translation/rf2_delta_translation_terms_20160731_SV_UTF-8_CASEFIX.txt

2.  Obtain Identifiers

  • "Head" the file to check if there are any encoding problems.    The 20170131 file had some odd instances of multiple carriage returns and the encoding was ISO-8859-1 rather than UTF-8.   Fix this with:
  • iconv -f ISO-8859-1 -t UTF-8 rf2_delta_translation_terms_2017-01-31_SV-UTF-8.txt | tr -d '\r' > rf2_delta_translation_terms_2017-01-31_SV-UTF-8_encoding_corrected_unix.txt
  • wc -l rf2_delta_translation_terms_2017-01-31_SV-UTF-8_encoding_corrected_unix.txt       Note this number; we'll call it num_terms.
  • Use the production CIS to generate some partition 11 (Extension/Description) SCTIDs in the appropriate namespace eg 1000052
  • The file format also changed between 20160731 and 20170131.   It is now expected to be: Concept_Id TAB Swedish_Term TAB Case_Significance_SCTID
  • First login to get an authenticated token:   http://cis.ihtsdotools.org:3000/docs/#!/Authentication/login     You can get the username and password from the production termserver configuration file.
  • The Swagger interface is still trying to connect to termspace, so SSH onto the box and use curl directly:
  • curl -H "Content-Type: application/json" -X POST -d '{"username":"termserver-prod","password":"xxxpasswordxxx"}' http://localhost:3000/api/login
  • curl -H "Content-Type: application/json" -X POST -d '{"namespace":1000052,"partitionId":"11","quantity":num_terms,"software":"PWI via Swagger","comment":"SE Translation","generateLegacyIds":"false"}' http://localhost:3000/api/sct/bulk/generate?token=XXXAuthenticationTokenXXX
  • The server will return a job id (JSON element "id").  Recover the generated ids from the sctId table; the easiest way is to log in via Sequel Pro and select them directly:   select sctid from sctId where jobId = 91266.   Check that the job has in fact completed and that you have the full count required.
  • Save the IDs to a text file to be passed to the utility application using the -i parameter.
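The encoding fix above can be rehearsed on a small sample first. This sketch fabricates a two-line translation file in ISO-8859-1 with Windows line endings (the filenames and terms are illustrative, not from a real NRC delivery) and applies the same iconv/tr pipeline:

```shell
# Fabricate a sample file in ISO-8859-1 with CRLF line endings
# (\366 is the ISO-8859-1 byte for 'ö', \326 for 'Ö')
printf 'F\366rkylning\r\n\326gonlock\r\n' > sample_iso.txt

# Convert to UTF-8 and strip carriage returns, as in the step above
iconv -f ISO-8859-1 -t UTF-8 sample_iso.txt | tr -d '\r' > sample_utf8.txt

# Count the lines -- this is the num_terms value needed for the CIS request
wc -l < sample_utf8.txt
```

The final wc gives 2 here, confirming no lines were lost or duplicated by the conversion.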

3. Run the Utility Application

The program will echo the parameters it's running with and allow you to change them.   You'll probably need to change the Termserver root.   For the others, you can just press return to accept the default (or supplied) value.

Select an environment 
0: http://localhost:8080/
1: https://dev-term.ihtsdotools.org/
2: https://uat-authoring.ihtsdotools.org/
3: https://uat-flat-termserver.ihtsdotools.org/
4: https://prod-authoring.ihtsdotools.org/
5: https://dev-ms-authoring.ihtsdotools.org/
6: https://uat-ms-authoring.ihtsdotools.org/
7: https://prod-ms-authoring.ihtsdotools.org/
Choice: 6
Specify Project [SEMAYB]:
Time delay between tasks (throttle) seconds [30]:
Time delay between concepts (throttle) seconds [5]:
Outputting Report to /Users/Peter/code/snowowl-rest-api-updates/termserver-scripting/results_GenerateTranslation_20170424_225941_uat.csv
Outputting data to output_3/SnomedCT_RF2Release_SE1000052_20170424/
Targetting which namespace? [1000052]:
Targetting which moduleId? [45991000052106]:
Targetting which language code? [sv]:
Targetting which language refset? [46011000052107]:
What's the Termserver root? [MAIN/2017-01-31/SNOMEDCT-SE/]:
What's the Edition? [SE1000052]:
Recovering current state of SEMAYB from TS (uat)
INFO org.ihtsdo.termserver.scripting.client.SnowOwlClient - Recovering export from https://uat-ms-authoring.ihtsdotools.org/snowowl/snomed-ct/v2/exports/89a97829-5b64-4bc7-a198-68d79c56b47f
INFO org.ihtsdo.termserver.scripting.client.SnowOwlClient - Recovering exported archive from https://uat-ms-authoring.ihtsdotools.org/snowowl/snomed-ct/v2/exports/89a97829-5b64-4bc7-a198-68d79c56b47f/archive
INFO org.ihtsdo.termserver.scripting.client.SnowOwlClient - Sleeping 20 seconds first.

The archive is created in the current directory, eg SnomedCT_RF2Release_SE1000052_20170425.zip, along with a results file showing any issues encountered.  Look for serious issues using grep, eg:

  • egrep "CRITICAL|skipped" results_rf2_delta_translation_terms_2017-01-31_SV-UTF-8_encoding_corrected_unix_20170425_102535_uat.csv
  • Also check that the number of lines in the description file matches the expected line count ie num_terms
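Both checks can be rehearsed against fabricated stand-ins for the real output files (the filenames and contents below are illustrative) so the pattern is clear before running it on a genuine results file:

```shell
# Fabricated results file standing in for results_..._uat.csv
printf 'CRITICAL,Concept not found\nINFO,Description created\nWARN,Term skipped\n' > results_example.csv

# Fabricated description delta file; real RF2 files carry a header row
printf 'id\teffectiveTime\n111\t20170425\n222\t20170425\n' > sct2_Description_Delta_example.txt

# Surface serious issues in the results file, as in the step above
egrep "CRITICAL|skipped" results_example.csv

# Check the description file line count against num_terms (+1 for the header row)
num_terms=2
[ "$(wc -l < sct2_Description_Delta_example.txt)" -eq "$((num_terms + 1))" ] \
  && echo "line count OK" || echo "line count mismatch"
```

Here egrep surfaces two problem lines and the count check passes, since the description file holds num_terms rows plus its header.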

Issue - if an HTTP 500 error appears in the client and the TS logs show a null pointer exception, try recovering the SNAPSHOT file locally:

  • https://prod-ms-authoring.ihtsdotools.org/snowowl/snomed-ct/v2/#!/exports/beginExport  using the following JSON (change dates as required):
  • {"branchPath":"MAIN/2017-01-31/SNOMEDCT-SE/SEMAYB","transientEffectiveTime":"20170425","type":"SNAPSHOT"}
  • Recover the location UUID from the location header in the response and recover the file local to the box using (replace UUID as required):   
  • curl -O -usnowowl:snowowl localhost:8080/snowowl/snomed-ct/v2/exports/10fbe2e8-3506-4c7f-b709-4f451c596760/archive
  • If the export cannot be obtained, then a copy of the relevant release can be copied into the termserver-scripting directory and renamed to <Project>_<Environment>.zip   eg SEMAYB_uat.zip

4. Import the archive into UAT
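As a hedged sketch (assuming the same Snow Owl import endpoints that section 6 references; the task branch segment SEMAYB and the transient date shown are illustrative), the import is created by POSTing a configuration such as:

```json
{
  "type": "DELTA",
  "branchPath": "MAIN/2017-01-31/SNOMEDCT-SE/SEMAYB",
  "createVersions": false
}
```

to https://uat-ms-authoring.ihtsdotools.org/snowowl/snomed-ct/v2/imports, then uploading the generated zip to the returned import's archive sub-resource to run it. As section 6 notes, create and run are separate calls.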

 

5.  User Acceptance Testing

The user should check:

  • That the expected terms are shown in the task
  • That the appropriate acceptability and case significance values have been set
  • That classification, validation and promotion all complete as expected.

6.  Import the archive into Prod

  • Follow the instructions for importing into UAT, but use the following end-point (and of course you'll have a new task id to use):
  • https://prod-ms-authoring.ihtsdotools.org/snowowl/snomed-ct/v2/#!/imports/create
  • Remember to both create the import and then (via a different endpoint) run it
  • The status of the import can be determined by adding the importId (recovered from the location header returned) to this endpoint:  
    https://prod-ms-authoring.ihtsdotools.org:443/snowowl/snomed-ct/v2/imports/<import uuid here>

 

 
