All data in SNOMED CT release files conforms to one of a set of data types defined in the Release File Specification (3.1.2 Release File Data Types). The database tables that will hold release file data should be designed in ways that consistently map the general data type characteristics in the SNOMED CT specification to appropriate data types in the should be represented consistently using an appropriate data type supported by the database.
Table 4.3-1 shows how each release file data type is represented in the MySQL example database. Possible alternative data type mapping are shown where appropriate with notes explaining the rationale for the preferred choice.1
Release File Data Type
|DB Data Type||Possible Alternatives||Notes|
Performance tests comparing BIGINT (a 64-bit Integer) and VARCHAR(18) show that BIGINT consistently performs better for almost all types of required access to the SNOMED CT database. A test set of queries took 66% of the time to complete with some tasks completed in less than half the time.
CHAR(18) would also be a possible storage form but requires 2 bytes more than storage per SCTID than BIGINT. Whereas on average VARCHAR uses less storage because most SCTIDs are significantly shorter than the maximum length of 18 characters.
A Universally Unique Identifier is a 128-bit unsigned generated using a standard algorithm.
Performance tests comparing CHAR(36) and BINARY(16) indicated that CHAR(36) consistently performs significantly better than BINARY in the types of queries used in the SNOMED CT database.
BINARY only uses 16 bytes compared with 36 required for CHAR(36).
CHAR(36) enables UUIDs to be read and rendered without requiring additional processing to match the standard string representation of UUIDs.
CHAR(32) would also be a possible storage form but requires processing to and from the standard string representation of UUIDs.
A 32-bit signed integer.
|INT||Some integer columns may use far fewer than 32-bits (4 bytes) and in future some might require more. However, currently no integer values used in release files exceed the range of a 32-bit signed integer (except SCTIDs which are treated separately). Therefore, for consistency and simplicity the INT options is applied to all integer columns.|
UTF-8 text of a specified length.
All strings values use utf8mb4 character set.
VARCHAR uses less storage for strings of limited length.
TEXT offers flexible solution for longer strings.
CHAR could be used for strings of known length.
|If length is specified as no more than 200 characters||VARCHAR(length)||CHAR(length)|
|If length is unspecified and could potentially |
be more than that 200 characters
A Boolean value, represented as one of two possible integer values (1 = true, 0 = false).
TINYINT uses only a single byte and it thus the most economic way to store a 0 or 1 value.
Although CHAR(1) could be used it offers no advantages.
A date and time format expressed as a text string in line the basic representation specified in the ISO 8601 standard.
DATETIME allows date (and or time) storage in most compact form (5 bytes in MySQL 8.x). Comparison, differences, and flexible output formatting are also supported.
TIMESTAMP also supports date and time, only using 4-bytes. However, this does not permit dates after 2038 and the extra byte in DATETIME completely removes the limitation.
VARCHAR(14) would allow date and time in ISO format YYYYMMDDhhmmdd but requires more storage and performs less well than DATETIME.
CHAR(8) would be sufficient for dates without times. This is all that is currently required for the effectiveTime field but potential uses of the Time data type would not be covered. It also uses 3 bytes per date for storage when compared with DATETIME.
If you are applying this guide to a different SQL implementation, you may need to modify some or all of these data type mappings based on assessment of the performance and storage characteristics of the available data types1 .
The process for determining the data types used in the example database was as follows: