4.3. Data Type Options

All data in SNOMED CT release files conforms to one of a set of data types defined in the Release File Specification (3.1.2 Release File Data Types). The database tables that will hold release file data should be designed in ways that consistently map the general data type characteristics in the SNOMED CT specification to appropriate data types in the should be represented consistently using an appropriate data type supported by the database.

Table 4.3-1 shows how each release file data type is represented in the MySQL example database. Possible alternative data type mapping are shown where appropriate with notes explaining the rationale for the preferred choice.¹

Table 4.3-1: Mapping from Release Files Data Types to MySQL Data Types

Release File Data Type	Description	DB Data Type	Possible Alternatives	Notes
SCTID	A SNOMED CT identifier, between 6 and 18 digits long, as described in 6.2 SCTID Representation.	BIGINT	VARCHAR(18) CHAR(18)	Performance tests comparing BIGINT (a 64-bit Integer) and VARCHAR(18) show that BIGINT consistently performs better for almost all types of required access to the SNOMED CT database. A test set of queries took 66% of the time to complete with some tasks completed in less than half the time. CHAR(18) would also be a possible storage form but requires 2 bytes more than storage per SCTID than BIGINT. Whereas on average VARCHAR uses less storage because most SCTIDs are significantly shorter than the maximum length of 18 characters.
UUID	A Universally Unique Identifier is a 128-bit unsigned generated using a standard algorithm. UUIDs are represented as strings of hexadecimal characters split by - characters as points specified by the UUID standard.	CHAR(36)	BINARY(16) CHAR(32)	Performance tests comparing CHAR(36) and BINARY(16) indicated that CHAR(36) consistently performs significantly better than BINARY in the types of queries used in the SNOMED CT database. BINARY only uses 16 bytes compared with 36 required for CHAR(36). CHAR(36) enables UUIDs to be read and rendered without requiring additional processing to match the standard string representation of UUIDs. CHAR(32) would also be a possible storage form but requires processing to and from the standard string representation of UUIDs.
Integer	A 32-bit signed integer.	INT		Some integer columns may use far fewer than 32-bits (4 bytes) and in future some might require more. However, currently no integer values used in release files exceed the range of a 32-bit signed integer (except SCTIDs which are treated separately). Therefore, for consistency and simplicity the INT options is applied to all integer columns.
String	UTF-8 text of a specified length.			All strings values use utf8mb4 character set. VARCHAR uses less storage for strings of limited length. TEXT offers flexible solution for longer strings. CHAR could be used for strings of known length.
	If length is specified as no more than 200 characters	VARCHAR(length)	CHAR(length)
	If length is unspecified and could potentially be more than that 200 characters	TEXT
Boolean	A Boolean value, represented as one of two possible integer values (1 = true, 0 = false).	TINYINT	CHAR(1)	TINYINT uses only a single byte and it thus the most economic way to store a 0 or 1 value. Although CHAR(1) could be used it offers no advantages.
Time	A date and time format expressed as a text string in line the basic representation specified in the ISO 8601 standard. (i.e. YYYYMMDD or YYYYMMDDTHHMMSSZ)	DATETIME	TIMESTAMP VARCHAR(14) CHAR(8)	DATETIME allows date (and or time) storage in most compact form (5 bytes in MySQL 8.x). Comparison, differences, and flexible output formatting are also supported. TIMESTAMP also supports date and time, only using 4-bytes. However, this does not permit dates after 2038 and the extra byte in DATETIME completely removes the limitation. VARCHAR(14) would allow date and time in ISO format YYYYMMDDhhmmdd but requires more storage and performs less well than DATETIME. CHAR(8) would be sufficient for dates without times. This is all that is currently required for the effectiveTime field but potential uses of the Time data type would not be covered. It also uses 3 bytes per date for storage when compared with DATETIME.

If you are applying this guide to a different SQL implementation, you may need to modify some or all of these data type mappings based on assessment of the performance and storage characteristics of the available data types¹ .

Footnotes
Ref	Notes
1	The process for determining the data types used in the example database was as follows: Identify the range of data types capable of representing all possible values based on the characteristics of the general data types defined in the SNOMED CT specifications. For the SNOMED CT data types used in primary keys (SCTID, UUID and Time), assess candidate datatypes based on the performance of views and queries that use these keys for retrieval and database joins. For the string data type, consider different options for columns of different lengths and where relevant indexing requirements differ. Where the above factors do not distinguish between options, choose the data type that uses the least storage space. [ a b ]

Search

Page tree