One of the points identified in the previous section is that snapshot queries and database views are less likely to perform as well as direct access to database tables. As shown in on the SNOMED CT example database confirm that there is a substantial performance difference between these two approaches.
Testing Performance of Queries on Snapshot Tables and Snapshot Views |
Snapshot Table | Snapshot Views
| Performance Ratio | ||
Read 1 million rows from relationship snapshot | 1.52 | 11.06 | 15% | |
Read 1 million rows from description snapshot | 3.57 | 12.73 | 28% | |
Read all rows from concept snapshot | 0.66 | 2.45 | 26% | |
Total time for all operations above | 5.75 | 25.74 | 22% | |
Advanced test reading 10,000 relationships and with joins to descriptions and language reference set for the fully specified names of source, type and target concept | 2.34 | 4.70 | 50% |
Based on these findings the most effective way to optimize access to a snapshot view, is to replace the use of database views with snapshot tables. Representing the a snapshot with tables, rather than using a database view, adds roughly 2.6 Gb to the storage requirements for the example database.
The current snapshot view is essential and is used for most interactions with the database. Therefore, the performance enhancements justify use of the additional disk space required to store the current snapshot in separate tables. If there are specific reasons for extensive access to one or two retrospective snapshots, it might also be worthwhile representing those snapshots in separate tables. However, it would not be worthwhile to apply the same approach to the full range of less frequently used retrospective snapshots. Therefore, if the snapshot views defined in do not perform sufficiently well, it is worth considering ways to optimize snapshot access.
Unlike the views described earlier, the optimization methods described below require additional columns to be added to each of the full release tables. After importing release files, the tables are processed to generate values for the additional columns and this data is used to simplify the snapshot view queries by avoiding the need for nested queries.
A column is added to each full release table. This column is used to represent flags that indicate which snapshot view each row is included in.
The example SQL below illustrates the process of flag setting. In practice, while this query works it is not very efficient for several reasons. A more efficient approach would be to use a stored procedure that computes the full set of flags applicable to each row. This approach would allow the flag column in each row to be updated once rather than requiring a separate update for each snapshot view.
A configurable snapshot view can then be created tests the appropriate bit rather than requiring a nested query.
SET SQL_SAFE_UPDATES=0; update full_tableName tbl set flag=flag | 1 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20190731'); update full_tableName tbl set flag=flag | 2 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20190131'); update full_tableName tbl set flag=flag | 4 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20180731'); update full_tableName tbl set flag=flag | 8 where tbl.effectiveTime=(select max(sub.effectiveTime) from copy_full_tableName sub where sub.id=tbl.id and sub.effectiveTime<='20180131'); SET SQL_SAFE_UPDATES=1; |
Once the flags are set, a query such as the one below can be used to return component versions that are part of a particular snapshot view. The example query returns row in which the flag the second bit (value 2) is set. Based on the settings in the query above this means it would include rows that are part of the 2019-01-31 snapshot view of this table
select * from full_tableName where flag & 2; |
Limited testing of this optimization approach indicates that it is between 2 and 3 times faster than the unoptimized snapshot views. Direct access to a snapshot table is still twice as fast as this optimized approach.
The full release files in the 2019-07-31 release contain a total of approximately 16 million rows. If flags are added each of these rows will require a further 8 bytes of storage. No additional indexes are required to support this optimization. As a result the overall increase in storage requirements to support this optimization is less than 150 Mb.
As described here the approach is limited to 64 snapshot times. This is probably more than sufficient for most practical requirements. However, it and could be extended by adding an another flag column or by changing the data type of the flag column to binary.
An additional datetime column is added to each row. This column is used to represent the time at when a row was replaced by the next version of that component or reference set member.
The example SQL below illustrates the process of flag setting. In practice, while this query works it is not very efficient for several reasons. A more efficient approach would be to use a stored procedure that computes the full set of flags applicable to each row. This approach would allow the flag column in each row to be updated once rather than requiring a separate update for each snapshot view.
-- Create temporary table for the supersededTime values CREATE TEMPORARY TABLE tmp (id CHAR(36) NOT NULL,effectiveTime DATETIME,supersededTime DATETIME, PRIMARY KEY (id,effectiveTime)); -- Compute the supersededTime values for each combination of id+effectiveTime and add these to the temporary file INSERT INTO tmp SELECT tbl.id, tbl.effectiveTime, (SELECT IFNULL(MIN(sub.effectiveTime),DATE "99991231") FROM full_tableName sub WHERE tbl.id=sub.id AND tbl.effectiveTime<sub.effectiveTime) supersededTime FROM full_tableName tbl; -- Apply the appropriate supersededTime values to each row in the full table UPDATE full_tableName tbl JOIN tmp SET tbl.supersededTime=tmp.supersededTime WHERE tmp.id=tbl.id AND tmp.effectiveTime=tbl.effectiveTime; |
Once the superseded time values are set, a query such as the one below can be used to return component versions that are part of a particular snapshot view. The example query returns row in which the flag the second bit (value 2) is set. Based on the settings in the query above this means it would include rows that are part of the 2019-01-31 snapshot view of this table
-- This query assumes the default supersededTime is 9999-12-31 SELECT * FROM full_tableName WHERE supersededTime = DATE '99991231'; |
-- This query assumes that [snapshotTime] is replaced by the required snapshotTime SELECT * FROM full_tableName WHERE [snapshotTime] >= effectiveTime AND [snapshotTime] < supersededTime; |
Past experience indicates that this approach is 2 times faster than the unoptimized snapshot views
Previous tests in MySQL 5.7 were 3 times faster than unoptimized snapshot views. However, MySQL 8.0 seems to have enhanced the performance of the unoptimized queries without significantly improving the results of this approach to optimization. |
. However, this figure varies depending on the complexity of the queries it is used in. Direct access to a snapshot table is between 2 and 3 times as fast as this optimized approach.
The full release files in the 2019-07-31 release contain a total of approximately 16 million rows. If a datetime column is added, each of these rows will require a further 5 bytes of storage. Database designs using this additional column also included additional indexes including supersededTime
The performance impact of removing some of these indexes was not tested, so it is unclear if benefits could still be delivered by this approach without these indexes. |
. As a result, the storage to fully support this approach required an additional 750 Mb.