Data Services and CQL support for ISO 21090 data types
General Overview
ISO 21090 data types will be supported by the caCORE SDK in version 4.3. The SDK handles these data types through several unique mechanisms, which in turn require support from caGrid data services.
Special considerations
JaxB serialization and deserialization
The ISO 21090 data types require serialization and deserialization via the JaxB framework. This support is provided by the NCICB's common implementation of the ISO 21090 data types and localization. The caCORE SDK version 4.3 adds support for JaxB not only for ISO data types, but for general use as well. In parallel to this, continuing support for Castor serialization will be maintained for non-ISO data types.
The use of JaxB for serialization and deserialization affects caGrid's use of the ISO data types, since the default support for Axis beans and serialization cannot be used. Instead, JaxB serialization is provided by the ISO 21090 data types extension for Introduce. This extension adds the XML schemas for these types and configures custom JaxB serialization for each schema element. An issue specific to data services and CQL support is ensuring that associated objects of the target class are not inadvertently retrieved and serialized along with the top-level target data type. This issue is overcome by using the SDK's utility for removing Hibernate proxy functionality from data instances retrieved out of the Hibernate layer. This allows the JaxB serialization framework to remain completely agnostic to the fact that Hibernate loads the data instances' associations lazily, while still preventing unwanted items from being passed over the wire.
NCI Localization
The term "localization" with respect to ISO 21090 data types refers to a specific sub-set of the ISO 21090 spec which is implemented and mapped in a meaningful way into real Java objects. The NCI localization does not map every sub-attribute of each and every data type. For these cases, a constant value may be supplied to that sub-attribute. Additionally, ISO 21090 data types deal with null values in an idiosyncratic way which attempts to quantify "why" a given data instance is null (patient didn't answer, question not asked, etc). This feature is known as the Null Flavor, and may also be mapped to a constant value.
Constants
Constants are defined at the UML / XMI level using specific Tagged Values which identify the type of constant and attach it to a specific attribute of each class. The SDK uses this information to create a Spring configuration which the data services framework can use to determine when an attribute is mapped to a constant value as opposed to being derived from the database. The caCORE SDK fills in the constant values in returned objects using a Hibernate Tuplizer implementation. When a constant is supplied, the value is not mapped into the underlying database, but is filled in when the data type is instantiated.
When querying, a client may perform a query against one of these constant mapped attributes. Since they're not actually mapped into the database, they can't be used in an HQL statement without raising an error. The CQL to HQL translator checks each attribute to determine if it is a constant, and if so, "dereferences" that value into the generated HQL statement using a positional parameter.
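The dereferencing step can be illustrated with a minimal Java sketch. It is not the caGrid translator's actual code: the class `ConstantDereferenceSketch`, its `Fragment` type, and the `equalTo` method are hypothetical names, and the constants are assumed to be available as a simple map keyed by attribute path. When the attribute is a constant, the sketch binds the constant as a positional parameter instead of referencing a column that does not exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of constant dereferencing; not the caGrid translator's API. */
final class ConstantDereferenceSketch {

    /** One translated restriction: HQL text plus the values bound to its '?' placeholders. */
    static final class Fragment {
        final String hql;
        final List<Object> params;

        Fragment(String hql, List<Object> params) {
            this.hql = hql;
            this.params = params;
        }
    }

    /**
     * Builds an equality restriction for one attribute. If the attribute path is
     * registered as a constant, the constant value is bound as the left-hand operand;
     * otherwise the mapped property is referenced through the alias.
     */
    static Fragment equalTo(String alias, String attributePath, Object queryValue,
                            Map<String, Object> constants) {
        List<Object> params = new ArrayList<>();
        String left;
        if (constants.containsKey(attributePath)) {
            left = "?";                                  // constant: no database column
            params.add(constants.get(attributePath));
        } else {
            left = alias + "." + attributePath;          // ordinary mapped attribute
        }
        params.add(queryValue);                          // the value supplied in the CQL query
        return new Fragment(left + " = ?", params);
    }
}
```

With `id.root` registered as a constant, a restriction on `id.root` yields `? = ?` with two bound values, while a restriction on `id.extension` yields `c.id.extension = ?` with one.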
Null Flavors and other Enumerations
Null Flavors may be mapped into the database, made constant on a per-data-type basis, or made constant across an entire data model.
Since Null Flavors are Enumerations of values, they appear in the domain model as attributes of type String. This is the same general approach taken for all Enumerations in the ISO 21090 space (AddressPartType, Compression, etc).
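As a small illustration of treating an Enumeration as a String attribute, the sketch below checks whether a query value names a Null Flavor code. The nested `NullFlavor` enum is a local stand-in listing the ISO 21090 codes; it is not the NCI localization's own class, and the `NullFlavorSketch` class and its methods are hypothetical.

```java
/** Hypothetical sketch: an Enumeration exposed to queries as a plain String. */
final class NullFlavorSketch {

    /** Local stand-in for the ISO 21090 Null Flavor codes (not the NCI class). */
    enum NullFlavor { NI, INV, OTH, NINF, PINF, UNC, DER, UNK, ASKU, NAV, NASK, QS, TRC, MSK, NA }

    /** The String form that would appear as the attribute value in a query. */
    static String toQueryValue(NullFlavor flavor) {
        return flavor.name();
    }

    /** Returns true if the String names one of the codes above (case-sensitive). */
    static boolean isValidQueryValue(String value) {
        if (value == null) {
            return false;
        }
        try {
            NullFlavor.valueOf(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```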
Implementation Details
Domain Model Considerations
To maintain compatibility with the existing CQL specification and tools which leverage and depend on it, some considerations have been made for handling complex data types in the domain model.
User types which use ISO 21090 data types as "attributes" are represented as a unidirectional association between the user type and the ISO type, using the attribute's name as the role name. Inner simple attributes of ISO types (e.g. Cd.codeSystem) are represented as Attributes of the appropriate type. ISO types which utilize other types (e.g. Ad has several Adxp) are modeled as a series of associations as well.
Additional XMI Tagged Values
Note: The SDK team provided a document identifying and discussing these new tags; however, they have indicated that some recent changes to the Hibernate architecture they're using will necessitate different or additional tags.
The following new XMI tagged values will need to be captured by an XMI parser and translated into some meaningful information in the extended domain model.
Global Constants
Global level constants are specified in the logical model of the datatype. They are applied whenever the value is not mapped to the database and no corresponding local constant value exists.
- Tag Value Key: mapped-complex-constant
- Value: <fully qualified name of the complex sub-attribute>:constant value
- Example: "II.root:2.16.12.123.456" specified against the "root" attribute in the "II" datatype class
Local Constants
Local level constants are specified in the logical model of the datatype and are applied whenever the user has not mapped the corresponding attribute to the database. It is important to note that local constants take precedence over global constants.
- Tag Value Key: mapped-complex-constant
- Value: <fully qualified name of the complex sub-attribute>:constant value
- Example: "gov.nih.nci.cacoresdk.domain.other.datatype.ComplexType.id.root:2.16.12.123.456" specified against the "id" attribute in the "ComplexType" class
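The precedence rule can be sketched in a few lines of Java. This is an illustration only: `ConstantLookupSketch` and its methods are hypothetical, and it assumes tag values have already been read from the XMI as plain strings in the `name:value` form shown above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of local-over-global constant resolution. */
final class ConstantLookupSketch {
    private final Map<String, String> constants = new HashMap<>();

    /** Registers a tag value of the form "<qualified sub-attribute name>:<constant value>". */
    void register(String tagValue) {
        int separator = tagValue.indexOf(':');
        if (separator <= 0) {
            throw new IllegalArgumentException("Expected name:value but got " + tagValue);
        }
        constants.put(tagValue.substring(0, separator), tagValue.substring(separator + 1));
    }

    /**
     * Looks up a constant for an attribute path such as "id.root" on a user class.
     * A local constant (keyed by class name and path) wins; otherwise the global
     * constant (keyed by ISO type and sub-attribute, e.g. "II.root") is used.
     */
    Optional<String> resolve(String className, String attributePath, String isoType) {
        String local = constants.get(className + "." + attributePath);
        if (local != null) {
            return Optional.of(local);
        }
        String subAttribute = attributePath.substring(attributePath.lastIndexOf('.') + 1);
        return Optional.ofNullable(constants.get(isoType + "." + subAttribute));
    }
}
```

Registering only `II.root:2.16.12.123.456` makes that value apply to every `II` root; registering a local value for `ComplexType.id.root` afterwards overrides it for that one attribute.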
Simple Types in Complex Attributes
For purposes of the caCORE SDK's ISO 21090 data type implementation, the following are considered simple types:
- String
- Boolean
- Integer
- Real (Not REAL)
- XML
- Uid
- Code
- Uri
- Enumerations
  - NullFlavor
  - AddressPartType
  - PostalAddressUse
  - Compression
  - IntegrityCheckAlgorithm
  - IdentifierReliability
  - IdentifierScope
  - UncertaintyType
  - TelecommunicationAddressUse
  - EntityNamePartQualifier
  - EntityNameUse
  - EntityNamePartType
The simple datatype inside a complex ISO datatype is mapped into the underlying database similar to mapping any other simple attribute of a non-ISO data type. The column name in that database is derived from the UML tagged value:
- Tag Value Key: mapped-complex-attributes
- Value: fully qualified name of the attribute within the complex structure
- Example: gov.nih.nci.cacoresdk.domain.other.datatype.ComplexType.id.extension
In this example, the ComplexType is one of the classes in the SDK's test model (gov.nih.nci.cacoresdk.domain.other.datatype.ComplexType) and id is one of the attributes within the same class. The "id" attribute is of type "II" (an ISO 21090 type) which has "extension" as one of the sub-attributes.
Nested Complex Attributes
Complex ISO datatypes may themselves contain complex ISO types as attributes. Any attribute of a complex type which isn't a simple type (see above) is handled as complex. The simple attributes of the nested complex datatype are recursively mapped to database columns in the same table as the parent datatype. As a result, queries must reference these nested attributes through multi-level paths rather than through joins to separate tables.
- Tag Value Key: mapped-complex-attributes
- Value: fully qualified name of the attribute within the complex structure
- Example 1: gov.nih.nci.cacoresdk.domain.other.datatype.ComplexType.code.codeSystemName
- Example 2: gov.nih.nci.cacoresdk.domain.other.datatype.ComplexType.code.originalText.value
In the above examples, ComplexType is one of the classes in the SDK's test model and "code" is one of the attributes of that class. The examples show the mapping of the "code" attribute, which is of type "CD", an ISO 21090 type. CD has a "codeSystemName" sub-attribute which is mapped as a simple type (String). CD also has an "originalText" nested attribute which is of the complex type "ED.TEXT". ED.TEXT has a nested "value" sub-attribute, a simple type (String), whose mapping is shown in example 2.
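To make the structure of these tag values concrete, the following Java sketch splits a mapped-complex-attributes value into the attribute path that follows the owning class name. The class `MappedAttributePathSketch` and its methods are hypothetical, and the sketch assumes the owning class name is already known from the class the tag is attached to.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: splitting a mapped-complex-attributes tag value into path segments. */
final class MappedAttributePathSketch {

    /** Returns the attribute path segments that follow the owning class name. */
    static List<String> pathWithinClass(String ownerClass, String tagValue) {
        String prefix = ownerClass + ".";
        if (!tagValue.startsWith(prefix) || tagValue.length() == prefix.length()) {
            throw new IllegalArgumentException(tagValue + " is not an attribute of " + ownerClass);
        }
        return Arrays.asList(tagValue.substring(prefix.length()).split("\\."));
    }

    /** A path longer than attribute + sub-attribute passes through a nested complex type. */
    static boolean isNested(List<String> path) {
        return path.size() > 2;
    }
}
```

For example 2 above, the path within ComplexType is `code`, `originalText`, `value`, which the sketch reports as nested; example 1 yields `code`, `codeSystemName`, which is not.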
CQL to HQL translation engine
The translation engine takes an additional parameter upon instantiation which provides the constant value resolution functionality. This parameter is an interface, and a default implementation is provided so as to maximize flexibility when the engine is integrated into other products (e.g. caCORE SDK).
The engine also applies special handling when generating HQL queries involving the ISO 21090 data types. Since the data types are "flattened" at the database level into the same table as their containing user type, the existing paradigm of creating an inner select statement when traversing Associations doesn't work. Instead, nested role names separated by dots are often required. For example, CdDataType -> Cd (role name value1) -> codeSystem should be queried in HQL as "From CdDataType c where c.value1.codeSystem = foo".
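A minimal Java sketch of the dotted-path approach follows. It is not the engine's implementation: `FlattenedPathHqlSketch` and `equalityQuery` are hypothetical names, the alias `c` is fixed for brevity, and the compared value is left as a positional parameter rather than inlined.

```java
import java.util.List;

/** Hypothetical sketch: building a dotted property path instead of an inner select. */
final class FlattenedPathHqlSketch {

    /**
     * Produces an HQL query that restricts the target class by an attribute reached
     * through a chain of role names, all addressed on the same alias.
     */
    static String equalityQuery(String targetClass, List<String> roleNames, String attribute) {
        String alias = "c";
        StringBuilder hql = new StringBuilder("From ")
                .append(targetClass).append(' ').append(alias)
                .append(" where ").append(alias);
        for (String roleName : roleNames) {
            hql.append('.').append(roleName);   // each association becomes a path segment
        }
        hql.append('.').append(attribute).append(" = ?");
        return hql.toString();
    }
}
```

Called with `CdDataType`, the single role name `value1`, and the attribute `codeSystem`, the sketch returns `From CdDataType c where c.value1.codeSystem = ?`.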
CQL examples using ISO 21090 datatypes
The following examples draw from the caCORE SDK 4.3 example model and demonstrate the general functionality of CQL using ISO 21090 data types. Classes belonging to the package "gov.nih.nci.iso21090" are the ISO types, and other classes are user defined types.
Nested associations
CQL provides for recursive definitions of query restrictions based on associations between data types and attributes of them. In these examples, a target is defined, and various levels of nesting are used to restrict the result set.
Single level of ISO types with an attribute
This example targets the Computer user defined data type. It restricts the instances returned by an association to HardDrive. This association is further restricted by way of an association to the ISO 21090 Int type through the role name size. The Int must have its value attribute equal to "2".
<ns1:CQLQuery xmlns:ns1="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <ns1:Target name="gov.nih.nci.cacoresdk.domain.onetomany.bidirectional.Computer">
    <ns1:Association name="gov.nih.nci.cacoresdk.domain.onetomany.bidirectional.HardDrive" roleName="hardDriveCollection">
      <ns1:Association name="gov.nih.nci.iso21090.Int" roleName="size">
        <ns1:Attribute name="value" predicate="EQUAL_TO" value="2"/>
      </ns1:Association>
    </ns1:Association>
  </ns1:Target>
</ns1:CQLQuery>
Multiple levels of ISO types
This query targets ScDataType, a user defined type. The instances of ScDataType returned by the query are restricted by an association to the ISO Sc type through the role name value2. A nested association to the ISO Cd type restricts the results further. The Cd instances must have a non-null codeSystem value.
<ns1:CQLQuery xmlns:ns1="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <ns1:Target name="gov.nih.nci.cacoresdk.domain.other.datatype.ScDataType">
    <ns1:Association name="gov.nih.nci.iso21090.Sc" roleName="value2">
      <ns1:Association name="gov.nih.nci.iso21090.Cd" roleName="code">
        <ns1:Attribute name="codeSystem" predicate="IS_NOT_NULL" value="true"/>
      </ns1:Association>
    </ns1:Association>
  </ns1:Target>
</ns1:CQLQuery>
Associations using DSet
The use of associations through DSet is fairly straightforward, and follows the same general pattern as any other association.
DSet of Cd using EdText and an attribute
<ns1:CQLQuery xmlns:ns1="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <ns1:Target name="gov.nih.nci.cacoresdk.domain.other.datatype.DsetCdDataType">
    <ns1:Association name="gov.nih.nci.iso21090.DSet" roleName="value4">
      <ns1:Association name="gov.nih.nci.iso21090.Cd" roleName="item">
        <ns1:Association name="gov.nih.nci.iso21090.EdText" roleName="originalText">
          <ns1:Attribute name="value" predicate="IS_NOT_NULL" value="true"/>
        </ns1:Association>
      </ns1:Association>
    </ns1:Association>
  </ns1:Target>
</ns1:CQLQuery>
Associations using Ivl
The ISO 21090 Ivl type is a generic type. This is modeled in the domain model by creating individual classes for each type Ivl may be used generically with. Ivl<Int>, Ivl<Ts>, Ivl<Pq>, and Ivl<Real> are available in the domain model, and user types may have associations to them.
IVL of PQ using width value
This query retrieves user types of IvlPqDataType which have an association to Ivl<Pq> through the role name "value4". Note that the class name for Ivl<Pq> is in generic format using the angle bracket notation. These Ivl<Pq> instances are restricted by an association to Pqv through the width role name. This Pqv instance must have its value attribute equal to "5.1".
<ns1:CQLQuery xmlns:ns1="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <ns1:Target name="gov.nih.nci.cacoresdk.domain.other.datatype.IvlPqDataType">
    <ns1:Association name="gov.nih.nci.iso21090.Ivl&lt;Pq&gt;" roleName="value4">
      <ns1:Association name="gov.nih.nci.iso21090.Pqv" roleName="width">
        <ns1:Attribute name="value" predicate="EQUAL_TO" value="5.1"/>
      </ns1:Association>
    </ns1:Association>
  </ns1:Target>
</ns1:CQLQuery>
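Because a literal `<` cannot appear inside an XML attribute value, a generic class name such as Ivl<Pq> has to be written as `Ivl&lt;Pq&gt;` when the query is serialized. One way to avoid escaping by hand is to build the query with the JDK's DOM and transformer APIs, which escape attribute values on output. The sketch below does this for the query above; the class `IvlQueryBuilderSketch` is a hypothetical helper and not part of caGrid or the caCORE SDK.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: building the Ivl<Pq> query with DOM so escaping is automatic. */
final class IvlQueryBuilderSketch {
    private static final String CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery";

    static String buildIvlPqQuery() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element query = doc.createElementNS(CQL_NS, "ns1:CQLQuery");
            doc.appendChild(query);
            Element target = child(doc, query, "Target");
            target.setAttribute("name", "gov.nih.nci.cacoresdk.domain.other.datatype.IvlPqDataType");
            Element ivl = child(doc, target, "Association");
            ivl.setAttribute("name", "gov.nih.nci.iso21090.Ivl<Pq>");   // raw name; escaped on output
            ivl.setAttribute("roleName", "value4");
            Element width = child(doc, ivl, "Association");
            width.setAttribute("name", "gov.nih.nci.iso21090.Pqv");
            width.setAttribute("roleName", "width");
            Element value = child(doc, width, "Attribute");
            value.setAttribute("name", "value");
            value.setAttribute("predicate", "EQUAL_TO");
            value.setAttribute("value", "5.1");

            // Serialize the DOM tree to a String without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the query document", e);
        }
    }

    /** Creates a namespaced child element and appends it to the parent. */
    private static Element child(Document doc, Element parent, String localName) {
        Element element = doc.createElementNS(CQL_NS, "ns1:" + localName);
        parent.appendChild(element);
        return element;
    }
}
```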
Null Flavors
ISO 21090 uses the notion of a Null Flavor to express the reason that an instance is null. Queries may be made against this null flavor attribute.
Null Flavor of a DSet
This query returns instances of the user type DsetCdDataType which have an association to DSet, which has a null flavor of NI.
<ns1:CQLQuery xmlns:ns1="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <ns1:Target name="gov.nih.nci.cacoresdk.domain.other.datatype.DsetCdDataType">
    <ns1:Association name="gov.nih.nci.iso21090.DSet" roleName="value1">
      <ns1:Attribute name="nullFlavor" predicate="EQUAL_TO" value="NI"/>
    </ns1:Association>
  </ns1:Target>
</ns1:CQLQuery>
Known Issues
The caGrid support for ISO 21090 data services with the caCORE SDK 4.3 is affected by the following two known issues.
Queries against data types using CLOB fail under Oracle
This issue arises when an HQL query using a "distinct" clause is executed against a data type which maps to an underlying table with a column of the CLOB data type on an Oracle database. The referenced GForge issue mentions the CD type specifically, but the issue is likely to appear in any case where a byte array is mapped to the underlying database as a CLOB, as might be done with the ED or ED.TEXT data types.
Queries against certain mapping scenarios of DSET<AD> fail
This issue occurs when the CQL to HQL translation engine attempts to navigate through the Hibernate configuration and discover what inner entity names are used to reference the ADXP instances contained by a given AD value inside a DSET. The Hibernate Configuration bean seems to lose this information at the deepest level of nesting, causing an exception to be thrown when the query engine attempts to retrieve it.





