CQL
CQL (Common/caGrid Query Language) is the query language used by all caGrid Data Services. It expresses queries against a data source in object-oriented terms: a query names a target class and restricts it by its attributes and associations. A CQL query is an XML document conforming to a well-defined schema with the URI http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery .
[Figure: The CQL Schema Diagram]
The CQL Schema consists of the following main components:
- CQL Query
- A simple wrapper element at the head of every CQL query document. It contains the target element.
- Target
- The Target element is of the type Object, and describes the data type which the query will return.
- QueryModifier
- An optional element that modifies the returned result set. The modifier has a required attribute 'countOnly', which tells the data service to return only the number of results the query would produce rather than the results themselves. The modifier also allows an optional choice between a list of Attribute Names and a single Distinct Attribute to return. When a list of attribute names is specified, the service returns sets of tuples containing the attribute names and corresponding values for each object instance matched by the query. When a distinct attribute is specified, only unique attribute values are placed in the returned attribute sets.
- Object
- The Object element has one required attribute, 'name', whose value identifies the caDSR class of the object. When the Object is the top-level target of a CQL query, it identifies the data type that the caGrid Data Service will return. An Object may have at most one child element, chosen from Attribute, Association, and Group.
- Attribute
- An Attribute in CQL describes a restriction on an attribute of an Object. It carries three XML attributes that together define the restriction: 'name' is the name of the attribute to be restricted, 'value' is the value to compare against, and 'predicate' is the type of comparison. Allowable predicates are enumerated by the schema's simple type 'Predicate' and are generally self-descriptive: "EQUAL_TO", "NOT_EQUAL_TO", "LIKE", "LESS_THAN", "LESS_THAN_EQUAL_TO", "GREATER_THAN", and "GREATER_THAN_EQUAL_TO". Two additional predicates, "IS_NOT_NULL" and "IS_NULL", check only for the presence or absence, respectively, of an attribute and do not restrict its value, so any 'value' attribute is ignored when they are used.
- Association
- An Association describes a related Object. It carries the restrictions on the associated Object as well as the relationship from the parent object to it, always pointing down the object model tree. The Association complex type is an extension of Object and adds a single, optional attribute named 'roleName', which identifies the field of the parent object that the Association refers to. For example, a person may have more than one address, perhaps business and home. To restrict on the home address, a query must specify the home address role name for the associated object. If the query omits the role name, it becomes ambiguous, because more than one field of Person has the type Address. In this case, the data service throws a MalformedQueryException explaining that the requested association is ambiguous. When an object has only one field of a given type, the roleName attribute may be omitted, and the data service resolves the correct name as the query is processed.
- Group
- Groups define logical joins of two or more conditions and operate against the Object to which they are attached. A Group must have two or more children, which may be any mixture of Attribute, Association, and Group. Groups also have an attribute named 'logicOperator', whose type is defined by the schema's simple type LogicalOperator, an enumeration of the values "AND" and "OR". The operator is applied to all children in the group. The "AND" operator requires that every condition in the group be true for the group to evaluate as true. The "OR" operator requires that at least one condition in the group be true.
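The way predicates and Groups combine can be illustrated with a small in-memory sketch. The classes below (Condition, AttributeCondition, GroupCondition) are hypothetical stand-ins written for this illustration, not the caGrid object model, and only four of the predicates are covered. The handling of missing values under "NOT_EQUAL_TO" is an assumption of the sketch, not documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only; these are NOT the caGrid CQL classes.
interface Condition {
    boolean matches(Map<String, String> object);
}

// Mirrors a CQL Attribute: a name, a predicate, and a value to compare against.
final class AttributeCondition implements Condition {
    private final String name;
    private final String predicate;
    private final String value;

    AttributeCondition(String name, String predicate, String value) {
        this.name = name;
        this.predicate = predicate;
        this.value = value;
    }

    @Override
    public boolean matches(Map<String, String> object) {
        String actual = object.get(name);
        switch (predicate) {
            case "IS_NULL":
                return actual == null;   // 'value' is ignored
            case "IS_NOT_NULL":
                return actual != null;   // 'value' is ignored
            case "EQUAL_TO":
                return actual != null && actual.equals(value);
            case "NOT_EQUAL_TO":
                // Assumption of this sketch: a missing value never satisfies a comparison.
                return actual != null && !actual.equals(value);
            default:
                throw new IllegalArgumentException("Predicate not covered by this sketch: " + predicate);
        }
    }
}

// Mirrors a CQL Group: one logical operator applied to all of its children.
final class GroupCondition implements Condition {
    private final boolean requireAll;
    private final List<Condition> children;

    GroupCondition(String operator, Condition... children) {
        if (children.length < 2) {
            throw new IllegalArgumentException("A Group needs two or more children");
        }
        if (!"AND".equals(operator) && !"OR".equals(operator)) {
            throw new IllegalArgumentException("Unknown operator: " + operator);
        }
        this.requireAll = "AND".equals(operator);
        this.children = Arrays.asList(children);
    }

    @Override
    public boolean matches(Map<String, String> object) {
        return requireAll
                ? children.stream().allMatch(c -> c.matches(object))
                : children.stream().anyMatch(c -> c.matches(object));
    }
}

class GroupSemanticsSketch {
    public static void main(String[] args) {
        Map<String, String> gene = new HashMap<>();
        gene.put("symbol", "IL2");   // no "title" entry, so title is null

        Condition symbolIsIl2 = new AttributeCondition("symbol", "EQUAL_TO", "IL2");
        Condition hasTitle = new AttributeCondition("title", "IS_NOT_NULL", null);

        System.out.println(new GroupCondition("AND", symbolIsIl2, hasTitle).matches(gene)); // false
        System.out.println(new GroupCondition("OR", symbolIsIl2, hasTitle).matches(gene));  // true
    }
}
```

Because GroupCondition is itself a Condition, groups can be nested inside other groups, which matches the rule that a Group's children may include further Groups.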
Overview of the Process of Constructing a CQL Query
To construct a CQL query, first identify the data type that you would like to retrieve. This data type (the class from the UML model) becomes the Target in your CQL query. Next, identify the criteria that you would like to use to retrieve only a subset of all available data. For example, if you specify the "Gene" class as the Target, you will retrieve all Gene objects in the database, which probably is not what you want. To limit the subset of Gene objects that you retrieve, you must identify "filtering" criteria.
To specify "filtering criteria", use the Group, Attribute and Association CQL elements. For example, to retrieve only Gene objects where the Gene name matches a given pattern, you would specify an attribute filter on the Gene class (the "name" attribute, for example).
If the attributes that you would like to filter on belong to another class, specify an Association from the Target to that associated class. You can then specify Attributes of the associated class to filter on.
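The steps above can be sketched with the JDK's DOM API, producing the XML form of a query rather than using the caGrid object model. This is a minimal sketch under stated assumptions: the element names (CQLQuery, Target, Attribute, Association) and attribute names follow the component descriptions on this page and should be checked against the schema, and the Taxon class, its "taxon" role name, and the "scientificName" attribute are hypothetical examples.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CqlXmlSketch {
    // Namespace URI quoted at the top of this page.
    static final String CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery";

    // Steps 1 and 2: a Target restricted by one of its own attributes.
    static String attributeFilterQuery(String targetClass, String attrName,
                                       String predicate, String value) {
        Document doc = newDocument();
        Element target = addTarget(doc, targetClass);
        target.appendChild(attribute(doc, attrName, predicate, value));
        return serialize(doc);
    }

    // Step 3: a Target restricted by an attribute of an associated class.
    static String associationFilterQuery(String targetClass, String associatedClass,
                                         String roleName, String attrName,
                                         String predicate, String value) {
        Document doc = newDocument();
        Element target = addTarget(doc, targetClass);
        Element association = doc.createElementNS(CQL_NS, "Association");
        association.setAttribute("name", associatedClass);
        association.setAttribute("roleName", roleName);
        association.appendChild(attribute(doc, attrName, predicate, value));
        target.appendChild(association);
        return serialize(doc);
    }

    private static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an XML document", e);
        }
    }

    // Creates the CQLQuery wrapper element and its Target child.
    private static Element addTarget(Document doc, String targetClass) {
        Element query = doc.createElementNS(CQL_NS, "CQLQuery");
        doc.appendChild(query);
        Element target = doc.createElementNS(CQL_NS, "Target");
        target.setAttribute("name", targetClass);
        query.appendChild(target);
        return target;
    }

    private static Element attribute(Document doc, String name, String predicate, String value) {
        Element attribute = doc.createElementNS(CQL_NS, "Attribute");
        attribute.setAttribute("name", name);
        attribute.setAttribute("predicate", predicate);
        attribute.setAttribute("value", value);
        return attribute;
    }

    private static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the query", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributeFilterQuery(
                "gov.nih.nci.cabio.domain.Gene", "symbol", "LIKE", "IL%"));
        // Taxon, "taxon", and "scientificName" are hypothetical illustration values.
        System.out.println(associationFilterQuery(
                "gov.nih.nci.cabio.domain.Gene", "gov.nih.nci.cabio.domain.Taxon",
                "taxon", "scientificName", "EQUAL_TO", "Homo sapiens"));
    }
}
```

Each query here gives the Target exactly one child, consistent with the rule that an Object may have at most one child element. Filtering on both an attribute and an association at once would require wrapping the two conditions in a Group.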
CQL Examples
Several example CQL queries are available on this wiki that demonstrate how to create a few common types of CQL queries.
Creating a CQL Query In Code
Data services in caGrid use CQL to compose queries. A query can be produced programmatically, building up parts of the query using the supplied object model:
Programmatic query building
The data services project in caGrid provides a Java object model for CQL which can be used to build queries.
// Create the query wrapper
gov.nih.nci.cagrid.cqlquery.CQLQuery query =
    new gov.nih.nci.cagrid.cqlquery.CQLQuery();

// The target is the data type the query returns
gov.nih.nci.cagrid.cqlquery.Object target =
    new gov.nih.nci.cagrid.cqlquery.Object();
target.setName(gov.nih.nci.cabio.domain.Gene.class.getName());

// Restrict the target to genes whose symbol matches the pattern "IL%"
gov.nih.nci.cagrid.cqlquery.Attribute symbolAttribute =
    new gov.nih.nci.cagrid.cqlquery.Attribute(
        "symbol",
        gov.nih.nci.cagrid.cqlquery.Predicate.LIKE,
        "IL%");
target.setAttribute(symbolAttribute);
query.setTarget(target);
Load CQL query from a Reader
Alternatively, a CQL query can be loaded from a Reader. The code examples below illustrate reading a query from a string of XML text or from an XML file on disk and deserializing it into the object model:
// From a string of XML text
CQLQuery query2 = (CQLQuery) gov.nih.nci.cagrid.common.Utils.deserializeObject(
    new StringReader("<CQLQuery ... />"), CQLQuery.class);
// From an XML file on disk
CQLQuery query3 = (CQLQuery) gov.nih.nci.cagrid.common.Utils.deserializeObject(
    new FileReader(cqlFile), CQLQuery.class);
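Before handing a Reader to the deserializer, it can be useful to check that the text is well-formed XML and to see which class it targets. The following sketch does this with the JDK's DOM parser only; it is not the caGrid deserializer. The class and method names are hypothetical, and the sketch assumes the Target element sits in the CQL namespace, as described on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CqlPeekSketch {
    static final String CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery";

    // Parses the XML from any Reader and returns the 'name' of the single Target element.
    static String targetName(Reader reader) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(reader));
            NodeList targets = doc.getElementsByTagNameNS(CQL_NS, "Target");
            if (targets.getLength() != 1) {
                throw new IllegalArgumentException("Expected exactly one Target element");
            }
            return ((Element) targets.item(0)).getAttribute("name");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a readable XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CQLQuery xmlns=\"" + CQL_NS + "\">"
                + "<Target name=\"gov.nih.nci.cabio.domain.Gene\"/>"
                + "</CQLQuery>";
        System.out.println(targetName(new StringReader(xml))); // gov.nih.nci.cabio.domain.Gene
    }
}
```

Because the method accepts a Reader, the same call works with a FileReader for a query stored on disk.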
Write CQL query out to a file
The following code serializes a CQL query to XML. It writes to a StringWriter and prints the result; because the Utils.serializeObject method takes any Writer as input, passing a FileWriter instead writes the query out to a file.
CQLQuery someQuery = ...;
// Serialize into an in-memory buffer; any Writer (for example a FileWriter) works here
StringWriter writer = new StringWriter();
Utils.serializeObject(someQuery, DataServiceConstants.CQL_QUERY_QNAME, writer);
System.out.println(writer.toString());
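To target a file rather than standard output, the Writer can be opened on a path and closed with try-with-resources. The sketch below uses only the JDK; writeQuery is a hypothetical stand-in for a serializer that accepts any Writer (such as Utils.serializeObject above) and simply writes text that is already serialized.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CqlFileWriteSketch {
    // Hypothetical stand-in for a serializer that accepts any Writer.
    static void writeQuery(String cqlXml, Writer out) throws IOException {
        out.write(cqlXml);
    }

    // Writes the query text to a new temporary file and returns its path.
    static Path saveToTempFile(String cqlXml) {
        try {
            Path file = Files.createTempFile("query", ".xml");
            // try-with-resources flushes and closes the Writer even if writing fails
            try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                writeQuery(cqlXml, out);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back as text, to confirm what was written.
    static String readBack(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = saveToTempFile("<CQLQuery/>");
        System.out.println(file + " contains " + readBack(file));
    }
}
```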
Schemas
The CQL schemas are available on this wiki.
Caveats
- CQL does not permit querying for attributes with values that are XML schema complex types.
- Only values that can be represented as XML schema simple types are allowed.
- CQL Attribute Results cannot contain attribute values which are XML schema complex types.
- Only values that can be represented as XML schema simple types are allowed.
- CQL does not provide a facility for returning object instances other than the targeted data type.
- This includes subclasses of the targeted data type. These cannot be returned because their XML representation will differ from that of the requested object, which violates the expected results schema.
- CQL cannot return populated associations on instances of the targeted data type.
- This has some implications when dealing with uni-directional associations. For example: