h1. Data Services 1.2 Developers Guide
----
{include:Developers Guide Navigation}
{cagrid-1.2-docs-nav:name=Data Services|guidetype=Developers}
The purpose of this guide is to describe the caGrid Data Service Infrastructure such that developers can make programmatic use of its client tools and extension points.
{cagridroundpanel}
{pre:class=cagridheaderfont}Table of Contents{pre}
{toc:outline=true|exclude=Data Services 1.2 Developers Guide,The CQL Schema Diagram|style=none}
{cagridroundpanel}
{cagridtoc:exclude=Data Services 1.2 Developers Guide,The CQL Schema Diagram}
h1. Overview
caGrid Data Services provide an object view of a data resource across the grid. The data resource is exposed through a well defined query method, which also relies on well defined query language objects to perform queries and return results as a strongly typed set. caGrid Data Services are designed to expose objects whose XML schemas are registered in the GME, and also expose metadata about those data objects derived from the caDSR. Data Services also provide support for integration with alternate results delivery mechanisms such as WS-Enumeration. Enabling these features adds new query methods to the Grid-facing API.
{include:dataservices:CQL}
h1. Generic Data Service Clients
----
The caGrid Data Services infrastructure supplies three basic client classes which can be used to invoke any arbitrary caGrid Data Service. This capability is due to the query methods of all data services being defined in a common, [well known WSDL|Technical Guide#WSDL Service Interfaces] which each unique service instance imports.
The basic data service client, which is capable of invoking any caGrid Data Service, is the class *gov.nih.nci.cagrid.data.client.DataServiceClient*. The class defines the *query()* method, which takes a CQL Query as its single parameter and returns a CQL Query Results instance. A sample usage of this class is provided below:
{code}
import gov.nih.nci.cagrid.common.Utils;
import gov.nih.nci.cagrid.cqlquery.CQLQuery;
import gov.nih.nci.cagrid.cqlquery.Object;
import gov.nih.nci.cagrid.cqlresultset.CQLQueryResults;
import gov.nih.nci.cagrid.data.DataServiceConstants;
import gov.nih.nci.cagrid.data.client.DataServiceClient;

public class SampleDataServiceInvocation {
    public static void main(String[] args) {
        try {
            // args[0] is the URL of the data service to query
            DataServiceClient client = new DataServiceClient(args[0]);
            CQLQuery query = new CQLQuery();
            // Object here is the CQL target type imported above, not java.lang.Object
            Object target = new Object();
            target.setName("some.class.name");
            query.setTarget(target);
            CQLQueryResults results = client.query(query);
            // write the result set to disk as XML
            Utils.serializeDocument("myResults.xml", results,
                DataServiceConstants.CQL_RESULT_COLLECTION_QNAME);
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}
{code}
This small sample will create a new data service client using a URL specified on the command line and submit a query to it for all objects of the type "some.class.name". The results will be stored on disk in an XML document named "myResults.xml". The *DataServiceConstants* class used in this example provides static Strings and QNames used throughout the data service infrastructure. The constant *CQL_RESULT_COLLECTION_QNAME* is the QName which defines the XML type for result sets returned from the data service's query method.
Additionally, the caGrid Data Services infrastructure provides clients that can connect to data services which support WS-Enumeration and the caGrid Bulk Data Transfer (BDT) infrastructure. Respectively, these clients are *gov.nih.nci.cagrid.data.enumeration.client.EnumerationDataServiceClient* and *gov.nih.nci.cagrid.data.bdt.client.BDTDataServiceClient*. Their public APIs return an *EnumerationContext* instance and a *BulkDataHandlerReference*, respectively. These return values may be used to initialize an instance of the Globus-provided *ClientEnumIter* class, or passed to the caGrid Bulk Data Transfer client directly.
The client classes provided with the data service infrastructure, as well as any other clients generated by the Introduce toolkit, should not be assumed to be thread safe. Each thread communicating with a data service should have its own instance of the client class. Since client instances are unique, multiple data service clients may be used within the same thread or JVM to communicate with multiple data services simultaneously.
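One way to follow the one-client-per-thread rule is to construct the client inside each task instead of sharing an instance. The sketch below is only an illustration: *StubClient* is a hypothetical stand-in for a data service client so that the example runs without the caGrid libraries, and the URLs are placeholders.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a data service client; not part of caGrid.
class StubClient {
    private final String url;

    StubClient(String url) {
        this.url = url;
    }

    String query(String target) {
        return target + "@" + url;
    }
}

class PerThreadClients {
    // Each task builds its own client, so no client instance is shared across threads.
    static List<String> queryAll(List<String> urls, final String target) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, urls.size()));
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String url : urls) {
                futures.add(pool.submit(new Callable<String>() {
                    public String call() {
                        StubClient client = new StubClient(url); // private to this task
                        return client.query(target);
                    }
                }));
            }
            // Collect results in the same order as the input URLs.
            List<String> results = new ArrayList<String>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(queryAll(
            Arrays.asList("http://host-a/service", "http://host-b/service"),
            "some.class.name"));
    }
}
{code}

With a real client, the body of *call()* would create the client from the service URL and issue the query there, keeping the instance confined to that task.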
h1. Client Side Utilities
----
The caGrid Data Services infrastructure provides a number of utility classes to make invocation of remote data services a simpler process for the application developer. The package *gov.nih.nci.cagrid.data.utilities* contains utilities that can invoke standard, enumeration, and BDT data services, as well as tools for handling domain models and working with wsdd and castor mapping files.
h2. Iterating Query Results
When a query is performed using the standard caGrid Data Service client's query method, a CQLQueryResults object is returned. This object is a container for both the results themselves and some information pertaining to their type. It can hold object results, attribute name/value pairs, or a count of the total number of items in the result set. Because handling a container with such a variety of result types is cumbersome, the data service infrastructure provides an iterator class that does this work.
The interface *DataServiceIterator* specifies a single *query()* method, which takes a CQL Query and returns an instance of *java.util.Iterator*. The iterator can be used to walk through the results of a query issued to a data service. Three concrete implementations of the *DataServiceIterator* interface are provided, each for communicating with a different type of data service. The *DataServiceHandle* class can be used to invoke a standard caGrid Data Service, while the *EnumerationDataServiceHandle* and *BdtDataServiceHandle* classes are designed for WS-Enumeration and Bulk Data Transfer supporting data services, respectively.
Additionally, this package contains an Iterator utility for handling *CQLQueryResults* instances directly. The class *CQLQueryResultsIterator* implements the *java.util.Iterator* interface, and has three constructors. The choice of constructor affects the behavior of calling the *next()* method.
* CQLQueryResultsIterator(CQLQueryResults)
** Creates an Iterator over the results which will return materialized objects deserialized using the default type mappings.
* CQLQueryResultsIterator(CQLQueryResults, boolean)
** Creates an Iterator over the results, and the value of the Boolean parameter indicates if XML strings should be returned from the *next()* method.
** If the Boolean value is true, XML text of each item is returned, otherwise the results will be deserialized using the default type mappings.
* CQLQueryResultsIterator(CQLQueryResults, InputStream)
** Creates an Iterator over the results, and expects the InputStream will point to a client or server side wsdd.
*** The contents of this wsdd file will be used to configure deserialization of the objects contained in the results.
The class *gov.nih.nci.cagrid.data.utilities.CQLQueryResultsIterator* implements the *java.util.Iterator* interface, and so can be used in a *while()* loop like any other iterator over a Java collection. Depending on what the query to the data service asked for, calls to the *next()* method of this iterator will return different types of objects.
* If the query was for object results, then:
** The iterator returns objects of the type specified as the target for the query.
** Objects which require custom serialization and/or deserialization require that the iterator be configured with an InputStream to the client-config.wsdd file containing the type mappings for the objects.
** Alternatively, the iterator can be configured to return only the XML representation of those objects.
* If the query was for attribute results, including distinct attributes, then:
** Each successive call to *next()* returns an *array* of *TargetAttribute* types.
** These types contain the name of an attribute and its value.
** The value in the TargetAttribute instance will be null if the value was null on the object satisfying the query.
** Each array of TargetAttributes corresponds to one object instance which satisfied the CQL query criteria.
* If the query was for a count of object instances, then:
** The iterator returns a single java.lang.Long value.
An example usage of this iterator is below:
{code}
import gov.nih.nci.cagrid.cqlquery.CQLQuery;
import gov.nih.nci.cagrid.cqlresultset.CQLQueryResults;
import gov.nih.nci.cagrid.data.client.DataServiceClient;
import gov.nih.nci.cagrid.data.utilities.CQLQueryResultsIterator;

import java.util.Iterator;

public class SampleDataServiceInvocation {
    public static void main(String[] args) {
        try {
            // args[0] is the URL of the data service to query
            DataServiceClient client = new DataServiceClient(args[0]);
            CQLQuery query = new CQLQuery();
            // build up the query
            CQLQueryResults results = client.query(query);
            // the wsdd stream supplies the type mappings used for deserialization
            Iterator iter = new CQLQueryResultsIterator(results,
                SampleDataServiceInvocation.class.getResourceAsStream(
                    "client-config.wsdd"));
            while (iter.hasNext()) {
                java.lang.Object result = iter.next();
                // do something with the result object
            }
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}
{code}
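Because the type returned by *next()* depends on what the query asked for, client code often branches on the runtime type of each item. The following is a minimal sketch of that branching. *AttributePair* is a hypothetical stand-in for the *TargetAttribute* type so that the example runs without the caGrid libraries, and the sample values are invented for illustration.

{code}
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the TargetAttribute type; not part of caGrid.
class AttributePair {
    final String name;
    final String value; // may be null, as described above

    AttributePair(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class ResultDispatch {
    // Describes one item as it might be returned by the iterator's next() method.
    static String describe(Object result) {
        if (result == null) {
            return "null";
        }
        if (result instanceof Long) {
            return "count=" + result; // count query
        }
        if (result instanceof AttributePair[]) {
            // attribute query: one array per matching object instance
            StringBuilder sb = new StringBuilder("attributes:");
            for (AttributePair pair : (AttributePair[]) result) {
                sb.append(' ').append(pair.name).append('=').append(pair.value);
            }
            return sb.toString();
        }
        if (result instanceof String) {
            return "xml=" + result; // iterator configured to return XML text
        }
        return "object of type " + result.getClass().getName();
    }

    public static void main(String[] args) {
        Iterator<Object> iter = Arrays.<Object>asList(
            Long.valueOf(3),
            new AttributePair[] { new AttributePair("id", "42"), new AttributePair("name", null) },
            "<ns:Thing/>").iterator();
        while (iter.hasNext()) {
            System.out.println(describe(iter.next()));
        }
    }
}
{code}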
More information on the [caGrid Data Services client APIs may be found on this wiki.|dataservices12:Client API]
h1. Utility Classes
----
h2. Utilities
The caGrid data services infrastructure includes several utility classes which can be used to ease development and use of data services. These classes are found in the *gov.nih.nci.cagrid.data.utilities* package distributed with the data service infrastructure.
h2. CQLResultsCreationUtil
This class provides convenience methods for creating CQLQueryResults instances for object results, attribute results, and a counting result. A convenience method for identifier results may be added in the future. The class provides three public static methods, one for each type of results currently supported.
* public static CQLQueryResults createObjectResults(List objects, String targetName, Mappings classToQname)
** objects - a list of Java objects to be placed in a new CQLQueryResults object.
** targetName - the name of the class targeted by the query which produced these object results. All items in the objects list should be of this type.
** classToQname - a mapping from class name to QName. This is a generated Java bean from the XML schema for the data service infrastructure and contains an array of name/value pairs that map class names to QNames.
* public static CQLQueryResults createAttributeResults(List attribArrays, String targetClassname, String\[\] attribNames)
** attribArrays - a List of Object arrays. Each array should have one value for each attribute of an object. These values may be null. The values must be in an order corresponding to the ordering of attribute names.
** targetClassname - the name of the class targeted by the query which produced these attribute results. All attribute arrays should have come from this type.
** attribNames - the names of the attributes returned by the query. These should be in the same ordering used by the attribute arrays.
* public static CQLQueryResults createCountResults(long count, String targetClassname)
** count - the number of resulting items (objects, attribute sets) from a query
** targetClassname - the name of the class which was the target of the query
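The ordering requirement between attribArrays and attribNames can be checked before the results are built. The sketch below uses only standard Java types; the helper *rowsMatchNames* is a hypothetical name, not part of the caGrid utilities, and it only illustrates the shape of the data passed to createAttributeResults().

{code}
import java.util.ArrayList;
import java.util.List;

class AttributeRows {
    // Returns true when every row has exactly one slot per attribute name.
    static boolean rowsMatchNames(List<Object[]> attribArrays, String[] attribNames) {
        for (Object[] row : attribArrays) {
            if (row == null || row.length != attribNames.length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Position i in each row holds the value of attribNames[i].
        String[] attribNames = { "id", "name" };
        List<Object[]> attribArrays = new ArrayList<Object[]>();
        attribArrays.add(new Object[] { Long.valueOf(1), "first" });
        attribArrays.add(new Object[] { Long.valueOf(2), null }); // null values are allowed
        System.out.println(rowsMatchNames(attribArrays, attribNames));
    }
}
{code}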
h2. DataServiceIterator
The data service iterator is an interface which allows a query to be submitted to a data service and an Iterator over the result set to be returned. There are three implementations of this interface: one for the standard data service, one for data services with enumeration enabled, and one for data services with BDT enabled.
* DataServiceHandle
** The data service handle is the implementation of the data service iterator interface for a base caGrid Data Service. It has three constructors, all of which take a DataServiceClient instance. The default constructor needs only this parameter. The other two constructors should be used when custom serialization and deserialization of types has been specified for the service. The extra parameter can be either the filename of a wsdd file containing this mapping information, or an InputStream to the same information.
* EnumDataServiceHandle
** The enum data service handle is the implementation of the data service iterator interface for a WS-Enumeration enabled caGrid Data Service. It has two constructors, both of which take an enumeration data service client instance. The default constructor needs only this parameter. The second constructor takes an IterationConstraints instance, which contains information about how data should be requested from the enumeration data service.
* BdtDataServiceHandle
** The BDT data service handle is an implementation of the data service iterator interface to be used with a BDT-enabled caGrid Data Service. Its behavior is the same as that of the enum data service handle, except that it handles the additional invocation of the BDT context to support enumeration internally.
h2. DomainModelUtils
The domain model utils provide a means to extract useful information from a domain model.
* public static UMLClass getReferencedUMLClass(DomainModel model, UMLClassReference reference)
** To save on document size, domain models do not duplicate class information when an association is defined, but rather use class references based on ID values. These reference values can be traced back to their original UML Class instance with this function.
* public static UMLClass\[\] getAllSuperclasses(DomainModel model, String className)
** Superclasses of a UML Class can be determined by traversing UML class references and generalization information. There are two methods which perform this task in the Domain Model Utils class. One takes a class name; the other extracts the name from a UMLClass instance and passes it to the first.
h2. WsddUtil
The wsdd utility class contains functions to set parameters in a wsdd file. This class is used internally by the Introduce data service extension to edit the wsdd files and change the castor mapping file name.
* public static void setGlobalClientParameter(String clientWsddFile, String key, String value)
** clientWsddFile - the name of the client side wsdd file to edit. When edits are complete, the changed file is saved to the same location.
** key - the key of the parameter. This is the name by which the parameter can be accessed.
** value - the value stored in the parameter
* public static void setServiceParameter(String serverWsddFile, String serviceName, String key, String value)
** serverWsddFile - the name of the server side wsdd file to edit. When edits are complete, the changed file is saved to the same location.
** serviceName - the name of the service in the wsdd file on which the parameter is set.
** key - the key of the parameter. This is the name by which the parameter can be accessed.
** value - the value stored in the parameter
h1. Validation Tools
----
The caGrid Data Services infrastructure provides tools to validate queries against both the CQL schema and the domain model exposed by a service, and to validate query results against the data types the service exposes.
h2. CQL Query Syntax
The caGrid Data Service infrastructure provides mechanisms to validate CQL queries for syntactic correctness. While the Axis engine prevents malformed XML from ever being turned into CQL objects, it does not reject XML that violates certain schema restrictions. For example, Axis does not prevent populating multiple child elements of an XML schema 'choice'. For this reason, CQL syntax validation can be enabled on a caGrid data service. This mechanism rejects invalid queries before they reach a CQL Query Processor implementation, saving the processor's developer from having to handle them. The same validation can be performed on the client side, or completely offline, by using the query validation utilities.
For syntax validation, the interface *gov.nih.nci.cagrid.data.cql.validation.CqlStructureValidator* is provided, as are two implementations of this interface. The interface provides the *validateCqlStructure()* method, which takes a single CQLQuery instance parameter, and throws a *MalformedQueryException* if an error is encountered. The default implementation of this interface is the *gov.nih.nci.cagrid.data.cql.validation.ObjectWalkingCQLValidator* class. As its name suggests, this class walks through the CQL object model, seeking out inconsistencies with the published CQL schema. This class also has a *main()* method, which allows it to be run from the command line with a list of CQL query XML files specified as arguments.
The data service infrastructure uses this class by default when query validation is enabled. This can be changed to any other class which implements the CqlStructureValidator interface by editing the value of the *dataService_cqlValidatorClass* service property in a generated data service.
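The 'choice' problem mentioned above can be pictured with a small object-walking check. This is a sketch of the idea only, not the logic of the supplied validator: *ChoiceNode* and its three fields are hypothetical stand-ins for a deserialized query object whose schema allows at most one of several children.

{code}
// Hypothetical stand-in for a query object governed by a schema 'choice'; not a caGrid bean.
class ChoiceNode {
    Object attribute;
    Object association;
    Object group;
}

class ChoiceCheck {
    // Counts how many of the mutually exclusive children are populated.
    static int populatedChildren(ChoiceNode node) {
        int count = 0;
        if (node.attribute != null) {
            count++;
        }
        if (node.association != null) {
            count++;
        }
        if (node.group != null) {
            count++;
        }
        return count;
    }

    // A 'choice' is satisfied when no more than one child is present.
    static boolean isValid(ChoiceNode node) {
        return populatedChildren(node) <= 1;
    }

    public static void main(String[] args) {
        ChoiceNode node = new ChoiceNode();
        node.attribute = "first child";
        node.group = "second child"; // two children populated: violates the choice
        System.out.println(isValid(node));
    }
}
{code}

A validator built this way would apply the same check recursively to every nested object in the query and report a malformed query on the first violation.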
h2. Domain Model Conformance
The Data Service infrastructure also provides mechanisms to validate a structurally sound CQL query against a Domain Model to ensure its restrictions are supported by the domain model's exposed structure. Domain Model validation may be enabled for a caGrid data service, and will be performed on every query submitted to the service before it is passed to the CQL query processor. The interface *gov.nih.nci.cagrid.data.cql.validation.CqlDomainValidator* is provided, along with a single implementation. The interface provides the *validateDomainModel()* method, which takes a single CQLQuery instance parameter, and throws a *MalformedQueryException* if an error is encountered. The lone implementation provided with the caGrid Data Service infrastructure is the *gov.nih.nci.cagrid.data.cql.validation.DomainModelValidator* class. Like the CQL validation instance, this class has a main() method, which allows it to be run from a command line. The arguments should be first a domain model XML file, then a list of CQL query files to be validated. The data service infrastructure uses this class when domain model validation is enabled. This implementation may be substituted for another by editing the value of the *dataService_domainModelValidatorClass* service property in a generated data service.
h2. Results Validation
The data service infrastructure also provides a means to both validate the results of a CQL query against a known set of targets, and to determine what target data types are allowed to be returned by a caGrid Data Service. Every data service exposes a schema through its WSDL that enumerates the data types which may be returned by the data service. This schema appears in generated services under the schemas/<ServiceName> directory as <ServiceName>_CQLResultTypes.xsd.
The utility class *gov.nih.nci.cagrid.data.utilities.validation.CQLQueryResultsValidator* has been provided to both retrieve this file and verify that a CQLQueryResults instance conforms to this schema. An instance of this class can be constructed with either the full path to a data service's WSDL file, or an endpoint reference to a running data service.
The validator exposes two public methods:
* public void saveRestrictedCQLResultSetXSD(File fileLocation) throws SchemaValidationException
** This method locates the restriction XSD file and saves its contents into the file specified.
** fileLocation - a file into which the restriction XSD will be saved.
* public void validateCQLResultSet(CQLQueryResults resultSet) throws SchemaValidationException
** resultSet - a set of results generated by a query into a caGrid Data Service. The object contents of this result set will be processed against the restriction XSD.
The CQLQueryResultsValidator class also has a main() method, which takes two arguments. The first argument is a URL to a caGrid Data Service, which will be used to retrieve the result restriction schema. The second argument should be the filename of a CQLQueryResults instance serialized to an XML document.
h1. CQL Query Processors
----
The CQL Query Processor is a pluggable implementation which handles the details of processing CQL against some backend data source and produces a CQLQueryResults instance. The particular implementation used is determined by a value in the service's deployment properties, and an instance of the processor is loaded at runtime via reflection. The query processor may optionally supply a set of properties via the *getRequiredParameters()* method. These properties may be configured prior to deployment of the service, and are passed into the query processor when it is first instantiated via the initialize() method. Additionally, the query processor may implement the method *getPropertiesFromEtc()*. This method returns a *java.util.Set* containing a subset of keys from the getRequiredParameters() method whose values should be returned as file system paths relative to the etc directory of the deployed grid service.
When a query is issued to the data service, the query will be passed along to the CQL Query Processor instance's processQuery() method. This method may throw a QueryProcessingException in the case of an error in handling the query, or a MalformedQueryException in cases where the query was found to be invalid for any reason.
h2. Implementation
*See Also:* [How-to Implement CQL Query Processors|dataservices13:How-to Implement CQL Query Processors]
All query processor implementations are required to extend the abstract base class *gov.nih.nci.cagrid.data.cql.CQLQueryProcessor*. This base class declares several methods which are meant to be overridden; however, the only method a query processor is required to implement is the processQuery() method. This method takes a CQL query and returns a CQLQueryResults instance. It is declared abstract in the base class, which enforces this implementation requirement. Generally, this method should be able to translate CQL into whatever native query language is required by the back end data resource, and translate the result set into a CQLQueryResults instance.
CQL Query Processors are designed to be configurable at runtime by a set of properties. These properties are modifiable via the data service extension to the Introduce toolkit, or manually by editing a configuration file once a service has been built. The base CQL query processor class provides a method to retrieve required configuration parameters and their associated default values:
* *public Properties getRequiredParameters()*
** This method is provided by default and returns an empty *java.util.Properties* instance. CQL implementers who require properties to be configured should override this method to return a populated Properties instance. If a property is optional, its value should be set to an empty string. All property keys must be valid Java identifiers meaning that there cannot be any spaces or punctuation in the key.
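The key rules above can be checked mechanically. The sketch below uses only *java.util.Properties* and the character tests in *java.lang.Character*; the class name, the helper names, and the two example keys are hypothetical. Note that the character tests also accept '$' and '_', and the sketch does not reject Java reserved words.

{code}
import java.util.Properties;

class RequiredParameterKeys {
    // True when the key is made up only of characters allowed in a Java identifier.
    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(key.charAt(0))) {
            return false;
        }
        for (int i = 1; i < key.length(); i++) {
            if (!Character.isJavaIdentifierPart(key.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Example of the kind of Properties a processor might return (keys are invented).
    static Properties exampleParameters() {
        Properties props = new Properties();
        props.setProperty("databaseUrl", "jdbc:example://localhost/db"); // required, with a default
        props.setProperty("cacheDirectory", ""); // optional: default is the empty string
        return props;
    }

    static boolean allKeysValid(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (!isValidKey(key)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allKeysValid(exampleParameters()));
        System.out.println(isValidKey("database url")); // contains a space
    }
}
{code}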
Additionally, a method is provided to specify which of those properties are meant to be file locations:
* *getPropertiesFromEtc()*
** This method returns a *java.util.Set* containing a subset of keys from the getRequiredParameters() method whose values should be returned as file system paths relative to the etc directory of the deployed grid service.
The query processor base class has two protected methods which provide access to any user configured parameters and an input stream to the server side wsdd configuration file. The method *getConfiguredParameters()* returns a *java.util.Properties* instance containing all the keys defined in the properties returned by getRequiredParameters(), but with either the default or a developer configured value associated with each. The method getConfiguredWsddStream() returns an InputStream instance which will read in the contents of the server side wsdd configuration file. The call to the query processor's initialize method, and in turn the population of these values, occurs when the data service is first instantiated, typically when the container is started. Calls to these methods before this time will return null. For this reason, the constructor of the CQL Query Processor implementation must be fairly simple, and initialization of any resources required delayed until the initialize() method has been called.
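The advice to keep the constructor simple and defer resource setup can be sketched as follows. *ProcessorBase* is a hypothetical stand-in that mimics only the behavior described above (configured parameters are null until initialize() has been called); it is not the caGrid base class, and the "host" key and connection string are invented for illustration.

{code}
import java.util.Properties;

// Hypothetical stand-in for the query processor base class; not the caGrid API.
abstract class ProcessorBase {
    private Properties configured; // stays null until initialize() is called

    void initialize(Properties params) {
        this.configured = params;
    }

    protected Properties getConfiguredParameters() {
        return configured;
    }
}

class LazyProcessor extends ProcessorBase {
    private String connectionString; // built on first use, not in the constructor

    LazyProcessor() {
        // Intentionally empty: configured parameters are not available yet.
    }

    // Resolves the resource the first time it is needed, after initialization.
    synchronized String connection() {
        if (connectionString == null) {
            Properties params = getConfiguredParameters();
            if (params == null) {
                throw new IllegalStateException("initialize() has not been called");
            }
            connectionString = "db://" + params.getProperty("host", "localhost");
        }
        return connectionString;
    }

    public static void main(String[] args) {
        LazyProcessor processor = new LazyProcessor();
        Properties params = new Properties();
        params.setProperty("host", "example");
        processor.initialize(params);
        System.out.println(processor.connection());
    }
}
{code}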
{code}
/**
 * Processes the CQL Query
 *
 * @param cqlQuery
 * @return The results of processing a CQL query
 * @throws MalformedQueryException
 *             Should be thrown when the query itself does not conform to the
 *             CQL standard or attempts to perform queries outside of
 *             the exposed domain model
 * @throws QueryProcessingException
 *             Thrown for all exceptions in query processing not related
 *             to the query being malformed
 */
public abstract CQLQueryResults processQuery(CQLQuery cqlQuery)
    throws MalformedQueryException, QueryProcessingException;
{code}
The only method which is absolutely required to be implemented by CQL query processors is the processQuery() method. This is the method which executes the CQL query against its data source and generates an appropriate set of results. There are utilities (discussed earlier) to make generation of this result set a simpler process. At the time this method is called, the return values of getConfiguredParameters() and getConfiguredWsddStream() will be non-null.
The processQuery() method throws both a MalformedQueryException and a QueryProcessingException. Malformed query exceptions should be thrown under conditions where the query is somehow incorrect syntactically, or uses features of the CQL language which are not yet supported in the query processor implementation. If query syntax validation is enabled in the data service infrastructure, then it may be assumed that all queries reaching the processQuery() method are at least well formed CQL. Query processing exceptions should be thrown when some error occurs which prevents successful resolution of the query request. These conditions may include database errors, file system problems, or misconfiguration of properties.
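The split between the two exception types can be sketched as below. The two exception classes are hypothetical stand-ins carrying a *Stub* prefix so they are not mistaken for the caGrid classes, and the query and backend are reduced to a target name and a map so the example runs on its own.

{code}
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-ins for the two exception types; not the caGrid classes.
class StubMalformedQueryException extends Exception {
    StubMalformedQueryException(String message) {
        super(message);
    }
}

class StubQueryProcessingException extends Exception {
    StubQueryProcessingException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ExceptionMapping {
    // Problems with the query itself are reported as malformed;
    // failures of the backend are reported as processing errors.
    static String processQuery(String targetName, Map<String, String> backend)
            throws StubMalformedQueryException, StubQueryProcessingException {
        if (targetName == null || targetName.trim().isEmpty()) {
            throw new StubMalformedQueryException("query does not name a target");
        }
        try {
            String row = backend.get(targetName); // fails if the backend is unavailable (null)
            return row == null ? "" : row;
        } catch (RuntimeException ex) {
            throw new StubQueryProcessingException("backend lookup failed", ex);
        }
    }

    // Convenience wrapper that reports which path was taken.
    static String outcome(String targetName, Map<String, String> backend) {
        try {
            processQuery(targetName, backend);
            return "ok";
        } catch (StubMalformedQueryException ex) {
            return "malformed";
        } catch (StubQueryProcessingException ex) {
            return "processing";
        }
    }

    public static void main(String[] args) {
        Map<String, String> backend = Collections.singletonMap("some.class.name", "<result/>");
        System.out.println(outcome("some.class.name", backend));
        System.out.println(outcome("", backend));
        System.out.println(outcome("some.class.name", null));
    }
}
{code}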
h1. Service Styles Architecture
----
*See also:* [Data Service Styles|dataservices12:Service Styles]
Data service styles may be added to the data service extension to provide additional functionality to the service creation and configuration processes, and are selected by the service developer when a service is first created. Styles may be installed at any time after the primary caGrid Data Services extension has been installed by adding to the styles directory found in the installed data service extension directory. Each style must provide its own directory in which files it uses will be placed, but no restriction is made on the naming of these directories. At the top level of each style directory, a style.xml file must exist, describing the style. This document describes the style's name, which caGrid and Data Service versions it is compatible with, and information on which classes are to be loaded for each component of the style. If the service developer selects no style at service creation time, the service is created with only the standard data services components and query method, and ready to have a custom domain model, query processor, and other data service requirements selected and configured.
h2. Functionality Extended by Styles
Data Service styles may add functionality to any or all of the following areas of service development with the Introduce toolkit:
* Creation Wizard
** The service style may supply a list of wizard panels to be displayed and chained together in a wizard-like fashion to break the setup process for the service style into a series of steps. These panels will be shown in a wizard dialog when a service style is selected at service creation time.
* Post-creation processing
** Just as Introduce extensions may add functionality to the service creation process, data service styles may add processing capabilities to this step.
* Modification User Interface
** The style may supply a graphical panel which will be added to the data service tab in the Introduce service modification viewer. This tab can be used to configure any style-specific options in the service.
* Post-code generation processing
** The style may add functionality to the code generation process of service modification. This processing will be invoked each time the service is modified and saved in Introduce.
h3. Implementation of a Style
*Main Article:* [Data Service Style Creation|dataservices12:How-To Create Data Service Styles]





