Data Services 1.2 Design Guide
Architecture
Based on Standard Grid Services
The caGrid Data Services architecture is designed as a specialization of standard grid services. As such, some basic requirements for Data Services are immediately met:
- Security integration:
- All security concepts that apply to any other grid service are immediately available and enforced on caGrid data services.
- Simplified creation tooling:
- The Introduce Toolkit allows creation of grid services, including configuring their metadata, security, and service functionality using a simple graphical interface.
- Introduce also provides a pluggable back end architecture and graphical user interface, which allows specialized services to be created with a minimum of interaction from end users. Data Services leverage this extensibility to create services with a standard query interface, metadata, and core implementation.
Specialization of Features
Further requirements are met through implementation of a standardized query schema and client tooling to manipulate it:
- The standard query language is CQL, defined by a schema registered in the GME.
- Client tooling allows creation of CQL Query Java objects, which can be passed to the caGrid data service's query method.
- The CQL Java object model can be populated to describe the target data type and all qualifications and restrictions that the requested objects must meet.
- Queries may be imported from an XML representation, either on disk or from any other source of String input.
- Query results are described by a CQL Results schema, also registered in the GME.
- Client tooling for results is implemented as an iterator over the result set. The iterator may deserialize the XML returned in the CQL Result object as a series of registered objects. An alternative implementation simply returns the XML without applying any processing to it.
- Data services expose a metadata document known as a domain model. This model defines the data types which are exposed by the data service and their relationships to one another. This model also contains semantic information, which allows data services to be discovered in the grid based on concept codes.
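The XML import path described above can be sketched with nothing but the JDK's DOM parser. This is a minimal illustration, not the caGrid client tooling: the class name CqlXmlSketch, its methods, and the element and attribute names in the sample query are all assumptions made for this example. The authoritative structure is the CQL schema registered in the GME.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CqlXmlSketch {

    // Illustrative query text only. The element and attribute names are
    // assumptions for this sketch, not the registered CQL schema.
    static final String SAMPLE_QUERY =
        "<CQLQuery>"
      + "<Target name=\"example.domain.Gene\">"
      + "<Attribute name=\"symbol\" predicate=\"EQUAL_TO\" value=\"BRCA1\"/>"
      + "</Target>"
      + "</CQLQuery>";

    // Parses XML held in a String and returns the document's root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("query is not well-formed XML", e);
        }
    }

    // Returns the data type the query targets (the "name" of the first Target).
    static String targetName(String xml) {
        NodeList targets = parseRoot(xml).getElementsByTagName("Target");
        if (targets.getLength() == 0) {
            throw new IllegalArgumentException("query has no Target element");
        }
        return ((Element) targets.item(0)).getAttribute("name");
    }

    public static void main(String[] args) {
        System.out.println("Target type: " + targetName(SAMPLE_QUERY));
    }
}
```

Because the parser reads from a StringReader, the same code handles a query loaded from disk, a database column, or any other String source.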
Service Interface
All caGrid data services implement a standard interface in the form of a WSDL document, which contains a single 'query' method. This method takes a single CQL Query parameter, and returns a single CQL Result Set object. All data services must follow this implementation pattern, but are free to include additional methods, such as domain-specific querying and data upload capabilities.
To both simplify creation of data services and ensure interoperability between data services, the basic implementation of this query operation is provided by the caGrid data services infrastructure, and is imported into user-generated services as they are created.
The query result schema wraps the serialized XML of registered data objects. These objects are identified by their schemas, which are included in the WSDL of the caGrid Data Service. This enables clients to discover which data types are available from a given service.
Query Processors
Because caGrid data services abstract away an arbitrary underlying data resource, the data services infrastructure provides a means of customizing how queries are executed against that resource. The infrastructure defines an abstract base class for querying a data source with CQL, known as the CQL query processor, which data providers are required to implement. A query processor implementation is expected to take a single CQL query and produce an appropriate result set. Query processors are plugged into the data service infrastructure at runtime and are loaded via reflection. Implementations may specify the configuration properties they require to function properly. The service developer can configure these properties through a graphical interface in the Introduce toolkit, as well as when the service is deployed. At runtime, the properties and their corresponding values are passed to the query processor implementation.
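The plug-in pattern described in this section (an abstract base class, loading by reflection, and configuration properties handed over at runtime) might look roughly like the following. Every name here is a hypothetical stand-in chosen for illustration: CqlQueryProcessor, EchoProcessor, the resultPrefix property, and the use of String for queries and results are not the caGrid classes or signatures.

```java
import java.util.Properties;

class QueryProcessorSketch {

    // Hypothetical stand-in for the abstract CQL query processor base class.
    abstract static class CqlQueryProcessor {
        private Properties config = new Properties();

        // Property names this processor needs, with default values.
        Properties getRequiredParameters() {
            return new Properties();
        }

        // Called once by the infrastructure with the configured values.
        void initialize(Properties configured) {
            this.config = configured;
        }

        Properties getConfig() {
            return config;
        }

        // Takes a single query and produces a result (Strings stand in for
        // the CQL query and result set types).
        abstract String processQuery(String cqlQuery);
    }

    // Toy implementation that prefixes the query with a configured value.
    static class EchoProcessor extends CqlQueryProcessor {
        @Override
        Properties getRequiredParameters() {
            Properties required = new Properties();
            required.setProperty("resultPrefix", "result:");
            return required;
        }

        @Override
        String processQuery(String cqlQuery) {
            return getConfig().getProperty("resultPrefix") + cqlQuery;
        }
    }

    // Loads a processor by class name via reflection, then passes it the
    // defaults overlaid with the values configured by the developer.
    static CqlQueryProcessor load(String className, Properties configured) {
        try {
            CqlQueryProcessor processor = Class.forName(className)
                .asSubclass(CqlQueryProcessor.class)
                .getDeclaredConstructor()
                .newInstance();
            Properties effective = new Properties();
            effective.putAll(processor.getRequiredParameters());
            effective.putAll(configured);
            processor.initialize(effective);
            return processor;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load query processor " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties configured = new Properties();
        configured.setProperty("resultPrefix", "rows for ");
        CqlQueryProcessor processor = load(EchoProcessor.class.getName(), configured);
        System.out.println(processor.processQuery("Gene"));
    }
}
```

Keeping the class name as data is what lets the same service infrastructure run against different back ends without recompilation.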
To aid in moving existing silver level data sources onto the grid, the caGrid data service infrastructure provides several implementations of the CQL query processor that perform queries against a caCORE SDK generated data source. Serialization of SDK generated objects is also configured automatically when the service is created through the Introduce Toolkit.
Data Service Styles
The data service creation and modification system is pluggable, allowing for specialized data services to be easily created and configured with the Introduce toolkit. The style concept insulates the data service core infrastructure from the complexities associated with supporting various data sources, such as the caCORE SDK.
Style Architecture

Data service styles may be added to the data service extension to provide additional functionality to the service creation and configuration processes, and are selected by the service developer when a service is first created. Styles may be installed at any time after the primary caGrid Data Services extension has been installed by adding them to the styles directory found in the installed data service extension directory. Each style must provide its own directory in which the files it uses will be placed, but no restriction is made on the naming of these directories. At the top level of each style directory, a style.xml file must exist. This document describes the style's name, which caGrid and Data Service versions it is compatible with, and which classes are to be loaded for each component of the style. If the service developer selects no style at service creation time, the service is created with only the standard data services components and query method, and is ready to have a domain model, query processor, and other data service requirements selected and configured.
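The directory convention above (one subdirectory per style, each with a style.xml at its top level) lends itself to a simple scan. The sketch below only illustrates that convention and is not how the data service extension itself discovers styles. StyleScanSketch, its method names, the sample directory names, and the placeholder file content are all assumptions; the real format of style.xml is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class StyleScanSketch {

    // Returns the style.xml of every subdirectory that has one at its top
    // level. Subdirectories without a descriptor are skipped.
    static List<Path> findStyleDescriptors(Path stylesDir) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(stylesDir)) {
            for (Path child : children) {
                Path descriptor = child.resolve("style.xml");
                if (Files.isDirectory(child) && Files.isRegularFile(descriptor)) {
                    found.add(descriptor);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found);
        return found;
    }

    // Builds a throwaway layout for the demo: one directory with a
    // descriptor and one without. The file content is a placeholder only.
    static Path createSampleLayout() {
        try {
            Path styles = Files.createTempDirectory("styles");
            Path complete = Files.createDirectory(styles.resolve("sdk-style"));
            Files.write(complete.resolve("style.xml"),
                        "<placeholder/>".getBytes(StandardCharsets.UTF_8));
            Files.createDirectory(styles.resolve("incomplete-style"));
            return styles;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Path descriptor : findStyleDescriptors(createSampleLayout())) {
            System.out.println("Found style descriptor: " + descriptor);
        }
    }
}
```

Since directory names are unrestricted, the scan keys on the presence of style.xml rather than on any naming pattern.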
Functionality Extended by Styles
Data Service styles may add functionality to any or all of the following areas of service development with the Introduce toolkit:
- Creation Wizard
- The service style may supply a list of panels that are chained together to break the setup process for the style into a series of steps. These panels are shown in a wizard dialog when the service style is selected at service creation time.
- Post-creation processing
- Just as Introduce extensions may add functionality to the service creation process, data service styles may add processing capabilities to this step.
- Modification User Interface
- The style may supply a graphical panel which will be added to the data service tab in the Introduce service modification viewer. This tab can be used to configure any style-specific options in the service.
- Post-code generation processing
- The style may add functionality to the code generation process of service modification. This processing will be invoked each time the service is modified and saved in Introduce.
Querying
Data Services are accessed via their query() method, which takes a single CQL Query parameter. This method may throw either a MalformedQueryException or a QueryProcessingException to indicate an error condition.
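A caller therefore has two distinct failure modes to handle. The sketch below shows that shape using stand-in types: the exception names follow the text above, but the classes defined here, the DataService interface, the SAMPLE service, and the use of String for the query and result set are assumptions for illustration rather than the generated caGrid client API.

```java
class QueryInterfaceSketch {

    // Stand-in for the fault raised when the query itself is invalid.
    static class MalformedQueryException extends Exception {
        MalformedQueryException(String message) {
            super(message);
        }
    }

    // Stand-in for the fault raised when a valid query cannot be executed.
    static class QueryProcessingException extends Exception {
        QueryProcessingException(String message) {
            super(message);
        }
    }

    // Hypothetical view of the single standard operation. Strings stand in
    // for the CQL Query and CQL Result Set types.
    interface DataService {
        String query(String cqlQuery)
            throws MalformedQueryException, QueryProcessingException;
    }

    // Toy service used only to exercise both error paths.
    static final DataService SAMPLE = cql -> {
        if (cql == null || cql.trim().isEmpty()) {
            throw new MalformedQueryException("query is empty");
        }
        if (cql.contains("Unknown")) {
            throw new QueryProcessingException("no such target");
        }
        return "<results/>";
    };

    // Calls the service and reports which of the three outcomes occurred.
    static String describeOutcome(DataService service, String cqlQuery) {
        try {
            return "ok: " + service.query(cqlQuery);
        } catch (MalformedQueryException e) {
            return "malformed: " + e.getMessage();
        } catch (QueryProcessingException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeOutcome(SAMPLE, "Gene"));
        System.out.println(describeOutcome(SAMPLE, ""));
        System.out.println(describeOutcome(SAMPLE, "Unknown"));
    }
}
```

Separating the two exceptions lets a client distinguish a query it should fix from a failure it might retry or report.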
Alternate delivery mechanisms such as WS-Enumeration and BDT are also supported and have their own specialized query methods.
Creation
caGrid Data Services can be built with a set of extensions to the Introduce Toolkit. This provides grid service developers with a simple and well defined starting point to create caBIG gold compliant Data Services.
From 1.1 to 1.2
caGrid 1.2 maintains the principle of backwards compatibility with caGrid 1.1. This holds true for caGrid data services as well. Data services developed with caGrid 1.0 and 1.1 can be queried with the 1.2 data service client-side tooling. New services should be developed using caGrid 1.2 to take advantage of new features and tooling, as well as any bug fixes.
Integration with caCORE SDK 4.0
An all-new query processor has been implemented to facilitate use of a caCORE SDK 4.0 backend with caGrid 1.2 data services. The new version of caCORE utilizes a new version of Hibernate, which allows additional query functionality to be moved to the database layer and simplifies the work the query processor itself must perform. Additionally, version 4.0 of the caCORE SDK supports Hibernate positional query parameters, which makes it possible to run queries that make use of special characters.
Implementation Details
The new query processor depends on several jar files which either do not exist in caCORE 3.2(.1), or are newer versions of those used in the older version of the SDK. To simplify implementation of support for the new version of the SDK, all implementation specific to it has been placed in its own project within caGrid. This project contains two major components. The first is a utility to convert CQL to parameterized HQL (Hibernate Query Language), along with tooling which connects to the caCORE application service, submits the translated HQL, and converts the results into a CQL Result Set. The other component is the data service style, which presents the graphical front end a service developer uses to create a data service that depends on the SDK, as well as some tooling to properly configure mappings between XML schema and the back end domain model exposed by the SDK.
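To illustrate why a parameterized translation copes with special characters, the following sketch turns one attribute restriction into an HQL string with a positional placeholder and a separate parameter list. It is a deliberately small illustration under stated assumptions, not the caGrid CQL to HQL converter: the class ParameterizedHqlSketch, the ParameterizedQuery holder, the alias t, and the restriction to a single equality test are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ParameterizedHqlSketch {

    // Query text plus the values bound, in order, to its "?" placeholders.
    static final class ParameterizedQuery {
        final String hql;
        final List<Object> parameters;

        ParameterizedQuery(String hql, List<Object> parameters) {
            this.hql = hql;
            this.parameters = Collections.unmodifiableList(parameters);
        }
    }

    // Class and attribute names go into the query text, so only plain
    // dotted identifiers are accepted.
    private static void requireIdentifier(String name) {
        if (name == null || !name.matches("[A-Za-z_][A-Za-z0-9_.]*")) {
            throw new IllegalArgumentException("not a valid identifier: " + name);
        }
    }

    // Translates "target type where attribute equals value". The value is
    // never concatenated into the HQL; it travels as a bound parameter.
    static ParameterizedQuery translate(String targetClass, String attribute, Object value) {
        requireIdentifier(targetClass);
        requireIdentifier(attribute);
        String hql = "from " + targetClass + " as t where t." + attribute + " = ?";
        List<Object> parameters = new ArrayList<>();
        parameters.add(value);
        return new ParameterizedQuery(hql, parameters);
    }

    public static void main(String[] args) {
        ParameterizedQuery query = translate("example.domain.Person", "lastName", "O'Brien");
        System.out.println(query.hql);
        System.out.println(query.parameters);
    }
}
```

Because the apostrophe in the sample value never appears in the query text, it needs no escaping and cannot change the structure of the query.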





