
Data Services


Data Services 1.3 Design Guide




Architecture


The caGrid data services architecture is composed of several parts which interoperate to facilitate general CQL query processing, validation, auditing, and configuration.

The central Java class around which most of the data services service-side infrastructure revolves is the BaseServiceImpl class. This abstract class provides the basic implementation of the grid service and the auditing support methods. The standard query mechanisms, such as query and queryWithEnumeration, are implemented by concrete classes that extend this base class.

On startup of the data service, the service class configures itself by reading from the ServiceConfigUtil class. This class acts as a wrapper around the dynamically generated ServiceConfiguration class which Introduce creates, and provides accessors to data service specific configuration properties. At this point, the data service initializes any required CQL validators, instantiates and configures auditors, and creates a configured instance of the CQL query processor.

When a CQL query is passed to the data service, the configuration is consulted to determine whether that query needs to be validated. If validation is specified, the validator implementations named in the configuration are instantiated, and the query is passed to them. If validation fails, an exception indicating the cause of the failure is thrown and returned to the client. Otherwise, query processing proceeds.

Once validation is complete, the query is passed along to the CQL query processor implementation specified by the configuration. If the query fails at this point, an exception indicating the nature of the problem is thrown and returned to the client. Otherwise, the results of the query processing operation are returned.

Throughout this process, data service auditors are kept apprised of the current processing status through a series of callbacks. caGrid provides a single concrete implementation of the auditor interface, which logs information to the local file system.
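
The flow described above (optional validation, then query processing, with auditors notified along the way) can be sketched roughly as follows. This is a minimal illustration and not the caGrid implementation: every type here (SketchQuery, SketchValidator, SketchProcessor, SketchAuditor, SketchDataService) and every stage name is hypothetical, and unchecked exceptions stand in for the service's real exception types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a CQL query object. */
class SketchQuery {
    final String targetName;
    SketchQuery(String targetName) { this.targetName = targetName; }
}

/** Hypothetical validator; signals a bad query with an unchecked exception. */
interface SketchValidator {
    void validate(SketchQuery query);
}

/** Hypothetical query processor; returns serialized results. */
interface SketchProcessor {
    List<String> process(SketchQuery query);
}

/** Hypothetical auditor callback, invoked once per processing stage. */
interface SketchAuditor {
    void onStage(String stage, SketchQuery query);
}

class SketchDataService {
    private final boolean validationEnabled;
    private final List<SketchValidator> validators;
    private final SketchProcessor processor;
    private final List<SketchAuditor> auditors;

    SketchDataService(boolean validationEnabled, List<SketchValidator> validators,
                      SketchProcessor processor, List<SketchAuditor> auditors) {
        this.validationEnabled = validationEnabled;
        this.validators = new ArrayList<>(validators);
        this.processor = processor;
        this.auditors = new ArrayList<>(auditors);
    }

    List<String> query(SketchQuery query) {
        audit("queryBegins", query);
        if (validationEnabled) {
            try {
                // Every configured validator must accept the query.
                for (SketchValidator validator : validators) {
                    validator.validate(query);
                }
            } catch (RuntimeException e) {
                audit("validationFailure", query);
                throw e; // the failure is reported back to the caller
            }
        }
        try {
            List<String> results = processor.process(query);
            audit("queryResults", query);
            return results;
        } catch (RuntimeException e) {
            audit("queryProcessingFailure", query);
            throw e;
        }
    }

    // Notify every auditor of the current stage.
    private void audit(String stage, SketchQuery query) {
        for (SketchAuditor auditor : auditors) {
            auditor.onStage(stage, query);
        }
    }
}
```

In this sketch an auditor that simply records stage names would see "queryBegins" followed by "queryResults" for a successful query, and "queryBegins" followed by "validationFailure" when a validator rejects the query.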

Based on Standard Grid Services

The caGrid Data Services architecture is designed as a specialization of standard grid services. As such, some basic requirements for Data Services are immediately met:

  • Security integration:
    • All security concepts that apply to any other grid service are immediately available and enforced on caGrid data services.
  • Simplified creation tooling:
    • The Introduce Toolkit allows creation of grid services, including configuring their metadata, security, and service functionality using a simple graphical interface.
    • Introduce also provides a pluggable back end architecture and graphical user interface, which allows specialized services to be created with a minimum of interaction from end users. Data Services leverage this extensibility to create services with a standard query interface, metadata, and core implementation.

Specialization of Features

Further requirements are met through implementation of a standardized query schema and client tooling to manipulate it:

  • The standard query language is CQL, defined by a schema registered in the GME.
    • Allows creation of CQL Query Java objects which can be passed to the caGrid data service's query method.
    • CQL Java Object model can be populated so as to describe the target data type and all qualifications and restrictions that must be met for the requested object.
    • Queries may be imported from an XML representation, either from disk or from any other source of String input.
  • Query results are described by a CQL Results schema, also registered in the GME.
  • Client tooling is implemented as an iterator over the result set. The iterator may deserialize the XML returned in the CQL Result object as a series of registered objects. An alternative implementation simply returns the XML without any processing applied to it.
  • Data services expose a metadata document known as a domain model. This model defines the data types which are exposed by the data service and their relationships to one another. This model also contains semantic information, which allows data services to be discovered in the grid based on concept codes.
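
The two iterator behaviors mentioned in the list above (deserializing each result, or handing back the raw XML) can be illustrated with a small sketch. The class names SketchResultsIterator and SketchGene, the sample XML, and the string-based "deserializer" are all assumptions made for illustration; the actual caGrid client tooling deserializes against registered schemas and is not shown here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical domain object recovered from one serialized result. */
class SketchGene {
    final String symbol;
    SketchGene(String symbol) { this.symbol = symbol; }

    /** Toy deserializer: reads the first quoted attribute value in the element. */
    static SketchGene fromXml(String xml) {
        int start = xml.indexOf('"') + 1;
        int end = xml.indexOf('"', start);
        return new SketchGene(xml.substring(start, end));
    }
}

/** Hypothetical iterator over serialized results; T is what the deserializer yields. */
class SketchResultsIterator<T> implements Iterator<T> {
    private final Iterator<String> xmlResults;
    private final Function<String, T> deserializer;

    SketchResultsIterator(List<String> xmlResults, Function<String, T> deserializer) {
        this.xmlResults = xmlResults.iterator();
        this.deserializer = deserializer;
    }

    /** Alternative form: return each result's XML untouched. */
    static SketchResultsIterator<String> rawXml(List<String> xmlResults) {
        return new SketchResultsIterator<>(xmlResults, Function.identity());
    }

    @Override
    public boolean hasNext() {
        return xmlResults.hasNext();
    }

    @Override
    public T next() {
        // Deserialization happens lazily, one result at a time.
        return deserializer.apply(xmlResults.next());
    }
}
```

A caller that wants objects would construct the iterator with a deserializer such as SketchGene::fromXml, while a caller that only needs the serialized form would use rawXml.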

Service Interface


All caGrid data services implement a standard interface in the form of a WSDL document, which contains a single 'query' method. This method takes a single CQL Query parameter, and returns a single CQL Result Set object. All data services must follow this implementation pattern, but are free to include additional methods, such as domain-specific querying and data upload capabilities.

To both simplify creation of data services and ensure interoperability between data services, the basic implementation of this query operation is provided by the caGrid data services infrastructure, and is imported into user-generated services as they are created.

The query result schema wraps the serialized XML of registered data objects. These objects are identified by their schemas, which are included in the WSDL of the caGrid Data Service. This enables clients to discover which data types are available from a given service.

Query Processors


As caGrid data services are intended to be an abstraction over an arbitrary underlying data resource, the data services infrastructure provides a means for customizing how queries are executed against that resource. The infrastructure provides an abstract base class for querying a data source with CQL, known as the CQL query processor, which data providers are required to implement. Query processor implementations are expected to take a single CQL query and produce an appropriate result set. Query processors are pluggable into the data service infrastructure at runtime, and are loaded via reflection. Implementations of query processors may specify configuration properties they require for proper functionality. These properties are configurable by the service developer through a graphical interface in the Introduce toolkit, as well as at deploy time of the service. At runtime, these properties and their corresponding values are passed to the query processor implementation.

To aid in moving existing silver-level data sources onto the grid, several implementations of the CQL query processor are provided with the caGrid data service infrastructure to perform queries against a caCORE SDK-generated data source. Serialization of SDK-generated objects is also automatically configured when the service is created through the Introduce Toolkit.
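
The pluggable pattern described above (an abstract processor, declared configuration properties, and reflective loading) might look something like the following sketch. The names SketchQueryProcessor, SketchEchoProcessor, SketchProcessorLoader, and the "prefix" property are hypothetical and chosen only for illustration; they are not the caGrid classes or their signatures.

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical abstract base class that data providers would extend. */
abstract class SketchQueryProcessor {
    private Properties configured = new Properties();

    /** Properties this processor requires, with their default values. */
    public Properties getRequiredParameters() {
        return new Properties();
    }

    /** Called once after loading; supplied values override the defaults. */
    public void initialize(Properties supplied) {
        Properties merged = new Properties();
        merged.putAll(getRequiredParameters());
        merged.putAll(supplied);
        configured = merged;
    }

    protected Properties getConfiguredParameters() {
        return configured;
    }

    /** Takes a single serialized query and produces a result set. */
    public abstract List<String> processQuery(String cqlXml) throws Exception;
}

/** Toy implementation that echoes the query back with a configurable prefix. */
class SketchEchoProcessor extends SketchQueryProcessor {
    @Override
    public Properties getRequiredParameters() {
        Properties required = new Properties();
        required.setProperty("prefix", "echo");
        return required;
    }

    @Override
    public List<String> processQuery(String cqlXml) {
        String prefix = getConfiguredParameters().getProperty("prefix");
        return Collections.singletonList(prefix + ":" + cqlXml);
    }
}

/** Loads a processor by class name, as a service might do from its configuration. */
class SketchProcessorLoader {
    static SketchQueryProcessor load(String className, Properties supplied)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        SketchQueryProcessor processor = type
                .asSubclass(SketchQueryProcessor.class)
                .getDeclaredConstructor()
                .newInstance();
        processor.initialize(supplied);
        return processor;
    }
}
```

Because the service only depends on the abstract type and a class name taken from configuration, a different implementation can be substituted at deploy time without recompiling the service itself.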

Data Service Styles


Main Article: Data Service Styles

The data service creation and modification system is pluggable, allowing for specialized data services to be easily created and configured with the Introduce toolkit. The style concept insulates the data service core infrastructure from the complexities associated with supporting various data sources, such as the caCORE SDK.

Querying


Data Services are accessed via their query() method, which takes a single CQL Query parameter. This method can throw either a MalformedQueryException or a QueryProcessingException to indicate an error condition.
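
As a rough illustration of how a caller might distinguish the two error conditions, consider the sketch below. The exception classes here are local stand-ins that merely share the names used above; they are not the caGrid classes. SketchQueryable, SketchInMemoryService, SketchQueryCaller, and the XML fragments are likewise hypothetical.

```java
/** Local stand-in: the query itself is not well formed. */
class MalformedQueryException extends Exception {
    MalformedQueryException(String message) { super(message); }
}

/** Local stand-in: the query was acceptable but could not be executed. */
class QueryProcessingException extends Exception {
    QueryProcessingException(String message) { super(message); }
}

/** Hypothetical view of the single standard query operation. */
interface SketchQueryable {
    String query(String cqlXml) throws MalformedQueryException, QueryProcessingException;
}

/** Toy service used only to trigger each outcome. */
class SketchInMemoryService implements SketchQueryable {
    @Override
    public String query(String cqlXml)
            throws MalformedQueryException, QueryProcessingException {
        if (cqlXml == null || !cqlXml.contains("<Target")) {
            throw new MalformedQueryException("query has no Target element");
        }
        if (cqlXml.contains("name=\"Unknown\"")) {
            throw new QueryProcessingException("no such data type");
        }
        return "<CQLQueryResults/>";
    }
}

class SketchQueryCaller {
    /** Runs a query and reports which of the three outcomes occurred. */
    static String describe(SketchQueryable service, String cqlXml) {
        try {
            return "ok: " + service.query(cqlXml);
        } catch (MalformedQueryException e) {
            // The caller should fix the query before retrying.
            return "malformed: " + e.getMessage();
        } catch (QueryProcessingException e) {
            // The query was valid; the failure lies with the data source.
            return "processing failed: " + e.getMessage();
        }
    }
}
```

Handling the two exceptions separately lets a client tell a query it must correct apart from a failure on the service side.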

Alternate delivery mechanisms such as WS-Enumeration and Bulk Data Transport are also supported and have their own specialized query methods and standard WSDL interfaces.

Creation


caGrid Data Services can be built with a set of extensions to the Introduce Toolkit. This provides grid service developers with a simple and well-defined starting point for creating caBIG gold-compliant Data Services.

From 1.2 to 1.3


caGrid 1.3 maintains backwards compatibility with previous versions of caGrid, and this holds for caGrid data services as well. Data services developed with caGrid 1.0, 1.1, and 1.2 can be queried with the 1.3 data service client-side tooling. However, as is the case with all other caGrid services, the libraries and internal APIs are generally different enough that data services developed with caGrid 1.3 should not be deployed alongside services from other versions. All new services should be developed using caGrid 1.3 to take advantage of new features and tooling, as well as any bug fixes.

Integration with caCORE SDK 4.1


The new 4.1 / 4.1.1 release of caCORE SDK integrates CQL processing natively by using the CQL to HQL translator from caGrid 1.2's support for caCORE SDK version 4.0. This allows the new query processor for caCORE 4.1 to leverage this functionality and move the processing of CQL queries into the caCORE SDK application directly. The new features and functionality from caGrid 1.2's support for caCORE SDK 4.0 carry forward to the new version.

A new wizard has been created which simplifies creation of caGrid 1.3 data services backed by caCORE SDK 4.1. Service developers wishing to create data services backed by a caCORE SDK 4.1 system should begin by taking the tutorial.

Last edited by
Knowledge Center (1515 days ago) , ...