h1. Federated Query Processor 1.4 Developers Guide
----
{cagrid-1.4-docs-nav:name=FQP|guidetype=Developers}This guide is for developers who want to use the Federated Query Processor (FQP) grid service and the local Federated Query Engine APIs.
{cagridtoc:exclude=Federated Query Processor 1.4 Developers Guide}\\
h1. Prerequisites
----
To get started developing against the FQP APIs, your project will require the Java libraries found in the FQP project's ext/dependencies/jars directory, and those in its build/lib directory.
Developers using Ivy to integrate with the caGrid build artifacts may use the following line in their dependencies:
{code}<dependency rev="latest.integration" org="caGrid" name="cql" conf="myconfiguration->cql"/>
{code}
h1. Federated Query Engine API
----
The Federated Query Engine is the core component of the Federated Query Processor, and can be used as either a standalone API, or within the context of the Federated Query Processor grid service.
h2. Constructing an Instance
There are two constructors for the Federated Query Engine:
* public FederatedQueryEngine(GlobusCredential credential, QueryExecutionParameters executionParameters)
* public FederatedQueryEngine(GlobusCredential credential, QueryExecutionParameters executionParameters, ExecutorService workExecutor)
In both cases, all parameters are optional. The first constructor is a convenience that passes *null* for *workExecutor* to the second one; it exists to maintain backwards compatibility with the Federated Query Engine from caGrid 1.2 and earlier.
The three parameters can be used as follows:
* *credential*
** A Globus client credential can be passed to the Federated Query Engine, which will use it to query secure data services involved in any DCQL queries issued to the engine.
* *executionParameters*
** Query Execution Parameters (described later) allow the user to define how they'd like the engine to behave with respect to things like target data service failures, retries, and timeouts.
* *workExecutor*
** The Federated Query Engine performs query-related tasks in threads. This can greatly speed up the final stage of query processing, in which the engine broadcasts the final CQL query it generated to all target data services specified by the DCQL query. The Executor Service passed in through this parameter lets users control how those threads are allocated and managed.
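To illustrate what a caller-supplied Executor Service controls, here is a minimal sketch that uses only the JDK. It is not FQP code: {{WorkExecutorSketch}}, {{broadcast}}, and {{queryOneService}} are hypothetical names, and the "query" is a placeholder string operation. It only shows how a bounded thread pool caps the number of target services contacted at once while one query is broadcast to all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WorkExecutorSketch {

    // Sends the same query to every target in parallel and returns one answer
    // per target, in the same order as the input list.
    static List<String> broadcast(ExecutorService workExecutor, String cql, List<String> targetUrls) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String url : targetUrls) {
            tasks.add(() -> queryOneService(url, cql));
        }
        List<String> answers = new ArrayList<>();
        try {
            // invokeAll blocks until every task has finished.
            for (Future<String> future : workExecutor.invokeAll(tasks)) {
                answers.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a target query failed", e.getCause());
        }
        return answers;
    }

    // Hypothetical placeholder for a remote data service call.
    static String queryOneService(String url, String cql) {
        return url + " answered " + cql;
    }

    public static void main(String[] args) {
        // At most two targets are queried at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(broadcast(pool, "cql", Arrays.asList("svcA", "svcB", "svcC")));
        } finally {
            pool.shutdown();
        }
    }
}
```

An {{ExecutorService}} built this way is the kind of object the *workExecutor* parameter accepts; the caller, not the engine, decides the pool size and when to shut it down.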
h2. Query Execution Parameters
The Federated Query Engine allows the caller to specify parameters which control various aspects of query execution. This _QueryExecutionParameters_ data type has been designed to be flexible and extensible for future versions of the Federated Query Processor. For this version, it contains a child data type, _TargetDataServiceQueryBehavior_, which controls how the query engine handles various failure conditions when submitting CQL to the target data services specified in the DCQL query. This type contains three properties:
* _failOnFirstError_
** _Type:_ Boolean
** {note}If this property is set to *true*, the other two properties are meaningless.{note}
** This property controls how the query engine handles failures while querying target data services.
** If set to *true*, the engine will terminate query processing and throw an exception when querying against any target data service fails for any reason. No query results will be returned.
** If set to *false*, the other two parameters are used to determine how to handle the failure, and a _partial result set_ may be returned.
* _retries_
** _Type:_ Integer
** This property specifies the number of times the query engine will retry a query against a target data service if it fails to execute.
* _timeoutPerRetry_
** _Type:_ Integer
** This property specifies the number of seconds the query engine will wait before retrying a query against a target data service if it fails to execute.
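The interaction of the three properties can be sketched in plain Java. This is one reading of the descriptions above, not the engine's actual implementation: the {{Behavior}} class is a hypothetical stand-in for _TargetDataServiceQueryBehavior_, each target is modelled as a {{Callable}}, and a target whose retries are exhausted is assumed to be left out of the results.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

class RetryBehaviorSketch {

    // Hypothetical stand-in mirroring the three properties described above.
    static final class Behavior {
        final boolean failOnFirstError;
        final int retries;
        final int timeoutPerRetry; // seconds to wait before each retry

        Behavior(boolean failOnFirstError, int retries, int timeoutPerRetry) {
            this.failOnFirstError = failOnFirstError;
            this.retries = retries;
            this.timeoutPerRetry = timeoutPerRetry;
        }
    }

    // Queries every target and returns the results keyed by target name.
    // Targets that still fail after all retries are absent from the map.
    static Map<String, String> queryAll(Map<String, Callable<String>> targets, Behavior behavior) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Callable<String>> target : targets.entrySet()) {
            // One initial attempt, plus the retries unless failing fast.
            int attemptsLeft = behavior.failOnFirstError ? 1 : 1 + behavior.retries;
            while (attemptsLeft > 0) {
                attemptsLeft--;
                try {
                    results.put(target.getKey(), target.getValue().call());
                    break;
                } catch (Exception e) {
                    if (behavior.failOnFirstError) {
                        // Abort everything: no results are returned.
                        throw new IllegalStateException("query failed on " + target.getKey(), e);
                    }
                    if (attemptsLeft > 0) {
                        pause(behavior.timeoutPerRetry);
                    }
                    // Otherwise the retries are exhausted and this target is skipped.
                }
            }
        }
        return results;
    }

    static void pause(int seconds) {
        try {
            TimeUnit.SECONDS.sleep(seconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, Callable<String>> targets = new LinkedHashMap<>();
        targets.put("healthy", () -> "3 rows");
        targets.put("dead", () -> { throw new java.io.IOException("unreachable"); });
        // One retry with no wait: "dead" is skipped, giving a partial result set.
        System.out.println(queryAll(targets, new Behavior(false, 1, 0)));
    }
}
```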
h2. API Methods
The Federated Query Engine exposes two methods for executing a DCQL query:
h3. Simple Query Execution
The *execute* method takes a single DCQL query parameter and returns a single DCQLQueryResultsCollection instance. This method may throw a FederatedQueryProcessingException.
{code}public DCQLQueryResultsCollection execute(DCQLQuery dcqlQuery)
throws FederatedQueryProcessingException
{code}
This method processes the DCQL query by breaking it down into parts according to foreign join conditions and generating further CQL queries until it has produced a single CQL query, which is then distributed to all target data services specified by the query. The results of this final query are placed in the DCQLQueryResultsCollection according to which target data service returned them, and then returned to the caller.
h3. Execute and Aggregate Results
The *executeAndAggregateResults* method takes a single DCQL query parameter and returns a single CQLQueryResults instance. This method may also throw a FederatedQueryProcessingException.
{code}public CQLQueryResults executeAndAggregateResults(DCQLQuery dcqlQuery)
throws FederatedQueryProcessingException
{code}
This method processes the DCQL query by breaking it down into parts according to foreign join conditions and generating further CQL queries until it has produced a single CQL query, which is then distributed to all target data services specified by the query. The results of this final query are aggregated into a single CQL query results instance. This allows the results to be manipulated by existing data service infrastructure tooling (iterators, enumerators, etc.), at the cost of losing the context of which target data service produced a given result.
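The trade-off between the two result shapes can be shown with plain collections. The sketch below is an analogy only and does not use the FQP result types: a map keyed by service name stands in for the per-service collection returned by *execute*, and a flat list stands in for the aggregated results of *executeAndAggregateResults*. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResultShapesSketch {

    // Flattens per-service results into one list. The service names (the map
    // keys) are discarded, so the origin of each row can no longer be recovered.
    static List<String> aggregate(Map<String, List<String>> perService) {
        List<String> all = new ArrayList<>();
        for (List<String> rows : perService.values()) {
            all.addAll(rows);
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> perService = new LinkedHashMap<>();
        perService.put("svcA", Arrays.asList("row1", "row2"));
        perService.put("svcB", Arrays.asList("row3"));

        // Per-service shape: each row is still tied to the service that returned it.
        System.out.println(perService);
        // Aggregated shape: a single uniform list that is simpler to iterate.
        System.out.println(aggregate(perService));
    }
}
```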
h3. Federated Query Processing Exceptions
A FederatedQueryProcessingException may be thrown from either public API method when something goes wrong while processing a DCQL query. Common causes include:
* Failure of a data service involved in the DCQL query
** Failure handling behavior for target data services is controllable by the Query Execution Parameters used to construct the Federated Query Engine
* Invalid CQL is passed along to a data service (typically due to invalid DCQL originally)
* Bad / unrecognized user certificate
h2. Query Processing Status Listeners
The Federated Query Engine provides an API by which updates to the current status of query processing may be received and handled, much like the callback APIs found in Java Swing components. Using this API requires implementing an interface and passing an instance of that implementation to the Federated Query Engine. The engine supports any number of listeners. As with user interface callbacks, listener implementations should return quickly so that they do not slow down the work of the query engine.
The Federated Query Engine provides three public methods for adding, removing, and listing processing status listeners:
* *addStatusListener*
** Parameter: A processing status listener to be added to the engine's list of listeners
** Returns: *none*
** Adds a status listener instance to the list of listeners which will be notified of various query processing events
* *getStatusListeners*
** Parameter: *none*
** Returns: An array of status listeners which are registered to the engine
* *removeStatusListener*
** Parameter: A processing status listener to be removed from the engine's list of listeners
** Returns: boolean true if the listener was found and removed, false otherwise
[Details of the FQP Processing Status Listener interface|fqp14:FQPProcessingStatusListener]
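The add, list, and remove contract described above follows the usual listener-registry pattern. The sketch below reproduces that pattern with only the JDK. The {{StatusListener}} interface and its single {{statusChanged}} callback are hypothetical; the real listener interface is documented on the linked page.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class StatusListenerSketch {

    // Hypothetical callback interface, not the real FQP listener interface.
    interface StatusListener {
        void statusChanged(String message);
    }

    // Safe to iterate while other threads add or remove listeners.
    private final List<StatusListener> listeners = new CopyOnWriteArrayList<>();

    void addStatusListener(StatusListener listener) {
        listeners.add(listener);
    }

    StatusListener[] getStatusListeners() {
        return listeners.toArray(new StatusListener[0]);
    }

    // Returns true if the listener was registered and has been removed.
    boolean removeStatusListener(StatusListener listener) {
        return listeners.remove(listener);
    }

    // Notifies listeners one after another on the calling thread, which is
    // why a slow listener would delay the work that follows.
    void fire(String message) {
        for (StatusListener listener : listeners) {
            listener.statusChanged(message);
        }
    }

    public static void main(String[] args) {
        StatusListenerSketch engine = new StatusListenerSketch();
        StatusListener printer = message -> System.out.println("status: " + message);
        engine.addStatusListener(printer);
        engine.fire("processing started");
        System.out.println(engine.removeStatusListener(printer));
    }
}
```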
h1. Federated Query Processor Client
----
The Federated Query Processor Client is the client-side API for communicating with the caGrid Federated Query Processor Service.
h2. Constructing an Instance
The Federated Query Processor Client has four constructors, most of which are simply convenience accessors to a final constructor. The various constructors are as follows:
* public FederatedQueryProcessorClient(String url) throws MalformedURIException, RemoteException
* public FederatedQueryProcessorClient(String url, GlobusCredential proxy) throws MalformedURIException, RemoteException
* public FederatedQueryProcessorClient(EndpointReferenceType epr) throws MalformedURIException, RemoteException
* public FederatedQueryProcessorClient(EndpointReferenceType epr, GlobusCredential proxy) throws MalformedURIException, RemoteException
The *url* parameter passed in the first two constructors is the URL of the Federated Query Processor Service you wish to connect to. The *epr* parameter in the last two constructors is an Axis Endpoint Reference which resolves to the FQP service you wish to connect to. The *proxy* parameter is a Globus Credential Proxy which you may use to authenticate to and communicate securely with the FQP service. These constructors should look familiar to users of other [Introduce|introduce14:Home]\-generated caGrid services, since they are the standard client constructors.
h3. Connecting to a Secure FQP Service
{import:fqp13:Connecting to Secure FQP Services}
h2. API Methods
The Federated Query Processor Client offers the following public methods for executing DCQL queries:
h3. Simple Query Execution
The _execute_ method takes a single DCQL query parameter and returns a single DCQLQueryResultsCollection instance. This method may throw a RemoteException or a FederatedQueryProcessingFault.
{code:java} public DCQLQueryResultsCollection execute(DCQLQuery dcqlQuery)
throws RemoteException, FederatedQueryProcessingFault
{code}
This method sends a DCQL query to the service, which then uses the Federated Query Engine to process the DCQL query by breaking it down into parts according to foreign join conditions and generating further CQL queries until it has produced a single CQL query, which is then distributed to all target data services specified by the query. The results of this final query are placed in the DCQLQueryResultsCollection according to which target data service returned them, and then returned to the caller.
h3. Execute and Aggregate Results
The _executeAndAggregateResults_ method takes a single DCQL query parameter and returns a single CQLQueryResults instance. This method may also throw a RemoteException or a FederatedQueryProcessingFault.
{code:java} public CQLQueryResults executeAndAggregateResults(DCQLQuery dcqlQuery)
throws RemoteException, FederatedQueryProcessingFault
{code}
This method sends a DCQL query to the service, which then uses the Federated Query Engine to process the DCQL query by breaking it down into parts according to foreign join conditions and generating further CQL queries until it has produced a single CQL query, which is then distributed to all target data services specified by the query. The results of this final query are aggregated into a single CQL query results instance. This allows the results to be manipulated by existing data service infrastructure tooling (iterators, enumerators, etc.), at the cost of losing the context of which target data service produced a given result.
h3. Asynchronous Query Execution
The Federated Query Processor Client offers an API to perform a DCQL query asynchronously. With this functionality, a client can issue a DCQL query, immediately receive a Federated Query Results Client, and use that new client to retrieve results at a later time, potentially using WS-Notification functionality to determine when the query has completed processing on the service and results are available.
The _executeAsynchronously_ method takes a single DCQL query parameter and returns a single Federated Query Results Client instance. This method may throw a MalformedURIException or a RemoteException.
{code:java} public FederatedQueryResultsClient executeAsynchronously(DCQLQuery query)
throws RemoteException, org.apache.axis.types.URI.MalformedURIException
{code}
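The submit-now, retrieve-later flow can be illustrated with the JDK's {{CompletableFuture}}. This is an analogy and not the FQP client API: the future plays the role of the Federated Query Results Client, the completion callback plays the role of a WS-Notification message, and {{AsyncQuerySketch}} and {{submit}} are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncQuerySketch {

    // Starts the work on the pool and returns a handle immediately.
    // The string concatenation is a placeholder for real query processing.
    static CompletableFuture<String> submit(ExecutorService pool, String dcql) {
        return CompletableFuture.supplyAsync(() -> "results for " + dcql, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> handle = submit(pool, "my query");

            // Option 1: be told when processing completes (notification style).
            CompletableFuture<Void> notified =
                    handle.thenAccept(r -> System.out.println("notified: " + r));

            // Option 2: come back later and block until the results are ready.
            System.out.println("retrieved: " + handle.join());
            notified.join();
        } finally {
            pool.shutdown();
        }
    }
}
```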
h3. Specialized Query Execution
The Federated Query Processor Client offers an API to perform specialized DCQL queries in an asynchronous fashion.
{code:java} public FederatedQueryResultsClient query(DCQLQuery query,
DelegatedCredentialReference delegatedCredentialReference,
QueryExecutionParameters queryExecutionParameters)
throws RemoteException, org.apache.axis.types.URI.MalformedURIException,
FederatedQueryProcessingFault, InternalErrorFault
{code}
The _query_ method is new for the 1.4 version of the Federated Query Processor service. It takes three parameters:
* *query*
** The DCQL query to execute on the server
* *delegatedCredentialReference*
** A reference to a delegated credential. The Federated Query Processor service will execute queries against data services involved in the DCQL query using the delegated credential. This allows clients to perform queries against secure data services using their own credentials for authentication and authorization.
** This parameter may be null.
* *queryExecutionParameters*
** Parameters which control the behavior of the query processor with respect to various failure and retry conditions, especially for target data services.
** This parameter may be null.
[Details of the Federated Query Results Client|fqp14:FederatedQueryResultsClient]
{include:fqp14:FederatedQueryResultsClient}





