


Federated Query Processor 1.4 Developers Guide


This document is intended to provide information to developers who wish to make use of the Federated Query Processor grid service and local Federated Query Engine APIs.


Prerequisites


To get started developing against the FQP APIs, your project will require the Java libraries found in the FQP project's ext/dependencies/jars directory, and those in its build/lib directory.

Developers using Ivy to integrate with the caGrid build artifacts may use the following line in their dependencies:

<dependency rev="latest.integration" org="caGrid" name="fqp" conf="myconfiguration->client"/>

Federated Query Engine API


The Federated Query Engine is the core component of the Federated Query Processor, and can be used as either a standalone API, or within the context of the Federated Query Processor grid service.

Constructing an Instance

There are three constructors for the Federated Query Engine:

  • public FederatedQueryEngine(GlobusCredential credential, QueryExecutionParameters executionParameters)
  • public FederatedQueryEngine(GlobusCredential credential, QueryExecutionParameters executionParameters, ExecutorService workExecutor)
  • public FederatedQueryEngine(GlobusCredential credential, QueryExecutionParameters executionParameters, ExecutorService workExecutor, DomainModelLocator modelLocator)

In all of the above cases, all parameters are optional. The first two constructors are convenience constructors which pass null for the additional parameters to the third one. They exist to maintain backwards compatibility with the Federated Query Engine from caGrid 1.3 and earlier.

The four parameters can be used as follows:

  • credential
    • A Globus client credential can be passed to the Federated Query Engine, and will be used to query secure data services involved in any DCQL queries issued to the engine.
  • executionParameters
    • Query Execution Parameters (described later) allow the user to define how they'd like the engine to behave with respect to things like target data service failures, retries, and timeouts.
  • workExecutor
    • The Federated Query Engine performs query-related tasks in threads. This can potentially speed up the final stage of query processing a great deal, since that stage involves broadcasting the final CQL query generated by the engine to all target data services specified by the DCQL query. The ExecutorService passed in through this parameter allows users to control how those threads are allocated and managed.
  • modelLocator
    • The domain model locator which will be used in operations where query processing requires information from a service's domain model. The domain model may be required when converting CQL 2 to CQL 1 for communicating with caGrid 1.3 or earlier data services and making decisions regarding data type conversion. The FQP tools provide a simple concrete implementation of the domain model locator interface in gov.nih.nci.cagrid.fqp.common.DefaultDomainModelLocator.
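The workExecutor parameter accepts any java.util.concurrent.ExecutorService. The following is a minimal sketch of the broadcast pattern that the engine's threads are used for, not the engine's own code: the BroadcastSketch class, the queryTarget function (a stand-in for sending CQL to one target data service), and the example.com URLs are all hypothetical. Supplying a bounded pool such as Executors.newFixedThreadPool(2) caps how many target services are queried at once.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: broadcasts one query to several target services on a
// caller-supplied ExecutorService, keeping results grouped by service URL.
class BroadcastSketch {
    static Map<String, List<String>> broadcast(List<String> targetUrls,
            Function<String, List<String>> queryTarget,
            ExecutorService workExecutor) throws InterruptedException, ExecutionException {
        // Submit one task per target; the executor decides how many run concurrently
        Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
        for (String url : targetUrls) {
            pending.put(url, workExecutor.submit(() -> queryTarget.apply(url)));
        }
        // Wait for each task and file its results under the service that produced them
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, Future<List<String>>> entry : pending.entrySet()) {
            results.put(entry.getKey(), entry.getValue().get());
        }
        return results;
    }

    static Map<String, List<String>> demo() {
        // A bounded pool, similar to what a caller might pass as workExecutor
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return broadcast(List.of("https://a.example/ds", "https://b.example/ds"),
                    url -> List.of("result from " + url), pool);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // the caller owns the executor and must shut it down
        }
    }
}
```

Because the caller creates the executor, the caller is also responsible for shutting it down once the engine is no longer in use.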

Query Execution Parameters

The Federated Query Engine allows the caller to specify parameters which control various aspects of query execution. The QueryExecutionParameters data type has been designed to be flexible and extensible for future versions of the Federated Query Processor. In this version, it contains a child data type, TargetDataServiceQueryBehavior, which controls how the query engine handles various failure conditions when submitting CQL to the target data services specified in the DCQL query. This type contains three properties:

  • failOnFirstError
    • Type: Boolean
    • This property controls how the query engine handles failures while querying target data services.
    • If set to true, the engine will terminate query processing and throw an exception when querying against any target data service fails for any reason. No query results will be returned, and the other two properties are ignored.
    • If set to false, the other two properties are used to determine how to handle the failure, and a partial result set may be returned.
  • retries
    • Type: Integer
    • This property specifies the number of times the query engine will retry a query against a target data service if it fails to execute.
  • timeoutPerRetry
    • Type: Integer
    • This property specifies the number of seconds the query engine will wait before retrying a query against a target data service if it fails to execute.
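The interplay of the three properties may be easier to see in code. The following is a hypothetical model, not the engine's implementation: RetryPolicySketch and its queryTarget method are invented names, a Supplier stands in for querying one target data service, timeoutPerRetry is treated as the delay in seconds before each retry (as described above), and a target that still fails after all retries is assumed to be omitted (null) so that partial results can be returned.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical model of the three TargetDataServiceQueryBehavior properties;
// not the Federated Query Engine's actual code.
class RetryPolicySketch {
    private final boolean failOnFirstError;
    private final int retries;
    private final int timeoutPerRetry; // seconds to wait before each retry

    RetryPolicySketch(boolean failOnFirstError, int retries, int timeoutPerRetry) {
        this.failOnFirstError = failOnFirstError;
        this.retries = retries;
        this.timeoutPerRetry = timeoutPerRetry;
    }

    /** Returns the target's results, or null if the target is skipped after all retries fail. */
    <T> T queryTarget(Supplier<T> target) {
        try {
            return target.get();
        } catch (RuntimeException firstFailure) {
            if (failOnFirstError) {
                throw firstFailure; // abort the whole query: no results are returned
            }
        }
        for (int attempt = 1; attempt <= retries; attempt++) {
            pause(); // wait timeoutPerRetry seconds before retrying
            try {
                return target.get();
            } catch (RuntimeException retryFailure) {
                // keep going while retries remain
            }
        }
        return null; // skip this target; other targets may still contribute partial results
    }

    private void pause() {
        try {
            TimeUnit.SECONDS.sleep(timeoutPerRetry);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

Passing 0 for timeoutPerRetry, as a test might, retries immediately without any delay.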

API Methods

The Federated Query Engine exposes two methods for executing DCQL 1 queries and two methods for executing DCQL 2 queries:

Simple Query Execution

Two execute methods exist in the Federated Query Engine, one for each version of DCQL. Each takes a single DCQL query parameter and returns a single DCQLQueryResultsCollection instance. Either method may throw a FederatedQueryProcessingException.

    public DCQLQueryResultsCollection execute(DCQLQuery dcqlQuery)
    throws FederatedQueryProcessingException

The version of this method which takes DCQL 1 first converts the query to DCQL 2 then invokes the method which natively takes DCQL 2. The DCQL 2 execution method processes the DCQL 2 query by breaking it down into parts according to foreign join conditions and generating further CQL 2 queries until it has produced a single CQL 2 query which is then distributed to all target data services specified by the query. For data services which do not support CQL 2, the query is first converted to CQL 1, then the results are converted back to CQL 2 results. The results of this final query are placed in the DCQLQueryResultsCollection according to which target data service returned them, and then returned to the caller.
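The last step described above, sending one CQL 2 query to every target with a CQL 1 fallback and filing the results by service, can be sketched as follows. This is a simplified, hypothetical model: FinalStageSketch, TargetService, its supportsCql2 flag, and the toCql1 / fromCql1Results conversion functions are invented for illustration and are not FQP classes. Queries and results are plain strings so the example runs on its own, and targets are visited sequentially for brevity, whereas the engine performs this step on threads.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the final stage of DCQL 2 processing.
class FinalStageSketch {
    static final class TargetService {
        final String url;
        final boolean supportsCql2;
        final Function<String, List<String>> endpoint; // runs a query string, returns results

        TargetService(String url, boolean supportsCql2, Function<String, List<String>> endpoint) {
            this.url = url;
            this.supportsCql2 = supportsCql2;
            this.endpoint = endpoint;
        }
    }

    static Map<String, List<String>> execute(String cql2Query, List<TargetService> targets,
            Function<String, String> toCql1, Function<List<String>, List<String>> fromCql1Results) {
        // Results stay keyed by the service that produced them,
        // mirroring how a DCQLQueryResultsCollection is organized
        Map<String, List<String>> byService = new LinkedHashMap<>();
        for (TargetService target : targets) {
            List<String> results;
            if (target.supportsCql2) {
                results = target.endpoint.apply(cql2Query);
            } else {
                // Older services: convert the query down to CQL 1, then convert results back up
                results = fromCql1Results.apply(target.endpoint.apply(toCql1.apply(cql2Query)));
            }
            byService.put(target.url, results);
        }
        return byService;
    }
}
```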

Execute and Aggregate Results

Two executeAndAggregateResults methods exist, one for each version of DCQL. Each takes a single DCQL or DCQL 2 query parameter and returns a single CQLQueryResults instance. Either method may also throw a FederatedQueryProcessingException.

    public CQLQueryResults executeAndAggregateResults(DCQLQuery dcqlQuery)
    throws FederatedQueryProcessingException

The version of this method which takes DCQL 1 first converts the query to DCQL 2, then invokes the method which natively takes DCQL 2. The DCQL 2 execution method processes the DCQL 2 query by breaking it down into parts according to foreign join conditions and generating further CQL 2 queries until it has produced a single CQL 2 query, which is then distributed to all target data services specified by the query. For data services which do not support CQL 2, the query is first converted to CQL 1, then the results are converted back to CQL 2 results. The results of this final query are aggregated into a single CQL query results instance, which allows them to be manipulated by existing data service infrastructure tooling (iterators, enumerators, etc.), at the cost of losing the context of which target data service produced a given result.
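Conceptually, aggregation flattens the per-service grouping into a single sequence. The minimal sketch below uses plain strings in place of CQLQueryResults objects; AggregationSketch and its aggregate method are illustrative names only, not part of the FQP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of aggregation: per-service results are merged
// into one flat list, and the service each result came from is discarded.
class AggregationSketch {
    static List<String> aggregate(Map<String, List<String>> resultsByService) {
        List<String> aggregated = new ArrayList<>();
        for (List<String> serviceResults : resultsByService.values()) {
            aggregated.addAll(serviceResults); // keys (service URLs) are not carried over
        }
        return aggregated;
    }
}
```

If a caller needs to know which data service returned a given result, the execute methods, which keep results grouped by service, are the better fit.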

Federated Query Processing Exceptions

These exceptions may be thrown from any of the engine's query execution methods when something goes wrong while processing a DCQL query. Common causes of this exception include:

  • Failure of a data service involved in the DCQL query
    • Failure handling behavior for target data services is controllable by the Query Execution Parameters used to construct the Federated Query Engine
  • Invalid CQL is passed along to a data service (typically due to invalid DCQL originally)
  • Bad / unrecognized user certificate

Query Processing Status Listeners

The Federated Query Engine provides an API through which updates on the current status of query processing may be received and handled, much like the callback APIs found in Java Swing components. Using this API requires implementing an interface and passing an instance of that implementation into the Federated Query Engine. The engine supports any number of listeners; however, as with user interface callbacks, implementations should be reasonably fast so as not to slow down the work of the query engine.

The Federated Query Engine provides three public methods for adding, removing, and listing processing status listeners:

  • addStatusListener
    • Parameter: A processing status listener to be added to the engine's list of listeners
    • Returns: none
    • Adds a status listener instance to the list of listeners which will be notified of various query processing events
  • getStatusListeners
    • Parameter: none
    • Returns: An array of status listeners which are registered to the engine
  • removeStatusListener
    • Parameter: A processing status listener to be removed from the engine's list of listeners
    • Returns: boolean true if the listener was found and removed, false otherwise
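The listener interface's own methods are not listed here, so the following sketch uses a hypothetical single-method StatusListenerSketch interface to model the registry behavior behind addStatusListener, getStatusListeners, and removeStatusListener. ListenerRegistrySketch and fireStatusChanged are invented names; this is an illustration of the pattern, not the engine's code. A CopyOnWriteArrayList lets listeners be added or removed while notifications are in progress, and each listener runs on the notifying thread, which is why implementations should return quickly.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener interface; the real FQP interface defines its own methods.
interface StatusListenerSketch {
    void statusChanged(String event);
}

// Models the add/get/remove listener API described above.
class ListenerRegistrySketch {
    private final List<StatusListenerSketch> listeners = new CopyOnWriteArrayList<>();

    void addStatusListener(StatusListenerSketch listener) {
        listeners.add(listener);
    }

    StatusListenerSketch[] getStatusListeners() {
        return listeners.toArray(new StatusListenerSketch[0]);
    }

    boolean removeStatusListener(StatusListenerSketch listener) {
        return listeners.remove(listener); // true only if the listener was registered
    }

    // Runs each listener on the calling (processing) thread,
    // so a slow listener delays query processing.
    void fireStatusChanged(String event) {
        for (StatusListenerSketch listener : listeners) {
            listener.statusChanged(event);
        }
    }
}
```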

Details of the FQP Processing Status Listener interface

Federated Query Processor Client


The Federated Query Processor Client is the client-side API for communicating with the caGrid Federated Query Processor Service.

Constructing an Instance

The Federated Query Processor Client has four constructors, most of which are simply convenience accessors to a final constructor. The various constructors are as follows:

  • public FederatedQueryProcessorClient(String url) throws MalformedURIException, RemoteException
  • public FederatedQueryProcessorClient(String url, GlobusCredential proxy) throws MalformedURIException, RemoteException
  • public FederatedQueryProcessorClient(EndpointReferenceType epr) throws MalformedURIException, RemoteException
  • public FederatedQueryProcessorClient(EndpointReferenceType epr, GlobusCredential proxy) throws MalformedURIException, RemoteException

The url parameter passed in the first two constructors is the URL of the Federated Query Processor Service you wish to connect to. The epr parameter in the last two constructors is an Endpoint Reference which resolves to the FQP service you wish to connect to. The proxy parameter is a Globus Credential Proxy which you may use to authenticate to and communicate securely with the FQP service. These constructors should look familiar to users of other Introduce-generated caGrid services, since they are the standard client constructors.

Connecting to a Secure FQP Service

Versions 1.2 and earlier of the Federated Query Processor client contained special modifications to always connect with a caller ID when the client supplied a credential. This ensured that the caller's ID was always passed on to the grid service when connecting securely. In caGrid 1.4, all Introduce-generated grid service clients implement a special method which gives the caller control over this behavior, so the old functionality of always sending the caller ID has been removed.

Introduce-generated client code can use a certificate to communicate securely with its service. This certificate can come from many places; for example, it may be a user certificate, a host certificate, or a certificate received from a delegation service. By default, the client will attempt to use the credential in the default location in the user's home directory. For example, if you logged in using Dorian, the credential would be written to the file system in the default location, and your client would automatically use that credential when the service requires it. If you want the client to use a different certificate, either pass that certificate to the client's constructor or call the setProxy operation on the client.

When the client makes a call to the service, it checks the service's security metadata, which tells the client how to configure itself to communicate properly with the service. Even if a service's method is set to require secure communication, this does not mean the client will always use its own credentials. By default, Introduce-generated clients connect anonymously to methods that allow both anonymous and non-anonymous access. If you want your client to use its credentials to invoke a method even though that method can be invoked anonymously, you can set the client to prefer not connecting anonymously. This forces the client to use its own credentials to communicate with the service rather than connecting anonymously.
To do this, call the setAnonymousPrefered operation on the client you are using:

    client.setAnonymousPrefered(false);

The client will then always connect with its credentials until you set this back to true, which tells the client it may connect anonymously to methods that allow anonymous users. This capability exists because some methods change their behavior based on who is calling them: an anonymous caller may not receive all the data, while a caller who has authenticated with their credentials may receive more privileged information.

You can also change the proxy (the credentials) that your client is using by calling the setProxy operation and passing in the new credentials you now want to use.

    client.setProxy(newCredentials);
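The connection rule described above reduces to a small decision. The sketch below is a hypothetical model for illustration only: ConnectionChoiceSketch, the MethodSecurity enum, and useCredentials are invented names, and real Introduce-generated clients make this decision from the service's security metadata rather than from an enum. The anonymousPreferred argument corresponds to the value passed to setAnonymousPrefered, which defaults to true.

```java
// Hypothetical model of how an Introduce-generated client chooses whether to
// present its credentials, based on what a service method allows.
class ConnectionChoiceSketch {
    enum MethodSecurity { CREDENTIALS_REQUIRED, ANONYMOUS_ALLOWED }

    /** Returns true if the client should authenticate with its proxy credential. */
    static boolean useCredentials(MethodSecurity security, boolean anonymousPreferred) {
        if (security == MethodSecurity.CREDENTIALS_REQUIRED) {
            return true; // no choice: the method rejects anonymous callers
        }
        // The method accepts either, so honor the anonymous preference
        return !anonymousPreferred;
    }
}
```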

API Methods

The Federated Query Processor Client offers seven public methods for executing DCQL and DCQL 2 queries in different ways:

Simple Query Execution

The execute method takes a single DCQL 1 query parameter and returns a single DCQLQueryResultsCollection instance. This method may throw a RemoteException or a FederatedQueryProcessingFault.

    public DCQLQueryResultsCollection execute(DCQLQuery dcqlQuery)
    throws RemoteException, FederatedQueryProcessingFault

This method sends a DCQL 1 query to the service, which then uses the Federated Query Engine to process the DCQL 1 query as described above.

The executeQuery method takes a single DCQL 2 query parameter and returns a single DCQLQueryResultsCollection instance. This method may throw a RemoteException or a FederatedQueryProcessingFault.

    public DCQLQueryResultsCollection executeQuery(DCQLQuery dcqlQuery)
    throws RemoteException, FederatedQueryProcessingFault

This method sends a DCQL 2 query to the service which then uses the Federated Query Engine to process the query as described above.

Execute and Aggregate Results

The executeAndAggregateResults method takes a single DCQL 1 query parameter and returns a single CQLQueryResults instance. This method may also throw a RemoteException or a FederatedQueryProcessingFault.

    public CQLQueryResults executeAndAggregateResults(DCQLQuery dcqlQuery)
    throws RemoteException, FederatedQueryProcessingFault

This method sends a DCQL 1 query to the service, which then uses the Federated Query Engine to process the query as described above.

The executeQueryAndAggregate method takes a single DCQL 2 query parameter and returns a single CQLQueryResults instance. This method may also throw a RemoteException or a FederatedQueryProcessingFault.

    public CQLQueryResults executeQueryAndAggregate(DCQLQuery dcqlQuery)
    throws RemoteException, FederatedQueryProcessingFault

This method sends a DCQL 2 query to the service, which then uses the Federated Query Engine to process the DCQL 2 query as described above.

Asynchronous Query Execution

The Federated Query Processor Client offers APIs to perform a DCQL or DCQL 2 query asynchronously. With this functionality, a client can issue a DCQL query, immediately receive a Federated Query Results Client, and use that new client to retrieve results at a later time, potentially using WS-Notification functionality to determine when the query has completed processing on the service and results are available.

The executeAsynchronously method takes a single DCQL query parameter and returns a single Federated Query Results Client instance. This method may throw a RemoteException or a MalformedURIException.

    public FederatedQueryResultsClient executeAsynchronously(DCQLQuery query)
    throws RemoteException, org.apache.axis.types.URI.MalformedURIException

The query method also performs a DCQL query asynchronously, and additionally accepts a delegated credential reference and query execution parameters:

    public FederatedQueryResultsClient query(DCQLQuery query,
        DelegatedCredentialReference delegatedCredentialReference,
        QueryExecutionParameters queryExecutionParameters)
    throws RemoteException, org.apache.axis.types.URI.MalformedURIException,
    FederatedQueryProcessingFault, InternalErrorFault

The queryAsynchronously method performs a DCQL 2 query asynchronously, optionally with a delegated credential reference and query execution parameters:

    public FederatedQueryResultsRetrievalClient queryAsynchronously(DCQLQuery query,
        DelegatedCredentialReference delegatedCredentialReference,
        QueryExecutionParameters queryExecutionParameters)
    throws RemoteException, org.apache.axis.types.URI.MalformedURIException,
    gov.nih.nci.cagrid.fqp.results.stubs.types.InternalErrorFault,
    gov.nih.nci.cagrid.fqp.stubs.types.FederatedQueryProcessingFault

Federated Query Results Client


The Federated Query Results Client is a caGrid service client which can retrieve results and information about the current state of query processing from a Federated Query Processor Service which has been previously issued a query.

The Federated Query Results Client has the same constructors that any standard Introduce generated service client would have, but only those constructors which take an Endpoint Reference Type should be used, since EPRs contain the necessary resource key to access the server-side query results resource.

API Methods

The Federated Query Results Client supplies standard WS-ResourceLifetime methods, which may be used to set the resource's termination time and to immediately dispose of the resource. The results client also includes methods supporting WS-Notification, which can be used to determine when various query processing events have happened, and when query results are available.

Other methods specific to the Federated Query Results Client are as follows:

    public boolean isProcessingComplete() throws RemoteException

This method simply returns true if the Federated Query Processor Service has completed execution of the original DCQL query and results are available, or false otherwise.

    public DCQLQueryResultsCollection getResults() throws RemoteException,
        ProcessingNotCompleteFault, FederatedQueryProcessingFault, InternalErrorFault

This method gets the DCQL query results from the resource to which the Federated Query Results Client is connected. If processing has not yet completed (as indicated by the isProcessingComplete method), this method throws a ProcessingNotCompleteFault. Problems encountered while processing the query cause a FederatedQueryProcessingFault to be thrown.

    public CQLQueryResults getAggregateResults() throws RemoteException, 
        FederatedQueryProcessingFault, ProcessingNotCompleteFault, InternalErrorFault

This method behaves very similarly to the Federated Query Engine's executeAndAggregateResults method. It gets the DCQL query results as a single, aggregate CQLQueryResults instance which can be processed further with the standard data service tools.

    public EnumerationResponseContainer enumerate() throws RemoteException, 
        FederatedQueryProcessingFault, ProcessingNotCompleteFault, InternalErrorFault

This method allows a client to make use of WS-Enumeration to retrieve results of DCQL query processing via an Enumeration client.

    public TransferServiceContextReference transfer() throws RemoteException,
        FederatedQueryProcessingFault, ProcessingNotCompleteFault, InternalErrorFault

This method allows a client to use caGrid's Transfer tools to retrieve the results of DCQL query processing via the Transfer client tools, which can speed up retrieval of large result sets.
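A typical asynchronous workflow issues the query, polls isProcessingComplete, and only then calls getResults, which would otherwise throw a ProcessingNotCompleteFault. The sketch below shows such a polling loop against a hypothetical ResultsHandle interface that mirrors just those two method names; it is a simplified stand-in for the Federated Query Results Client (no RemoteException or faults), and PollingSketch, awaitResults, and the interval and deadline parameters are invented for illustration. Where WS-Notification is available, subscribing for completion can replace polling.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the two results-client methods used here.
interface ResultsHandle<T> {
    boolean isProcessingComplete();
    T getResults();
}

class PollingSketch {
    /** Polls until processing completes, then fetches results; gives up after maxWaitMillis. */
    static <T> T awaitResults(ResultsHandle<T> handle, long pollIntervalMillis, long maxWaitMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (!handle.isProcessingComplete()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Query processing did not complete in time");
            }
            Thread.sleep(pollIntervalMillis);
        }
        // Only now is it safe to ask for results; earlier calls would fail
        return handle.getResults();
    }
}
```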


Federated Query Results Retrieval Client


The Federated Query Results Retrieval Client is a caGrid service client which can retrieve DCQL 2 results and information about the current state of query processing from a Federated Query Processor Service which has previously been issued a query.

The Federated Query Results Retrieval Client has the same constructors that any standard Introduce-generated service client would have, but only those constructors which take an Endpoint Reference Type should be used, since EPRs contain the necessary resource key to access the server-side query results resource.

Typically an instance of the Federated Query Results Retrieval client will be created by the Federated Query Processor client when the queryAsynchronously method is invoked, and no further action should need to be taken to configure the client.

API Methods

The Federated Query Results Retrieval Client supplies standard WS-ResourceLifetime methods, which may be used to set the resource's termination time and immediately dispose of the server side resource. The results client also includes methods supporting WS-Notification, which can be used to determine when various query processing events have happened, and when query results are available.

Other methods specific to the Federated Query Results Retrieval Client are as follows:

    public boolean isProcessingComplete() throws RemoteException

This method simply returns true if the Federated Query Processor Service has completed execution of the original DCQL 2 query and results are available, or false otherwise.

    public DCQLQueryResultsCollection getResults() throws RemoteException,
        ProcessingNotCompleteFault, FederatedQueryProcessingFault, InternalErrorFault

This method gets the DCQL 2 query results from the resource to which the Federated Query Results Retrieval Client is connected. If processing has not yet completed (as indicated by the isProcessingComplete method), this method throws a ProcessingNotCompleteFault. Problems encountered while processing the query cause a FederatedQueryProcessingFault to be thrown.

    public CQLQueryResults getAggregateResults() throws RemoteException, 
        FederatedQueryProcessingFault, ProcessingNotCompleteFault, InternalErrorFault

This method behaves very similarly to the Federated Query Engine's executeAndAggregateResults method. It gets the DCQL 2 query results as a single, aggregate CQL 2 query results instance which can be processed further with the standard data service tools.

    public EnumerationResponseContainer enumerate() throws RemoteException, 
        FederatedQueryProcessingFault, ProcessingNotCompleteFault, InternalErrorFault

This method allows a client to make use of WS-Enumeration to retrieve results of DCQL 2 query processing via an Enumeration client.

    public TransferServiceContextReference transfer() throws RemoteException,
        FederatedQueryProcessingFault, ProcessingNotCompleteFault, InternalErrorFault

This method allows a client to use caGrid's Transfer tools to retrieve the results of DCQL 2 query processing via the Transfer client tools, which can speed up retrieval of large result sets.

Last edited by
David Ervin (743 days ago) , ...