Federated Query Processor 1.4 Administrators Guide
Prerequisites
The Federated Query Processor service does not require any software or special system configuration beyond the standard caGrid stack.
To make use of the caGrid Transfer infrastructure, the transfer project is required, and the FQP service must be deployed to the same Tomcat container as the transfer service.
The performance of the FQP service may benefit from large amounts of RAM and multiple processor cores for handling concurrent query operations, but this is not strictly required to deploy the service.
Obtaining the Service
The Federated Query Processor service is available in the caGrid release and can be found in the directory $CAGRID_LOCATION/projects/fqp. If you have obtained a source release or checkout of caGrid, the FQP service must first be compiled. From the directory $CAGRID_LOCATION, execute the command ant all to compile all of caGrid, including the FQP service, or ant build-project -Dsingle.project.name=fqp to build just the FQP service and the projects on which it depends.
Installing the Software
Installation of the Federated Query Processor can be accomplished with either the caGrid Installer, or manually.
Install caGrid and a Container
In this step you will download and install the FQP service and a grid service container using the caGrid Installer. If you already have caGrid 1.4 and a suitable container installed on your machine, you may proceed to the next section.
Once you have installed caGrid, the FQP software can be found in the caGrid/projects/fqp directory under the location where you installed caGrid. This guide refers to that location as FQP_HOME.
To install caGrid/FederatedQueryProcessor and set up a container, please complete the following steps:
Installer Prerequisites
The caGrid Installer installs all prerequisites except for Java and MySQL.
- Java 6 JDK
- Make sure the JAVA_HOME environment variable is set and points to the location where the JDK has been installed.
- (Optional) If you are deploying caGrid core services locally, you may also need a MySQL database.

Note
MySQL is only required for the security services and GME. You can use MySQL 4.x (with transactions enabled, i.e., using the InnoDB engine) or 5.x.
Install caGrid and Configure a Secure Container Using the caGrid 1.4 Installer
- Download the caGrid 1.4 Installer. The downloaded installer should be contained in the file caGrid-installer-1.4.zip.
- Unzip the file caGrid-installer-1.4.zip. This should create the directory caGrid-installer-1.4; from this point forward we will refer to this directory as CAGRID_INSTALLER_LOCATION.
- From a command prompt launch the installer:

> cd CAGRID_INSTALLER_LOCATION
> java -jar caGrid-installer-1.4.jar
- Select the I agree to this license checkbox and click Next.
- Select the Install/Configure caGrid Software and Install/Configure Grid Service Container checkboxes and click Next.
- The installer detects whether or not you have already installed Ant. It installs or reinstalls it, depending on the installation status. In either case, you must specify where you want to install Ant.
- The installer detects whether or not you have already installed Globus. It installs or reinstalls it, depending on the installation status. In either case, you must specify where you want to install Globus.
- The installer prompts you to specify where you want to install caGrid. Specify a location and then click Next.
- The installer displays a list of tasks that the installer will perform. Click Next to start the installation process. The installer downloads, builds, and installs several components. Note: This process takes several minutes.
- Once the installer has completed installing all the components, click Next.
- The installer asks you which Grid you would like to configure your installation to use. The installer supports configuring caGrid to work out of the box with many community Grid environments. For testing and development purposes we recommend selecting the Training Grid. If you do not want to configure caGrid to work with an existing Grid, you may select that option as well. The installer can also be modified to support additional Grids.
- The installer shows a summary of the tasks to be completed. Click Next to configure caGrid to use the selected target Grids. Note: This process takes several minutes.
- Once the installer has finished configuring caGrid to use the target Grid, click Next.
- Select the Container to which you want to deploy your service. This guide provides instructions for using the Tomcat container. Check the Should this container be secure? option and then click Next.
- In the hostname box, enter the hostname of your server; this should match the hostname you used in creating your host credentials. Click Next.

If you plan on using this container to deploy a service that registers to an existing grid, you must use a publicly resolvable DNS name (or static IP). If you do not, you will have to manually edit configuration files later.
- From the Obtain host credentials method list, select Browse host credentials on the file system. Click Next.

If you do not have credentials for your service yet, select Request Credentials instead.
- Enter the location of your host certificate into the Certificate box. Enter the location of your private key into the Key box. Click Next.
- The next screen prompts you to specify where you want to install Tomcat. In the Directory box, enter the installation location and then click Next.
- The next screen displays a list of tasks that the installer will perform to install and configure Tomcat. Click Next.
- Once the installer has completed installing all the components, click Next.
- Click Next. The final screen will remind you to set your ANT_HOME, GLOBUS_LOCATION and CATALINA_HOME environment variables. Set these variables immediately and then click Finish.

These instructions are also written to a file called CAGRID_POST_INSTALLATION.txt in the directory from which you ran the installer.
- Close the installer by clicking Close.
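Before moving on, it can help to confirm that the variables the installer mentions (plus JAVA_HOME from the prerequisites) are visible to new processes. The following is a minimal sketch, not a caGrid tool; the class name CheckCaGridEnv is hypothetical. It reports each variable that is unset or points to a directory that does not exist.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of caGrid): checks the environment variables
// named in the prerequisites and on the installer's final screen.
public class CheckCaGridEnv {

    static final String[] REQUIRED = {
        "JAVA_HOME", "ANT_HOME", "GLOBUS_LOCATION", "CATALINA_HOME"
    };

    // Returns one message per problem; an empty list means every variable
    // is set and points to an existing directory.
    static List<String> findProblems(Map<String, String> env) {
        List<String> problems = new ArrayList<String>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().length() == 0) {
                problems.add(name + " is not set");
            } else if (!new File(value).isDirectory()) {
                problems.add(name + " points to a missing directory: " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("All required environment variables look correct.");
            return;
        }
        for (String problem : problems) {
            System.out.println("Problem: " + problem);
        }
        System.exit(1);
    }
}
```

Run it from a new console after setting the variables; a non-zero exit status means at least one variable still needs attention.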
Configuration
To deploy the FQP service with the default configuration, all you need to edit is the service's standard ServiceMetadata, as described in the Service Metadata section below.
Service Properties
The Federated Query Processor service may be configured by changing values specified in the service.properties file found in the root directory of the FQP distribution.
- maxTargetServicesPerQuery
- Default value: 12
- Type: Integer
- Controls the maximum number of target data services which may be included in any single DCQL query. If a client attempts to execute a query which specifies more than this number of target data services, an exception will be thrown and the query will not execute. If this value is set to zero (0), the number of services is unlimited.
- maxRetryTimeout
- Default value: 300
- Type: Integer
- Controls the maximum number of seconds a client may request the FQP service to wait between retrying queries to target data services which failed to respond correctly. If the client specifies a value greater than this, an exception will be thrown and the query will not execute. If this value is set to zero (0), the maximum timeout is unlimited.
- maxRetries
- Default value: 4
- Type: Integer
- Controls the maximum number of retries a client may request the FQP service to perform when retrying to execute queries to target data services which failed to respond correctly. If the client specifies a value greater than this, an exception will be thrown and the query will not execute. If this value is set to zero (0), the maximum number of retries is unlimited.
- threadPoolSize
- Default value: 10
- Type: Integer
- Controls the size of the thread pool used by the FQP service to execute DCQL queries against target data services and to perform final query aggregation. Increasing this value may improve the performance and responsiveness of the FQP service at the expense of potentially using more server resources.
- initialResultLeaseInMinutes
- Default value: 30
- Type: Integer
- Controls the initial time-to-live (lease time) of FederatedQueryResults resources. The value is specified in minutes. Unless the client explicitly requests a termination time for their results resource more distant in the future, after this time has elapsed, the resource will be destroyed. When the resource is destroyed, any remaining query execution tasks are terminated and any DCQL query results are lost.
These properties may be configured at deployment time by the Introduce service deployment GUI, or by directly editing the service.properties file before deploying it.
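The limit properties above share one rule: a client-requested value is rejected if it exceeds the configured maximum, and a configured maximum of zero disables the check. The following minimal sketch illustrates that rule, the documented defaults, and the lease arithmetic for initialResultLeaseInMinutes. It assumes only the service.properties keys listed above; the class and method names are hypothetical, and this is not the FQP service's own implementation.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;
import java.util.Properties;

// Hypothetical illustration of the service.properties semantics described above.
public class FqpLimits {

    final int maxTargetServicesPerQuery;
    final int maxRetryTimeout;             // seconds
    final int maxRetries;
    final int threadPoolSize;
    final int initialResultLeaseInMinutes;

    FqpLimits(Properties props) {
        // Fall back to the documented default when a key is absent
        maxTargetServicesPerQuery = intValue(props, "maxTargetServicesPerQuery", 12);
        maxRetryTimeout = intValue(props, "maxRetryTimeout", 300);
        maxRetries = intValue(props, "maxRetries", 4);
        threadPoolSize = intValue(props, "threadPoolSize", 10);
        initialResultLeaseInMinutes = intValue(props, "initialResultLeaseInMinutes", 30);
    }

    static FqpLimits load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return new FqpLimits(props);
    }

    private static int intValue(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // A configured maximum of zero means "unlimited"
    static boolean withinLimit(int requested, int configuredMax) {
        return configuredMax == 0 || requested <= configuredMax;
    }

    // Throws if a request exceeds any configured maximum
    void checkRequest(int targetServices, int retries, int retryTimeoutSeconds) {
        if (!withinLimit(targetServices, maxTargetServicesPerQuery)) {
            throw new IllegalArgumentException("Too many target services: " + targetServices);
        }
        if (!withinLimit(retries, maxRetries)) {
            throw new IllegalArgumentException("Too many retries requested: " + retries);
        }
        if (!withinLimit(retryTimeoutSeconds, maxRetryTimeout)) {
            throw new IllegalArgumentException("Retry timeout too long: " + retryTimeoutSeconds);
        }
    }

    // Initial termination time of a results resource created at 'created'
    Date initialTerminationTime(Date created) {
        return new Date(created.getTime() + initialResultLeaseInMinutes * 60L * 1000L);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to service.properties in the FQP distribution
        FileReader reader = new FileReader(args[0]);
        try {
            FqpLimits limits = load(reader);
            System.out.println("maxTargetServicesPerQuery=" + limits.maxTargetServicesPerQuery
                + (limits.maxTargetServicesPerQuery == 0 ? " (unlimited)" : ""));
            System.out.println("threadPoolSize=" + limits.threadPoolSize);
        } finally {
            reader.close();
        }
    }
}
```

With the defaults, a DCQL query naming 13 target services would be rejected, while setting maxTargetServicesPerQuery to 0 would allow it.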
Service Metadata
The FQP service provides a Resource Property, which acts as metadata for its clients and describes the service's capabilities and information on where it is being hosted. This Resource Property is the caGrid standard ServiceMetadata, and is loaded from a file on the filesystem (serviceMetadata.xml), which is located in the FQP_HOME/etc directory, and is deployed with the service upon deployment. This file is fully populated, except for the information about where the service is being hosted. Before deploying the service, you must edit this file and provide the information which describes your organization.
If you aren't comfortable manually editing XML, you can use Introduce's graphical editor instead when you deploy the service.
Below is the relevant section of the file which you should edit:
<ns1:hostingResearchCenter>
  <ns18:ResearchCenter displayName="" shortName="" xmlns:ns18="gme://caGrid.caBIG/1.0/gov.nih.nci.cagrid.metadata.common">
    <ns18:Address country="" locality="" postalCode="" stateProvince="" street1="" street2=""/>
    <ns18:ResearchCenterDescription description="" homepageURL="" imageURL="" rssNewsURL=""/>
    <ns18:pointOfContactCollection>
      <ns18:PointOfContact affiliation="" email="" firstName="" lastName="" phoneNumber="" role=""/>
    </ns18:pointOfContactCollection>
  </ns18:ResearchCenter>
</ns1:hostingResearchCenter>
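Because a blank attribute in this section is easy to overlook, you may want to check the file before deploying. The following is a minimal sketch, not part of caGrid or Introduce (the class name is hypothetical): it parses the metadata file with the standard JAXP DOM API and lists every attribute under hostingResearchCenter that is still empty. Some of these (for example street2) may legitimately stay blank for your organization, so treat the output as a checklist rather than a list of errors.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

// Hypothetical pre-deployment check for serviceMetadata.xml
public class MetadataChecker {

    // Returns "Element@attribute" for every blank attribute found below
    // any hostingResearchCenter element, whatever its namespace prefix.
    static List<String> blankHostingAttributes(Document doc) {
        List<String> blanks = new ArrayList<String>();
        NodeList centers = doc.getElementsByTagNameNS("*", "hostingResearchCenter");
        for (int i = 0; i < centers.getLength(); i++) {
            NodeList descendants = ((Element) centers.item(i)).getElementsByTagNameNS("*", "*");
            for (int j = 0; j < descendants.getLength(); j++) {
                Element el = (Element) descendants.item(j);
                NamedNodeMap attrs = el.getAttributes();
                for (int k = 0; k < attrs.getLength(); k++) {
                    Attr attr = (Attr) attrs.item(k);
                    // Skip namespace declarations such as xmlns:ns18
                    if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                        continue;
                    }
                    if (attr.getValue().trim().length() == 0) {
                        blanks.add(el.getLocalName() + "@" + attr.getName());
                    }
                }
            }
        }
        return blanks;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        // args[0]: path to FQP_HOME/etc/serviceMetadata.xml
        Document doc = factory.newDocumentBuilder().parse(new File(args[0]));
        List<String> blanks = blankHostingAttributes(doc);
        for (String blank : blanks) {
            System.out.println("Empty: " + blank);
        }
        System.out.println(blanks.size() + " empty attribute(s) found");
    }
}
```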
Deployment
The Federated Query Processor service requires that the caGrid Transfer service be deployed to the same Tomcat or JBoss container to which the FQP service will be deployed. The Transfer service must be deployed first.
The Federated Query Processor is an Introduce-created service and, as such, supports all the standard Introduce deployment processes, which are described in the Introduce Administrator's Guide.
For example, to deploy the service to Tomcat from the command line, you can type the following command from the FQP_HOME directory:

> ant deployTomcat
Once the FQP service has been deployed, start the service container and verify that no error messages are printed to the console. To start Tomcat:
Linux / Unix

> $CATALINA_HOME/bin/startup.sh
Windows

> %CATALINA_HOME%\bin\startup.bat
Validation
To validate the service has been deployed and is functioning correctly, use this simple client code to invoke a DCQL query and verify its results:
import gov.nih.nci.cagrid.common.Utils;
import gov.nih.nci.cagrid.dcql.DCQLQuery;
import gov.nih.nci.cagrid.dcqlresult.DCQLQueryResultsCollection;
import gov.nih.nci.cagrid.fqp.client.FederatedQueryProcessorClient;

import java.io.FileReader;
import java.io.StringWriter;

public class SimpleFQP {

    public static void main(String[] args) {
        // args[0]: URL of the FQP service; args[1]: path to a DCQL query file
        if (args.length != 2) {
            System.err.println("Usage: SimpleFQP <fqp service url> <dcql query file>");
            System.exit(1);
        }
        String url = args[0];
        String queryFile = args[1];
        try {
            FederatedQueryProcessorClient client = new FederatedQueryProcessorClient(url);

            // Deserialize the DCQL query from its XML file, always closing the reader
            FileReader reader = new FileReader(queryFile);
            DCQLQuery query;
            try {
                query = (DCQLQuery) Utils.deserializeObject(reader, DCQLQuery.class);
            } finally {
                reader.close();
            }

            // Execute the query through the FQP service
            System.out.println("Querying " + url);
            DCQLQueryResultsCollection results = client.execute(query);

            // Serialize the aggregated results back to XML and print them
            StringWriter writer = new StringWriter();
            Utils.serializeObject(results,
                DCQLQueryResultsCollection.getTypeDesc().getXmlType(), writer);
            System.out.println(writer.getBuffer().toString());
            System.out.println("Done");
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}
The simple main method takes two parameters. The first is the URL of the Federated Query Processor service, and the second is the filename of a DCQL query to execute. Try running the following query, which simply queries the caArray data service for Publications with an ID less than or equal to 10.
<ns1:DCQLQuery xmlns:ns1="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <ns1:TargetObject name="gov.nih.nci.caarray.domain.publication.Publication">
    <ns1:Attribute name="id" predicate="LESS_THAN_EQUAL_TO" value="10"/>
  </ns1:TargetObject>
  <ns1:targetServiceURL>http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc</ns1:targetServiceURL>
</ns1:DCQLQuery>
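To try other classes or data services, you can generate the query file instead of editing it by hand. The sketch below writes a DCQL query with the same shape as the example above (one TargetObject with a single Attribute restriction, plus one targetServiceURL per service) using only the standard JAXP APIs. The class and method names are hypothetical, and it deliberately covers only this simple case, not the full DCQL schema.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: writes a one-attribute DCQL query like the example above.
public class DcqlQueryWriter {

    static final String DCQL_NS = "http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql";

    static String singleAttributeQuery(String targetClass, String attribute,
            String predicate, String value, String... targetServiceUrls) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element query = doc.createElementNS(DCQL_NS, "ns1:DCQLQuery");
        doc.appendChild(query);

        Element target = doc.createElementNS(DCQL_NS, "ns1:TargetObject");
        target.setAttribute("name", targetClass);
        query.appendChild(target);

        Element restriction = doc.createElementNS(DCQL_NS, "ns1:Attribute");
        restriction.setAttribute("name", attribute);
        restriction.setAttribute("predicate", predicate);
        restriction.setAttribute("value", value);
        target.appendChild(restriction);

        // One targetServiceURL element per data service to query
        for (String url : targetServiceUrls) {
            Element service = doc.createElementNS(DCQL_NS, "ns1:targetServiceURL");
            service.setTextContent(url);
            query.appendChild(service);
        }

        // Serialize the DOM tree to an indented XML string
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(singleAttributeQuery(
            "gov.nih.nci.caarray.domain.publication.Publication",
            "id", "LESS_THAN_EQUAL_TO", "10",
            "http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc"));
    }
}
```

Save the printed XML to a file and pass that file as the second argument to SimpleFQP.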
Executing this query should produce results similar to the following:
<ns1:DCQLQueryResultsCollection xmlns:ns1="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcqlresult">
  <ns1:DCQLResult targetServiceURL="http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc">
    <ns2:CQLQueryResultCollection targetClassname="gov.nih.nci.caarray.domain.publication.Publication" xmlns:ns2="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet">
      <ns2:ObjectResult>
        <ns2:Publication id="1" authors="Calvo A, Xiao N, Kang J, Best CJ, Leiva I, Emmert-Buck MR, Jorcyk C, Green JE" pages="5325-35" publication="Cancer Research" pubMedId="12235003" title="Alterations in gene expression profiles during prostate cancer progression" uri="http://cancerres.aacrjournals.org/cgi/content/full/62/18/5325" volume="62" year="2002" xmlns:ns2="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.publication">
          <ns2:status>
            <ns3:Term id="680" value="Published" xmlns:ns3="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
              <ns3:categories/>
            </ns3:Term>
          </ns2:status>
        </ns2:Publication>
      </ns2:ObjectResult>
      <ns2:ObjectResult>
        <ns4:Publication id="10" authors="Rygaard K, Sorenson GD, Pettengill OS, Cate CC, Spang-Thomsen M." pages="5312-5317" publication="Cancer Research" pubMedId="2167152" title="Abnormalities in structure and expression of the retinoblastoma gene in small cell lung cancer cell lines and xenografts in nude mice" volume="50" year="1990" xmlns:ns4="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.publication">
          <ns4:status>
            <ns5:Term id="680" value="Published" xmlns:ns5="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
              <ns5:categories/>
            </ns5:Term>
          </ns4:status>
          <ns4:type>
            <ns6:Term id="47" accession="MO_430" url="http://mged.sourceforge.net/ontologies/MGEDontology.php#journal_article" value="journal_article" xmlns:ns6="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
              <ns6:categories/>
            </ns6:Term>
          </ns4:type>
        </ns4:Publication>
      </ns2:ObjectResult>
    </ns2:CQLQueryResultCollection>
  </ns1:DCQLResult>
</ns1:DCQLQueryResultsCollection>
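Rather than comparing the output by eye, you can summarize it. The following hypothetical sketch (not part of caGrid) uses the JAXP DOM API to count the ObjectResult elements returned by each target service in a serialized DCQLQueryResultsCollection; the two namespace URIs are copied from the sample output above. It expects a file containing only the XML printed by SimpleFQP, without the "Querying" and "Done" lines. For the sample output above it would report two results for the caArray service.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: counts object results per target service in DCQL results XML.
public class DcqlResultSummary {

    static final String DCQL_RESULT_NS = "http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcqlresult";
    static final String CQL_RESULT_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet";

    // Maps each targetServiceURL to the number of ObjectResult elements it returned
    static Map<String, Integer> countByService(Document doc) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        NodeList results = doc.getElementsByTagNameNS(DCQL_RESULT_NS, "DCQLResult");
        for (int i = 0; i < results.getLength(); i++) {
            Element result = (Element) results.item(i);
            String service = result.getAttribute("targetServiceURL");
            int objects = result.getElementsByTagNameNS(CQL_RESULT_NS, "ObjectResult").getLength();
            Integer previous = counts.get(service);
            counts.put(service, previous == null ? objects : previous + objects);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        // args[0]: file containing the results XML printed by SimpleFQP
        Document doc = factory.newDocumentBuilder().parse(new File(args[0]));
        for (Map.Entry<String, Integer> entry : countByService(doc).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + " result(s)");
        }
    }
}
```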





