Sarah Honacki and Justin Permar, Knowledge Center
|Note about this guide:|
This guide was designed using version 1.3 of caGrid
Deploying a service may seem complicated at times. But it doesn't have to be! Once a service is created, the next step is to get it running and accessible on a target Grid with a successful deployment. A caGrid service is only a piece of software on your machine and cannot be accessed until deployed to a Grid. This guide first provides a detailed overview of the deployment process and important background information. It then walks you through the process in a "hands-on" approach. Finally, it discusses the process to verify a successful deployment and reviews common troubleshooting and deployment best practices.
This guide is intended for systems administrators and others who are responsible for ensuring a service is accessible on the Grid. Put another way, this guide is intended for anyone acting in a "deployment" role. The deployment role focuses on the process of deploying a Grid service to join a target Grid. This guide is also intended for service developers and others that need to understand the Grid service deployment process. Since readers of this article may have varying levels of knowledge about the Grid and may not be familiar with the terminology used throughout this article, this article provides a glossary. The glossary covers basic concepts about the Grid in order to ensure that everyone has a common understanding before continuing with the rest of the guide. Please note that review of the glossary is strongly recommended, as the usage of specific terms in this guide may be different from how these terms are used in other caGrid documents.
The purpose of this guide is to ensure that you have a complete understanding of the concepts and steps involved in deploying a Grid service to "join a target Grid". When reading this guide, it is critical to keep in mind that you have a choice of Grids to integrate with (that is, this article explains the Grid deployment process using a Grid-agnostic approach). "Joining a target Grid" involves deploying the Grid service to integrate with a "target Grid". A "target Grid" is simply a Grid of your choice.
Important key terms that users should be familiar with:
- caGrid - caGrid is open source grid software infrastructure aimed at enabling multi-institutional data sharing and analysis in clinical, translational research, and research environments.
- Grid - A Grid is an interconnected set of computing resources for the purpose of data sharing and analysis by a virtual organization. For caGrid and caBIG specifically, a Grid links computer and data resources of multiple organizations to securely share and analyze vast quantities of data. Computers, servers, or databases are shared as Grid services and join a target Grid by conforming to the Grid's data sharing and security policies. Most grids sit on a layer above the Internet and have core directory services and security services to incorporate services into the Grid. For purposes of this article, a Grid is a set of caGrid services that are deployed together by members of a virtual organization. In particular, there is not "one Grid" but many Grids.
- Grid entity - a Grid user (person) or a Grid service (software) that has joined a target Grid.
- Joining a target Grid - Every service is integrated into one target Grid. That means it interacts with other services that have also joined that same Grid. There is a conceptual difference between a service joining a target Grid and caGrid (the software) being configured to use a target Grid (see next entry). Please note that when you configure caGrid to use a target Grid, Introduce (see glossary for definition) is configured a) to utilize core services in that Grid and b) to create services that will join that target Grid. However, after the service is created, the target Grid that the service will join is completely customizable. A service deployer/administrator uses Introduce to deploy the service. At that time, the service deployer can configure the target Grid that the service should join. We will discuss this in further detail later in this article.
- Configure caGrid to use a target Grid - Re-configure the software included in the caGrid distribution to utilize services from a chosen target Grid. An example is the GAARDS UI. Re-configuring caGrid to use the OSU-hosted Community Training Grid will (among other things) configure the GAARDS UI to use the security service deployed to the Community Training Grid. That, in turn, allows a user of the GAARDS UI to, for example, log in to the Community Training Grid. Configuring caGrid to use a target Grid also configures the Introduce toolkit. There are other functions performed when configuring the target Grid, but as of caGrid 1.3, these are the most noticeable.
- Configure a container - A service developer or service deployer uses the caGrid installer to configure a container that will host a Grid service. Containers are either secure or non-secure. A non-secure container can only host non-secure Grid services. A secure container can host secure Grid services (and non-secure Grid services). Every secure container is secure by definition because it has a (one) Grid identity that it uses to communicate with other Grid entities. Configuring a secure container with this identity involves specifying the host certificate and key that the container will use when communicating with the Grid.
- Secure Grid service - a service that has authentication and authorization policies set using Introduce. For example, a service may require that Grid entities contacting the service provide their Grid identity to the service, or it may require authorization checks when a Grid entity invokes service operations.
- Non-secure Grid service - a service that is openly accessible to all Grid entities. A non-secure Grid service cannot receive Grid credentials from entities invoking service operations, so the service cannot verify who is accessing the service and also cannot perform authorization. Moreover, a non-secure Grid service has no identity and so can only contact Grid services that allow anonymous access (access without a Grid identity).
- Grid container - a web application container (e.g., Apache Tomcat) that hosts one or more Grid services. The caGrid installer supports configuration of JBoss and Apache Tomcat.
- Deploy a service - A Grid service by itself is software that lives on a computer. In order to run the software program to share or analyze data, the service must be deployed to a Grid container. Then, the Grid service administrator starts the container to make the service accessible to other Grid entities.
- Index Service - Every Grid has its own service directory, called the Index Service, that supports service advertisement and discovery for services in one Grid. This service directory is the "white" and "yellow" pages for a Grid. A service advertises itself to the Index Service. The Index service then contacts the service to aggregate metadata. Clients can use the Index Service to discover available services based on service metadata. The service deployer configures a target Index service during deployment.
There are two steps involved in adding a Grid service to the Grid. The first (and lengthiest) step, performed by a service developer, is to develop the software that will provide the functionality needed by the Grid service. After the developer has verified the service is working as expected, a service deployer (anyone acting in a deployment role) integrates the Grid service with a target Grid. It's important to think of the Grid service itself as a piece of software with some desired capability that can be accessed from the Grid. In order to be able to contact and utilize the service, however, the Grid service must first be running and accessible.
The first requirement, running the Grid service, involves starting up the Grid service so that other Grid entities can communicate with your Grid service. For example, running the Grid service allows users to invoke operations on your service. You run the Grid service by deploying it to a Grid container and starting the container, which listens on a port of your choice.
The first step to making the service accessible to others on a target Grid involves configuring the service's properties before deployment in order to choose the Index service for your chosen target Grid. Configuring the Index service will enable your service to advertise to that Index service. The second step is to configure the container to which you will deploy your service. At a high level, both secure and non-secure containers are configured for a target Grid and also configured with an appropriate domain name. Moreover, configuring a secure container involves ensuring your container has an identity that it uses to communicate with other Grid entities.
In summary, to ensure your service is both running and accessible, configure your Grid service properties and then configure your container properties as needed to join a target Grid of your choice. Then you deploy your service to the container and start the container.
First, the service deployer chooses a container for service deployment. The service deployment features of Introduce currently support deploying Grid services to a Globus, JBoss or Tomcat container. We note that since caGrid 1.2, the recommended containers are Tomcat and JBoss (not Globus), as these two containers are intended for production use. Introduce displays the directory in which the container lives (which it gleans from the appropriate environment variable that specifies this location).
Next, the service deployer configures service properties. The Introduce deployment framework supports the use of service properties so that the service can be customized as needed at deployment time. As an example, a service might have a service property to configure a database connection. When the service is deployed (using Introduce), the graphical interface prompts the deployer for the desired values (note: the defaults for the values are specified by the service developer). Introduce configures the service's deploy.properties file with the specified properties. This gives those deploying the service a convenient way to configure it appropriately for their environment, without requiring knowledge of how the grid service uses these configuration properties. We will discuss this in more detail later in this article.
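The relationship between developer-supplied defaults and deployer-supplied values can be sketched in a few lines of Python. This is only an illustration of the idea, not how Introduce implements it; the property names `databaseUrl` and `databaseUser` and their values are hypothetical.

```python
def resolve_service_properties(defaults, overrides):
    """Start from the developer's defaults and apply the deployer's values on top.

    Rejects names the developer never declared, since the service would not read them.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown service properties: {sorted(unknown)}")
    resolved = dict(defaults)
    resolved.update(overrides)
    return resolved


# Hypothetical property names and values, for illustration only.
developer_defaults = {
    "databaseUrl": "jdbc:example://localhost/dev",
    "databaseUser": "service",
}
deployer_values = {"databaseUrl": "jdbc:example://db.example.org/prod"}

resolved = resolve_service_properties(developer_defaults, deployer_values)
for name, value in sorted(resolved.items()):
    print(f"{name}={value}")
```

The deployer only supplies the values that differ in their environment; everything else keeps the default chosen by the service developer.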
Next, the service deployer configures the advertisement properties so that the service advertises properly to the Index service of the chosen target Grid.
Finally, the service deployer deploys the service. Introduce utilizes the layout of the Grid Archive (GAR) structure to organize and package a deployable service. Once the GAR for a service is generated by the deployment framework, the GAR can be deployed to the appropriate container. Please note that in all cases an Introduce-generated Grid service is deployed as a service to the Globus web application. The Globus web application, in turn, is deployed to a supported web container. For both containers supported by caGrid, Tomcat and JBoss, the Globus web application is actually deployed directly to Tomcat. For JBoss support, Tomcat is in turn deployed to JBoss. This is graphically illustrated in the following diagram.
|Note: The installer can be used to install and configure either JBoss or Tomcat|
To completely understand service deployment, it is important to know what files are generated by Introduce and why they are needed for the service to join a target Grid. A basic Grid service created by Introduce has the following components:
- Ant targets for build, deploy, and test operations
- custom configuration files for IDE integration, e.g., Eclipse project files for editing of the service using the Eclipse platform (www.eclipse.org)
- standard interface for both client and service communication
- fully implemented client APIs
- stubs for the service implementation methods
- configuration files for service metadata and resource properties and the registration of metadata and properties
- configuration files and associated code specifying service authentication and authorization requirements
- A file specifying the service properties
The diagram below shows the use of the different common files of an example Introduce-generated service. It shows the files used for configuring registration and security for the grid service, as well as those used by Introduce for synchronization, and those used for build and deployment.
All of these files are automatically modified by Introduce as needed. The files listed below are the files that are typically modified by a service deployer at deploy time (note: a service deployer uses Introduce to set the values in these files, and Introduce in turn edits the files). Introduce provides a "Deployment" panel to eliminate the need to edit these files manually. If there is a valid reason to edit the files directly, please refer to the Advanced section of this article to learn how to do this properly.
Please note that modifying any of the files below after the service has been deployed will necessitate a subsequent service deployment (a running service won't read the updated files until they are deployed to the container, which happens during service deployment).
What follows is a description of 2 of the files that the user can edit using the Introduce "Deployment" window. We will present step-by-step instructions for modifying these files later in this article.
Grid service advertisement information is specified in the deploy.properties file. Introduce updates this file during deployment with the advertisement information that the service deployer entered into Introduce. The table below lists each of the properties that can be configured in the file GENERATED_SERVICE_HOME/deploy.properties.
Service properties are specified in the service.properties file. These properties are added by the service developer specifically so that the Grid service can be customized for specific environments. Please note that it is very error prone to manually edit this file. See the "Advanced" section later in this document for additional details. Some examples of service.properties that could be added by the service developer are:
Introduce keeps a model of the service in the introduce.xml file. That is, all user interactions with the Introduce UI directly change the introduce.xml file. Needless to say, manually editing the file is not recommended.
In addition to understanding key files in deployment, it is also important to understand container configuration, since the service will be deployed to a container. The deployment panel in Introduce shows basic information about the service you are going to deploy and provides the ability to choose the container to deploy to. In order to deploy an Introduce-generated service to the container, both Ant and the container must be installed as prerequisites. When a service deployer uses the installer to configure a container, the installer directs the service deployer (on the last panel in the installer wizard) to set an appropriate environment variable to point to the container. Introduce detects the available containers on the host machine by checking for the existence of the following environment variables: GLOBUS_LOCATION (Globus), CATALINA_HOME (Tomcat), JBOSS_HOME (JBoss).
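Before starting Introduce, it can be useful to confirm which of these environment variables are set in your shell. The following Python sketch performs a comparable check; it is an independent illustration, not Introduce's own detection code, and the function name `detect_containers` is hypothetical.

```python
import os

# Environment variables named in this guide, keyed by container.
CONTAINER_ENV_VARS = {
    "Globus": "GLOBUS_LOCATION",
    "Tomcat": "CATALINA_HOME",
    "JBoss": "JBOSS_HOME",
}


def detect_containers(environ=None):
    """Return {container name: location} for every variable that is set and non-empty."""
    environ = os.environ if environ is None else environ
    found = {}
    for container, variable in CONTAINER_ENV_VARS.items():
        location = environ.get(variable, "").strip()
        if location:
            found[container] = location
    return found


for name, location in detect_containers().items():
    print(f"{name} container found at {location}")
```

If nothing is printed, none of the three variables is visible to the current shell, and Introduce started from that shell would likely not offer a container to deploy to.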
A container must be installed and configured before the service deployer can deploy a service to it. A service deployer installs and configures a container using the caGrid installer. While using the installer, the first choice to make is which container to install (JBoss, Tomcat) and whether the container will be secure. You will also have to provide a hostname and select a shutdown port for the container. The following information aims to explain each of these options.
Tomcat is the primary recommended container, both for its simplicity and ease of use. Choose that if you have no other reasons to choose differently.
JBoss is an appropriate container if you would like to deploy applications other than Grid services. That is, JBoss is an enterprise application server that hosts multiple applications (such as Tomcat). If you have other kinds of applications in addition to Grid applications (e.g., J2EE applications) that you need to run, it may be a better choice to install JBoss so that you can deploy these other applications to minimize the systems administration burden.
A container can be either secure or non-secure. A non-secure container can only host non-secure Grid services. A secure container can host secure Grid services. Therefore, the service deployer must know from the service creator whether the service is secure or not (this should be in the documentation).
Every secure container is secure by definition because it has a Grid identity that it uses to securely communicate with other Grid entities. You specify the grid identity implicitly by selecting the host certificate and host key file during configuration. If you do not yet have a host certificate and key, you can use GAARDS to request a host certificate during installation. If you elect this option, the installer will allow you to open GAARDS to obtain these host credentials. For complete steps on obtaining host credentials using GAARDS, see: 7.4 Obtain Host Credentials. Otherwise, you specify the location of the host certificate and key for the container to use. For a complete explanation of how Grid security works, please see GAARDS.
During the installation of a container, you must configure your container with a domain name that other Grid entities will use to contact your service. The domain name must be fully qualified and publicly resolvable. It is important that you use a publicly resolvable DNS name (or static IP); otherwise, you will later need to edit configuration files manually to change this value.
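A quick sanity check on the value you plan to enter can catch obvious mistakes such as a bare machine name or a private address. The Python sketch below is a heuristic only: it inspects the text of the value and performs no DNS lookup, so it cannot confirm that a name is actually resolvable from outside your network. The function name and the category labels are hypothetical.

```python
import ipaddress


def classify_container_host(value):
    """Classify a hostname candidate as 'fqdn', 'static-ip', 'local-ip', or 'unqualified'."""
    name = value.strip().rstrip(".").lower()
    try:
        address = ipaddress.ip_address(name)
    except ValueError:
        address = None
    if address is not None:
        # Loopback and private-range addresses cannot be reached by other Grid entities.
        if address.is_loopback or address.is_private:
            return "local-ip"
        return "static-ip"
    labels = name.split(".")
    # A fully qualified name has at least two non-empty, dot-separated labels.
    if len(labels) >= 2 and all(labels):
        return "fqdn"
    return "unqualified"


for candidate in ["grid.example.org", "myhost", "192.168.1.10"]:
    print(candidate, "->", classify_container_host(candidate))
```

Values classified as `unqualified` or `local-ip` are the ones most likely to require manual edits to configuration files later.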
The shutdown port is used during the shutdown of the container. It is important that this port is not already in use on the machine. Specifically, shutting down the container using the container's shutdown scripts will contact the service using the shutdown port. Please note that it is not required (and indeed it would be bad practice) to open this port through the firewall. Make sure the shutdown port is only accessible internally.
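One way to confirm that a candidate shutdown port is not already in use is to try binding to it. The Python sketch below does this on the loopback interface; the port number 8005 is only an example value, and the function name is hypothetical. A port that is free now may of course be taken later, so treat the result as a snapshot.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port
            # (or when the port requires privileges this user lacks).
            return False
    return True


candidate_port = 8005  # example value only; use the port you plan to enter in the installer
print(f"port {candidate_port} free: {port_is_free(candidate_port)}")
```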
After you configure the container settings using the caGrid installer, the installer installs the container on your machine. The primary steps of this process are outlined as follows:
- The installer unzips your container of choice.
- The installer deploys Globus to the container.
- The installer configures your container using the information you specified in the installer. For example, if you specified a "secure" container, then the installer will configure the container to enable security and properly configure the container's identity using the specified host certificate and key file.
- If you selected a target Grid when you installed caGrid, the installer automatically configures the container for that Grid. The installer will deploy Globus to that container and then deploy SyncGTS, configuring SyncGTS to synchronize with your target Grid. Please note that, as a result, when you use Introduce to deploy your service to the container, you must configure your service to advertise to the Index service in the same target Grid. Thus, your service must be deployed to the same target grid for which the container is configured! For more information, see SyncGTS.
|Important note about target grids: If caGrid is configured with a target grid during installation of caGrid, part of that configuration includes configuring Introduce. Therefore, the default target grid for all new services created in Introduce is whichever Grid was chosen when caGrid was installed. This means that any new service is configured to advertise to that target Grid's Index Service. However, after a service is created, configuring caGrid with a target Grid has no effect on the existing service. Thus, to deploy a service to another target Grid means that you must use the "Deploy service" Introduce window to modify the Index service URL and other properties discussed earlier so that the service can advertise to the correct target Grid. If you are a service deployer and you did not create the service, it is therefore likely that you will need to edit the service advertisement settings in Introduce during deployment.|
Congratulations! You have successfully installed and configured your Tomcat container.
|If you prefer, you can manually configure a Tomcat container instead of using the installer.|
Now we can deploy our service to the container using Introduce. To begin, start Introduce by opening a command prompt and changing directory to the caGrid installation directory. Then:
Introduce contains a graphical service deployment screen which can be used to configure a service for deployment by editing service properties, tuning index service registration, etc.
- In Introduce, select the Deploy Service button from the top menu bar.
- Once the file chooser appears, browse to the generated service location and click the Open button.
- The Deployment Viewer will now be shown.
- The General Deployment tab shows basic information about the service you are going to deploy. Note that it shows your service's deployment name and namespace. This window also provides the ability to choose the container to deploy to. As stated earlier, Introduce detects the available containers on the host machine by checking for the existence of environment variables (GLOBUS_LOCATION (Globus), CATALINA_HOME (Tomcat), JBOSS_HOME (JBoss)).
- The Advanced Deployment tab enables configuration of many standard deployment options:
|Property Name|Value(s)|Description|
|perform.index.service.registration|true or false|Whether or not the service should register with the Index Service.|
|index.service.url|URL|The URL of the Index Service to register with.|
|index.service.registration.refresh_seconds|Integer|How often, in seconds, to re-register with the Index Service. This should be a relatively large amount of time; it simply ensures that the Index Service does not lose your registration.|
|index.service.index.refresh_milliseconds|Integer|When registering with the Index Service, this number tells the Index Service how often, in milliseconds, to contact the service for updated metadata.|
- Changing these options will update the deploy.properties file. Modify the properties, including the Index service URL, as needed.
- The Service Properties tab allows for changes to be made to any properties that are added by the service developer specifically so that the Grid service can be customized for specific environments. Please note that it is very error prone to manually edit this file. Here are some examples:
|Property Name|Value(s)|Description|
|gridGrouperURL|URL|Grid Grouper service URL|
|serviceStemSystemName|File Path; ex: Training:photosharingtutorial:<HOSTNAME>|System name of the service stem in Grid Grouper|
NOTE: For the serviceStemSystemName, <HOSTNAME> should be your machine name.
- Once you have selected the container that you wish to deploy to and have configured any additional settings, click Deploy to deploy your service to your target grid.
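After deployment, you may want to confirm what Introduce wrote to deploy.properties. The Python sketch below reads the advertisement settings named in this guide and summarizes them. It is a simplified reader, assuming plain `key=value` lines; it does not handle line continuations, escape sequences, or whitespace-only separators that Java properties files also allow. The Index Service URL in the sample is a hypothetical placeholder.

```python
def parse_properties(text):
    """Parse simple 'key=value' or 'key: value' lines, skipping comments and blank lines."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue
        # The first '=' or ':' on the line separates the key from the value.
        for index, char in enumerate(line):
            if char in "=:":
                key, value = line[:index], line[index + 1:]
                break
        else:
            key, value = line, ""
        properties[key.strip()] = value.strip()
    return properties


def advertisement_summary(properties):
    """Describe where (and whether) the service will advertise itself."""
    enabled = properties.get("perform.index.service.registration", "").lower() == "true"
    url = properties.get("index.service.url", "")
    if not enabled:
        return "registration disabled"
    if not url:
        return "registration enabled but index.service.url is empty"
    return f"registers with {url}"


# Sample content; the URL is a hypothetical placeholder, not a real Index Service.
sample = """
# advertisement settings
perform.index.service.registration=true
index.service.url=http://index.example.org:8080/wsrf/services/DefaultIndexService
"""
print(advertisement_summary(parse_properties(sample)))
```

To inspect a real service, read GENERATED_SERVICE_HOME/deploy.properties into a string and pass it to `parse_properties` in place of `sample`.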
By deploying a service to a container and starting that container, the service is running. However, this does not mean it is accessible. For the service to be discovered through the Index Service (and subsequently used by others on the Grid), there are two steps to complete:
- (Advertisement, or being able to advertise to the Index Service) The service must successfully "register" with the Index Service; that is, the service must be able to connect to the Index Service. This simply means the service will show up in the list of registered services, but does not mean it will be "discoverable." The primary requirement for this step is that you deploy the service to a machine that has an Internet connection (assuming you need to use the Internet to contact your target Grid's Index Service).
- (Aggregation, or being accessible to the Index service) The service's metadata must be able to be retrieved and aggregated by the Index Service; the Index Service must be able to connect to the service. The primary requirement for this step is to open a firewall port so that all Grid entities (essentially, anyone) can contact your Grid service on the port that you specified during container configuration.
A grid service is accessible only when it successfully completes both Advertisement and Aggregation. This means it is properly configured so that the service can advertise to the Index Service, and the index service can aggregate the service's metadata. There must be a bi-directional flow of information.
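The two directions can be probed separately with a plain TCP connection test, sketched below in Python. The outbound probe (service host to Index Service) is run on the machine hosting the container; the inbound probe (towards your container port) only tells you something when run from a machine outside your firewall. The function names are hypothetical, and a successful TCP connection shows only that the port is reachable, not that registration or metadata aggregation succeeded.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_accessible(service_reaches_index, index_reaches_service):
    """Both directions are required: advertisement (outbound) and aggregation (inbound)."""
    return service_reaches_index and index_reaches_service


# Example with one direction working and the other blocked by a firewall.
print(is_accessible(service_reaches_index=True, index_reaches_service=False))
```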
This combination of advertisement and aggregation is illustrated in the diagram below.
After deployment, it is necessary to start the container to make it available on the grid. To begin, we will open a terminal and start up the grid service container to make the service available.
- Open a terminal, change directory to your container's bin directory, and start the container.
Example for Tomcat:
Once the container is started, you can verify that the service is running with a few validation steps on the local machine that you deployed the service from. First, open a web browser and point it to your service's URL.
You should see a "Hi there, this is an Axis Service" message. If you do, then congratulations! Your service is running! If you do not see this message, then please see troubleshooting to investigate further.
In addition, ensure that you can request metadata properly from your service on the same machine:
On Windows-based systems, run this command (replacing your service's URL):
On Unix-based systems, run this command (replacing your service's URL):
Once your service is running, you can verify that your service is accessible by contacting your service from a machine that is outside your firewall.
First, open the port in your firewall. Then, open a web browser from a machine outside of the firewall and point to your service's URL:
If you see the "Hi there, this is an Axis Service" message displayed, then congratulations! Your service is running and accessible!
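The browser check can also be scripted, which is convenient when the outside machine has no desktop. The Python sketch below fetches a URL and looks for the greeting text quoted in this guide. It is a minimal sketch with hypothetical function names: it uses Python's default certificate verification, so a secure container whose host certificate was issued by a Grid certificate authority will probably be reported as not responding unless that authority is trusted by the machine running the check.

```python
from urllib.request import urlopen


def body_shows_axis_greeting(body):
    """Return True if the page text contains the Axis greeting, ignoring case and spacing."""
    normalized = " ".join(body.split()).lower()
    return "this is an axis service" in normalized


def service_responds(url, timeout=10.0):
    """Fetch the service URL and report whether the Axis greeting came back."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Covers connection failures, HTTP errors, TLS failures, and malformed URLs.
        return False
    return body_shows_axis_greeting(body)


print(body_shows_axis_greeting("<p>Hi there, this is an AXIS service!</p>"))
```

Call `service_responds` with your own service URL from the machine outside the firewall; a `False` result means the request failed or the greeting was not found.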
There are a number of related but important system configuration steps to ensure a successful local deployment as well as a functioning Grid.
The first issue is time synchronization. The system time for every Grid node must be accurate. Even a 2-minute drift from one server to another may prevent connections because certificates are treated as invalid. Use NTP or another system service to synchronize all clocks.
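To see why a small drift matters, consider a credential whose validity period starts at the moment it is issued. The Python sketch below models the validity check in a simplified way (it is not the actual Globus or caGrid validation logic, and the function names are hypothetical): a server whose clock runs two minutes behind sees a freshly issued credential as "not yet valid".

```python
from datetime import datetime, timedelta, timezone


def credential_accepted(not_before, not_after, verifier_clock):
    """A credential is accepted only while the verifier's clock is inside its validity window."""
    return not_before <= verifier_clock <= not_after


def accepted_with_drift(not_before, not_after, true_time, drift):
    """Evaluate the check as seen by a verifier whose clock is off by `drift`."""
    return credential_accepted(not_before, not_after, true_time + drift)


issued_at = datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example time
expires_at = issued_at + timedelta(hours=12)

print(accepted_with_drift(issued_at, expires_at, issued_at, timedelta(0)))
print(accepted_with_drift(issued_at, expires_at, issued_at, timedelta(minutes=-2)))
```

The first check passes because the verifier's clock is accurate; the second fails because the verifier believes the credential's validity period has not started yet.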
The second issue is to ensure that every service is synchronizing with the Grid GTS service. This is discussed in greater detail in the following sections.
There are two primary issues to keep in mind when deploying multiple caGrid services:
- Classpath conflicts
- SyncGTS conflicts
The first issue, classpath conflicts, arises because all caGrid services are deployed to the Globus web application inside your container. Recall that each web application in the container has its own classpath. The result is that all caGrid services share one classpath. When two different caGrid services have conflicting jar files, the result is a classpath conflict. This can result in errors that are extremely difficult to track down.
Thus, our recommendation is to only deploy one caGrid service per container. We emphasize that this is not a requirement but a recommendation. We also note that the "one service per container" recommendation explicitly excludes the SyncGTS service, which is installed to all containers. The recommendation applies to the caGrid service that is deployed to a configured container (resulting in SyncGTS and another Grid service co-existing in one container).
The second issue, SyncGTS conflicts, arises because SyncGTS modifies the files in the trusted certificates directory. By default, this is the USER_HOME_DIRECTORY/.globus/certificates directory. If you have multiple SyncGTS services that are configured for different target Grids, you can run into a situation where the SyncGTS instances conflict and overwrite each other's certificates. The following situations demonstrate when a problem will arise:
- User "joe" installed two caGrid containers and deployed one service to each container. Both containers are configured for the "Community Training Grid", resulting in a SyncGTS service (one per container) that periodically synchronizes with the Training Grid. This setup has no problems, as each time SyncGTS runs, it sees that the certificates in JOE_HOME_DIR/.globus/certificates directory match those from the Community Training Grid GTS service and leaves them in the directory.
- User "joe" installed two caGrid containers, one that is configured for the caBIG Staging Grid and one that is configured for the caBIG Production Grid. Because these Grids are two separate deployments (meaning they have completely separate trust fabrics), this configuration will exhibit problems. One container, with a SyncGTS service configured for the caBIG Production Grid, will periodically contact the Production Grid GTS and place the certificates in the JOE_HOME_DIR/.globus/certificates directory. The other container, with a SyncGTS service configured for the caBIG Staging Grid, will periodically contact the Staging Grid GTS and place the certificates in the JOE_HOME_DIR/.globus/certificates directory. This will result in unpredictable behavior. At any given point in time, the JOE_HOME_DIR/.globus/certificates directory may have certificates from the Production Grid, from the Staging Grid, or some mix of the two. Thus, when the services use the certificates directory to perform a trust check for a client that is contacting the service, the check may fail (meaning clients will not be able to contact secure services that are deployed to these containers).
Thus, our recommendation is to ensure that each user account runs caGrid services that have all joined only one target Grid. In the event that you need to run services that join multiple target Grids, you need to create multiple user accounts, one per target Grid.
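If you keep an inventory of your containers, this recommendation is easy to check mechanically. The Python sketch below flags any user account whose containers are configured for more than one target Grid, since those containers would share one certificates directory. The inventory format and the function name are hypothetical; the data mirrors the "joe" examples above.

```python
from collections import defaultdict


def find_trust_conflicts(containers):
    """Return {user account: [target Grids]} for accounts configured for more than one Grid.

    `containers` is an iterable of (user_account, container_name, target_grid) tuples.
    """
    grids_by_user = defaultdict(set)
    for user, _container, grid in containers:
        grids_by_user[user].add(grid)
    return {
        user: sorted(grids)
        for user, grids in grids_by_user.items()
        if len(grids) > 1
    }


# Hypothetical inventory mirroring the two scenarios described above.
same_grid = [
    ("joe", "container-1", "Community Training Grid"),
    ("joe", "container-2", "Community Training Grid"),
]
mixed_grids = [
    ("joe", "container-1", "caBIG Staging Grid"),
    ("joe", "container-2", "caBIG Production Grid"),
]

print(find_trust_conflicts(same_grid))
print(find_trust_conflicts(mixed_grids))
```

An empty result means every account runs services for a single target Grid; any account that is listed should have its containers split across separate user accounts, one per target Grid.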
Although not recommended, it is sometimes necessary to manually edit some deployment files generated by Introduce. We strongly suggest that you use the Introduce graphical interface to edit any of those files. This avoids simple syntax errors that would prevent the service from deploying properly. However, if you cannot use Introduce to deploy the service, then you will need to edit the files manually. Please remember to follow proper syntax rules for Java properties files (both the deploy.properties and service.properties files are Java properties files). Additional technical information can be found at the following link:
You may have a deployment environment where you cannot open up a container port directly through your firewall. Generally, the Globus toolkit, which underlies caGrid, does not support proxied deployments; however, it may be workable behind an Apache HTTPD server acting as a proxy.
Secure caGrid services can be run behind an Apache Web Server acting as a proxy to the Tomcat or JBoss container which actually runs the grid service.
The following instructions use file paths which are specific to Red Hat Linux and CentOS, however the instructions should be generalizable to other configurations of Apache.
- Deploy your grid service to a secure Tomcat or JBoss instance as normal. You may have to re-configure your Globus deployment to use the port number you'll be connecting to Apache HTTPd with (usually 443), even though the container itself listens on 8443.
- Turn on the Apache SSL engine
  - Edit /etc/httpd/conf.d/ssl.conf
  - Set SSLEngine On
- Make Apache use your caGrid host certificate
  - Edit /etc/httpd/conf.d/ssl.conf
  - Set SSLCertificateFile to the location of your <hostname>-cert.pem file
  - Set SSLCertificateKeyFile to the location of your <hostname>-key.pem file
- Create a config file for Apache to manage the proxy into your grid service at /etc/httpd/conf.d/cagrid.conf
- Start your caGrid Tomcat or JBoss instance
- Start (or re-start) Apache HTTPd
You should now be able to make connections to your grid service on the port Apache is listening to (usually 443, the standard HTTPS port) from the world.
If you absolutely cannot open the container port up directly, or cannot use the instructions for deployments behind Apache, please contact the caGrid Knowledge Center for additional information.
If you have any issues regarding deployment, please see our Troubleshooting page.