The architecture of the caGrid Transfer Service is simple yet powerful. It consists of four main components:
- Transfer Service
- Transfer Service Helper
- Transfer WebApp
- Transfer Client Helper
Each component plays a role in staging the data, persisting the information that represents the data to be transferred, securing the data, transferring the data from service to client or client to service, and cleaning up the data afterward.
The caGrid Transfer Service is a WSRF-based grid service responsible for creating resources that represent the data to be held and transferred. It uses the WS-Resource Framework to create a unique, stateful resource instance for each data item to be transferred. Grid Transfer is designed to run in the same container as the invoking service and is better suited than SOAP for moving large data items.
The user passes either a pointer to a file or the data itself to the TransferServiceHelper. The helper creates an instance of the TransferServiceContext resource (via its ResourceHome); the data and its pointer are stored until either the client picks the data up or the resource is destroyed:
- Data is stored as a file on the file system.
- The resource stores information needed to operate so it can survive a container or system restart. This persists as a ResourceProperty of the type DataStorageDescriptor on the service.
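The schema of DataStorageDescriptor is not reproduced here; the following is a rough, hypothetical stand-in for what the resource persists so it can survive a restart (in secure deployments the real type also records a user DN, as described later). The class and field names are illustrative only:

```java
// Simplified, hypothetical stand-in for the persisted DataStorageDescriptor;
// the real schema is defined by the service's WSDL, and these names are
// illustrative only.
class StorageDescriptorSketch {
    private final String location; // where the staged file lives on disk
    private final String status;   // data-staging status of the resource

    StorageDescriptorSketch(String location, String status) {
        this.location = location;
        this.status = status;
    }

    String getLocation() { return location; }
    String getStatus() { return status; }
}
```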
Grid Transfer also supports the WS-Notification specification. By subscribing, clients can listen for changes to the DataStorageDescriptor and be notified of changes to the data-staging status of the resource.
If deployed into a secure container, the service uses the caller's identity to protect the data. The caller's distinguished name (DN) and the file location are written into the resource's persisted data; the TransferServlet uses these to ensure that the retriever of the data is authorized to do so. The service has a public method, getDataTransferDescriptor(), that returns a DataTransferDescriptor object containing the URL from which the data can be retrieved over HTTP(S).
The Transfer Webapp is a Java servlet deployed into the same container as Globus. It delivers the data to the consumer over an HTTP or GSI-based HTTPS connection:
- If Globus is deployed to the container in non-secure mode, plain HTTP sockets are used and security is not enforced (i.e., anyone with the URL to the data item can retrieve it).
- If the container is secured, the servlet communicates over HTTPS using GSI secure sockets (via the same Tomcat or JBoss connector used by Axis/Globus). The secure connection carries the caller's credentials, enabling the Transfer Webapp to compare the caller's identity with the identity recorded for the requested resource. The servlet looks up the resource's DataStorageDescriptor in the TransferServiceContext and compares its userDN attribute to the caller's authenticated DN. If they match, the caller holds the same credentials as the creator of the data item, and the data is streamed back to the caller. If not, the connection is dropped and the data remains protected on the server.
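The authorization decision described above reduces to a DN comparison. A minimal sketch of that check, with an illustrative class and method name (not the real servlet code):

```java
// Hedged sketch of the Transfer Webapp's authorization decision: compare the
// authenticated caller DN from the GSI connection against the userDN stored
// with the resource. Names here are illustrative, not the real caGrid code.
class TransferAuthCheck {
    /** Stream the data only when the caller's DN matches the creator's DN. */
    static boolean mayRetrieve(String callerDn, String resourceUserDn) {
        // A null caller DN means the connection was not authenticated.
        return callerDn != null && callerDn.equals(resourceUserDn);
    }
}
```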
The Transfer Service Helper is a server-side API used to create the TransferServiceContextResource for the data being transferred. It uses the ResourceHome of the TransferServiceContext to create a TransferServiceContextResource instance, which maintains the user identity and file location of the data to be transferred. The helper offers several createTransferContext methods for creating a Transfer resource: for download scenarios it can consume a byte array, input stream, or file; for upload scenarios it does not need the data up front. Each of these methods returns a TransferServiceContextReference, which contains the EPR to the resource. The user uses this EPR to get the DataTransferDescriptor of the resource for retrieval or submission.
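The exact createTransferContext signatures are not reproduced here; the sketch below is a hedged reconstruction of their general shape, using simplified stand-in types rather than the real caGrid classes (consult the service's javadoc for the actual signatures):

```java
import java.io.File;
import java.io.InputStream;

// Stand-in for the reference type that carries the resource's EPR.
class ContextReferenceSketch { }

// Stand-in for the callback a service can register for upload scenarios.
interface StagedCallbackSketch { void dataStaged(); }

// Hedged shape of the helper: one overload per way of handing over the data.
interface TransferServiceHelperSketch {
    // Download scenarios: the helper is given the data itself.
    ContextReferenceSketch createTransferContext(byte[] data);
    ContextReferenceSketch createTransferContext(InputStream data);
    ContextReferenceSketch createTransferContext(File data);
    // Upload scenario: no data yet, only a callback for when it arrives.
    ContextReferenceSketch createTransferContext(StagedCallbackSketch callback);
}
```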
The Transfer Client Helper is a client-side API used for retrieving or submitting the data item. The data item is either created by the grid service and held by the servlet (waiting to be delivered) or is later received by the servlet and held for the service to process. For downloads, the helper provides two retrieval operations: one that takes the caller's credentials, used when the container is secure (null can be passed if it is not), and one that takes no credentials at all.
Either call creates the appropriate socket connection to the URL provided in the DataTransferDescriptor, which points either to the data item to be transferred or to the location where the data can be posted. If the connection opens properly and the user is authorized, an InputStream to the data is returned, which the user can read to obtain the data. Once the data has been read in full, the user can call destroy() on the TransferServiceContextClient to let the server know it can remove the cached data.
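Put together, the download steps above can be sketched with in-memory stand-ins. None of the names below are the real caGrid client classes; getData here simulates opening the socket to the descriptor's URL, and destroyContext stands in for the destroy() call on the context client:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class DownloadSketch {
    // Stand-in for the client helper call that opens a connection to the
    // URL in the DataTransferDescriptor; credentials may be null when the
    // container is insecure.
    static InputStream getData(byte[] staged, Object credentials) {
        return new ByteArrayInputStream(staged);
    }

    // Read the stream in full, then tell the server the cache can go.
    static byte[] download(byte[] staged) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = getData(staged, null)) {
            in.transferTo(out);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        destroyContext(); // stand-in for TransferServiceContextClient.destroy()
        return out.toByteArray();
    }

    static void destroyContext() { /* server may now remove the staged file */ }
}
```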
The upload case works the same way but uses different methods: again, one variant takes the caller's credentials and another takes none.
In this case, once the data has been transferred, the user should call setStatus(Status.Staged) on the TransferServiceContextClient to let the service know that the data is present and can be processed. If the user's service registered a callback with the TransferServiceContextResource when it was created via the TransferServiceHelper, that callback method is then invoked.
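The status hand-off just described can be sketched as follows; Status, TransferContext, and the callback wiring here are simplified stand-ins for the real caGrid types:

```java
// Hedged sketch of the upload hand-off: after the client finishes sending
// the data, it sets the status to Staged; if the service registered a
// DataStagedCallback when the resource was created, that callback fires.
class UploadSketch {
    enum Status { Created, Staged }

    interface DataStagedCallback { void dataStaged(); }

    static class TransferContext {
        private final DataStagedCallback callback; // may be null
        private Status status = Status.Created;

        TransferContext(DataStagedCallback callback) { this.callback = callback; }

        /** Stand-in for TransferServiceContextClient.setStatus(...). */
        void setStatus(Status newStatus) {
            status = newStatus;
            if (status == Status.Staged && callback != null) {
                callback.dataStaged(); // service-side processing hook
            }
        }

        Status getStatus() { return status; }
    }
}
```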
Due to the way the service's resources are normally stored, the DataStagedCallback object used to process a completed upload can be destroyed by the Java garbage collector. To prevent this, the resource class that Introduce creates for the service (e.g., org.cagrid.mygrid.service.globus.resource.MyGridServiceResource) needs to implement the PersistentTransferCallback interface.
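The interface itself is not reproduced here. As a hedged sketch of the idea, the resource class exposes the callback so the framework can recover it after the persisted resource is reloaded, rather than holding it only as an in-memory reference; the method name below is an assumption, not the real caGrid signature:

```java
// Illustrative stand-ins; see the caGrid javadoc for the real
// PersistentTransferCallback interface.
interface StagedCallback { void dataStaged(); }

interface PersistentCallbackSketch {
    // Assumed accessor: lets the framework re-obtain the callback after the
    // persisted resource is loaded back into memory.
    StagedCallback getDataStagedCallback();
}

// The Introduce-generated resource class would implement the interface:
class MyGridServiceResourceSketch implements PersistentCallbackSketch {
    public StagedCallback getDataStagedCallback() {
        return () -> { /* process the uploaded file */ };
    }
}
```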
The chart below shows trends in transfer time vs data size. It shows that:
- Transfer time grows linearly with data size.
- GSI-encrypted HTTPS is, on average, an order of magnitude slower than plain HTTP.
Tests were run with the client and server on the same machine to remove the possible variations due to network traffic. This allowed the tests to show the latency trends of the software itself and ignore those caused by network anomalies or bandwidth.