caGrid 1.2 Documentation: WS-Enumeration 1.2 Developers Guide
caGrid integrates the WS-Enumeration specification both on its own and as part of its Bulk Data Transfer support. The integration consists of a set of client-side and server-side tools, plus a specialized service extension to the Introduce toolkit.
Relevant external links:
- The WS-Enumeration specification document is available from Globus
- WS-Enumeration project at dev.globus.org
The WS-Enumeration spec assumes that the service context which provides a method to begin an enumeration is the same context that implements the other enumeration methods (Pull, Renew, GetStatus, etc.). In caGrid, the data to be enumerated is stored in a server-side WSRF resource. The method that begins an enumeration must therefore return both the spec-required enumeration context and an Endpoint Reference (EPR) that identifies the service context responsible for handling the enumeration along with the resource key for the data. In caGrid, this combined response is known as an Enumeration Response Container.
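To make the shape of that combined response concrete, here is a minimal Java sketch. The class and field names (`EndpointReference`, `serviceAddress`, `resourceKey`, `enumerationContext`, `sameContextAs`) are hypothetical; the real container type is defined by the caGrid extension's schema, and a real WSRF EPR typically carries the resource key as a WS-Addressing reference property rather than as a plain string field.

```java
import java.util.Objects;

/**
 * Hypothetical model of an Enumeration Response Container: the EPR of the
 * enumeration service context (address plus resource key) together with the
 * spec-required enumeration context. Not the generated caGrid class.
 */
final class ResponseContainerSketch {

    /** Stand-in for an endpoint reference to a WSRF resource. */
    static final class EndpointReference {
        final String serviceAddress; // context that handles Pull, Renew, etc.
        final String resourceKey;    // identifies the resource holding the data

        EndpointReference(String serviceAddress, String resourceKey) {
            this.serviceAddress = Objects.requireNonNull(serviceAddress);
            this.resourceKey = Objects.requireNonNull(resourceKey);
        }
    }

    static final class EnumerationResponseContainer {
        final EndpointReference endpoint;
        final String enumerationContext; // opaque value sent back on each request

        EnumerationResponseContainer(EndpointReference endpoint, String enumerationContext) {
            this.endpoint = Objects.requireNonNull(endpoint);
            this.enumerationContext = Objects.requireNonNull(enumerationContext);
        }

        /**
         * In caGrid the context named in the EPR need not be the service whose
         * method began the enumeration; this checks whether they coincide.
         */
        boolean sameContextAs(String beginServiceAddress) {
            return endpoint.serviceAddress.equals(beginServiceAddress);
        }
    }
}
```

The design point is that a client must keep all three values together: the address and key select the resource, and the context identifies the enumeration within it.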
When WS-Enumeration is used from within a Bulk Data Transfer resource, the BDT resource creates the enumeration resource and returns an Enumeration Response Container that encapsulates both the EPR of the new service context and resource and the enumeration context object.
caGrid's support for WS-Enumeration is provided by a service extension to the Introduce toolkit. This extension adds the service context for WS-Enumeration operations, copies the relevant WSDL files and schemas, and finally sets the Globus-provided EnumProvider class as the implementation for the enumeration operations.
To utilize WS-Enumeration in a grid service, two things are required. First, the service must be generated with the caGrid WS-Enumeration extension enabled. Next, one or more operations must be added to the grid service which return an Enumeration Response Container.
Such a "begin enumeration" method starts the enumeration and hands off control of some data resource to the enumeration context and resource. The WS-Enumeration resource requires that an implementation of the org.globus.ws.enumeration.EnumIterator interface be supplied to it at creation time. This interface is the means through which a data resource is exposed to the server-side enumeration implementation. To simplify exposing data via this interface, the caGrid enumeration implementation includes several utilities.
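The following self-contained sketch illustrates the kind of contract such an iterator fulfills: the enumeration resource asks it for the next batch under some constraints and eventually releases it. The types `Constraints`, `Result` and `ServerIterator` are hypothetical stand-ins loosely modeled on the Globus EnumIterator idea; they are not the Globus classes, and only the maxElements constraint is modeled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-ins for the server-side iterator contract. */
final class EnumIteratorSketch {

    /** Limits attached to a Pull request (only maxElements is modeled). */
    static final class Constraints {
        final int maxElements;
        Constraints(int maxElements) { this.maxElements = maxElements; }
    }

    /** One batch of items plus a flag telling the client whether data remains. */
    static final class Result<T> {
        final List<T> items;
        final boolean endOfSequence;
        Result(List<T> items, boolean endOfSequence) {
            this.items = items;
            this.endOfSequence = endOfSequence;
        }
    }

    /** What the enumeration resource drives on each Pull. */
    interface ServerIterator<T> {
        Result<T> next(Constraints constraints);
        void release(); // called when the enumeration ends or is released
    }

    /** Exposes any java.util.Iterator as a ServerIterator honoring maxElements. */
    static <T> ServerIterator<T> fromIterator(final Iterator<T> source) {
        return new ServerIterator<T>() {
            private boolean released = false;

            @Override
            public Result<T> next(Constraints constraints) {
                if (released) {
                    throw new NoSuchElementException("iterator released");
                }
                List<T> batch = new ArrayList<T>();
                while (batch.size() < constraints.maxElements && source.hasNext()) {
                    batch.add(source.next());
                }
                return new Result<T>(batch, !source.hasNext());
            }

            @Override
            public void release() {
                released = true;
            }
        };
    }
}
```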
caGrid WS-Enumeration supplies the factory class gov.nih.nci.cagrid.wsenum.utils.EnumIteratorFactory, which creates concrete instances of EnumIterator. The factory method createIterator() takes parameters that determine the type of implementation, the list of data objects to be enumerated (or a java.util.Iterator over them), the XML QName of those objects, and a WSDD configuration stream that manages serialization of the data objects. The Java enumeration type gov.nih.nci.cagrid.wsenum.utils.IterImplType defines the five implementations and gives a brief description of each:
- The Globus-provided simple enum iterator.
- The Globus-provided indexed file enum iterator.
- A simple iterator which persists objects to disk. This iterator only respects the maxElements iteration constraint.
- This iterator uses threads to respect maxTime constraints as well as respecting maxCharacters. Elements overflowing either of these constraints, however, are lost, and wait states for thread completion are not optimized.
- This iterator uses the Java 5 java.util.concurrent package to fully support the WS-Enumeration specification for an EnumIterator implementation. All iteration constraints are respected, and elements which would cause maxCharacters to be exceeded are queued for later retrieval.
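The difference between dropping and queuing overflow elements can be shown with a small sketch. `ConstrainedBatcher` is a hypothetical class, not caGrid code: it honors maxElements and maxCharacters (measuring each element by its string length) and holds back an element that would exceed maxCharacters so it is offered first on the next pull, in the spirit of the queuing behavior described for the last implementation above. It deliberately omits maxTime and threading.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical batcher honoring maxElements and maxCharacters without losing elements. */
final class ConstrainedBatcher {

    private final Iterator<String> source;
    private final Deque<String> heldBack = new ArrayDeque<String>();

    ConstrainedBatcher(Iterator<String> source) {
        this.source = source;
    }

    /** True while any element, held back or not yet read, remains. */
    boolean hasMore() {
        return !heldBack.isEmpty() || source.hasNext();
    }

    /**
     * Returns up to maxElements items whose combined length does not exceed
     * maxCharacters. Can return an empty list while hasMore() is still true,
     * when the next element alone is larger than maxCharacters.
     */
    List<String> pull(int maxElements, int maxCharacters) {
        List<String> batch = new ArrayList<String>();
        int used = 0;
        while (batch.size() < maxElements && hasMore()) {
            String next = heldBack.isEmpty() ? source.next() : heldBack.pollFirst();
            if (used + next.length() > maxCharacters) {
                heldBack.addFirst(next); // keep it for a later pull instead of dropping it
                break;
            }
            batch.add(next);
            used += next.length();
        }
        return batch;
    }
}
```

The empty-batch case is the same situation a client can observe when its constraints cannot be satisfied by the next element even though the server still holds data.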
For most purposes, the CAGRID_CONCURRENT_COMPLETE implementation should be used, as it provides full support for the server-side WS-Enumeration spec. It is also the default implementation selected by the caGrid data services infrastructure. The other implementations are less complete, and may be useful for emulating the behavior of other enumeration-enabled systems.
Once an EnumIterator instance has been created, it must be handed to the enumeration resource, and an Enumeration Response Container returned to the user. The caGrid-provided utility class gov.nih.nci.cagrid.wsenum.utils.EnumerateResponseFactory encapsulates this functionality in a simple static method. The method createEnumerationResponse() takes a single EnumIterator parameter and returns an Enumeration Response Container, which can immediately be returned to the client. This method encapsulates locating the EnumResourceHome, creating a new resource, setting its visibility, and deriving an endpoint reference (EPR) from the resulting resource key.
Below is an example code snippet which uses the provided utilities to create an enumeration iterator over a list of Strings and return an appropriate enumeration response container:
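The sketch below mirrors the shape of that flow in self-contained Java so that it can run without a Globus container. `ImplType`, `createIterator`, `createEnumerationResponse`, `HOME` and `ENUM_ADDRESS` are hypothetical stand-ins for IterImplType, EnumIteratorFactory.createIterator(), EnumerateResponseFactory.createEnumerationResponse() and the EnumResourceHome; the WSDD stream and the visibility step are omitted. In a real service the two stand-in methods would be replaced by the caGrid utilities described above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.namespace.QName;

/** Hypothetical, self-contained mirror of a "begin enumeration" operation. */
final class BeginEnumerationSketch {

    enum ImplType { SIMPLE, CONCURRENT_COMPLETE } // illustrative subset, names hypothetical

    /** What the enumeration resource keeps: the data and its XML type. */
    static final class EnumState {
        final Iterator<?> data;
        final QName itemQName;
        EnumState(Iterator<?> data, QName itemQName) {
            this.data = data;
            this.itemQName = itemQName;
        }
    }

    /** Stand-in for the Enumeration Response Container. */
    static final class Container {
        final String address;
        final String resourceKey;
        final String context;
        Container(String address, String resourceKey, String context) {
            this.address = address;
            this.resourceKey = resourceKey;
            this.context = context;
        }
    }

    // Stand-in for the resource home the real factory locates.
    static final Map<String, EnumState> HOME = new ConcurrentHashMap<String, EnumState>();
    static final String ENUM_ADDRESS =
        "https://localhost:8443/wsrf/services/cagrid/ExampleServiceEnumeration"; // hypothetical

    static EnumState createIterator(ImplType type, List<?> objects, QName itemQName) {
        // A real factory would choose a concrete implementation from `type` and
        // configure serialization from a WSDD stream; both are omitted here.
        return new EnumState(objects.iterator(), itemQName);
    }

    static Container createEnumerationResponse(EnumState state) {
        String key = UUID.randomUUID().toString(); // key for the new resource
        HOME.put(key, state);                      // register the resource in the home
        return new Container(ENUM_ADDRESS, key, "ctx-" + key); // EPR parts + context
    }

    /** The service operation: expose a list of Strings for enumeration. */
    static Container beginEnumeration() {
        List<String> names = Arrays.asList("alpha", "beta", "gamma");
        QName stringType = new QName("http://www.w3.org/2001/XMLSchema", "string");
        EnumState state = createIterator(ImplType.CONCURRENT_COMPLETE, names, stringType);
        return createEnumerationResponse(state);
    }
}
```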
The Globus-provided org.globus.ws.enumeration.ClientEnumIterator API provides a java.util.Iterator abstraction for retrieving enumeration data and supports automatic data deserialization. The caGrid WS-Enumeration implementation provides a simplified factory interface to create a new client enum iterator instance from an Enumeration Response Container.
For a more detailed discussion of the WS-Enumeration client tools, please see the developer wiki page on WS-Enumeration in WS Core.
The caGrid-provided class gov.nih.nci.cagrid.wsenum.utils.EnumerationResponseHelper contains static methods which take an Enumeration Response Container and return a client enum iterator instance. One version of createClientIterator takes only the response container, while an overloaded version takes both the container and a java.io.InputStream to the client-config.wsdd file. The information in this file is used to deserialize results from the enumeration.
Given an enumeration-enabled service (here, a Data Service), the following pattern may be used by a client to enumerate over data results.
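A minimal sketch of that client-side pattern follows: the enumeration is wrapped in a java.util.Iterator that issues a Pull only when its local buffer is empty and deserializes each raw item. `PullSource`, `ClientIterator` and the `deserializer` function are hypothetical stand-ins for the remote enumeration context, ClientEnumIterator and the type mappings a client-config.wsdd would supply; only a maxElements constraint is modeled, defaulting to one element per Pull and adjustable mid-enumeration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical client-side wrapper around an enumeration. */
final class ClientPatternSketch {

    /** Stand-in for the remote enumeration context reached through the EPR. */
    interface PullSource {
        List<String> pull(int maxElements); // one Pull round trip
        boolean endOfSequence();
        void release();                      // end the enumeration early
    }

    static final class ClientIterator<T> implements Iterator<T> {
        private final PullSource source;
        private final Function<String, T> deserializer;
        private final Deque<String> buffer = new ArrayDeque<String>();
        private int maxElements = 1; // default: one element per Pull

        ClientIterator(PullSource source, Function<String, T> deserializer) {
            this.source = source;
            this.deserializer = deserializer;
        }

        /** Constraints may be changed between calls, i.e. mid-enumeration. */
        void setMaxElements(int maxElements) {
            this.maxElements = maxElements;
        }

        @Override
        public boolean hasNext() {
            return !buffer.isEmpty() || !source.endOfSequence();
        }

        /** Returns null if a Pull came back empty before the sequence ended. */
        @Override
        public T next() {
            if (buffer.isEmpty()) {
                if (source.endOfSequence()) {
                    throw new NoSuchElementException();
                }
                buffer.addAll(source.pull(maxElements));
            }
            String raw = buffer.pollFirst();
            return raw == null ? null : deserializer.apply(raw);
        }

        void release() {
            buffer.clear();
            source.release();
        }
    }
}
```

Buffering on the client means that raising maxElements reduces the number of round trips without changing how the calling loop is written.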
The use of IterationConstraints to express how data should be fetched from the service is optional; the default behavior is to retrieve one element at a time, with no regard for maximum characters or time constraints. Some server-side implementations do not respect all iteration constraints. Constraints may be changed at any time during an enumeration.
While iterating, it is possible that the call to 'hasNext' returns true, yet the call to 'next' returns null or even throws a NoSuchElementException. This is often the case when results are being populated dynamically on the server side at a rate slower than the client can consume them. This condition may also arise when the iteration constraints are such that no element can be returned which satisfies them, yet the server is still holding more elements. Changing the constraints (allowing more time, increasing the maximum number of characters) may allow these elements to be retrieved.
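A client can tolerate this condition instead of treating it as an error. The following sketch is one hypothetical policy, not caGrid or Globus behavior: `TolerantConsumer.drain` treats a null result and a NoSuchElementException alike as "nothing available yet", calls an `onIdle` hook (where a caller might sleep or relax its constraints), and gives up after `maxIdle` consecutive empty attempts.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

/** Hypothetical helper that drains an iterator whose next() may come back empty. */
final class TolerantConsumer {

    /**
     * Feeds non-null items to sink and returns how many were consumed. Stops
     * when hasNext() is false or after more than maxIdle consecutive empty attempts.
     */
    static <T> int drain(Iterator<T> it, Consumer<T> sink, int maxIdle, Runnable onIdle) {
        int consumed = 0;
        int idle = 0;
        while (it.hasNext()) {
            T item;
            try {
                item = it.next();
            } catch (NoSuchElementException e) {
                item = null; // same treatment as a null result
            }
            if (item == null) {
                if (++idle > maxIdle) {
                    break; // give up; the server may still hold elements
                }
                onIdle.run(); // e.g. wait, or loosen the iteration constraints
                continue;
            }
            idle = 0;
            sink.accept(item);
            consumed++;
        }
        return consumed;
    }
}
```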
Using the caGrid Bulk Data Transport infrastructure slightly complicates client code that accesses data via enumeration, because it adds a level of indirection: the BDT resource is created first and then used to create an enumeration resource.
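The extra hop can be sketched as follows. `BulkDataResource`, `EnumerationResponse`, `bdtQuery` and `countResults` are hypothetical stand-ins, not caGrid BDT classes, and the enumeration is modeled as a local iterator rather than remote Pull calls. The point is only the order of operations: the service returns a BDT resource, the client asks that resource to create the enumeration, and only then iterates.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.UUID;

/** Hypothetical sketch of the client-side indirection added by BDT. */
final class BdtIndirectionSketch {

    /** Stand-in for the Enumeration Response Container returned by the BDT resource. */
    static final class EnumerationResponse {
        final String resourceKey;
        final Iterator<String> data; // a real client would Pull through the EPR
        EnumerationResponse(String resourceKey, Iterator<String> data) {
            this.resourceKey = resourceKey;
            this.data = data;
        }
    }

    /** Stand-in for a BDT resource holding query results on the server. */
    static final class BulkDataResource {
        private final List<String> results;

        BulkDataResource(List<String> results) {
            this.results = results;
        }

        /** The BDT resource creates the enumeration resource. */
        EnumerationResponse createEnumeration() {
            return new EnumerationResponse(UUID.randomUUID().toString(), results.iterator());
        }
    }

    /** Hop 1: the service operation returns a BDT resource, not the data. */
    static BulkDataResource bdtQuery(String query) {
        return new BulkDataResource(Arrays.asList(query + "-1", query + "-2"));
    }

    /** Client: query, ask the BDT resource for an enumeration, then iterate. */
    static int countResults(String query) {
        BulkDataResource bdt = bdtQuery(query);                  // hop 1
        EnumerationResponse response = bdt.createEnumeration();  // hop 2
        int count = 0;
        while (response.data.hasNext()) {
            response.data.next();
            count++;
        }
        return count;
    }
}
```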