The caGrid metadata infrastructure consists of a number of APIs, services, and graphical user interfaces. Information on their relationships and the overall infrastructure can be found in the metadata overview documentation. This document details the use of the non-service APIs. For information on the service APIs, consult the GME Developer's Guide and MMS Developer's Guide.
Unless otherwise fully specified, the following set of imports can be assumed for the code examples provided in this guide (they are omitted to make the examples more readable).
The caGrid 1.4 Deprecation Plan comprehensively details the major deprecations and stable aspects of the caGrid 1.4 release.
A new resource property has been added to all caGrid 1.4 data services, which indicates their support for CQL 2 and any extensions to CQL 2 they may support. Clients may access this resource property to verify that the data service supports CQL 2. Its absence indicates a data service that was built with caGrid 1.3 or earlier. A utility class called DataServiceFeatureDiscoveryUtil has been created to simplify checking for CQL 2 support and listing the supported extensions.
As described in the metadata model documentation, caGrid services all expose a standard ServiceMetadata model that describes the service's capabilities. A number of utility APIs are provided for interacting with this metadata.
All caGrid services are expected to expose a standard set of service metadata. Details about this design and the specifics of the metadata can be found in the metadata model documentation. This section describes the high-level API, which can be used to access and manipulate instances of this metadata. The APIs described here can be used to access these models from services, and to serialize and deserialize them to and from XML. These methods complement the Discovery Client. Once an EPR (End Point Reference) is returned from the Discovery API, these methods can be used to access and inspect the full metadata.
The ResourcePropertyHelper API, not detailed here, is the lower level API, which can be used to directly gather information about ResourceProperties (this is how metadata is exposed in caGrid). The MetadataUtils, described here, leverage this API, and expose some of its exceptions. The possible exceptions generated by the metadata utility methods are detailed below.
A non-discerning client may simply opt to catch ResourcePropertyRetrievalException, as it is the base checked exception. An additional unchecked exception, InternalRuntimeException, can also be thrown, but it is used solely to represent an internal logic error in the APIs. Clients are not expected to be able to "recover" from such an exception. As such, clients should not attempt to catch this runtime exception for any other reason than to mask the problem.
QueryInvalidException is thrown if an invalid XPath query is issued. Problems originating from remote services are thrown in the subclass RemoteResourcePropertyRetrievalException. During general use of the metadata utilities, this is the exception clients are most likely to see, as it is thrown if a service is not properly exposing the expected metadata. Clients leveraging the lower-level resource property APIs should take care to appropriately address each type of exception if they are communicating with services. For example, even though it is a caBIG requirement to expose the standard service metadata, clients should properly handle the case where it is not present. Asking for specific metadata that a service does not provide would yield an InvalidResourcePropertyException.
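The catch ordering implied by this hierarchy can be sketched as follows. This is a minimal, self-contained illustration rather than caGrid code: the exception classes are local stand-ins that only reuse the names from this guide, the assumption that the three specific exceptions are sibling subclasses of the base exception is not confirmed here, and `MetadataLookup` is a hypothetical placeholder for a call into the metadata utilities.

```java
public class MetadataExceptionSketch {

    // Stand-ins that reuse the exception names from this guide. They are NOT the
    // caGrid classes, and the sibling relationship between the subclasses is assumed.
    public static class ResourcePropertyRetrievalException extends Exception {
        public ResourcePropertyRetrievalException(String message) { super(message); }
    }

    public static class QueryInvalidException extends ResourcePropertyRetrievalException {
        public QueryInvalidException(String message) { super(message); }
    }

    public static class RemoteResourcePropertyRetrievalException
            extends ResourcePropertyRetrievalException {
        public RemoteResourcePropertyRetrievalException(String message) { super(message); }
    }

    public static class InvalidResourcePropertyException
            extends ResourcePropertyRetrievalException {
        public InvalidResourcePropertyException(String message) { super(message); }
    }

    /** Hypothetical metadata lookup; stands in for a call into the metadata utilities. */
    public interface MetadataLookup {
        String fetch() throws ResourcePropertyRetrievalException;
    }

    /** Catches the most specific exceptions first and the base checked exception last. */
    public static String describe(MetadataLookup lookup) {
        try {
            return "ok: " + lookup.fetch();
        } catch (InvalidResourcePropertyException e) {
            return "requested metadata is not provided by the service";
        } catch (RemoteResourcePropertyRetrievalException e) {
            return "problem originating from the remote service";
        } catch (QueryInvalidException e) {
            return "invalid XPath query";
        } catch (ResourcePropertyRetrievalException e) {
            return "general retrieval failure";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(() -> "metadata"));
        System.out.println(describe(() -> {
            throw new InvalidResourcePropertyException("no such resource property");
        }));
    }
}
```

The sketch deliberately has no handler for the unchecked InternalRuntimeException, in line with the guidance above that clients should not try to recover from it.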
This section describes typical usage of the Metadata API. The exception handling shown in the code examples is not recommended practice, and is simplistic for demonstration purposes. The MetadataUtils class is the primary means of accessing and manipulating service metadata. It provides a number of static utility methods that can be directly invoked. This API provides an abstraction layer over lower-level APIs, specializing them to deal with the standard metadata types. Clients wishing to work with custom (or non-standard) metadata need to use the lower-level APIs and can consult the source code of the MetadataUtils class for guidance.
In order to access a service's metadata, an End Point Reference pointing to the service must be provided. This can be obtained as a direct result of an invocation of a discovery method from the Discovery API, or manually constructed by specifying the service's Address. Examples of both can be found in the Discovery Client section. As caGrid requires that standard metadata be made publicly available, client credentials are not necessary for invocation of these methods.
The first example, shown below, demonstrates accessing a service's standard ServiceMetadata, which is common to all caGrid services. As described above, the first step is to obtain an appropriate EPR (line 1). Given this EPR, the MetadataUtils getServiceMetadata method, shown on line 4, can be used to obtain the bean representation of the metadata. Upon successful completion of this method, the fully populated bean can be inspected to obtain the information of interest. Several exceptions, subclassed from the base ResourcePropertyRetrievalException, can be thrown by this operation. A non-discriminating client may choose to simply handle this base exception. Additional details on the other exceptions, and why they may be thrown, are described in the section above, as well as in the javadoc of the APIs.
The process for accessing data service DomainModel metadata, shown below, is the same as accessing standard metadata. Once the metadata is obtained, in line 3, it can be inspected, as shown in line 4 where the long name of the project being exposed by the data service is printed to the console.
In addition to accessing metadata from services, the MetadataUtils provide the capability to read and write metadata instances as XML documents. This can be useful for not only storage and display of metadata, but also for exposing grid service metadata as XML. These methods also provide a way to inspect the metadata in object (bean) form.
In the figure below, example code is shown that saves an instance of standard service metadata to a file named serviceMetadata.xml. The metadata instance, defined in line 1, can be acquired using code similar to that shown above, or by some other mechanism. The serializeServiceMetadata method can then be passed this instance, along with an instance of java.io.Writer, as shown on line 11. Any Writer implementation works, but the example below uses a FileWriter, on line 5, to write the metadata to the specified file. After the MetadataUtils have been used to write the metadata to XML, the Writer should be closed, as shown on line 18. Though not shown, a similar method, serializeDomainModel, exists for writing data service metadata to XML; its usage pattern is the same.
As a complement to the serialization methods described and shown above, deserialization methods also exist which read XML representations of metadata and return appropriately populated metadata beans. In the example code shown below, a new ServiceMetadata instance is populated from an XML representation stored in a file named serviceMetadata.xml. This code, used in conjunction with the previous example, reconstitutes the original metadata instance. Similar to the serialization methods that use a java.io.Writer, the deserialization methods use a java.io.Reader to read the XML representation. In the example below, a FileReader is used on line 4. This Reader is then passed to the deserializeServiceMetadata method on line 11, and the populated ServiceMetadata instance is returned. As with the Writer instance in the serialization methods, the Reader instance should be closed once it is used (as shown on line 18). Though not shown, a similar method, deserializeDomainModel, exists for reading data service metadata from XML; its usage pattern is the same.
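The Writer and Reader lifecycle described in the last two paragraphs can also be sketched without any caGrid classes. In the following illustration, `StandInMetadata`, `serialize`, and `deserialize` are hypothetical stand-ins for the real bean and the MetadataUtils methods, the XML layout is invented for the sketch, and the line numbers mentioned above do not apply to it. It uses in-memory streams and try-with-resources so that each stream is closed even if an exception occurs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MetadataRoundTripSketch {

    /** Hypothetical stand-in for a metadata bean; the real ServiceMetadata is much richer. */
    public static final class StandInMetadata {
        public final String serviceName;
        public StandInMetadata(String serviceName) { this.serviceName = serviceName; }
    }

    /** Stand-in serializer: writes to any Writer and leaves closing to the caller. */
    public static void serialize(StandInMetadata metadata, Writer out) throws IOException {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("StandInMetadata");
            root.setAttribute("serviceName", metadata.serviceName);
            doc.appendChild(root);
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IOException("could not write metadata XML", e);
        }
    }

    /** Stand-in deserializer: reads from any Reader and returns a populated bean. */
    public static StandInMetadata deserialize(Reader in) throws IOException {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in));
            return new StandInMetadata(doc.getDocumentElement().getAttribute("serviceName"));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("could not read metadata XML", e);
        }
    }

    /** Serializes and then deserializes; try-with-resources closes each stream. */
    public static String roundTrip(String serviceName) {
        try {
            String xml;
            try (Writer out = new StringWriter()) {
                serialize(new StandInMetadata(serviceName), out);
                xml = out.toString();
            }
            try (Reader in = new StringReader(xml)) {
                return deserialize(in).serviceName;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Example Service"));
    }
}
```

Replacing the StringWriter and StringReader with a FileWriter and FileReader gives the file-based variant described in the text; the try-with-resources structure stays the same.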
The Discovery API provides an abstraction over the standard operations used to query the Index Service. It provides a number of operations that can be used to discover services of interest. The basic process of use is to construct an instance of the DiscoveryClient, optionally specifying the End Point Reference (EPR) of the Index Service to query, and then invoke the appropriate discovery methods. Each method returns an array of EPRs of the matching services. These returned EPRs can then be used to invoke the services, or to ask them for their metadata for further discrimination. It is worth noting that the Index Service, as an aggregated source of distributed information, inherently operates on out-of-date information. It is possible that services that are running do not yet have their metadata aggregated in the Index Service, and it is possible that services present in the Index Service have recently been taken down. caGrid attempts to strike a balance between performance and reliability of information in the Index Service. The information returned by the Discovery API should be accurate within a few minutes, but applications building upon it should be aware of this limitation, and should not assume a service in the Index Service will always be available when it is invoked.
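One way an application might tolerate stale index entries is to treat a failed invocation as "skip this service" rather than as a fatal error. The following sketch illustrates that idea with plain strings standing in for EPRs and a caller-supplied function standing in for the service invocation; both are hypothetical simplifications and not part of the Discovery API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class StaleIndexSketch {

    /**
     * Invokes every discovered service address and collects the results.
     * A service that fails is skipped, because the index may list services
     * that have recently been taken down.
     */
    public static <R> List<R> invokeAvailable(List<String> addresses,
                                              Function<String, R> invoke) {
        List<R> results = new ArrayList<>();
        for (String address : addresses) {
            try {
                results.add(invoke.apply(address));
            } catch (RuntimeException e) {
                // Listed in the index, but not usable right now: report and move on.
                System.err.println("Skipping " + address + ": " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical addresses; the second one simulates a service that is down.
        List<String> discovered = Arrays.asList(
                "https://example.org/serviceA",
                "https://example.org/down",
                "https://example.org/serviceB");
        List<String> replies = invokeAvailable(discovered, address -> {
            if (address.endsWith("/down")) {
                throw new IllegalStateException("connection refused");
            }
            return "reply from " + address;
        });
        System.out.println(replies);
    }
}
```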
The DiscoveryClient uses the lower-level "metadataUtils" project to communicate with the Index Service. It exposes the exceptions generated from this lower-level API, instead of wrapping them with discovery-specific exceptions. The possible exceptions that discovery methods can throw are detailed in the section above. A non-discerning client may simply opt to catch ResourcePropertyRetrievalException, as it is the base checked exception. An additional unchecked exception, InternalRuntimeException, can also be thrown, but it is used solely to represent an internal logic error in the APIs, so clients are not expected to be able to "recover" from it. As such, clients should not attempt to catch this runtime exception for any other reason than to mask the problem. Generic problems caused by the DiscoveryClient itself are thrown in the base ResourcePropertyRetrievalException. A subclass of it, QueryInvalidException, is thrown if an invalid XPath query is issued. Unless the DiscoveryClient is extended, clients are not expected to encounter this. Problems originating from remote services are thrown in the subclass RemoteResourcePropertyRetrievalException. During general use of the metadata utilities, this is the exception clients are most likely to see, as it is thrown if a service is not properly exposing the expected metadata. In the context of the DiscoveryClient, clients are not expected to experience any exceptions unless there is an issue with the Index Service. However, clients leveraging the lower-level APIs should take more care to appropriately address each type of exception if they are communicating with other (community-provided) services. For example, even though it is a caBIG requirement to expose the standard service metadata, clients should properly handle the case where it is not present. Asking for specific metadata that a service does not provide would yield an InvalidResourcePropertyException.
While the methods in the API are designed around the caGrid standard metadata, it is also acceptable to have services register additional domain- or application-specific metadata with the Index Service. The Discovery API is designed for easy extensibility, such that additional application- or domain-specific discovery scenarios can be provided to complement such additional metadata. The "business logic" of the DiscoveryClient consists almost entirely of constructing appropriate XPath queries over the appropriate metadata, and leveraging lower-level APIs to actually invoke the queries. These lower-level APIs are made available to extenders of the client, such that they need only construct appropriate XPath queries to implement additional discovery scenarios.
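To give a feel for what an XPath-based discovery scenario looks like, the following sketch evaluates a query over a small in-memory document using only the JDK's `javax.xml.xpath` API. The document layout (`Index`, `Service`, `Operation`) is a hypothetical, namespace-free stand-in and not the caGrid metadata schema, and the sketch does not use the DiscoveryClient's own lower-level APIs. It also shows that XPath string comparison is case sensitive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDiscoverySketch {

    // Hypothetical, namespace-free stand-in for metadata aggregated by an index.
    static final String INDEX_XML =
            "<Index>"
            + "<Service url='https://example.org/a'><Operation name='query'/></Service>"
            + "<Service url='https://example.org/b'><Operation name='query'/>"
            + "<Operation name='annotate'/></Service>"
            + "<Service url='https://example.org/c'><Operation name='Annotate'/></Service>"
            + "</Index>";

    /** Returns the URLs of services that offer an operation with exactly this name. */
    public static List<String> findByOperation(String operationName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the search string as an XPath variable so quotes in it cannot alter the query.
        xpath.setXPathVariableResolver(
                name -> "op".equals(name.getLocalPart()) ? operationName : null);
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                    "/Index/Service[Operation/@name = $op]",
                    new InputSource(new StringReader(INDEX_XML)),
                    XPathConstants.NODESET);
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                urls.add(((Element) hits.item(i)).getAttribute("url"));
            }
            return urls;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath query", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findByOperation("query"));
        System.out.println(findByOperation("QUERY"));
    }
}
```

A new discovery scenario in this style amounts to a different XPath expression over the same data, which matches the extension approach described above.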
The DiscoveryClient uses commons-logging to log general and debugging information. If configured to DEBUG level, the client prints out the XPaths it is sending to the Index Service, which may facilitate the creation of new discovery operations, or help track down problems.
As with most Globus clients, a properly configured client-config.wsdd file must be accessible by the underlying Axis engine. The simplest ways to do this are to run with your $GLOBUS_LOCATION as the "working directory," add $GLOBUS_LOCATION to your classpath, or copy $GLOBUS_LOCATION/client-config.wsdd to your working directory or classpath.
If you don't do this, you will most likely see an exception when you run the DiscoveryClient.
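To catch this misconfiguration before any grid call is made, a client could check both locations up front. The sketch below uses only standard JDK calls and is not part of caGrid or Globus; the class and method names are hypothetical, and it assumes that the working directory and the classpath are the only places that need to be checked, as the text above suggests.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClientConfigCheck {

    static final String FILE_NAME = "client-config.wsdd";

    /** True if the configuration file exists as a regular file in the given directory. */
    public static boolean inDirectory(Path directory) {
        return Files.isRegularFile(directory.resolve(FILE_NAME));
    }

    /** True if the configuration file is visible as a resource to the given class loader. */
    public static boolean onClasspath(ClassLoader loader) {
        return loader.getResource(FILE_NAME) != null;
    }

    public static void main(String[] args) {
        Path workingDirectory = Paths.get("").toAbsolutePath();
        boolean found = inDirectory(workingDirectory)
                || onClasspath(ClientConfigCheck.class.getClassLoader());
        System.out.println(found
                ? FILE_NAME + " is visible"
                : FILE_NAME + " was not found in " + workingDirectory + " or on the classpath");
    }
}
```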
This section describes typical usage of the Discovery API. The exception handling shown in the code examples is not recommended practice and is simplistic for demonstration purposes. Additional examples can be found in the source code of the discovery project, in the main of the DiscoveryClient itself, as well as in the test source directory.
The main method of the DiscoveryClient can be run from the project's source folder by entering ant runClient. The discovery unit tests can also be run by entering ant test. The unit tests do not actually communicate with the Index Service; rather they simulate it with a Mock object.
The first step in using the Discovery API is constructing an instance of the DiscoveryClient. There are three constructors that can be used. The first, shown in line 7 below, takes no arguments, and indicates that the "default" Index Service should be used for discovery queries. A second constructor, shown in line 5, takes a String as an argument, and the String is expected to represent the service URL of the Index Service to query. The final constructor, not shown, takes an EndPointReferenceType, which can be used to directly indicate the Index Service Resource to query. The standard caGrid Index Service installation is stateless, and so a resource unqualified EPR can be used, but most clients can just use the shortcut String constructor.
The Index Service to use can also be reconfigured at runtime, by invoking the setIndexEPR method, shown in line 11 below. Just as specifying the Index Service in the constructor generates an exception if the Address is not valid, so will the setter method.
Once a DiscoveryClient is configured, it can be used repeatedly to discover services of interest. While the client is technically thread safe as long as the Index Service is not reconfigured during use, it is recommended that a new DiscoveryClient instance be used in each thread context where discovery operations are performed, as it is an extremely lightweight object.
The simplest discovery scenario, shown below, is to query the Index Service for all registered services. The boolean value specified in line 3 indicates whether services should be ignored if they do not expose the caGrid standard metadata. In most application scenarios, a value of "true" is used, but specifying "false" is useful for identifying all services that are attempting to register. It is common for a service running behind a firewall to maintain registration status with the Index Service, but not have caGrid metadata aggregated, as the Index Service is not able to communicate with the inaccessible service.
There are numerous discovery operations which take some form of text input, and all are case sensitive. The simplest discovery operation that takes some form of input is the basic string search operation, discoverServicesBySearchString, which is shown below. This is a full text search that examines all registered metadata values for the specified input. It is not likely this operation will be useful for programmatic discovery (as it is a completely unstructured query), but it is useful for applications that take direct input from the user (such as a web form), and makes a good starting point for applications that provide capability to "drill down" and examine the full metadata of the satisfying services.
Beyond the full text search operation, there are many discovery operations that take a search string as input, but perform a more structured search and are more useful for programmatic discovery. For example, services providing a named operation can be discovered using the method discoverServicesByOperationName, or Data Services exposing a given model can be discovered, as shown below, using the discoverDataServicesByDomainModel method. This operation, and all methods named like discoverDataServices*, only return services that implement the standard Data Service operations.
Another potentially useful method for discovering services or displaying information about available services on the grid is the discoverServicesByResearchCenter method, shown below.
There are several discovery methods that support semantic discovery by allowing search on concept code. The simplest of these methods, discoverServicesByConceptCode, shown below, searches for services based on concepts applied to the service itself. There is a concept representing "Grid Service" in the ontology, and derived concepts such as "Analytical Grid Service" and "Data Grid Service." By determining these concept codes, or any other specialized concepts, this operation provides a simple way to discover services of a certain "type." Similarly, there is a method to discover services by the semantics of the operations they provide: discoverServicesByOperationConceptCode. At the time of this writing, service operations are not yet semantically annotated, but are expected to be soon. Finally, two methods, discoverDataServicesByModelConceptCode and discoverServicesByDataConceptCode, provide the capability to discover services based on information about the data types they operate over. Both examine the semantic information of the UML Classes used by the services. The first, discoverDataServicesByModelConceptCode, locates Data Services that are exposing access to data based on the concept. The second, discoverServicesByDataConceptCode, locates services that directly produce or consume data based on the concept. In both cases, the concept is considered a match if the Class is based on the concept or one of its attributes, attribute value domains, or enumerated value meanings. These methods are all based on direct concept matching, not on ontological operations. However, these methods, coupled with the EVS grid service, provide a powerful ability to traverse the caBIG ontology for information of interest, and discover services providing this information, or the ability to manipulate it.
Beyond the simple String-based discovery methods, some discovery methods take complex objects as input, such as a PointOfContact or UMLClass. In these cases, the objects act as a prototype (or "query by example" as in the caCORE APIs), and can be as partially populated as desired. For example, the method shown below, discoverServicesByPointOfContact, searches for services which are associated with a person matching the information in the supplied PointOfContact instance; in this case, services associated with "Scott Oster" are located. There are many other fields in PointOfContact that are not populated in this example, and are ignored.
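The "query by example" semantics, where unpopulated fields are ignored, can be illustrated with a small in-memory sketch. The `Contact` class and its field names below are hypothetical stand-ins and not the real PointOfContact bean, and the in-memory comparison is only a way to show the matching rule; as described earlier, the DiscoveryClient itself works by constructing XPath queries rather than by filtering objects like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QueryByExampleSketch {

    /** Hypothetical stand-in for a prototype bean; null means "not populated". */
    public static final class Contact {
        public final String firstName;
        public final String lastName;
        public final String role;

        public Contact(String firstName, String lastName, String role) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.role = role;
        }
    }

    /** A null prototype field is ignored; a populated field must match exactly. */
    private static boolean fieldMatches(String wanted, String actual) {
        return wanted == null || wanted.equals(actual);
    }

    public static boolean matches(Contact prototype, Contact candidate) {
        return fieldMatches(prototype.firstName, candidate.firstName)
                && fieldMatches(prototype.lastName, candidate.lastName)
                && fieldMatches(prototype.role, candidate.role);
    }

    /** Returns the candidates that satisfy every populated field of the prototype. */
    public static List<Contact> filter(Contact prototype, List<Contact> candidates) {
        List<Contact> hits = new ArrayList<>();
        for (Contact candidate : candidates) {
            if (matches(prototype, candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Contact> known = Arrays.asList(
                new Contact("Scott", "Oster", "Developer"),
                new Contact("Jane", "Doe", "Maintainer"));
        // Only the name is populated; the role is left null and therefore ignored.
        Contact prototype = new Contact("Scott", "Oster", null);
        System.out.println(filter(prototype, known).size());
    }
}
```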
There are many discovery methods that take a UMLClass prototype to discover services based on data types; an example is shown below. This method, discoverServiceByOperationInput, locates services that provide an operation that takes, as input, an instance of the specified data type. In the example below, services that provide operations taking caBIO's Gene instances as input are located. Again, this object can be as partially populated as desired (such as only specifying the package name, or being more explicit in specifying the exact project name and version).
Several metadata-related features of caGrid manifest themselves as functionality in Introduce. Introduce is the graphical service development environment used in caGrid, and supports an extension framework, whereby functionality can be plugged into Introduce dynamically. While such functionality could have alternatively been implemented directly in Introduce, this approach promotes a loose coupling between the components without loss of functionality or any difference to the end user. This chapter details the various metadata-related Introduce extensions individually. All of these extensions are automatically installed into Introduce during the caGrid build process, and the caBIG Introduce Creation Viewer automatically loads the appropriate service-specific extensions to all services created with Introduce in caBIG.
The caDSR Grid Data Service provides read and query access to the information available in the caDSR. As such, the service provides useful information when creating services, and so is integrated with Introduce as two extensions. Introduce has two types of datatype "discovery" extensions, which are both implemented for the caDSR. Specifically, there is a Discovery Tools extension, the CaDSRTypeDiscoveryComponent, and a Discovery Selection extension, the CaDSRTypeSelectionComponent.
The CaDSRTypeDiscoveryComponent allows the user to browse registered Projects from the caDSR, and view a UML rendering of a selected package. This provides a means to browse the caDSR for available data types which could be used in the development of services. This component is a simple Panel, which uses the caDSR Data Service to populate the Project and Package combo boxes, and makes use of the MMS to generate Domain Models. The caGrid graph project is then used to render UML views of the generated Domain Models.
The CaDSRTypeSelectionComponent extension complements the CaDSRTypeDiscoveryComponent by providing a way to add XML Schemas to an Introduce service, which correspond to projects registered in caDSR. This component integrates into the "Import Data Types" section of the Types panel in Introduce. When a user browses to a particular package and presses the Add button, the component identifies the appropriate XML Schema(s) for that package, and retrieves them from the GME. As of caGrid 1.3, the component makes use of the caDSR's ability to annotate Projects and Packages with their appropriate XML Namespaces. If the selected Project and Package do not have such information registered in the caDSR, then the Namespace field will be populated with a guess based on XML Schema naming conventions, and will be shown in a blue font. The user may still attempt to add the schemas to the service using the Add button, but an error should not be unexpected when the GME is consulted for those schemas. If the caDSR actually has such namespace annotations, then the Namespace field will be populated with the appropriate information, and shown in a black font. This indicates a much stronger confidence that such XML Schemas actually exist in the GME. Upon adding the schemas to the service, the extension will annotate the Introduce Namespace entries with details indicating the caDSR Project name and version from which they were extracted. This information can be used by other extensions (such as the caGrid Service Metadata Generator described below).
One of the most metadata-relevant Introduce extensions is a service-specific extension, which is actually a suite of components that hook into the Introduce service synchronization process when the extension is added to a service. These components comprise the Service Metadata Generator extension, and are responsible for creating an instance of the standard Service Metadata for a service whenever it is saved. The components, ServiceMetadataCreationPostProcessor, MetadataCodegenPreProcessor, and MetadataCodegenPostProcessor, run during the post creation, pre code generation, and post code generation processes, respectively (consult the Introduce design document for more details). The ServiceMetadataCreationPostProcessor is responsible for copying and installing the caGrid Service Metadata XML Schemas into the service, adding the appropriate metadata jars, and generating a shell Service Metadata instance. Then, each time a service is saved in Introduce, the code generation components read the Introduce model and edit the Service Metadata instance appropriately. That is, all of the descriptions from Introduce are put into the metadata, and all of the service contexts, operations, metadata, etc., are updated. Essentially, these components are responsible for extracting all the metadata-relevant information from the more complex Introduce service metadata model, and representing it in the caGrid standardized Service Metadata model. While somewhat laborious, the process is fairly straightforward. Upon editing the metadata model appropriately, the extension extracts any Namespace annotations that are present in the Introduce Namespace model (such as would be present if the schemas were added by the caDSR type selection component), and sends the model and annotations to the MMS Service for annotation.
A similar extension exists to generate standardized Data Service Metadata instances as part of the suite of Data Service Introduce extensions. Details about that extension and its functionality can be found in the Data Service design documents.
Complementing the Service Metadata Generator, an editor for the user-editable fields of the standardized Service Metadata instance of the service is provided as an Introduce metadata editor extension. The component, ServiceMetadataEditor, is a simple Panel that displays these fields from the current instance of metadata, and allows the user to edit and save it. Specifically the component allows the hosting research center information to be edited, including points of contact, address, and other information such as the display name and websites. It also allows the points of contact for the service to be edited.
Similar to the ServiceMetadataEditor, the DomainModelViewer is an Introduce metadata editor extension. It provides a read-only UML display of the domain model to which the data service is providing query access. It uses the caGrid graph project to render the view, and simply reads the service's current Domain Model metadata instance to populate the view.
To simplify the process of making use of the caCORE SDK XML Schema generation capabilities, as described in the section on Schema generation titled caCORE SDK in the Metadata Design, an Introduce extension is provided which can produce XML Schemas from an XMI model. The extension, SDKTypeSelectionComponent, is a type discovery extension, as is the caDSR Grid Service Type Discovery component. Rather than using the caDSR, however, this extension makes use of the caCORE SDK transparently, without requiring the user to install or configure the caCORE SDK. The feature can be found on the "Types" tab of Introduce, under the "Create from XMI" subtab.
The component provides the user with a browse button to select their XMI file, and various input boxes to enter supplemental information about the project represented by the XMI file. These fields are used to control the XML Schema generation process (indicating the information needed to select a namespace, and which packages from the Project to process). As the user fills out the form, a status box below is updated, as are status icons on each of the fields. These are validators that ensure valid information is provided. If the user presses the "Add" button before all of the error indicators are cleared, a dialog showing the errors will be displayed, and no processing will occur. Warnings will not prevent processing from occurring, but generally indicate something that should probably be examined. Once the "Add" button is pressed with no validation errors, the component goes through the process of generating XML Schemas, and adding them to the service.
The components which implement the logic of the extension use the following process when a type is to be added. First, as mentioned above, the input is validated. Next, if everything is valid, the component extracts a local copy of the caCORE SDK version 4.1.1 to a temporary directory. Then, the SDKExecutor is used to execute the process, by passing it an instance of SDKGenerationInformation, which is basically a bean that represents the input gathered from the user. The executor creates an instance of SDKExecutionResult, which is a bean that represents access to the artifacts created by execution of the SDK. To do this, the executor applies the necessary configuration changes in the SDK configuration file, by reading the values of the SDKGenerationInformation bean. Then the SDK is executed as an Ant process. The results are then validated and returned. If everything is valid, the component then copies and installs the generated schemas to the service.