caDSR Design
caDSR Grid Service Overview
Introduction
The caGrid caDSR Grid Service provides access to information in the caDSR that is relevant to caGrid, and has capabilities to generate caGrid standard metadata instances. Specifically, the service provides operations to access UML-like information stored in the caDSR. It also has operations to generate Data Service metadata for a described subset of a given project registered in caDSR. Finally, it has an operation which augments a description of an Analytical Service, via a partially populated service metadata instance, with the necessary UML-like and semantic information, extracted from caDSR, to describe the service and its operations.
The service, shown below in Figure 1, consists of the actual grid service and the various APIs used to gather the necessary information. These APIs can also be used directly, if necessary, without going through the grid service.
caDSR Component APIs
Introduction
This chapter describes the various components, shown below in Figure 2, that make up the business logic of the caDSR grid service. Each component is generally usable outside of the grid service as a standalone API.
Component Details
caCORE ApplicationService
The caCORE 3.1 ApplicationService is used by various other components, and by the service logic itself, to access information in the caDSR. It provides a simple query API which is available for remote access. The component APIs that make use of the ApplicationService generally take an instance of it in their constructor. The service itself creates an instance by using the ApplicationService.getRemoteInstance(url) call, where the URL used is populated via a configuration parameter.
The ApplicationService is used to issue both HQL (Hibernate Query Language) queries and simple "query by example" queries. The two caCORE models which are consulted are the gov.nih.nci.cadsr.domain model, which represents the caDSR information, and the gov.nih.nci.cadsr.umlproject.domain model, which represents a UML-like view of information in the caDSR.
DomainModelBuilder
The DomainModelBuilder provides the ability to generate caGrid standard Data Service metadata instances for projects registered in the caDSR. The Data Service metadata describes the information model being exposed by a Data Service. For more information on the model, consult the caGrid metadata design document. The DomainModelBuilder uses the ApplicationService to access the necessary information from the caDSR. It mostly uses optimized HQL queries, utilizing eager association fetching where appropriate. The majority of work performed by the DomainModelBuilder is simply aggregating and transforming information in the caDSR into the format necessary to describe the DomainModel metadata of Data Services.
As much of the necessary extraction and transformation is independent, and the information is located in a remote system where network delays slow down computation, the DomainModelBuilder benefits greatly from parallelism. To achieve this parallelism, the builder employs a work/thread pool. The common WorkManager, distributed with Globus, is used for this purpose. In this framework, the work to be done is modeled as implementations of the Work interface, and Work items are scheduled with the WorkManager for execution by a configurable pool of threads. This provides a mechanism to manage the amount of system resources the service consumes when scheduling background tasks. Each task scheduled beyond the maximum number of worker threads is placed in a priority queue and is processed once a currently executing task completes and a thread becomes available. The DomainModelBuilder utilizes the WorkManager by creating a Work item for each UML Class and UML Attribute it processes in the model.
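The work-pool pattern described above can be illustrated with a minimal sketch. It is not the builder's actual code: java.util.concurrent is used here as a stand-in for the Globus WorkManager, the fixed pool's waiting queue is FIFO rather than a priority queue, and the names WorkPoolSketch, describeClass, and describeAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch only: java.util.concurrent stands in for the Globus WorkManager,
 * and describeClass() stands in for a remote caDSR lookup.
 */
final class WorkPoolSketch {

    /** Hypothetical per-class work item; a real builder would query the caDSR here. */
    static String describeClass(String className) {
        return "UMLClass[" + className + "]";
    }

    /** Runs one work item per class name on a fixed pool; results keep the input order. */
    static List<String> describeAll(List<String> classNames, int threadPoolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : classNames) {
                // Items submitted beyond threadPoolSize wait in the executor's queue.
                Callable<String> work = () -> describeClass(name);
                pending.add(pool.submit(work));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that item has completed
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for work", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Work item failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeAll(List.of("Gene", "Protein", "Taxon"), 2));
    }
}
```

Bounding the pool size caps how many remote lookups are in flight at once, which is the resource-management property the text attributes to the WorkManager.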
The DomainModelBuilder provides four variants of the operation used for creating domain model instances. Each method takes as input a representation of the caDSR Project for which the DomainModel should be created. The first, createDomainModelForProject, takes only the project description, and generates a model which describes the entire domain model being exposed for the project. The second, createDomainModelForPackage, additionally takes an array of Strings which represent the names of the UML Packages in the Project which should be exposed. The method generates a model which exposes all UML Classes in UML Packages whose name is specified in the array. Any associations to UML Classes outside of the specified packages are not exposed. The third method, createDomainModelForClasses, takes, in addition to the project description, an array of Strings which represent the fully qualified names of the UML Classes which should be exposed in the model. Any association involving a class that is not specified is omitted. The final method, createDomainModelForClassesWithExcludes, takes the same array of fully qualified UML Class names, and additionally takes an array of UMLAssociationExcludes which can be used to exclude specific associations from the model (in addition to the already excluded associations which reference classes not specified in the array of class names).
The UMLAssociationExclude Class allows the client to specify a sourceRoleName, sourceClassName, targetRoleName, and targetClassName. Any UML Association which would otherwise be included in the computed subset of the DomainModel is omitted if it meets the criteria described by any of the UMLAssociationExcludes. The value of any attribute of the UMLAssociationExclude can be the wildcard "*", which indicates it should match anything. As such, specifying an exclude with "*" as the value for all attributes would effectively omit all associations from the DomainModel. By using no wildcards, a single association can be omitted, and by using a combination of some values and some wildcards, groups of associations can be omitted. For example, specifying an exclude instance with a sourceClassName value of "gov.nih.nci.cabio.domain.Gene" and wildcards for all other attributes would effectively omit any associations from the DomainModel where gov.nih.nci.cabio.domain.Gene was the source of the association.
Using these methods, in combination with the service-provided methods for finding all Projects, Packages, Classes, and Associations, a DomainModel exposing any subset of Classes and Associations can be created. The DomainModelBuilder API allows a client to simply identify the items to be exposed in the Project, and it does the work to create the conforming DomainModel instance.
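The exclude matching rule described above can be sketched as follows. This is an illustration under stated assumptions, not the caGrid implementation: the Association and Exclude types are simplified stand-ins for the real classes, and the wildcard token is assumed to be "*".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Sketch only: these types mimic, but are not, the caGrid classes. */
final class AssociationExcludeSketch {

    /** Assumed wildcard token that matches any value. */
    static final String WILDCARD = "*";

    /** Minimal stand-in for a UML association between two classes. */
    static final class Association {
        final String sourceRoleName, sourceClassName, targetRoleName, targetClassName;

        Association(String sourceRoleName, String sourceClassName,
                    String targetRoleName, String targetClassName) {
            this.sourceRoleName = sourceRoleName;
            this.sourceClassName = sourceClassName;
            this.targetRoleName = targetRoleName;
            this.targetClassName = targetClassName;
        }
    }

    /** Stand-in for UMLAssociationExclude: four criteria, each of which may be the wildcard. */
    static final class Exclude {
        final String sourceRoleName, sourceClassName, targetRoleName, targetClassName;

        Exclude(String sourceRoleName, String sourceClassName,
                String targetRoleName, String targetClassName) {
            this.sourceRoleName = sourceRoleName;
            this.sourceClassName = sourceClassName;
            this.targetRoleName = targetRoleName;
            this.targetClassName = targetClassName;
        }

        /** An association matches only if all four criteria match. */
        boolean matches(Association a) {
            return fieldMatches(sourceRoleName, a.sourceRoleName)
                && fieldMatches(sourceClassName, a.sourceClassName)
                && fieldMatches(targetRoleName, a.targetRoleName)
                && fieldMatches(targetClassName, a.targetClassName);
        }

        private static boolean fieldMatches(String criterion, String value) {
            return WILDCARD.equals(criterion) || Objects.equals(criterion, value);
        }
    }

    /** Keeps every candidate association that no exclude matches. */
    static List<Association> applyExcludes(List<Association> candidates, List<Exclude> excludes) {
        List<Association> kept = new ArrayList<>();
        for (Association a : candidates) {
            boolean excluded = false;
            for (Exclude e : excludes) {
                if (e.matches(a)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Association> all = List.of(
            new Association("gene", "gov.nih.nci.cabio.domain.Gene",
                            "proteinCollection", "gov.nih.nci.cabio.domain.Protein"),
            new Association("protein", "gov.nih.nci.cabio.domain.Protein",
                            "taxonCollection", "gov.nih.nci.cabio.domain.Taxon"));
        // Exclude every association whose source is Gene, whatever the other values are.
        Exclude fromGene = new Exclude(WILDCARD, "gov.nih.nci.cabio.domain.Gene", WILDCARD, WILDCARD);
        System.out.println(applyExcludes(all, List.of(fromGene)).size());
    }
}
```

The role names used in main are invented for the example; only the Gene class name comes from the text.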
ServiceMetadataAnnotator
The ServiceMetadataAnnotator is similar to the DomainModelBuilder in that it creates caGrid metadata instances. However, the ServiceMetadataAnnotator produces the standard ServiceMetadata common to all caGrid services, and it requires the client to supply a partially populated model as input. The caGrid common service metadata specifies information about a grid service and its operations. For more information on the model, consult the caGrid metadata design document.
The ServiceMetadataAnnotator takes this model and populates the UML and semantically oriented components by querying the caDSR appropriately. Specifically, it populates the semantically annotated UML Class information (similar to the type used in Data Service Domain Model metadata) for each input and output type of every operation the service provides. It does this by examining the XML Qualified Name (QName) of each type used in the signature of the operation and locating its UML equivalent in caDSR. In caGrid, every grid service operation is required to use data types which are XML representations of UML Classes registered in the caDSR. There is a one-to-one mapping of UML Classes to XML QNames (XML elements).
In order to identify the appropriate UML Class for each QName used by a grid service operation, the ServiceMetadataAnnotator requires an implementation of the UML2XMLBinding interface. This interface, described further in the following section, defines operations for extracting information about the corresponding UML Class of a given QName. The ServiceMetadataAnnotator takes the class name of the implementation of this interface and uses reflection to dynamically create an instance of it. Once the binding is determined, the annotator uses the ApplicationService to extract the information relevant to populating the ServiceMetadata instance from the caDSR. It follows a similar process to the DomainModelBuilder for such queries, and the two APIs share a common set of low-level utilities defined in the caDSRUtils Class.
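Instantiating a binding implementation from its class name might look like the sketch below. The Binding interface and EchoBinding class are hypothetical stand-ins for UML2XMLBinding and its implementations; the annotator's actual loading code is not shown in this document, so this only illustrates the general reflection technique.

```java
/** Sketch only: Binding is a hypothetical stand-in for the UML2XMLBinding interface. */
final class ReflectiveBindingLoader {

    /** Hypothetical, heavily simplified binding contract. */
    interface Binding {
        String classNameFor(String namespace, String localName);
    }

    /** Trivial implementation used only to demonstrate loading by name. */
    public static final class EchoBinding implements Binding {
        public EchoBinding() {
        }

        @Override
        public String classNameFor(String namespace, String localName) {
            return namespace + "." + localName;
        }
    }

    /** Loads the named class, checks that it implements Binding, and calls its no-arg constructor. */
    static Binding load(String implementationClassName) {
        try {
            Class<?> raw = Class.forName(implementationClassName);
            Class<? extends Binding> typed = raw.asSubclass(Binding.class);
            return typed.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate binding: " + implementationClassName, e);
        }
    }

    public static void main(String[] args) {
        Binding binding = load(EchoBinding.class.getName());
        System.out.println(binding.classNameFor("gov.nih.nci.cabio.domain", "Gene"));
    }
}
```

Because only the class name is fixed at configuration time, a different implementation can be substituted without recompiling the code that uses the interface.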
In addition to annotating operation inputs and outputs with their caDSR information, the ServiceMetadataAnnotator will be able to augment the service and its operations with semantic metadata once services are registered in the caDSR.
UML2XMLBinding
As mentioned above, the UML2XMLBinding interface defines operations for extracting caDSR registered information about XML QNames. As each operation on the grid operates over XML data, caBIG requires that the structure of the XML is defined by XML Schemas registered in the Global Model Exchange. Each data type is uniquely identified by its Qualified Name (QName), which consists of an XML namespace and name. Further, each data type used in the grid, identified by a QName, is the XML materialization of a UML Class registered in the caDSR. As such, there is a one-to-one mapping between caDSR registered classes and the XML QNames used in the grid as operation inputs and outputs. The UML2XMLBinding defines operations which provide access to this mapping (or binding).
The caDSR is expected to maintain this binding in the future. When this occurs, an implementation of this interface could be created which accesses the caDSR in response to method invocations. Until then, an implementation, UML2XMLBindingNamingConventionImpl, is provided which uses the XML naming conventions defined by caGrid to try to identify the appropriate information in caDSR. For projects which defined their XML Schemas using the recommended naming convention, this implementation should suffice. For information about this recommendation, consult the caGrid metadata design document. Types from projects which use an alternate convention for namespaces, such as an existing XML standard, will not be properly resolved by this implementation, and a CaDSRGeneralException will be raised. APIs should depend only on the interface, so that they can be switched to pull the information from caDSR for all projects (removing these limitations) once the caDSR maintains this information and another implementation is provided. This is the approach the ServiceMetadataAnnotator takes, as its constructor allows the specification of the implementation of the interface it should use. The caDSR grid service also makes this available as a configuration option, so it can be updated without code changes.
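The behavior of a naming-convention binding can be sketched as below. The actual caGrid convention is defined in the caGrid metadata design document and is not reproduced here; the namespace pattern gme://project/version/package used in this sketch is invented purely for illustration, as are the class and method names, and BindingException is an unchecked stand-in for CaDSRGeneralException.

```java
import javax.xml.namespace.QName;

/** Sketch only: the namespace pattern below is an illustrative assumption. */
final class NamingConventionBindingSketch {

    /** Unchecked stand-in for CaDSRGeneralException. */
    static final class BindingException extends RuntimeException {
        BindingException(String message) {
            super(message);
        }
    }

    /** Hypothetical namespace prefix; not taken from the caGrid documentation. */
    static final String PREFIX = "gme://";

    /**
     * Derives a fully qualified UML class name from a QName whose namespace has the
     * assumed form gme://project/version/package; throws for any other namespace.
     */
    static String classNameFor(QName qname) {
        String namespace = qname.getNamespaceURI();
        if (!namespace.startsWith(PREFIX)) {
            throw new BindingException("Namespace does not follow the convention: " + namespace);
        }
        String[] parts = namespace.substring(PREFIX.length()).split("/");
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new BindingException("Expected project/version/package in: " + namespace);
        }
        // The element name is assumed to equal the UML class name.
        return parts[2] + "." + qname.getLocalPart();
    }

    public static void main(String[] args) {
        QName gene = new QName("gme://caBIO/3.1/gov.nih.nci.cabio.domain", "Gene");
        System.out.println(classNameFor(gene));
    }
}
```

A QName from an unrelated XML standard, such as one in the http://www.w3.org/2001/XMLSchema namespace, fails the prefix check and is rejected, which mirrors the limitation described above.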
caDSR Grid Service Infrastructure
Introduction
This section describes the actual service infrastructure, shown below in Figure 3, for the caDSR grid service. The service was created with the Introduce service development toolkit. It is a basic non-stateful grid service, which exposes operations over the various components described in the previous chapter.
The service implementation consists of the CaDSRServiceImpl and the ServiceConfiguration classes, which respectively implement the business logic to call into the component APIs from the grid service calls, and provide access to configuration parameters which are configurable at service deployment time.
Service Data types
The primary data types used by the caDSR grid service are those which are defined in caCORE in the gov.nih.nci.cadsr.domain model, which represents the caDSR information, and the gov.nih.nci.cadsr.umlproject.domain model, which represents a UML-like view of information in the caDSR. The umlproject model, shown below in Figure 4, is the main model, but it associates with and extends a few classes from the cadsr base model, so that model is used as well. As is evident from the figure below, the model provides a UML-like view of the caDSR registered projects. One class of note is the SemanticMetadata class, which is associated with many UML-like classes and provides a link to the semantic content of those items. Specifically, it exposes information about the EVS-maintained concepts.
In addition to the existing caCORE-defined types, the caDSR grid service defines two new data types for exclusive use by its exposed operations. The first is the UMLAssociationExclude, described above in the DomainModelBuilder section as the type used to specify UML Associations which should be excluded from a generated DomainModel. The second is an alternative representation of a UML Association, namely the UMLAssociation class. It has the same semantics as the UMLAssociationMetadata class in the umlproject model, but uses an alternate syntactic representation which is more suitable for transport over the grid.
Finally, the service also makes use of the ServiceMetadata and DomainModel caGrid metadata models, as it provides operations to manipulate them. For more information on these models, consult the caGrid metadata design document.
Service Details
The service exposes three main categories of operations. The first category consists of operations which expose the UML-like view of caDSR registered items, such as findProjects, findPackagesInProject, findClassesInPackage, and findAttributesInClass. These provide basic discovery of and access to the UML information in the caDSR. While these operations are stateless, they take sufficient context during each invocation to enable traversal of all registered projects. Aside from the operations to locate Projects, each operation takes a description of the caDSR Project of interest, and throws an InvalidProjectException if the Project specified is not valid.
The second category contains operations that enable clients to generate caGrid standard Data Service metadata. The four operations mirror those described in the DomainModelBuilder section of this document. These operations all throw an InvalidProjectException if the Project specified is not valid or if it ambiguously identifies more than one Project (for example, if a version is not specified, yet there are multiple versions of a given Project registered in caDSR).
The final category consists of a single operation, annotateServiceMetadata, which provides clients the ability to augment a ServiceMetadata skeleton instance with the information extracted from caDSR. This operation simply exposes the corresponding operation in the ServiceMetadataAnnotator component, described in the previous chapter.
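The project validation rule for these operations (a Project must match exactly one registered entry) can be sketched as follows. The types and the resolve method are hypothetical, the exception is an unchecked stand-in for the service's InvalidProjectException, and the project names and versions in main are illustrative values only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: simplified stand-ins for the service's project lookup. */
final class ProjectResolutionSketch {

    /** Unchecked stand-in for the service's InvalidProjectException. */
    static final class InvalidProjectException extends RuntimeException {
        InvalidProjectException(String message) {
            super(message);
        }
    }

    /** Minimal project description: a short name plus a version. */
    static final class Project {
        final String shortName;
        final String version;

        Project(String shortName, String version) {
            this.shortName = shortName;
            this.version = version;
        }
    }

    /**
     * Returns the single registered project matching the description.
     * A null version means "any version", which is ambiguous if several are registered.
     */
    static Project resolve(List<Project> registered, String shortName, String version) {
        List<Project> matches = new ArrayList<>();
        for (Project p : registered) {
            boolean nameMatches = p.shortName.equals(shortName);
            boolean versionMatches = version == null || p.version.equals(version);
            if (nameMatches && versionMatches) {
                matches.add(p);
            }
        }
        if (matches.isEmpty()) {
            throw new InvalidProjectException("No registered project matches " + shortName);
        }
        if (matches.size() > 1) {
            throw new InvalidProjectException(
                shortName + " is ambiguous: " + matches.size() + " versions are registered");
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        List<Project> registered = List.of(
            new Project("caBIO", "3.0"),
            new Project("caBIO", "3.1"),
            new Project("caArray", "1.1"));
        System.out.println(resolve(registered, "caBIO", "3.1").version);
    }
}
```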
The service makes a few aspects of its implementation configurable through deploy-time options. This feature, provided by Introduce, allows the service to access configuration options through the ServiceConfiguration class, where the values can be set by deployers of the service. The values can be set in the Introduce GUI when deploying the service using Introduce, or as Ant properties when deploying using Ant; the defaults are set in the service.properties file.
One configurable aspect is the number of threads used by the thread pool of the WorkManager. This option, named threadPoolSize, defaults to 10. When the service constructs an instance of the DomainModelBuilder to service a request to generate a DomainModel, it uses a WorkManager implementation initialized with threadPoolSize threads. This same WorkManager is shared by every invocation of the service's operations which use it.
The second configuration option is the URL of the caCORE service to which the ApplicationService component should connect. This is specified by the caCOREServiceURL option, and defaults to http://cabio.nci.nih.gov/cacore31/http/remoteService. This can be changed to, for example, point to a caCORE "staging" server.
The final configuration point is the specification of the UML2XMLBinding implementation which should be used by the ServiceMetadataAnnotator. This option is uml2xmlBindingClassname, and the default value is gov.nih.nci.cagrid.cadsr.uml2xml.UML2XMLBindingNamingConventionImpl, which is the implementation described above that uses namespace naming conventions to identify caDSR items. Once the binding information is stored in caDSR, a new implementation which makes use of this information can be specified via this option. When the service constructs a ServiceMetadataAnnotator instance to service requests to annotate a ServiceMetadata instance, it passes this value in to the constructor.
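The three options and their defaults can be modeled with java.util.Properties, as in the sketch below. The option names and default values are taken from the text; the ServiceConfigSketch class itself is hypothetical and does not represent the Introduce-generated ServiceConfiguration class, and using the option names directly as property keys is an assumption.

```java
import java.util.Properties;

/** Sketch only: models the documented options and defaults, not the generated ServiceConfiguration. */
final class ServiceConfigSketch {

    /** Default values as stated in this document. */
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("threadPoolSize", "10");
        d.setProperty("caCOREServiceURL", "http://cabio.nci.nih.gov/cacore31/http/remoteService");
        d.setProperty("uml2xmlBindingClassname",
            "gov.nih.nci.cagrid.cadsr.uml2xml.UML2XMLBindingNamingConventionImpl");
        return d;
    }

    /** Layers deployer-supplied values over the defaults; unset options fall back to the defaults. */
    static Properties withOverrides(Properties overrides) {
        Properties effective = new Properties(defaults());
        for (String key : overrides.stringPropertyNames()) {
            effective.setProperty(key, overrides.getProperty(key));
        }
        return effective;
    }

    /** Parses the thread pool size from the effective configuration. */
    static int threadPoolSize(Properties config) {
        return Integer.parseInt(config.getProperty("threadPoolSize"));
    }

    public static void main(String[] args) {
        Properties overrides = new Properties();
        overrides.setProperty("threadPoolSize", "4");
        Properties config = withOverrides(overrides);
        System.out.println(threadPoolSize(config));
        System.out.println(config.getProperty("caCOREServiceURL"));
    }
}
```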
As the service is a stateless service, it contains a single WSRF Resource, which exposes service-level metadata as resource properties. The only resource property currently exposed by the service is the standard caGrid 1.0 ServiceMetadata, which describes the data types and operations of the service.










