
Metadata

Metadata 1.1 Design Guide

Table of Contents

List of Figures
Figure 1 : Data Description Overview
Figure 2 : UML to caDSR mapping
Figure 3 : caGrid Standard Service Metadata
Figure 4 : Common Model
Figure 5 : Data Service Metadata Model
Figure 6 : Service Model
Figure 7 : Advertisement and Discovery Overview
Figure 8 : Globus Resource Framework
Figure 9 : Globus Resource Property Framework
Figure 10 : caGrid Resource Implementation
Figure 11 : Example Registration Configuration
Figure 12 : Discovery API
Figure 13 : Discovery API Utilities
Figure 14 : Discovery API Example Code
Figure 15 : Example Custom Serialization configuration
Figure 16 : SDK Serialization Framework
Figure 17 : Example WSDD Type Mapping
Figure 18 : EA generated Chromosome type from caBIO
Figure 19 : GME Namespace Format
Figure 20 : Namespace Naming Conventions
Figure 21 : Example caTIES Document serialization
Figure 22 : caDSR and GME relationship
Figure 23 : UML to XML Mapping
Figure 24 : BookStore Example UML Model
Figure 25 : Package Structure of Example UML Model
Figure 26 : Example Project Schema
Figure 27 : Common Package of Example Project Schema
Figure 28 : Bookstore Package of Example Project Schema

caGrid Metadata Infrastructure Overview


Introduction

Extending beyond the basic grid infrastructure, caBIG specializes these technologies to better support the needs of the cancer research community. A primary distinction between basic grid infrastructure and the requirements identified in caBIG and implemented in caGrid is the attention given to data modeling and semantics. caBIG adopts a model-driven architecture best practice and requires that all data types used on the grid are formally described, curated, and semantically harmonized. These efforts result in the identification of common data elements, controlled vocabularies, and object-based abstractions for all cancer research domains. caGrid leverages existing NCI data modeling infrastructure to manage, curate, and employ these data models. Data types are defined in caCORE UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR). The definitions draw from vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described.
In caGrid, both the client and service APIs are object oriented and operate over well-defined, curated data types. Clients and services communicate through the grid using Globus grid client and service infrastructure, respectively. The grid communication protocol is XML, so the client and service APIs must transform the transferred objects to and from XML. This XML serialization of caGrid objects is restricted: each object that travels on the grid must do so as XML that adheres to an XML schema registered in the Global Model Exchange (GME). Just as the caDSR and EVS define the properties, relationships, and semantics of caBIG data types, the GME defines the syntax of their XML serialization. Furthermore, Globus services are defined by the Web Services Description Language (WSDL). The WSDL describes the operations the service provides to the grid. In WSDL, the inputs and outputs of the operations, among other things, are defined by XML schemas (XSDs). Because caBIG requires that the inputs and outputs of service operations use only registered objects, these input and output data types are defined by XSDs registered in the GME. In this way, the XSDs are used both to describe the contract of the service and to validate the XML serialization of the objects it uses. Figure 1 details the services and artifacts involved in describing data objects and transferring them between client and service.
Figure 1 : Data Description Overview
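The role of an XSD as a validator for serialized objects can be illustrated with the standard Java XML validation API. The following is a minimal sketch, not caGrid code: the schema, the urn:example:bookstore namespace, the Book element, and the SchemaCheck class are all hypothetical stand-ins for a schema that would be registered in the GME.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical schema standing in for an XSD registered in the GME.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:bookstore' elementFormDefault='qualified'>"
      + "<xs:element name='Book'><xs:complexType>"
      + "<xs:attribute name='title' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    /** Returns true if the XML instance validates against the given XSD text. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema violation, malformed XML, or unreadable input
        }
    }

    public static void main(String[] args) {
        String ok = "<Book xmlns='urn:example:bookstore' title='Grid Computing'/>";
        String bad = "<Book xmlns='urn:example:bookstore'/>"; // required attribute missing
        System.out.println(isValid(BOOK_XSD, ok));
        System.out.println(isValid(BOOK_XSD, bad));
    }
}
```

The same schema text could in principle back both the service contract and the instance check, which is the dual use the paragraph above describes.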

caGrid Metadata


Cancer Data Standards Repository (caDSR)

Semantic Annotation of UML Domain Models

Proper semantic integration requires that each class and its attributes from the UML domain model be mapped to appropriate concepts in a controlled terminology. The caCORE SDK uses the NCI Thesaurus as its primary terminology source, but any well-structured, concept-based description logic terminology should in principle be suitable. The concept selection process can be entirely manual, or it can be partially automated using the Semantic Connector, a tool supplied by the caCORE SDK. The Semantic Connector takes the UML domain model expressed in XMI as input and uses the caCORE EVS APIs hosted at the NCI to search the NCI Thesaurus for appropriate concepts. Semantic annotations for classes and attributes are specified using tagged values in the UML domain model.

UML Domain Model Loader

The UML domain model, annotated with semantic concept codes, contains a considerable amount of metadata about the ultimate system, both data and analytical services, that will be deployed to the grid. However, it is not in a form that is amenable to query and retrieval in a runtime environment, nor is it easily queried by humans who want to use this information for other purposes. The UML domain model loader addresses these limitations by transforming and loading the models into the caDSR, which provides APIs that support runtime access to metadata.
A UML domain model annotated with semantic concept information is exported to XMI format using a UML modeling tool such as Enterprise Architect. The XMI file is then used as input to the UML domain model loader, which applies a set of mapping rules to load the metadata represented by Classes, Attributes and Associations into caDSR entities. The following section contains the details of the UML to caDSR mapping rules.

UML to caDSR Mapping

Metadata represented in the UML domain model is mapped to caDSR administered component types, as illustrated in Figure 2, using the following mapping rules:

  • A UML Class is mapped to an Object Class, which according to the ISO 11179 specification represents a thing in the real world.
  • An attribute of a UML Class is mapped to a Property, which according to the ISO 11179 specification represents an attribute of a real-world thing.
  • The combination of a UML Class and one of its attributes is mapped to a Data Element Concept.
  • The combination of a UML Class, one of its attributes, and the data type of that attribute is mapped to a Data Element, commonly referred to as a Common Data Element (CDE).
  • The Project to which the UML domain model belongs is mapped to a Classification Scheme.
  • Packages in the UML model, which may represent sub-projects within a project, are mapped to Classification Scheme Items.
  • An Association between two classes is mapped to an Object Class Relationship.

Refer to the "Registering Metadata" chapter of the caCORE SDK Programmer's Guide for complete details on loading UML domain models into caDSR.
Figure 2 : UML to caDSR mapping
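The mapping rules listed above can be summarized as a lookup table. The sketch below is purely illustrative: the UmlToCadsrRules class and its UmlSource enum are hypothetical names, not part of the UML domain model loader, and the component names are the ones given in the rules.

```java
import java.util.EnumMap;
import java.util.Map;

class UmlToCadsrRules {
    /** The UML model content that each mapping rule starts from. */
    enum UmlSource {
        CLASS,
        ATTRIBUTE,
        CLASS_WITH_ATTRIBUTE,
        CLASS_WITH_ATTRIBUTE_AND_DATATYPE,
        PROJECT,
        PACKAGE,
        ASSOCIATION
    }

    private static final Map<UmlSource, String> RULES = new EnumMap<>(UmlSource.class);
    static {
        RULES.put(UmlSource.CLASS, "Object Class");
        RULES.put(UmlSource.ATTRIBUTE, "Property");
        RULES.put(UmlSource.CLASS_WITH_ATTRIBUTE, "Data Element Concept");
        RULES.put(UmlSource.CLASS_WITH_ATTRIBUTE_AND_DATATYPE, "Data Element");
        RULES.put(UmlSource.PROJECT, "Classification Scheme");
        RULES.put(UmlSource.PACKAGE, "Classification Scheme Item");
        RULES.put(UmlSource.ASSOCIATION, "Object Class Relationship");
    }

    /** Returns the caDSR administered component type the given UML content maps to. */
    static String cadsrComponentFor(UmlSource source) {
        return RULES.get(source);
    }

    public static void main(String[] args) {
        for (UmlSource source : UmlSource.values()) {
            System.out.println(source + " -> " + cadsrComponentFor(source));
        }
    }
}
```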

caGrid Reliance on caDSR

After a UML domain model is transformed, loaded and curated in caDSR, the model is ready to be used as the basis of an object-oriented grid client and service. All data movement in caGrid between client and service is done using instances of Classes registered in the caDSR. caGrid requires that all data types used in the grid are registered in caDSR and come from a given Project version. That is, even though Attributes and other items in caDSR can be versioned individually, in order to use those types on the grid, they need to be associated with a specific Project version.
Several components of caGrid make use of the wealth of information in the caDSR. As mentioned above, grid services use registered data models as their information model. By doing so, they are able to advertise both the syntax and semantics of the model by exposing an export of the relevant caDSR information as service metadata. The details of the model used to expose this information are shown in the section below. Once the information is exposed in this model, caGrid leverages it for grid service advertisement and discovery. These processes are described in detail in Chapter 4. Finally, the information models registered in caDSR are used as the conceptual foundation for the actual communication format used to exchange data on the grid. This process of serializing and deserializing data instances on the grid is detailed in Chapter 5.

caGrid Standard Metadata Model

All caGrid Services are expected to publish a set of standard metadata which draws heavily from the metadata registered in caDSR and EVS; it details the functionality of the service, and the institution providing it. The following sections describe these models.

Standard Metadata Model (gov.nih.nci.cagrid.metadata)

_Figure 3 : caGrid Standard Service Metadata_

metadata::ServiceMetadata

public Class:
metadata::ServiceMetadata Connections

Connector | Source | Target | Notes
Association (source > target) | ServiceMetadata +serviceMetadata (1, unordered) | Service +serviceDescription (0..1, unordered) |
Association (source > target) | ServiceMetadata +serviceMetadata (1, unordered) | ResearchCenter +hostingResearchCenter (0..1, unordered) |


Common Components (gov.nih.nci.cagrid.metadata.common)

_Figure 4 : Common Model_

common::Address

public Class:
common::Address Connections

Connector | Source | Target | Notes
Association (source > target) | ResearchCenter +researchCenter (1, unordered); Tagged Values: anonymousRole = true | Address +address (0..1, unordered) |


common::Address Attributes

Attribute Type Notes
country«XSDattribute» public : java.lang.String  
locality«XSDattribute» public Range:0 to 1: java.lang.String  
postalCode«XSDattribute» public Range:0 to 1: java.lang.String  
stateProvince«XSDattribute» public Range:0 to 1: java.lang.String  
street1«XSDattribute» public : java.lang.String  
street2«XSDattribute» public Range:0 to 1: java.lang.String  



common::Enumeration

public Class:
common::Enumeration Connections

Connector | Source | Target | Notes
Association (source > target) | Enumeration +enumeration (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | ValueDomain +valueDomain (1, unordered) | Enumeration +enumerationCollection (0..*, unordered) |


common::Enumeration Attributes

Attribute Type Notes
valueMeaning«XSDattribute» public : java.lang.String  
permissibleValue«XSDattribute» public : java.lang.String  



common::PointOfContact

public Class: For the static model, instances of these should be the POCs associated with the design and implementation of the service itself (not deployments of it, e.g. not system support staff). The "role" attribute should probably be an enumeration of known types.
common::PointOfContact Connections

Connector | Source | Target | Notes
Association (source > target) | Service +service (1, unordered) | PointOfContact +pointOfContactCollection (0..*, unordered) |
Association (source > target) | ResearchCenter +researchCenter (1, unordered) | PointOfContact +pointOfContactCollection (0..*, unordered) |


common::PointOfContact Attributes

Attribute Type Notes
affiliation«XSDattribute» public : java.lang.String  
email«XSDattribute» public : java.lang.String  
firstName«XSDattribute» public : java.lang.String  
lastName«XSDattribute» public : java.lang.String  
phoneNumber«XSDattribute» public Range:0 to 1: java.lang.String  
role«XSDattribute» public : java.lang.String  



common::ResearchCenter

public Class:
common::ResearchCenter Connections

Connector | Source | Target | Notes
Association (source > target) | ResearchCenter +researchCenter (1, unordered); Tagged Values: anonymousRole = true | Address +address (0..1, unordered) |
Association (source > target) | ServiceMetadata +serviceMetadata (1, unordered) | ResearchCenter +hostingResearchCenter (0..1, unordered) |
Association (source > target) | ResearchCenter +researchCenter (1, unordered); Tagged Values: anonymousRole = true | ResearchCenterDescription +researchCenterDescription (0..1, unordered) |
Association (source > target) | ResearchCenter +researchCenter (1, unordered) | PointOfContact +pointOfContactCollection (0..*, unordered) |


common::ResearchCenter Attributes

Attribute Type Notes
displayName«XSDattribute» public : java.lang.String  
shortName«XSDattribute» public : java.lang.String  



common::ResearchCenterDescription

public Class:
common::ResearchCenterDescription Connections

Connector | Source | Target | Notes
Association (source > target) | ResearchCenter +researchCenter (1, unordered); Tagged Values: anonymousRole = true | ResearchCenterDescription +researchCenterDescription (0..1, unordered) |


common::ResearchCenterDescription Attributes

Attribute Type Notes
description«XSDattribute» public : java.lang.String  
homepageURL«XSDattribute» public : java.lang.String  
imageURL«XSDattribute» public Range:0 to 1: java.lang.String  
rssNewsURL«XSDattribute» public Range:0 to 1: java.lang.String  



common::SemanticMetadata

public Class:
common::SemanticMetadata Connections

Connector | Source | Target | Notes
Association (source > target) | Service +service (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | Operation +operation (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | UMLAttribute +umlAttribute (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | Enumeration +enumeration (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | ValueDomain +valueDomain (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | UMLClass +umlClass (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |


common::SemanticMetadata Attributes

Attribute Type Notes
conceptCode«XSDattribute» public : java.lang.String  
conceptDefinition«XSDattribute» public : java.lang.String  
conceptName«XSDattribute» public : java.lang.String  
order«XSDattribute» public Range:0 to 1: int  
orderLevel«XSDattribute» public Range:0 to 1: int  



common::UMLAttribute

public Class: caDSR-related. Represents a UML attribute of the parent UML Class. Indication of isRequired=false means the operation will function without the existence of this attribute.
common::UMLAttribute Connections

Connector | Source | Target | Notes
Association (source > target) | UMLAttribute +umlAttribute (1, unordered); Tagged Values: anonymousRole = true | ValueDomain +valueDomain (0..1, unordered) |
Association (source > target) | UMLClass +umlClass (1, unordered) | UMLAttribute +umlAttributeCollection (0..*, unordered) |
Association (source > target) | UMLAttribute +umlAttribute (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |


common::UMLAttribute Attributes

Attribute Type Notes
dataTypeName«XSDattribute» public : java.lang.String  
description«XSDattribute» public : java.lang.String  
name«XSDattribute» public : java.lang.String  
publicID«XSDattribute» public : long  
version«XSDattribute» public : float  



common::UMLClass

public Class: caDSR-related. Represents the UML Class of the given input or output.
common::UMLClass Connections

Connector | Source | Target | Notes
Association (source > target) | UMLClass +umlClass (1, unordered) | UMLAttribute +umlAttributeCollection (0..*, unordered) |
Association (source > target) | Output +output (1, unordered); Tagged Values: anonymousRole = true | UMLClass +umlClass (0..1, unordered) | This must be present for compliance, but is allowed to be absent so a partially created model will still validate.
Association (source > target) | UMLClass +umlClass (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | InputParameter +inputParameter (1, unordered); Tagged Values: anonymousRole = true | UMLClass +umlClass (0..1, unordered) | This must be present for compliance, but is allowed to be absent so a partially created model will still validate.
Generalization (source > target) | UMLClass (Child) | UMLClass (Parent) |


common::UMLClass Attributes

Attribute Type Notes
className«XSDattribute» public : java.lang.String  
description«XSDattribute» public : java.lang.String  
id«XSDattribute» public : java.lang.String This is used solely for the purposes of referencing this class in associations. It does not represent any caDSR identifier.
packageName«XSDattribute» public : java.lang.String  
projectName«XSDattribute» public : java.lang.String  
projectVersion«XSDattribute» public : java.lang.String  



common::ValueDomain

public Class:
common::ValueDomain Connections

Connector | Source | Target | Notes
Association (source > target) | UMLAttribute +umlAttribute (1, unordered); Tagged Values: anonymousRole = true | ValueDomain +valueDomain (0..1, unordered) |
Association (source > target) | ValueDomain +valueDomain (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | ValueDomain +valueDomain (1, unordered) | Enumeration +enumerationCollection (0..*, unordered) |


common::ValueDomain Attributes

Attribute Type Notes
longName«XSDattribute» public : java.lang.String  
unitOfMeasure«XSDattribute» public Range:0 to 1: java.lang.String  


Standard Data Service Metadata Model (gov.nih.nci.cagrid.metadata.data)

_Figure 5 : Data Service Metadata Model_

data::DomainModel

public Class:
data::DomainModel Connections

Connector | Source | Target | Notes
Association (source > target) | DomainModel +domainModel (1, unordered) | UMLClass +exposedUMLClassCollection (0..*, unordered) | Tagged Values: modelGroup: all
Association (source > target) | DomainModel +domainModel (1, unordered) | UMLAssociation +exposedUMLAssociationCollection (0..*, unordered) |
Association (source > target) | DomainModel +domainModel (1, unordered) | UMLGeneralization +umlGeneralizationCollection (0..*, unordered) |


data::DomainModel Attributes

Attribute Type Notes
projectDescription«XSDattribute» public : java.lang.String  
projectLongName«XSDattribute» public : java.lang.String  
projectShortName«XSDattribute» public : java.lang.String  
projectVersion«XSDattribute» public : java.lang.String  



data::UMLAssociation

public Class:
data::UMLAssociation Connections

Connector | Source | Target | Notes
Association (source > target) | UMLAssociation +umlAssociation (1, unordered) | UMLAssociationEdge +sourceUMLAssociationEdge (1, unordered) |
Association (source > target) | UMLAssociation +umlAssociation (1, unordered) | UMLAssociationEdge +targetUMLAssociationEdge (1, unordered) |
Association (source > target) | DomainModel +domainModel (1, unordered) | UMLAssociation +exposedUMLAssociationCollection (0..*, unordered) |


data::UMLAssociation Attributes

Attribute Type Notes
bidirectional«XSDattribute» public : boolean  



data::UMLAssociationEdge

public Class:
data::UMLAssociationEdge Connections

Connector | Source | Target | Notes
Association (source > target) | UMLAssociation +umlAssociation (1, unordered) | UMLAssociationEdge +sourceUMLAssociationEdge (1, unordered) |
Association (source > target) | UMLAssociationEdge +umlAssociation (1, unordered); Tagged Values: anonymousRole = true | UMLClassReference +umlClassReference (1, unordered) |
Association (source > target) | UMLAssociation +umlAssociation (1, unordered) | UMLAssociationEdge +targetUMLAssociationEdge (1, unordered) |


data::UMLAssociationEdge Attributes

Attribute Type Notes
maxCardinality«XSDattribute» public : int  
minCardinality«XSDattribute» public : int  
roleName«XSDattribute» public : java.lang.String  



data::UMLClass

public Class (Extends: UMLClass):
data::UMLClass Connections

Connector | Source | Target | Notes
Association (source > target) | DomainModel +domainModel (1, unordered) | UMLClass +exposedUMLClassCollection (0..*, unordered) | Tagged Values: modelGroup: all
Generalization (source > target) | UMLClass (Child) | UMLClass (Parent) |


data::UMLClass Attributes

Attribute Type Notes
allowableAsTarget«XSDattribute» public : boolean Initial Value: true;



data::UMLClassReference

public Class: Represents a "pointer/reference" to a UMLClass exposed by this DomainModel. The refid attribute must share the value of an UMLClass.id on the exposedClassCollection of this model. This exists solely as an optimization for not duplicating the UMLClass (in XML) everywhere it is associated (which is a significant savings).
data::UMLClassReference Connections

Connector | Source | Target | Notes
Association (source > target) | UMLGeneralization +umlGeneralization (1, unordered); Tagged Values: anonymousType = true | UMLClassReference +subClassReference (1, unordered) |
Association (source > target) | UMLAssociationEdge +umlAssociation (1, unordered); Tagged Values: anonymousRole = true | UMLClassReference +umlClassReference (1, unordered) |
Association (source > target) | UMLGeneralization +umlGeneralization (1, unordered); Tagged Values: anonymousType = true | UMLClassReference +superClassReference (1, unordered) |


data::UMLClassReference Attributes

Attribute Type Notes
refid«XSDattribute» public : java.lang.String Must be the value of the UMLClass.id for the "referenced" UMLClass
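
The refid rule for UMLClassReference amounts to a lookup by UMLClass.id. The following sketch shows one way a consumer of the metadata might resolve references and detect dangling ones. RefIdResolver and its nested UmlClass type are hypothetical, pared-down stand-ins, not the generated caGrid metadata beans.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RefIdResolver {
    /** Pared-down stand-in for the UMLClass metadata type (only the fields used here). */
    static final class UmlClass {
        final String id;
        final String className;

        UmlClass(String id, String className) {
            this.id = id;
            this.className = className;
        }
    }

    /** Indexes the classes exposed by a domain model by their id attribute. */
    static Map<String, UmlClass> indexById(List<UmlClass> exposedClasses) {
        Map<String, UmlClass> byId = new HashMap<>();
        for (UmlClass c : exposedClasses) {
            if (byId.put(c.id, c) != null) {
                throw new IllegalArgumentException("duplicate UMLClass id: " + c.id);
            }
        }
        return byId;
    }

    /** Resolves a refid to the exposed class it points at; rejects dangling references. */
    static UmlClass resolve(Map<String, UmlClass> byId, String refid) {
        UmlClass target = byId.get(refid);
        if (target == null) {
            throw new IllegalArgumentException("refid matches no exposed UMLClass: " + refid);
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, UmlClass> byId = indexById(Arrays.asList(
            new UmlClass("c1", "Book"), new UmlClass("c2", "Author")));
        System.out.println(resolve(byId, "c2").className);
    }
}
```

Building the index once keeps each association or generalization lookup cheap, which fits the stated purpose of the reference type: avoiding repeated copies of the full UMLClass in the XML.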



data::UMLGeneralization

public Class:
data::UMLGeneralization Connections

Connector | Source | Target | Notes
Association (source > target) | UMLGeneralization +umlGeneralization (1, unordered); Tagged Values: anonymousType = true | UMLClassReference +subClassReference (1, unordered) |
Association (source > target) | UMLGeneralization +umlGeneralization (1, unordered); Tagged Values: anonymousType = true | UMLClassReference +superClassReference (1, unordered) |
Association (source > target) | DomainModel +domainModel (1, unordered) | UMLGeneralization +umlGeneralizationCollection (0..*, unordered) |



Service Description Components (gov.nih.nci.cagrid.metadata.service)


_Figure 6 : Service Model_

service::CaDSRRegistration

public Class:
service::CaDSRRegistration Connections

Connector | Source | Target | Notes
Association (source > target) | Service +service (1, unordered); Tagged Values: anonymousRole = true | CaDSRRegistration +caDSRRegistration (0..1, unordered) |


service::CaDSRRegistration Attributes

Attribute Type Notes
registrationStatus«XSDattribute» public : java.lang.String  
workflowStatus«XSDattribute» public : java.lang.String  



service::ContextProperty

public Class: This represents an exposed property of a service context's state. This is manifested as a resource property in the grid.
service::ContextProperty Connections

Connector | Source | Target | Notes
Association (source > target) | ServiceContext +serviceContext (1, unordered) | ContextProperty +contextPropertyCollection (0..*, unordered) |


service::ContextProperty Attributes

Attribute Type Notes
description«XSDattribute» public : java.lang.String  
name«XSDattribute» public : java.lang.String  



service::Fault

public Class: This represents an error that could occur during the execution of the operation. This is manifested as an operation fault in the grid.
service::Fault Connections

Connector | Source | Target | Notes
Association (source > target) | Operation +operation (1, unordered) | Fault +faultCollection (0..*, unordered) |


service::Fault Attributes

Attribute Type Notes
description«XSDattribute» public : java.lang.String  
name«XSDattribute» public : java.lang.String  



service::InputParameter

public Class: Represents an input parameter to an operation. This is manifested as a parameter of a service request in the grid.
service::InputParameter Connections

Connector | Source | Target | Notes
Association (source > target) | Operation +operation (1, unordered) | InputParameter +inputParameterCollection (0..*, unordered) |
Association (source > target) | InputParameter +inputParameter (1, unordered); Tagged Values: anonymousRole = true | UMLClass +umlClass (0..1, unordered) | This must be present for compliance, but is allowed to be absent so a partially created model will still validate.



service::InputParameter Attributes

Attribute Type Notes
dimensionality«XSDattribute» public : int Only valid if isArray is true; represents the dimensionality of the array
index«XSDattribute» public : int This is the 0-based index of the parameter in the operation's signature
isArray«XSDattribute» public : boolean  
isRequired«XSDattribute» public : boolean Whether the given parameter is allowed to be null or not
name«XSDattribute» public : java.lang.String  
qName«XSDattribute» public : QName  



service::Operation

public Class: This represents a method/operation/function in a service context. Its input parameters are described by its InputParameter associations, its output by its Output association, and any errors it produces by its Fault associations. This is manifested as an operation of a service in the grid.
service::Operation Connections

Connector | Source | Target | Notes
Association (source > target) | Operation +operation (1, unordered) | Fault +faultCollection (0..*, unordered) |
Association (source > target) | Operation +operation (1, unordered); Tagged Values: anonymousRole = true | Output +output (0..1, unordered) |
Association (source > target) | Operation +operation (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | Operation +operation (1, unordered) | InputParameter +inputParameterCollection (0..*, unordered) |
Association (source > target) | ServiceContext +serviceContext (1, unordered) | Operation +operationCollection (0..*, unordered) |


service::Operation Attributes

Attribute Type Notes
description«XSDattribute» public : java.lang.String  
name«XSDattribute» public : java.lang.String  



service::Output

public Class: Represents the result/output of an operation. Its non-existence represents the operation produces no result. This is manifested as the value of an operation response in the grid.
service::Output Connections

Connector | Source | Target | Notes
Association (source > target) | Operation +operation (1, unordered); Tagged Values: anonymousRole = true | Output +output (0..1, unordered) |
Association (source > target) | Output +output (1, unordered); Tagged Values: anonymousRole = true | UMLClass +umlClass (0..1, unordered) | This must be present for compliance, but is allowed to be absent so a partially created model will still validate.



service::Output Attributes

Attribute Type Notes
dimensionality«XSDattribute» public : int Only valid if isArray is true; indicates number of dimensions in the array
isArray«XSDattribute» public : boolean  
qName«XSDattribute» public : QName  
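
The note on the UMLClass association (required for compliance, optional for validation) suggests a separate compliance pass over an otherwise valid model. The sketch below illustrates such a pass under simplifying assumptions: ComplianceSketch, Typed, and Op are hypothetical stand-ins for the InputParameter, Output, and Operation metadata types, and a UMLClass is represented only by its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ComplianceSketch {
    /** Stand-in for an InputParameter or Output; umlClassName is null when no UMLClass is attached. */
    static final class Typed {
        final String name;
        final String umlClassName;

        Typed(String name, String umlClassName) {
            this.name = name;
            this.umlClassName = umlClassName;
        }
    }

    /** Stand-in for an Operation; output is null when the operation produces no result. */
    static final class Op {
        final String name;
        final List<Typed> inputs;
        final Typed output;

        Op(String name, List<Typed> inputs, Typed output) {
            this.name = name;
            this.inputs = inputs;
            this.output = output;
        }
    }

    /** Lists every input or output of the operation that lacks a UMLClass. */
    static List<String> missingUmlClasses(Op op) {
        List<String> problems = new ArrayList<>();
        for (Typed input : op.inputs) {
            if (input.umlClassName == null) {
                problems.add(op.name + ": input '" + input.name + "' has no UMLClass");
            }
        }
        // An absent output is fine: it means the operation returns nothing.
        if (op.output != null && op.output.umlClassName == null) {
            problems.add(op.name + ": output has no UMLClass");
        }
        return problems;
    }

    public static void main(String[] args) {
        Op draft = new Op("findBooks",
            Arrays.asList(new Typed("author", "Author"), new Typed("filter", null)),
            new Typed("result", null));
        for (String problem : missingUmlClasses(draft)) {
            System.out.println(problem);
        }
    }
}
```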



service::Service

public Class: A service is a "conceptual" definition of a collection of functional contexts. This has no physical manifestation in the grid.
service::Service Connections

Connector | Source | Target | Notes
Association (source > target) | Service +service (1, unordered); Tagged Values: anonymousRole = true | SemanticMetadata +semanticMetadataCollection (0..*, unordered) |
Association (source > target) | Service +service (1, unordered); Tagged Values: anonymousRole = true | CaDSRRegistration +caDSRRegistration (0..1, unordered) |
Association (source > target) | Service +service (1, unordered) | PointOfContact +pointOfContactCollection (0..*, unordered) |
Association (source > target) | Service +service (1, unordered) | ServiceContext +serviceContextCollection (1..*, unordered) |
Association (source > target) | ServiceMetadata +serviceMetadata (1, unordered) | Service +serviceDescription (0..1, unordered) |


service::Service Attributes

Attribute Type Notes
description«XSDattribute» public : java.lang.String  
name«XSDattribute» public : java.lang.String  
version«XSDattribute» public : java.lang.String  



service::ServiceContext

public Class: This is a functional collection of operations that work over a common collection of stateful resources. A service without stateful resources would have a single context. This is manifested as an actual service in the grid.
service::ServiceContext Connections

Connector | Source | Target | Notes
Association (source > target) | Service +service (1, unordered) | ServiceContext +serviceContextCollection (1..*, unordered) |
Association (source > target) | ServiceContext +serviceContext (1, unordered) | ContextProperty +contextPropertyCollection (0..*, unordered) |
Association (source > target) | ServiceContext +serviceContext (1, unordered) | Operation +operationCollection (0..*, unordered) |


service::ServiceContext Attributes

Attribute Type Notes
description«XSDattribute» public : java.lang.String  
name«XSDattribute» public : java.lang.String  


Advertisement and Discovery


Introduction

As caBIG aims to connect data and tools from 50+ disparate cancer centers, a critical requirement of its infrastructure is that it supports the ability of researchers to discover these resources. caGrid enables this ability by taking advantage of the rich structural and semantic descriptions of data models and services that are available. Each service is required to describe itself using caGrid standard service metadata. When a grid service is connected to the caBIG grid, it registers its availability and service metadata with a central indexing registry service (Index Service). This service can be thought of as the "yellow pages" and "white pages" of caBIG. A researcher can then discover services of interest by looking them up in this registry. caGrid provides a series of high-level APIs and user applications for performing this lookup which greatly facilitate the process.
As the Index Service contains the service metadata of all the currently advertised and available services in caBIG, the expressivity of service discovery scenarios is limited only by the expressivity of the service metadata. For this reason, caGrid provides standards for service metadata, as described in the previous chapter, to which all services must adhere.

_Figure 7 : Advertisement and Discovery Overview_
As shown in Figure 7, the caGrid discovery API and tools allow researchers to query the Index Service for services satisfying a query over the service metadata. That is, researchers can look up services in the registry using any of the information used to describe the services. For instance, all services from a given cancer center can be located, data services exposing a certain domain model or objects based on a given semantic concept can be discovered, as can analytical services that provide operations that take a given concept as input.
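A query over service metadata can be pictured as an XPath predicate evaluated against the registered XML. The sketch below uses the standard javax.xml.xpath API over a small in-memory document. The registry layout, element and attribute names, addresses, and concept codes are all invented for illustration and do not reflect the actual Index Service content or the caGrid discovery API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DiscoverySketch {
    // Hypothetical, simplified registry content; real entries carry full service metadata.
    static final String REGISTRY =
        "<registry>"
      + "<entry address='https://center-a.example.org/DataService'>"
      + "<researchCenter shortName='CenterA'/>"
      + "<operation name='query'><input conceptCode='EX-001'/></operation>"
      + "</entry>"
      + "<entry address='https://center-b.example.org/AnalysisService'>"
      + "<researchCenter shortName='CenterB'/>"
      + "<operation name='cluster'><input conceptCode='EX-002'/></operation>"
      + "</entry>"
      + "</registry>";

    /** Returns the addresses of all entries matching the XPath predicate, in document order. */
    static List<String> find(String registryXml, String predicate) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                "/registry/entry[" + predicate + "]/@address",
                new InputSource(new StringReader(registryXml)),
                XPathConstants.NODESET);
            List<String> addresses = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                addresses.add(hits.item(i).getNodeValue());
            }
            return addresses;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad predicate or registry XML: " + predicate, e);
        }
    }

    public static void main(String[] args) {
        // Services hosted by a given center.
        System.out.println(find(REGISTRY, "researchCenter/@shortName='CenterA'"));
        // Services with an operation that takes a given concept as input.
        System.out.println(find(REGISTRY, "operation/input/@conceptCode='EX-002'"));
    }
}
```

Because the predicate can reference any part of the metadata, richer metadata directly widens the set of discovery questions that can be asked.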

Requirements

The Globus Information Services are a major aspect of the infrastructure needed to satisfy the caBIG requirements for Advertisement and Discovery. These requirements specify what is needed to support the basic use cases of advertising the availability and function of a grid service by a service provider, and the discovery of said information by a user, respectively. The general use cases are explained below.
Advertisement:
The caGrid Grid Service Owner (Actor) composes service metadata describing the service to the grid and publishes it to the grid. The service metadata describes properties of the grid service that caGrid users and other grid services may query.
Discovery:
A caGrid Researcher (Actor) specifies search criteria describing a service. The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria and returns the list to the researcher.
These general themes specify that the caBIG infrastructure must support the following use cases:

  • Publish advertisement of a Grid Service
  • Remove advertisement of a Grid Service
  • Update advertisement of a Grid Service
  • Discover advertisement of a Grid Service
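
The four use cases above correspond to a small registry interface. The following in-memory sketch is only meant to make them concrete: AdvertisementRegistrySketch is a hypothetical class, metadata is reduced to a plain string, and none of this reflects how the Index Service is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class AdvertisementRegistrySketch {
    // Service address -> advertised metadata (a plain string here; real metadata is structured).
    private final Map<String, String> advertisements = new LinkedHashMap<>();

    /** Publish: records the advertisement for a service address. */
    void publish(String address, String metadata) {
        advertisements.put(address, metadata);
    }

    /** Remove: returns false if the address was not advertised. */
    boolean remove(String address) {
        return advertisements.remove(address) != null;
    }

    /** Update: replaces the metadata of an existing advertisement only. */
    boolean update(String address, String metadata) {
        if (!advertisements.containsKey(address)) {
            return false;
        }
        advertisements.put(address, metadata);
        return true;
    }

    /** Discover: returns the addresses whose metadata satisfies the criteria. */
    List<String> discover(Predicate<String> criteria) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : advertisements.entrySet()) {
            if (criteria.test(entry.getValue())) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        AdvertisementRegistrySketch registry = new AdvertisementRegistrySketch();
        registry.publish("https://a.example.org/Service", "center=CenterA");
        registry.publish("https://b.example.org/Service", "center=CenterB");
        System.out.println(registry.discover(m -> m.contains("CenterA")));
    }
}
```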

Globus Information Services

The Globus Information Services component, realized as the Monitoring and Discovery System (MDS), is a suite of web services for monitoring and discovering resources and services on Grids. This system allows users to discover what resources are considered part of a Virtual Organization (VO) and to monitor those resources. MDS services provide query and subscription interfaces to arbitrarily detailed resource data and a trigger interface that can be configured to take action when pre-configured trouble conditions are met. MDS is composed of the following three main components:

  • WS MDS Index Service - This service contains a registry of grid resources and collects information from them, making it accessible and queryable from one location. Generally, a virtual organization deploys one or more index services, which then collect data on all of the grid resources available within that VO.
  • WS MDS Trigger Service - This service collects data from grid resources and passes the data to appropriate programs to perform various actions in response to events. It is not currently used by the caGrid metadata infrastructure.
  • WS MDS Aggregator - This is the infrastructure on which the previous services are built. It collects, manages, and indexes data from an aggregator source and sends that data to an aggregator sink for processing.


WSRF Resource Properties

This section provides a brief recap of information about WSRF and the Globus implementation of it. More details can be found in the Globus documentation.
The Globus 4 toolkit provides tooling for creating WSRF grid services. The WS-Resource Framework (WSRF) is a set of six Web services specifications that define what is termed the WS-Resource approach to modeling and managing state in a Web services context. In this approach, a resource is an entity that encapsulates the state of a stateful web service. Generally, each resource is a separate object, but in certain cases it might be a singleton. A resource may just be a front end for state kept in an external entity, such as a file in a file system, a row in a database, or an entity bean in a J2EE container.
Figure 8 : Globus Resource Framework
A resource key is represented by a ResourceKey interface. It is a combination of a key name and the actual key value. A resource is represented by a Resource interface. It is a marker interface without any method defined. All resource objects must implement this interface. Resources are managed by an object that implements the ResourceHome interface. The ResourceHome interface provides methods for finding and removing resources as well as methods for identifying the SOAP header element and class for the resource key. In addition to the methods specified by the interface, ResourceHome implementations will generally provide an implementation-specific create() call or any other methods that operate on a set of resources.
Resources may have resource properties. Resource properties are declared in the WSDL of the service as elements of a resource property document. The ResourceProperties interface contains a single accessor method for retrieving the ResourcePropertySet from a resource. It must be implemented by all resources that want to expose resource properties. The ResourcePropertySet is the representation of the resource property document associated with the resource. It contains methods for managing the set of resource properties (e.g. adding and removing resource properties) and for discovering properties of the document itself (e.g. its name). The ResourceProperty interface must be implemented by all resource properties. It contains methods for managing the set of values associated with the resource property, discovering properties of the resource property element, and serializing the resource property to an array of SOAP or DOM elements. The ResourcePropertyMetaData interface contains metadata information about a ResourceProperty, such as the resource property name, cardinality, etc.

Figure 9 : Globus Resource Property Framework
Once metadata items are exposed as ResourceProperties, they can be queried using standard web service operations defined by the WS-ResourceProperties specification. Consult the specification for more details, but a synopsis of the operations is provided here:

  • GetResourceProperty: allows access to the value of any resource property given its QName.
  • GetMultipleResourceProperties: allows access to the value of several resource properties at once, given each of their QNames.
  • QueryResourceProperties: allows complex queries on the resource properties document. Currently, the query language used is XPath.
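
To give a feel for the kind of XPath query that QueryResourceProperties evaluates, the following minimal sketch runs XPath expressions against a small resource property document using only the JDK's javax.xml.xpath API. It does not contact a grid service. The document structure, element names, and attribute names are assumptions made up for this illustration and do not reproduce the caGrid metadata schemas.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ResourcePropertyXPathSketch {

    // Hypothetical resource property document; NOT the real caGrid schema.
    static final String SAMPLE_DOCUMENT =
        "<ResourceProperties>"
      + "<ServiceMetadata>"
      + "<hostingResearchCenter displayName=\"Example Cancer Center\"/>"
      + "</ServiceMetadata>"
      + "<DomainModel projectShortName=\"ExampleProject\"/>"
      + "</ResourceProperties>";

    // Evaluates an XPath expression and returns the text of every matching node.
    static List<String> queryValues(String xml, String xpathExpression) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                xpathExpression, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        // Select the display name of every hosting research center.
        for (String name : queryValues(SAMPLE_DOCUMENT,
                "//hostingResearchCenter/@displayName")) {
            System.out.println(name);
        }
    }
}
```

In a real deployment the expression would be sent to the service through the QueryResourceProperties operation rather than evaluated locally.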


caGrid Advertisement

caGrid provides the service infrastructure necessary to leverage MDS to enable service advertisement. The advertisement is made possible by realizing the conceptual caGrid standard metadata, described in the previous chapter, as ResourceProperties of caGrid services. Each caGrid service is expected to create a singleton Resource to manage its service metadata. Each service metadata item the service wishes to expose is represented as a separate ResourceProperty of this singleton Resource. For example, a Data Service may expose two metadata items: ServiceMetadata and DomainModel. Each of these items will be represented as ResourceProperties, contained in the ResourcePropertySet of the singleton Resource of the service.
The caGrid provided implementation of ResourceHome, BaseResourceHome, manages the initial creation and management of the singleton Resource. The caGrid provided implementation of Resource, BaseResource, contains all of the logic necessary to manage the collection of service metadata items, populate them from a file, and advertise them to an Index Service. The ResourceConfiguration class maintains the information needed to configure the advertisement process. Each of these classes and the corresponding configuration files are managed by Introduce. When a service developer adds or removes service metadata in Introduce, these files are edited, using code generation capabilities, to reflect the metadata configuration of the service. Developers not using Introduce to create their services can either reuse these classes as a starting point, use this design and re-implement it, or use a completely separate process. The only requirement is that the end result is caGrid standard metadata exposed as ResourceProperties of a singleton Resource of their service.
The implementation requires that each metadata item be represented as a Java Bean capable of serializing itself to XML and deserializing itself from XML. caGrid provides XML Schemas describing the XML format of its standard metadata items. These XML schemas are used to generate appropriate Java Beans. The BaseResource, when initialized, will read the ResourceConfiguration. This configuration will specify which metadata items are to be instantiated from corresponding XML files, and where those files are located. It then will read each file, and deserialize the metadata instances. Those metadata items that are not populated from file are expected to be instantiated from the service's implementation code. Once the metadata items are instantiated, the BaseResource creates and populates its ResourcePropertySet with all of the appropriate items. These metadata items are then made available as service metadata exposed as ResourceProperties of the service's singleton Resource.
Figure 10 : caGrid Resource Implementation
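
The population step described above can be sketched roughly as follows. This is a simplified stand-in and not the caGrid BaseResource: the class name, the map-based configuration, and the pluggable deserializer function are all assumptions introduced for illustration. It only shows the idea that configured items are either loaded from a file at initialization or left for the service code to supply later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.namespace.QName;

// Hypothetical, simplified model of a singleton metadata resource.
final class SingletonMetadataResourceSketch {

    // Name of each exposed metadata item mapped to its current value.
    private final Map<QName, Object> propertySet = new LinkedHashMap<>();

    // fileByItem maps each item name to a file location, or to null when the
    // service implementation is expected to supply the value itself.
    SingletonMetadataResourceSketch(Map<QName, String> fileByItem,
                                    Function<String, Object> deserializer) {
        for (Map.Entry<QName, String> item : fileByItem.entrySet()) {
            String file = item.getValue();
            Object value = (file == null) ? null : deserializer.apply(file);
            propertySet.put(item.getKey(), value);
        }
    }

    // Called by service code for items that are not populated from a file.
    void setValue(QName name, Object value) {
        if (!propertySet.containsKey(name)) {
            throw new IllegalArgumentException("Not a configured metadata item: " + name);
        }
        propertySet.put(name, value);
    }

    Object getValue(QName name) {
        return propertySet.get(name);
    }
}
```

Passing the deserializer in as a function keeps the sketch independent of any particular XML binding; in the design described above that role is played by the generated Java Beans.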
The final process the BaseResource performs on initialization is the registration, or advertisement, of the appropriate metadata items to the Index Service. Again, this process is configured through the ResourceConfiguration. For each metadata item, the configuration specifies whether or not the Index Service should aggregate its value. This enables services to expose some service metadata as ResourceProperties, but not register it with the central Index. Additionally, the configuration specifies the location of a registration configuration file. This file is used to configure the MDS registration process. An example of this file is shown in Figure 11. The file specifies which Index Service to register with, how the Index Service should obtain the values of the metadata being registered (and the configuration options of that method), and which metadata items are being registered. In the given example, the service is registering the caGrid common ServiceMetadata to the Index Service at http://cagrid01.bmi.ohio-state.edu:8080/wsrf/services/DefaultIndexService, and is specifying that the values should be polled from the service every 5 minutes.
Figure 11 : Example Registration Configuration
caGrid leverages the MDS ServiceGroupRegistrationClient to perform the Index Service registration. It handles the "soft state registration" process, wherein the service periodically renews its registration with the Index Service for the duration of the service's lifetime. The registration with the Index Service is only valid for a short lifetime (several minutes), and if the service fails to renew its registration, the Index Service will purge its corresponding entry. This dynamic process guarantees that the Index Service will only contain relevant entries, as expired entries are discarded. It also ensures that the Index Service contains a recent value of the metadata items, as it periodically gets the latest value using the process specified by the service at registration time. This process ensures that the integrity of the caGrid "yellow and white pages" will survive periodic Index Service failure, registered service failure, and general network failures. It should be noted, however, that there are various delays in this process, so the Index Service will always contain slightly stale information. If the most up-to-date information is needed, it should be extracted from the service in question directly.
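
The soft state idea can be illustrated with a small in-memory sketch. This is not the MDS implementation or the ServiceGroupRegistrationClient; the class and method names are hypothetical, and time is passed in explicitly so the behavior is easy to follow. Each registration carries an expiry time, renewing simply re-registers, and entries that are not renewed are purged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of soft state registration.
final class SoftStateRegistrySketch {

    private final long lifetimeMillis;
    // Service address mapped to the time at which its registration expires.
    private final Map<String, Long> expirations = new LinkedHashMap<>();

    SoftStateRegistrySketch(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    // Registers a service, or renews it if it is already registered.
    void register(String serviceUrl, long nowMillis) {
        expirations.put(serviceUrl, nowMillis + lifetimeMillis);
    }

    // Purges expired registrations and returns the services still registered.
    List<String> liveEntries(long nowMillis) {
        expirations.values().removeIf(expiry -> expiry <= nowMillis);
        return new ArrayList<>(expirations.keySet());
    }

    public static void main(String[] args) {
        SoftStateRegistrySketch registry = new SoftStateRegistrySketch(1000);
        registry.register("http://example.org/services/A", 0);
        registry.register("http://example.org/services/B", 0);
        registry.register("http://example.org/services/A", 600); // A renews, B does not
        System.out.println(registry.liveEntries(1200));
    }
}
```

In this model a service that stops renewing disappears from the registry without any explicit removal call, which is the property that lets the index recover from service and network failures.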

Discovery API

The Index Service aggregates all of the ResourceProperties registered to it, and makes them available as a single ResourceProperty for aggregated query. This aggregated information can then be accessed using the operations specified in the WS-ResourceProperties specification described above. While these operations are technically sufficient to satisfy the discovery use cases of caBIG, the mechanisms for doing so are fairly low level. In order to better facilitate discovery for caGrid services, a higher-level API was created. This Discovery API is built on top of the standard WS-ResourceProperties operations and provides higher-level access to registered services and their metadata. This client-side API enables the user to implement discovery as part of a caGrid application without having to know the schema of the service metadata advertised by a particular service type. Additionally, it abstracts away the details of ResourceProperties themselves, and allows clients to access the metadata without requiring them to know how it is exposed by the grid.
The API, shown below in Figure 12, allows a client to specify an Index Service to query, and then provides a series of high-level discovery operations. Each discovery operation returns an array of EndPointReferenceType, which is essentially the address of each service matching the discovery criteria.
Figure 12 : Discovery API
The Discovery API also provides some utilities, which it uses internally, to facilitate interaction with the Index Service and Resource Properties. These utilities, shown in Figure 13, provide such capabilities as accessing or querying a given service's Resource Properties, obtaining the Java Bean representation of caGrid standard metadata from a given service, and translating user-friendly XPath queries (which support namespace prefixes) into a format the Index Service understands (no namespace prefixes used).
Figure 13 : Discovery API Utilities
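
One plausible way to remove namespace prefixes from an XPath query is to rewrite each prefixed name test as a wildcard step constrained by namespace-uri() and local-name(). The sketch below shows that technique using a regular expression. It is an assumption for illustration and is not taken from the caGrid utilities, whose output format may differ. The class name is hypothetical, and the approach is deliberately naive: it does not parse XPath, so a string literal that happens to look like a prefixed name would also be rewritten.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not the caGrid XPath utility.
final class XPathPrefixTranslatorSketch {

    // Matches prefix:localName where the prefix is not the tail of a longer name.
    private static final Pattern PREFIXED_NAME =
        Pattern.compile("(?<![\\w.-])([A-Za-z_][\\w.-]*):([A-Za-z_][\\w.-]*)");

    // Rewrites every prefixed name test into a prefix-free equivalent.
    static String translate(String prefixedXPath, Map<String, String> prefixToNamespace) {
        Matcher matcher = PREFIXED_NAME.matcher(prefixedXPath);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String namespace = prefixToNamespace.get(matcher.group(1));
            if (namespace == null) {
                throw new IllegalArgumentException("Unbound prefix: " + matcher.group(1));
            }
            String step = "*[namespace-uri()='" + namespace
                + "' and local-name()='" + matcher.group(2) + "']";
            matcher.appendReplacement(result, Matcher.quoteReplacement(step));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

For example, with the prefix a bound to a namespace, the query /a:ServiceMetadata becomes a wildcard step that tests the namespace URI and local name explicitly, so the query no longer depends on any prefix declarations.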
An example usage of the Discovery API is shown in Figure 14. This code snippet creates a new DiscoveryClient instance on line 1, specifying the Index Service that should be queried for all discovery operations. It then queries the Index Service for all registered services on line 4, and loops over them. Line 8 then accesses the caGrid standard metadata for each registered service, and lines 10 and 11 print out the display name of its hosting research center. This code example is for demonstration purposes only, and more error checking (handling nulls and exceptions) should be used in production code.
Figure 14 : Discovery API Example Code

Serialization and Deserialization

Overview

XML schemas play a role in several aspects of the caBIG runtime environment. Figure 1 illustrates the various services and artifacts related to the description of and process for the transfer of data objects between client and service. Both the client and service APIs are object oriented, and operate over well-defined and curated data types. The objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR). The definitions draw from vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described.
Clients and services in caBIG communicate through the grid using Globus grid clients and service infrastructure, respectively. The grid communication protocol is XML, and thus the client and service APIs must transform the transferred objects to and from XML. This XML serialization of caBIG objects is restricted in that each object that travels on the grid must do so as XML which adheres to an XML schema registered in the Global Model Exchange (GME). As the caDSR and EVS define the properties, relationships, and semantics of caBIG data types, the GME defines the syntax of their XML serialization. Furthermore, Globus services are defined by the Web Service Description Language (WSDL). The WSDL describes the various operations the service provides to the grid. The inputs and outputs of the operations, among other things, are defined in WSDL by XML schemas. As caBIG requires that the inputs and outputs of service operations use only registered objects, these input and output data types are defined by the XSDs which are registered in GME. In this way, the XSDs are used both to describe the contract of the service and to validate the XML serialization of the objects which it uses.

Object Serialization

As mentioned in a preceding section, objects must serialize to and from XML as they traverse the grid. This section details the alternative approaches for said process.

Standard Globus Serialization

caGrid is built using the Globus 4 toolkit, which has a complete set of tools for automatically generating serializable objects from models defined in XSDs. Using this mechanism, Globus can create a set of Java Beans that represent the model; the toolkit serializes and deserializes these beans automatically at client and service runtime, with no extra configuration.
In order to use these types in a grid service, the developer must describe the types that they will be using in the WSDL file. This will enable Globus to locate the types, and generate the required beans during stub generation time. For more information on this process, please see the Globus documentation.
Using this approach, the object-oriented client and service APIs are written using the Globus generated Java Objects. The toolkit will automatically serialize and deserialize the Objects as they travel to and from the grid.

Custom Serialization

If a developer already uses a Java object model that is either not serializable or uses a custom serializer, the developer will need to configure the Globus service and client to use the custom serializer. This can be done using the Globus type mapping configuration XML in the service's WSDD and in the client's configuration WSDD. This type mapping paradigm is documented in the Globus documentation, but the basics are covered here.
If a user has a class called MyType which has its own serializer, then the following configuration (Figure 15) must be placed in the service WSDD and in the client configuration WSDD:
<service ...>
...
<typeMapping
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
serializer="gov.nih.nci.cabig.MySerializerFactory"
deserializer="gov.nih.nci.cabig.MyDeserializerFactory"
type="java:gov.nih.nci.cabig.bean.MyType"
qname="ns1:myType"
xmlns:ns1="http://cabig.nci.nih.gov/1/myType"
/>
...
</service>
Figure 15 : Example Custom Serialization configuration

This configuration allows the service to use the MySerializerFactory and MyDeserializerFactory to marshal its objects back and forth across the grid. The typeMapping element must be present in both the server and the client WSDD in order for the custom serializer and deserializer to be invoked.

caCORE SDK Serialization

caGrid provides an implementation of the previously mentioned custom serialization process for objects generated with the caCORE SDK version 3.1. This feature takes advantage of the fact that the caCORE SDK generates an "XML mapping" file which specifies the mapping from every Class and attribute to a corresponding XML entity. This mapping file is of the format specified by Castor (http://castor.codehaus.org/xml-mapping.html), and can be used by the Castor Marshalling framework to marshall between an XML document meeting the corresponding XML schema and the corresponding Java objects. Similar to the process used for the Globus generated Java Beans, this provides sufficient functionality to use these Java classes in the service and client code, and have them automatically serialized and deserialized to and from the grid. Castor provides the additional functionality, however, to separate the Java Beans themselves from the serialization process. In this way, Castor can be used to serialize and deserialize between arbitrary Beans and XML, given an appropriately defined XML mapping. caCORE SDK creates such a file for each model for which it creates Java Beans and XML Schemas. The caGrid SDK Serializer and Deserializer make use of this functionality by providing the necessary hooks to automatically invoke Castor, using the appropriate mapping, from the Globus infrastructure.

Figure 16 : SDK Serialization Framework
The components used to support this functionality are shown in Figure 16. The Factory classes' sole responsibility is to be hooked into the underlying Axis framework, and return an appropriate instance of the Serializer and Deserializer classes as needed. Both of these classes internally utilize the Castor APIs to marshall and unmarshall the Java Beans as needed. They leverage the EncodingUtils to access the appropriate Castor mapping. This utility first attempts to read a configuration parameter "castorMapping" from the current Axis context. This property can be specified in the client and service WSDD file, as shown in the highlighted section of Figure 17. The value of this parameter is expected to be a classpath reference to the Castor XML mapping file which should be used. This parameter allows multiple services or clients running in the same environment to use different mappings. If this parameter is not set, the mapping is expected to be loaded from the default location (/xml-mapping.xml). This location will work for SDK generated Java Beans, as the mapping is included in their jar files under said location. However, as mentioned, the preferred approach is to explicitly specify a unique location, because if two SDK generated systems were used in the same environment, only one of the mappings would be loadable (determined by the classpath settings).



Figure 17 : Example WSDD Type Mapping
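
The lookup order described above (an explicit "castorMapping" parameter first, then the default /xml-mapping.xml) can be sketched as follows. This is a minimal illustration and not the caGrid EncodingUtils: the class name is hypothetical, a plain Map stands in for the Axis context, and treating a blank value as unset is an assumption.

```java
import java.io.InputStream;
import java.util.Map;

// Hypothetical helper mirroring the lookup order described in the text.
final class CastorMappingLocationSketch {

    static final String PARAMETER_NAME = "castorMapping";
    static final String DEFAULT_LOCATION = "/xml-mapping.xml";

    // Returns the configured classpath location, or the default when none is set.
    // The map is a stand-in for the configuration parameters of the Axis context.
    static String resolveMappingLocation(Map<String, String> contextParameters) {
        String configured = contextParameters.get(PARAMETER_NAME);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_LOCATION;
        }
        return configured.trim();
    }

    // Opens the mapping from the classpath; returns null if it is not present.
    static InputStream openMapping(Map<String, String> contextParameters) {
        String location = resolveMappingLocation(contextParameters);
        return CastorMappingLocationSketch.class.getResourceAsStream(location);
    }
}
```

Because the default location is resolved through the classpath, two SDK generated jars that both contain /xml-mapping.xml would shadow one another, which is why giving each service its own explicit location is the safer choice.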
Introduce supports automatically specifying SDK Serialization, and the appropriate WSDD configurations are made automatically.

Schema Creation

As detailed above, XML Schemas play an important role in the runtime environment of caBIG. This section details some of the alternatives for creating these schemas. It should be clear from the preceding sections that the mechanism used to serialize the objects for a given service is highly dependent on the schemas it uses, as the XML serialization must conform to the schemas. In other words, if the runtime objects were not generated from an existing schema, either the serialization process must adhere to some generated schema, or a schema must be created to describe the serialization format; the two are highly dependent on one another.

XMI Based Generation

The Object Management Group's standard for XML Metadata Interchange (XMI) is a data representation developed to enable easy interchange of metadata between modeling tools (based on the OMG-UML) and metadata repositories (OMG-MOF based) in distributed heterogeneous environments.
As caBIG requires the modeling of data types in UML, and provides its developers with a UML modeling tool capable of generating XMI (Enterprise Architect), it is reasonable to create schema generation tools that process XMI. Furthermore, the UML Loader, which is responsible for registering metadata in the caDSR, takes a semantically annotated XMI file as input. The following sections detail approaches for generating XML schemas from XMI.

Enterprise Architect

In Enterprise Architect (EA), an XML schema corresponds to a UML package. Therefore, XSD generation is a package-level operation in EA. The basic generation is fairly simple to invoke, and is detailed below in Table 2. Using this process on the caBIO model, data types such as the Chromosome shown in Figure 18 are produced.

1. Select the package to be converted to XSD by right-clicking on the package in the Project Browser.
2. Select Project | Generate XML Schema from the main menu.
3. Set the desired output file using the Filename field.
4. Set the desired xml encoding using the Encoding field.
5. Click on the Generate button to generate the schema.
6. The progress of the schema generator will be shown in the Progress edit box.

Table 2: Steps to generate an XSD in EA
Figure 18 : EA generated Chromosome type from caBIO
As can be seen in the example, the naming scheme used by EA is possibly not what one would expect. The parent data type's name is prefixed on all of its children. This is not only overly verbose, but would also produce non-standardly named Objects if it were used. Another disadvantage is the fact that schemas are generated on a package-level basis, and thus if a project contains several packages, the process must be done manually for each. EA does provide several configuration points for customizing the XSD generation. More details can be found in the user documentation: http://www.sparxsystems.com.au/resources/xml_schema_generation.html. Of particular interest are the mechanisms to set the namespaces. All customizations can be set using the "tagged values" feature of EA. See the EA documentation for more details on how to create tagged values. The basic steps are to open the tagged values view, click on a package of interest, click the new tag button in the tagged values view, type the appropriate name (see below), then edit its value appropriately. Specifically, the "targetNamespace" should be set to the namespace under which the generated schema will be published. Additionally, the "targetNamespacePrefix" should be set to a meaningful abbreviation (e.g. cabio for the caBIO schema). An important final step in generating the XML Schemas is to correct any xsd:import statements EA generates. The schemaLocation attribute of each should hold an appropriate relative path to the imported schema.
The tagged values described by the EA documentation are an implementation of the "UML Profile for XML Schema." A UML profile has three key items: stereotypes, tagged values (properties), and constraints. A profile provides a definition of these items and explains how they extend the UML in a particular domain, which is XML schema in this case. Each of the configuration points in the profile gives the user control over the generation capabilities. Leveraging the UML profile, EA can generate highly customizable XML Schemas. The details of the profile, and the effect they have, can be found in the EA documentation linked above, and in the following articles:

  1. http://www.xml.com/lpt/a/2001/08/22/uml.html
  2. http://www.xml.com/lpt/a/2001/09/19/uml.html
  3. http://www.xml.com/lpt/a/2001/10/10/uml.html


The following table details some best practices for modifying the default settings of EA.

UML Construct | Tagged Value Name | Tagged Value Value | Example | Notes
Package | targetNamespace | {set according to caGrid Recommendations below} | gme://caBIO.caCORE/3.1/gov.nih.nci.cabio.domain | Used to specify the namespace of the XSD. This is important, as it uniquely identifies the XSD on the grid.
Package | targetNamespacePrefix | {something short and unique within the project} | cabio | Used to specify namespace prefixes in the XSD.
Package | memberName | unqualified | | This prevents EA from prefixing every element and attribute with the Class name. For example, taxon instead of Chromosome.taxon.
Package | elementFormDefault | qualified | | Necessary to ensure created elements have the proper namespace.
Package/Association Source | anonymousType | false | | Ensures that created elements are of the proper xsd:type; forces element references instead of anonymous elements of a particular xsd:type. (This is supposed to be the default in EA, but doesn't appear to be, at least in version 6.1.) You may want to set this to true on some associations. Experiment with this value for your Package and see what works best.
Package/Association Source | anonymousRole | false | | Ensures that a new element is created for each association, and given the name of the target rolename. This is the default value in EA. You may want to set this to true on some associations (such as when the max cardinality is 1, to prevent the creation of the "wrapper" element). Experiment with this value for your Package and see what works best.

Table 3 : EA UML Profile Best Practices
The caGrid standard metadata, detailed in Chapter 3, was modeled in EA and its corresponding XML Schemas were generated using the mechanisms described above. You can view the EA project file for the caGrid metadata to see some of these settings used in practice.

hyperModel

Another tool capable of generating XSDs from XMI, amongst other things, is hyperModel (www.xmlmodeling.com). It leverages the same UML Profile as EA, and can import an XMI file. It also suffers from the same package-level issues as EA. It claims to offer more configuration points than can be easily identified through its user interface. The documentation of hyperModel is fairly lacking. There is a book published on its use and philosophies, so it is likely more detail is provided therein.

caCORE SDK

The caCORE SDK team is working on generalizing their code generation engine so that it can generate XML Schemas, plain Java Beans, and Castor mapping files that configure serialization and deserialization to and from XML adhering to those schemas. Once this is available, these artifacts will be compatible with caGrid, using the serialization approach described above in the caCORE SDK Serialization section.

User Authored Approach

If an existing set of Java Objects that is planned for use on the grid can already serialize to and deserialize from XML, it is likely desirable to use those capabilities. In this case, unless an XML schema already exists to describe the serialized XML, a schema will probably need to be written by hand to describe the format of the XML. Most XML Schema editing tools can create a schema from an existing XML document. This mechanism could be leveraged to create a starting schema from an existing XML serialization, though care should be taken to review the schema for things like cardinality, enumerations, and data types, as those may vary from document to document.

XML Schema Namespace Conventions

As GME manages schemas by their respective namespaces, a consistent approach to selecting namespaces for data types is necessary. This section details the current recommendation for assigning namespaces to objects based on the information about them in caDSR. This section assumes a working knowledge of caDSR terminology, and the caDSR documentation should be consulted for reference.

Namespace Format

In caDSR, each project (application) will have its own Classification Scheme (e.g. caCORE). A Classification Scheme may define a subproject, which is represented as a Classification Scheme Item (CSI) (e.g. caBIO). For projects creating new XML Schemas, a reasonable approach to modeling is to assign each CSI its own schema. One could certainly create XML schemas for each object in a Classification Scheme, but that seems unnecessary. Unlike caGrid 0.5, which did not support this, caGrid can now handle it using the process for mapping caDSR items to GME items defined in the following section.
The GME has a specific restriction on the namespace format of the schemas it stores, shown in Figure 19.

Figure 19 : GME Namespace Format
The "gme://" part specifies that the protocol to use is GME. This is technically optional, as the GME service stores namespaces in a protocol-independent way. The <domain> identifies the traditional "namespace" of the data type and is used to locate the GME which is the authority for that namespace. The <schema> is the identifier used within that <domain> to identify the schema. In GME, once a schema is published, it is not allowed to change; it can only be versioned, in which case the <schema> part of the namespace can be changed. Generally, a best practice is to include some version identifier as part of the <schema>.
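
As a rough illustration, the sketch below splits a namespace into its <domain> and <schema> parts. It assumes the layout gme://<domain>/<schema>, with the domain ending at the first slash after the optional protocol; that layout is inferred from the description above and from the example namespace used elsewhere in this guide, and should be checked against Figure 19. The class name is hypothetical.

```java
// Hypothetical parser; the exact GME format should be confirmed against Figure 19.
final class GmeNamespaceSketch {

    static final String PROTOCOL = "gme://";

    final String domain;
    final String schema;

    private GmeNamespaceSketch(String domain, String schema) {
        this.domain = domain;
        this.schema = schema;
    }

    // Splits "[gme://]<domain>/<schema>" at the first slash after the protocol.
    static GmeNamespaceSketch parse(String namespace) {
        String rest = namespace.startsWith(PROTOCOL)
            ? namespace.substring(PROTOCOL.length())
            : namespace;
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException(
                "Expected <domain>/<schema> but got: " + namespace);
        }
        return new GmeNamespaceSketch(rest.substring(0, slash), rest.substring(slash + 1));
    }

    @Override
    public String toString() {
        return PROTOCOL + domain + "/" + schema;
    }

    public static void main(String[] args) {
        GmeNamespaceSketch ns =
            parse("gme://caBIO.caCORE/3.1/gov.nih.nci.cabio.domain");
        System.out.println("domain = " + ns.domain);
        System.out.println("schema = " + ns.schema);
    }
}
```

Under these assumptions, the version identifier and package name both fall inside the <schema> part, which matches the recommendation to carry a version identifier there.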
Given the information in caDSR, and the restrictions of GME, the current recommendation for assigning namespaces for caBIG objects is shown in Figure 20.

Figure 20 : Namespace Naming Conventions
Internally, each Object will have an element/complex type defined using its name. For example, if the caTIES model has an object Document, it would serialize to a document similar to that shown in Figure 21.

Figure 21 : Example caTIES Document serialization
Although caBIG will be using a single GME, GMEs can be federated. Using this namespacing scheme, we would ultimately be able to federate authoritative schema storage on a per Classification Scheme basis (which basically means that each project could have its own GME). As a side note, GMEs can be unboundedly federated for redundancy reasons (for example running several caching GMEs), but ultimately there must be only one authoritative GME for a given schema (where the caches pull from, and the user originally publishes to).
In caGrid 0.5, this namespace format was required in order to bind caDSR items to the XML representation. However, this approach has several drawbacks and limitations. Firstly, the caDSR does not actually enforce the naming uniqueness that is assumed by this model. Uniqueness in caDSR is done using public identifiers and versions, and the "names" of things are theoretically editable. Secondly, this approach does not allow the use of existing XML Schemas, with existing namespaces, to be used to represent caDSR registered models, which is a valid use case in caBIG. Finally, this model is not sufficient to account for fine grain XML Schema modeling differences. For example, it is not possible to assert whether a given attribute of a Class is represented as an XML element, or attribute (or even know what its name is, if it does map directly). To overcome these limitations, a model to represent fine grain mapping of caDSR items to their corresponding XML manifestations is being developed, and is described in the following section. It is not yet programmatically available in caGrid 1.0, so the naming convention used in caGrid 0.5 is still a requirement. However, work is under way to register this binding in caDSR, and make it programmatically available. When this becomes available, these namespace conventions will no longer be necessary for caGrid.

caDSR/GME Mapping

As described in Chapter 3, caGrid relies heavily on the caDSR to register the object models to be used in the grid. Similarly, this chapter detailed the importance of XML schemas being registered in GME. As the process of serialization and deserialization is the mapping, respectively, from object model to XML, and XML to object model, it is important that there is a formal binding between the caDSR and the GME. Review again the conceptual overview of this relationship, shown below in Figure 22. The binding in question is shown as a highlighted yellow arrow in the diagram. This mapping is important, as it is the bridge between the conceptual and semantic models of the caDSR/EVS, and the actual syntactic structure of the grid.

_Figure 22 : caDSR and GME relationship_

XML Schema Existence Rules

The first aspect of the binding between the caDSR and GME is the definition of which components of caDSR must have corresponding definitions in XML Schemas in GME. These rules are detailed below:

  1. Each caDSR data object (as part of a specific Project version) used in the grid must have its XML format modeled in an XML Schema that is registered in the GME. It is expected to be representable as an XML document (a self-standing element). That is, the object must be able to be passed around the grid (as the result of a query to a data service or the invocation of an analytical service, as well as acting as input to an operation).
  2. Within each data object, the format of its attributes and associations must be modeled in the corresponding XML Schema type.
  3. Every caDSR Package (as part of a specific Project version) must have a corresponding XML Schema that is registered in the GME, even if it just imports a series of other XML Schemas and doesn't define any of its own types.
  4. Every caDSR Project must have a corresponding XML Schema registered in the GME. It will most likely just import a series of other XML Schemas (corresponding to its Packages' schemas), though it may define its own types.
  5. These rules let the XSD modeler create any level of schema granularity from 1 per project, down to 1 per Class, but there is always a defined way to retrieve all of the types for a given Project, Package, and Class.
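The existence rules above lend themselves to a mechanical check. The following is a minimal sketch only: the `Project` and `Package` classes and the `registered` set are hypothetical stand-ins for caDSR and GME lookups (no real caDSR or GME API is used), and the namespaces are borrowed from the example project later in this chapter. Rule 2, which concerns the content of each type, is not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    namespace: str  # targetNamespace of the Package-level schema
    classes: dict = field(default_factory=dict)  # class name -> namespace defining its type


@dataclass
class Project:
    name: str
    namespace: str  # targetNamespace of the Project-level schema
    packages: list = field(default_factory=list)


def missing_schemas(project, registered):
    """Return (item, namespace) pairs whose schema is absent from `registered`.

    `registered` is a hypothetical stand-in for the namespaces known to the GME.
    """
    missing = []
    if project.namespace not in registered:  # rule 4: Project schema
        missing.append((project.name, project.namespace))
    for package in project.packages:
        if package.namespace not in registered:  # rule 3: Package schema
            missing.append((package.name, package.namespace))
        for class_name, namespace in package.classes.items():
            if namespace not in registered:  # rule 1: schema defining the Class
                missing.append((f"{package.name}.{class_name}", namespace))
    return missing


common = Package("com.example.common", "gme://common.example.com/version/2.5",
                 {"Address": "gme://common.example.com/version/2.5"})
bookstore = Package("com.example.bookstore", "gme://bookstore.example.com/version/1.0",
                    {"Book": "gme://bookstore.example.com/version/1.0"})
project = Project("example", "gme://example.com/version/1.0", [common, bookstore])

# Assume the bookstore schema has not been registered yet.
registered = {"gme://example.com/version/1.0", "gme://common.example.com/version/2.5"}
print(missing_schemas(project, registered))
```

In this sketch the unregistered bookstore namespace is reported twice, once for the Package and once for the Class whose type it would define.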


Mapping Rules

The rules above only establish which XML Schemas and schema entities must exist; they do not specify how a particular instance can be located. That is, given a particular Class in caDSR, for instance, there must be a way to locate the XML Schema that defines its corresponding XML structure, and even its type definition within that schema. The proposed model to describe this information, shown below in Figure 23, addresses this problem. It specifies a detailed mapping between every caDSR item relevant to grid serialization and its corresponding XML manifestation. It may also be necessary to map from XML Schema types to caDSR CDEs, for example, to inspect the WSDL of a service to determine whether it exposes some CDE. The model defined below is a one-to-one mapping, so it should be trivial to implement a reverse lookup. The current plan is to extend caDSR to represent this additional information about the models it registers, in which case runtime access to this information would be available through the caDSR APIs.

_Figure 23 : UML to XML Mapping_
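Because the mapping is one-to-one, a reverse lookup can be derived by inverting the forward mapping. The sketch below assumes the forward mapping is held in a plain dictionary (a hypothetical representation, not the caDSR API) and uses the class names and namespaces of the example project later in this chapter.

```python
# Forward mapping: fully qualified UML Class name -> (namespace, element name).
CLASS_TO_QNAME = {
    "com.example.common.Address": ("gme://common.example.com/version/2.5", "Address"),
    "com.example.bookstore.Author": ("gme://bookstore.example.com/version/1.0", "Author"),
    "com.example.bookstore.Book": ("gme://bookstore.example.com/version/1.0", "Book"),
    "com.example.bookstore.BookStore": ("gme://bookstore.example.com/version/1.0", "BookStore"),
}


def invert(mapping):
    """Invert a one-to-one mapping, failing loudly if it is not one-to-one."""
    inverse = {}
    for key, value in mapping.items():
        if value in inverse:
            raise ValueError(f"{value} is mapped from both {inverse[value]} and {key}")
        inverse[value] = key
    return inverse


QNAME_TO_CLASS = invert(CLASS_TO_QNAME)
print(QNAME_TO_CLASS[("gme://bookstore.example.com/version/1.0", "Book")])
```

The explicit duplicate check is a design choice of this sketch: it turns a violation of the one-to-one assumption into an error rather than a silently overwritten entry.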

The UML entities shown on the left of the diagram don't all have direct representation in caDSR, and are subject to the UML to caDSR mappings previously described. However, they represent the information with which service developers will be familiar, and that which needs a corresponding XML manifestation. These UML items are detailed below.

Association

public Class: Represents a UML association between Source Class and Target Class. The XML representation (within the Source Object's XML) of the target Object or Collection of Objects is pointed to by the targetXMLRepresentation. If the association is bi-directional, the corresponding representation (within the Target Object's XML) of the source Object or Collection of Objects is pointed to by the sourceXMLRepresentation.

Attribute

public Class: Represents a specific UML attribute. Its corresponding XML representation (within the context of the XML representation of a specific Class) is defined by the xmlRepresentation.

Class

public Class: Represents a specific UML Class. If a class is modified in any way during reuse, it should be represented by a different one of these. If it is reused completely in multiple projects, there should only be one of these. Its XML representation is defined by the associated xmlRepresentation.

Package

public Class: Represents a specific UML package. Each package is required to link to a unique XML schema, as defined by the associated namespace. The XML Schema will either define the types (of its contained Classes) locally, or simply import other XSD, or a combination of both.

Project

public Class: Represents a specific UML project. Each project is required to link to a unique XML schema, as defined by the associated namespace. The XML Schema will either define the types (of its contained Classes) locally, or simply import other XSD (such as those of its contained Packages), or a combination of both.
The proposed mappings of these UML items to their manifestations in XML, and therefore in the GME, are described below.

umlprofile::XMLElement

public Class: Represents an XML QName (namespace qualified Element). It should have a global element definition in the XML Schema for the associated Namespace.

umlprofile::XMLLocationReference

public Class: Defines a specific location within an XMLElement. The contained XPath expression should be interpreted as a pointer relative to the associated XMLElement. The expression should be namespace unaware, and in the event namespace differentiation is required, the XPath operator "namespace-uri()" should be used as a predicate to specify the desired namespace.
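When namespace differentiation is required, a step can be qualified with a predicate built from the standard XPath 1.0 functions `local-name()` and `namespace-uri()`. The helper below is a minimal sketch; the function name `qualified_step` is hypothetical and is not part of the model.

```python
def qualified_step(local_name, namespace):
    """Build an XPath 1.0 step matching `local_name` only within `namespace`."""
    # Quoting is kept deliberately simple: values containing a single quote are rejected.
    if "'" in local_name or "'" in namespace:
        raise ValueError("single quotes are not supported in this sketch")
    return f"*[local-name()='{local_name}' and namespace-uri()='{namespace}']"


print(qualified_step("Book", "gme://bookstore.example.com/version/1.0"))
```

The resulting step can be used in place of a bare element name wherever two namespaces define elements with the same local name.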

umlprofile::XMLNamespace

public Class: Represents an XML namespace. It should have a corresponding XML Schema in GME.
As described above, UML Projects and Packages are both mapped to XML Namespaces, which have corresponding XML Schemas registered in the GME. That is, the value of the XMLNamespace's namespace should be set to the targetNamespace of the XML Schema registered in the GME that represents them. Secondly, a UML Class maps to an XML Element. This element must be defined in an XML Schema, represented again, by the associated XMLNamespace.
Lastly, a given Class's attributes and associations are described relative to their location in the Class's XML element. In these cases, the item of interest is defined by an XPath expression, relative to the containing Class's XML Element. This is represented by the association from XMLLocationReference to XMLElement, with the role name containingElement. For attributes, the XPath will generally just directly reference an XML attribute of the Class's element, or a direct sub-element. For UML Associations, the mapping is slightly more complex. Every Association, be it unidirectional or bidirectional, connects at least a source to a target, so the Association target is always represented by the targetXMLRepresentation, whose containingElement represents the Association source. For bidirectional Associations, the sourceXMLRepresentation association is also populated, and its containingElement represents the Association target. More generally, whichever end of the Association (target or source) an XMLLocationReference represents, its containingElement represents the opposite end. These scenarios are further explained below by an example. In all cases, the use of XPath for the XMLLocationReference provides enough expressivity to represent any structural design choices of the XML Schema. For example, naming changes, element vs. attribute, and the existence of wrapper elements can all be modeled. However, it has the desirable property of being simple (human readable) when the mapping is straightforward.
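The relationships described above can be sketched as a small set of data classes. This is an illustrative rendering of the proposed model only: the Python class and field names are assumptions that mirror the role names in the text, and the Book to Author values are taken from the example later in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class XMLNamespace:
    namespace: str


@dataclass(frozen=True)
class XMLElement:
    name: str
    xml_namespace: XMLNamespace


@dataclass(frozen=True)
class XMLLocationReference:
    relative_xpath_expression: str
    containing_element: XMLElement  # the element the XPath is relative to


@dataclass(frozen=True)
class Association:
    source_class: str
    target_class: str
    target_xml_representation: XMLLocationReference
    source_xml_representation: Optional[XMLLocationReference] = None  # bidirectional only

    def is_bidirectional(self):
        return self.source_xml_representation is not None


ns = XMLNamespace("gme://bookstore.example.com/version/1.0")
book, author = XMLElement("Book", ns), XMLElement("Author", ns)

book_to_author = Association(
    source_class="Book",
    target_class="Author",
    # The target is located inside the source's element.
    target_xml_representation=XMLLocationReference("author/Author", book),
    # Bidirectional: the source is also located inside the target's element.
    source_xml_representation=XMLLocationReference("bookCollection/Book", author),
)

print(book_to_author.is_bidirectional())
print(book_to_author.target_xml_representation.containing_element.name)
```

Note how each location reference is contained by the element of the opposite association end, which is the general rule stated above.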

Example UML Project

This section details an example UML project (Figure 24 and Figure 25), one way to represent it in XML Schemas, and the corresponding caDSR to XML mappings as defined above.
Figure 24 : BookStore Example UML Model
Figure 25 : Package Structure of Example UML Model

Example XML Schemas

This section shows some hypothetical XML Schemas that could be used to represent the model above. These XSDs were generated from EA using the process specified elsewhere in this chapter. The XSD namespaces were created arbitrarily to demonstrate the flexibility of the mapping.
The first XSD, shown in Figure 26, represents the Project-level schema for the example Project, and simply includes the Package-level schemas.
Figure 26 : Example Project Schema
The second XSD, shown in Figure 27, represents the "common" Package of the example Project, and contains the XML definitions of the Classes defined in it.
Figure 27 : Common Package of Example Project Schema
The third XSD, shown in Figure 28, represents the "bookstore" Package of the example Project, and contains the XML definitions of the Classes defined in it.
Figure 28 : Bookstore Package of Example Project Schema

UML Profile Mappings

This section details the appropriate mappings, using the model above, for the example Project and XML Schemas.

Project Mappings

Project maps to:

  • namespace -> XMLNamespace
    • namespace= gme://example.com/version/1.0


Package Mappings

com.example.common Package maps to:

  • namespace -> XMLNamespace
    • namespace= gme://common.example.com/version/2.5


com.example.bookstore Package maps to:

  • namespace -> XMLNamespace
    • namespace= gme://bookstore.example.com/version/1.0


Class Mappings

com.example.common.Address Class maps to:

  • xmlRepresentation -> XMLElement
    • name= Address
    • xmlNamespace -> XMLNamespace
      • namespace= gme://common.example.com/version/2.5


com.example.bookstore.Author Class maps to:

  • xmlRepresentation -> XMLElement
    • name= Author
    • xmlNamespace -> XMLNamespace
      • namespace= gme://bookstore.example.com/version/1.0


com.example.bookstore.Book Class maps to:

  • xmlRepresentation -> XMLElement
    • name= Book
    • xmlNamespace -> XMLNamespace
      • namespace= gme://bookstore.example.com/version/1.0


com.example.bookstore.BookStore Class maps to:

  • xmlRepresentation -> XMLElement
    • name= BookStore
    • xmlNamespace -> XMLNamespace
      • namespace= gme://bookstore.example.com/version/1.0


Class Association Mappings

BookStore (bookstore 1...1) to Book (bookCollection 0...*) Association maps to:

  • targetXMLRepresentation -> XMLLocationReference
    • relativeXPathExpression=bookCollection/Book
    • containingElement -> XMLElement
      • name= BookStore
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0


BookStore (bookstore 1...1) to Address (address 1...1) Association maps to:

  • targetXMLRepresentation -> XMLLocationReference
    • relativeXPathExpression=address/Address
    • containingElement -> XMLElement
      • name= BookStore
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0



Book (bookCollection 1...*) to Author (author 1...1) Association maps to:

  • targetXMLRepresentation -> XMLLocationReference
    • relativeXPathExpression=author/Author
    • containingElement -> XMLElement
      • name= Book
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0
  • sourceXMLRepresentation -> XMLLocationReference
    • relativeXPathExpression=bookCollection/Book
    • containingElement -> XMLElement
      • name= Author
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0


Attribute Mappings

Author firstName Attribute maps to:

  • xmlRepresentation -> XMLLocationReference
    • relativeXPathExpression=@firstName
    • containingElement -> XMLElement
      • name= Author
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0


Author lastName Attribute maps to:

  • xmlRepresentation -> XMLLocationReference
    • relativeXPathExpression=@lastName
    • containingElement -> XMLElement
      • name= Author
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0


Book title Attribute maps to:

  • xmlRepresentation -> XMLLocationReference
    • relativeXPathExpression= title
    • containingElement -> XMLElement
      • name= Book
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0


BookStore name Attribute maps to:

  • xmlRepresentation -> XMLLocationReference
    • relativeXPathExpression= name
    • containingElement -> XMLElement
      • name= BookStore
      • xmlNamespace -> XMLNamespace
        • namespace= gme://bookstore.example.com/version/1.0


Address address Attribute maps to:

  • xmlRepresentation -> XMLLocationReference
    • relativeXPathExpression= address
    • containingElement -> XMLElement
      • name= Address
      • xmlNamespace -> XMLNamespace
        • namespace= gme://common.example.com/version/2.5
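To make the example mappings concrete, the sketch below resolves a few of the relativeXPathExpression values listed above against an instance document, ignoring namespaces as the model prescribes. Several things here are assumptions: the instance document is hypothetical (the actual structure is dictated by the schemas in Figures 26 to 28), and `resolve` is a deliberately tiny helper that supports only child steps and a trailing attribute step rather than full XPath.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def resolve(element, relative_path):
    """Resolve a simple relative path against `element`, ignoring namespaces.

    Supports only child steps ('a/b') and a trailing attribute step ('@x'),
    which is the subset of XPath used by the example mappings.
    """
    current = [element]
    for step in relative_path.strip().split("/"):
        if step.startswith("@"):
            return [e.attrib[step[1:]] for e in current if step[1:] in e.attrib]
        current = [child for e in current for child in e if local_name(child.tag) == step]
    return current


# Hypothetical instance document, invented for illustration only.
DOC = """
<b:BookStore xmlns:b="gme://bookstore.example.com/version/1.0">
  <b:name>Corner Books</b:name>
  <b:bookCollection>
    <b:Book>
      <b:title>Grid Basics</b:title>
      <b:author><b:Author firstName="Ada" lastName="Smith"/></b:author>
    </b:Book>
  </b:bookCollection>
</b:BookStore>
"""

store = ET.fromstring(DOC)
books = resolve(store, "bookCollection/Book")   # BookStore to Book association
authors = resolve(books[0], "author/Author")    # Book to Author association
print(resolve(store, "name")[0].text)           # BookStore name attribute
print(resolve(books[0], "title")[0].text)       # Book title attribute
print(resolve(authors[0], "@firstName"))        # Author firstName attribute
```

Each call starts from the containingElement named in the corresponding mapping, which is why the association results are reused as the context for the attribute lookups.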


Index Service


Overview

For the purposes of Advertisement and Discovery, caGrid leverages the Globus-provided Index Service. The Index Service implements the standard WS-ServiceGroup specification. When services are added to the service group, they specify what metadata should be accessed from them and how, and the Index Service performs this aggregation. Clients can then query this aggregated information using standard Resource Property operations.
caGrid services are expected to maintain soft-state registration to a well-known Index Service instance, specifying polling of the standard caGrid metadata. For more information, see the section on caGrid Advertisement, and the caGrid Service Specification document.
For more information on the Index Service, see the Globus documentation (http://www.globus.org/toolkit/docs/4.0/info/).
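Soft-state registration means that a registration lapses unless the service renews it in time. The following toy registry illustrates only that idea; it does not use or model the Globus Index Service API, and the class name, lifetime value, and service URLs are all hypothetical.

```python
class SoftStateRegistry:
    """Toy registry: entries expire unless the service re-registers in time."""

    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self._expiry = {}  # service URL -> time at which the entry lapses

    def register(self, service_url, now):
        """Create or refresh a registration."""
        self._expiry[service_url] = now + self.lifetime

    def sweep(self, now):
        """Drop every registration that has lapsed."""
        self._expiry = {url: t for url, t in self._expiry.items() if t > now}

    def services(self, now):
        """Return the sorted URLs of services whose registration is still live."""
        self.sweep(now)
        return sorted(self._expiry)


registry = SoftStateRegistry(lifetime_seconds=600)
registry.register("https://a.example.org/wsrf/services/A", now=0)
registry.register("https://b.example.org/wsrf/services/B", now=0)
registry.register("https://a.example.org/wsrf/services/A", now=500)  # A renews, B does not

print(registry.services(now=700))
```

Passing `now` explicitly, rather than reading the clock, keeps the sketch deterministic; by time 700 only the renewed service A remains.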

"trusted registration" design

While the Index Service and service registration mechanisms previously described sufficiently address the Advertisement and Discovery use cases of caGrid, there is no security or level of assurance built into the model. That is, the services themselves publish the information which clients may use to select amongst them. In this model, there is nothing preventing a rogue service from "false advertising," by misrepresenting its capabilities through the metadata it publishes. While this is not likely to be a major problem initially, it may become more of an issue if/when caBIG starts certifying service compliance and there is a desire to be able to discover only certified services. Alternatively, if significant tooling is built which programmatically leverages the discovery API, it is likely some level of assurance would be desired.
The best solution to this problem is probably to maintain the open registration, but add additional service selection information (such as a signed certificate, white list, etc.). That is, a separate certifying source could maintain a list of certified or trusted services, and a discriminating client could use the Index Service for discovery, and cross-check the results with the "official" list of services.
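The client-side cross-check could be as simple as filtering discovery results against the certified list. This is a minimal sketch: the function name and the URLs are hypothetical, and how the certified list would be obtained and verified (for example, by checking a signature) is outside its scope.

```python
def trusted_only(discovered, certified):
    """Keep discovered services that appear on the certified list, preserving order."""
    certified = set(certified)
    return [url for url in discovered if url in certified]


# Hypothetical discovery results and certified list.
discovered = [
    "https://a.example.org/wsrf/services/A",
    "https://rogue.example.net/wsrf/services/A",
    "https://b.example.org/wsrf/services/B",
]
certified = ["https://b.example.org/wsrf/services/B", "https://a.example.org/wsrf/services/A"]

print(trusted_only(discovered, certified))
```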
Another possible solution to this problem would be to maintain an additional Index Service instance, but disable open registration on it. The service could allow anyone to query it, but could perform authorization on registration. Services themselves would not be authorized to register directly with the Index Service; rather, a trusted third party (such as a caBIG certifying authority) would register the certified services in this Index Service. The trusted third party could monitor the open Index Service (using either standard resource operations directly, or using the Globus Trigger Service), and add and remove services from the secure Index Service appropriately. Alternatively, services themselves could be authorized to register directly with the Index Service, but that would require them to run securely, such that they would have credentials which could be authorized by the Index Service. In either model, the existing DiscoveryClient could be used by simply querying this secure Index Service.
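The monitoring step performed by the trusted third party amounts to reconciling the secure index with the certified services currently present in the open index. The sketch below shows only that reconciliation logic; the function name and the plain-list inputs are hypothetical, and no Index Service or Trigger Service call is modeled.

```python
def plan_sync(open_index, secure_index, certified):
    """Return (to_add, to_remove) so the secure index holds exactly the
    certified services that are currently registered in the open index."""
    eligible = set(open_index) & set(certified)
    current = set(secure_index)
    return sorted(eligible - current), sorted(current - eligible)


to_add, to_remove = plan_sync(
    open_index=["svc-a", "svc-b", "svc-rogue"],
    secure_index=["svc-a", "svc-gone"],
    certified=["svc-a", "svc-b", "svc-gone"],
)
print(to_add, to_remove)
```

Here `svc-b` is added because it is certified and registered, while `svc-gone` is removed because it is no longer present in the open index.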

References

caBIG Material

  1. caBIG: http://cabig.nci.nih.gov/
  2. caBIG Compatibility Guidelines: http://cabig.nci.nih.gov/guidelines_documentation

caCORE Material

  1. caCORE: http://ncicb.nci.nih.gov/core
  2. caBIO: http://ncicb.nci.nih.gov/core/caBIO
  3. caDSR: http://ncicb.nci.nih.gov/core/caDSR
  4. EVS: http://ncicb.nci.nih.gov/core/EVS
  5. CSM: http://ncicb.nci.nih.gov/core/CSM

Glossary

Following is an example list of terms and their definitions.

Term Definition
{jboss-home} The base directory where JBoss is installed on the server
API Application Programming Interface
caArray cancer Array Informatics
caBIG cancer Biomedical Informatics Grid
caBIO Cancer Bioinformatics Infrastructure Objects
caCORE cancer Common Ontologic Representation Environment
caDSR Cancer Data Standards Repository
caMOD Cancer Models Database
cardinality Cardinality describes the minimum and maximum number of associated objects within a set
CDE Common Data Element
CGAP Cancer Genome Anatomy Project
CMAP Cancer Molecular Analysis Project
CN Common Name
CS Classification Scheme
CSI Classification Scheme Item
CSM Common Security Module
CTEP Cancer Therapy Evaluation Program
CUI Concept Unique Identifier
CVS Concurrent Versions System
DAML DARPA Agent Markup Language
DAO Data Access Objects
DARPA Defense Advanced Research Projects Agency
DAS Distributed Annotation System
DL Description Logic
EA Enterprise Architect
EBI European Bioinformatics Institute
EVS Enterprise Vocabulary Services
GAI CGAP Genetic Annotation Initiative
GEDP Gene Expression Data Portal
HIPAA Health Insurance Portability and Accountability Act
HLGT High Level Group Term
HLT High Level Term
HTTP Hypertext Transfer Protocol
ISO International Organization for Standardization
JAAS Java Authentication and Authorization Service
JAR Java Archive
Javadoc Tool for generating API documentation in HTML format from doc comments in source code (http://java.sun.com/j2se/javadoc/)
JDBC Java Database Connectivity
JET Java Emitter Templates
JMI Java Metadata Interface
JSP JavaServer Pages
JUnit A simple framework to write repeatable tests (http://junit.sourceforge.net/)
LDAP Lightweight Directory Access Protocol
LLT Lowest Level Term
LOINC Logical Observation Identifier Names and Codes
MAGE MicroArray and Gene Expression
MAGE-OM MicroArray Gene Expression - Object Model
MedDRA Medical Dictionary for Regulatory Activities
metadata Definitional data that provides information about or documentation of other data.
MGED Microarray Gene Expression Data
MMHCC Mouse Models of Human Cancers Consortium
MO MGED Ontology
multiplicity Multiplicity of an association end indicates the number of objects of the class on that end may be associated with a single object of the class on the other end
NCI National Cancer Institute
NCICB National Cancer Institute Center for Bioinformatics
OIL Ontology Inference Layer
OilEd Ontology editor allowing you to build ontologies using DAML+OIL
OLLT Obsolete Lower Level Terms
OMG Object Management Group
ORM Object Relational Mapping
PT Preferred Term
RDBMS Relational Database Management System
SDK Software Development Kit
Semantic connector A development kit to link model elements to NCICB EVS concepts.
SOC System Organ Class
SPORE Specialized Programs of Research Excellence
SQL Structured Query Language
SSC Special Search Categories
UI User Interface
UID User Identification
UML Unified Modeling Language
UMLS Unified Medical Language System
UPT User Provisioning Tool
URL Uniform Resource Locator
VD Value Domain
WAR Web Application Archive
WSDL Web Services Description Language
XMI XML Metadata Interchange (http://www.omg.org/technology/documents/formal/xmi.htm) - The main purpose of XMI is to enable easy interchange of metadata between modeling tools (based on the OMG-UML) and metadata repositories (OMG-MOF) in distributed heterogeneous environments
XML Extensible Markup Language (http://www.w3.org/TR/REC-xml/) - XML is a subset of Standard Generalized Markup Language (SGML). Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML

