


GME 1.3 Design Guide




Introduction


Overview


Purpose

The purpose of this document is to describe the architecture of the Global Model Exchange (GME) grid service. The information in this document is intended to help developers interested in extending or modifying the GME.

Definitions

  • XML Schema X may have associations to other XML Schemas; in the context of XML Schema X those associated XML Schemas are defined as:
    • "dependency schemas": are those schemas which schema X imports/includes
    • "depending schemas": are those schemas which import/include schema X
  • A "locally sound" schema is one which:
    • is well formed and valid (with respect to the schema for XML Schemas, etc.)
    • all its dependency schemas are locally sound
  • A "locally unsound" schema is one which is not locally sound
  • A "globally sound" schema is one which:
    • is locally sound
    • all its depending schemas are locally sound
  • A "globally unsound" schema is one which is not globally sound
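
The definitions above can be illustrated with a small Java sketch over a toy import graph. Everything in it is an assumption made for illustration: the class name SoundnessSketch, the use of plain strings as schema names, and the treatment of a cycle as sound when every schema in it is present and valid. It is not GME code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not GME code): soundness checks over a schema import graph. */
final class SoundnessSketch {

    /** Known schemas: schema name -> names of the schemas it imports/includes. */
    private final Map<String, Set<String>> imports;
    /** Names of the schemas that are individually well formed and valid. */
    private final Set<String> valid;

    SoundnessSketch(Map<String, Set<String>> imports, Set<String> valid) {
        this.imports = imports;
        this.valid = valid;
    }

    /** Locally sound: the schema and everything reachable through its dependencies is present and valid. */
    boolean isLocallySound(String schema) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(schema);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!seen.add(current)) {
                continue; // already checked; this also makes the walk terminate on cycles
            }
            if (!imports.containsKey(current) || !valid.contains(current)) {
                return false; // a missing or invalid schema in the dependency closure
            }
            for (String dependency : imports.get(current)) {
                pending.push(dependency);
            }
        }
        return true;
    }

    /** Globally sound: locally sound, and every depending schema is locally sound as well. */
    boolean isGloballySound(String schema) {
        if (!isLocallySound(schema)) {
            return false;
        }
        for (Map.Entry<String, Set<String>> entry : imports.entrySet()) {
            boolean dependsOnSchema = entry.getValue().contains(schema);
            if (dependsOnSchema && !isLocallySound(entry.getKey())) {
                return false;
            }
        }
        return true;
    }
}
```

For example, if A imports C and B imports both C and a missing schema D, then C is locally sound but globally unsound, because its depending schema B is locally unsound.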

Relevant Requirements

  • The GME must...
    • enforce that all published schemas adhere to the XML Schema Specification
    • enforce that all XML Schemas have a "targetNamespace" (the "no namespace" namespace is not allowed)
    • enforce that all XML Schemas have a unique targetNamespace
    • enforce that all XML Schemas are globally sound
    • support downloading an XML Schema and all of its dependencies
    • support XML Schemas comprised of multiple documents (i.e. xsd:redefine and xsd:include)
    • support XML Schemas with cyclic import dependencies
    • support storing and loading XML Schemas to and from the local file system and the GME

Observations

  • The previous GME guaranteed that all schemas were globally sound, but because its operations were so limited, it only had to actively enforce local soundness.
    • That is, the only real schema-modifying operation was addSchema, which required all dependency schemas to already be added and locally sound, so it only needed to check that the schema being added was valid. Once a schema was added it could not be removed or modified, so a depending schema could never be "compromised"; every schema was therefore globally sound
    • Supporting schema deletion and modification can create a scenario where depending schemas become locally unsound (for example, by leaving a dependency missing or invalid)
      • Modifying a schema can break soundness by either making the schema itself locally unsound, or one or more of its depending schemas unsound
      • Deleting a schema can only break soundness by making one of its depending schemas unsound
      • Adding a (previously unpublished) schema can only create local unsoundness
  • To support schemas with cycles, we need to support "batch" (re)submission and deletion (as adding/deleting them individually would break soundness)

Implementation


Architecture

The GME is a stateless, single-resource grid service created with Introduce. As such, it follows the standard package breakdown of client, common, service, and resource. The GME also contains some additional packages which are specific to its implementation. The organization and purpose of the GME packages are described in the table below.

Package in org.cagrid.gme   Description
--------------------------  -----------
client                      The Introduce generated client package
common                      The Introduce generated common package, with additional utility APIs
domain                      The domain model classes used to exchange information between client and service
persistence                 The classes related to storing information in the persistence store
sax                         The classes used to implement custom SAX processing of XML Schemas
serialization               The classes used to aid in the serialization and deserialization of the domain model classes
service                     The Introduce generated service package, with additional classes for the spring-loaded implementation and server-side domain model
stubs                       The Introduce/Axis generated protocol

The service-side implementation of the GME is a standard Introduce-generated service infrastructure. However, rather than the standard service "*Impl" class being used to implement the business logic, the GME leverages Spring's dependency injection capabilities to load and configure the business logic from a deploy-time specified configuration file. The default implementation makes use of Hibernate to create an object-relational persistence layer for storing and retrieving the XML Schemas it manages.

Domain Model

Model Overview

All operations of the GME work with the Class Model shown below, which is a representational model for describing XML Schemas and their dependencies.

The primary means by which an XML Schema is identified by external entities is its targetNamespace (see the XML Schema Specification). This is a URI, and is represented as such in the GME model when used as an attribute within a class (such as in the XMLSchema class), but is represented using the XMLSchemaNamespace when used by itself (such as when it is the input or output of an operation).

XML Schemas can be comprised of multiple actual "documents." As such, the GME model represents XML Schemas using the XMLSchema class, which consists of one or more XMLSchemaDocuments. An XMLSchema has a targetNamespace, a "root" document (XMLSchemaDocument), and zero or more additional documents (XMLSchemaDocuments). Each XMLSchemaDocument has an identifier that is unique within its XMLSchema, represented by the systemID attribute (see the XML Specification), and the actual text of the document, represented by the schemaText attribute.

The XMLSchema, XMLSchemaDocument, and XMLSchemaNamespace are the only classes the GME client uses to describe XML Schemas to the GME service. The GME service uses two additional classes, XMLSchemaBundle and XMLSchemaImportInformation, to describe to the client additional details about how XML Schemas relate to each other. The XMLSchemaImportInformation class describes the set of XML Schemas (identified via a set of XMLSchemaNamespace instances representing their targetNamespaces) imported by a given XML Schema (identified via its targetNamespace). The XMLSchemaBundle class conceptually represents a graph of XML Schemas. It is a complex data structure which contains a set of XMLSchema and a set of XMLSchemaImportInformation, along with utility methods to interrogate those sets, such as getImportedSchemasForTargetNamespace.
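
To make the shape of this model concrete, the following is a minimal sketch of the bundle idea in plain Java. It is a simplified stand-in, not the real org.cagrid.gme.domain classes: the names BundleSketch, Schema, withDocument, schemaFor, and importedNamespacesFor are hypothetical, and documents are reduced to a map from systemID to schema text.

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified stand-in for the bundle concept (hypothetical names, not the GME domain classes). */
final class BundleSketch {

    /** One schema: a target namespace, a root document, and optional additional documents. */
    static final class Schema {
        final URI targetNamespace;
        final String rootSystemId;
        /** systemID -> schema text; the root document is stored under rootSystemId. */
        final Map<String, String> documents = new LinkedHashMap<>();

        Schema(URI targetNamespace, String rootSystemId, String rootText) {
            this.targetNamespace = targetNamespace;
            this.rootSystemId = rootSystemId;
            this.documents.put(rootSystemId, rootText);
        }

        /** Adds an additional document; systemIDs must be unique within one schema. */
        Schema withDocument(String systemId, String schemaText) {
            if (documents.containsKey(systemId)) {
                throw new IllegalArgumentException("duplicate systemID: " + systemId);
            }
            documents.put(systemId, schemaText);
            return this;
        }
    }

    private final Map<URI, Schema> schemas = new LinkedHashMap<>();
    private final Map<URI, Set<URI>> importInformation = new LinkedHashMap<>();

    /** Registers a schema together with the namespaces it imports. */
    void add(Schema schema, Set<URI> importedNamespaces) {
        schemas.put(schema.targetNamespace, schema);
        importInformation.put(schema.targetNamespace, new TreeSet<>(importedNamespaces));
    }

    /** Returns the schema for a namespace, or null if the bundle does not contain it. */
    Schema schemaFor(URI targetNamespace) {
        return schemas.get(targetNamespace);
    }

    /** Returns the namespaces imported by the given schema (empty if unknown). */
    Set<URI> importedNamespacesFor(URI targetNamespace) {
        Set<URI> imports = importInformation.get(targetNamespace);
        return imports == null ? Collections.<URI>emptySet() : Collections.unmodifiableSet(imports);
    }
}
```

Keeping the import information separate from the schema text, as the sketch does, lets a client walk the graph without parsing any documents.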

Castor Serialization

The GME is able to provide a class model with complex data structures (such as Java5 typed collections) and utility operations because it leverages Castor for serialization and deserialization into custom, hand-written Java Beans. The GME class model, schema, and serialization were all hand-written to allow fine-grained control over the structure and capabilities of the model. For example, many of the domain classes provide utility methods as opposed to just being "beans," and they all make use of complex data structures like typed Maps for ease of use. Also of note is the lack of coupling to Axis for things like URIs; Java's native implementation is used instead. This has been accomplished by using a custom Castor handler (org.cagrid.gme.serialization.URIFieldHandler), as seen in the Castor mapping below.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapping PUBLIC "-//EXOLAB/Castor Object Mapping DTD Version 1.0//EN" "http://www.castor.org/mapping.dtd">
<mapping xmlns:gme="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain">
	<class name="org.cagrid.gme.domain.XMLSchemaDocument">
		<map-to xml="XMLSchemaDocument"
			ns-uri="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain"
			ns-prefix="gme" />
		<field name="systemID" type="string">
			<bind-xml name="systemID" node="attribute" />
		</field>
		<field name="schemaText" type="string">
			<bind-xml name="gme:schemaText" node="element" />
		</field>
	</class>
	<class name="org.cagrid.gme.domain.XMLSchema">
		<map-to xml="XMLSchema"
			ns-uri="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain"
			ns-prefix="gme" />
		<field name="targetNamespace" type="java.net.URI"
			handler="org.cagrid.gme.serialization.URIFieldHandler">
			<bind-xml name="targetNamespace" node="attribute" />
		</field>
		<field name="rootDocument" type="org.cagrid.gme.domain.XMLSchemaDocument">
			<bind-xml name="gme:XMLSchemaDocument" location="rootDocument" />
		</field>
		<field get-method="getAdditionalSchemaDocuments" set-method="setAdditionalSchemaDocuments"
			name="additionalDocuments" type="org.cagrid.gme.domain.XMLSchemaDocument"
			collection="set">
			<bind-xml name="gme:XMLSchemaDocument" location="additionalDocuments" />
		</field>
	</class>
	<class name="org.cagrid.gme.domain.XMLSchemaBundle">
		<map-to xml="XMLSchemaBundle"
			ns-uri="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain"
			ns-prefix="gme" />
		<field get-method="getXMLSchemas" set-method="setXMLSchemas"
			name="xmlSchemaCollection" type="org.cagrid.gme.domain.XMLSchema"
			collection="set">
			<bind-xml name="gme:XMLSchema" location="xmlSchemaCollection" />
		</field>
		<field get-method="getImportInformation" set-method="setImportInformation"
			name="importInformation" collection="set"
			type="org.cagrid.gme.domain.XMLSchemaImportInformation">
			<bind-xml name="gme:XMLSchemaImportInformation" location="importInformationCollection" />
		</field>
	</class>
	<class name="org.cagrid.gme.domain.XMLSchemaImportInformation">
		<map-to xml="XMLSchemaImportInformation"
			ns-uri="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain"
			ns-prefix="gme" />
		<field name="targetNamespace" type="org.cagrid.gme.domain.XMLSchemaNamespace"
			get-method="getTargetNamespace" set-method="setTargetNamespace">
			<bind-xml name="gme:XMLSchemaNamespace" />
		</field>
		<field name="imports" collection="set"
			type="org.cagrid.gme.domain.XMLSchemaNamespace" get-method="getImports"
			set-method="setImports">
			<bind-xml name="gme:XMLSchemaNamespace" location="imports" />
		</field>
	</class>
	<class name="org.cagrid.gme.domain.XMLSchemaNamespace">
		<map-to xml="XMLSchemaNamespace"
			ns-uri="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain"
			ns-prefix="gme" />
		<field name="uri" type="java.net.URI" get-method="getURI"
			set-method="setURI" handler="org.cagrid.gme.serialization.URIFieldHandler">
			<bind-xml name="uri" node="attribute" />
		</field>
	</class>
</mapping>

The corresponding XML Schema for representing this model in XML is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain"
	xmlns="gme://gme.cagrid.org/2.0/GlobalModelExchange/domain" xmlns:xs="http://www.w3.org/2001/XMLSchema"
	elementFormDefault="qualified" attributeFormDefault="unqualified">
	<xs:element name="XMLSchema" type="XMLSchema" />
	<xs:complexType name="XMLSchema">
		<xs:sequence>
			<xs:element name="rootDocument">
				<xs:complexType>
					<xs:sequence>
						<xs:element minOccurs="1" maxOccurs="1" ref="XMLSchemaDocument" />
					</xs:sequence>
				</xs:complexType>
			</xs:element>
			<xs:element name="additionalDocuments" minOccurs="0">
				<xs:complexType>
					<xs:sequence>
						<xs:element minOccurs="1" maxOccurs="unbounded" ref="XMLSchemaDocument" />
					</xs:sequence>
				</xs:complexType>
			</xs:element>
		</xs:sequence>
		<xs:attribute name="targetNamespace" type="xs:anyURI" />
	</xs:complexType>
	<xs:element name="XMLSchemaBundle" type="XMLSchemaBundle" />
	<xs:complexType name="XMLSchemaBundle">
		<xs:sequence>
			<xs:element name="xmlSchemaCollection" minOccurs="0"
				maxOccurs="1">
				<xs:complexType>
					<xs:sequence>
						<xs:element ref="XMLSchema" minOccurs="1" maxOccurs="unbounded" />
					</xs:sequence>
				</xs:complexType>
			</xs:element>
			<xs:element name="importInformationCollection" minOccurs="0"
				maxOccurs="1">
				<xs:complexType>
					<xs:sequence>
						<xs:element ref="XMLSchemaImportInformation" minOccurs="1"
							maxOccurs="unbounded" />
					</xs:sequence>
				</xs:complexType>
			</xs:element>
		</xs:sequence>
	</xs:complexType>
	<xs:element name="XMLSchemaDocument" type="XMLSchemaDocument" />
	<xs:complexType name="XMLSchemaDocument">
		<xs:sequence>
			<xs:element name="schemaText" type="xs:string" />
		</xs:sequence>
		<xs:attribute name="systemID" type="xs:string" />
	</xs:complexType>
	<xs:element name="XMLSchemaImportInformation" type="XMLSchemaImportInformation" />
	<xs:complexType name="XMLSchemaImportInformation">
		<xs:sequence>
			<xs:element ref="XMLSchemaNamespace" />
			<xs:element name="imports" minOccurs="0" maxOccurs="1">
				<xs:complexType>
					<xs:sequence>
						<xs:element ref="XMLSchemaNamespace" minOccurs="1"
							maxOccurs="unbounded" />
					</xs:sequence>
				</xs:complexType>
			</xs:element>
		</xs:sequence>
	</xs:complexType>
	<xs:element name="XMLSchemaNamespace" type="XMLSchemaNamespace" />
	<xs:complexType name="XMLSchemaNamespace">
		<xs:attribute name="uri" type="xs:anyURI" />
	</xs:complexType>
</xs:schema>

In order to make use of custom hand-written beans, the GME extends the Introduce-generated build process to compile its domain classes before the Axis-generated "stubs" are compiled (traditionally the first step of an Introduce-generated service's compilation). This is necessary as the "stubs" reference the domain classes, and thus would not compile otherwise. This is accomplished by adding a custom Ant target (compileDomain), which compiles the domain classes, as a dependency of the Introduce-created preCompileStubs Ant target.

Hibernate Annotations

The GME makes use of Hibernate to seamlessly store its domain model classes in a relational database. As the beans are hand-written (as opposed to being generated from the XML Schema), they are able to make use of Hibernate Annotations to configure their representation in the database. This feature, which relies on Java5 annotations, greatly reduces the complexity of configuring Hibernate, increases the readability of the configuration, and reduces the chance of errors.

Note that while the use of annotations introduces Hibernate-specific imports into the domain model classes, these are only needed at compile time and do not introduce an undesired technology-specific dependency to the domain classes at runtime.

Similar to the custom serialization handling of URIs described in the Castor Serialization section, a custom Hibernate User Type, org.cagrid.gme.persistence.hibernate.types.URIUserType, is used to store URIs in the database as strings.

Service Components

The main service-side implementation classes of the GME are shown below. The service implementation class, GlobalModelExchangeImpl, reads the settings in the GlobalModelExchangeConfiguration class, and uses them to instantiate the GME class via Spring. The GME class implements the core business logic of the GME, and uses the XMLSchemaInformationDao and GMEXMLSchemaLoader to respectively interact with the persistence store (indirectly) and parse XML Schemas. These components are described in more detail in the following sections.

Spring

As described in the Spring overview, the Spring Framework consists of many modules. The GME currently makes use of the Core, ORM, and DAO modules.

The GME service implementation (GlobalModelExchangeImpl) uses Spring's XmlBeanFactory to load and configure the GME implementation, using a configuration file and a property file which are specified via Introduce service properties. The Administrators Guide configuration section provides extensive documentation of this process. As such, the GME business logic can easily be modified by plugging in custom code or configuration settings without needing to modify any of the GME service's code or build/deploy process.

Within the default implementation, the GME makes use of Spring's declarative transaction management approach to describe transactional boundaries via Java5 annotations on the GME implementation class (org.cagrid.gme.service.GME). The GME also relies heavily on Spring's Hibernate and DAO support, as all interaction with the persistence store is done through the XMLSchemaInformationDao, which extends Spring's HibernateDaoSupport to implement most of the insert, update, delete, and query functions automatically. This class is also responsible for controlling the "materialization" of the Hibernate lazy-loaded domain classes, as the Hibernate Session is closed before the classes are serialized and returned to the client.

Hibernate

As described above, Hibernate is used as the persistence store for the GME and configured via Hibernate Annotations on the domain model. The use of Hibernate, and Spring's support for it, allows the GME business logic to be implemented purely based on the domain model without concerns for the underlying database. Full details of how Hibernate is configured, and the various configuration options, are found in the configuration section of the GME Administrators Guide. The default configuration uses MySQL as the database, and is configured to automatically create the database structure (provided the database is already created).

Schema Parsing

The GME leverages the XML Schema processing capabilities of Apache Xerces to perform the complex logic of parsing and validating XML Schemas. The GME uses the extension points of Xerces to install a custom SAX Entity Resolver (GMEEntityResolver) and SAX Error Handler (GMEErrorHandler) into the processing pipeline. The GMEEntityResolver is passed the list of submission schemas to be parsed, and an instance of the XMLSchemaInformationDao to load schemas from the persistence store. It is called back by Xerces whenever an XML Schema reference, such as an import, include, or redefine, is encountered. It then loads the referenced schema from the submission package or the database, or generates an error if the schema cannot be found in either. The GMEErrorHandler is simply responsible for capturing any such errors so that they can later be recovered and reported by the GME itself.
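
The resolver and error handler callbacks described above can be sketched with the JDK's built-in JAXP validation API instead of the GME's own Xerces classes. This is an illustration under stated assumptions, not the GMEEntityResolver or GMEErrorHandler source: the class ResolverSketch and its check method are hypothetical, schemas are keyed by targetNamespace in two in-memory maps standing in for the submission package and the database, and only imports (not includes or redefines) are handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical sketch: resolve imports from a submission first, then from a store, and collect errors. */
final class ResolverSketch {

    /** Parses schemaText and returns the collected error messages (empty when the schema compiles). */
    static List<String> check(String schemaText, Map<String, String> submission, Map<String, String> store) {
        final List<String> errors = new ArrayList<>();
        try {
            final DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");

            // Called back by the parser for each reference; look in the submission, then in the store.
            LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) -> {
                if (namespaceURI == null) {
                    return null; // not a namespaced import; let the parser use its default behavior
                }
                String text = submission.get(namespaceURI);
                if (text == null) {
                    text = store.get(namespaceURI);
                }
                if (text == null) {
                    errors.add("unresolved namespace: " + namespaceURI);
                    return null;
                }
                LSInput input = ls.createLSInput();
                input.setStringData(text);
                input.setSystemId(systemId);
                input.setBaseURI(baseURI);
                return input;
            };

            // Capture errors instead of throwing, so they can be reported together afterwards.
            ErrorHandler handler = new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* warnings are ignored here */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
            };

            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(resolver);
            factory.setErrorHandler(handler);
            factory.newSchema(new StreamSource(new StringReader(schemaText)));
        } catch (SAXException e) {
            if (errors.isEmpty()) {
                errors.add(e.getMessage()); // not already captured by the handler
            }
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("no DOM Load/Save implementation available", e);
        }
        return errors;
    }
}
```

Calling check with a schema that imports a namespace present in either map should yield an empty list, while an import that is in neither map is recorded as an error rather than aborting the parse.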

Algorithms

Submission Algorithm

Need for Algorithm

On submission, the GME must verify global soundness. In theory this could be checked by loading all new and existing schemas into a schema model and checking for errors, but that is inefficient and may not even be tractable, so we need an algorithm which can verify global soundness while loading only the minimum set of schemas required to do so. We could instead calculate and process the complete "connected" subgraphs of the submission package, but that may still be intractable and the calculation expensive (processing a submission should not have to scale with the total number of schemas already submitted).

Algorithm Overview

View all schemas (those being submitted and those already existing) as a directed graph, where the schemas are the nodes and the edges describe the imports/includes/redefines (pointing from depending schema to dependency schema). The algorithm adds all new schemas to the schema model and recursively walks "upstream" (opposite to the direction of the edges), adding each depending schema to the model. We make use of the SAX entity resolver callbacks to recursively load all dependency schemas (and their dependency schemas) for each schema we added by walking upstream. The complete collection of schemas is then checked for errors (soundness).

Pseudo Code

  1. Check permissions on each schema being published; fail if the caller lacks permission for any of them
  2. Check that every schema being published is either not yet published or in a state where its contents can be updated; fail otherwise
  3. Create an empty list of "processed schemas"
  4. Create a SchemaGrammar model with an error handler and an entity resolver (on each import callback, the entity resolver first loads the schema from the submission if present; if it is not in the submission, it loads it from the DB; if it is not in the DB either, it reports an error)
  5. Call addSchema() for each schema being uploaded
  6. addSchema() is a reused, recursive procedure that performs the following steps:
    1. Add the schema to the "processed schemas" list
    2. Add the schema to the model, which will recursively fire callbacks to the entity resolver for all imports (loading all dependency schemas, and their dependency schemas, etc.)
    3. Look in the DB for depending schemas (these will only be present if the schema was already published and is being updated)
    4. For each depending schema not in the list of "processed schemas", call addSchema()
  7. Call getResults() on the parsed model; fail if there are errors
  8. Commit new/modified schemas to the database, populating dependency schema information (gathered from imports)

Observations

  • When each new schema is added/modified, we recursively add the complete depending schema tree(s) (each depending schema's depending schemas, etc)
  • Whenever a schema is added to the model (either because it's being submitted, or because it's in the depending tree), the complete tree of all its dependency schemas is automatically added to the model (by our entity resolver).
  • No depending schemas of dependency schemas are added to the model (unless they are already present for other reasons). For example, consider schemas A, B, and C, where both A and B import C (which is stand-alone). If we update A or B, we load C as a dependency schema but don't load C's other depending schema. If we update C, we load both A and B as depending schemas.
  • This works because the entity resolver is only responsible for loading all dependency schemas, and the submission algorithm is only responsible for handling all depending schemas. If this wasn't the case, the graph wouldn't be traversed properly. This fact, and the "processed schemas" list, allow for proper handling of dependency cycles.
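
The division of labor described above (the algorithm walks upstream to depending schemas, the entity resolver walks downstream to dependency schemas) can be sketched in Java over a toy import graph. The class SubmissionWalkSketch and its methods are hypothetical, schemas are plain strings, and no parsing, permission checks, or persistence are modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the submission walk; maps go from a schema to the schemas it imports. */
final class SubmissionWalkSketch {

    /** The "processed schemas": each submitted schema plus, recursively, its published depending schemas. */
    static Set<String> addedToModel(Map<String, Set<String>> submitted, Map<String, Set<String>> published) {
        Set<String> processed = new LinkedHashSet<>();
        for (String schema : submitted.keySet()) {
            addSchema(schema, published, processed);
        }
        return processed;
    }

    /** Mirrors addSchema() in the pseudo code; the processed set stops repeated work and cycles. */
    private static void addSchema(String schema, Map<String, Set<String>> published, Set<String> processed) {
        if (!processed.add(schema)) {
            return;
        }
        // Depending schemas are the published schemas whose imports mention this schema.
        for (Map.Entry<String, Set<String>> entry : published.entrySet()) {
            if (entry.getValue().contains(schema)) {
                addSchema(entry.getKey(), published, processed);
            }
        }
    }

    /** What the entity resolver would pull in as well: the transitive dependencies of every processed schema. */
    static Set<String> loadedIntoModel(Map<String, Set<String>> submitted, Map<String, Set<String>> published) {
        Set<String> loaded = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(addedToModel(submitted, published));
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!loaded.add(current)) {
                continue;
            }
            // Prefer the submitted version of a schema; fall back to the published one.
            Set<String> imports = submitted.containsKey(current) ? submitted.get(current) : published.get(current);
            if (imports == null) {
                throw new IllegalStateException("unresolved schema: " + current);
            }
            pending.addAll(imports);
        }
        return loaded;
    }
}
```

With A and B both importing C, resubmitting A processes only A and loads C as a dependency, while resubmitting C processes C, A, and B. A batch submission of two schemas that import each other is handled because each is visited only once.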

Deletion Algorithm

To ensure global soundness, a schema can only be deleted if everything that depends on it is also deleted. The import information in the persistent store can be relied upon to check this, so no schema parsing is done at deletion time.

Pseudo Code

  1. Check permissions on each schema being deleted; fail if the caller lacks permission for any of them
  2. For each schema to be deleted:
    1. Get the list of depending schemas
    2. If the list contains any schema which is not in the list of schemas to delete, fail
  3. Commit schema deletes to database

Observations

  • While the list of depending schemas is critical to the insert/update/delete functioning, it's not actually persisted, but is computed using the "depends on" information. This allows deletion to simply delete a schema and the information about its imports, and the dependency graph is automatically updated.
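
A minimal Java sketch of the deletion check follows, including the derivation of depending schemas from the stored "depends on" information. The names DeletionSketch, dependingSchemas, blockers, and delete are hypothetical, the persistence store is reduced to an in-memory map from each schema to its imports, and permission checks are omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the deletion rule; imports maps each schema to the schemas it imports. */
final class DeletionSketch {

    /** Inverts the stored "depends on" information: schema -> schemas that import it. */
    static Map<String, Set<String>> dependingSchemas(Map<String, Set<String>> imports) {
        Map<String, Set<String>> depending = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : imports.entrySet()) {
            for (String imported : entry.getValue()) {
                depending.computeIfAbsent(imported, key -> new TreeSet<String>()).add(entry.getKey());
            }
        }
        return depending;
    }

    /** Depending schemas that are not themselves being deleted; deletion may proceed only if this is empty. */
    static Set<String> blockers(Set<String> toDelete, Map<String, Set<String>> imports) {
        Map<String, Set<String>> depending = dependingSchemas(imports);
        Set<String> blockers = new TreeSet<>();
        for (String schema : toDelete) {
            for (String dependent : depending.getOrDefault(schema, Collections.<String>emptySet())) {
                if (!toDelete.contains(dependent)) {
                    blockers.add(dependent);
                }
            }
        }
        return blockers;
    }

    /** Removes the schemas and their import information if nothing blocks; returns whether it did. */
    static boolean delete(Set<String> toDelete, Map<String, Set<String>> imports) {
        if (!blockers(toDelete, imports).isEmpty()) {
            return false;
        }
        imports.keySet().removeAll(toDelete);
        return true;
    }
}
```

Because the depending relation is recomputed from the remaining import information on each call, removing a schema's entry is enough to keep the graph consistent, which matches the observation above.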

Client API

The GME Client API is a standard Introduce-generated client API, which mirrors the operations of the service, the details of which can be found in the relevant section of the GME Developers Guide. In addition to the Introduce-generated methods, the client also provides a convenience method cacheSchemas which makes use of the FileSystemCacher utility API to save a collection of schemas to the file system.

The design and use of the client are otherwise just as described in the Introduce documentation.

Utility APIs

The GME provides some utility APIs in the common package to facilitate use of the GME service operations. These classes, shown below, provide utilities to help save schemas from the GME to the filesystem, load schemas from the filesystem to publish them to the GME, and transform GME domain objects to and from XML.

The FileSystemProcessor is a common base class for the FilesystemCacher and FilesystemLoader which respectively facilitate saving a schema bundle to the file system, and loading schemas from the file system into appropriate representations in the GME domain model for the purposes of publishing them to the GME. Both of these utilities are used by the user interface components. The XSDUtil class is a lower-level utility, also used by the FilesystemLoader, which provides convenience methods for creating GME domain objects (XMLSchema and XMLSchemaDocument) from schema documents on the local file system. The SerializationUtils class provides methods for writing and reading XMLSchema, XMLSchemaBundle, and XMLSchemaImportInformation to and from XML.
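
As a rough illustration of the cacher/loader pairing, the sketch below writes a set of schema documents to a directory and reads them back using only java.nio. It is not the FilesystemCacher or FilesystemLoader implementation: the class FilesystemSketch, the one-file-per-systemID layout, the ".xsd" filter, and the UTF-8 encoding are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: save schema documents (systemID -> text) to a directory and load them back. */
final class FilesystemSketch {

    /** Writes one UTF-8 file per document into dir (created if needed) and returns dir. */
    static Path cache(Path dir, Map<String, String> documents) {
        try {
            Files.createDirectories(dir);
            Path root = dir.normalize();
            for (Map.Entry<String, String> document : documents.entrySet()) {
                Path target = root.resolve(document.getKey()).normalize();
                if (!target.startsWith(root)) {
                    // Refuse systemIDs such as "../x.xsd" that would escape the cache directory.
                    throw new IllegalArgumentException("systemID escapes directory: " + document.getKey());
                }
                Files.write(target, document.getValue().getBytes(StandardCharsets.UTF_8));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads every *.xsd file directly inside dir into a systemID -> text map. */
    static Map<String, String> load(Path dir) {
        Map<String, String> documents = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xsd")) {
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                documents.put(file.getFileName().toString(), text);
            }
            return documents;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round trip through cache and load should return the same documents, which is the property a client relies on when it caches schemas locally and later republishes them.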

User Interfaces

The GME user interface components are developed in the "globalModelExchange-ui" project, not within the GME service project. All components are developed as Introduce Extensions, so that Introduce users can use them directly rather than needing a separate application. The components are shown in the class diagram below.

The GMEViewer and the GMETypeSelectionComponent are the main classes, which respectively provide the browsing/upload and schema discovery capabilities. Both leverage the GMESchemaLocator panel, which provides general capability to browse schemas in the GME.

For information on how to use the user interface components, see the relevant section in the GME Developers Guide.

Testing


The GME has an extensive suite of unit and system tests. The unit tests live within the GME project itself (globalModelExchange). The system tests live in the globalModelExchangeTests project within the caGrid system tests module (tests), which is not distributed with the release. Both sets of tests run automatically on a nightly basis on the caGrid Quality Dashboard.

Unit Tests

The GME currently has a large collection of unit tests that test the following aspects.

Test Package                     Description
-------------------------------  -----------
org.cagrid.gme                   Tests the codebase for cyclic package dependencies
org.cagrid.gme.domain            Tests the functionality of the utility operations of the domain model
org.cagrid.gme.metadata          Tests that the grid service's metadata validates to the standard schema
org.cagrid.gme.persistance.test  Tests that the Hibernate Annotations are correct
org.cagrid.gme.serialization     Tests the "round-trip" correctness of the serialization utilities
org.cagrid.gme.service           Contains numerous test cases which leverage Spring Mock's capability to simulate a deployed GME service. These tests fully exercise the service's behavior for a variety of XML Schemas using such features as xsd:include, xsd:redefine, cyclic schema imports, schemas with errors, usage errors, and the correct functionality of publish/delete/retrieve for complex schema hierarchies
org.cagrid.gme.test              Testing utilities and test base classes
org.cagrid.gme.xerces            Tests the GME SAX handlers for a variety of correct and incorrect schema documents

These tests can be run locally by typing the following command from the globalModelExchange project:

 > ant test 

This will create a local MySQL database, run every unit test, and generate a test report in the test/logs/junit/report directory. An example report is shown below.

System Tests

The GME currently has a single comprehensive system/integration test. It makes use of the complex "real world" collection of 14 caArray schemas, which have numerous interdependencies. The test case exercises the following steps:

  1. sets up a Tomcat container
  2. creates a database
  3. configures the GME to use it
  4. deploys the GME service to the container
  5. verifies there are no published schemas
  6. publishes the caArray schemas
  7. validates the published namespaces are correct
  8. validates each schema can be successfully downloaded
  9. validates each individual schema and its own dependencies can be downloaded
  10. validates each individual schema can be successfully cached to the filesystem and those schemas are valid
  11. deletes the schemas
  12. validates there are no published namespaces
  13. stops the Tomcat container
  14. deletes the Tomcat container
Last edited by
Sarah Honacki (1416 days ago) , ...