XML is used across diverse knowledge domains and is a popular format for representing data on the internet, with uses ranging from data storage to transmission. XML Schema is a W3C standard for describing and validating XML documents. XML documents can be stored in native XML databases such as Oracle Berkeley DB XML, eXist, DB2 pureXML, and Oracle XMLDB. The XML Data Service Framework (xService) is a generalized caGrid extension that grid-enables native XML databases and supports rapid, flexible creation of caGrid Data Services from existing XML Schemas. Its development was motivated by the need to store cardiovascular research data (e.g., ECG measurements) represented with XML schemas, and it was initially developed in the CardioVascular Research Grid (CVRG) as a middleware component to address this requirement. xService has been used to generate a significant number of data services for the CVRG, including an HL7 Annotated ECG data service, and has also been employed to implement the caBIG In Vivo Imaging Annotation and Image Markup (AIM) data service for radiology annotations. An xService-based data service has also been developed for storage and management of the Radiation Therapy Oncology Group's radiation treatment planning data files.
The caGrid middleware exposes remote resources over the web as grid services. Service creation is facilitated by the caGrid Introduce Toolkit, which provides a graphical interface for specifying service properties and operations and generates the service code. The XML Data Service Extension builds on Introduce by providing a customized user interface and a generic implementation of the service logic, simplifying the creation of XML Data Services. These services provide the interface layer for remote access to data in native XML databases.
The Introduce XML Data Service extension facilitates the creation of an XML-based data service through a wizard interface. The user specifies an XML schema, selects a Java bean binding for serialization to and deserialization from XML, and specifies the XML database collection in which XML documents are stored. The extension then creates the caGrid-based XML data service. The generated service contains the standard caGrid data service "query" operation as well as an "upload" operation. The service translates caGrid CQL queries into XPath expressions, which are executed against the backend XML database engine. The results are returned to the client as XML documents, which may optionally be deserialized into Java beans.
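The query translation step can be illustrated with a minimal Python sketch. It is not the xService implementation: the `Query` and `Attribute` classes are hypothetical, heavily simplified stand-ins for a CQL query (real CQL also has groups, associations, and further predicates), the mapping of CQL attributes onto XML attributes is an assumption, and an in-memory element tree stands in for the XML database collection.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for a CQL query (not the caGrid classes).
@dataclass
class Attribute:
    name: str
    value: str
    predicate: str = "EQUAL_TO"

@dataclass
class Query:
    target: str  # element name of the target object (assumed mapping)
    attributes: list = field(default_factory=list)

# Only two predicates are covered in this sketch.
_OPERATORS = {"EQUAL_TO": "=", "NOT_EQUAL_TO": "!="}

def _literal(value: str) -> str:
    """Quote a value as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    raise ValueError("value contains both quote characters")

def to_xpath(query: Query) -> str:
    """Translate the simplified query into an XPath expression."""
    predicates = "".join(
        f"[@{a.name}{_OPERATORS[a.predicate]}{_literal(a.value)}]"
        for a in query.attributes
    )
    return f".//{query.target}{predicates}"

# In-memory stand-in for documents held in an XML database collection.
collection = ET.fromstring(
    "<collection>"
    "<Recording id='r1' lead='II'/>"
    "<Recording id='r2' lead='V1'/>"
    "</collection>"
)

# Translate the query, run it, and return the hits as XML strings.
xpath = to_xpath(Query("Recording", [Attribute("lead", "II")]))
matches = [ET.tostring(e, encoding="unicode") for e in collection.findall(xpath)]
```

Here `xpath` is the expression that would be handed to the database engine, and `matches` holds the serialized XML documents that a client could then deserialize into its own objects.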
Significant effort has been made to ensure that the XML data service is high-performance, memory-efficient, robust, and scalable to a large number of concurrent users. Compression and streaming have been added, and data-instance-level security can optionally be turned on for the data services.
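How compression and streaming can combine to keep memory use low is sketched below in Python. This is an illustration under stated assumptions, not the mechanism xService uses: the function names are hypothetical, gzip framing is an arbitrary choice, and each result document is assumed to fit on a single line so that a newline can serve as the delimiter.

```python
import zlib
from typing import Iterable, Iterator

GZIP_WBITS = 16 + zlib.MAX_WBITS  # adding 16 selects gzip framing in zlib

def compress_stream(documents: Iterable[str], level: int = 6) -> Iterator[bytes]:
    """Yield compressed chunks as documents arrive, without buffering them all."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, GZIP_WBITS)
    for doc in documents:
        # Assumption: documents contain no newlines, so "\n" delimits them.
        chunk = compressor.compress(doc.encode("utf-8") + b"\n")
        if chunk:
            yield chunk
    yield compressor.flush()

def decompress_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield documents one at a time from a stream of compressed chunks."""
    decompressor = zlib.decompressobj(GZIP_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        # Everything before the last newline is a run of complete documents.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line.decode("utf-8")
    pending += decompressor.flush()
    if pending:
        yield pending.decode("utf-8")
```

Because both functions are generators, neither side holds more than the current chunk and a partial document in memory, regardless of how many documents a query returns.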