h1. Getting Started with caGrid 1.5
----
{cagridroundpanel}
{pre:class=cagridheaderfont}Table of Contents{pre}
{toc:outline=true|exclude=Getting Started with caGrid 1.5|style=none}
{cagridroundpanel}\\
h1. Glossary
----
This guide uses a number of terms that may be unfamiliar to readers. Refer back to this section for their definitions as you read.
* Middleware: {include:Glossary:Middleware}
* Federation: {include:Glossary:Federation}
* Grid computing: {include:Glossary:Grid Computing}
* Service Oriented Architecture (SOA): {include:Glossary:Service Oriented Architecture (SOA)}
* Model Driven Architecture (MDA): {include:Glossary:Model Driven Architecture (MDA)}
* Analytical resource (and analytical service): {include:Glossary:Analytical Resource}
* Data resource (and data service): {include:Glossary:Data Resource}
* Administration and Security domain: {include:Glossary:Administration and Security domain}
* Common Data Elements: {include:Glossary:Common Data Elements}
* Controlled Vocabularies: {include:Glossary:Controlled Vocabularies}
* Harmonization: {include:Glossary:Harmonization}
* caCORE SDK: {include:Glossary:caCORE SDK}
* CQL: {include:Glossary:CQL}
* Index Service: {include:Glossary:Index Service}
* GAARDS: {include:Glossary:GAARDS}
* Dorian: {include:Glossary:Dorian}
For other terms, please view the [Glossary|Glossary:Home].
h1. Introduction
----
This guide introduces software development with caGrid. It is aimed at software developers who are new to caGrid and want to learn how to use it to develop and deploy Grid services. You do not need to be familiar with the concepts of Grid computing, Service Oriented Architecture, or Model Driven Architecture. This document demonstrates the basics of caGrid and links to tutorials that give you hands-on experience developing Grid services. It also points readers who are interested in the design and implementation of caGrid to additional technical information.
This guide is part of a more complete introduction to caGrid 1.5 as outlined in the [caGrid 1.5 Quick Start|downloads:caGrid 1.5 Installation Quick Start].
h1. caGrid in a Nutshell
----
caGrid is middleware designed to facilitate secure and federated access to information and analytical resources in a multi-institutional environment. Typically, resources available in this environment have been developed by independent groups. caGrid provides tools, libraries, and runtime support for: 1) resource providers to implement and deploy their analytical and data resources as secure, interoperable services and 2) resource consumers to discover available resources and use them (e.g., submit queries to multiple data sources and retrieve the query results).
\\
caGrid is designed to solve the problem of sharing data and analytical resources in an environment where resources are hosted by multiple organizations and located in multiple administrative and security domains. caGrid works just as well within a single institution, providing the tools required to share data seamlessly across departments. For example, a research project may require integrative analysis of microarray, imaging, and clinical data. These datasets may be collected by different entities, such as shared resources and medical information warehouses, and may not be stored in a centralized system. caGrid can be used to create a "virtually centralized" data warehouse of such datasets: each dataset remains managed by its owner, but caGrid service interfaces and tools integrate the datasets so that a researcher can access data from any of them through a common interface.
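To make the "virtually centralized" idea more concrete, here is a minimal plain-Java sketch. It is not the caGrid API: PatientRecord, Dataset, InMemoryDataset, and VirtualWarehouse are hypothetical names, and the Java predicate stands in for the CQL queries that real caGrid data services accept. Each owner keeps its own records. The warehouse stores nothing itself; it only forwards a query to every registered dataset and merges the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical record type shared by every dataset (the common data model).
final class PatientRecord {
    final String patientId;
    final String dataType; // e.g. "expression", "image", "diagnosis"
    final String value;

    PatientRecord(String patientId, String dataType, String value) {
        this.patientId = patientId;
        this.dataType = dataType;
        this.value = value;
    }
}

// Hypothetical common interface that each dataset owner exposes.
interface Dataset {
    List<PatientRecord> query(Predicate<PatientRecord> criteria);
}

// One owner's dataset; an in-memory list stands in for its local storage.
final class InMemoryDataset implements Dataset {
    private final List<PatientRecord> records;

    InMemoryDataset(List<PatientRecord> records) {
        this.records = new ArrayList<>(records);
    }

    @Override
    public List<PatientRecord> query(Predicate<PatientRecord> criteria) {
        return records.stream().filter(criteria).collect(Collectors.toList());
    }
}

// The "virtually centralized" warehouse holds no data of its own:
// it forwards each query to every registered dataset and merges the answers.
final class VirtualWarehouse {
    private final List<Dataset> datasets = new ArrayList<>();

    void register(Dataset dataset) {
        datasets.add(dataset);
    }

    List<PatientRecord> query(Predicate<PatientRecord> criteria) {
        List<PatientRecord> merged = new ArrayList<>();
        for (Dataset d : datasets) {
            merged.addAll(d.query(criteria));
        }
        return merged;
    }
}
```

For example, after you register a microarray dataset and a clinical dataset, a single query for one patient returns that patient's records from both, even though neither dataset was copied into a central store.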
\\
Authentication and authorization controls can be used to limit access to the datasets. A key benefit of using caGrid is that caGrid makes it easy to evolve from sharing data within an institution to sharing data with external collaborators. In most cases, no new software needs to be deployed. Resources can be shared both within an institution and with external collaborators simply by changing the security access restrictions.
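The following sketch illustrates how sharing can widen through a change in access restrictions rather than through new software. It is a hypothetical illustration, not caGrid's security infrastructure (GAARDS): AccessPolicy and SecuredDataset are invented names, and a plain institution string stands in for a real authenticated Grid identity. The dataset code never changes. Granting access to an external collaborator is a single policy update.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical authorization policy: the institutions allowed to read a dataset.
final class AccessPolicy {
    private final Set<String> allowedInstitutions = new HashSet<>();

    void allow(String institution) {
        allowedInstitutions.add(institution);
    }

    boolean permits(String callerInstitution) {
        return allowedInstitutions.contains(callerInstitution);
    }
}

// Hypothetical secured dataset: its code stays the same no matter who is allowed in.
final class SecuredDataset {
    private final AccessPolicy policy;
    private final String contents;

    SecuredDataset(AccessPolicy policy, String contents) {
        this.policy = policy;
        this.contents = contents;
    }

    String read(String callerInstitution) {
        // Every request is checked against the current policy before any data is returned.
        if (!policy.permits(callerInstitution)) {
            throw new SecurityException("access denied for " + callerInstitution);
        }
        return contents;
    }
}
```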
h1. caGrid Employs Grid Computing, is Service-Oriented, and Draws from Model Driven Architecture
----
h2. Grid Computing
caGrid employs a Grid computing model. Grid computing refers to the notion of using distributed resources hosted at multiple institutions to solve large-scale, challenging problems in science and engineering. It was initially conceived as a mechanism to enable remote access to computational and storage machines across the administrative boundaries of supercomputer centers in order to solve large-scale, compute-intensive scientific and engineering problems. Over the years it has evolved into a platform made up of standards, tools, and middleware infrastructures for sharing data and analytical resources as well as computation and storage systems.
\\
At its foundation, caGrid employs the basic principles of Grid computing and existing Grid computing tools, more specifically the [Globus Toolkit|http://www.globus.org], to enable access to remote and disparate data and analytical resources. As a user of caGrid, you will likely not need to know the details of Grid computing and Grid computing tools. These details are hidden from the caGrid user by higher level tools and middleware components provided by the core infrastructure. For the purposes of getting started with caGrid, it suffices to say that by using caGrid one can create an environment where resources are located at multiple institutions but can be accessed securely across institutional boundaries. Such an environment is referred to as a "Grid".
\\
h2. Service-Oriented
caGrid is a service-oriented system. In a service-oriented system, each resource is made available to the (Grid) environment as a service. A service wraps the functionality of a resource in a set of well-defined interfaces. Client applications use these interfaces, and the associated client-side application programming interfaces (APIs), to interact with the resource. For example, a gene expression database stored in a relational database system may be wrapped as a service with two operations: query and insert. The query operation allows a client program to issue queries for gene expression data, and the insert operation allows it to insert data into the database. With a service-oriented interface, the client program does not interact directly with the relational database system. Because the implementation is hidden behind the service interface, a service developer can change it without affecting clients. For example, a service developer can upgrade the service to use multiple threads to meet tighter performance requirements.
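The query/insert example above can be sketched in plain Java. GeneExpression, GeneExpressionService, ExpressionClient, and the two implementations are hypothetical names. They are not classes generated by caGrid or Introduce. The client depends only on the interface, so replacing the simple synchronized implementation with one designed for concurrent callers requires no change to the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

// Hypothetical data object exchanged through the service.
final class GeneExpression {
    final String gene;
    final double level;

    GeneExpression(String gene, double level) {
        this.gene = gene;
        this.level = level;
    }
}

// Hypothetical service interface: the only thing clients see.
interface GeneExpressionService {
    List<GeneExpression> query(String gene);
    void insert(GeneExpression record);
}

// First implementation: a synchronized list standing in for the relational database.
final class SimpleExpressionService implements GeneExpressionService {
    private final List<GeneExpression> rows = new ArrayList<>();

    public synchronized List<GeneExpression> query(String gene) {
        return rows.stream().filter(r -> r.gene.equals(gene)).collect(Collectors.toList());
    }

    public synchronized void insert(GeneExpression record) {
        rows.add(record);
    }
}

// Later implementation: reads no longer take a global lock, so many
// client threads can query at once. Clients are unaffected by the swap.
final class ConcurrentExpressionService implements GeneExpressionService {
    private final List<GeneExpression> rows = new CopyOnWriteArrayList<>();

    public List<GeneExpression> query(String gene) {
        return rows.stream().filter(r -> r.gene.equals(gene)).collect(Collectors.toList());
    }

    public void insert(GeneExpression record) {
        rows.add(record);
    }
}

// Client code is written against the interface only.
final class ExpressionClient {
    static long countAbove(GeneExpressionService svc, String gene, double threshold) {
        return svc.query(gene).stream().filter(r -> r.level > threshold).count();
    }
}
```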
Most SOA systems use Web Services technologies as the underlying platform. Web Services provide access to services through standard web protocols. caGrid uses the [Web Services Resource Framework (WSRF)|http://www.globus.org/wsrf/] standards.
\\
The WSRF builds on the Web Services standards and extends them with concepts such as stateful services, service lifetime, and service context. These extensions enable the implementation of richer and more efficient services for scientific applications. The caGrid infrastructure provides the Introduce toolkit, which lets service providers easily implement service stubs and service interfaces for their resources. Introduce also helps client application developers interact with remote services through high-level Java APIs. You can find more information about Service-Oriented Architecture, Grid Computing, and the WSRF standards in the following references:
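The notions of stateful services and service lifetime can be illustrated with a small, hypothetical sketch. It is not the Globus or WSRF API: ResourceHome and QueryResultResource are invented names, and time is passed in explicitly as milliseconds so the behavior is easy to follow. Each client gets its own resource, identified by a key it presents on later calls. The resource is destroyed once its termination time passes unless the client extends it first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical per-client state kept by a stateful service,
// e.g. how far a client has read through a large query result.
final class QueryResultResource {
    final String id;
    int cursor;                  // position of the next result the client will read
    long terminationTimeMillis;  // scheduled end of this resource's lifetime

    QueryResultResource(String id, long terminationTimeMillis) {
        this.id = id;
        this.terminationTimeMillis = terminationTimeMillis;
    }
}

// Hypothetical "resource home": creates, looks up, extends, and expires resources.
final class ResourceHome {
    private final Map<String, QueryResultResource> resources = new HashMap<>();

    // Creates a resource and returns the key the client presents on later calls.
    String create(long nowMillis, long lifetimeMillis) {
        String id = UUID.randomUUID().toString();
        resources.put(id, new QueryResultResource(id, nowMillis + lifetimeMillis));
        return id;
    }

    QueryResultResource find(String id, long nowMillis) {
        destroyExpired(nowMillis);
        QueryResultResource r = resources.get(id);
        if (r == null) {
            throw new IllegalStateException("no such resource (expired or never created): " + id);
        }
        return r;
    }

    void extendLifetime(String id, long nowMillis, long extraMillis) {
        find(id, nowMillis).terminationTimeMillis += extraMillis;
    }

    // Removes every resource whose termination time has been reached.
    void destroyExpired(long nowMillis) {
        resources.values().removeIf(r -> r.terminationTimeMillis <= nowMillis);
    }

    int size() {
        return resources.size();
    }
}
```

Passing the current time as a parameter, instead of reading the system clock, keeps the lifetime logic deterministic and easy to test.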
* Service-Oriented Architecture: [http://en.wikipedia.org/wiki/Service-oriented_architecture]
* Grid Computing: [http://www.globus.org/alliance/publications/papers/anatomy.pdf]
* Web Services Resource Framework (WSRF): [http://www.globus.org/wsrf]
h2. Model Driven Architecture
caGrid draws from Model Driven Architecture (MDA), a paradigm that has gained popularity in recent years. MDA promotes the use of object-oriented design practices and rich metadata to facilitate the implementation of interoperable systems. caGrid adopts an MDA approach to enable interoperability through object-oriented abstractions, common data elements, and controlled vocabularies. That is, client and service APIs in caGrid are object-oriented, and these objects are in turn defined using common data elements and controlled vocabularies registered on the Grid. For example, the names of an object's fields are terms from the controlled vocabularies, and the type of a field (Integer, String, etc.) matches the type specified in a common data element. The benefit of this approach is that each definition lives in one location (the vocabulary or common data element) and is used to generate all Grid artifacts, which avoids re-modeling the same data at each Grid layer. A caGrid data service exposes its data as objects. Similarly, an analytical resource (e.g., an analysis program) implemented as a caGrid analytical service provides methods that take objects as input and return objects.
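The conformance rule described above (field names taken from vocabulary terms, field types matching the common data element) can be sketched with Java reflection. CdeRegistry, Gene, and LegacyGene are hypothetical. The in-memory map stands in for the vocabularies and common data elements registered on the Grid. The sketch only checks an existing class and does not show how caGrid generates Grid artifacts from these definitions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for registered vocabulary terms and common data elements:
// each permitted field name maps to the type its CDE specifies.
final class CdeRegistry {
    private final Map<String, Class<?>> elements = new HashMap<>();

    CdeRegistry register(String term, Class<?> type) {
        elements.put(term, type);
        return this;
    }

    // Describes every instance field of modelClass that does not conform to the registry.
    List<String> violations(Class<?> modelClass) {
        List<String> problems = new ArrayList<>();
        for (Field f : modelClass.getDeclaredFields()) {
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue; // only ordinary instance fields are part of the model
            }
            Class<?> expected = elements.get(f.getName());
            if (expected == null) {
                problems.add(f.getName() + ": not a registered vocabulary term");
            } else if (!expected.equals(f.getType())) {
                problems.add(f.getName() + ": expected " + expected.getSimpleName()
                        + " but was " + f.getType().getSimpleName());
            }
        }
        return problems;
    }
}

// Hypothetical model object that follows the registry.
final class Gene {
    String symbol;
    Integer chromosome;
}

// Hypothetical model object that does not.
final class LegacyGene {
    String geneName;   // not a registered term
    String chromosome; // registered term, but the CDE specifies Integer
}
```

Here Gene conforms, while LegacyGene reports two violations: geneName is not a registered term, and chromosome is declared as String although the registry specifies Integer.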
\\
While the caGrid infrastructure builds on several complex frameworks and standards, caGrid provides a suite of high-level tools and graphical user interfaces that make it easy to use. Most of the details of the underlying standards and frameworks and lower level middleware tools are hidden from the user. These tools and GUIs are covered extensively in caGrid tutorials, presented next.
h1. Hands On with caGrid
----
So far we have provided introductory background information on caGrid. It is now time to start developing with caGrid. In the following section we will outline some of the basic steps to start using caGrid and provide links to relevant tutorials.
h2. Download and Install caGrid
As mentioned above, this Getting Started guide is part of a larger quick start. If you have not yet installed caGrid, take a moment to follow the beginning steps of the [caGrid 1.5 Quick Start|downloads:caGrid 1.5 Installation Quick Start].
h2. Create Services Using caGrid
There are several [tutorials|caGrid14:Tutorials] available for caGrid 1.4. With these tutorials, you can begin service development and also explore advanced features of the Introduce service development toolkit.
h1. Next Steps
----
Congratulations, you have completed a set of tutorials directing you through the major caGrid components and are now a caGrid Service Developer!
h2. Additional Reading
For more detailed technical information on caGrid, please read the [caGrid Technical Overview|caGrid 1.5 Technical Overview].
h2. Continue Exploring caGrid
From here, you will decide what you need from caGrid to build more complete Grid applications. A good place to start is the [caGrid 1.4 documentation|cagrid14:Home]. Also browse the [knowledgebase|knowledgebase14:Home] for articles on developing effectively with caGrid, troubleshooting, and more.
h2. Community Projects
We encourage you to explore [Community Projects|projects:Home]. You will find information on many projects to help you accomplish your goals, including projects that are included in the official caGrid release.
h2. Learn about Others Who Use caGrid
Learn about [Grid Communities|community:Home] that use caGrid. The Community Training Grid is a caGrid deployment specifically provided for you to develop and test services without worrying about impacting production Grids.
h2. User Support
The larger caGrid user community is available to help you use caGrid. Learn more about [support resources|support:Home].





