Knowledgebase

Version 1 by Joe George
on Apr 10, 2012 14:37.

compared with
Current by Joe George
on Apr 19, 2012 14:38.

{anchor:_Toc206153105}
----
h1. caGrid 1.45 Technical Overview
----

{cagridtoc:exclude=caGrid 1.45 Technical Overview}

h1. {anchor:_Toc206325221}Introduction
\\ !worddavd34e5b7f935db77eacadf22d61816334.png|height=456,width=530!\*Figure 3.\* GAARDS Security Infrastructure
The main components of GAARDS are:
* *Dorian* service for the provisioning and management of Grid user accounts.
* *Grid Trust Service (GTS)* for maintaining and provisioning a federated trust fabric consisting of trusted certificate authorities, allowing Grid services to make authentication decisions against the most recent information.
* *Grid Grouper* for a group-based authorization solution for the Grid.
* *Authentication Service* for issuing SAML assertions for existing credential providers so they may easily integrate with Dorian and other Grid credential providers.
* *Credential Delegation Service* for a client (the delegator) to express a delegation policy, entitling a prescribed collection of other grid entities (the delegates) to assume the delegator's identity for a limited time.

In order for users and applications to communicate with secure services, they need Grid credentials. Obtaining Grid credentials requires having a Grid user account. GAARDS provides an account management and identity provider service, Dorian, and two mechanisms for registering for a Grid user account: 1) registering directly with [Dorian|dorian145:home] and 2) having an existing user account in another trusted security domain (e.g., a participating institution's security domain). In order to use an existing user account to obtain Grid credentials, the existing credential provider must be registered with GAARDS as a Trusted Identity Provider. Figure 3 illustrates an example process for obtaining Grid credentials. In this example, the user first authenticates with their institution's credential provider and obtains a [SAML|http://www.oasis-open.org/committees/security/] assertion as proof of authentication. The user then uses that SAML assertion to obtain Grid credentials from GAARDS. Assuming the institution's credential provider is registered with GAARDS as a trusted identity provider and the user's account is in good standing, the GAARDS Dorian service will issue Grid credentials to the user. (*Note:* users with an account through Dorian can contact Dorian directly to obtain Grid credentials.) After obtaining Grid credentials, users may invoke secure Grid operations.
Upon receiving Grid credentials from a user, a secure service authenticates the user to ensure the credentials are valid. Part of Grid authentication is verifying that Grid credentials are issued by a trusted Grid credential provider (e.g., Dorian). The [Grid Trust Service (GTS)|GTS:home] of GAARDS maintains the official list of trusted credential providers. This list is known as the "trust fabric". Credential providers are registered as trusted Certificate Authorities (CAs). Trusted CAs periodically publish updated information to the GTS. Grid services authenticate Grid credentials against the trusted CAs (shown in Figure 3).
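The trust check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the GTS or Globus implementation: the {{Credential}} class, the {{is_trusted}} function, and the CA name are hypothetical, and a real service would also validate the certificate chain and revocation information.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Credential:
    """Hypothetical, simplified view of a Grid credential."""
    subject: str         # Grid identity of the user
    issuer: str          # name of the CA that issued the credential
    not_after: datetime  # expiry time of the credential


# Hypothetical snapshot of the trust fabric, as a service might cache it
# after synchronizing with the GTS.
TRUSTED_CAS = frozenset({"O=Example Grid,CN=Dorian CA"})


def is_trusted(credential: Credential, trusted_cas: frozenset, now: datetime) -> bool:
    """Accept a credential only if a trusted CA issued it and it has not expired."""
    return credential.issuer in trusted_cas and now < credential.not_after


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cred = Credential(
        subject="O=Example Grid,CN=jdoe",
        issuer="O=Example Grid,CN=Dorian CA",
        not_after=datetime(2100, 1, 1, tzinfo=timezone.utc),
    )
    print(is_trusted(cred, TRUSTED_CAS, now))
```

Keeping the trusted CA set as data that is refreshed from the GTS, rather than hard-coding it in each service, is what lets services decide against the most recent trust information.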
Once the user has successfully authenticated, a secure Grid service can perform an authorization check to determine whether the user is authorized to invoke the requested service operations. It is important to note that all authorization decisions are made by the service itself, but GAARDS implements services and tools to support common authorization mechanisms. The GAARDS infrastructure provides two authorization options, which can be used independently or together to implement authorization policies for a service. Other authorization mechanisms can also be employed in conjunction with GAARDS. The first authorization option is the [Grid Grouper|GridGrouper145:home] service. Grid Grouper provides a group-based authorization solution for the Grid whereby Grid services and applications enforce authorization policy based on a group membership check. The second option is the [caCORE Common Security Module (CSM)|http://ncicb.nci.nih.gov/infrastructure/cacore_overview/csm], which supports centralized authorization checks. These checks are "centralized" because CSM is deployed specifically for a service that performs the authorization check. CSM policies are constructed by specifying read/write access to protected elements; Grid services using CSM defer authorization checks to CSM, which decides whether or not a user is authorized based on the access control policy it maintains. Furthermore, Grid Grouper and CSM can be used together. For example, access control policies specified in CSM can be based on membership in groups in Grid Grouper.
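A minimal sketch of a group-based authorization check, assuming an in-memory table in place of a remote Grid Grouper lookup. The function names, group names, and operation names are hypothetical and are not part of the Grid Grouper or CSM APIs; the point is only that the service itself makes the decision, using group membership as the input.

```python
from typing import Dict, Set


def is_member(group_members: Dict[str, Set[str]], group: str, user: str) -> bool:
    """Stand-in for a group membership check against Grid Grouper."""
    return user in group_members.get(group, set())


def authorize(
    user: str,
    operation: str,
    policy: Dict[str, Set[str]],
    group_members: Dict[str, Set[str]],
) -> bool:
    """Allow the operation if the user belongs to any group the policy lists for it.

    `policy` maps an operation name to the groups allowed to invoke it.
    Operations that are missing from the policy are denied.
    """
    allowed_groups = policy.get(operation, set())
    return any(is_member(group_members, g, user) for g in allowed_groups)


# Hypothetical data for illustration only.
GROUP_MEMBERS = {"example:researchers": {"jdoe"}, "example:admins": {"asmith"}}
POLICY = {"query": {"example:researchers", "example:admins"}, "delete": {"example:admins"}}

if __name__ == "__main__":
    print(authorize("jdoe", "query", POLICY, GROUP_MEMBERS))
    print(authorize("jdoe", "delete", POLICY, GROUP_MEMBERS))
```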
In order to support Grid workflows (a workflow is a group of coordinated services that together provide a desired analysis or other end result), users require the ability to allow another user or service to perform work on their behalf. The [Credential Delegation Service (CDS)|CDS145:home] allows a user (the delegator) to express a delegation policy, entitling a prescribed collection of other grid entities (the delegates) to assume the delegator's identity for a limited time. With GAARDS, a user can log in to Dorian and then invoke Grid workflows by delegating the user's credential to the CDS; services involved in the Grid workflow retrieve the user's credential to perform work for the user.
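The shape of a delegation policy (a delegator, a prescribed set of delegates, and a limited lifetime) can be sketched as follows. The {{DelegationPolicy}} class and {{may_assume_identity}} function are hypothetical names chosen for illustration and do not reflect the actual CDS API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class DelegationPolicy:
    """Hypothetical model of a delegation policy held by a delegation service."""
    delegator: str             # identity being delegated
    delegates: FrozenSet[str]  # identities allowed to act as the delegator
    expires_at: datetime       # end of the delegation lifetime


def may_assume_identity(policy: DelegationPolicy, requester: str, now: datetime) -> bool:
    """A requester may retrieve the delegated credential only if it is a
    listed delegate and the delegation has not yet expired."""
    return requester in policy.delegates and now < policy.expires_at


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    policy = DelegationPolicy(
        delegator="jdoe",
        delegates=frozenset({"workflow-service"}),
        expires_at=now + timedelta(hours=4),
    )
    print(may_assume_identity(policy, "workflow-service", now))
```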

h1. {anchor:_Toc206325226}caGrid Architecture Components

caGrid Query Language (CQL) is a custom object-oriented query language. CQL provides a common query language for all data services deployed to the Grid. That is, all Grid queries are expressed in CQL and each caGrid-compliant data service is required to be able to consume CQL queries. CQL is designed to be simple so that service developers can easily implement specialized query processors for different types of backend databases. Use of a common query language across all data services in the environment facilitates federated querying of multiple services. caGrid provides a federated query processing service: the Federated Query Processor (FQP) takes DCQL (an extended form of CQL) as input to perform a federated query. DCQL queries are broken into composite CQL queries and passed to individual data services.
Constructing a CQL query is straightforward and can be driven by a data service's domain model. First, a user creating a CQL query decides which class in the model they want to retrieve. Then the user specifies restrictions indicating which instances of that class (representing actual data stored in the backing datastore) the user wants to receive. For example, the user can target a "Person" class and specify a restriction requiring the "age" attribute to equal "5". CQL supports association traversal, so the user can compose a query that restricts results based upon attributes in associated objects. An example is requiring the "Institution" class associated with the Person to have the name "The Ohio State University". CQL was created to query the object model itself. The implementation takes care of performing a query on the back-end data store, which is hidden from the Grid. For additional examples of and further details about CQL, please see the developer wiki [here|dataservices145:Developers Guide].
caGrid provides higher-level APIs (with Java bindings) to create and submit queries. On the wire, a CQL query is expressed as an XML document conforming to a well-defined schema identified by the namespace URI "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery".
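The Person example above can be sketched as an XML document built with the Python standard library. The namespace is the one given in the text; the element and attribute names ({{Target}}, {{Group}}, {{Attribute}}, {{Association}}, {{logicRelation}}, {{predicate}}) are an assumption based on the CQL 1 schema and should be checked against the schema shipped with caGrid. Real queries normally use fully qualified class names from the service's domain model; the short names here are for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the CQL query schema, as stated in the text.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def _q(tag: str) -> str:
    """Qualify a tag name with the CQL namespace."""
    return f"{{{CQL_NS}}}{tag}"


def build_person_query() -> str:
    """Build a query for Person objects whose age equals 5 and whose
    associated Institution is named 'The Ohio State University'."""
    ET.register_namespace("", CQL_NS)  # serialize with a default namespace
    query = ET.Element(_q("CQLQuery"))
    target = ET.SubElement(query, _q("Target"), {"name": "Person"})
    # Assumed structure: a Group combines several restrictions on the target.
    group = ET.SubElement(target, _q("Group"), {"logicRelation": "AND"})
    ET.SubElement(
        group, _q("Attribute"),
        {"name": "age", "predicate": "EQUAL_TO", "value": "5"},
    )
    # Association traversal: restrict on an attribute of the related class.
    assoc = ET.SubElement(group, _q("Association"), {"name": "Institution"})
    ET.SubElement(
        assoc, _q("Attribute"),
        {"name": "name", "predicate": "EQUAL_TO", "value": "The Ohio State University"},
    )
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_person_query())
```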


\\
caGrid supports the transfer of large amounts of data around the Grid. Currently, there are two options: 1) [GridFTP|http://dev.globus.org/wiki/GridFTP], and 2) the [caGrid Transfer|transfer145:Home] service. GridFTP is a high-performance FTP server extended to support authentication using X.509 certificates. It is robust and offers excellent performance, but it is difficult to install (e.g., there is no Windows version of GridFTP). The Transfer service included in caGrid is much simpler to use and also offers excellent performance. Transfer supports WS-Notification, allowing interested parties to receive updates on the status of data transfers. The Introduce Toolkit offers a caGrid Transfer extension that leverages the Transfer service.

h2. {anchor:_Toc206325238}Web Integration

\\
The Index Service is the white and yellow pages of the Grid. All services participating in a Grid should, but are not technically required to, advertise to the Index Service. Typically, there is one Index Service per Grid. However, Index Services can be linked to form a federation of Index Services. For the purposes of [Advertisement|metadata145:Advertisement] and [Discovery|metadata145:Discovery], caGrid leverages the Globus-provided Index Service. The Index Service implements the standard WS-ServiceGroup specification. When services are added to the service group, they specify what metadata should be accessed from them and how, and the Index Service performs this aggregation. Clients can then query this aggregated information using standard Resource Property operations. caGrid services are expected to maintain soft-state registration to a well-known Index Service instance, specifying polling of the standard caGrid service metadata. An Index Service is queried using XPath. Interested parties can subscribe to changes in the Index Service contents. An example subscription would be notification of new services advertising to the Index Service. For more information on the Index Service, see the [Globus documentation|http://www.globus.org/toolkit/docs/4.0/info/].
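To illustrate querying aggregated metadata with XPath, the sketch below runs a path expression over a small local XML snapshot using the Python standard library. The snapshot layout ({{Entries}}, {{Entry}}, {{ServiceAddress}}, {{HostingCenter}}) and the URLs are hypothetical and much simpler than the real WS-ServiceGroup resource properties; a real client would send the XPath to the Index Service through Resource Property query operations rather than parse a local document.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified snapshot of aggregated service registrations.
INDEX_SNAPSHOT = """
<Entries>
  <Entry>
    <ServiceAddress>https://example.org/wsrf/services/cagrid/GeneService</ServiceAddress>
    <HostingCenter>Example Cancer Center</HostingCenter>
  </Entry>
  <Entry>
    <ServiceAddress>https://example.net/wsrf/services/cagrid/ImageService</ServiceAddress>
    <HostingCenter>Another Research Institute</HostingCenter>
  </Entry>
</Entries>
"""


def services_hosted_by(snapshot_xml: str, center: str) -> List[str]:
    """Return the addresses of all entries advertised by the given center.

    Uses the XPath subset supported by ElementTree; `center` must not
    contain a single quote because it is embedded in the expression.
    """
    root = ET.fromstring(snapshot_xml)
    entries = root.findall(f".//Entry[HostingCenter='{center}']")
    return [entry.findtext("ServiceAddress") for entry in entries]


if __name__ == "__main__":
    for address in services_hosted_by(INDEX_SNAPSHOT, "Example Cancer Center"):
        print(address)
```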

Service metadata typically advertises information about the deployed service (e.g., what organization deployed the service, where the service is located). Service metadata is represented as XML values stored simply as <key, value> pairs. Standardization and agreement upon what service metadata will be advertised is a key step in deploying services to the Grid. The common service metadata contains information about the service-providing cancer center, such as the point of contact for the service and the name of the institution providing the service. Data Services provide additional "domain model" metadata, which details the domain model, including associations and inheritance information, from which the objects being exposed by the service are drawn. These metadata standards leverage the data models registered in caDSR and link them to the underlying semantic concepts registered in EVS. The common service metadata for analytical services details the objects used as input and output of the service's operations, using the same format as the data service metadata. In this way, all services fully define the domain objects they expose by referencing the data model registered in caDSR and identify their underlying semantic concepts by referencing the information in EVS. caGrid provides a series of high-level APIs and user applications for performing lookup on service metadata that greatly facilitate the discovery process. A Grid user can query the Grid to find services that, for example, use a Gene object as input. As an additional step, service metadata publishes the concepts used in the service to support service discovery by concept. The advertisement and discovery process is illustrated in Figure 2. A practical application of service metadata is the caGrid portal: [http://cagrid-portal.nci.nih.gov/|http://cagrid-portal.nci.nih.gov/].
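Discovery by input object or by concept can be sketched as a filter over advertised metadata. The {{ServiceMetadata}} class, the lookup functions, the URLs, and the concept code are all hypothetical placeholders; they are not the caGrid discovery API and the code is not a real EVS identifier.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical, flattened view of a service's advertised metadata."""
    url: str
    hosting_center: str
    input_classes: FrozenSet[str]  # domain classes accepted as operation input
    concept_codes: FrozenSet[str]  # semantic concepts the service references


def find_by_input(services: Iterable[ServiceMetadata], class_name: str) -> List[str]:
    """Return URLs of services with an operation that takes `class_name` as input."""
    return [s.url for s in services if class_name in s.input_classes]


def find_by_concept(services: Iterable[ServiceMetadata], code: str) -> List[str]:
    """Return URLs of services whose metadata references the given concept code."""
    return [s.url for s in services if code in s.concept_codes]


# Placeholder registrations for illustration only.
SERVICES = [
    ServiceMetadata(
        "https://example.org/wsrf/services/cagrid/GeneAnalysis",
        "Example Cancer Center",
        frozenset({"Gene"}),
        frozenset({"CONCEPT-GENE"}),
    ),
    ServiceMetadata(
        "https://example.net/wsrf/services/cagrid/ImageAnalysis",
        "Another Research Institute",
        frozenset({"Image"}),
        frozenset({"CONCEPT-IMAGE"}),
    ),
]

if __name__ == "__main__":
    print(find_by_input(SERVICES, "Gene"))
```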
Last edited by
Joe George (395 days ago)