caBIG aims to bring together disparate data and analytic resources into a "World Wide Web of cancer research." This will be achieved through common standards and software frameworks for the federation of these resources into "grid" services. Many of the tasks in the collection and analysis of cancer-related data on the grid involve the use of workflow. Here, we define workflow as the connecting of services to solve a problem that each individual service could not solve. caGrid implements workflow by providing a grid service for submitting and running workflows that are composed of other grid services. The next chapters describe the architecture and APIs for interacting with caGrid workflow.
Taverna, part of the myGrid project, is an application for building and executing workflows, aimed at users who are not necessarily experts in web services or programming. It provides access to a range of services with programmatic interfaces, primarily molecular biology tools and databases available on the web, especially as web services. It allows bioinformaticians to construct workflows, or pipelines of services, that perform a range of analyses such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis.
The Taverna Workbench allows users to construct complex analysis workflows from components located on both remote and local machines, run these workflows on their own data, and visualise the results. To support this core functionality, it also allows various operations on the components themselves, such as discovery, description, and the selection of personalised libraries of components previously found to be useful for a particular application. Aside from creating and editing, the most useful thing to do with a workflow is to run it. The Taverna Workbench provides an enactment engine for scientific workflows expressed in Scufl (Simple Conceptual Unified Flow Language). Workflows can be run from the GUI provided by the Taverna Workbench, or they can be executed programmatically. Since Taverna version 1.4, it has been possible to execute workflows through the WorkflowLauncher helper class, which provides an API for running a workflow programmatically.
Scufl (Simple Conceptual Unified Flow Language) is a simple workflow language, with a focus on bioinformatics, that was developed by the Taverna project involving myGrid, in collaboration with the European Bioinformatics Institute and the Human Genome Mapping Project. The Workflow Service described in this document takes a Scufl-based workflow definition file as input.
The Scufl based workflow description file above is represented in the Figure below:
The Workflow component leverages the same infrastructure stack as the caGrid toolkit (GT4, Tomcat, Java, Ant, and Introduce) with the addition of the Taverna workflow engine. The TavernaWorkflowService is a standard Introduce-built grid service that creates a workflow instance from a workflow definition file. It returns an EPR (endpoint reference) to a WorkflowManagementService resource, which can be used to start, cancel, and destroy the created workflow. The WorkflowManagementService is layered on top of the Taverna workflow engine, which provides the primary functionality for running Taverna-defined workflows. See Figure 2 for an overview of this architecture.
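The factory-and-resource pattern described above can be sketched roughly as follows. This is a minimal in-memory illustration, not caGrid code: the class names `WorkflowFactory` and `ManagedWorkflow`, the `Status` values, and the plain string that stands in for the EPR are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the factory/resource pattern; none of these names are caGrid APIs.
class WorkflowLifecycleSketch {

    enum Status { CREATED, RUNNING, CANCELLED, DESTROYED }

    /** Stands in for the WorkflowManagementService resource that an EPR points to. */
    static final class ManagedWorkflow {
        private final String definition;
        private Status status = Status.CREATED;

        ManagedWorkflow(String definition) {
            this.definition = definition;
        }

        String definition() { return definition; }

        Status status() { return status; }

        void start() {
            if (status != Status.CREATED) {
                throw new IllegalStateException("cannot start from " + status);
            }
            status = Status.RUNNING;
        }

        void cancel() {
            if (status != Status.RUNNING) {
                throw new IllegalStateException("cannot cancel from " + status);
            }
            status = Status.CANCELLED;
        }

        void destroy() {
            status = Status.DESTROYED;
        }
    }

    /** Stands in for the factory service that creates workflow instances. */
    static final class WorkflowFactory {
        private final Map<String, ManagedWorkflow> resources = new HashMap<>();

        /** Creates a workflow resource and returns a reference to it (the EPR stand-in). */
        String createWorkflow(String definition) {
            String ref = UUID.randomUUID().toString();
            resources.put(ref, new ManagedWorkflow(definition));
            return ref;
        }

        /** Looks up the resource behind a reference. */
        ManagedWorkflow resolve(String ref) {
            ManagedWorkflow wf = resources.get(ref);
            if (wf == null) {
                throw new IllegalArgumentException("unknown reference: " + ref);
            }
            return wf;
        }

        /** Destroys the resource and forgets the reference. */
        void destroy(String ref) {
            resolve(ref).destroy();
            resources.remove(ref);
        }
    }

    public static void main(String[] args) {
        WorkflowFactory factory = new WorkflowFactory();
        String ref = factory.createWorkflow("<workflow definition goes here>");
        ManagedWorkflow wf = factory.resolve(ref);
        wf.start();
        System.out.println("status after start: " + wf.status());
        factory.destroy(ref);
        System.out.println("status after destroy: " + wf.status());
    }
}
```

The point of the split is that the client only ever holds a reference: creation goes through the factory, while all later lifecycle calls go to the resource that the reference identifies.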
The following actions are performed when a user invokes start on the workflow management service:
- Read the workflow definition file.
- The input arguments to the workflow are declared as an array of xsd:any. They are parsed and cast to the types that they are meant to be.
- Read the input parameters.
- Initiate a workflow launcher – that is, load the workflow definition file.
Where args is the array holding the input arguments.
- Invoke the workflow synchronously and collect its output.
- The admin service responds with a deployment summary, reporting success if the workflow was deployed successfully.
- To start the workflow, a message is sent to the receiving partnerLink in the workflow.
- After the workflow executes successfully, the results are returned to the client application or user through a call to getWorkflowOutput.
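The input-handling, invocation, and output steps in the list above can be sketched as follows. This is only an illustration under stated assumptions: the `Launcher` interface is a hypothetical stand-in for a loaded workflow (it is not the Taverna WorkflowLauncher API), the untyped `Object[]` plays the role of the xsd:any array, and all inputs are assumed to be strings. The deployment and partnerLink steps are not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the start sequence; not caGrid or Taverna code.
class StartSequenceSketch {

    /** Stand-in for a loaded workflow: maps named inputs to named outputs. */
    interface Launcher {
        Map<String, String> execute(Map<String, String> inputs);
    }

    private final List<String> inputPorts;
    private final Launcher launcher;
    private Map<String, String> output; // null until start() has completed

    StartSequenceSketch(List<String> inputPorts, Launcher launcher) {
        this.inputPorts = inputPorts;
        this.launcher = launcher;
    }

    /** Casts the untyped arguments to strings and binds them to the input ports in order. */
    Map<String, String> bindInputs(Object[] args) {
        if (args.length != inputPorts.size()) {
            throw new IllegalArgumentException(
                "expected " + inputPorts.size() + " inputs but got " + args.length);
        }
        Map<String, String> inputs = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!(args[i] instanceof String)) {
                throw new IllegalArgumentException("input " + i + " is not a string");
            }
            inputs.put(inputPorts.get(i), (String) args[i]);
        }
        return inputs;
    }

    /** Runs the workflow synchronously and keeps the result for later retrieval. */
    void start(Object[] args) {
        output = launcher.execute(bindInputs(args));
    }

    /** Returns the stored output; fails if the workflow has not run yet. */
    Map<String, String> getWorkflowOutput() {
        if (output == null) {
            throw new IllegalStateException("workflow has not completed");
        }
        return output;
    }

    public static void main(String[] unused) {
        // A toy "workflow" that joins its two inputs into one output.
        Launcher concat = in ->
            Collections.singletonMap("joined", in.get("first") + in.get("second"));
        StartSequenceSketch wf =
            new StartSequenceSketch(Arrays.asList("first", "second"), concat);
        wf.start(new Object[] {"ACGT", "TTGA"});
        System.out.println(wf.getWorkflowOutput());
    }
}
```

Because the invocation is synchronous, the output is available as soon as `start` returns; a call to `getWorkflowOutput` before that point is rejected rather than returning a partial result.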