Workflow Modeling using Taverna 2.1.2
A Taverna workflow is made up of a combination of:
- Inputs and outputs;
- Activities;
- XML splitters, which aggregate/split the input/output data for the activities;
- Data links; and
- Control links.
This section uses a sample workflow to demonstrate the modeling steps needed to create a workflow in Taverna 2.1.2. The sample workflow is made up of the following items:
- Two activities: queryProject and queryClass
- One input port: cqlClause
- One output port: classInformation
- Several XML splitters to process the input and output data
- A beanshell script activity that transforms the output of queryProject into the input of queryClass
| NOTE: This guide does not discuss workflow concepts in detail; it only describes how to model a workflow for caGrid through the Taverna 2.1.2 interface. Users who need more information on Taverna workflow components are encouraged to review the Taverna Users Manual. |
Intended Purpose of the Sample Workflow
The workflow built in this section performs two steps: Step 1 uses a CQL clause to query the CaDSRDataService for a list of projects in the 'caBIG' context, and Step 2 takes the first project object retrieved in Step 1 and finds all of the UML classes in that project.
| NOTE: CQL is the query language used by caGrid to access its data services. Conceptually it is similar to SQL, which is used to access data from a relational database. For more information regarding CQL, see CQL. |
Because the output of queryProject does not exactly match the input expected by queryClass, we add a beanshell activity that transforms the output of queryProject into properly formatted input for the queryClass activity. We also add several XML splitters, which compose complex XML elements or extract child elements automatically.
In the figures for this example, each type of workflow item is shown in a different color: XML splitters are purple, activities (which represent caGrid services) are green, beanshell activities are yellow, and input/output ports are blue.
Adding an Activity
The first step for creating our sample workflow is to add a new activity into an empty workflow. Here we assume users have retrieved the services to be added, as described in the previous section.
To add an activity to a workflow:
- In the Taverna Workbench window, under the Available services node, double-click the service you want to query to open the list of available operations for that service.
- Find the operation you want to use in your workflow, and click-and-drag the operation into the Workflow diagram pane (right side) of the Taverna Workbench window.
Once an operation appears in the Workflow diagram pane, you can click the Display all processor ports button at the top of the diagram pane to see all of the input and output ports for the activity. You can also right-click the activity in the pane and use the Rename Processor option to give it a more descriptive name. In our example, we renamed the query activity to queryProject.

The figure above shows the queryProject activity added to the workflow, with Display all processor ports selected.
As the figure shows, the queryProject activity comes from a WSRF service, which means there is an additional EndPointReference port.
| NOTE: First-time users of the workflow functionality can ignore this additional port; it will not influence the invocation of this operation. |
Adding an XML Splitter
In Taverna, you can provide the XML data required by WSDL services directly, but some XML data elements are verbose and tedious to write by hand.
Taverna provides XML splitters, which interrogate the data structure and present its internal data elements to the user. One XML splitter resolves the XML data structure at a single level, so multiple splitters may be needed if the XML data contains nested complex types.
For example, the parameters element of the queryProject activity's input contains a <CQLQuery> node nested below it. By adding two consecutive XML splitters to the input port of the queryProject activity, the user can enter the value of the <CQLQuery/> element directly.
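The two-level nesting can be pictured with a short Java sketch that wraps a CQL clause first in a cqlQuery element and then in a parameters element, one level per splitter. It is an illustration only, not how Taverna implements XML splitters: the class name SplitterSketch is hypothetical, the wrapper element names are assumed from the port and splitter names used in this workflow (parameters, cqlQuery, CQLQuery), and any namespaces the real service's WSDL types use for the wrapper elements are omitted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration: each call to wrap() adds one level of XML
// structure, the way one input XML splitter resolves one level.
public class SplitterSketch {

    static String wrap(String childXml, String wrapperName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document child = builder.parse(new InputSource(new StringReader(childXml)));

        // Build <wrapperName> (no namespace, by assumption) and copy the child into it.
        Document doc = builder.newDocument();
        Element wrapper = doc.createElementNS(null, wrapperName);
        wrapper.appendChild(doc.importNode(child.getDocumentElement(), true));
        doc.appendChild(wrapper);

        // Serialize the new document back to a string without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // The value a user would type into the innermost CQLQuery port.
        String cql = "<CQLQuery xmlns=\"http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery\">"
                + "<Target name=\"gov.nih.nci.cadsr.umlproject.domain.Project\"/></CQLQuery>";
        // Inner splitter level: CQLQuery inside cqlQuery; outer level: cqlQuery inside parameters.
        String parameters = wrap(wrap(cql, "cqlQuery"), "parameters");
        System.out.println(parameters);
    }
}
```

Running main prints the nested parameters document that, without splitters, the user would have to type in full.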
To add an XML splitter to the workflow:
1. In the Workflow diagram pane, click on the appropriate activity to display the activity details in the Details tab located at the bottom of the left pane of the Workbench window.
| NOTE: The bottom left pane of the Workbench window is also referred to as the "contextual view" as it provides the context for the workflow diagram located in the diagram pane of the window. |
2. Click Add input XML splitter, located at the bottom of the contextual view Details tab.

3. In the Add input XML splitter dialog box that appears, select the input port to which the XML splitter is to be added.

4. Click OK. The XML splitter appears in the Workflow diagram pane of the Workbench window.

In our example, the XML splitter is added to the queryProject activity and appears in the Workflow diagram pane with a data link to the activity. Using the same process, we then added a second XML splitter to the input port of the newly added splitter.

This finishes the addition of activities for Step 1.
Using the same process, we then added a second activity to our workflow by dragging the query operation of the CaDSRDataService into the Workflow diagram pane. We renamed the activity queryClass to better describe its purpose, and then added the same two XML splitters that we added to the queryProject activity.
When finished, our Workflow diagram appears as shown in the figure below.

Next, we want to connect the output of queryProject to the input of queryClass. The overall goal of the workflow is to retrieve a project from CaDSR and then retrieve all of the UML classes in that project. However, the output format of queryProject does not exactly match the input format of queryClass. Taverna provides the beanshell activity so that users can embed a snippet of Java code to perform custom data transformations. A beanshell activity can be added from the Service Templates node under the Available services list, using the same method used to add the caGrid activities above.
In the figure below, a beanshell activity with the name Beanshell is added to the workflow.

Once the beanshell activity is added, we must configure it by adding an input port (in), an output port (out), and the transformation script.
To configure a beanshell activity:
1. Choose the beanshell activity in the workflow diagram and then click Edit beanshell script located at the bottom of the Details tab of the contextual view.

2. In the Workflow Beanshell dialog box that appears, click the Input ports tab to activate it, and enter a Name for the port you are creating. In our example we named the Input port in.

3. Click Apply.
4. Click the Output ports tab to activate it, and enter a Name for the port you are creating. In our example we named the Output port out.

5. Click Apply.
6. Click the Script tab to activate it. This tab allows you to add the Java code necessary to transform data.

7. When finished adding your code, click Apply.
The script we used for our example is provided below. It extracts the project's long name from the queryProject output and builds a CQL query clause that retrieves the class information for that project. You may copy and paste the snippet if desired.
```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// "in" holds the value received on the input port in; the final value of
// "out" is returned on the output port out. On any error, out stays empty.
out = "";
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
try {
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document doc = docBuilder.parse(new InputSource(new StringReader(in)));

    // Select the longName attribute of every caDSR Project in the queryProject result.
    XPath xpath = XPathFactory.newInstance().newXPath();
    String findProjectName =
        "//*[local-name()='Project' and namespace-uri()='gme://caDSR.caDSR/4.0/gov.nih.nci.cadsr.umlproject.domain']/@longName";
    NodeList projectNames = (NodeList) xpath.evaluate(findProjectName, doc.getDocumentElement(), XPathConstants.NODESET);

    if (projectNames.getLength() > 0) {
        // Use only the first project; escape characters that are special inside an XML attribute.
        String longName = projectNames.item(0).getTextContent()
            .replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
        // CQL query: all UMLClassMetadata objects whose associated Project has this longName.
        out = "<CQLQuery xmlns=\"http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery\">"
            + "<Target name=\"gov.nih.nci.cadsr.umlproject.domain.UMLClassMetadata\">"
            + "<Association roleName=\"project\" name=\"gov.nih.nci.cadsr.umlproject.domain.Project\">"
            + "<Attribute name=\"longName\" value=\"" + longName + "\" predicate=\"EQUAL_TO\"/>"
            + "</Association></Target></CQLQuery>";
    }
} catch (ParserConfigurationException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} catch (XPathExpressionException e) {
    e.printStackTrace();
}
```
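To check the transformation outside Taverna before pasting a script into the Script tab, its intended logic (take the first project's longName and build the CQL query for that project's classes) can be wrapped in an ordinary Java method. This is a sketch under stated assumptions: the class name BeanshellTransformCheck and the method transform are hypothetical, and the sample input in main is a simplified, invented stand-in for a queryProject result; the actual caGrid result document is assumed to nest the Project elements more deeply, which the descendant (//) XPath search tolerates.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BeanshellTransformCheck {

    static final String PROJECT_NS = "gme://caDSR.caDSR/4.0/gov.nih.nci.cadsr.umlproject.domain";

    // Intended logic of the beanshell step: first Project's longName -> CQL query for its classes.
    static String transform(String in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(in)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList names = (NodeList) xpath.evaluate(
                "//*[local-name()='Project' and namespace-uri()='" + PROJECT_NS + "']/@longName",
                doc, XPathConstants.NODESET);
        if (names.getLength() == 0) {
            return ""; // no project found: same empty result as the script
        }
        // Escape characters that would break the attribute value.
        String longName = names.item(0).getTextContent()
                .replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
        return "<CQLQuery xmlns=\"http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery\">"
                + "<Target name=\"gov.nih.nci.cadsr.umlproject.domain.UMLClassMetadata\">"
                + "<Association roleName=\"project\" name=\"gov.nih.nci.cadsr.umlproject.domain.Project\">"
                + "<Attribute name=\"longName\" value=\"" + longName + "\" predicate=\"EQUAL_TO\"/>"
                + "</Association></Target></CQLQuery>";
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, simplified queryProject output containing two projects.
        String sample = "<results xmlns:p=\"" + PROJECT_NS + "\">"
                + "<p:Project longName=\"First Project\"/>"
                + "<p:Project longName=\"Second Project\"/>"
                + "</results>";
        System.out.println(transform(sample));
    }
}
```

Running main prints a CQL query whose Attribute value is First Project, so the second project in the sample is ignored.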
Adding a Data Link
Data links connect workflow inputs, activities, and workflow outputs. For example, a data link from activity A to activity B feeds the output of activity A to the input of activity B.
As the XML splitter examples above show, a workflow can contain many data links. Taverna adds data links automatically when you add XML splitters to activities, but you can also add data links manually.
For example, to feed the output of queryProject (port parameters) to the input of Beanshell (port in), draw a line between the two ports to add a data link between them.

Similarly, we can add another data link from the output of Beanshell (port out) to the input of queryClass_cqlQuery (port CQLQuery). The figure below shows the completed links. These two data links connect the Beanshell activity to the other activities in the workflow.

Adding a Control Link
Control links represent the control flow between activities. They let you set the order in which activities that have no data dependency between them are executed: the target activity of a control link cannot start until its source activity completes.
To add a control link:
- Right-click the activity at the end of the link (the activity that must wait).
- Select Run after from the shortcut menu, and then select the controlling activity from the list.
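The ordering rule behind data and control links can be illustrated with a topological sort: if every link is read as "the source must finish before the target starts", any order produced by Kahn's algorithm respects all links. The sketch below is a hypothetical model, not Taverna's API or scheduler (Taverna can run independent activities concurrently). It omits the XML splitters, and notifyUser is an invented activity with no data links, used only to show the effect of a Run after control link.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LinkOrderSketch {

    // Each link is a {source, target} pair. Data links and control links are
    // treated alike here: the source must finish before the target starts.
    static List<String> executionOrder(List<String> processors, List<String[]> links) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> successors = new LinkedHashMap<>();
        for (String p : processors) {
            inDegree.put(p, 0);
            successors.put(p, new ArrayList<>());
        }
        for (String[] link : links) {
            successors.get(link[0]).add(link[1]);
            inDegree.merge(link[1], 1, Integer::sum);
        }
        // Kahn's algorithm: repeatedly take a processor with no unfinished predecessors.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.poll();
            order.add(p);
            for (String next : successors.get(p)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != processors.size()) {
            throw new IllegalArgumentException("links form a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> processors = Arrays.asList("notifyUser", "queryProject", "Beanshell", "queryClass");
        List<String[]> links = new ArrayList<>();
        links.add(new String[] {"queryProject", "Beanshell"}); // data link: parameters -> in
        links.add(new String[] {"Beanshell", "queryClass"});   // data link: out -> CQLQuery (via splitters)
        System.out.println(executionOrder(processors, links));
        // Control link: notifyUser "Run after" queryClass.
        links.add(new String[] {"queryClass", "notifyUser"});
        System.out.println(executionOrder(processors, links));
    }
}
```

The first call prints [notifyUser, queryProject, Beanshell, queryClass], because nothing forces notifyUser to wait; once the control link is added, the second call prints [queryProject, Beanshell, queryClass, notifyUser].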
Adding Input/Output Ports
Workflow input nodes and output nodes define the inputs that a workflow accepts and the outputs that it produces.
To create input and output nodes, right-click anywhere in the Workflow diagram pane to see the Workflow Input port and Workflow Output port selections available on the shortcut menu.

Once the input and output nodes are created, you can connect them to workflow activities or XML splitters by adding data links between them. The next section shows our sample workflow with an input node and an output node added.
The Sample Workflow
When completed, our sample workflow appears as shown in the figure below.

In our sample, we added an input node, cqlClause, through which the user supplies the CQL clause that queries CaDSRDataService for all caBIG-related projects. We also created an output node, classInformation, to hold the class information produced by the workflow.





