Frequently Asked Questions: caBIG-related Questions
What is caBIG?
There is a nice overview here, and you can find out more on the caBIG website.
What caBIG tools are caGrid compatible?
A growing list of tools is maintained on the caBIG website: https://cabig.nci.nih.gov/tools
When trying to create or access service metadata, or information from the caDSR grid service, I get an error like "is not a valid XML character", what is the problem?
You might see something like:
java.lang.IllegalArgumentException: The char '0x19' in 'java.io.IOException: java.lang.IllegalArgumentException: The char '0x19' in 'A human being who may be assigned to a study. The study specifies how the subject?s illness will be treated and/or monitored overtime.' is not a valid XML character.' is not a valid XML character.
[java] org.apache.axis.components.encoding.AbstractXMLEncoder.encode(AbstractXMLEncoder.java:110)
[java] org.apache.axis.utils.XMLUtils.xmlEncodeString(XMLUtils.java:131)
[java] org.apache.axis.AxisFault.dumpToString(AxisFault.java:366)
[java] org.apache.axis.AxisFault.printStackTrace(AxisFault.java:796)
[java] org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:50)
[java] org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:333)
[java] org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:295)
[java] org.apache.log4j.WriterAppender.append(WriterAppender.java:150)
[java] org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:221)
[java] org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:57)
[java] org.apache.log4j.Category.callAppenders(Category.java:187)
[java] org.apache.log4j.Category.forcedLog(Category.java:372)
[java] org.apache.log4j.Category.log(Category.java:864)
[java] org.apache.commons.logging.impl.Log4JLogger.error(Log4JLogger.java:192)
[java] org.apache.axis.Message.writeTo(Message.java:525)
[java] org.apache.axis.transport.http.AxisServlet.sendResponse(AxisServlet.java:895)
[java] org.apache.axis.transport.http.AxisServlet.doPost(AxisServlet.java:767)
[java] javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
[java] org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:327)
[java] javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
[java] at org.apache.axis.components.encoding.AbstractXMLEncoder.encode(AbstractXMLEncoder.java:110)
[java] at org.apache.axis.utils.XMLUtils.xmlEncodeString(XMLUtils.java:131)
[java] at org.apache.axis.utils.DOM2Writer.normalize(DOM2Writer.java:344)
[java] at org.apache.axis.utils.DOM2Writer.print(DOM2Writer.java:246)
[java] at org.apache.axis.utils.DOM2Writer.print(DOM2Writer.java:208)
[java] at org.apache.axis.utils.DOM2Writer.serializeAsXML(DOM2Writer.java:77)
[java] at org.apache.axis.utils.DOM2Writer.serializeAsXML(DOM2Writer.java:60)
[java] at org.apache.axis.utils.DOM2Writer.nodeToString(DOM2Writer.java:49)
[java] at org.apache.axis.utils.XMLUtils.privateElementToString(XMLUtils.java:426)
[java] at org.apache.axis.utils.XMLUtils.ElementToString(XMLUtils.java:435)
[java] at org.apache.axis.utils.XMLUtils.getInnerXMLString(XMLUtils.java:535)
[java] at org.apache.axis.AxisFault.dumpToString(AxisFault.java:384)
[java] at org.apache.axis.AxisFault.printStackTrace(AxisFault.java:785)
[java] at java.lang.Throwable.printStackTrace(Throwable.java:451)
[java] at gov.nih.nci.cagrid.cadsr.portal.discovery.CaDSRTypeDiscoveryComponent$1.run(CaDSRTypeDiscoveryComponent.java:199)
This is caused by characters in information stored in the caDSR that are not valid (in the encodings required by the WS-I profile) to be passed as XML on the grid. Most often the problematic characters are "smart quotes" introduced by MS Word when editing descriptions. The easiest solution is to identify and remove the characters in the caDSR. Below is a program created by Satish Patel that can help identify them. The caDSR team is putting checks in place to identify these characters before they are introduced.
import gov.nih.nci.cadsr.umlproject.domain.Project;
import gov.nih.nci.cadsr.umlproject.domain.UMLAttributeMetadata;
import gov.nih.nci.cadsr.umlproject.domain.UMLClassMetadata;
import gov.nih.nci.system.applicationservice.ApplicationService;

import java.util.Collection;
import java.util.Iterator;

/**
 * @author Satish Patel
 */
public class ModelProblems {

    public static void main(String[] args) {
        try {
            // Project name as specified in the caDSR
            String projectName = "<<project_name>>";
            // Project version as specified in the caDSR
            String projectVersion = "<<project_version>>";
            // Buffer to hold the problems
            StringBuilder sb = new StringBuilder();

            ApplicationService appService = ApplicationService
                    .getRemoteInstance("http://cabio.nci.nih.gov/cacore31/http/remoteService");

            Project searchProject = new Project();
            searchProject.setVersion(projectVersion);
            searchProject.setShortName(projectName);
            System.out.println("Creating domain model for project: " + searchProject.getShortName()
                    + " (version:" + searchProject.getVersion() + ")");

            Collection projectCollection = appService.search(Project.class, searchProject);

            // Proceed only if exactly one Project is found in the caDSR
            if (projectCollection.size() != 1) {
                if (projectCollection.size() > 1)
                    System.out.println("Number of projects found in caDSR: " + projectCollection.size());
                else
                    System.out.println("Could not obtain the correct project from caDSR");
                System.exit(-1);
            }

            Project project = (Project) projectCollection.iterator().next();
            System.out.println("Got domain model for project: " + project.getShortName()
                    + " (version:" + project.getVersion() + ")");

            Collection umlClassCollection = project.getUMLClassMetadataCollection();
            System.out.println("Number of classes: " + umlClassCollection.size());

            for (Iterator iter = umlClassCollection.iterator(); iter.hasNext();) {
                UMLClassMetadata classMetadata = (UMLClassMetadata) iter.next();

                // Process the description of the class
                String classDesc = classMetadata.getDescription();
                if (!isUnicodeCompliant(classDesc))
                    sb.append(classMetadata.getName() + " class description: " + classDesc + "\n");

                // Process the attributes of the class
                Collection attrMetadataCollection = classMetadata.getUMLAttributeMetadataCollection();
                for (Iterator attrIterator = attrMetadataCollection.iterator(); attrIterator.hasNext();) {
                    UMLAttributeMetadata attrMetadata = (UMLAttributeMetadata) attrIterator.next();
                    String desc = attrMetadata.getDescription();
                    if (!isUnicodeCompliant(desc))
                        sb.append(" " + attrMetadata.getName() + " attribute description in "
                                + classMetadata.getName() + " class: " + desc + "\n");
                }
            }

            // Print result
            if (sb.length() > 0) {
                System.out.println("---------------------------------");
                System.out.println("Bad characters found in following");
                System.out.println("---------------------------------");
                System.out.println(sb.toString());
            } else {
                System.out.println("------------------------------");
                System.out.println("No Bad characters in the model");
                System.out.println("------------------------------");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Returns false if the string contains a character that Java treats as
    // ignorable in identifiers (most control characters and format characters)
    private static boolean isUnicodeCompliant(String str) {
        if (str != null)
            for (int k = 0; k < str.length(); k++)
                if (Character.isIdentifierIgnorable(str.charAt(k)))
                    return false;
        return true;
    }
}
Alternatively, the caGrid cadsr project contains a graphical tool which can locate such errors and generate reports which can be saved as text files. From the top level directory of caGrid, navigate to projects/cadsr, and execute the command 'ant runCadsrModelProblemFinder'.
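If you only need to check a single description string, the idea can be sketched without any caDSR or caGrid dependencies. The following is a minimal sketch, not the check that Axis or the caDSR team actually uses: it assumes the relevant rule is the XML 1.0 "Char" production, which is a different test from the Character.isIdentifierIgnorable check in the program above. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: find characters that the XML 1.0 "Char" production does not allow. */
final class XmlCharCheck {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char indexes in text that hold an invalid XML 1.0 character. */
    static List<Integer> invalidPositions(String text) {
        List<Integer> positions = new ArrayList<Integer>();
        if (text == null) {
            return positions;
        }
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                positions.add(i);
            }
            // Step over both chars of a surrogate pair
            i += Character.charCount(cp);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Sample text with the 0x19 control character from the error message above
        String description = "The study specifies how the subject\u0019s illness will be treated.";
        for (int pos : invalidPositions(description)) {
            System.out.printf("Invalid XML character 0x%X at index %d%n",
                    (int) description.charAt(pos), pos);
        }
    }
}
```

Reporting the index makes it easier to find the offending character in the caDSR description before editing it out.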
Does the backend of a data service have to be a database? Can it be Excel or SAS files?
There is no requirement for backend databases. However, there is a lot of tooling provided by NCICB and caGrid for taking standard relational databases and gridifying them. Excel or SAS files could be used, but you would have to create the programming logic to query them based upon the object-oriented query language that the grid supports.
What are the minimum API requirements with regard to the backend?
There are no API requirements for the backend database. However, if you are using the provided tooling (caCORE SDK, Introduce), then the backend database should be a relational database (e.g. MySQL or Oracle) with a traditional relational model.
At the Annual caBIG meeting there was discussion of silver compliance-accrediting. Does a grid service have to be accredited? Can we make it, following the rules of silver compliance, but not submitting to review?
There is no requirement that a grid service be silver compatible to be used. However, the production Index Service housed at the NCICB will only allow gold-compatible services to advertise. To accommodate others, there will be a sandbox Index Service that will allow any grid service to advertise.
What NCICB/caBIG resources can be leveraged to assist in the creation of the grid service?
The caCORE SDK provides tooling to take a UML model (defined by XMI), create a database (if one doesn't already exist), generate Java objects, and generate Java APIs. The caCORE SDK is command-line based. The caGrid Introduce toolkit then can be used to create a grid service from the caCORE SDK based system. Introduce is a graphical interface that provides wizard-like functionality for building, configuring, and deploying the service. Details on caCORE SDK Support.
When a user searches across the grid to discover information in the caDSR, what is the query actually looking for; concept codes, public IDs, key words, administered items?
There are currently a number of different implications to what you suggest.
Service advertisement
Services register their models in the caDSR. These models are then exposed by grid services through their service-level metadata. This is published to a central Index Service such that people can discover those services. The domain model is a subset of the information in the caDSR, but includes much of the information, such as CUI, class, property, value domain, etc.
Service discovery
All services on the grid that choose to do so may advertise their metadata (e.g. CDEs, API, etc.). Then, clients (users) can search that metadata to find services to invoke (query). For example, a client can search for all services that expose a particular CDE or portion of a CDE.
Data service querying
Once services are discovered, clients can query the data services. The query is structured based upon class name, property name, and associations. Value domain is not taken into account.
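To make the shape of such a query concrete, here is a minimal sketch that builds a query document from a class name and one property constraint, using only the JDK's XML APIs. The namespace and the Attribute element (name, value, predicate) are taken from the DCQL example later on this page; the CQLQuery and Target element names are assumptions, as are the Java class and method names, so check them against the CQL schema shipped with your caGrid release.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: build a class-plus-property query as XML. */
final class CqlQuerySketch {

    // Namespace as it appears in the DCQL example on this page
    static final String CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery";

    /** Builds a query for one class constrained by one attribute (property). */
    static String buildQuery(String className, String attributeName,
                             String value, String predicate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Assumed element names: CQLQuery (root) and Target (queried class)
            Element query = doc.createElementNS(CQL_NS, "CQLQuery");
            doc.appendChild(query);
            Element target = doc.createElementNS(CQL_NS, "Target");
            target.setAttribute("name", className);
            query.appendChild(target);

            // The property constraint: no value domain, just name/value/predicate
            Element attribute = doc.createElementNS(CQL_NS, "Attribute");
            attribute.setAttribute("name", attributeName);
            attribute.setAttribute("value", value);
            attribute.setAttribute("predicate", predicate);
            target.appendChild(attribute);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the query document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("gov.nih.nci.cabio.domain.Gene", "symbol", "BRC%", "LIKE"));
    }
}
```

Note that the query names only the class and the property; nothing in it refers to a value domain, which matches the description above.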
Federated querying
There is a component in caGrid that performs queries across data services, performing "joins" and aggregating data. CDE equivalence is not taken into account by this service, but tools that invoke the federated query processor can be written to leverage CDE equivalence.
caDSR Service
There is a grid service that allows you to query the caDSR, which is aptly named the caDSR Service. This allows you to extract pretty much all the information from the caDSR. It is based upon the Java client for the caDSR, and exposes more information than is possible to acquire through the service level metadata (a.k.a. discovery).
What is the relationship of the caDSR, EVS, and caGrid metadata and how can I use them?
This is discussed at a high level on the overview page. Additionally, relevant discussion occurred on the caGrid users' mailing list, and is reproduced below:
From: caGrid Users discussion Forum CAGRID_USERS-L@LIST.NIH.GOV
On Behalf Of Shanbhag, Krishnakant (NIH/NCI) [E]
Sent: Friday, January 11, 2008 10:18 AM
To: CAGRID_USERS-L@LIST.NIH.GOV
Subject: Re: The coordinated use of caDSR and EVS
Here's a potential example where the information provided by caGrid Metadata, caDSR and EVS can be used to coordinate a data workflow in a grid environment. There are many more flavors of using it, but this may give you some ideas.
Use Case:
A typical use case enabled by the caGrid infrastructure could involve a bioinformatician wanting to get all information about the gene "Brca1" that is provided by grid-enabled data services published by service providers on the caGrid infrastructure.
To implement the above use case, here is one potential way for the user to get this information programmatically.
Step 1: To search for genes, it is important that the user use computable semantics and not depend on textual search. This is where the EVS Grid service helps. The user would use the EVS Grid service to get the concept code corresponding to the search term "Gene"; see the code below:
EVSGridServiceClient evsClient = new EVSGridServiceClient("http://cagrid-service.nci.nih.gov:8080/wsrf/services/cagrid/EVSGridService");
EVSDescLogicConceptSearchParams dlSearchParams = new EVSDescLogicConceptSearchParams();
dlSearchParams.setVocabularyName("NCI_Thesaurus");
dlSearchParams.setSearchTerm(searchTerm);
dlSearchParams.setLimit(1);
gov.nih.nci.evs.domain.DescLogicConcept[] dlConcept = evsClient.searchDescLogicConcept(dlSearchParams);
Step 2: Once a set of concept codes is obtained from Step 1, the user could use the caGrid metadata service APIs to obtain the list of data services that contain the selected concept code:
DiscoveryClient client = new DiscoveryClient();
EndpointReferenceType[] dsEPRs = client.discoverDataServicesByModelConceptCode(conceptCode);
Step 3: The user may now want to determine the specific object class of interest from the list of selected data services. This involves calling the caGrid Metadata APIs, which internally use the caDSR Grid service. At the end of this step, the user would like to have a list of specific object classes from the various data services that can be further queried for actual data related to genes. This is primarily where the caDSR Grid service becomes useful. The combination of the caGrid Metadata APIs and the caDSR Grid service allows users to determine the specific attributes that may be common between different object classes in different data services that serve the same concept code. This is important because this knowledge essentially allows users to develop what we call joins among multiple data services.
For example, the GridPIR service and the caBIO grid service both have a Gene class. The user may be interested in querying the Gene class in GridPIR for the same genes that are available in the caBIO grid service, and then constraining the result to "Brca1". To do this, a DCQL (Distributed caBIG Query Language) query needs to be written with the appropriate join attributes between the two services. The appropriate join attributes are determined computationally by the "Common Data Element (CDE)" value (i.e. public ID and version number), which can be accessed from the caDSR Grid Service. In the above example, the join attributes for the Gene class in the two services turn out to be "name" and "symbol" respectively. As you can see, the attributes have different names; still, they are computationally joinable because they have the same CDE.
The actual code to enable the above is a little more elaborate than can be written in a few lines. But you may want to check the following APIs:
DomainModelExposedUMLClassCollection umlClassCollection = domainModel.getExposedUMLClassCollection();
UMLClass.getUmlAttributeCollection() ...
Step 4: The user now has information about specific data services, object classes and join keys. The user can now use the Federated Query Processor service to aggregate data from the selected data objects (e.g. Genes in GridPIR and caBIO) or do a distributed object join based on the selected join keys.
The following is an example of DCQL XML that can be generated dynamically from Step 3, using the information in the caGrid Metadata API and the caDSR Grid Service, to do distributed querying.
<?xml version="1.0" encoding="UTF-8"?>
<DCQLQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql Distributed_CQL_schema_2.0.xsd"
    xmlns:cql="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"
    xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <TargetObject name="edu.georgetown.pir.domain.Gene">
    <Group logicRelation="AND">
      <ForeignAssociation targetServiceURL="http://cabio-gridservice.nci.nih.gov:80/wsrf-cabio/services/cagrid/CaBIOSvc">
        <JoinCondition localAttributeName="name" foreignAttributeName="symbol" predicate="EQUAL_TO"/>
        <ForeignObject name="gov.nih.nci.cabio.domain.Gene">
          <Attribute name="symbol" value="BRC%" predicate="LIKE"/>
        </ForeignObject>
      </ForeignAssociation>
      <Attribute name="name" value="BRC%" predicate="LIKE"/>
    </Group>
  </TargetObject>
  <targetServiceURL>http://141.161.25.20:8080/wsrf/services/cagrid/GridPIR</targetServiceURL>
</DCQLQuery>
The user can send the above DCQL to the Federated Query Processor service and get instance data related to the gene "Brca1".
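The CDE-based matching described in Step 3 of the email can be sketched in plain Java. This is only an illustration of the idea, not caGrid code: AttributeInfo is a hypothetical, simplified stand-in for the attribute metadata exposed by a domain model, and the CDE public IDs and versions below are made-up values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: pair up attributes of two classes that are bound to the same CDE. */
final class CdeJoinFinder {

    /** Hypothetical, simplified view of a UML attribute and its CDE. */
    static final class AttributeInfo {
        final String name;
        final long cdePublicId;
        final String cdeVersion;

        AttributeInfo(String name, long cdePublicId, String cdeVersion) {
            this.name = name;
            this.cdePublicId = cdePublicId;
            this.cdeVersion = cdeVersion;
        }

        /** A CDE is identified by its public ID together with its version. */
        String cdeKey() {
            return cdePublicId + "v" + cdeVersion;
        }
    }

    /** Returns "localName=foreignName" for every pair of attributes sharing a CDE. */
    static List<String> findJoinPairs(List<AttributeInfo> local, List<AttributeInfo> foreign) {
        // Index the foreign attributes by CDE key
        Map<String, List<String>> foreignByCde = new HashMap<String, List<String>>();
        for (AttributeInfo f : foreign) {
            List<String> names = foreignByCde.get(f.cdeKey());
            if (names == null) {
                names = new ArrayList<String>();
                foreignByCde.put(f.cdeKey(), names);
            }
            names.add(f.name);
        }
        // Look up each local attribute's CDE in the index
        List<String> pairs = new ArrayList<String>();
        for (AttributeInfo l : local) {
            List<String> matches = foreignByCde.get(l.cdeKey());
            if (matches != null) {
                for (String m : matches) {
                    pairs.add(l.name + "=" + m);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up CDE identifiers, for illustration only
        List<AttributeInfo> gridPirGene = Arrays.asList(
                new AttributeInfo("name", 1000001L, "1.0"),
                new AttributeInfo("id", 1000002L, "1.0"));
        List<AttributeInfo> caBioGene = Arrays.asList(
                new AttributeInfo("symbol", 1000001L, "1.0"),
                new AttributeInfo("clusterId", 1000003L, "1.0"));
        System.out.println(findJoinPairs(gridPirGene, caBioGene));
    }
}
```

Each resulting pair corresponds to the localAttributeName and foreignAttributeName of a JoinCondition in a DCQL query.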
Are grid queries accessing the metadata in the caDSR as part of their query function, or are they accessing metadata that is from the caDSR that is actually stored in applications/data sets?
Advertisement and Discovery access metadata that is extracted from the caDSR at service creation time. It is essentially static once the service is created (though service developers can change it). Queries to the actual services are based upon this metadata.
Is the data type of a data element considered when querying across the grid, or is it just the object and property (Class/attribute) portion that is used in the query?
Datatype can be used during discovery, but not when querying the actual service for data.
Are queries accessing public ids of CDEs, CUIs in data element concepts? Object Classes? Properties?
CUI, object class, and property are all accessible through discovery. Queries for the actual data of the data service are created using the class, property, and associations.
Are queries using keyword searches of terms in CDE names or DEC names?
Discovery generally uses a full match of the names. However, there are ways to perform a full-text search. Queries to the data service use the full name.
Is it possible for us to establish equivalency of Person/Patient/Participant by the use of alternate names for standard CDEs?
Alternate name is not included in the metadata for the service. However, this information is exposed by the caDSR service, so potentially you could query the caDSR service for the actual name and then perform discovery. The caGrid team is investigating whether to expose alternate name in the service-level metadata.
Is a user able to query the grid by a CDE alternate name type (ex. Alternate Name "DICOM Tag" has a value of 0008,0008) and get back all CDEs that have this tag associated with the CDE?
See answer above.
How can I get to Gold compatibility starting from scratch (or from existing system)?
There are basically two aspects to that question. The first is just how to use the tools. The second is meeting the compatibility requirements to actually be classified as "Gold." The tools provided by caGrid can help one create systems which should meet Gold level compatibility, but not every system created by caGrid tools is automatically Gold compatible.
In the simplest case, to use Introduce you really just need XML schemas which describe your data types, and then you can add operations to the service which use those types. Then you can implement the "stubbed" methods Introduce creates for you (you should find sufficient information on this on the wiki and in the release documents for caGrid). However, that's far from a "Gold" service; you basically just have a grid service. Many of the guidelines around Gold compatibility are oriented around both semantic and syntactic interoperability of grid services. To meet those requirements, one must take additional steps to ensure that one has not just a "grid service", but a grid service that is interoperable with the rest of caBIG and that clients of caBIG can appropriately leverage. That is one of the key aspects which differentiates caBIG/caGrid from other grid efforts. We are currently working on version 3 of the compatibility guidelines, which will provide much more clarity and detail around the Gold level. Some of the details are aimed at leveraging harmonized and curated data types, semantically describing your data types, and publishing/registering appropriate metadata. The existing information is here.
We hope to have version 3 out in the near term.
How does the caDSR relate to XML Schemas used by a service?
The caDSR stores the model and its semantics, but doesn't describe how the data moves over the grid as XML. That's what the XML Schemas are for, and why they are registered in the GME. XML Schemas define datatypes in a particular namespace (to avoid naming collisions of elements/types). The particular namespace used by the schema doesn't really matter as long as it's a valid URI and is unique (so as to not be confused with other models). The caDSR team is currently working on registering information in the caDSR which identifies the relevant namespaces and XML types that a given item (Project, Class, etc.) is represented as on the grid, but that is not yet available. Therefore, we currently use a naming convention for namespaces (based on things like the caDSR Project name and Context) which allows us to locate the appropriate schemas for a given Project. If you follow that convention, things like the caDSR types browser in Introduce can automatically locate and extract the necessary schemas for Projects you want to use. If you don't follow the convention, that is fine, but you must manually associate the schemas with the models you are using (until such information is programmatically available from the caDSR).
There is an effort under way to identify a minimal set of caDSR datatypes which should be used to construct all domain models, and to make sure those datatypes have a formal mapping to a representation in XML Schema. The programming language choices for mapping to and from the XML are then completely irrelevant and can vary from developer to developer. For example, the same grid service may be invoked by multiple clients written in multiple programming languages; certainly a Perl client won't actually be using java.lang.Integer. What is important is the registration of the logical structure and semantics, and a formal and consistent mapping to XML Schema.
Is caGrid 0.5 still supported?
No. caGrid 0.5 has been de-supported in favor of caGrid 1.1 (see the dates below). Now that the caGrid 1.1 infrastructure has become fully operational, one of the most important thrusts in caGrid 1.1 has been to implement the first set of policies and procedures created by the caBIG™ Security Working Group. One of the security requirements of caGrid is that both user and host service credentials adhere to the initial Level of Assurance (LOA1).
For instructions on obtaining caGrid grid accounts, see links below:
Additionally, the caGrid security infrastructure enables institutional credential providers to participate in the caGrid trust fabric. Please contact the caBIG™ Security Working Group for guidance on becoming part of the caGrid trust fabric.
caGrid 0.5 De-support: De-support of the caGrid 0.5 "Test bed" infrastructure was announced on October 1, 2007, and completed on January 1, 2008.
caGrid 0.5 User Accounts: Due to LOA1 requirements, it was not practical to migrate caGrid 0.5 user credentials to caGrid 1.1; users must register for a new account on the caGrid 1.1 Dorian Identity Provider.
If you have any questions or problems obtaining these credentials, please do not hesitate to ask for support.
Calling a method on a secure service throws the error: No client transport named 'https' found!
When a call is made to a secure grid service (using https transport), the following error message might be seen:
AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.generalException
faultSubcode:
faultString: No client transport named 'https' found!
faultActor:
faultNode:
faultDetail:
{http://xml.apache.org/axis/}stackTrace:No client transport named 'https' found!
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:170)
at org.apache.axis.client.Call.invokeEngine(Call.java:2727)
at org.apache.axis.client.Call.invoke(Call.java:2710)
at org.apache.axis.client.Call.invoke(Call.java:2386)
at org.apache.axis.client.Call.invoke(Call.java:2309)
at org.apache.axis.client.Call.invoke(Call.java:1766)
at gov.nih.nci.cagrid.data.stubs.bindings.DataServicePortTypeSOAPBindingStub.getServiceSecurityMetadata(DataServicePortTypeSOAPBindingStub.java:1028)
at gov.nih.nci.cagrid.data.client.DataServiceClient.getServiceSecurityMetadata(DataServiceClient.java:129)
at gov.nih.nci.cagrid.introduce.security.client.ServiceSecurityClient.configureStubSecurity(ServiceSecurityClient.java:177)
at gov.nih.nci.cagrid.data.client.DataServiceClient.query(DataServiceClient.java:116)
at org.cagrid.sdk4.example.tutorial.QueryRunner.performQuery(QueryRunner.java:28)
at org.cagrid.sdk4.example.tutorial.QueryRunner.main(QueryRunner.java:50)
The Axis Engine underlying the client API does not have the https transport registered. This can be resolved by adding $GLOBUS_LOCATION to the client's classpath.
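A quick way to check what the client can actually see is to ask the class loader. The sketch below assumes that the https transport is declared in a client-config.wsdd file located in the $GLOBUS_LOCATION directory and picked up by Axis from the classpath; that file name and location are assumptions about a typical Globus installation, and the class and method names are made up for illustration.

```java
import java.net.URL;

/** Sketch: report whether an Axis client configuration file is on the classpath. */
final class AxisClientConfigCheck {

    /** Returns the classpath location of the named resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        return ClassLoader.getSystemResource(resourceName);
    }

    public static void main(String[] args) {
        // Assumption: the https transport is declared in client-config.wsdd
        URL config = locate("client-config.wsdd");
        if (config == null) {
            System.out.println("client-config.wsdd is not on the classpath");
        } else {
            System.out.println("client-config.wsdd found at " + config);
        }
        // Shows whether the environment variable is set for this process
        System.out.println("GLOBUS_LOCATION = " + System.getenv("GLOBUS_LOCATION"));
    }
}
```

Run it with the same classpath as the failing client; if the file is reported as missing, adding the $GLOBUS_LOCATION directory itself (not only the jars beneath it) to the classpath is the first thing to try.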
What are the differences between the caGrid 1.2 domain model XML file and the data that is in the caDSR?
1) Isn't the "Workflow Status" for CDEs included in the XML?
"Workflow status" for a class is not contained in the domain model. However, you can search for the CDE in the caDSR using the provided publicID and version.
2) Also, the names of the classes/attributes are available in the XML, but the OC Names, Property Names, and CDE Names do not appear in the XML. Is that true? Do I need to explicitly query the caDSR to get these details?
All the concept names and IDs are available, but only the UML names are provided as those are what are used for querying. Similar to the above, if you need additional information about the model you would need to look it up in the caDSR. The DomainModel extract you are referring to is used to support the Data Service discovery and query use cases.
3) Also there seems to be some discrepancy in the ordering of concept codes for Property/OC in what appears on caDSR and what appears in the XML. For example, in a domain model file for caTissueCore(1.2), the property Clinical Status has 2 concept codes in order "Status"(0) and "Clinical"(1)
The ordering is ascending from the primary concept (so 0 is the primary, which is consistent with the caDSR).
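As a small illustration of that ordering rule, the sketch below sorts a property's concepts by their order value and treats the one with order 0 as the primary concept. The Concept class is a hypothetical stand-in for the concept entries in the domain model XML, not a caGrid type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Sketch: order concepts ascending from the primary concept (order 0). */
final class ConceptOrdering {

    /** Hypothetical, simplified concept entry: a name and its order value. */
    static final class Concept {
        final String name;
        final int order;

        Concept(String name, int order) {
            this.name = name;
            this.order = order;
        }
    }

    /** Returns the concept names sorted by ascending order, primary concept first. */
    static List<String> primaryFirst(List<Concept> concepts) {
        List<Concept> sorted = new ArrayList<Concept>(concepts);
        Collections.sort(sorted, new Comparator<Concept>() {
            @Override
            public int compare(Concept a, Concept b) {
                return Integer.compare(a.order, b.order);
            }
        });
        List<String> names = new ArrayList<String>();
        for (Concept c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // The "Clinical Status" property from the question above
        List<Concept> clinicalStatus = Arrays.asList(
                new Concept("Clinical", 1),
                new Concept("Status", 0));
        List<String> ordered = primaryFirst(clinicalStatus);
        System.out.println("Primary concept: " + ordered.get(0));
        System.out.println("All concepts in order: " + ordered);
    }
}
```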





