caGrid in Action
Featured Project: caTissue Suite and caBench-to-Bedside (caB2B): Managing and querying for biospecimens on the caGrid
by Rakesh Nagarajan 1, Poornima Govindrao 1, Mukesh Sharma 1, Amy Brink 1, David Mulvihill 1, Sachin Lale 2, Srikanth Adiga 3, Mark Watson 1
1 Washington University in Saint Louis, Alvin J. Siteman Cancer Center
2 Persistent Systems
3 Krishagni Solutions
Human biospecimens are essential for translational biomedical research. They provide materials needed to directly investigate the mechanisms of disease, to identify genes and proteins relevant to disease pathogenesis, to validate biomarkers which can better predict the course of disease, and to develop new and personalized medical therapies. Significant advances in clinical and translational cancer research will depend on a more comprehensive and national-scale approach to the collection, management, storage, and sharing of human biospecimens and biospecimen-related data.
The National Cancer Institute (NCI) has launched the caBIG® (cancer Biomedical Informatics Grid) initiative to accelerate cancer research. Under this initiative, various tools are being built or adapted to collect, analyze, integrate, and disseminate information associated with cancer research and care. The main goal of caBIG® tools is to allow data to be shared in a semantically and syntactically interoperable manner. caTissue Suite is one of the caBIG® tools; it is designed to manage the complexities of biospecimen annotation data and to provide the critical functionality needed to operate in a distributed, multi-biorepository environment. caGrid is the underlying network architecture that connects caBIG® tools across cancer research institutions, allowing research groups to tap into the rich collection of emerging cancer research data while supporting their individual investigations.
caTissue is a web-based, open-source application that follows caBIG® principles, including role-based security, UML-driven architecture, and semantically annotated, reusable data elements that leverage standardized vocabularies and ontologies. At Washington University School of Medicine, we are using caTissue Suite in a full production capacity. A single instance of the system tracks tumor biospecimens in the Siteman Cancer Center Tissue Procurement (Tumor Bank) Facility. To facilitate data sharing within and across institutions, a deidentified, publicly- and caGrid-accessible mirror instance of the system is maintained and updated daily with production data. Biospecimen data are available to applications on the caGrid and can be queried using applications such as caBench-to-Bedside (caB2B) or the caGrid Portal (Figure 1). caTissue Suite can also be extended, through a mechanism called Dynamic Extensions (DE), to capture biospecimen annotations that are not available in the base deployment.
In caTissue Suite v1.1.2, the latest version, previously identified caGrid performance and stability issues were resolved. There were two main causes of poor caGrid query performance. First, Hibernate lazy loading was disabled in many class-to-class associations, so associated objects were loaded along with every object retrieved. Second, the API query filtering business logic retrieved data that it did not need. To resolve these issues, we took advantage of the fact that CQL always returns a single object type (called the target object in CQL) at a time. Even though caTissue was internally retrieving the data for all associated objects from the database, it was returning only the data for the target object to the caGrid. We therefore modified the CQL-to-HQL processor to query explicitly for just the attributes of the target class specified in the CQL, so that none of the associated data is retrieved from the database. This change applies only to CQL queries and not to caCORE API-based queries, where it is desirable to return all associated objects so that users can traverse the associated classes. We also modified the protected health information (PHI) filtration logic by adding extra caches and avoiding unnecessary data retrieval, which led to faster processing of PHI data. The code changes (Figure 2) are local to the caGrid query and caCORE query API functionality (i.e., none of the User Interface business logic code was affected).
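The idea of querying only for the attributes of the CQL target class can be illustrated with a minimal sketch. This is not the caTissue CQL-to-HQL processor: the `CqlTarget` type, the `to_projection_hql` function, and the class and attribute names used (`Specimen`, `id`, `label`, `type`) are hypothetical and chosen only for illustration. The sketch assumes a simplified query with equality predicates and shows how an HQL projection names individual attributes instead of selecting the whole entity, so that associated objects are never loaded.

```python
from dataclasses import dataclass, field


@dataclass
class CqlTarget:
    """Hypothetical, simplified stand-in for a CQL query: one target class,
    the attributes to return, and equality predicates on that class."""
    class_name: str
    attributes: list
    predicates: dict = field(default_factory=dict)


def to_projection_hql(target, alias="t"):
    """Build an HQL string that selects only the target's own attributes.

    Returns the HQL text and a dict of named parameter values. Because the
    select list names scalar attributes rather than the entity itself, the
    ORM has no reason to fetch associated objects.
    """
    names = [target.class_name, *target.attributes, *target.predicates]
    for name in names:
        # Reject anything that is not a plain identifier before it is
        # interpolated into the query text.
        if not name.isidentifier():
            raise ValueError(f"not a valid identifier: {name!r}")

    select_list = ", ".join(f"{alias}.{attr}" for attr in target.attributes)
    hql = f"select {select_list} from {target.class_name} {alias}"

    params = {}
    conditions = []
    for index, (attr, value) in enumerate(target.predicates.items()):
        param = f"p{index}"
        conditions.append(f"{alias}.{attr} = :{param}")
        params[param] = value  # values are bound, never interpolated

    if conditions:
        hql += " where " + " and ".join(conditions)
    return hql, params


if __name__ == "__main__":
    query = CqlTarget("Specimen", ["id", "label"], {"type": "Tissue"})
    print(to_projection_hql(query))
```

An entity query such as `from Specimen t` would instead hand back full objects and, with lazy loading disabled, their associations as well, which is the cost the projection avoids.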
With improved caGrid performance, caTissue Suite can now be used more readily to query for biospecimens across institutions using tools such as caB2B. Cancer Bench to Bedside (caB2B) is an open-source, secure federated query tool that allows users to leverage data under caBIG® through a graphical user interface. Its metadata-based query interface supports searching virtually any caGrid data service, such as caTissue Suite. The overarching goal of caB2B is to allow aggregation of data across multiple instances of the same application or across multiple applications. When new services are introduced on the caGrid, caB2B integrates them for query without additional work. As a result, the caGrid instances of the latest release of caTissue Suite are available for query using caB2B. Users can query multiple caTissue Suite/Core instances using either the caB2B Web Application (Figure 3) or the caB2B Client Application. The caB2B Web Application provides preconfigured keyword and form-based queries based on common use cases. Advanced users may create and execute more customized queries using the Client Application. Its query component offers a diagrammatic view in which the user builds a directed acyclic graph connecting two or more classes of the query. Users may then save the query, execute it, view the results using various graphical components, and save the results as a 'virtual experiment'.
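To make the directed acyclic graph requirement concrete, the following minimal sketch checks whether a set of class-to-class connections forms a valid acyclic query graph. It is not caB2B code: the `is_acyclic` function is hypothetical, the class names in the example are used only for illustration, and the check uses Kahn's algorithm as one common way to detect cycles.

```python
from collections import defaultdict, deque


def is_acyclic(associations):
    """Return True if the directed graph described by (source, target)
    pairs contains no cycle, using Kahn's algorithm."""
    nodes = {node for edge in associations for node in edge}
    indegree = {node: 0 for node in nodes}
    successors = defaultdict(list)
    for source, target in associations:
        successors[source].append(target)
        indegree[target] += 1

    # Start from classes that no other class points to.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node left unvisited sits on, or downstream of, a cycle.
    return visited == len(nodes)


if __name__ == "__main__":
    query_graph = [
        ("Participant", "SpecimenCollectionGroup"),
        ("SpecimenCollectionGroup", "Specimen"),
    ]
    print(is_acyclic(query_graph))
```

A query builder could run a check of this kind each time the user draws a new connection and reject the connection if it would close a loop.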





