caGrid in Action
caTissue Suite and caBench-to-Bedside (caB2B): Managing and querying for biospecimens on the caGrid
by Rakesh Nagarajan 1, Poornima Govindrao 1, Mukesh Sharma 1, Amy Brink 1, David Mulvihill 1, Sachin Lale 2, Srikanth Adiga 3, Mark Watson 1
1 Washington University in Saint Louis, Alvin J. Siteman Cancer Center
2 Persistent Systems
3 Krishagni Solutions
Human biospecimens are essential for translational biomedical research. They provide the materials needed to directly investigate the mechanisms of disease, to identify genes and proteins relevant to disease pathogenesis, to validate biomarkers that can better predict the course of disease, and to develop new and personalized medical therapies. Significant advances in clinical and translational cancer research will depend on a more comprehensive, national-scale approach to the collection, management, storage, and sharing of human biospecimens and biospecimen-related data.
The National Cancer Institute (NCI) has launched the caBIG® (cancer Biomedical Informatics Grid) initiative to accelerate cancer research. Under this initiative, various tools are being built or adapted to collect, analyze, integrate, and disseminate information associated with cancer research and care. The main goal of caBIG® tools is to allow data to be shared in a semantically and syntactically interoperable manner. caTissue Suite is one of the caBIG® tools. It is designed to manage the complexities of biospecimen annotation data and to provide the critical functionality needed to operate in an environment of multiple, distributed biorepositories. caGrid is the underlying network architecture that provides connectivity between caBIG® tools across cancer research institutions, allowing research groups to tap into the rich collection of emerging cancer research data while supporting their individual investigations.
caTissue is a web-based, open-source application that follows caBIG® principles, including role-based security, UML-driven architecture, and semantically annotated, reusable data elements that leverage standardized vocabularies and ontologies. At Washington University School of Medicine, we are using caTissue Suite in a full production capacity. A single instance of the system tracks tumor biospecimens in the Siteman Cancer Center Tissue Procurement (Tumor Bank) Facility. To facilitate data sharing within and across institutions, a deidentified, publicly and caGrid-accessible mirror instance of the system is maintained and updated daily with production data. Biospecimen data are available to applications on the caGrid and can be queried using applications such as caBench-to-Bedside (caB2B) or the caGrid Portal (Figure 1). caTissue Suite can also be extended, through a mechanism called Dynamic Extensions (DE), to capture biospecimen annotations that are not available in the base deployment.
In caTissue Suite v1.1.2, the latest version, previously identified caGrid performance and stability issues were resolved. Poor caGrid query performance had two main causes. First, Hibernate lazy loading was disabled in many class-to-class associations, so associated objects were loaded eagerly, leading to poor data retrieval performance. Second, the API query filtering business logic retrieved data that it did not need. To resolve these issues, we took advantage of the fact that CQL always returns a single object (called the target object in CQL) at a time. Even though caTissue was internally retrieving data from the database for all the associated objects, it was returning only the data for the target object to the caGrid. We therefore modified the CQL-to-HQL processor to query explicitly for just the attributes of the target class specified in the CQL, so that none of the associated data is retrieved from the database. This change applies only to CQL queries and not to caCORE API-based queries, where it was desirable to return all associated objects so that users could traverse the associated classes. We also modified the protected health information (PHI) filtration logic by adding extra caches and avoiding unnecessary data retrieval, which led to faster processing of PHI data. The code changes (Figure 2) are local to the caGrid query and caCORE query API functionality (i.e., none of the User Interface business logic code was affected).
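The difference between retrieving whole entities and querying only the target class attributes can be illustrated with a minimal sketch. It is not the caTissue CQL-to-HQL processor: the `CqlTarget` structure, the function names, and the `Specimen` class with its attributes are hypothetical, chosen only to show the shape of the two kinds of HQL string.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CqlTarget:
    """Simplified, hypothetical stand-in for the target object of a CQL query."""
    class_name: str                    # target class, e.g. "Specimen" (assumed name)
    attributes: List[str]              # scalar attributes of the target class
    filters: Dict[str, str] = field(default_factory=dict)  # attribute -> named parameter


def _where(target: CqlTarget, alias: str) -> str:
    """Build an optional WHERE clause from simple equality filters."""
    if not target.filters:
        return ""
    conditions = " and ".join(
        f"{alias}.{attr} = :{param}" for attr, param in target.filters.items()
    )
    return f" where {conditions}"


def entity_hql(target: CqlTarget, alias: str = "t") -> str:
    """Whole-entity query. With lazy loading disabled on associations,
    a query of this shape also loads the associated objects."""
    return f"from {target.class_name} {alias}" + _where(target, alias)


def projection_hql(target: CqlTarget, alias: str = "t") -> str:
    """Query that selects only the target class attributes, so no
    associated objects need to be read from the database."""
    columns = ", ".join(f"{alias}.{attr}" for attr in target.attributes)
    return f"select {columns} from {target.class_name} {alias}" + _where(target, alias)


if __name__ == "__main__":
    target = CqlTarget("Specimen", ["id", "label", "type"], {"type": "specimenType"})
    print(entity_hql(target))
    print(projection_hql(target))
```

Because a projection of this kind returns plain attribute values rather than managed objects, the caller has to populate the target object from those values itself. That trade-off suits CQL, which returns only the target object, and is less suitable for caCORE API-based queries, where users traverse associations.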





