Infrastructure for managing and analyzing biological networks derived from collections of plant images
Abhiram Das1, Alexander Bucksch1,2, Joshua S. Weitz1,3
1School of Biology, 2School of Interactive Computing and 3School of Physics, Georgia Institute of Technology
Introduction Currently available imaging technologies permit rapid acquisition of high-resolution images of spatial networks in biology, e.g., leaf venation networks, cardiovascular networks, cortical networks, root networks and ant trails. Hence, scientists interested in spatial networks are increasingly analysis-limited rather than data-limited. The challenges in analyzing these data include how to: (i) identify a network from image data (whether 2D, 3D or point clouds); (ii) visualize structural properties of large, complex biological networks with metadata; (iii) use a common analysis framework for different networks in different problem domains; (iv) distribute the results and raw data of spatial network analysis to the community. At present, many systems are available for biological image management [5], but they lack automation and analysis capabilities designed for spatial networks. We therefore bring together a few state-of-the-art tools and technologies to build a web-based platform designed to meet the following objectives.
Objectives
1. Store and manage image data of biological spatial networks.
2. Extract and store networks from image data using existing algorithms.
3. Aggregate, analyze and visualize networks.
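As a rough illustration of objectives 2 and 3, an extracted network can be stored as nodes with image coordinates plus undirected edges, and simple structural properties can be computed from that representation. The following is a minimal sketch in plain Python, not the platform's implementation; the class name `SpatialNetwork`, its methods and the toy coordinates are all hypothetical.

```python
import math
from collections import defaultdict


class SpatialNetwork:
    """Hypothetical container for a network extracted from an image."""

    def __init__(self):
        self.coords = {}                   # node id -> (x, y) in pixels
        self.adjacency = defaultdict(set)  # node id -> neighbouring node ids

    def add_node(self, node_id, x, y):
        self.coords[node_id] = (x, y)

    def add_edge(self, a, b):
        if a not in self.coords or b not in self.coords:
            raise KeyError("both endpoints must be added as nodes first")
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def edges(self):
        """Return each undirected edge once, as sorted (a, b) pairs with a < b."""
        return sorted({(min(a, b), max(a, b))
                       for a in self.adjacency for b in self.adjacency[a]})

    def degree(self, node_id):
        return len(self.adjacency.get(node_id, ()))

    def total_length(self):
        """Sum of Euclidean edge lengths, in the units of the coordinates."""
        return sum(math.dist(self.coords[a], self.coords[b])
                   for a, b in self.edges())

    def junctions(self):
        """Nodes where three or more edges meet (branch points)."""
        return sorted(n for n in self.coords if self.degree(n) >= 3)

    def tips(self):
        """Nodes with exactly one incident edge (e.g. root tips, vein endings)."""
        return sorted(n for n in self.coords if self.degree(n) == 1)


# Toy Y-shaped network: one stem that splits into two branches.
net = SpatialNetwork()
net.add_node(0, 0, 0)
net.add_node(1, 0, 3)
net.add_node(2, 4, 6)
net.add_node(3, -4, 6)
net.add_edge(0, 1)
net.add_edge(1, 2)
net.add_edge(1, 3)

print(net.edges())
print(net.total_length())
print(net.junctions(), net.tips())
```

Keeping coordinates separate from connectivity means the same structure could describe leaf veins, roots or ant trails, which is the kind of common analysis framework the introduction calls for.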
Background Recent advances in imaging techniques have produced large collections of images and metadata in different fields of science. In particular, plant phenomics is generating large datasets that warrant approaches for high-throughput data collection, management, processing and analysis. Quantifying plant phenotypes requires the development of novel informatics tools that span image analysis, trait estimation, workflow design and data integration. A number of software tools have been developed recently to estimate plant traits [1-4]. A central concept unifying many of these tools is that estimates of plant traits are derived from reconstruction of networks. However, significant work remains to (i) develop trait estimation algorithms and workflows suitable for field studies; (ii) integrate trait-estimation with formal data integration methods to enable discovery of trends and patterns spanning multiple experiments.
Goals
1. Build an integrated platform to store heterogeneous plant image data, extract phenotypic data from images, process and analyze the data, and map it to genotype.
2. Design an ontology content model to store heterogeneous and evolving plant phenotypic data.
3. Provide a platform to plug in new image processing modules and allow end users to build their own processing pipelines from existing modules.
4. Analyze and visualize biological networks.
5. Integrate a QTL and GWAS analysis pipeline to map phenotype to genotype.
Challenges
1. The platform should scale to high volumes of image data and metadata, at terabyte scale.
2. It should scale to computation over a large number of data points.
3. It should have intuitive user interfaces.
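Goal 3 above (pluggable image processing modules that end users chain into their own pipelines) could be approached with a simple module registry. The sketch below is only an illustration under stated assumptions: the registry `MODULES`, the `register` decorator, `run_pipeline` and the three toy modules are hypothetical names, and images are represented as nested lists of grey values rather than real image files.

```python
from typing import Callable, Dict, List

# Hypothetical registry: module name -> processing function.
MODULES: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that makes a function available as a named pipeline module."""
    def decorator(func: Callable) -> Callable:
        if name in MODULES:
            raise ValueError(f"module {name!r} is already registered")
        MODULES[name] = func
        return func
    return decorator


@register("threshold")
def threshold(image: List[List[int]], cutoff: int = 128) -> List[List[int]]:
    """Turn a grey-value image into a binary mask (1 = foreground)."""
    return [[1 if px >= cutoff else 0 for px in row] for row in image]


@register("invert")
def invert(mask: List[List[int]]) -> List[List[int]]:
    """Swap foreground and background in a binary mask."""
    return [[1 - px for px in row] for row in mask]


@register("count_foreground")
def count_foreground(mask: List[List[int]]) -> int:
    """Count foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def run_pipeline(step_names: List[str], data):
    """Apply the named modules in order, feeding each output to the next step."""
    for name in step_names:
        data = MODULES[name](data)  # raises KeyError for an unknown module
    return data


# A user-defined pipeline is just an ordered list of module names.
image = [[0, 200, 50], [130, 180, 255]]
bright = run_pipeline(["threshold", "count_foreground"], image)
dark = run_pipeline(["threshold", "invert", "count_foreground"], image)
print(bright, dark)
```

Because a pipeline is plain data (a list of names), it could be stored alongside the images and re-run later, and new modules can be added without changing `run_pipeline`.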
References
1. Le Bot J, et al. (2010) DART: a software to analyse root system architecture and development from captured images. Plant and Soil 326(1-2):261-273.
2. Clark RT, et al. (2011) Three-dimensional root phenotyping with a novel imaging and software platform. Plant Physiology 156(2):455-465.
3. Galkovskyi T, et al. (2012) GiA Roots: Software for the high throughput analysis of plant root system architecture. BMC Plant Biology 12(116).
4. Trachsel S, Kaeppler SM, Brown KM, & Lynch JP (2011) Shovelomics: high throughput phenotyping of maize (Zea mays L.) root architecture in the field. Plant and Soil 341(1-2):75-87.
5. Kvilekval K, Fedorov D, Obara B, Singh A, & Manjunath BS (2010) Bisque: a platform for bioimage analysis and management. Bioinformatics 26(4):544-552.
Poster panel titles: Abstract; Current Platform and Applications for Plant Image Analysis; Proposed Architecture
http://ecotheory.biology.gatech.edu
This work is supported by:
Application 1: Cleared Leaf Image Database
[Figure: Leaf and Root Image Database. Labeled components: Drupal, MySQL, Apache Web Server, PHP, Drupal Core, Custom Modules, Web Services, Standalone Client, Java Web Service Client, Image Processing.]
[Figure: Integrated platform for plant image analysis. Layers: Storage, Processing, Client. Labeled components: iRODS data grid (file system), Islandora, Fedora Commons, Drupal, RDF Repository, Image Processing Workflows, High-throughput computing nodes, Image processing module, Graph Database, Cytoscape Web, Users and Roles, Plant Ontology Content Model, Web Client, iRODS Client.]
Goals (Application 1: Cleared Leaf Image Database)
• Online storage of cleared leaf images and their metadata.
• Managing and sharing image collections with a trusted community.
• Batch upload and download of images.
• Ingesting processed images from third-party image processing software.
Findings (Application 1: Cleared Leaf Image Database)
• Hosted on iPlant and accessible at http://www.clearedleavesdb.org/
• Image collections from the Smithsonian and the University of Western Australia.
Application 2: DIRT – Digging Into Root Traits
System Architecture
Goals
• Online storage and trait computation of plant root images.
• Share root images and computation results with colleagues.
Findings
• Currently being used to investigate the physiological and genetic relevance of computed traits.
• For example, water-stressed monocot roots have many shallow-angle roots.
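To make the shallow-angle finding concrete, a root-angle trait could be computed along the following lines. This is a hedged sketch, not DIRT's actual trait definition: it assumes each root is reduced to a straight segment from a base point to a tip point in image coordinates, measures the angle from the horizontal, and uses a hypothetical 30-degree cutoff for "shallow". The function names and sample coordinates are likewise illustrative.

```python
import math

# Hypothetical cutoff: roots within 30 degrees of horizontal count as "shallow".
SHALLOW_THRESHOLD_DEG = 30.0


def angle_from_horizontal(base, tip):
    """Angle in degrees (0 = horizontal, 90 = vertical) of the base-to-tip segment."""
    dx = tip[0] - base[0]
    dy = tip[1] - base[1]
    if dx == 0 and dy == 0:
        raise ValueError("base and tip must be different points")
    # Absolute values make the result independent of left/right and up/down direction.
    return math.degrees(math.atan2(abs(dy), abs(dx)))


def count_shallow(roots, threshold_deg=SHALLOW_THRESHOLD_DEG):
    """Count roots whose angle from horizontal is below the threshold."""
    return sum(1 for base, tip in roots
               if angle_from_horizontal(base, tip) < threshold_deg)


# Illustrative (base, tip) pairs in pixel coordinates; not measured data.
roots = [
    ((0, 0), (10, 2)),   # nearly horizontal
    ((0, 0), (1, 10)),   # nearly vertical
    ((0, 0), (-8, 3)),   # shallow, pointing left
    ((0, 0), (5, 5)),    # diagonal
]
shallow = count_shallow(roots)
print(shallow, "of", len(roots), "roots are shallow-angle")
```

Comparing such counts between water-stressed and control plants is one way a qualitative observation could be turned into a number, subject to how the angle and the cutoff are actually defined.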
Funding: College of Sciences Research Development Grant; Center for Data Analytics Seed Grant
[Figure panels: Fewer shallow-angle roots; Many shallow-angle roots]