
Infrastructure for managing and analyzing biological networks derived from collections of plant images

Abhiram Das1, Alexander Bucksch1,2, Joshua S. Weitz1,3

1School of Biology, 2School of Interactive Computing, and 3School of Physics, Georgia Institute of Technology

Introduction
Current imaging technologies permit rapid acquisition of high-resolution images of spatial networks in biology, e.g., leaf venation networks, cardiovascular networks, cortical networks, root networks and ant trails. As a result, scientists interested in spatial networks are increasingly analysis-limited rather than data-limited. The challenges in analyzing these data include how to: (i) identify a network from image data (whether 2D, 3D or point clouds); (ii) visualize structural properties of large, complex biological networks together with their metadata; (iii) use a common analysis framework for different networks in different problem domains; and (iv) distribute the results and raw data of spatial network analysis to the community. Many systems are available for biological image management [5], but they lack automation and analysis capabilities designed for spatial networks. We therefore bring together a few state-of-the-art tools and technologies to build a web-based platform designed to meet the following objectives.

Objectives
1. Store and manage image data of biological spatial networks.
2. Extract and store networks from image data using existing algorithms.
3. Aggregate, analyze and visualize networks.
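As a rough illustration of objectives 2 and 3, an extracted network can be held as nodes with 2D coordinates plus undirected edges, from which simple aggregate statistics follow. The sketch below is a minimal example under stated assumptions, not the platform's implementation: the class name `SpatialNetwork`, its methods, and the toy coordinates are all hypothetical.

```python
import math


class SpatialNetwork:
    """Hypothetical container for a network extracted from an image."""

    def __init__(self):
        self.coords = {}     # node id -> (x, y) position, e.g. in pixels
        self.neighbors = {}  # node id -> set of adjacent node ids

    def add_node(self, node, x, y):
        self.coords[node] = (x, y)
        self.neighbors.setdefault(node, set())

    def add_edge(self, a, b):
        # Undirected edge: record adjacency in both directions.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def degree(self, node):
        return len(self.neighbors[node])

    def total_length(self):
        # Sum Euclidean edge lengths, counting each undirected edge once.
        total = 0.0
        for a, adjacent in self.neighbors.items():
            for b in adjacent:
                if a < b:
                    total += math.dist(self.coords[a], self.coords[b])
        return total

    def tips(self):
        # Nodes with exactly one neighbor, e.g. root tips or vein endings.
        return sorted(n for n in self.coords if self.degree(n) == 1)


# Toy network (made-up coordinates): one branch point with three tips.
net = SpatialNetwork()
for node, (x, y) in enumerate([(0, 0), (0, 3), (4, 3), (0, 5)]):
    net.add_node(node, x, y)
for a, b in [(0, 1), (1, 2), (1, 3)]:
    net.add_edge(a, b)
```

Statistics such as `net.total_length()` or `net.tips()` could then be stored alongside the image metadata and aggregated across a collection.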

Background
Recent advances in imaging techniques have produced large collections of images and metadata in different fields of science. In particular, plant phenomics is generating large datasets that call for high-throughput approaches to data collection, management, processing and analysis. Quantifying plant phenotypes requires the development of novel informatics tools that span image analysis, trait estimation, workflow design and data integration. A number of software tools have been developed recently to estimate plant traits [1-4]. A central concept unifying many of these tools is that estimates of plant traits are derived from the reconstruction of networks. However, significant work remains to (i) develop trait estimation algorithms and workflows suitable for field studies; and (ii) integrate trait estimation with formal data integration methods to enable discovery of trends and patterns spanning multiple experiments.

Goals
1. Build an integrated platform to store heterogeneous plant image data, extract phenotypic data from images, process and analyze the data, and map it to genotype.
2. Design an ontology content model to store heterogeneous and evolving plant phenotypic data.
3. Provide a platform to plug in new image processing modules and allow end users to build their own processing pipelines from existing modules.
4. Analyze and visualize biological networks.
5. Integrate a QTL and GWAS analysis pipeline to map phenotype to genotype.

Challenges
1. The platform should scale to high volumes of image data and metadata, at terabyte (TB) scale.
2. It should scale to computations over a large number of data points.
3. It should have intuitive user interfaces.
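Goal 3 above (pluggable image processing modules that users chain into their own pipelines) can be sketched with a simple registry. This is only an illustration of the idea, not the platform's design: the `MODULES` registry, the `register` decorator, the two toy modules and the threshold cutoff of 128 are all hypothetical, and images are stood in for by plain lists of pixel rows.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an image: a list of rows of pixel intensities.
Image = List[List[int]]

# Hypothetical registry mapping a module name to a processing function.
MODULES: Dict[str, Callable[[Image], Image]] = {}


def register(name: str):
    """Decorator that plugs a processing function into the registry."""
    def wrap(func: Callable[[Image], Image]) -> Callable[[Image], Image]:
        MODULES[name] = func
        return func
    return wrap


@register("threshold")
def threshold(image: Image) -> Image:
    # Binarize: pixels at or above 128 become 1, others 0 (arbitrary cutoff).
    return [[1 if p >= 128 else 0 for p in row] for row in image]


@register("invert")
def invert(image: Image) -> Image:
    # Swap foreground and background of a binary image.
    return [[1 - p for p in row] for row in image]


def run_pipeline(image: Image, steps: List[str]) -> Image:
    """Apply registered modules in the order chosen by the user."""
    for name in steps:
        if name not in MODULES:
            raise KeyError(f"unknown module: {name}")
        image = MODULES[name](image)
    return image


# A user-defined pipeline is just an ordered list of module names.
result = run_pipeline([[0, 200], [130, 10]], ["threshold", "invert"])
```

Keeping the pipeline as a list of names means a new module only needs to be registered once before it becomes available to every user-built workflow.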

References
1. Le Bot J, et al. (2010) DART: a software to analyse root system architecture and development from captured images. Plant and Soil 326(1-2):261-273.
2. Clark RT, et al. (2011) Three-dimensional root phenotyping with a novel imaging and software platform. Plant Physiology 156(2):455-465.
3. Galkovskyi T, et al. (2012) GiA Roots: Software for the high throughput analysis of plant root system architecture. BMC Plant Biology 12(116).
4. Trachsel S, Kaeppler SM, Brown KM, & Lynch JP (2011) Shovelomics: high throughput phenotyping of maize (Zea mays L.) root architecture in the field. Plant and Soil 341(1-2):75-87.
5. Kvilekval K, Fedorov D, Obara B, Singh A, & Manjunath BS (2010) Bisque: a platform for bioimage analysis and management. Bioinformatics 26(4):544-552.

Poster panels: Abstract; Current Platform and Applications for Plant Image Analysis; Proposed Architecture

http://ecotheory.biology.gatech.edu


This work is supported by:

Application 1: Cleared Leaf Image Database

Leaf and Root Image Database

[Architecture diagram, labeled components: Drupal, MySQL, Apache Web Server, PHP, Drupal Core, Custom Modules, Web Services, Standalone Client, Java Web Service Client, Image Processing]

Integrated platform for plant image analysis

[Architecture diagram. Layers: Storage, Processing, Client. Labeled components: iRODS data grid (File System), Islandora, Fedora Commons, Drupal, RDF Repository, Image Processing Workflows, High throughput computing nodes, Image processing module, Graph Database, Cytoscape web, User and Roles, Plant Ontology Content Model, Web Client, iRODS Client]

Application 2: DIRT – Digging Into Root Traits

Goals (Cleared Leaf Image Database)
•  Online storage of cleared leaf images and their metadata.
•  Managing and sharing image collections with a trusted community.
•  Batch upload and download of images.
•  Ingesting processed images from third-party image processing software.

Findings (Cleared Leaf Image Database)
•  Hosted on iPlant and accessible at http://www.clearedleavesdb.org/
•  Image collections from the Smithsonian and the University of Western Australia.

System Architecture

Goals (DIRT)
•  Online storage and trait computation for plant root images.
•  Sharing root images and computation results with colleagues.

Findings (DIRT)
•  Currently being used to investigate the physiological and genetic relevance of computed traits.
•  For example, water-stressed monocot roots have many shallow-angle roots.

College of Sciences Research Development Grant

Center for Data Analytics Seed Grant

[Figure panels: Fewer shallow-angle roots; Many shallow-angle roots]