UCSB Bio-imaging Infrastructure

Center for Bioimaging Informatics, www.bioimage.ucsb.edu. UCSB Bio-imaging Infrastructure, December 2006. Supported by NSF ITR #0331697.


Transcript of UCSB Bio-imaging Infrastructure

Page 1: UCSB Bio-imaging Infrastructure

Center for Bioimaging Informatics

UCSB Bio-imaging Infrastructure

December 2006

www.bioimage.ucsb.edu

Supported by NSF ITR #0331697

Page 2: UCSB Bio-imaging Infrastructure

Projects

• Bisque/OME
  – In use at UCSB
• Bisquik
  – Next generation system (in construction)

Page 3: UCSB Bio-imaging Infrastructure

Motivation

• Analysis creates knowledge
  – Image analysis
  – Querying
  – Mining
• Available storage systems are growing
• Available processing is increasing

Page 4: UCSB Bio-imaging Infrastructure

Challenges

• Diverse users
  – UCSB Neurosciences Institute
    • Retinal detachment: Fischer et al.
    • Microtubule dynamics: Feinstein and Wilson
  – CMU Murphy Lab
  – Flybase
  – Computable Plant
• Dataset challenges
  – Multiple types and collection techniques
  – Complex images (5D) and metadata
  – Acquisition and management of large sets
  – Security of private data
  – Sharing of experimental data

Page 5: UCSB Bio-imaging Infrastructure

Challenges

• Metadata
  – Collection
  – Organization
  – Sharing/interpretation
• Dataset management
  – Personal collections
  – Multiple organizations
  – Privacy and security
• Analysis design
• Analysis integration

Page 6: UCSB Bio-imaging Infrastructure

Block System diagram

[Diagram blocks: Image and metadata capture; DB Storage; Image Analysis; Collection Analysis; Semantic Analysis; Browse/Search (content + metadata); Interactive Enhancement; Knowledge Discovery; Ground truth Collection]

Page 7: UCSB Bio-imaging Infrastructure

BISQUE: Bio-Image Semantic Query User Environment

• Current dataset collection
• BISQUE functionality
  – Data management
  – Ground truth acquisition
  – Analysis
• Architecture

Page 8: UCSB Bio-imaging Infrastructure

Current collections

• UCSB Retinal Objects (confocal, EM)
• UCSB Microtubule Objects (light, AFM)

Type                 Current   Backlog   Rate/y   Expected 4 Yrs   2D Images
Retinal EM               500    22,000      500           23,000         23K
Retinal confocal P     3,000       600    2,400           10,000         10K
Retinal confocal Z       600    14,000   12,000           10,000         50K
Microtubule light      3,000     2,500    2,500           13,000        400K
Microtubule AFM            0       500    1,200            5,000         30K
Flybase                    0      125K        0             125K        125K

• Total estimated size in TBs and growing for 1 lab
• Complexity and analysis are the main issues

Page 9: UCSB Bio-imaging Infrastructure

Data management capabilities

• Digital notebook
  – Direct import process
  – Reconfigurable (confocal, 4 x microtubule, etc.)
• Web
  – Web access and browsing
  – Organize images and metadata
  – Data sharing environment
  – Search by metadata or content
  – Integrated analysis

Page 10: UCSB Bio-imaging Infrastructure

Screenshots (Digital Notebook)

Page 11: UCSB Bio-imaging Infrastructure

Data management capabilities

• Digital notebook
  – Capture experimental/image parameters
  – Direct import process
  – Reconfigurable (confocal, 4 x microtubule, etc.)
• Web
  – Web access and browsing
  – Organize images and metadata
  – Data sharing environment
  – Search by metadata or content
  – Integrated analysis

Page 12: UCSB Bio-imaging Infrastructure

Screenshots (browsing)

Dataset Browsing

Page 13: UCSB Bio-imaging Infrastructure

Screenshots (search)

Page 14: UCSB Bio-imaging Infrastructure

Screenshots (search)

Page 15: UCSB Bio-imaging Infrastructure

Screenshots (search similar)

Page 16: UCSB Bio-imaging Infrastructure

Screenshots (search similar)

Page 17: UCSB Bio-imaging Infrastructure

Screenshots (personal collection)

Data sharing

Page 18: UCSB Bio-imaging Infrastructure

Screenshots (5D Viewer)

Page 19: UCSB Bio-imaging Infrastructure

Screenshot (5D with tracks)

Page 20: UCSB Bio-imaging Infrastructure

Image analysis

• Integrated image analysis
  – Image enhancement
  – Cell counter (ImageJ)
  – Quantify microtubule dynamics
  – Microtubule tracker (manual, automatic)
  – Track identification
• Integration in progress
  – Segmentation and classification
  – Modeling microtubule dynamics
  – Relevance feedback improving search

Page 21: UCSB Bio-imaging Infrastructure

Screenshots (Image Analysis)

ImageJ Cell counter

Page 22: UCSB Bio-imaging Infrastructure

System description overview

• Current dataset collection
• Current functionality
  – Data management
  – Ground truth acquisition
  – Analysis
• Architecture

Page 23: UCSB Bio-imaging Infrastructure

Hardware/software infrastructure

• Hardware
  – 16-node cluster
    • Dual Intel Xeon 3 GHz
    • Gigabit network switch
    • 2 TB storage
• Software
  – Bisque
  – OME (Apache, PostgreSQL)
  – Linux

Page 24: UCSB Bio-imaging Infrastructure

Bisque Architecture: Bio-Image Semantic Query User Environment

[Architecture diagram. Recoverable labels: Image/metadata server; Web; Digital Notebook; image search; external and internal access; BISQUE; XML; metadata XML; features; analysis; Cell counter plug-in for ImageJ; Distributed computing cluster; Research in image processing and indexing; Research in biology]

Page 25: UCSB Bio-imaging Infrastructure

OME Base

• Open Microscopy Environment
  – “OME is an open source software project to develop a database-driven system for the quantitative analysis of biological images. OME is a collaborative effort among academic labs and a number of commercial entities”.
• Provides base for image/metadata storage and analysis integration
• Boston: Sorger Lab
• Baltimore: Goldberg Lab
• Dundee: Swedlow Lab
• Madison: LOCI

Page 26: UCSB Bio-imaging Infrastructure

Bisque extensions

• Ongoing extensions with OME
  – Content based search
  – Analysis integration with OME
    • Segmentation
    • Cell counting
    • MT identification and analysis
    • Etc.
  – Front end dataset and analysis support
  – Schema additions for image types and analysis
  – (Uncertainty modeling and queries)

Page 27: UCSB Bio-imaging Infrastructure

Bisque

• Built 1st generation w/ >5000 5-D images
  – Integrated several useful analyses
  – Being used internally
  – External interest
• Immediate future
  – Continued development
    • Analysis integration
    • Dataset integration
  – Remote deployments
  – Integration with other projects
    • Flybase: integration of schema/datasets in progress
    • Computableplant.org

Page 28: UCSB Bio-imaging Infrastructure

Projects

• Bisque/OME
  – In use at UCSB
  – http://hammer.ece.ucsb.edu/bisque
    • guest/bioimage
• Bisquik
  – Next generation system (in construction)

Page 29: UCSB Bio-imaging Infrastructure

Bisque/OME lessons learned

• Getting the “correct” metadata is hard
  – “Correct” may need to change or be reinterpreted
  – SQL schema difficult to change
• Getting the “right” name is difficult
  – Different terms used in different labs make data integration painful
• Analysis integration needs to be easy
  – Researchers balk at learning complex software
• Users expect a rich experience
  – Google, Flickr, etc. have raised the bar

Page 30: UCSB Bio-imaging Infrastructure

New project: Bisquik

• DoughDB: Flexible metadata
• Ontology support
• Rich user experience
  – Web 2.0: Ajax, SVG
  – Web based tools: Digital Notebook (DN), Graphical Annotator
• Programming toolkit
  – Smaller is better
• Distributed data store
• Support for large scale computation
• Expected early 2007

Page 31: UCSB Bio-imaging Infrastructure

Block Components

[Diagram blocks: Blob/Image Server; Metadata Renderer; DoughDB; Metadata Annotation; Analysis Engine; Permission; Ontology; Remote Access; Web UI]

Page 32: UCSB Bio-imaging Infrastructure

Tagging Screen Shot

Page 33: UCSB Bio-imaging Infrastructure

Motivation

• Current metadata model is inflexible
• Adding new experimental images requires:
  – Changes to Digital Notebook
  – Changes to Bisque interface
  – Changes to OME/Postgres
• Shouldn’t this be easier?
  – “Find images tagged with rod-opsin”
  – “Create a region and specify the object”
  – “Add experimental metadata to this dataset”

Page 34: UCSB Bio-imaging Infrastructure

Bisquik requirements

• Add new tag/value pairs to any DB object
  – (Foo, 2)
  – (visible-cell, rod)
• Allow multiple pairs with the same tag
  – (visible-cell, rod)
  – (visible-cell, muller)
• Support fine-grained tag permission/visibility
  – Tags have creators and access control
• Support update semantics and preserve history
  – Timestamp tags
  – No deletes (except under certain conditions)
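The requirements above can be illustrated with a minimal Python sketch. This is not DoughDB code: the names `Pair`, `TagStore`, `retire`, and the reader-set permission model are all assumptions made for illustration. It only shows how tag/value pairs with creators, timestamps, and update-without-delete could fit together.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Pair:
    """One tag/value pair attached to an object (hypothetical layout)."""
    obj_id: int
    tag: str
    value: Any
    creator: str
    readers: frozenset          # users allowed to see this pair (assumed ACL model)
    created: float              # timestamp of creation
    retired: Optional[float] = None  # set instead of deleting the pair


class TagStore:
    def __init__(self):
        self._pairs = []

    def add(self, obj_id, tag, value, creator, readers=()):
        # The creator can always read their own pair.
        pair = Pair(obj_id, tag, value, creator,
                    frozenset(readers) | {creator}, time.time())
        self._pairs.append(pair)
        return pair

    def update(self, pair, new_value):
        # Update semantics: retire the old pair and append a new one,
        # so the earlier value stays in the history.
        pair.retired = time.time()
        return self.add(pair.obj_id, pair.tag, new_value,
                        pair.creator, pair.readers)

    def values(self, obj_id, tag, user):
        # Current values of a tag that this user is allowed to see.
        return [p.value for p in self._pairs
                if p.obj_id == obj_id and p.tag == tag
                and p.retired is None and user in p.readers]

    def history(self, obj_id, tag):
        # Every pair ever recorded for the tag, retired or not.
        return [p for p in self._pairs
                if p.obj_id == obj_id and p.tag == tag]
```

In this sketch, adding (visible-cell, rod) and (visible-cell, muller) to the same object yields two live pairs, and an update leaves the superseded pair in `history`.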

Page 35: UCSB Bio-imaging Infrastructure

Bisquik: DoughDB

[Diagram of objects stored as tag/value pairs. Recoverable labels:
 first object: image OID1; Feature f1 [ … ]; Name GH1020; Foo 2; pixels [OID2 OID3]
 second object: image OID1; data Server://…; Pixel-type raw
 object labels: OID1, OID4, OID2]

Page 36: UCSB Bio-imaging Infrastructure

Bisquik queries

• Find objects (images) with tag “foo”
  – Select a.foo
• Find images with name “GV100”
  – Select a.name = “GV100”
• Find images with cellcount < 100
  – Select a.cellcount < 100
• Find images with a region similar to r1 based on feature f1
  – Select r.image = i and l2(r.f1, r1[ … ]) < 10
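To make the four example queries concrete, here is a rough Python sketch that evaluates them over in-memory dictionaries. It is not the DoughDB query language: the sample objects, the `select` helper, and the feature vectors are invented for illustration, and `l2` is taken to mean Euclidean distance.

```python
import math

# Hypothetical objects: each one is just a bag of tag/value pairs.
images = [
    {"oid": 1, "name": "GV100", "cellcount": 42, "foo": 2},
    {"oid": 2, "name": "GV101", "cellcount": 250},
]
regions = [
    {"image": 1, "f1": [0.0, 1.0, 2.0]},
    {"image": 2, "f1": [30.0, 1.0, 2.0]},
]


def select(items, predicate):
    """Return the items for which the predicate holds."""
    return [item for item in items if predicate(item)]


def l2(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.dist(a, b)


# Select a.foo: objects that carry the tag at all.
with_foo = select(images, lambda a: "foo" in a)

# Select a.name = "GV100"
named = select(images, lambda a: a.get("name") == "GV100")

# Select a.cellcount < 100: objects without the tag do not match.
sparse = select(images, lambda a: "cellcount" in a and a["cellcount"] < 100)

# Select r.image = i and l2(r.f1, r1) < 10
r1 = [0.0, 1.0, 3.0]  # made-up query feature vector
near = {r["image"] for r in regions if l2(r["f1"], r1) < 10}
similar = select(images, lambda i: i["oid"] in near)
```

Because there are no classes or fixed schemas in this model, a missing tag is simply a non-match rather than an error.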

Page 37: UCSB Bio-imaging Infrastructure

DoughDB key features

• No classes or types, only tag/value pairs
  – Open-ended data model
• Pair values have an owner and ACL
• Preserves history of annotations
• SQL-like query language
• Simple keyword queries

Page 38: UCSB Bio-imaging Infrastructure

Bisquik ontology support

• “ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them”
• Unstructured tag/value
  – Great for taggers
  – Unhappy searchers
• Different labs use different terms for the same object.

Page 39: UCSB Bio-imaging Infrastructure

Bisquik ontology support

• Dictionary of terms and relations
• Require (or strongly suggest) that tags and values are defined before use
• Drop into ontology editor when new values and tags are encountered
• Integrated into search system
  – Permit (or offer) ‘alias’, ‘part-of’, ‘related-to’ searches
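One way to picture ontology-aware search is query expansion: a search term is widened with its aliases and with the terms that are part of it before matching tags. The sketch below is only an assumption about how that could work. The dictionaries, the example terms, and the `expand` and `search` functions are hypothetical, and the ‘related-to’ relation is omitted for brevity.

```python
# Hypothetical ontology fragments; the terms are illustrative only.
ALIASES = {"muller": {"mueller", "muller-glia"}}
PART_OF = {
    "rod": "photoreceptor",
    "cone": "photoreceptor",
    "photoreceptor": "retina",
    "muller": "retina",
}


def is_defined(term):
    """True if the term is already in the dictionary of terms."""
    known = set(PART_OF) | set(PART_OF.values()) | set(ALIASES)
    for names in ALIASES.values():
        known |= names
    return term in known


def expand(term):
    """Widen a search term with its aliases and its transitive parts."""
    terms = {term}
    # 'alias' search: follow aliases in both directions.
    terms |= ALIASES.get(term, set())
    for canonical, names in ALIASES.items():
        if term in names:
            terms |= {canonical} | names
    # 'part-of' search: add every term that is (transitively) part of one
    # already collected.
    changed = True
    while changed:
        changed = False
        for child, parent in PART_OF.items():
            if parent in terms and child not in terms:
                terms.add(child)
                changed = True
    return terms


def search(tagged_images, term):
    """Return ids of images whose 'visible-cell' value matches the expansion."""
    wanted = expand(term)
    return [oid for oid, value in tagged_images if value in wanted]
```

A term for which `is_defined` returns False is the case where the interface would drop the user into the ontology editor.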

Page 40: UCSB Bio-imaging Infrastructure

Analysis Engine

• Analysis Components
• Analysis Applications
• Glue language: Python
• Component connection library
  – Memory for single node
  – MPI for cluster based execution
• Execution engine
  – Automatic component placement
  – Resources and efficient communication

Page 41: UCSB Bio-imaging Infrastructure

Analysis Engine

• Interfaced with DoughDB
• All inputs/outputs are Pairs or objects
• Application executions recorded for data provenance
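Since the engine was still in design at the time, the following is only a speculative sketch of the idea: components glued together in Python, connected in memory on a single node, with each execution recorded for provenance. `Component`, `Application`, and the record layout are hypothetical names, and the MPI-based cluster path is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Component:
    """A named analysis step (hypothetical interface)."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class Application:
    """A chain of components connected in memory on a single node."""
    name: str
    components: List[Component]
    provenance: List[dict] = field(default_factory=list)

    def run(self, data):
        for comp in self.components:
            result = comp.func(data)
            # Record what ran on what, so results can be traced later.
            self.provenance.append({
                "application": self.name,
                "component": comp.name,
                "input": data,
                "output": result,
            })
            data = result
        return data


# Toy example: keep intensities above a threshold, then count them.
threshold = Component("threshold", lambda xs: [x for x in xs if x > 5])
counter = Component("count", len)
app = Application("cell-count", [threshold, counter])
cell_count = app.run([3, 9, 12, 1])
```

In a fuller design, the provenance records would presumably be stored as pairs in DoughDB rather than kept in a Python list.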

Page 42: UCSB Bio-imaging Infrastructure

Bisquik interface

• http://oib.ece.ucsb.edu/bisquik
• Some Bisque functionality
• Flickr-like interface for region tagging
• Pair tagging and keyword tagging
• Metadata renderers
  – Textual (tag)
  – Graphical (regions, geometry, graph data)
  – Analysis oriented (histogram)

Page 43: UCSB Bio-imaging Infrastructure

Region Tagging screenshot

Page 44: UCSB Bio-imaging Infrastructure

Bisquik Metadata Annotation

• Unified offline (Digital Notebook) and online manipulation
• Easy to build annotation forms
• Allow “schema” modification “in field”
• Permit annotation templates to be shared between DN and Bisquik
• Graphical geometry annotator

Page 45: UCSB Bio-imaging Infrastructure

Blob Image server

• Extensible server for write-once objects
  – Pixels, features
• Pluggable transforms/operations
  – Thumbnails, slices
  – Pixel transforms (watermarks)
  – Graphical metadata renderers
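A write-once store with pluggable transforms could be sketched as a small registry, as below. The class `BlobServer`, its `register` decorator, and the nested-list image format are assumptions for illustration and do not describe the actual server.

```python
class BlobServer:
    """Toy write-once blob store with named, pluggable transforms."""

    def __init__(self):
        self._blobs = {}
        self._transforms = {}

    def put(self, blob_id, data):
        # Write-once: an existing blob can never be overwritten.
        if blob_id in self._blobs:
            raise ValueError("blob %r already exists (write-once)" % blob_id)
        self._blobs[blob_id] = data

    def register(self, name):
        """Decorator that plugs a transform in under the given name."""
        def decorator(func):
            self._transforms[name] = func
            return func
        return decorator

    def get(self, blob_id, transform=None, **params):
        data = self._blobs[blob_id]
        if transform is None:
            return data
        # Transforms derive a new view; the stored blob is left untouched.
        return self._transforms[transform](data, **params)


server = BlobServer()


@server.register("thumbnail")
def thumbnail(pixels, step=2):
    """Downsample a 2D image (list of rows) by keeping every step-th pixel."""
    return [row[::step] for row in pixels[::step]]


server.put("img", [[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]])
small = server.get("img", transform="thumbnail")
```

Keeping blobs immutable means derived views such as thumbnails can be recomputed or cached freely without invalidation concerns.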

Page 46: UCSB Bio-imaging Infrastructure

Remote Access

• All basic services are web accessible:
  – SOAP and WSDL
  – DoughDB, Image server
• DoughDB pairs have unique 64-bit IDs
  – Split between machine ID/Pair ID
  – All pairs are addressable
  – Query engine processes foreign pairs
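The slide does not say how the 64 bits are divided, so the sketch below assumes a 16-bit machine ID and a 48-bit pair ID purely for illustration. The function names are hypothetical as well.

```python
# Assumed split: the source only states that the 64 bits are shared
# between a machine ID and a pair ID, not how many bits each gets.
MACHINE_BITS = 16
PAIR_BITS = 48
PAIR_MASK = (1 << PAIR_BITS) - 1


def make_id(machine_id, pair_id):
    """Pack a machine ID and a local pair ID into one 64-bit integer."""
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= pair_id <= PAIR_MASK:
        raise ValueError("pair_id out of range")
    return (machine_id << PAIR_BITS) | pair_id


def split_id(global_id):
    """Recover (machine_id, pair_id) from a packed 64-bit ID."""
    return global_id >> PAIR_BITS, global_id & PAIR_MASK


def is_foreign(global_id, local_machine_id):
    """A pair is foreign when it lives on another machine."""
    return split_id(global_id)[0] != local_machine_id
```

With an ID laid out this way, a query engine can tell from the ID alone which machine to ask for a foreign pair.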

Page 47: UCSB Bio-imaging Infrastructure

Bisquik new hardware

• 10 TB disk mirrored array (20 TB)
• Image Server
• Database (backup) server
• 8-16 Query/Analysis nodes

[Diagram: Image Server and DB/Backup Server, each with 10TB; eight Query Index nodes with 1TB each]

Page 48: UCSB Bio-imaging Infrastructure

Status

• Web UI
  – Uploading, tagging, simple searches
• DoughDB
  – Single node storage and queries
• Analysis Engine
  – In design

Page 49: UCSB Bio-imaging Infrastructure

Conclusion

• Bisque Team: August, Jiejun, Melissa, Kris
• Bisquik Team:
  – Interface: August, Jiejun, Jaechok
  – Analysis Engine: Kris, Melissa, Dmitry
  – Ontology: David (summer), Verleen (summer), Kris
  – DoughDB: Kris, Vebjorn

Page 50: UCSB Bio-imaging Infrastructure

Biowall

Page 51: UCSB Bio-imaging Infrastructure

Biowall

• 5x4 1600x1200 displays
  – (8000x4800 pixels)
  – 38 megapixels
• 1 head node + 5 slaves
• Distributed multihead X (X proxy)
• New simplified viewer
  – OpenEV image server
  – Local client
  – Web/Database client
• 3D visualizations
• Working with very large screen areas
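The wall dimensions quoted above follow from simple arithmetic, worked through in the short Python sketch below. The `tile_for` helper is a hypothetical addition showing how a wall coordinate could be mapped to a display. It assumes 5 columns by 4 rows with no gaps between panels, which the slide does not state.

```python
COLS, ROWS = 5, 4               # display grid (assumed: 5 across, 4 down)
TILE_W, TILE_H = 1600, 1200     # resolution of one display

wall_w = COLS * TILE_W          # total width in pixels
wall_h = ROWS * TILE_H          # total height in pixels
megapixels = wall_w * wall_h / 1e6


def tile_for(x, y):
    """Map a wall pixel (x, y) to (column, row, local_x, local_y)."""
    if not (0 <= x < wall_w and 0 <= y < wall_h):
        raise ValueError("pixel outside the wall")
    return x // TILE_W, y // TILE_H, x % TILE_W, y % TILE_H
```

This reproduces the 8000x4800 figure, and 38.4 million pixels rounds to the 38 megapixels on the slide.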