
ANRV376-BB38-16 ARI 27 March 2009 11:9

Bioimage Informatics for Experimental Biology*

Jason R. Swedlow,1 Ilya G. Goldberg,2

Kevin W. Eliceiri,3 and the OME Consortium4

1Wellcome Trust Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom; email: [email protected]
2Image Informatics and Computational Biology Unit, Laboratory of Genetics, National Institute on Aging, IRP, NIH Biomedical Research Center, Baltimore, MD 21224; email: [email protected]
3Laboratory for Optical and Computational Instrumentation, University of Wisconsin at Madison, Madison, Wisconsin 53706; email: [email protected]
4http://openmicroscopy.org/site/about/development-teams

Annu. Rev. Biophys. 2009. 38:327–46

The Annual Review of Biophysics is online at biophys.annualreviews.org

This article’s doi: 10.1146/annurev.biophys.050708.133641

Copyright © 2009 by Annual Reviews. All rights reserved

1936-122X/09/0609-0327$20.00

*The U.S. Government has the right to retain a nonexclusive, royalty-free license in and to any copyright covering this paper.

Key Words

microscopy, file formats, image management, image analysis, image processing

Abstract

Over the past twenty years there have been great advances in light microscopy, with the result that multidimensional imaging has driven a revolution in modern biology. The development of new approaches to data acquisition is reported frequently, and yet the significant data management and analysis challenges presented by these new complex datasets remain largely unsolved. As in the well-developed field of genome bioinformatics, central repositories are and will be key resources, but there is a critical need for informatics tools in individual laboratories to help manage, share, visualize, and analyze image data. In this article we present the recent efforts by the bioimage informatics community to tackle these challenges, and discuss our own vision for future development of bioimage informatics solutions.



Contents

INTRODUCTION
FLEXIBLE INFORMATICS FOR EXPERIMENTAL BIOLOGY
  Proprietary File Formats
  Experimental Protocols
  Image Result Management
  Remote Image Access
  Image Processing and Analysis
  Distributed Processing
  Image Data and Interoperability
TOWARDS BIOIMAGE INFORMATICS
BUILDING BY AND FOR THE COMMUNITY
DELIVERING ON THE PROMISE: STANDARDIZED FILE FORMATS VERSUS “JUST PUT IT IN A DATABASE”
OME: A COMMUNITY-BASED EFFORT TO DEVELOP IMAGE INFORMATICS TOOLS
THE FOUNDATION: THE OME DATA MODEL
OME FILE FORMATS
SUPPORT FOR DATA TRANSLATION—BIO-FORMATS
  Utility
  Modularity
  Flexibility
  Extensibility
DATA MANAGEMENT APPLICATIONS: OME AND OMERO
OMERO ENHANCEMENTS: BETA3 AND BETA4
  OMERO.blitz
  Structured Annotations
  OMERO.search
  OMERO.java
  OMERO.editor
  OMERO.web
  OMERO.scripts
  OMERO.fs
WORKFLOW-BASED DATA ANALYSIS: WND-CHARM
USABILITY
SUMMARY AND FUTURE IMPACT

HCS: high content screening

INTRODUCTION

Modern imaging systems have enabled a new kind of discovery in cellular and developmental biology. With spatial resolutions running from millimeters to nanometers, analysis of cell and molecular structure and dynamics is now routinely possible across a range of biological systems. The development of fluorescent reporters, most notably in the form of genetically encoded fluorescent proteins (FPs), combined with increasingly sophisticated imaging systems has enabled direct study of molecular structure and dynamics (6, 52). Cell and tissue imaging assays have scaled to include all three spatial dimensions, a temporal component, and the use of spectral separation to measure multiple molecules, such that a single image is now a five-dimensional structure—space, time, and channel. High content screening (HCS) and fluorescence lifetime, polarization, and correlation are all examples of new modalities that further increase the complexity of the modern microscopy dataset. However, multidimensional data acquisition generates a significant data problem: a typical four-year project generates hundreds of gigabytes of images, perhaps on many different proprietary data acquisition systems, making hypothesis-driven research dependent on data management, visualization, and analysis.
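To get a sense of the scale involved, the raw storage for a single 5D acquisition follows directly from its dimensions. The figures below are illustrative choices, not values from the article:

```python
# Illustrative estimate of raw storage for one 5D acquisition.
# All dimension values here are hypothetical but typical in magnitude.
x, y = 512, 512          # lateral pixels per plane
z = 30                   # focal planes (Z)
c = 3                    # spectral channels (C)
t = 200                  # time points (T)
bytes_per_pixel = 2      # 16-bit camera

size_bytes = x * y * z * c * t * bytes_per_pixel
size_gb = size_bytes / 1e9
print(f"{size_gb:.1f} GB per acquisition")  # prints "9.4 GB per acquisition"
```

A few dozen such acquisitions per experiment, repeated over the life of a project, readily reaches the hundreds of gigabytes described above.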

Bioinformatics is a mature science that forms the cornerstone of much of modern biology. Modern biologists routinely use genomic databases to inform their experiments.



In fact these databases are well-crafted, multilayered applications that include defined data structures and application programming interfaces (APIs), and that use standardized user interfaces to enable querying, browsing, and visualization of the underlying genome sequences. These facilities serve as a great model of the sophistication necessary to deliver complex, heterogeneous datasets to bench biologists. However, most genomic resources work on the basis of defined data structures with defined formats and known identifiers that all applications can access (they also employ expert staff to monitor systems and databases, a resource that is rarely available in individual laboratories). There is no single agreed data format, but a defined number of formats are used, depending on the exact application (e.g., FASTA and EMBL files). These files are accessed through a number of defined software libraries that translate data into defined data structures that can be used for further analysis and visualization. Because a relatively small number of sequence data generation and collation centers exist, standards have been relatively easy to declare and support. Nonetheless, a key to the successful use of these data was the development of software applications, designed for use by bench biologists as well as specialist bioinformaticists, that enabled querying and discovery based on genomic data held by and served from central data resources.

Application programming interface (API): an interface providing one software program or library easy access to another’s functionality without requiring full knowledge of the underlying code or data structures

Given this paradigm, the same facility should in principle be available for all biological imaging data (as well as proteomics and, soon, deep sequencing). In contrast to centralized genomics resources, in most cases these methods are used for defined experiments in individual laboratories or facilities, and the number of image datasets recorded by a single postdoctoral fellow (hundreds to thousands) can easily rival the number of genomes that have been sequenced to date. For the continued development and application of experimental biology imaging methods, it will be necessary to invest in and develop informatics resources that provide solutions for individual laboratories and departmental facilities. Is it possible to deliver flexible, powerful, and usable informatics tools to manage a single laboratory’s data that are comparable to those used to deliver genomic sequence applications and databases to the whole community? Why can’t the tools used in genomics be immediately adapted to imaging? Are image informatics tools from other fields appropriate for biological microscopy? In this article, we address these questions, discuss the requirements for successful image informatics solutions for biological microscopy, and consider the future directions that these applications must take to deliver effective solutions for biological microscopy.

FLEXIBLE INFORMATICS FOR EXPERIMENTAL BIOLOGY

Experimental imaging data are by their very nature heterogeneous and dynamic. The challenge is to capture the evolving nature of an experiment in data structures that are specifically typed and static, for later recall, analysis, and comparison. Achieving this goal in imaging applications means solving a number of problems.

Proprietary File Formats

Proprietary file formats (PFFs): image file data formats defined and used by individual entities

There are over 50 different proprietary file formats (PFFs) used in commercial and academic image acquisition software packages for light microscopy (34). This number significantly increases if electron microscopy, new HCS systems, tissue imaging systems, and other new imaging modalities are included. Regardless of the specific application, almost all store data in their own PFFs. Each of these formats includes the binary data (i.e., the values in the pixels) and the metadata (i.e., the data that describe the binary data). Metadata include physical pixel sizes, time stamps, spectral ranges, and any other measurements or values required to fully define the binary data. Because of the heterogeneity of microscope imaging experiments, there is no agreed-upon community specification for a minimal set of metadata (see below). Regardless, the binary data and metadata combined form the full output of the microscope imaging system, and each software application must contend with the diversity of PFFs and continually update its support for changing formats.

Experimental Protocols

Sample preparation, data acquisition methods and parameters, and analysis workflows all evolve during the course of a project, and there are invariably differences in approach even between laboratories doing similar work. This evolution reflects the natural progression of scientific discovery. Recording this evolution (e.g., “What exposure time did I use in the experiment last Wednesday?”) and providing flexibility for changing metadata, especially when new metadata must be supported, are critical requirements for any experimental data management system.

Image Result Management

Many experiments use only a single microscope, but the visualization and analysis of image data associated with a single experiment can generate many additional derived files of varying formats. Typically, these are stored on a hard disk using arbitrary directory structures. Thus an experimental result typically reflects the compilation of many different images, recorded across multiple runs of an experiment, and the associated processed images, analysis outputs, and result spreadsheets. Keeping these disparate data linked so that they can be recalled and examined at a later time is a critical requirement and a significant challenge.

Remote Image Access

Image visualization requires significant computational resources. Many commercial image-processing tools use specific graphics processing (GPU) hardware (and thus depend on the accompanying driver libraries). Moreover, they often do not work well when analyzing data across a network connection to data stored on a remote file system. As work patterns move to wireless connections and more types of portable devices, remote access to image visualization tools, coupled with the ability to access and run powerful analysis and processing, will be required.

Image Processing and Analysis

Substantial effort has gone into the development of sophisticated image processing and analysis tools. In genome informatics, the linkage of related but distinct resources [e.g., WormBase (48) and FlyBase (13)] is possible due to the availability of defined interfaces that different resources use to provide access to underlying data. This facility is critical for enabling discovery and collaboration—any algorithm developed to ask a specific question should be able to address all available data. This is especially critical as new imaging methods are developed—an existing analysis tool should not be made obsolete just because a new file format has been developed that it does not read. When scaled across the large number of analysis tool developers, this is an unacceptable code maintenance burden.
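The defined-interface idea above can be sketched as a thin abstraction layer: analysis code targets one reader API, and per-format readers plug in behind it. The class and method names below are hypothetical illustrations, not the actual Bio-Formats API:

```python
from abc import ABC, abstractmethod

class ImageReader(ABC):
    """Minimal reader interface: analysis tools program against this,
    never against a specific proprietary file format."""

    @abstractmethod
    def can_read(self, path: str) -> bool: ...

    @abstractmethod
    def read_plane(self, z: int, c: int, t: int) -> bytes: ...

class TiffReader(ImageReader):
    """Hypothetical example reader for one format."""

    def can_read(self, path: str) -> bool:
        return path.lower().endswith((".tif", ".tiff"))

    def read_plane(self, z: int, c: int, t: int) -> bytes:
        return b""  # stub: a real reader would return pixel data

def pick_reader(path, readers):
    """Dispatch to the first registered reader that understands the file."""
    for reader in readers:
        if reader.can_read(path):
            return reader
    raise ValueError(f"no reader for {path}")
```

Supporting a new format then means adding one reader class; every existing analysis tool written against `ImageReader` gains the format for free, which is exactly the maintenance burden the paragraph describes being lifted.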

Distributed Processing

As the sizes and numbers of images increase, access to larger computing facilities will be routinely required by all investigators. Grid-based data processing is now available for specific analyses of genomic data, but the burden of moving many gigabytes of data even for a single experiment means that distributed computing must also be made locally available, at least in a form that allows laboratories and facilities to access their local clusters or to leverage an investment in multi-CPU, multi-core machines.

Image Data and Interoperability

Strategic collaboration is one of the cornerstones of modern science and fundamentally consists of scientists sharing resources and data with each other. Biological imaging is composed of several specialized subdisciplines—experimental image acquisition, image processing, and image data mining. Each requires its own domain of expertise and specialization, which is justified because each presents unsolved technical challenges as well as ongoing scientific research. For a group specializing in image analysis to make the best use of its expertise, it needs access to image data from groups specializing in acquisition. Ideally, these data should concern current research questions and not historical image repositories that may no longer be scientifically relevant. Similarly, groups specializing in data mining and modeling must have access to image data and to results produced by image processing groups. Ultimately, this drives the development of useful tools for the community and certainly results in synergistic collaborations that enhance each group’s advances.

TOWARDS BIOIMAGE INFORMATICS

The delivery of solutions for the problems detailed above requires the development of an emerging field known as bioimage informatics (45), which includes the infrastructure and applications that enable discovery of insight using systematic annotation, visualization, and analysis of large sets of images of biological samples. For applications of bioimage informatics in microscopy, we include HCS, in which images are collected from arrayed samples treated with large sets of siRNAs or small molecules (46), as well as large sets of time-lapse images (26), collections of fixed and stained cells or tissues (10, 18), and even sets of generated localization patterns (59) that define specific collections of localization for reference or for analysis. The development and implementation of successful bioimage informatics tools provide enabling technology for biological discovery in several different ways:

- Management: keeping track of data from large numbers of experiments
- Sharing with defined collaborators: allowing groups of scientists to compare images and analytic tools with one another
- Remote access: the ability to query, analyze, and visualize without having to connect to a specific file system or use specific video hardware on the user’s computer or mobile device
- Interoperability: interfacing of visualization and analysis programs with any set of data, without concern for file format
- Integration of heterogeneous data types: collection of raw data files, analysis results, annotations, and derived figures into a single resource that is easily searchable and browsable

Bioimage informatics: infrastructure including specifications, software, and interfaces to support experimental biological imaging

BUILDING BY AND FOR THE COMMUNITY

Given these requirements, how should an image informatics solution be developed and delivered? It certainly will involve the development, distribution, and support of software tools that must be acceptable to bench biologists and must work with all the existing commercial and academic data acquisition, visualization, and analysis tools. Moreover, it must support a broad range of imaging approaches and, if at all possible, include the newest modalities in light and electron microscopy, support extensions into clinical research familiar with microscopy (e.g., histology and pathology), and provide the possibility of extension into modalities that do not use visible light (MRI, CT, ultrasound). Because many commercial image acquisition and analysis software packages are already established as critical research tools, all design, development, and testing must assume and expect integration and interoperability. It therefore seems prudent to avoid a traditional commercial approach and make this type of effort community led, using open source models that are now well defined. This does not exclude the possibility of successful commercial ventures being formed to provide bioimage informatics solutions to the experimental biological community, but a community-led, open source approach will be best placed to provide interfaces between all existing academic and commercial applications.

DELIVERING ON THE PROMISE: STANDARDIZED FILE FORMATS VERSUS “JUST PUT IT IN A DATABASE”

In our experience, there are a few commonly suggested solutions for biological imaging. The first is a common, open file format for microscope imaging. A number of specifications for file formats have been presented, including our own (2, 14). Widespread adoption of standardized image data formats has been successful in astronomy (FITS), crystallography (PDB), and clinical imaging (DICOM), where either most of the acquisition software is developed by scientists or a small number of commercial manufacturers adopt a standard defined by the imaging community. Biological microscopy is a highly fractured market, with at least 40 independent commercial providers. This, combined with rapidly developing technology platforms acquiring new kinds of data, has stymied efforts at establishing a commonly used data standard.

Against this background, it is worth asking whether defining a standardized format for imaging is at all useful and practical. Standardized file formats and minimum data specifications have the advantage of providing a single or, perhaps more realistically, a small number of data structures for the community to contend with. These facilitate interoperability—visualization and analysis tools developed by one lab may be easily used by another. This is an important step for collaboration and allows data exchange—moving a large multidimensional file from one software application to another, or from one lab or center to another. However, standardized formats satisfy only some of the requirements defined above and provide none of the search, query, remote access, or collaboration facilities discussed above, and thus are only a partial solution. Nevertheless, the expression of a data model in a standardized file format, and especially the development of software that reads and writes that format, is a useful exercise. It tests the modeling concepts, relationships, and requirements (e.g., “If an objective lens is specified, should the numerical aperture be mandatory?”) and provides a relatively easy way for the community to access, use, and comment on the data relationships defined by the project. This is an important component of data modeling and standardization and should not be minimized. Moreover, while not providing most of the functionality defined above, standardized formats have the practical value of providing a medium for the publishing and release of data to the scientific community. Unlike gene sequence and protein structure data, there is no requirement for the release of images associated with published results, but the availability of standardized formats may facilitate this.
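A conditional-requirement rule like the objective-lens example can be made executable, which is part of why writing format-handling software tests a data model. A toy validator, with field names that are ours rather than any published schema’s, might look like:

```python
def validate_objective(meta: dict) -> list:
    """Check one conditional-requirement rule from a hypothetical data model:
    if an objective lens is specified, its numerical aperture is mandatory."""
    errors = []
    objective = meta.get("objective")
    if objective is not None and objective.get("numerical_aperture") is None:
        errors.append("objective specified but numerical_aperture missing")
    return errors

# An image with no objective at all passes; an image that names an
# objective but omits its numerical aperture fails validation.
```

Encoding such rules in a validator forces the modelers to decide, case by case, which metadata are mandatory, which are conditional, and which are optional.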

To provide some of the data management features described above, labs might use any number of commercial database products (e.g., Microsoft Access®, FileMaker Pro®) to build customized local databases on commercial foundations. This is certainly a potential solution for individual laboratories, but to date these local database efforts have not simultaneously dedicated themselves to addressing interoperability, allowing broad support for alternative analysis and visualization tools that were not specifically supported when the database was built. Perhaps most importantly, single-lab efforts often emphasize specific aspects of their own research (e.g., the data model supports individual cell lines, but not yeast or worm strains), and the adaptability necessary to support a range of disciplines across biological research, or even a lab’s own evolving repertoire of methods and experimental systems, is not included.

In no way does this preclude the development of local or targeted bioimage informatics solutions. In genomics, several community-initiated informatics projects focused on specific resources support the various biological model systems (13, 50, 57). It seems likely that similar projects will grow up around specific bioimage informatics projects, following the models of the Allen Brain Atlas, E-MAGE, and the Cell Centered Database (CCDB) (10, 18, 21). In genomics, there is underlying interoperability between specialized sources—ultimately all of the sequence data as well as the specialized annotation exist in common repositories and formats (e.g., GenBank). Common repositories are not yet feasible for multidimensional image data, but there will be value in linking through the gene identifiers themselves, ontological annotations, or perhaps localization maps or sets of phenotypic features, once these are standardized (59). Once these links are made to images stored in common formats, distributed storage may effectively accomplish the same thing as centralized storage.

CCDB: Cell Centered Database

PSLID: Protein Subcellular Location Image Database

OME: Open Microscopy Environment

Several large-scale bioinformatics projects related to interoperability between large biological information datasets have emerged, including caBIG, which focuses on cancer research (7); BIRN, which focuses on neurobiology with a substantial imaging component (4); BioSig (44), which provides tools for large-scale data analysis; and myGrid, which focuses on simulation, workflows, and in silico experiments (25). Projects specifically involved in large-scale imaging infrastructure include the Protein Subcellular Location Image Database (PSLID) (16, 24), Bisque (5), CCDB (9, 21), and our own, the Open Microscopy Environment (OME) (31, 55). All these projects were initiated to support the specific needs of the biological systems and experiments in each of the labs driving the development of each project. For example, studies in neuroscience depend on a proper specification for neuroanatomy so that any image and resulting analysis can be properly oriented with respect to the physiological source. In this case, an ontological framework for neuroanatomy is needed to support and compare the results from many different laboratories (21). A natural progression is a resource that enables sharing of specific images, across many different resolution scales, that are as well defined as possible (9). PSLID is an alternative repository that provides a well-annotated resource for subcellular localization by fluorescence microscopy. In all cases these projects are the result of dedicated, long-term collaboration between computer scientists and biologists, indicating that the challenges presented by this infrastructure development represent the state of the art not only in biology but in computing as well. Many if not most of these projects make use of at least some common software and data models, and although full interoperability is not something that can be claimed today, key members of these projects regularly participate in the same meetings and working groups. In the future, it should be possible for these projects to interoperate to enable, for example, OME software to upload to PSLID or CCDB.

OME: A COMMUNITY-BASED EFFORT TO DEVELOP IMAGE INFORMATICS TOOLS

Since 2000, the Open Microscopy Environment (OME) Consortium has been working to deliver tools for image informatics for biological microscopy. Our original vision (55), to provide software tools to enable interoperability between as many image data storage, analysis, and visualization applications as possible, remains unchanged. However, the project has evolved and grown since its founding to encompass a much broader effort and now includes subprojects dedicated to data modeling (37), file format specification and conversion (34, 35), data management (27), and image-based machine learning (29). The Consortium (28) also maintains links with many academic and commercial partners (32). While the challenges of running and maintaining a larger Consortium are real, the major benefits are the synergies and feedback that develop when our own project has to use its own updates to data models and file formats. Within the Consortium, there is substantial expertise in data modeling and software development, and we have adopted a series of project management tools and practices to make the project as professional as possible, within the limits of working within academic laboratories. Moreover, our efforts occur within the context of our own image-based research activities. We make no pretense that this samples the full range of potential applications for our specifications and software, just that our ideas and work are actively tested and refined before release to the community. Most importantly, the large Consortium means that we can interact with a larger community, gathering requirements and assessing acceptance and new directions from a broad range of scientific applications.

XML: extensible markup language

THE FOUNDATION: THE OME DATA MODEL

Since its inception in 2000, the OME Consortium has dedicated itself to developing a specification for the metadata associated with the acquisition of a microscope image. Initially, our goal was to specify a single data structure that would contain spatial, temporal, and spectral components [often referred to as Z, C, T, which together form a 5D image (1)]. This has evolved into specifications for the other elements of the digital microscope system, including objective lenses, fluorescence filter sets, illumination systems, and detectors. This effort has been greatly aided by many discussions about configurations and specifications with commercial imaging device manufacturers (32). This work is ongoing, with our current focus being the delivery of specifications for regions of interest [based on existing specifications from the geospatial community (42)] and a clear understanding of which data elements are required to properly define a digital microscope image. This process is most efficient when users or developers request updates to the OME data model—the project's Web site (37) accepts requests for new or modified features and fixes.

OME FILE FORMATS

The specification of an open, flexible file format for microscope imaging provides a tool for data exchange between distinct software applications. It is certainly the lowest level of interoperability, but for many situations it suffices by providing readable, well-defined, structured image metadata. OME's first specification cast a full 5D image—binary data and metadata—in an XML (extensible markup language) file (14).

Although conceptually sound, a more pragmatic approach is to store binary data as TIFF images and then link image metadata represented as OME-XML by including it within the TIFF image header or as a separate file (35). To ensure that these formats are in fact well defined, we have delivered an OME-XML and OME-TIFF file validator (36) that developers can use to ensure files follow the OME-XML specification. As of this writing, five commercial companies support these file formats in their software with a "Save as..." option, thus enabling export of image data and metadata to a vendor-neutral format.
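The shape of this metadata block is easy to sketch. The snippet below builds a simplified OME-XML-style fragment for a 5D image; the element and attribute names mirror the real OME-XML Pixels element but are heavily trimmed for illustration (the actual schema carries many more fields for instruments, channels, timestamps, and so on):

```python
import xml.etree.ElementTree as ET

def minimal_ome_xml(size_x, size_y, size_z, size_c, size_t,
                    pixel_type="uint16", dim_order="XYZCT"):
    """Build a simplified OME-XML-style metadata block for a 5D image.

    Illustrative only: the real OME-XML schema is far richer than this
    single Image/Pixels pair.
    """
    image = ET.Element("Image", Name="example")
    ET.SubElement(image, "Pixels", DimensionOrder=dim_order,
                  Type=pixel_type, SizeX=str(size_x), SizeY=str(size_y),
                  SizeZ=str(size_z), SizeC=str(size_c), SizeT=str(size_t))
    return ET.tostring(image, encoding="unicode")

# In an OME-TIFF, a string like this travels in the TIFF ImageDescription
# tag alongside the binary image planes; as pure OME-XML it stands alone.
xml_block = minimal_ome_xml(512, 512, size_z=30, size_c=2, size_t=10)
```

The same block serves both file variants: stored alone it is the OME-XML route, embedded in a TIFF header it is the OME-TIFF route.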

SUPPORT FOR DATA TRANSLATION—BIO-FORMATS

Proprietary file formats (PFFs) are perhaps the most common informatics challenge faced by bench biologists. Despite the OME-XML and OME-TIFF specifications, PFFs will continue to be the dominant source of raw image data for visualization and analysis applications for some time. Because all software must contend with PFFs, the OME Consortium has dedicated its resources to developing a software library that can convert PFFs to a vendor-neutral data structure—OME-XML. This led to the development, release, and continued maintenance of Bio-Formats, a standalone Java library for reading and writing life sciences image formats. The library is general, modular, flexible, extensible, and accessible. The project originally grew out of efforts to add support for file formats to the LOCI VisBio software (40, 49) for visualization and analysis of multidimensional image data, when we realized that the community was in acute need of a broader solution to the problems created by myriad incompatible microscopy formats.

Utility

Over the years we have repeatedly observed software packages reimplement support for the same microscopy formats [e.g., ImageJ (17), MIPAV (22), BioImageXD (3), and many

334 Swedlow et al.


commercial packages]. The vast majority of these efforts focus exclusively on adaptation of formats into each program's specific internal data model; Bio-Formats (34), in contrast, unites popular life sciences file formats under a broad, evolving data specification provided by the OME data model. This distinction is critical: Bio-Formats does not adapt data into structures designed for any specific visualization or analysis agenda, but rather expresses each format's metadata in an accessible data model built from the ground up to encapsulate a wide range of scientifically relevant information. We know of no other effort within the life sciences with as broad a scope as Bio-Formats and dedicated to delivering the following features.

Modularity

The architecture of the Bio-Formats library is split into discrete, reusable components that work together but are fundamentally separable. Each file format reader is implemented as a separate module extending a common IFormatReader interface; similarly, each file format writer module extends a common IFormatWriter interface. Both reader and writer modules use the Bio-Formats MetadataStore API to work with metadata fields in the OME Data Model. Shared logic for encoding and decoding schemes (e.g., JPEG and LZW) is structured as part of the Bio-Formats codec package, so that future readers and writers that need those same algorithms can leverage them without reimplementing similar logic or duplicating any code.

When reading data from a dataset, Bio-Formats provides a tiered collection of reader modules for extracting or restructuring various types of information from the dataset. For example, a client application can instruct Bio-Formats to compute minimum and maximum pixel values using a MinMaxCalculator, combine channels with a ChannelMerger, split them with a ChannelSeparator, or reorder dimensional axes with a DimensionSwapper. Performing several such operations can be accomplished merely by stacking the relevant reader modules one on top of the other. Several auxiliary components are also provided; the most significant are a caching package for intelligent management of image planes in memory when storage requirements for the entire dataset would be too great, and a suite of graphical components for common tasks such as presenting the user with a file chooser dialog box or visualizing hierarchical metadata in a tree structure.
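The stacked-reader design can be seen in miniature below. This is a Python sketch of the pattern, standing in for the real Java API; the class and method names are simplified assumptions, not actual Bio-Formats signatures. Each layer exposes the same reader interface, so layers compose by wrapping:

```python
class ArrayReader:
    """Base 'format reader': serves image planes from an in-memory list.

    Stands in for a real file-format reader module.
    """
    def __init__(self, planes):
        self._planes = planes          # list of 2D planes (lists of rows)

    def plane_count(self):
        return len(self._planes)

    def open_plane(self, index):
        return self._planes[index]


class MinMaxCalculator:
    """Wrapper reader that records global min/max as planes are read."""
    def __init__(self, reader):
        self._reader = reader
        self.minimum = None
        self.maximum = None

    def plane_count(self):
        return self._reader.plane_count()

    def open_plane(self, index):
        plane = self._reader.open_plane(index)
        flat = [v for row in plane for v in row]
        lo, hi = min(flat), max(flat)
        self.minimum = lo if self.minimum is None else min(self.minimum, lo)
        self.maximum = hi if self.maximum is None else max(self.maximum, hi)
        return plane


# Stacking: wrap the base reader; a ChannelSeparator-style layer could wrap
# this one in turn, since every layer presents the same interface.
reader = MinMaxCalculator(ArrayReader([[[0, 7], [3, 2]], [[9, 1], [4, 5]]]))
for i in range(reader.plane_count()):
    reader.open_plane(i)
# reader.minimum == 0, reader.maximum == 9
```

The design choice illustrated here is that a wrapper adds one concern (statistics, channel handling, axis reordering) without the base reader knowing it is wrapped.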

Flexibility

Bio-Formats has a flexible metadata API, built in layers over the OME Data Model itself. At the lowest level, the OME Data Model is expressed as an XML schema, called OME-XML, that is continually revised and expanded to support additional metadata fields. An intermediate layer, known as the OME-XML Java library, is produced using code generation techniques and provides direct access to individual metadata fields in the OME-XML hierarchy. The Bio-Formats metadata API, which provides a simplified, flattened version of the OME Data Model for flexible implementation by the developer, leverages the OME-XML Java library layer and is also generated automatically from the underlying documents to reduce errors in the implementation.

Extensibility

Adding a new metadata field to the data model is done at the lowest level, in the data model itself, via the OME-XML schema. The supporting code layers—both the OME-XML Java library and the Bio-Formats metadata API—are programmatically regenerated to include the addition. The only remaining task is to add a small amount of code to each file format reader mapping the original data field into the appropriate location within the standardized OME Data Model.
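The regeneration step can be sketched in a few lines. Given a list of field names derived from a schema (the fields below are a hypothetical fragment, not the real OME-XML schema), accessor code is generated mechanically, so adding a field to the schema adds it to the API with no hand-written code:

```python
def generate_metadata_class(name, fields):
    """Generate a simple accessor class from a schema-derived field list.

    Mirrors, in miniature, how the OME-XML Java library and Bio-Formats
    metadata API are regenerated from the schema documents.
    """
    def make_accessors(field):
        key = field.lower()
        def getter(self):
            return self._values.get(key)
        def setter(self, value):
            self._values[key] = value
        return getter, setter

    namespace = {"__init__": lambda self: setattr(self, "_values", {})}
    for field in fields:
        getter, setter = make_accessors(field)
        namespace["get" + field] = getter
        namespace["set" + field] = setter
    return type(name, (), namespace)

# Hypothetical schema fragment: adding "ExposureTime" here automatically
# yields getExposureTime/setExposureTime on the generated class.
Metadata = generate_metadata_class(
    "Metadata", ["PixelsSizeX", "PixelsSizeZ", "ExposureTime"])
m = Metadata()
m.setPixelsSizeZ(30)
```

In the real project the generator consumes the XML schema documents themselves rather than a Python list, but the effect is the same: the schema is the single source of truth.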

Although the OME Data Model specifically targets microscopy data, in general the Bio-Formats model of metadata extensibility is ideal for adaptation to alternative data


OMERO: OME Remote Objects

RDMS: relational database management system

models unrelated to microscopy. By adopting a similar pattern for the new data model, and introducing code generation layers corresponding to the new model, the Bio-Formats infrastructure could easily support additional branches of multidimensional scientific imaging data. In the future the Bio-Formats infrastructure will provide significant interoperability between the multiple established data models at points where they overlap by establishing a common base layer between them.

Bio-Formats is written in Java so that the code can execute on a wide variety of target platforms, and code and documentation for interfacing Bio-Formats with a number of different tools, including ImageJ, MATLAB, and IDL, are available (34). We provide documentation on how to use Bio-Formats both as an end user and as a software developer, including hints on leveraging Bio-Formats from other programming environments such as C++, Python, or a command shell. We have successfully integrated Bio-Formats with native acquisition software written in C++ using ICE middleware (58).

DATA MANAGEMENT APPLICATIONS: OME AND OMERO

Data management is a critical application for modern biological discovery, and it is particularly necessary for biological imaging because of the large heterogeneous datasets generated during data acquisition and analysis. We define data management as the collation, integration, annotation, and presentation of heterogeneous experimental and analytic data in ways that enable the physical, temporal, and conceptual relationships in experimental data to be captured and represented to users. The OME Consortium has built two data management tools—the original OME Server (29) and the recently released OME Remote Objects (OMERO) application platform (30), a port of the basic image data management functionality to a Java enterprise application. Both applications are now heavily used worldwide, but our development focus has

shifted from the OME Server toward OMERO, and that is where most future advances will occur.

The OME data management applications are specifically designed to meet the requirements and challenges described above, enabling the storage, management, visualization, and analysis of digital microscope image data and metadata. The major focus of this work is not on creating novel analysis algorithms, but instead on development of a structure that ultimately allows any application to read and use any data associated with or generated from digital microscope images.

A fundamental design concept in the OME data management applications is the separation of image storage, management, analysis, and visualization functions between a lab's or imaging facility's server and a client application (e.g., a Web browser or Java user interface). This concept mandates the development of two facilities: a server that provides all data management, access control, and storage, and a client that runs on a user's desktop workstation or laptop and that provides access to the server and the data via a standard Internet connection. The key to making this strategy work is judicious choice of the functionality placed on client and server to ensure maximal performance.

The technical design details and principles of both systems have recently been described (23) and are available online (39). In brief, both the OME Server and the OMERO platform (Figure 1) use a relational database management system (RDMS) [PostgreSQL (47)] to provide all aspects of metadata management and an image repository to house all the binary pixel data. Both systems then use a middleware application to interact with the RDMS and to read and write data from the image repository. The middleware applications include a rendering engine that reads binary data from the image repository and renders it for display by the client and, if necessary, compresses the image to reduce the bandwidth requirements for transferring across a network connection to a client. The result is access to high-performance data visualization, management,


Figure 1: Architecture of the OME and OMERO servers and client applications. (a) Architecture of the OME Server, built using Perl for most of the software code and an Apache Web server; the main client application for the server is a Web browser-based interface. (b) Architecture of the OMERO platform, including OMERO.server and the OMERO clients. OMERO is based on the JBOSS JavaEE framework, but it also includes an alternative remoting architecture called ICE (58). For more details, see Reference 39.

and analysis in a remote setting. Both the OME Server (Figure 1a) and OMERO (Figure 1b) also provide well-developed data querying facilities to access metadata, annotations, and analytics from the RDMS. For user interfaces, the OME Server includes a Web browser-based interface that provides access to images, annotations, analytics, and visualization, and also a Java interface (OME-JAVA) and remote client (Shoola) to support access from remote client applications. OMERO includes separate Java-based applications for uploading data to an OMERO server (OMERO.importer), for visualizing and managing data (OMERO.insight), and for Web browser-based server administration (OMERO.webadmin).

The OME Server has been installed in hundreds of facilities worldwide; however, after significant development effort it became clear

that the application, which we worked on for five years (2000–2005), had three major flaws. (a) The installation was too complex and too prone to failure. (b) Our object-relational mapping library ("DBObject") was all custom code, developed by OME, and required significant maintenance effort to remain compatible with new versions of Linux and Perl (Figure 1a). Support for alternative RDMSs (e.g., Oracle®) was possible in principle but would have required significant work. (c) The data transport mechanisms available to us in a Perl-based architecture amounted to XML-RPC and SOAP. Although fully standardized and conducive to interoperability, this mechanism, with its requirement for serialization/deserialization of large data objects, was too slow for working with remote client applications—simple queries against well-populated databases could


take minutes to transfer from server to client.

With work, problem a became less of an issue, but problems b and c remained significant fundamental barriers to delivery of a great image informatics application to end users. For these reasons, we initiated work on OMERO. In taking on this project, it was clear that the code maintenance burden needed to be substantially reduced, the system had to be simple to install, and the performance of the remoting system had to be significantly improved. A major design goal was the reduction of self-written code through the reuse of existing middleware and tools where possible. In addition, OMERO had to support as broad a range of client applications as possible, enabling the development of new user interfaces as well as a wide range of data analysis applications.

We based the initial implementation of OMERO's architecture (Figure 1b) on the JavaEE5 specification, as it appeared to have wide uptake, clear specifications, and high-performance libraries in active development from a number of projects. A full specification and description of OMERO.server is available (23). The architecture follows accepted standards and consists of services implemented as EJB3 session beans (53) that make use of Hibernate (15), a high-performance object-relational mapping solution, for metadata retrieval from the RDMS. Connection to clients is via Java Remote Method Invocation (Java RMI) (54). All released OMERO remote applications are written in Java and are cross-platform. OMERO.importer uses the Bio-Formats library to read a range of file formats and load the data into an OMERO.server, along with simple annotations and assignment to the OME Project-Dataset-Image experimental model (for demonstrations, see Reference 33). OMERO.insight includes facilities for managing, annotating, searching, and visualizing data stored in an installation of OMERO.server. OMERO.insight also includes simple line and region-of-interest measurements and thus supports the simplest forms of image analysis.

OMERO ENHANCEMENTS: BETA3 AND BETA4

Through 2007, the focus of the OMERO project was on data visualization and management, all the while laying the infrastructure for data analysis. With the release of OMERO3-Beta2, we began adding functionality that provides the foundation for delivering a fully developed image informatics framework. In this section, we summarize the major functional enhancements delivered in OMERO-Beta3 (released June 2008) and OMERO-Beta4 (released February 2009). Further information on all the items described below is available at the OMERO documentation portal (39).

OMERO.blitz

Starting with OMERO-Beta3, we provided interoperability with many different programming environments. We chose an ICE-based framework (58) rather than the more popular Web services–based GRID approaches because of the absolute performance requirements we had for the passage of large binary objects (image data) and large data graphs (metadata trees) between server and client. Our experience using Web services and XML-based protocols with the Shoola remote client and the OME Server showed that Web services, while standardized in most genomic applications, were inappropriate for client-server transfer of the much larger data graphs we required. Most importantly, the ICE framework provided immediate support for multiple programming environments (C, C++, and Python are critical for our purposes) and a built-in distribution mechanism [IceGRID (58)] that we have adapted to deliver OMERO.grid (39), a process distribution system. OMERO.blitz is three to four times faster than Java RMI, and we are currently examining migrating our Java API and the OMERO clients from JBOSS to OMERO.blitz. This framework provides substantial flexibility—interacting with data in OMERO can be as simple as starting the Python interpreter and


interacting with OMERO via the console. Most importantly, this strategy forms the foundation for our future work, as we can now leverage the advantages and existing functionality of cross-platform Java, native C and C++, and scripted Python to rapidly expand the functionality in OMERO.

Structured Annotations

Beginning with OMERO-Beta3, users can attach any type of data to an image or other OMERO data container—text, URLs, or other data files (e.g., .doc, .pdf, .xls, .xml)—providing essentially the same flexibility as email attachments. The installation of this facility followed feedback from users and developers concerning the strategy for analysis management built into the OME Server. The underlying data model supported hard semantic typing, in which each analysis result was stored in relational tables with names that could be defined by the user (23, 55). This approach, although conceptually desirable, proved too complex and burdensome. As an alternative, OMERO uses Structured Annotations to store any kind of analysis result as untyped data, defined only by a unique name to ensure that multiple annotations are easily distinguished. The data are not queryable by standard SQL, but any text-based file can be indexed and therefore found by users. Interestingly, Bisque has implemented a similar approach (5), enabling tags with defined structures that are otherwise completely customized by the user. In both cases, whether this flexible strategy provides enough structure to manage large sets of analysis results will have to be assessed.
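The untyped-but-uniquely-named strategy can be sketched in a few lines of illustrative Python (not the OMERO API; the class and method names here are invented): each annotation is an opaque payload attached to a container and identified by a generated unique name, with no schema imposed on the payload itself.

```python
import uuid

class StructuredAnnotationStore:
    """Toy store for untyped annotations attached to data containers.

    Each annotation is an opaque payload (text, file bytes, XML, ...)
    keyed by a unique name, mimicking the no-hard-semantic-typing
    strategy described above. Not the real OMERO API.
    """
    def __init__(self):
        self._annotations = {}   # container id -> list of (name, payload)

    def attach(self, container_id, name, payload):
        # A random suffix keeps multiple annotations with the same base
        # name distinguishable, as the unique-name requirement demands.
        unique_name = f"{name}/{uuid.uuid4().hex[:8]}"
        self._annotations.setdefault(container_id, []).append(
            (unique_name, payload))
        return unique_name

    def annotations_for(self, container_id):
        return list(self._annotations.get(container_id, []))

store = StructuredAnnotationStore()
store.attach("image:1", "example.org/focus-score", "0.87")
store.attach("image:1", "example.org/report", b"%PDF-1.4 ...")
```

The trade-off made visible here is the one the text raises: nothing constrains the payloads, so structure must come from convention rather than from the database schema.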

OMERO.search

As of OMERO-Beta3, OMERO includes a text-indexing engine based on Lucene (19), which can be used to provide index-based searches of all text-based metadata in an OMERO database. This includes metadata and annotations stored within the OMERO database and also any text-based documents or results stored as Structured Annotations.
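What the Lucene engine provides can be pictured with a toy inverted index (a sketch of the idea, not Lucene itself): every text field is tokenized, and each token maps back to the objects that contain it, so free-text queries resolve to object ids without any SQL over the payloads.

```python
import re
from collections import defaultdict

class TextIndex:
    """Minimal inverted index, sketching what Lucene does for OMERO search."""
    def __init__(self):
        self._index = defaultdict(set)   # token -> set of object ids

    def add(self, object_id, text):
        # Tokenize on alphanumeric runs, lowercased, like a simple analyzer.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self._index[token].add(object_id)

    def search(self, query):
        """Return ids containing every token in the query (AND semantics)."""
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        if not tokens:
            return set()
        result = self._index[tokens[0]].copy()
        for token in tokens[1:]:
            result &= self._index[token]
        return result

index = TextIndex()
index.add("image:1", "GFP-tubulin, metaphase spindle, 37C")
index.add("image:2", "DAPI stain, interphase nuclei")
index.search("spindle")          # -> {"image:1"}
```

Lucene adds ranking, stemming, and on-disk segment storage on top of this basic structure, but the retrieval model is the same.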

OMERO.java

As of OMERO-Beta3, we have released OMERO.java, which provides access for all external Java applications via the OMERO.blitz interface. As a first test of this facility, we are using analysis applications written in MATLAB as client applications to read from and write to OMERO.server. As a demonstration of the utility of this library, we have adapted the popular open source MATLAB-based image analysis tool CellProfiler (8) to work as a client of OMERO, using the MATLAB Java interface.

OMERO.editor

In OMERO-Beta3, we also released OMERO.editor, a tool to help experimental biologists define their own experimental data models and, if desired, use other specified data models in their work. It allows users to create a protocol template and to populate this with experimental parameters. This creates a complete experimental record in one XML file, which can be used to annotate a microscope image or be exchanged with other scientists. OMERO.editor supports the definition of completely customized experimental protocols but also includes facilities to easily import defined data models [e.g., MAGE-ML (56) and OME-XML (14)] and support for all ontologies included in the Ontology Lookup Service (11).
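The protocol-as-one-XML-file idea can be sketched as follows. The element names below are invented for illustration and are not the OMERO.editor file format: a template names its parameters, and filling them in yields a single self-contained experimental record.

```python
import xml.etree.ElementTree as ET

def fill_protocol(template_name, parameters):
    """Produce a single-file XML experimental record from a parameter dict.

    Illustrative element names; OMERO.editor defines its own schema for
    templates and completed records.
    """
    protocol = ET.Element("Protocol", Name=template_name)
    for name, (value, unit) in parameters.items():
        ET.SubElement(protocol, "Parameter",
                      Name=name, Value=str(value), Unit=unit)
    return ET.tostring(protocol, encoding="unicode")

# One file captures the whole protocol with its filled-in parameters,
# ready to annotate an image or be sent to a collaborator.
record = fill_protocol("Fixed-cell immunofluorescence", {
    "FixationTime": (10, "min"),
    "PrimaryAntibodyDilution": (500, "x"),
    "Temperature": (37, "C"),
})
```

Because the record is plain XML text, it can also be attached to an image as a Structured Annotation and picked up by text indexing.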

OMERO.web

Starting with OMERO-Beta4, we will release a Web browser-based client for OMERO.server. This new client is targeted specifically at truly remote access (different country, limited-bandwidth connections), especially where collaboration with other users is concerned. OMERO.web includes all the standard functions for importing, managing, viewing, and annotating image data. However, a new function is


the ability to share specific sets of data with another user on the system—this allows password-protected access to a specific set of data that can initiate or continue data sharing between two lab members or two collaborating scientists. OMERO.web also supports a publish function, in which a defined set of data is published to the world via a public URL. OMERO.web uses the Python API in OMERO.blitz to access OMERO.server, using the Django framework (12).

OMERO.scripts

In OMERO-Beta4, we will extend the analysis facility provided by OMERO.java with a scripting engine, based on Python scripts and the OMERO.blitz interface. OMERO.scripts is a scripting engine that reads and executes functions cast as Python scripts. Scripts are passed to processors specified by OMERO.grid that can be on the local server or on networked computing facilities. This is the facility that will provide support for analysis of large image sets or for calculations that require simple linear or branched workflows.
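A toy version of such an engine (a sketch only; the real OMERO.scripts API differs) compiles a user-supplied Python script and invokes a named entry function. In a real deployment the compiled function would be shipped to a local or networked processor via OMERO.grid rather than run inline:

```python
def run_script(source, entry_point, *args):
    """Compile a user-supplied script and invoke its entry function.

    Stand-in for a scripting engine: the script text is compiled into an
    isolated namespace and the named function is called with the given
    arguments.
    """
    namespace = {}
    exec(compile(source, "<user-script>", "exec"), namespace)
    return namespace[entry_point](*args)

# A user script computing the mean intensity of each plane in an image.
script = """
def mean_per_plane(planes):
    return [sum(p) / len(p) for p in planes]
"""
means = run_script(script, "mean_per_plane", [[1, 2, 3], [10, 20, 30]])
# means == [2.0, 20.0]
```

Chaining several such scripts, each consuming the previous one's output, gives exactly the simple linear or branched workflows described above.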

OMERO.fs

Finally, a fundamental design principle of OMERO.server is the presence of a single image repository for storing binary image data that is tightly integrated with the server application. This is the basis of the import model, which is the only way to get image data into an OMERO.server installation—data are uploaded to the server, and binary data are stored in the single image repository. In many cases, as the required storage space expands, multiple repositories must be supported. Moreover, data import takes significant time and, especially with large datasets, can be prohibitive. A solution to this involves using the OMERO.blitz Python API to access the file system search and notification facilities that are now provided as part of the Windows, Linux, and OS X operating systems. In this scenario, an OMERO client application, OMERO.fs, sits between the file system and OMERO.blitz and provides a metadata service that scans user-specified image folders or file systems and reads image metadata into an OMERO relational database using PFF translation provided by Bio-Formats. As the coverage of Bio-Formats expands, this approach means that essentially any data can be loaded into an OMERO.server instance.
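The scanning half of this design is easy to sketch (illustrative code: the real OMERO.fs uses operating-system notification facilities rather than polling, and Bio-Formats rather than the stub below). The key point is that only lightweight metadata is recorded, while the binary data stays in place:

```python
import os

# A small sample of extensions; real coverage comes from Bio-Formats.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".lsm", ".dv", ".stk"}

def scan_image_folder(root):
    """Collect lightweight metadata for image files under a folder.

    Stand-in for an OMERO.fs scan: records path, size, and format,
    leaving binary data where it is; a real implementation would call
    Bio-Formats here to translate each proprietary format's metadata.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            if ext in IMAGE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                records.append({"path": path,
                                "bytes": os.path.getsize(path),
                                "format": ext.lstrip(".")})
    return records
```

The records produced by such a scan are what would be written into the OMERO relational database, avoiding the time and duplication cost of a full upload.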

WORKFLOW-BASED DATA ANALYSIS: WND-CHARM

WND-CHARM is an image analysis algorithm based on pattern recognition (43). It relies on supervised machine learning to solve image analysis problems by example rather than by using a preconceived perceptual model of what is being imaged. An advantage of this approach is its generality. Because the algorithms used to process images are not task specific, they can be used to process any image regardless of the imaging modality or the image's subject. Similar to other pattern recognition algorithms, WND-CHARM first decomposes each image into a set of predefined numeric image descriptors. Image descriptors include measures of texture, factors in polynomial decompositions, and various statistics of the image as a whole, as well as measurements and distributions of high-contrast objects in the image. The algorithms that extract these descriptors (features) operate on both the original image pixels and transforms of the original pixels (Fourier, wavelet, etc.). Together, there are 17 independent algorithms comprising 53 computational nodes (algorithms used along specific upstream data flows), with 189 links (data flows) producing 1025 numeric values (Figure 2). Although the entire set of features can be modeled as a single algorithm, this set is by no means complete and will grow to include other algorithms that extract both more specific and more general image content. The advantage of modeling this complex workflow as independently functional units is that new units can easily be added to the existing ones. This workflow model is therefore more useful to groups specializing in pattern recognition. Conversely, a monolithic


Figure 2: Workflows in WND-CHARM. (a) List of feature types calculated by WND-CHARM. (b) Workflow of feature calculations in WND-CHARM. Note that different feature groups use different sets of processing tools. [Panel content: Group A, high-contrast features; Group B, polynomial decompositions; Group C, statistics and textures; Group D, statistics and textures + Radon; descriptors computed on raw pixels and on wavelet-, Chebyshev-, and Fourier-transformed pixels combine into a 1025-value feature vector.]

representation of this workflow is probably more practical when implemented in a biology lab that would use a standard set of image descriptors applied to various imaging experiments. In neither case, however, should anyone be particularly concerned with what format was used to capture these images or how they are represented in a practical imaging system. WND-CHARM is an example of a highly complex image-processing workflow and as such represents an important application for any system capable of managing workflows and distributed processing for image analysis. Currently, the fully modularized version of

WND-CHARM runs only on the OME Server. In the near future, the monolithic version of WND-CHARM (51) will be implemented using OMERO.blitz.
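The core pattern—task-agnostic descriptors computed on both raw pixels and transforms of them—can be sketched compactly. This is a simplified illustration, not WND-CHARM itself; the real feature bank computes far richer descriptors (Haralick and Tamura textures, Zernike polynomials, and so on, as described in Reference 43):

```python
import math

def basic_stats(values):
    """Four simple whole-image statistics: a stand-in for one feature group."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [min(values), max(values), mean, math.sqrt(var)]

def magnitude_spectrum(values):
    """Toy transform: magnitudes of a naive DFT of the flattened pixels."""
    n = len(values)
    mags = []
    for k in range(n):
        re = sum(v * math.cos(-2 * math.pi * k * i / n)
                 for i, v in enumerate(values))
        im = sum(v * math.sin(-2 * math.pi * k * i / n)
                 for i, v in enumerate(values))
        mags.append(math.hypot(re, im))
    return mags

def feature_vector(pixels):
    """Concatenate the same descriptors computed on raw and transformed
    pixels, mirroring WND-CHARM's raw-plus-transform workflow in miniature."""
    flat = [v for row in pixels for v in row]
    return basic_stats(flat) + basic_stats(magnitude_spectrum(flat))

fv = feature_vector([[0, 1], [2, 3]])
```

Each new feature group or transform multiplies the number of computational nodes without touching the existing ones, which is exactly the extensibility argument made for the modular workflow above.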

The raw results from a pattern recognition application are annotations assigned to whole images or image regions. These annotations are probabilities (or simply scores) that the image or region of interest belongs to a previously defined training set. In a dose-response experiment, for example, the training set may consist of control doses defining a standard curve, and the experimental images would be assigned an equivalent dose by the pattern recognition


algorithm. Whereas the original experiment may be concerned with characterizing a collection of chemical compounds, the same image data could be analyzed in the context of a different set of training images—one defined by RNA interference, for example. When using these algorithms, our group has found that performing these in silico experiments to reprocess existing image data in different contexts can be fruitful.
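The dose-assignment step can be sketched as nearest-centroid scoring over feature vectors (illustrative code with made-up numbers, not the WND-CHARM classifier itself): each control dose contributes a class centroid, and a test image is assigned the dose of the closest one.

```python
def centroid(vectors):
    """Mean feature vector of one training class."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def assign_dose(training, test_vector):
    """Assign a test image's feature vector the dose of the nearest centroid.

    `training` maps dose -> list of feature vectors from control images.
    A toy stand-in for WND-CHARM's weighted classification.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    centroids = {dose: centroid(vs) for dose, vs in training.items()}
    return min(centroids, key=lambda dose: dist(centroids[dose], test_vector))

# Made-up two-feature descriptors for three control doses.
controls = {
    0.0:  [[1.0, 0.9], [1.1, 1.0]],
    1.0:  [[2.0, 2.1], [1.9, 2.0]],
    10.0: [[4.0, 3.9], [4.1, 4.2]],
}
assign_dose(controls, [2.05, 1.95])   # -> 1.0
```

Reprocessing the same images against a different training set (an RNA interference panel, say) requires only swapping the `training` dictionary, which is the in silico re-analysis described above.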

USABILITY

All the functionality discussed above must be built into OMERO.server and then delivered in a functional, usable fashion within the OMERO client applications OMERO.importer, OMERO.insight, and OMERO.web. This development effort is carried out by the OMERO development team and is invariably an iterative process that requires testing by our local community as well as sampling feedback from the broader community of users. Therefore, the OMERO project has made software usability a priority throughout the project. A key challenge for the OME Consortium has been to improve the quality of the end user (i.e., the life scientist at the bench) experience. The first version of OME software, the OME Server, provided substantial functionality but never received wide acceptance, despite dedicated work, mostly because its user interfaces were too complicated and the developed code, while open and available, was too complex for other developers to adopt and extend. In response to this failure, we initiated the Usable Image project (41) to apply established methods from the wider software design community, such as user-centered design and design ethnography (20), to the OME development process. Our initial goals were to improve the usability and accessibility of the OMERO client software and to provide a paradigm useful for the broader e-science and bioinformatics communities. The result of this combined usability and development effort has been a high level of success and acceptance of OMERO software. A wholly unanticipated outcome has been the commitment to the user-centered design process by both users and developers. The investment in iterative, agile development practice has produced rapid, substantial improvements that users appreciate, which in turn makes them more enthusiastic about the software. On the other hand, the developers have reliable, well-articulated requirements that, when implemented in software, are rewarded with more frequent use. This positive-feedback loop has transformed our development process and made usability analysis a core part of our development cycles. It has also forced a commitment to the development of usable code—readable, well-documented, tested, and continuously integrated—and the provision of up-to-date resources defining architecture and code documentation (38, 39).

SUMMARY AND FUTURE IMPACT

In this article we have focused on the OME Consortium's efforts (namely OME-XML, Bio-Formats, OMERO, and WND-CHARM), as we feel they are representative of the community-wide attempts to address many of the most pressing challenges in bioimage informatics. While OME is committed to developing and releasing a complete image informatics infrastructure focused on the needs of the end user bench biologist, we are at least equally committed to the concept that, beyond our software, our approach is part of a critical shift in how the challenges of data analysis, management, sharing, and visualization have traditionally been addressed in biology. In particular, the OME Consortium has put an emphasis on flexibility, modularity, and inclusiveness that targets not only the bench biologist but also, importantly, the informatics developer, to help ensure maximum implementation of and penetration into the bioimaging community. Key to this has been a dedication to allowing the biologist to retain and capture all available metadata and binary data from disparate sources, including proprietary ones, to map these data to a flexible data model, and to


analyze these data in whatever environment he or she chooses. This ongoing effort requires an interdisciplinary approach that combines concepts from traditional bioinformatics, ethnography, computer science, and data visualization. It is our intent and hope that the bioimage informatics infrastructure developed by the OME Consortium will continue to have utility for its principal target community of experimental bench biologists, and also serve as a collaborative framework for developers and researchers from other closely related fields who might want to adopt the methodologies and code-based approaches for informatics challenges that exist in other communities. Interdisciplinary collaboration between biologists, physicists, engineers, computer scientists, ethnographers, and software developers is absolutely necessary for the successful maturation of the bioimage informatics community, and it will play an even larger role as this field evolves to fully support the continued evolution of imaging in modern experimental biology.
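The mapping of disparate, proprietary metadata onto one flexible model can be illustrated with a minimal sketch. The vendor key names and mapping tables below are invented for illustration; this is not the Bio-Formats API, only the shape of the translation such a library performs, including the retention of all original metadata so nothing is lost.

```python
# Hypothetical sketch: normalize vendor-specific metadata dicts onto one
# common model. All vendor names and key names here are invented examples.

COMMON_KEYS = ("size_x", "size_y", "size_z", "channels", "pixel_size_um")

# One key map per (invented) proprietary format.
FORMAT_MAPS = {
    "vendor_a": {"Width": "size_x", "Height": "size_y", "Planes": "size_z",
                 "Wavelengths": "channels", "XYCalibration": "pixel_size_um"},
    "vendor_b": {"dimX": "size_x", "dimY": "size_y", "dimZ": "size_z",
                 "nChan": "channels", "umPerPixel": "pixel_size_um"},
}

def normalize(fmt, raw):
    """Map a vendor metadata dict onto the common model, keeping the
    original key/value pairs so no metadata is discarded in translation."""
    mapping = FORMAT_MAPS[fmt]
    common = {mapping[k]: v for k, v in raw.items() if k in mapping}
    common["original_metadata"] = dict(raw)  # retain everything else
    return common

a = normalize("vendor_a", {"Width": 512, "Height": 512, "Planes": 30,
                           "Wavelengths": 2, "XYCalibration": 0.1})
b = normalize("vendor_b", {"dimX": 512, "dimY": 512, "dimZ": 30,
                           "nChan": 2, "umPerPixel": 0.1})
assert all(a[k] == b[k] for k in COMMON_KEYS)  # same model, either source
```

Downstream analysis and visualization tools then program against `COMMON_KEYS` alone, which is why a translation library scales where per-format special-casing does not.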

SUMMARY POINTS

1. Advances in digital microscopy have driven the development of a new field, bioimage informatics. This field encompasses the storage, querying, management, analysis, and visualization of complex image data from digital imaging systems used in biology.

2. Although standardized file formats have often been proposed as sufficient to provide the foundation for bioimage informatics, the prevalence of proprietary file formats (PFFs) and the rapidly evolving data structures needed to support new developments in imaging make this impractical.

3. Standardized APIs and software libraries enable interoperability, which is a critical unmet need in cell and developmental biology.

4. A community-driven development project is best placed to define, develop, release, and support these tools.

5. A number of bioimage informatics initiatives are underway, and collaboration and interaction are developing.

6. The OME Consortium has released specifications and software tools to support bioimage informatics in the cell and developmental biology community.

7. The next steps in software development will deliver increasingly sophisticated infrastructure applications and should deliver powerful data management and analysis tools to experimental biologists.

FUTURE ISSUES

1. Further development of the OME Data Model must keep pace with and include advances in biological imaging, with a particular emphasis on improving support for image analysis metadata and enabling local extension of the OME Data Model to satisfy experimental requirements, with good documentation and examples.

2. Development of Bio-Formats to include as many biological image file formats as possible, and its extension to include non-image-based biological data.


3. Continue OMERO development as an image management system, with a particular emphasis on ensuring client application usability and the provision of sophisticated image visualization and analysis tools.

4. Support both simple and complex analysis workflows as a foundation for common use of data analysis and regression in biological imaging.

5. Drive links between the different bioimage informatics systems, enabling transfer of data between instances so that users can exploit the best advantages of each.
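The local extension of the data model called for in Future Issue 1 can be illustrated with a short sketch. The element and attribute names below are simplified stand-ins, not the exact OME-XML schema; the point is that lab-specific metadata can live in its own XML namespace inside an annotation, so that core OME-aware tools can safely ignore it while local tools can round-trip it.

```python
# Minimal sketch, under assumptions: an image record carries an annotation
# element, and a lab extends the model by nesting namespaced XML inside it
# rather than altering the core schema. Names are invented simplifications.
import xml.etree.ElementTree as ET

LAB_NS = "http://example.org/mylab-schema"  # hypothetical local namespace

image = ET.Element("Image", {"ID": "Image:0", "Name": "timelapse_01"})
ann = ET.SubElement(image, "StructuredAnnotation", {"ID": "Annotation:0"})

# Lab-specific experimental metadata in its own namespace.
custom = ET.SubElement(ann, f"{{{LAB_NS}}}Perturbation")
custom.set("siRNA", "KIF11")
custom.set("concentration_nM", "50")

xml = ET.tostring(image, encoding="unicode")
print(xml)
```

Because the extension is namespaced, a generic consumer can validate and process the core elements while treating the `Perturbation` payload as opaque, which is what makes local extension safe.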

DISCLOSURE STATEMENT

The authors are not aware of any biases that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS

JRS is a Wellcome Trust Senior Research Fellow, and work in his lab on OME is supported by the Wellcome Trust (Ref 080087 and 085982), BBSRC (BB/D00151X/1), and EPSRC (EP/D050014/1). Work in IGG's lab is supported by the National Institutes of Health. The OME work in KWE's lab is supported by NIH grants R03EB008516 and R01EB005157.

LITERATURE CITED

1. Andrews PD, Harper IS, Swedlow JR. 2002. To 5D and beyond: quantitative fluorescence microscopy in the postgenomic era. Traffic 3:29–36

2. Berman J, Edgerton M, Friedman B. 2003. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med. Inform. Decis. Making 3:5

3. BioImageXD. 2008. BioImageXD—open source software for analysis and visualization of multidimensional biomedical data. http://www.bioimagexd.net/

4. BIRN. 2008. http://www.nbirn.net/

5. Centre for Bio-Image Informatics. 2008. Bisque database. http://www.bioimage.ucsb.edu/downloads/Bisque%20Database

6. Bullen A. 2008. Microscopic imaging techniques for drug discovery. Nat. Rev. Drug Discov. 7:54–67

7. caBIG Community Website. http://cabig.nci.nih.gov/

8. Carpenter A, Jones T, Lamprecht M, Clarke C, Kang I, et al. 2006. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7:R100

9. Cell Centered Database. 2008. http://ccdb.ucsd.edu/CCDBWebSite/

10. Christiansen JH, Yang Y, Venkataraman S, Richardson L, Stevenson P, et al. 2006. EMAGE: a spatial database of gene expression patterns during mouse embryo development. Nucleic Acids Res. 34:D637–41

11. Cote RG, Jones P, Martens L, Apweiler R, Hermjakob H. 2008. The Ontology Lookup Service: more data and better tools for controlled vocabulary queries. Nucleic Acids Res. 36:W372–76

12. Django Project. 2008. http://www.djangoproject.com/

13. Drysdale R, FlyBase Consortium. 2008. FlyBase: a database for the Drosophila research community. Methods Mol. Biol. 420:45–59

14. Goldberg IG, Allan C, Burel J-M, Creager D, Falconi A, et al. 2005. The Open Microscopy Environment (OME) data model and XML file: open tools for informatics and quantitative analysis in biological imaging. Genome Biol. 6:R47

15. hibernate.org. 2008. http://www.hibernate.org/

16. Huang K, Lin J, Gajnak JA, Murphy RF. 2002. Image content-based retrieval and automated interpretation of fluorescence microscope images via the protein subcellular location image database. Proc. 2002 IEEE Int. Symp. Biomed. Imaging 325–28

17. ImageJ. 2008. http://rsbweb.nih.gov/ij/

18. Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, et al. 2007. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445:168–76

19. Apache Lucene. 2008. Overview. http://lucene.apache.org/java/docs/

20. Macaulay C, Benyon D, Crerar A. 2000. Ethnography, theory and systems design: from intuition to insight. Int. J. Hum. Comput. Stud. 53:35–60

21. Martone ME, Tran J, Wong WW, Sargis J, Fong L, et al. 2008. The Cell Centered Database project: an update on building community resources for managing and sharing 3D imaging data. J. Struct. Biol. 161:220–31

22. MIPAV. 2008. http://mipav.cit.nih.gov/

23. Moore J, Allan C, Burel J-M, Loranger B, MacDonald D, et al. 2008. Open tools for storage and management of quantitative image data. Methods Cell Biol. 85:555–70

24. Murphy RF. 2008. Murphy lab—Imaging services—PSLID. http://murphylab.web.cmu.edu/services/PSLID/

25. myGrid. 2008. http://www.mygrid.org.uk/

26. Neumann B, Held M, Liebel U, Erfle H, Rogers P, et al. 2006. High-throughput RNAi screening by time-lapse imaging of live human cells. Nat. Methods 3:385–90

27. OME Consortium. 2008. Data management. http://www.openmicroscopy.org/site/documents/data-management

28. OME Consortium. 2008. Development teams. http://openmicroscopy.org/site/about/development-teams

29. OME Consortium. 2008. OME server. http://openmicroscopy.org/site/documents/data-management/ome-server

30. OME Consortium. 2008. OMERO platform. http://openmicroscopy.org/site/documents/data-management/omero

31. OME Consortium. 2008. http://openmicroscopy.org/site

32. OME Consortium. 2008. Partners. http://openmicroscopy.org/site/about/partners

33. OME Consortium. 2008. Videos for Quicktime. http://openmicroscopy.org/site/videos

34. OME Consortium. 2008. OME at LOCI—software—Bio-Formats library. http://www.loci.wisc.edu/ome/formats.html

35. OME Consortium. 2008. OME at LOCI—OME-TIFF—overview and rationale. http://www.loci.wisc.edu/ome/ome-tiff.html

36. OME Consortium. 2008. OME validator. http://validator.openmicroscopy.org.uk/

37. OME Consortium. 2008. http://ome-xml.org/

38. OME Consortium. 2008. Continuous build & integration system for the Open Microscopy Environment project. http://hudson.openmicroscopy.org.uk

39. OME Consortium. 2008. ServerDesign. http://trac.openmicroscopy.org.uk/omero/wiki/ServerDesign

40. OME Consortium. 2008. VisBio—introduction. http://www.loci.wisc.edu/visbio/

41. OME Consortium. 2008. The Usable Image project. http://usableimage.org

42. OpenGIS®. 2008. Standards and specifications. http://www.opengeospatial.org/standards

43. Orlov N, Shamir L, Macura T, Johnston J, Eckley DM, Goldberg IG. 2008. WND-CHARM: multipurpose image classification using compound image transforms. Pattern Recognit. Lett. 29:1684–93

44. Parvin B, Yang Q, Fontenay G, Barcellos-Hoff MH. 2003. BioSig: an imaging and informatic system for phenotypic studies. IEEE Trans. Syst. Man Cybern. B 33:814–24

45. Peng H. 2008. Bioimage informatics: a new area of engineering biology. Bioinformatics 24:1827–36

46. Pepperkok R, Ellenberg J. 2006. High-throughput fluorescence microscopy for systems biology. Nat. Rev. Mol. Cell Biol. 7:690–96

47. PostgreSQL. 2008. http://www.postgresql.org/

48. Rogers A, Antoshechkin I, Bieri T, Blasiar D, Bastiani C, et al. 2008. WormBase 2007. Nucleic Acids Res. 36:D612–17

49. Rueden C, Eliceiri KW, White JG. 2004. VisBio: a computational tool for visualization of multidimensional biological image data. Traffic 5:411–17

50. Saccharomyces Genome Database. 2008. http://www.yeastgenome.org/

51. Shamir L, Orlov N, Eckley DM, Macura T, Johnston J, Goldberg IG. 2008. Wndchrm—an open source utility for biological image analysis. Source Code Biol. Med. 3:13

52. Shaner NC, Campbell RE, Steinbach PA, Giepmans BN, Palmer AE, Tsien RY. 2004. Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein. Nat. Biotechnol. 22:1567–72

53. Sun Microsystems. 2008. Enterprise JavaBeans technology. http://java.sun.com/products/ejb

54. Sun Microsystems. 2008. Remote method invocation home. http://java.sun.com/javase/technologies/core/basic/rmi/index.jsp

55. Swedlow JR, Goldberg I, Brauner E, Sorger PK. 2003. Informatics and quantitative analysis in biological imaging. Science 300:100–2

56. Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, et al. 2006. The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 22:866–73

57. WormBase. 2008. http://www.wormbase.org/

58. ZeroC. 2008. http://www.zeroc.com/

59. Zhao T, Murphy RF. 2007. Automated learning of generative models for subcellular location: building blocks for systems biology. Cytometry A 71:978–90


