Hindawi Publishing Corporation
International Journal of Plant Genomics
Volume 2008, Article ID 147563, 9 pages
doi:10.1155/2008/147563
Review Article
Bioinformatic Tools for Inferring Functional Information from Plant Microarray Data: Tools for the First Steps
Grier P. Page and Issa Coulibaly
Department of Biostatistics, University of Alabama at Birmingham, 1665 University Blvd, Ste 327, Birmingham, AL 35294-0022, USA
Correspondence should be addressed to Grier P. Page, gpage@uab.edu
Received 2 November 2007; Accepted 7 May 2008
Recommended by Gary Skuse
Microarrays are a very powerful tool for quantifying the amount of RNA in samples; however, their ability to query essentially every gene in a genome, which can number in the tens of thousands, presents analytical and interpretative problems. As a result, a variety of software and web-based tools have been developed to help with these issues. This article highlights and reviews some of the tools for the first steps in the analysis of a microarray study. We have tried to strike a balance between free and commercial systems. We have organized the tools by topic, including probe design tools (Section 2), power analysis tools (Section 3), image analysis tools (Section 4), database tools (Section 5), databases of functional information (Section 6), annotation tools (Section 7), statistical and data mining tools (Section 8), and dissemination tools (Section 9).
Copyright © 2008 G. P. Page and I. Coulibaly. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
The primary goal of a microarray study is to generate a list of differentially regulated genes and to infer pathways that can provide insight into the biological question under investigation. Because a microarray experiment is very high-dimensional, measuring thousands of genes at once, bioinformatics and statistical tools are essential for analyzing the data. This review is written to provide plant investigators with a list of tools and web-based resources designed to help them move from an idea or hypothesis to the conduct of the study, image analysis, generation of expression data, statistical analysis, annotation, and finally dissemination of the data.
The first step in the conduct of a microarray study is the selection of a microarray platform. For many species, arrays are available from commercial vendors and academic groups. Unfortunately, arrays are not available for all species; while arrays can be used in closely related species, it is usually better to develop arrays based upon the sequence of the species being studied. Section 2 provides a list of tools for generating useful probe sequences from genomic data. Once an array has been developed, it is critical to collect enough samples to run an experiment that will generate biologically generalizable results. Section 3 highlights tools for sample size and power analysis for microarray studies. Image analysis tools (Section 4) are used to quantitate the amount of fluorescence for a spot or set of spots. Microarray experiments generate copious amounts of data; the storage and distribution of these data are accomplished by the tools described in Section 5. Databases of gene annotations are provided in Section 6. Sections 7 and 8 describe annotation and statistical analysis tools; the two are grouped together because the same tools often provide both functions. In fact, many of the database tools also provide analytical and annotation functions. Finally, in Section 9 we describe web sites for disseminating microarray data and analyses.
2 PROBE DESIGN SOFTWARE
Plant scientists conduct their research on a wide variety of plant taxa. Arrays have been developed for a number of plant species, including Arabidopsis, maize, Populus, rice, barley, grape, citrus, cotton, Medicago, soybean, sugar cane, tomato, and wheat. While arrays can be used on closely related species, it is often better to design a new array for the species of interest. Several tools have been developed to help design probes for spotting or deposition on arrays based upon genomic sequence data. The critical requirement is high-quality sequence data: the more complete the genome is, the easier it will be to design probes that do not cross-hybridize, are not affected by SNPs, and query the gene accurately. Table 1 lists a number of tools for probe design; many of them are free, but a number are specific to a single array manufacturer.
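The probe-selection criteria just described can be sketched in Python. The example slides a fixed-length window along a target transcript and keeps oligos whose GC content falls within a chosen range, that contain no long homopolymer runs, and that do not occur verbatim in any other transcript (a crude stand-in for a real cross-hybridization search). The 25-nt length, the 40-60% GC window, the run-length limit, the function names, and the basic melting-temperature approximation Tm = 64.9 + 41(nG + nC - 16.4)/N are illustrative assumptions; the tools in Table 1 rely on thermodynamic models and similarity searches that this sketch does not reproduce.

```python
import re

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C) from the basic GC-count formula
    Tm = 64.9 + 41 * (nG + nC - 16.4) / N (not a nearest-neighbor model)."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)

def candidate_probes(target, other_transcripts, length=25,
                     gc_range=(0.40, 0.60), max_run=4):
    """Return (start, probe, Tm) for windows of `target` that pass simple filters."""
    target = target.upper()
    others = [t.upper() for t in other_transcripts]
    # any single base repeated more than max_run times in a row
    homopolymer = re.compile("|".join(f"{b}{{{max_run + 1},}}" for b in "ACGT"))
    kept = []
    for start in range(len(target) - length + 1):
        probe = target[start:start + length]
        if not gc_range[0] <= gc_fraction(probe) <= gc_range[1]:
            continue
        if homopolymer.search(probe):
            continue
        if any(probe in other for other in others):
            continue  # exact match elsewhere: likely cross-hybridization
        kept.append((start, probe, round(basic_tm(probe), 1)))
    return kept
```

In practice the uniqueness check would be replaced by an alignment search against the whole genome or transcriptome, which also catches near matches.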
3 POWER ANALYSIS AND SAMPLE SIZE CALCULATIONS
One of the keys to a successful microarray study is to collect enough data (arrays) to derive biologically generalizable results, and the key to this is the statistical power of the study. Power is the probability of detecting a significant difference between experimental groups when one really exists. Several factors influence power, but the main one under the control of an investigator is the sample size. A study with too few samples may fail to detect real differences, while a study with too many samples wastes resources. Power analysis allows the selection of the optimal sample size. While sample sizes for microarray studies can be planned with traditional statistical power calculation tools such as PS (http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize), the unique features of arrays, such as the large number of tests and the large number of genes that differ between groups, have led to the development of several methods and tools for power and sample size analysis.
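To illustrate why the number of tests matters, the sketch below uses a normal approximation for a two-group comparison of a single gene. With n arrays per group, a true mean difference delta on the log scale, and a common standard deviation sigma, power is approximately Phi(|delta|/SE - z) + Phi(-|delta|/SE - z), where SE = sigma * sqrt(2/n) and z is the two-sided critical value at level alpha. The equal-variance normal model, the Bonferroni-style per-gene alpha in the usage note, and the function names are our assumptions; this is not the mixture-model method behind the Power Atlas [1, 2].

```python
from math import sqrt
from statistics import NormalDist

def two_group_power(n_per_group, delta, sigma, alpha):
    """Approximate power of a two-sided, two-sample z-test for one gene."""
    z = NormalDist()
    se = sigma * sqrt(2.0 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(delta, sigma, alpha, target=0.8, n_max=1000):
    """Smallest per-group sample size reaching the target power (None if n_max is too small)."""
    for n in range(2, n_max + 1):
        if two_group_power(n, delta, sigma, alpha) >= target:
            return n
    return None
```

Under these assumptions, detecting a two-fold change (delta = 1 on the log2 scale, sigma = 0.5) with 80% power needs 4 arrays per group at alpha = 0.05, but 16 per group once alpha is divided across 20,000 genes (n_for_power(1.0, 0.5, 0.05 / 20000)). Methods such as those in the Power Atlas aim to avoid this overly conservative correction by modeling the whole distribution of p-values.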
3.1 The Power Atlas
The Power Atlas is a web-based resource to assist investigators in the planning and design of microarray and expression-based experiments. The software currently aims at estimating the power and sample size for a two-group comparison based upon pilot data. The methods underlying the web site are reported in Gadbury et al. [1], and the software is described in further detail in Page et al. [2]. The tool may be used in two ways: one may either upload one's own pilot data or select a pilot dataset from over 1,000 public data sets. Output includes graphs of power for a variety of significance levels and false discovery rates; see http://www.poweratlas.org [2].
3.2 Significance Analysis of Microarrays (SAM)
SAM is a free, flexible Excel add-in that includes a number of useful functions for the analysis of microarray data. Tools include statistical analysis for discrete, quantitative, and time series data, adjustments for multiple testing, gene set enrichment analysis, sample size assessment, estimates of the false discovery rate (FDR) and q-values, and per-gene power analysis; see http://www-stat.stanford.edu/~tibs/SAM [3].
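As a point of reference for the multiple-testing adjustments mentioned above, the sketch below computes Benjamini-Hochberg adjusted p-values, one standard way of controlling the false discovery rate. This is a swapped-in technique: SAM itself estimates the FDR by permuting sample labels and recomputing its modified t-statistic [3], so its numbers will differ. The function name is our own.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    A gene is declared significant at FDR level q when its adjusted value is <= q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.005]) adjusts each value by the number of tests divided by its rank and then enforces monotonicity, so genes with nearby p-values can share the same adjusted value.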
4 IMAGE ANALYSIS SOFTWARE
The purpose of image analysis software is to generate a quantified expression score from the scanned microarray images. Some of the tools are specific to particular array types and thus are not appropriate for all array types. Many tools are available in this area, and many of them are expensive. We present here tools that are still being actively supported and developed; additional tools are listed in Table 2.
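The core quantification step can be sketched as follows, assuming the spot position and radius are already known from gridding: average the pixels inside a circle (foreground), subtract the median of a surrounding ring (local background), and, for two-color arrays, take the log2 ratio of the two background-corrected channels. Fixed-circle segmentation, the mean/median choice, clipping at zero, and the function names are simplifying assumptions; the packages described below offer adaptive segmentation and several background methods.

```python
import math
from statistics import mean, median

def spot_intensity(image, cx, cy, radius, bg_inner, bg_outer):
    """Background-corrected intensity of one spot (fixed-circle segmentation).

    image: 2-D list of pixel values (rows of columns) for one channel.
    Foreground: pixels within `radius` of (cx, cy).
    Background: pixels in the ring bg_inner <= r < bg_outer.
    """
    fg, bg = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(x - cx, y - cy)
            if r <= radius:
                fg.append(value)
            elif bg_inner <= r < bg_outer:
                bg.append(value)
    # clip at zero when the local background exceeds the spot signal
    return max(mean(fg) - median(bg), 0.0)

def log2_ratio(red_image, green_image, spot):
    """log2(red / green) for one spot; spot = (cx, cy, radius, bg_inner, bg_outer)."""
    red = spot_intensity(red_image, *spot)
    green = spot_intensity(green_image, *spot)
    return math.log2(red / green) if red > 0 and green > 0 else None
```

Using the median for the background makes the estimate less sensitive to dust or neighboring spots that bleed into the ring.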
4.1 Affy
Affy is a Bioconductor package for processing Affymetrix arrays. A wide variety of image processing, normalization, and quality control procedures are available. Note that there are a variety of other image processing tools in Bioconductor, including PDNN and dChip, that should be considered as well; see http://www.bioconductor.org/packages/2.1/bioc/html/affy.html [4].
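One of the normalization methods available in affy, and part of the RMA procedure, is quantile normalization, which forces every array to share the same intensity distribution. The pure-Python sketch below shows the idea on a small matrix; ties are broken by position, a simplification relative to the Bioconductor implementation, and the function name is our own.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe intensities, same length).

    Each value is replaced by the mean, across arrays, of the values with the same rank,
    so all arrays end up with an identical distribution. Ties are broken by position.
    """
    n_arrays, n_probes = len(arrays), len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / n_arrays for r in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized
```

In RMA this step is applied to background-corrected probe intensities, before log transformation and probe-set summarization.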
4.2 AffyProbeMiner
AffyProbeMiner is used to redefine chip definition files (CDFs) for Affymetrix chips to take into account more recent genomic sequence information on SNPs, alternative splicing, changes in gene models and exon structure, and other such genomic differences. Precomputed CDFs for several chips are available for download; see http://gauss.dbb.georgetown.edu/liblab/affyprobeminer [5].
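The idea behind redefining probe sets can be illustrated with a deliberately simplified sketch: each probe sequence is matched against current transcript sequences, probes that hit transcripts of more than one gene (or of none) are discarded, and the survivors are regrouped by gene. Exact substring matching, the min_probes cutoff, the function name, and the assumption that probe and transcript sequences are given in the same orientation are ours; AffyProbeMiner's actual pipeline [5] is more involved.

```python
from collections import defaultdict

def redefine_probe_sets(probes, transcripts, min_probes=3):
    """Regroup probes into gene-level probe sets using current transcript sequences.

    probes: {probe_id: sequence}; transcripts: {transcript_id: (gene_id, sequence)}.
    A probe is kept only if all transcripts it matches exactly belong to a single gene.
    """
    new_sets = defaultdict(list)
    for probe_id, probe_seq in probes.items():
        genes = {gene for gene, tx_seq in transcripts.values() if probe_seq in tx_seq}
        if len(genes) == 1:  # 0 genes (no match) or >1 genes (ambiguous): drop the probe
            new_sets[genes.pop()].append(probe_id)
    return {gene: ids for gene, ids in new_sets.items() if len(ids) >= min_probes}
```

The resulting gene-to-probe mapping plays the role of a custom CDF: expression summaries are then computed over the redefined probe sets instead of the manufacturer's original ones.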
4.3 Beadarray
Beadarray is a Bioconductor package for reading preprocessed Illumina bead summary data as well as reconstructing bead-level data from raw TIFF images. Methods for quality control and low-level analysis are also provided; see http://www.bioconductor.org/packages/2.1/bioc/html/beadarray.html [6].
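Because each Illumina bead type is represented by many replicate beads, bead-level data are usually summarized per bead type after discarding outlying beads. The sketch below removes beads lying more than three median absolute deviations (MADs) from the bead-type median and averages the rest. This mirrors the commonly described Illumina default, but the scaled MAD constant (1.4826), the cutoff, and the function name are assumptions to check against the beadarray documentation [6].

```python
from collections import defaultdict
from statistics import mean, median

def summarize_beads(bead_types, intensities, n_mads=3.0, scale=1.4826):
    """Per bead type: drop beads more than n_mads scaled MADs from the median, then average.

    Returns {bead_type: (summary_intensity, beads_kept)}.
    """
    by_type = defaultdict(list)
    for bead_type, value in zip(bead_types, intensities):
        by_type[bead_type].append(value)
    summary = {}
    for bead_type, values in by_type.items():
        center = median(values)
        mad = scale * median(abs(v - center) for v in values)
        if mad > 0:
            kept = [v for v in values if abs(v - center) <= n_mads * mad]
        else:
            kept = values  # MAD of zero gives no spread estimate: keep every bead
        summary[bead_type] = (mean(kept), len(kept))
    return summary
```

Reporting the number of beads kept alongside the summary is useful for quality control, because bead types with few surviving beads give less reliable estimates.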
4.4 GeneChip Operating Software (GCOS)
Affymetrix GCOS automates the control of GeneChip Fluidics Stations and Scanners. In addition, GCOS acquires data, manages sample and experimental information, and performs gene expression data analysis. GCOS can quantitate images using the MAS 5 and PLIER algorithms; see http://www.affymetrix.com/products/software/specific/gcos.affx.
4.5 GenePix Pro 6.0
This software has a number of useful features, including imaging, spot finding, quality control, analysis tools, visualizations, and automation capabilities. GenePix can display and process up to four single wavelengths, so four-channel imaging can be used. The tool can be integrated with a web-accessible database. GenePix is in some ways the de facto industry-standard microarray image analysis software, because it was an early developer of a couple of output file formats (*.gpr and *.gps) that are used by many other applications; see http://www.moleculardevices.com.
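Because so many applications read GenePix results files, a small parser is a useful illustration. The sketch below assumes the usual Axon Text File (ATF) layout of a .gpr file (a version line, a line giving the number of header records and data columns, the quoted Key=Value header records, then a tab-delimited table) and the conventional column names F635 Median, B635 Median, F532 Median, and B532 Median. Check these against your own files, because column sets vary with GenePix version and settings; the function names are ours.

```python
import csv
import io
import math

def read_gpr(text):
    """Parse GenePix Results (.gpr) text into (header dict, list of row dicts)."""
    lines = text.splitlines()
    n_header = int(lines[1].split()[0])  # line 2: number of header records, number of columns
    header = {}
    for record in lines[2:2 + n_header]:
        key, _, value = record.strip().strip('"').partition("=")
        header[key] = value
    table = io.StringIO("\n".join(lines[2 + n_header:]))
    return header, list(csv.DictReader(table, delimiter="\t"))

def log2_ratio(row, red="635", green="532"):
    """Background-subtracted median log2(red/green); None if a channel is at or below background."""
    r = float(row[f"F{red} Median"]) - float(row[f"B{red} Median"])
    g = float(row[f"F{green} Median"]) - float(row[f"B{green} Median"])
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)
```

Usage: header, rows = read_gpr(open("slide1.gpr").read()) (a hypothetical file name), then [log2_ratio(r) for r in rows].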
Table 1: Probe design software packages.
Tool and website | Cost and functions of the tool
Array Designer (http://www.premierbiosoft.com/dnamicroarray/index.html) | Designs primers and probes for oligo and cDNA expression microarrays. It can also design probes for SNP detection, single-exon and whole-gene tiling, and resequencing arrays. The software is not free.
ArrayScribe (http://www.nimblegen.com/products/software/arrayscribe.html) | Free, but limited to designing NimbleGen arrays. The tool can design probes, specify mismatches at specific sequence positions, automatically generate mismatches, generate multiple probes for a gene, and design the placement of spots on an array.
eArray (http://earray.chem.agilent.com/earray/login.do) | Free, but limited to designing Agilent arrays. Can design probes for expression, CGH, and ChIP arrays for any species with genomic sequence.
Primer3Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) | Free software that can design probes for expression detection on arrays, amplification/cloning, and sequencing/resequencing.
Sarani Oligo Design (http://www.strandls.com/oligodesign.html) | Probe design for expression analysis. The software is not free.
Visual OMP (http://www.dnasoftware.com/Products/VisualOMP) | Design software for RNA and DNA: single or multiple probe design, microarrays, TaqMan assays, genotyping, single and multiplex PCR, secondary structure simulation, and sequencing/genotyping.
Table 2: Other useful image analysis software packages.
Tool name | Web site
Able Image Analyser | http://able.mulabs.com
ArrayVision | http://www.imagingresearch.com/products/ARV.asp
IconoClust | http://www.clondiag.com/frame.php?page=products/sw/iconoclust/index.php
ImaGene | http://www.biodiscovery.com/index/imagene
Koadarray | http://www.koada.com/koadarray
MicroVigene | http://www.vigenetech.com/MicroVigene.htm
ScanAlyze | http://rana.lbl.gov/EisenSoftware.htm
Spot | http://www.hca-vision.com/productspot.html
4.6 NimbleScan
NimbleScan is a NimbleGen product designed for extracting raw feature intensity values, linking the raw intensity values with the corresponding probe parameters, and generating analysis reports for expression, ChIP-chip, resequencing, and methylation analyses on NimbleGen arrays; see http://www.nimblegen.com/products/software/nimblescan.html.
4.7 TM4 Spotfinder
Spotfinder is part of TM4, a larger, freely available microarray analysis suite. Spotfinder is designed for the rapid, reproducible, computer-aided analysis of microarray images and the quantification of gene expression. Spotfinder can read paired 16-bit or 8-bit TIFF image files generated by most microarray scanners. Grids can be constructed and adjusted automatically, semiautomatically, or manually, and two segmentation methods are available. Reusable grid geometry files and automatic grid adjustment allow the user to analyze large numbers of images in a consistent and efficient manner. Quality control views allow the user to assess systematic biases in the data; see http://www.tm4.org/spotfinder.html [7, 8].
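To illustrate what a segmentation method does, the sketch below applies Otsu's method, a widely used histogram-based thresholding technique, to the pixels around one spot: it picks the intensity cut that maximizes the between-class variance of background and foreground pixels. We do not claim this is either of the two methods implemented in Spotfinder; the bin count and function names are assumptions.

```python
def otsu_threshold(pixels, n_bins=256):
    """Intensity threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return lo  # flat region: nothing to separate
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for p in pixels:
        hist[min(int((p - lo) / width), n_bins - 1)] += 1
    total = len(pixels)
    weighted_sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_between = 0, -1.0
    w_bg, weighted_sum_bg = 0, 0.0
    for i, h in enumerate(hist):
        w_bg += h
        weighted_sum_bg += i * h
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = weighted_sum_bg / w_bg
        mean_fg = (weighted_sum_all - weighted_sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2  # proportional to between-class variance
        if between > best_between:
            best_bin, best_between = i, between
    # pixels above the upper edge of the best bin are treated as foreground
    return lo + (best_bin + 1) * width

def segment_spot(pixels):
    """Split pixels into (foreground, background) lists with an Otsu threshold."""
    t = otsu_threshold(pixels)
    return [p for p in pixels if p > t], [p for p in pixels if p <= t]
```

Unlike a fixed circle, an adaptive threshold follows irregular spot shapes, at the cost of being unstable for very faint spots where the two classes overlap.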
5 DATABASE TOOLS
Microarray experiments generate a huge amount of data, and the handling, storing, sharing, and distribution of the data can be quite complex. As a result, a variety of database tools have been developed to assist with this aspect of microarray studies. Some of the tools listed below are more than stand-alone database tools and may include extensive analysis and visualization functionality as well. The database tools differ widely in utility and platform requirements. Table 3 outlines the tools and their web sites.
6 DATABASES OF FUNCTIONAL INFORMATION
The amount of information about the functions of genes is beyond what any one person can know. Consequently, it is useful to pull in information on what others have discovered about genes in order to fully and correctly interpret an expression study. The following resources are databases of various types of information, such as published papers, gene sequences, pathways, and ontologies, that may be useful to an investigator who is interpreting an expression study.
6.1 AgBase
AgBase is a curated, open-source, web-accessible resource for functional analysis of agricultural plant and animal gene products. AgBase contains Gene Ontology terms and annotations for poplar and pine, as well as for several animals, microbes, and parasites; see http://www.agbase.msstate.edu [9, 10].
6.2 Agricola
Agricola is the catalog and index to the collections of the National Agricultural Library. The database covers materials in all formats and periods, dating back to the 15th century, and the records cover all aspects of agriculture and related disciplines; see http://agricola.nal.usda.gov.
6.3 Eukaryotic Gene Orthologues (EGO)
EGO is generated by pairwise comparison of the tentative consensus (TC) sequences from individual organisms. Reciprocal best-match pairs are clustered into groups, and multiple sequence alignments are displayed for each group. EGO is very useful for connecting homologous genes across species; see http://compbio.dfci.harvard.edu/tgi/ego [11].
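The reciprocal-best-match step described for EGO can be sketched as follows, assuming pairwise similarity scores between sequences of two organisms have already been computed (for example, with BLAST); pairs found for several organism pairs are then merged into groups with a union-find structure. The score dictionary format and function names are illustrative, and EGO's actual thresholds and alignment steps are not reproduced.

```python
def reciprocal_best_hits(scores):
    """scores: {(seq_in_A, seq_in_B): similarity}, higher is better.
    Returns sorted (a, b) pairs in which each sequence is the other's best match."""
    best_for_a, best_for_b = {}, {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    return sorted((a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a)

def cluster_pairs(pairs):
    """Merge reciprocal pairs (possibly from several organism pairs) into ortholog groups."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())
```

Requiring the match to be best in both directions filters out many paralog hits that a one-way best match would accept.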
6.4 Ensembl
Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute to develop a software system that produces and maintains automatic annotation of selected eukaryotic genomes. Initially developed for vertebrates, Ensembl has been adapted for use by several plant groups, including the legume, Gramene, and Arabidopsis groups; see http://www.ensembl.org/index.html [12].
6.5 Entrez Gene
Entrez Gene is NCBI's database for gene-specific information. Entrez Gene focuses on genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. Records are assigned unique, stable, and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, protein homologs, protein domains, and external databases) is updated regularly. There is currently at least some gene information on 113 plant species; see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene.
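Entrez Gene records can also be retrieved programmatically through NCBI's E-utilities. The sketch below builds an ESearch query against the gene database using the [sym] (gene symbol) and [orgn] (organism) field tags and returns the matching integer Gene IDs. The helper names are ours, the example symbol is arbitrary, and NCBI's usage guidelines (request rate limits, tool and email parameters) should be followed for anything beyond occasional queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def gene_search_url(symbol, organism, retmax=20):
    """ESearch URL for Entrez Gene records matching a symbol within one organism."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)

def search_gene_ids(symbol, organism):
    """Run the query and return the list of Gene IDs (as strings). Needs network access."""
    with urlopen(gene_search_url(symbol, organism)) as response:
        return json.load(response)["esearchresult"]["idlist"]
```

Usage (network required): search_gene_ids("FLC", "Arabidopsis thaliana"). The returned IDs can then be passed to other E-utilities, such as ESummary, to fetch the record content.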
6.6 Gene Index
The goal of the Gene Index Project is to use the available EST and gene sequences, along with reference genomes, to provide an inventory of likely genes and their variants. Genes are linked to annotation regarding their functions. Currently, Gene Index databases have been constructed for 34 plant species (http://compbio.dfci.harvard.edu/tgi/plant.html) [13, 14].
6.7 Gene Ontology
The objective of the Gene Ontology (GO) is to provide controlled vocabularies for describing the molecular function, biological process, and cellular component of gene products. These terms are used as attributes of gene products by various collaborating databases, such as Gramene and TAIR; see http://www.geneontology.org [15].
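Because GO is organized as a directed acyclic graph, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors (the "true path rule"), and enrichment tools usually propagate annotations upward before counting. The sketch below does this with a plain dictionary of parent links; the small hierarchy in the usage example is a simplified illustration rather than an extract of the ontology file, and the function names and gene name are ours.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links (is_a/part_of)."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(annotations, parents):
    """Expand each gene's direct GO annotations with every ancestor term."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}

# Usage with a simplified, illustrative fragment of the biological_process branch
parents = {
    "response to cold": ["response to temperature stimulus"],
    "response to temperature stimulus": ["response to abiotic stimulus"],
    "response to abiotic stimulus": ["response to stimulus"],
    "response to stimulus": ["biological_process"],
}
full = propagate({"geneX": ["response to cold"]}, parents)
```

After propagation, geneX is counted under all five terms, which is what allows general categories to accumulate evidence from many specifically annotated genes.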
6.8 Gramene
Gramene is a curated, open-source data resource for genome analysis in the grasses. The information stored in the database is derived from public sources and includes genomes, EST sequencing, protein structure and function analysis, genetic and physical mapping, interpretation of biochemical pathways, Gene Ontologies, gene and QTL localization, and descriptions of phenotypic characters and mutations. Extensive information is provided for Oryza, Zea, Triticum, Hordeum, Avena, Setaria, Pennisetum, Secale, Sorghum, Zizania, and Brachypodium; see http://www.gramene.org.
6.9 Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a database of biological systems consisting of genes and proteins (KEGG GENES), endogenous and exogenous substances (KEGG LIGAND), pathways (KEGG PATHWAY), and hierarchies and relationships of biological objects (KEGG BRITE). The database is very rich, with information across hundreds of species, including many plants; see http://www.genome.jp/kegg [16–18].
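KEGG can also be queried programmatically. The sketch below uses the KEGG REST interface (introduced after this article was written), whose link operation returns tab-separated gene-to-pathway pairs for an organism code such as ath (Arabidopsis thaliana), and groups the genes by pathway. The parser assumes the gene identifier is in the first column and the pathway in the second; verify this against a live response. The function names are ours, and KEGG's terms of use apply to programmatic access.

```python
from collections import defaultdict
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"

def parse_link(text):
    """Group 'gene<TAB>pathway' lines into {pathway: set of genes}."""
    pathways = defaultdict(set)
    for line in text.strip().splitlines():
        gene, pathway = line.split("\t")[:2]
        pathways[pathway].add(gene)
    return dict(pathways)

def fetch_gene_pathway_links(organism="ath"):
    """Download gene-pathway links for a KEGG organism code (network required)."""
    with urlopen(f"{KEGG_REST}/link/pathway/{organism}") as response:
        return parse_link(response.read().decode("utf-8"))

def pathways_hit(gene_list, links):
    """Count how many genes from an expression list fall in each pathway, largest first."""
    genes = set(gene_list)
    counts = {p: len(members & genes) for p, members in links.items()}
    return {p: c for p, c in sorted(counts.items(), key=lambda kv: -kv[1]) if c}
```

Usage (network required): links = fetch_gene_pathway_links("ath"), then pathways_hit(my_genes, links), where my_genes is a list of KEGG gene identifiers from your analysis.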
6.10 Plant-Associated Microbe Gene Ontology (PAMGO)
PAMGO is a database of the results of a multi-institutional collaborative effort aimed at developing new GO terms and relationships for gene products implicated in plant-pathogen interactions. GO terms are currently being developed for the following species: the bacteria Erwinia chrysanthemi, Pseudomonas syringae pv. tomato, and Agrobacterium tumefaciens; the fungus Magnaporthe grisea; the oomycetes Phytophthora sojae and Phytophthora ramorum; and the nematode Meloidogyne hapla; see http://pamgo.vbi.vt.edu.
Table 3: Database tools.
Tool name | Web site
Acuity | http://www.moleculardevices.com/pages/software/gnacuity.html
Array Results Manager (ARM) | http://www.biodiscovery.com/index/arm
ArrayTrack | http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack [35, 36]
BASE 2 | http://base.thep.lu.se
caArray | http://caarray.nci.nih.gov
Expressionist | http://www.genedata.com/products/expressionist/index_eng.html
Gene Array Analyzer Software (GAAS) | http://www.medinfopoli.polimi.it/GAAS
GeneDirector | http://www.biodiscovery.com/index/genedirector
GeneSpring Workgroup | http://www.chem.agilent.com/scripts/pds.asp?lpage=34668
GeneTraffic | http://www.iobion.com/products/products_GENETRAFFIC.html
Genowiz | http://www.ocimumbio.com
Longhorn Array Database (LAD) [37] | http://www.longhornarraydatabase.org
maxdLoad2 | http://www.bioinf.man.ac.uk/microarray/maxd/index.html
PARTISAN arrayLIMS | http://www.clondiag.com
Rosetta Resolver System | http://www.rosettabio.com/products/resolver/default.htm
Stanford Microarray Database (SMD) | http://smd-www.stanford.edu/download [38]
6.11 SWISS-PROT
SWISS-PROT is a curated protein sequence database that provides a high level of annotation, such as descriptions of a protein's function, domains, structure, post-translational modifications, and variants, along with good integration with other databases; see http://www.expasy.ch/sprot.
6.12 TAIR
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for Arabidopsis thaliana. Data available from TAIR include the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, and publications; see http://www.arabidopsis.org.
7 ANNOTATION TOOLS
The databases described in Section 6 provide data in a variety of forms, which makes merging the annotations with expression data difficult. To deal with this heterogeneity, a number of tools have been developed to make it easier to annotate genes in expression studies.
7.1 CiteXplore
CiteXplore combines literature search with text-mining tools for biology. Search results are cross-referenced to European Bioinformatics Institute applications based on publication identifiers, and links to full-text versions are provided where available; see http://www.ebi.ac.uk/citexplore.
7.2 Database for Annotation, Visualization, and Integrated Discovery (DAVID)
DAVID provides a large set of functional annotation tools to help investigators understand the biological meaning behind a large list of genes. Its core is the DAVID Knowledgebase, which provides a comprehensive, high-quality collection of gene annotation resources and the flexibility to cross-reference gene identifiers and heterogeneous annotations from almost all databases. The DAVID tools can identify enriched biological themes (particularly GO terms), cluster redundant annotation terms, visualize genes on BioCarta and KEGG pathway maps, display related many-genes-to-many-terms relationships in a 2D view, search for other functionally related genes not in the list, list interacting proteins, highlight protein functional domains and motifs, redirect to related literature, and convert gene identifiers from one type to another; see http://david.abcc.ncifcrf.gov [19].
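The enrichment question DAVID answers for each annotation term can be sketched with a hypergeometric (one-sided Fisher's exact) test: given N genes on the array, K of them annotated to a term, and a list of n genes of which k carry the term, how surprising is k? DAVID itself reports a more conservative modification of Fisher's exact test (the EASE score), so this plain version is a swapped-in simplification; the function names and data structures are ours.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes on the array, K of them annotated
    to the term, n genes in the list, k of which carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

def enrich(gene_list, background, annotations):
    """Rank terms by enrichment p-value. annotations: {gene: set of terms}.
    Returns sorted (p_value, term, hits_in_list, genes_with_term) tuples."""
    background = set(background)
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    members = {}
    for gene in background:
        for term in annotations.get(gene, ()):
            members.setdefault(term, set()).add(gene)
    results = []
    for term, annotated in members.items():
        k = len(annotated & genes)
        if k > 0:
            results.append((enrichment_p(k, n, len(annotated), N), term, k, len(annotated)))
    return sorted(results)
```

Because hundreds of terms are tested at once, the resulting p-values should themselves be corrected for multiple testing, and the background should be the genes actually measured on the array rather than the whole genome.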
7.3 MatchMiner
MatchMiner translates between several gene identifier types for the same list of hundreds or thousands of genes. The gene identifier types supported by the tool include GenBank accession numbers, IMAGE clone IDs, common gene names, HUGO names, gene symbols, UniGene clusters, FISH-mapped BAC clones, Affymetrix identifiers, and chromosome locations. MatchMiner can also find the intersection of two lists of genes specified by different identifiers; see http://discover.nci.nih.gov/matchminer/index.jsp [20].
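The translate-then-intersect operation that MatchMiner performs can be sketched with plain dictionaries, assuming you already have mapping tables from each identifier type to a common key (here, Arabidopsis AGI locus codes). The mapping contents, probe-set names, and symbols below are hypothetical, and the case-insensitive matching is our own convenience choice.

```python
def translate(ids, mapping):
    """Map identifiers to a common key; mapping keys are assumed upper case.
    Returns (translated, unmapped) so nothing is silently dropped."""
    translated, unmapped = [], []
    for identifier in ids:
        target = mapping.get(identifier.strip().upper())
        if target is None:
            unmapped.append(identifier)
        else:
            translated.append(target)
    return translated, unmapped

def intersect_lists(list_a, mapping_a, list_b, mapping_b):
    """Genes common to two lists that use different identifier types."""
    keys_a, _ = translate(list_a, mapping_a)
    keys_b, _ = translate(list_b, mapping_b)
    return sorted(set(keys_a) & set(keys_b))

# Hypothetical mapping tables: probe-set IDs and gene symbols -> AGI codes
probe_to_agi = {"P1_AT": "AT1G01010", "P2_AT": "AT1G01020"}
symbol_to_agi = {"GENE1": "AT1G01010", "GENE2": "AT1G01020"}
common = intersect_lists(["p1_at", "p2_at", "p9_at"], probe_to_agi, ["gene2"], symbol_to_agi)
```

Returning the unmapped identifiers, rather than discarding them, matters in practice: a silent drop of unmatched IDs can make two lists look more (or less) similar than they are.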
7.4 MedMiner
MedMiner searches and organizes the biomedical literature on genes, gene-gene relationships, and gene-drug relationships. It combines GeneCards and PubMed with syntactic analysis, truncated-keyword filtering of relational terms, and user-controlled sculpting of Boolean queries to extract key sentences from pertinent abstracts. Selected abstracts can be entered automatically into EndNote; see http://discover.nci.nih.gov/textmining/main.jsp [21].
8 DATA ANALYSIS SOFTWARE
There is an incredible breadth of tools in this area, many with very slick interfaces and very useful functions; however, you do not really need any of these tools. Most statistical packages, such as SAS, SPSS, JMP, and R, can be used to analyze microarray data and will perform most of the functions the following tools perform, because few statistical methods are 100% unique to expression studies. Nonetheless, many of the following tools are much easier to use and often have better visualization functions than pure statistical programs. Typically, the tools have been designed for ease of use, often too easy. Regardless of the tool you use, strive to understand the functions and analyses provided and the assumptions that are made when you choose to use them. For example, in cluster analysis you need to choose linkage and weighting functions, and the resulting clusters can be quite different depending on which methods are chosen. There are similar issues to learn and understand for all statistical methods and most visualization methods. Additional tools are listed in Table 4.
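The point about linkage choices is easy to demonstrate. The naive agglomerative clustering sketch below (one-dimensional values, cubic running time, for illustration only) merges the closest pair of clusters until the requested number remains. With the same five values, single linkage (distance between nearest members) and complete linkage (distance between farthest members) produce different two-cluster solutions. The data and function name are invented for the example.

```python
from itertools import combinations

def agglomerate(values, n_clusters, linkage="single"):
    """Naive agglomerative clustering of 1-D values with single or complete linkage."""
    pick = {"single": min, "complete": max}[linkage]

    def distance(c1, c2):
        return pick(abs(a - b) for a in c1 for b in c2)

    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

values = [0.0, 2.0, 4.1, 6.3, 8.6]
print(agglomerate(values, 2, "single"))    # [[0.0, 2.0, 4.1, 6.3], [8.6]]
print(agglomerate(values, 2, "complete"))  # [[0.0, 2.0], [4.1, 6.3, 8.6]]
```

Single linkage tends to chain evenly spaced points into one long cluster, while complete linkage favors compact clusters of similar diameter; neither is "correct" without reference to the biological question.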
8.1 Bioconductor
Bioconductor is a multicenter effort to develop tools in the R programming environment for analyzing genomic data, especially microarray data. A large number of packages are available to conduct many types of analyses; currently there are over 115 microarray applications. The tools are still in very active development and are all freely available. Some of the most relevant packages are affy, maanova, genefilter, limma, multtest, annotate, geneplotter, and marray, to name a few. A couple of the packages are described elsewhere in this document; for more details on specific tools, see the Bioconductor web site, http://www.bioconductor.org [22].
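As one example of what these packages add beyond ordinary statistics, limma's moderated t-statistic shrinks each gene's variance toward a common prior value, s_tilde^2 = (d0 * s0^2 + d * s^2) / (d0 + d), where s^2 is the gene's pooled variance with d residual degrees of freedom. This stabilizes tests when only a few arrays are available. In limma the prior degrees of freedom d0 and prior variance s0^2 are estimated from all genes by empirical Bayes; the Python sketch below simply takes them as inputs, so it illustrates the formula rather than reproducing limma. The function name is ours.

```python
from math import sqrt
from statistics import mean, variance

def moderated_t(group1, group2, d0, s0_sq):
    """Two-group moderated t for one gene, with prior (d0, s0_sq) supplied by the caller.
    With d0 = 0 it reduces to the ordinary pooled-variance t-statistic."""
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrunken variance
    return (mean(group1) - mean(group2)) / sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
```

Shrinkage pulls unusually small gene-wise variances upward, so genes with a tiny difference and an accidentally tiny variance no longer dominate the top of the ranked list.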
8.2 Biometric Research Branch (BRB) ArrayTools
BRB ArrayTools is a free, integrated package for the visualization and statistical analysis of DNA microarray gene expression data. It functions as an Excel add-in and was developed by professional statisticians experienced in the analysis of microarray data. It is probably the best tool available for discriminant analysis and includes a variety of other statistical and clustering methods; see http://linus.nci.nih.gov/BRB-ArrayTools.html.
8.3 Expression Profiler
Expression Profiler is a set of tools for cluster analysis, pattern discovery, pattern visualization, and the study of and search for Gene Ontology categories. The tool also generates sequence logos, extracts regulatory sequences, studies protein interactions, and links analysis results to external tools and databases; see http://ep.ebi.ac.uk.
8.4 GenePattern
GenePattern puts sophisticated computational methods into the hands of the biomedical research community. A simple application interface gives a broad audience access to a growing repository of analytic tools for genomic data, while an API supports computational biologists. GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques; see http://www.broad.mit.edu/cancer/software/genepattern/index.html [23].
8.5 GeneXPress
GeneXPress is a visualization and analysis tool for gene expression data that integrates clustering, gene annotation, and sequence information. GeneXPress allows the user to load clustering results and automatically analyze them for significance of functional groups, through correlation with functional annotations (e.g., Gene Ontology), and for enrichment of motif binding sites (e.g., TRANSFAC motifs); see http://genexpress.stanford.edu.
8.6 GEPAS (Gene Expression Pattern Analysis Suite)
GEPAS is an integrated web-based tool for the analysis of gene expression data. GEPAS includes tools for normalization, many clustering methods, supervised analysis, differential gene expression analysis, predictors, array CGH, and functional annotation; see http://gepas.bioinfo.cipf.es [24, 25].
8.7 High-Dimensional Biology Statistics (HDBStat)
HDBStat is a free Java application for the normalization, transformation, and statistical analysis of expression data. HDBStat also includes a number of unique quality control procedures, and it implements a reproducible research design that allows analyses to be readily repeated (http://www.ssg.uab.edu/hdbstat) [26].
8.8 JMP Genomics
JMP Genomics leverages many statistical tools in JMP, a statistical analysis package; as a result, it has over 100 different analytical procedures that can be run. It also includes extensive visualization tools, and scripts can be written to develop standard analytical procedures; see http://www.jmp.com/software/genomics.
Table 4: Other useful statistical analysis and data-mining tools.
Tool name | Web site
Amiada (Analyzing Microarray Data) | http://dambe.bio.uottawa.ca/amiada.asp [39]
ArrayAssist Enterprise | http://www.stratagene.com
caGEDA | http://bioinformatics.upmc.edu/GE2/GEDA.html
Cluster | http://rana.lbl.gov/EisenSoftware.htm
dChip | http://www.dchip.org
GeneMaths XT | http://www.applied-maths.com/genemaths/genemaths.htm
INCLUSive | http://homes.esat.kuleuven.be/~dna/Biol/Software.html
J-Express Pro | http://www.molmine.com/software.htm
MAExplorer | http://maexplorer.sourceforge.net
NIA Array Analysis | http://lgsun.grc.nia.nih.gov/ANOVA
Onto-Tools | http://vortex.cs.wayne.edu/projects.htm
Probe Profiler | http://www.corimbia.com/Pages/ProductOverview.htm
TableView | http://ccgb.umn.edu/software/java/apps/TableView
Venn Mapper | http://www.gatcplatform.nl/vennmapper/index.php
8.9 Onto-Tools
Onto-Tools are a series of freely available tools for the analysis of microarray data. Tools are available for array design (Onto-Design), gene class testing (Onto-Express), comparing the content of arrays (Onto-Compare), mapping gene information across databases (Onto-Translate), annotation (Onto-Miner), and pathway analysis (Pathway-Express); see http://www.vortex.cs.wayne.edu [27].
8.10 Partek Genomics Suite
Partek Genomics Suite can be used for gene expression analysis, exon expression analysis, chromosomal copy number analysis, promoter tiling array analysis, and analysis of SNP arrays. Partek includes a large number of statistical, visualization, and annotation tools that can be tied together using workflow tools for rapid repetition of analyses and for reproducible research; see http://www.partek.com/software.
8.11 R/maanova
Maanova stands for MicroArray ANalysis Of VAriance. It provides a complete workflow for microarray data analysis, including data-quality checks and visualization, data transformation, ANOVA model fitting for both fixed- and mixed-effects models, statistical tests including permutation tests, bootstrap confidence intervals, and cluster analysis. R/maanova is available in Bioconductor; refer to http://www.jax.org/staff/churchill/labsite/software/Rmaanova/index.html [28].
8.12 SAM (Significance Analysis of Microarrays)
SAM can be used on any type of array data: oligo or cDNA arrays, SNP arrays, protein arrays, and so forth. Both parametric and nonparametric tests are available for correlating expression data with clinical parameters, including treatment, diagnosis categories, survival time, paired data, quantitative outcomes (e.g., tumor volume), and one-class data. SAM can also impute missing data with a nearest-neighbor algorithm; see http://www-stat.stanford.edu/~tibs/SAM.
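The nearest-neighbor idea behind SAM's imputation can be sketched as follows: for a gene with a missing value on one array, find the k genes with the most similar expression profiles (over arrays where both are observed) that do have a value on that array, and average their values there. Root-mean-square distance over shared arrays, the default k, the function name, and the use of None for missing entries are our assumptions; SAM's own implementation details may differ.

```python
from math import sqrt

def knn_impute(matrix, k=10):
    """Fill None entries in a genes x arrays matrix with the mean of the k nearest genes.

    Distance between two genes is the root-mean-square difference over arrays where
    both are observed; only neighbors with a value on the missing array are used.
    """
    imputed = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                if h == g or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    rms = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((rms, other[j]))
            nearest = sorted(candidates)[:k]
            if nearest:  # otherwise leave the value missing
                imputed[g][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

Because neighbors are chosen by profile similarity rather than by array, this approach borrows information from co-regulated genes, which usually beats filling gaps with a row or column mean.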
8.13 TM4
The TM4 suite of tools consists of four major applications, Microarray Data Manager (MADAM), TIGR Spotfinder, Microarray Data Analysis System (MIDAS), and MultiExperiment Viewer (MeV), as well as a MySQL database, all of which are freely available. Although these tools were developed for spotted two-color arrays, many of the components can easily be adapted to work with single-color formats such as filter arrays and GeneChips; see http://www.tm4.org/index.html.
9 DISSEMINATION
Early in the use of microarrays in research, it became common practice for many journals to require investigators to submit expression data to a public database as a condition of publication. This sharing of data has allowed many investigators to mine these rich resources to help their research. A number of public databases exist that contain and accept plant data.
9.1 ArrayExpress
ArrayExpress is a public repository for microarray data that is aimed at storing MIAME-compliant data in accordance with MGED recommendations. This database is somewhat less biomedical in focus than GEO and has a good representation of plant expression data; see http://www.ebi.ac.uk/arrayexpress [29, 30].
9.2 GEO
The Gene Expression Omnibus (GEO) is a gene expression/molecular abundance repository that supports MIAME-compliant data submissions, and a curated online resource for gene expression data browsing, query, and retrieval. GEO is supported by the US National Library of Medicine but contains a good amount of plant expression data; see http://www.ncbi.nlm.nih.gov/projects/geo [31, 32].
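Data deposited in GEO can be mined directly. GEO distributes each series as a "series matrix" text file (usually gzip-compressed) in which metadata lines start with "!" and the expression table is enclosed between !series_matrix_table_begin and !series_matrix_table_end. The parser below assumes that layout and numeric values; the function name is ours, and for real work Bioconductor's GEOquery package is a more complete option.

```python
import csv
import io

def parse_series_matrix(text):
    """Split a GEO series matrix file into metadata, sample IDs, and a probe -> values table."""
    metadata, table_lines, in_table = {}, [], False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table:
            table_lines.append(line)
        elif line.startswith("!"):
            key, _, value = line[1:].partition("\t")  # raw, still-quoted value string
            metadata.setdefault(key, []).append(value)
    rows = list(csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t"))
    header, body = rows[0], rows[1:]
    data = {row[0]: [float(x) if x else None for x in row[1:]] for row in body}
    return metadata, header[1:], data

# Usage with a downloaded file (hypothetical accession):
# import gzip
# with gzip.open("GSExxxx_series_matrix.txt.gz", "rt") as fh:
#     meta, samples, data = parse_series_matrix(fh.read())
```

Empty cells are returned as None so that missing values can be handled explicitly, for example with a nearest-neighbor imputation step, before analysis.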
9.3 NASC (Nottingham Arabidopsis Stock Centre) Arrays
NASC runs a database of its own arrays as well as other data that have been deposited in the database. The database primarily contains Arabidopsis array data; see http://affymetrix.arabidopsis.info [33].
9.4 Plant Expression Database (PLEXdb)
PLEXdb is a unified public resource for gene expression for plants and plant pathogens. PLEXdb serves as a portal to integrate gene expression profile data sets with structural genomics and phenotypic data. The database contains data from seven species; see http://www.plexdb.org/index.php [34].
10 CONCLUSIONS
We hope this listing of tools, which only scratches the surface of the possible tools, will assist you in conducting, analyzing, and interpreting expression studies. We suggest exploring several tools in an area and understanding the principles of the methods implemented before settling on one or a few to use regularly. By exploring several tools, you will understand the potential of the various tools and how easy (or difficult) they are to use, and you will be able to determine what you really want and need for your microarray analysis.
ACKNOWLEDGMENT
This work was supported by NSF grant 0501890 and NIH grant U54 AT100949.
REFERENCES
[1] G. Gadbury, G. P. Page, J. Edwards, et al., "Power analysis and sample size estimation in the age of high dimensional biology," Statistical Methods in Medical Research, vol. 13, pp. 325–338, 2004.
[2] G. P. Page, J. W. Edwards, G. L. Gadbury, et al., "The PowerAtlas: a power and sample size atlas for microarray experimental design and research," BMC Bioinformatics, vol. 7, article 84, 2006.
[3] V. G. Tusher, R. Tibshirani, and G. Chu, "Significance analysis of microarrays applied to the ionizing radiation response," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
[4] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, "Affy–analysis of Affymetrix GeneChip data at the probe level," Bioinformatics, vol. 20, no. 3, pp. 307–315, 2004.
[5] H. Liu, B. R. Zeeberg, G. Qu, et al., "AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets," Bioinformatics, vol. 23, no. 18, pp. 2385–2390, 2007.
[6] M. J. Dunning, M. L. Smith, M. E. Ritchie, and S. Tavaré, "Beadarray: R classes and methods for Illumina bead-based data," Bioinformatics, vol. 23, no. 16, pp. 2183–2184, 2007.
[7] A. I. Saeed, N. K. Bhagabati, J. C. Braisted, et al., "TM4 microarray software suite," Methods in Enzymology, vol. 411, pp. 134–193, 2006.
[8] A. I. Saeed, V. Sharov, J. White, et al., "TM4: a free, open-source system for microarray data management and analysis," BioTechniques, vol. 34, no. 2, pp. 374–378, 2003.
[9] F. M. McCarthy, S. M. Bridges, N. Wang, et al., "AgBase: a unified resource for functional analysis in agriculture," Nucleic Acids Research, vol. 35, database issue, pp. D599–D603, 2007.
[10] F. M. McCarthy, N. Wang, G. B. Magee, et al., "AgBase: a functional genomics resource for agriculture," BMC Genomics, vol. 7, article 229, 2006.
[11] Y. Lee, J. Tsai, S. Sunkara, et al., "The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes," Nucleic Acids Research, vol. 33, database issue, pp. D71–D74, 2005.
[12] T. J. P. Hubbard, B. L. Aken, K. Beal, et al., "Ensembl 2007," Nucleic Acids Research, vol. 35, database issue, pp. D610–D617, 2007.
[13] J. Quackenbush, J. Cho, D. Lee, et al., "The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species," Nucleic Acids Research, vol. 29, no. 1, pp. 159–164, 2001.
[14] J. Quackenbush, F. Liang, I. Holt, G. Pertea, and J. Upton, "The TIGR Gene Indices: reconstruction and representation of expressed gene sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 141–145, 2000.
[15] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[16] M. Kanehisa, "The KEGG database," Novartis Foundation Symposium, vol. 247, pp. 91–101, 2002.
[17] M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, "The KEGG databases at GenomeNet," Nucleic Acids Research, vol. 30, no. 1, pp. 42–46, 2002.
[18] M. Kanehisa, S. Goto, M. Hattori, et al., "From genomics to chemical genomics: new developments in KEGG," Nucleic Acids Research, vol. 34, database issue, pp. D354–D357, 2006.
[19] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., "DAVID: database for annotation, visualization, and integrated discovery," Genome Biology, vol. 4, no. 5, article P3, 2003.
[20] K. J. Bussey, D. Kane, M. Sunshine, et al., "MatchMiner: a tool for batch navigation among gene and gene product identifiers," Genome Biology, vol. 4, no. 4, article R27, 2003.
[21] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, "MedMiner: an internet text-mining tool for biomedical information, with application to gene expression profiling," BioTechniques, vol. 27, no. 6, pp. 1210–1217, 1999.
[22] R. C. Gentleman, V. J. Carey, D. M. Bates, et al., "Bioconductor: open software development for computational biology and bioinformatics," Genome Biology, vol. 5, no. 10, article R80, 2004.
[23] M. Reich, T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov, "GenePattern 2.0," Nature Genetics, vol. 38, no. 5, pp. 500–501, 2006.
[24] J. Herrero, F. Al-Shahrour, R. Díaz-Uriarte, et al., "GEPAS: a web-based resource for microarray gene expression data analysis," Nucleic Acids Research, vol. 31, no. 13, pp. 3461–3467, 2003.
[25] J. M. Vaquerizas, L. Conde, P. Yankilevich, et al., "GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data," Nucleic Acids Research, vol. 33, web server issue, pp. W616–W620, 2005.
[26] P. Trivedi, J. W. Edwards, J. Wang, et al., "HDBStat: a platform-independent software suite for statistical analysis of high dimensional biology data," BMC Bioinformatics, vol. 6, article 86, 2005.
[27] P. Khatri, P. Bhavsar, G. Bawa, and S. Draghici, "Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments," Nucleic Acids Research, vol. 32, web server issue, pp. W449–W456, 2004.
[28] M. K. Kerr, M. Martin, and G. A. Churchill, "Analysis of variance for gene expression microarray data," Journal of Computational Biology, vol. 7, no. 6, pp. 819–837, 2000.
[29] A. Brazma, H. Parkinson, U. Sarkans, et al., "ArrayExpress–a public repository for microarray gene expression data at the EBI," Nucleic Acids Research, vol. 31, no. 1, pp. 68–71, 2003.
[30] H. Parkinson, M. Kapushesky, M. Shojatalab, et al., "ArrayExpress–a public database of microarray experiments and gene expression profiles," Nucleic Acids Research, vol. 35, database issue, pp. D747–D750, 2007.
[31] T. Barrett, D. B. Troup, S. E. Wilhite, et al., "NCBI GEO: mining tens of millions of expression profiles–database and tools update," Nucleic Acids Research, vol. 35, database issue, pp. D760–D765, 2007.
[32] T. Barrett, T. O. Suzek, D. B. Troup, et al., "NCBI GEO: mining millions of expression profiles–database and tools," Nucleic Acids Research, vol. 33, database issue, pp. D562–D566, 2005.
[33] D. J. Craigon, N. James, J. Okyere, J. Higgins, J. Jotham, and S. May, "NASCArrays: a repository for microarray data generated by NASC's transcriptomics service," Nucleic Acids Research, vol. 32, database issue, pp. D575–D577, 2004.
[34] L. Shen, J. Gong, R. A. Caldo, et al., "BarleyBase–an expression profiling database for plant genomics," Nucleic Acids Research, vol. 33, database issue, pp. D614–D618, 2005.
[35] W. Tong, S. Harris, X. Cao, et al., "Development of public toxicogenomics software for microarray data management and analysis," Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 549, no. 1-2, pp. 241–253, 2004.
[36] W. Tong, X. Cao, S. Harris, et al., "ArrayTrack–supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research," Environmental Health Perspectives, vol. 111, no. 15, pp. 1819–1826, 2003.
[37] P. J. Killion, G. Sherlock, and V. R. Iyer, "The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD)," BMC Bioinformatics, vol. 4, article 32, 2003.
[38] J. Demeter, C. Beauheim, J. Gollub, et al., "The Stanford Microarray Database: implementation of new analysis tools and open source release of software," Nucleic Acids Research, vol. 35, database issue, pp. D766–D770, 2007.
[39] X. Xia and Z. Xie, "AMADA: analysis of microarray data," Bioinformatics, vol. 17, no. 6, pp. 569–570, 2001.
have high-quality sequence data. The more complete the genome is, the easier it will be to design probes that will not cross-hybridize, will not be affected by SNPs, and will query the gene accurately. Table 1 lists a number of tools for probe design; many of them are free, but a number are specific to a single array manufacturer.
3 POWER ANALYSIS AND SAMPLE SIZE CALCULATIONS
One of the keys to a successful microarray study is to collect enough data (arrays) to derive biologically generalizable results. The key to this is the statistical power of a study. Power is the probability of detecting a significant difference between experimental groups when one really exists. Several factors affect power, but the main one under the control of an investigator is the sample size. A study with too few samples may not detect real differences, while too many samples will waste resources. Power analysis allows the selection of the optimal sample size. While sample sizes for microarrays can be planned with traditional statistical power calculation tools such as PS (http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize), the unique features of arrays, such as the large number of tests and the large number of genes that differ between groups, have led to the development of several methods and tools for power and sample size analysis.
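As a rough illustration of the traditional calculation mentioned above (not the method used by any of the tools below), the following Python sketch computes per-gene power and the per-group sample size for a two-group comparison using a normal approximation. The standardized effect size, the target power, and the use of a Bonferroni-adjusted significance level to mimic testing 10,000 genes are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def power_two_group(effect_size: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test for one gene.

    effect_size -- standardized difference (mean difference / SD)
    n           -- arrays per group
    alpha       -- per-gene significance level
    The tiny chance of rejecting in the wrong direction is ignored.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * sqrt(n / 2)
    return STD_NORMAL.cdf(shift - z_crit)


def n_per_group(effect_size: float, power: float, alpha: float) -> int:
    """Smallest per-group size reaching the target power (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / abs(effect_size)) ** 2)


if __name__ == "__main__":
    print(n_per_group(1.0, 0.8, 0.05))           # a single gene at alpha = 0.05
    print(n_per_group(1.0, 0.8, 0.05 / 10_000))  # Bonferroni over 10,000 genes
```

With a standardized effect size of 1 and 80% power, `n_per_group` gives 16 arrays per group at α = 0.05 but 59 per group at the Bonferroni level 0.05/10,000, which shows why the number of tests on an array changes the planning problem. Approaches that control the false discovery rate instead are generally much less demanding than Bonferroni.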
3.1 The Power Atlas
The Power Atlas is a web-based resource to assist investigators in the planning and design of microarray and expression-based experiments. This software currently aims at estimating the power and sample size for a two-group comparison based upon pilot data. The methods underlying the website are reported in Gadbury et al. [1], and the software is described in further detail in Page et al. [2]. The tool may be used in two ways: one may either upload one's own pilot data or select a pilot dataset from over 1,000 public data sets. Output includes graphs of power for a variety of significance levels and false discovery rates; see http://www.poweratlas.org [2].
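The interplay between significance level, power, and the false discovery rate can be illustrated with a back-of-the-envelope calculation. The Python sketch below assumes a known fraction `pi0` of genes with no true difference and a single average power for the genes that do differ; the Power Atlas instead estimates such quantities from pilot data using the methods in [1], which this sketch does not reproduce.

```python
def expected_fdr(n_genes: int, pi0: float, alpha: float, power: float) -> dict:
    """Expected false and true positives and the implied false discovery rate.

    pi0   -- assumed fraction of genes with no true difference
    alpha -- per-gene significance threshold
    power -- assumed average power for the genes that truly differ
    """
    n_null = n_genes * pi0
    n_diff = n_genes - n_null
    false_pos = n_null * alpha
    true_pos = n_diff * power
    called = false_pos + true_pos
    return {
        "false_positives": false_pos,
        "true_positives": true_pos,
        # ratio of expectations: an approximation to the FDR, not its exact value
        "fdr": false_pos / called if called > 0 else 0.0,
    }


if __name__ == "__main__":
    print(expected_fdr(10_000, pi0=0.9, alpha=0.01, power=0.8))
```

For 10,000 genes of which 90% are unchanged, testing at 0.01 with 80% power yields about 90 false and 800 true positives, an FDR near 0.10; raising power shrinks the FDR at the same significance level, which is the trade-off the Power Atlas graphs display.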
3.2 Significance analysis of microarrays (SAM)
SAM is a free, flexible Excel add-in that includes a number of useful functions for the analysis of microarray data. Tools include statistical analysis for discrete, quantitative, and time-series data; adjustments for multiple testing; gene set enrichment analysis; sample size assessment; estimates of the false discovery rate (FDR) and q-value; and per-gene power analysis; see http://www-stat.stanford.edu/~tibs/SAM [3].
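SAM ranks genes with a modified t-statistic [3] in which a small positive constant s0 is added to each gene's standard error, so that genes with tiny variances do not dominate the list. A minimal Python sketch of that statistic follows; here s0 is supplied by the caller, whereas SAM estimates it from the data, and the permutation-based FDR estimation that SAM performs afterwards is not shown.

```python
from math import sqrt
from statistics import fmean


def sam_d(group1, group2, s0):
    """SAM-style relative difference d for each gene.

    group1, group2 -- per-gene lists of expression values, genes in the same order
    s0             -- fudge factor added to every gene's standard error
    """
    n1, n2 = len(group1[0]), len(group2[0])
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    scores = []
    for x, y in zip(group1, group2):
        mean_x, mean_y = fmean(x), fmean(y)
        # pooled within-group sum of squares, as in an ordinary two-sample t-test
        ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
        s = sqrt(a * ss)
        scores.append((mean_x - mean_y) / (s + s0))
    return scores
```

With `s0 = 0` this reduces to the ordinary two-sample t-statistic; a gene whose replicates happen to be identical receives a finite score only when `s0 > 0`.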
4 IMAGE ANALYSIS SOFTWARE
The purpose of image analysis software is to generate a quantified expression score from the scanned microarray images. Some of the tools are specific to particular array types and thus are not appropriate for all arrays. There are a number of tools available in this area, many of which are expensive. We present here tools that are still being actively supported and developed. Additional tools are listed in Table 2.
4.1 Affy
This is a Bioconductor package for processing Affymetrix arrays. A wide variety of image processing, normalization, and quality control procedures are available. Note that there are a variety of other image processing tools in Bioconductor, including PDNN and dChip, that should be considered for use as well; see http://www.bioconductor.org/packages/2.1/bioc/html/affy.html [4].
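Among the normalization methods the affy package offers is quantile normalization, the normalization step of RMA, which forces every array to share the same intensity distribution. The Python sketch below shows the idea on a small made-up matrix; it breaks ties by input order, which is a simplification, and it is not the Bioconductor implementation.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity vectors (one list per array)."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: the mean of each rank across all arrays.
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]  # each probe takes the reference value for its rank
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    data = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
    for array in quantile_normalize(data):
        print(array)
```

After normalization every array contains exactly the same set of values; only their assignment to probes differs, preserving each array's within-array ranking.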
4.2 AffyProbeMiner
AffyProbeMiner is used to redefine chip definition files (CDFs) for Affymetrix chips to take into account more recent genomic sequence information on SNPs, alternative splicing, changes in gene models and exon structure, and other such genomic differences. Precomputed CDFs for several chips are available for download; see http://gauss.dbb.georgetown.edu/liblab/affyprobeminer [5].
4.3 Beadarray
This is a Bioconductor package for reading preprocessed Illumina bead summary data as well as reconstructing bead-level data from raw TIFF images. Methods for quality control and low-level analysis are also provided; see http://www.bioconductor.org/packages/2.1/bioc/html/beadarray.html [6].
4.4 GeneChip Operating Software (GCOS)
Affymetrix GCOS automates the control of GeneChip Fluidics Stations and Scanners. In addition, GCOS acquires data, manages sample and experimental information, and performs gene expression data analysis. GCOS can quantitate images using MAS 5 and PLIER; see http://www.affymetrix.com/products/software/specific/gcos.affx
4.5 GenePix Pro 6.0
This software has a number of useful features, including imaging, spot finding, quality control, analysis tools, visualizations, and automation capabilities. GenePix can display and process up to four individual wavelengths, so four-channel imaging is possible. This tool can be integrated with a web-accessible database. GenePix is in some ways the default industry-standard microarray image analysis software, because it was early to develop a couple of output file formats, *.gpr and *.gps, that are used by many other applications; see http://www.moleculardevices.com
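Because GenePix results files are plain tab-delimited text, they are easy to read outside GenePix. The Python sketch below assumes the Axon Text File layout of a *.gpr file (an "ATF" version line, a line giving the number of optional header records, those records, then a row of column names) and assumes the column names "ID", "F635 Median", "B635 Median", "F532 Median", and "B532 Median"; check these against your own files. It computes a background-subtracted log2 red/green ratio per spot and skips spots at or below background.

```python
import csv
import io
import math


def read_gpr(text):
    """Return the data rows of a GenePix results (*.gpr) file as dictionaries."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("ATF"):
        raise ValueError("not an ATF-formatted file")
    n_header_records = int(lines[1].split("\t")[0])
    column_row = 2 + n_header_records  # skip version line, count line, header records
    reader = csv.DictReader(io.StringIO("\n".join(lines[column_row:])), delimiter="\t")
    return list(reader)


def log_ratios(rows):
    """log2(red/green) of background-subtracted median intensities, keyed by spot ID."""
    ratios = {}
    for row in rows:
        red = float(row["F635 Median"]) - float(row["B635 Median"])
        green = float(row["F532 Median"]) - float(row["B532 Median"])
        if red > 0 and green > 0:  # skip spots at or below background
            ratios[row["ID"]] = math.log2(red / green)
    return ratios


# A tiny hypothetical file containing only the columns used above.
SAMPLE_GPR = "\n".join([
    "ATF\t1.0",
    "2\t5",
    '"Type=GenePix Results 3"',
    '"Creator=hypothetical example"',
    "\t".join(['"ID"', '"F635 Median"', '"B635 Median"', '"F532 Median"', '"B532 Median"']),
    "\t".join(['"gene1"', "1100", "100", "600", "100"]),
    "\t".join(['"gene2"', "300", "100", "500", "100"]),
    "\t".join(['"gene3"', "90", "100", "400", "100"]),
])

if __name__ == "__main__":
    print(log_ratios(read_gpr(SAMPLE_GPR)))
```

Real files carry many more columns and header records; the parser reads columns by name, so extra columns are simply ignored.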
G P Page and I Coulibaly 3
Table 1: Probe design software packages.
Tool and website | Cost and functions of the tool
Array Designer (http://www.premierbiosoft.com/dnamicroarray/index.html) | Designs primers and probes for oligo and cDNA expression microarrays. It can also design probes for SNP detection, single-exon, whole-gene tiling, and resequencing arrays. The software is not free.
ArrayScribe (http://www.nimblegen.com/products/software/arrayscribe.html) | Free, but limited to designing NimbleGen arrays. The tool can design probes, specify mismatches at specific sequence positions, automatically generate mismatches, generate multiple probes for a gene, and design the placement of spots on an array.
eArray (http://earray.chem.agilent.com/earray/login.do) | Free, but limited to designing Agilent arrays. Can design probes for expression, CGH, and ChIP for any species with genomic sequence.
Primer3Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) | Free software that can design probes for expression detection on arrays, amplification/cloning, and sequencing/resequencing.
Sarani Oligo Design (http://www.strandls.com/oligodesign.html) | Probe design for expression analysis. The software is not free.
Visual OMP (http://www.dnasoftware.com/Products/VisualOMP) | Design software for RNA and DNA: single or multiple probe design, microarrays, TaqMan assays, genotyping, single and multiplex PCR, secondary structure simulation, and sequencing.
Table 2: Other useful image analysis software packages.
Tool name | Web site
Able Image Analyser | http://able.mulabs.com
ArrayVision | http://www.imagingresearch.com/products/ARV.asp
IconoClust | http://www.clondiag.com/frame.php?page=products/sw/iconoclust/index.php
ImaGene | http://www.biodiscovery.com/index/imagene
Koadarray | http://www.koada.com/koadarray
MicroVigene | http://www.vigenetech.com/MicroVigene.htm
ScanAlyze | http://rana.lbl.gov/EisenSoftware.htm
Spot | http://www.hca-vision.com/productspot.html
4.6 NimbleScan
This NimbleGen product is designed for extracting raw feature intensity values, linking the raw intensity values with the corresponding probe parameters, and generating analysis reports for expression, ChIP-chip, resequencing, and methylation analyses on NimbleGen arrays; see http://www.nimblegen.com/products/software/nimblescan.html
4.7 TM4 Spotfinder
Spotfinder is part of the larger, freely available microarray analysis suite TM4. Spotfinder is designed for the rapid, reproducible, and computer-aided analysis of microarray images and the quantification of gene expression. Spotfinder can read paired 16-bit or 8-bit TIFF image files generated by most microarray scanners. Automatic, semiautomatic, and manual grid construction and adjustments can be made. Two segmentation methods are available. Reusable grid geometry files and automatic grid adjustment allow users to analyze large quantities of images in a consistent and efficient manner. Quality control views allow the user to assess systematic biases in the data; see http://www.tm4.org/spotfinder.html [7, 8].
5 DATABASE TOOLS
Microarray experiments generate a huge amount of data. The handling, storing, sharing, and distribution of the data can be quite complex. As a result, a variety of database tools have been developed to assist with this aspect of microarray studies. Some of the tools listed below are more than stand-alone database tools and may include extensive analysis and visualization functionality as well. There are a number of database tools with very different utility and platform requirements. Table 3 outlines the tools and websites.
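Many such systems are built on a relational database (TM4, for example, includes a MySQL database, as noted in Section 8). As a much-simplified illustration of the kind of structure involved, the Python sketch below uses the standard-library sqlite3 module to store experiments, hybridizations, and per-probe measurements, then summarizes one experiment. The table and column names are hypothetical and far smaller than the MIAME-oriented schemas used by the tools in Table 3.

```python
import sqlite3

SCHEMA = """
CREATE TABLE experiment (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    organism TEXT
);
CREATE TABLE hybridization (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    sample_name   TEXT NOT NULL
);
CREATE TABLE measurement (
    hybridization_id INTEGER NOT NULL REFERENCES hybridization(id),
    probe_id         TEXT NOT NULL,
    intensity        REAL,
    PRIMARY KEY (hybridization_id, probe_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding one small hypothetical experiment."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO experiment VALUES (1, 'hypothetical drought study', 'Arabidopsis thaliana')")
    con.executemany("INSERT INTO hybridization VALUES (?, 1, ?)",
                    [(1, "control_1"), (2, "drought_1")])
    con.executemany("INSERT INTO measurement VALUES (?, ?, ?)",
                    [(1, "probe_A", 7.5), (2, "probe_A", 9.5),
                     (1, "probe_B", 5.0), (2, "probe_B", 5.0)])
    return con


def probe_means(con, experiment_id):
    """Mean intensity of each probe across all hybridizations in one experiment."""
    rows = con.execute(
        """SELECT m.probe_id, AVG(m.intensity)
           FROM measurement AS m
           JOIN hybridization AS h ON h.id = m.hybridization_id
           WHERE h.experiment_id = ?
           GROUP BY m.probe_id
           ORDER BY m.probe_id""",
        (experiment_id,),
    )
    return dict(rows.fetchall())
```

Keeping sample descriptions, hybridizations, and measurements in separate linked tables is what lets one query pull together all arrays from an experiment without copying data.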
6 DATABASES OF FUNCTIONAL INFORMATION
The amount of information about the functions of genes is beyond what any one person can know. Consequently, it is useful to pull in information on what others have discovered about genes in order to fully and correctly interpret an expression study. The following tools are databases of various types of information, such as published papers, gene sequences, pathways, and ontologies, that might be useful for an investigator who is interpreting an expression study.
6.1 AgBase
AgBase is a curated, open-source, web-accessible resource for functional analysis of agricultural plant and animal gene products. AgBase contains databases of poplar and pine Gene Ontology terms and annotations, as well as those of several animals, microbes, and parasites; see http://www.agbase.msstate.edu [9, 10].
6.2 Agricola
Agricola is the catalog and index to the collections of the National Agricultural Library. The database covers materials in all formats and periods, dating back to the 15th century. The records cover all aspects of agriculture and related disciplines; see http://agricola.nal.usda.gov
6.3 Eukaryotic gene orthologues (EGO)
EGO is generated by pairwise comparison of the tentative consensus (TC) sequences from individual organisms. Reciprocal best-match pairs are clustered into individual groups, and multiple sequence alignments are displayed for each group. EGO is very useful for connecting homologous genes across species; see http://compbio.dfci.harvard.edu/tgi/ego [11].
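The reciprocal best-match step that EGO builds on can be sketched in a few lines of Python. The similarity scores here are made-up stand-ins for pairwise alignment scores between TC sequences of two organisms; ties are broken by dictionary order, and the subsequent grouping across many organisms and the alignments are not shown.

```python
def best_hits(scores):
    """Best-scoring subject for each query; scores[query][subject] = similarity."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best match and a is b's best match."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between sequences of species A and species B.
A_VS_B = {"A1": {"B1": 250, "B2": 90}, "A2": {"B1": 200, "B3": 40}}
B_VS_A = {"B1": {"A1": 245, "A2": 198}, "B2": {"A1": 88}, "B3": {"A2": 41}}

if __name__ == "__main__":
    print(reciprocal_best_hits(A_VS_B, B_VS_A))
```

Only A1 and B1 are paired: B3's best match is A2, but A2's own best match is B1, so that relationship is not reciprocal.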
6.4 Ensembl
Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute to develop a software system that produces and maintains automatic annotation on selected eukaryotic genomes. Initially developed for vertebrates, Ensembl has been adapted for use by several plant groups, including legume, Gramene, and Arabidopsis groups; see http://www.ensembl.org/index.html [12].
6.5 Entrez Gene
Entrez Gene is NCBI's database for gene-specific information. Entrez Gene focuses on genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. Records are assigned unique, stable, and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, protein homologs, protein domains, and external databases) is updated regularly. There is currently at least some gene information on 113 plant species; see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
6.6 Gene Index
The goal of the Gene Index Project is to use the available EST and gene sequences, along with the reference genomes, to provide an inventory of likely genes and their variants. Genes are linked to annotation regarding their functions. Currently, GI databases have been constructed for 34 plant species (http://compbio.dfci.harvard.edu/tgi/plant.html) [13, 14].
6.7 Gene Ontology
The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products. These terms are to be used as attributes of gene products by various collaborating databases, such as Gramene and TAIR; see http://www.geneontology.org [15].
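GO terms form a directed acyclic graph, and a gene annotated to a specific term is implicitly annotated to all of that term's ancestors, so analyses that count genes per GO term usually propagate annotations upward first. The Python sketch below does this for a small hand-written hierarchy; the term names and parent links are a simplified toy example, not the actual GO graph.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, parents):
    """Extend each gene's direct annotations with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# Toy hierarchy (child -> parents); illustrative only, not the real GO structure.
PARENTS = {
    "response to water deprivation": ["response to stress"],
    "response to stress": ["response to stimulus"],
    "response to stimulus": ["biological_process"],
    "photosynthesis": ["metabolic process"],
    "metabolic process": ["biological_process"],
}
ANNOTATIONS = {"gene1": ["response to water deprivation"], "gene2": ["photosynthesis"]}

if __name__ == "__main__":
    for gene, terms in propagate(ANNOTATIONS, PARENTS).items():
        print(gene, sorted(terms))
```

Because the traversal follows every parent link, terms with several parents are handled in the same way as single-parent terms.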
6.8 Gramene
Gramene is a curated, open-source data resource for genome analysis in the grasses. The information stored in the database is derived from public sources and includes genomes, EST sequencing, protein structure and function analysis, genetic and physical mapping, interpretation of biochemical pathways, Gene Ontologies, gene and QTL localization, and descriptions of phenotypic characters and mutations. Extensive information is provided for Oryza, Zea, Triticum, Hordeum, Avena, Setaria, Pennisetum, Secale, Sorghum, Zizania, and Brachypodium; see http://www.gramene.org
6.9 Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a database of biological systems consisting of genes and proteins (KEGG GENES), endogenous and exogenous substances (KEGG LIGAND), pathways (KEGG PATHWAY), and hierarchies and relationships of biological objects (KEGG BRITE). This database is very rich, with information across hundreds of species, including many plants; see http://www.genome.jp/kegg [16–18].
6.10 Plant-associated microbe gene ontology (PAMGO)
PAMGO is a database of the results of a multi-institutional collaborative effort aimed at developing new GO terms and relationships for gene products implicated in plant-pathogen interactions. GO terms are currently being developed for the following species: the bacteria Erwinia chrysanthemi, Pseudomonas syringae pv. tomato, and Agrobacterium tumefaciens; the fungus Magnaporthe grisea; the oomycetes Phytophthora sojae and Phytophthora ramorum; and the nematode Meloidogyne hapla; see http://pamgo.vbi.vt.edu
Table 3: Database tools.
Tool name | Web site
Acuity | http://www.moleculardevices.com/pages/software/gn_acuity.html
Array Results Manager (ARM) | http://www.biodiscovery.com/index/arm
ArrayTrack | http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack [35, 36]
BASE 2 | http://base.thep.lu.se
caArray | http://caarray.nci.nih.gov
Expressionist | http://www.genedata.com/products/expressionist/index_eng.html
Gene Array Analyzer Software (GAAS) | http://www.medinfopoli.polimi.it/GAAS
GeneDirector | http://www.biodiscovery.com/index/genedirector
GeneSpring Workgroup | http://www.chem.agilent.com/scripts/pds.asp?lpage=34668
GeneTraffic | http://www.iobion.com/products/products_GENETRAFFIC.html
Genowiz | http://www.ocimumbio.com
Longhorn Array Database (LAD) [37] | http://www.longhornarraydatabase.org
MaxdLoad2 | http://www.bioinf.man.ac.uk/microarray/maxd/index.html
PARTISAN arrayLIMS | http://www.clondiag.com
Rosetta Resolver System | http://www.rosettabio.com/products/resolver/default.htm
Stanford Microarray Database (SMD) | http://smd-www.stanford.edu/download [38]
6.11 SWISS-PROT
SWISS-PROT is a curated protein sequence database that provides a high level of annotation, such as the description of the function of a protein, its domains, structure, post-translational modifications, variants, and so forth, along with good integration with other databases; see http://www.expasy.ch/sprot
6.12 TAIR
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for Arabidopsis thaliana. Data available from TAIR include the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, and publications; see http://www.arabidopsis.org
7 ANNOTATION TOOLS
The databases described in Section 6 provide data in a variety of forms, which makes merging the annotations with the expression data difficult. To deal with this heterogeneity, a number of tools have been developed to make it easier to annotate genes in expression studies.
7.1 CiteXplore
CiteXplore combines literature search with text-mining tools for biology. Search results are cross-referenced to European Bioinformatics Institute applications based on publication identifiers. Links to full-text versions are provided where available; see http://www.ebi.ac.uk/citexplore
7.2 Database for annotation, visualization, and integrated discovery (DAVID)
DAVID provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind a large list of genes. The key is the DAVID Knowledgebase, which provides a comprehensive, high-quality collection of gene annotation resources and the flexibility to cross-reference gene identifiers and heterogeneous annotations from almost all databases. The DAVID tools are able to identify enriched biological themes, particularly GO terms; cluster redundant annotation terms; visualize genes on BioCarta and KEGG pathway maps; display related many-genes-to-many-terms in a 2D view; search for other functionally related genes not in the list; list interacting proteins; highlight protein functional domains and motifs; redirect to related literature; and convert gene identifiers from one type to another; see http://david.abcc.ncifcrf.gov [19].
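Enrichment of a biological theme is commonly judged by asking how surprising it is to see k genes from a category in a list of n genes drawn from the N genes on the array, K of which belong to the category. The one-sided hypergeometric tail below, written in plain Python, is the basic form of that test. DAVID's own EASE score is a modified, more conservative version of Fisher's exact test, so this sketch will not reproduce DAVID's numbers exactly.

```python
from math import comb


def enrichment_pvalue(k, list_size, category_size, population_size):
    """P(X >= k) for X ~ Hypergeometric(population_size, category_size, list_size).

    k               -- category members observed in the gene list
    list_size       -- genes in the list (e.g. differentially expressed genes)
    category_size   -- genes on the array annotated to the category
    population_size -- all genes on the array
    """
    top = min(list_size, category_size)
    tail = sum(
        comb(category_size, i) * comb(population_size - category_size, list_size - i)
        for i in range(k, top + 1)
    )
    return tail / comb(population_size, list_size)


if __name__ == "__main__":
    # 3 of 4 listed genes fall in a category of 5 genes on a 20-gene array.
    print(enrichment_pvalue(3, 4, 5, 20))
```

In a real analysis this p-value is computed for every category examined, so a multiple-testing correction is needed before calling a theme enriched.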
7.3 MatchMiner
MatchMiner translates between several gene identifier types for the same list of hundreds or thousands of genes. The gene identifier types supported by the tool include GenBank accession numbers, IMAGE clone IDs, common gene names, HUGO names, gene symbols, UniGene clusters, FISH-mapped BAC clones, Affymetrix identifiers, and chromosome locations. MatchMiner can also find the intersection of two lists of genes specified by different identifiers; see http://discover.nci.nih.gov/matchminer/index.jsp [20].
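Translating identifiers and intersecting lists, as MatchMiner does, amounts to mapping each list onto a shared key and comparing the keys. The Python sketch below uses small hypothetical mapping tables (probe IDs and accession numbers mapped to gene symbols); in practice the mapping tables come from annotation resources such as those in Section 6, and many-to-one or one-to-many mappings are common.

```python
def translate(ids, mapping):
    """Split ids into a dict of mapped keys and a list of ids with no mapping."""
    mapped = {i: mapping[i] for i in ids if i in mapping}
    unmapped = [i for i in ids if i not in mapping]
    return mapped, unmapped


def intersect_lists(list_a, map_a, list_b, map_b):
    """Shared keys between two lists that use different identifier types."""
    keys_a = set(translate(list_a, map_a)[0].values())
    keys_b = set(translate(list_b, map_b)[0].values())
    return sorted(keys_a & keys_b)


# Hypothetical identifiers and mappings, for illustration only.
PROBES = ["probe_1", "probe_2", "probe_3"]
PROBE_TO_SYMBOL = {"probe_1": "GENE_A", "probe_2": "GENE_B", "probe_3": "GENE_B"}
ACCESSIONS = ["acc_9", "acc_7", "acc_5"]
ACCESSION_TO_SYMBOL = {"acc_9": "GENE_B", "acc_7": "GENE_C"}

if __name__ == "__main__":
    print(intersect_lists(PROBES, PROBE_TO_SYMBOL, ACCESSIONS, ACCESSION_TO_SYMBOL))
    print(translate(ACCESSIONS, ACCESSION_TO_SYMBOL)[1])  # identifiers with no mapping
```

Reporting the unmapped identifiers, rather than silently dropping them, makes clear how much of each list the comparison actually covers.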
7.4 MedMiner
MedMiner searches and organizes the biomedical literature on genes, gene-gene relationships, and gene-drug relationships. It uses GeneCards, PubMed, syntactic analysis, truncated-keyword filtering of relationships, and user-controlled sculpting of Boolean queries to generate key sentences from pertinent abstracts. Selected abstracts can be automatically entered into EndNote; see http://discover.nci.nih.gov/textmining/main.jsp [21].
8 DATA ANALYSIS SOFTWARE
There is an incredible breadth of tools in this area, many of which provide very slick interfaces and very useful functions; however, you do not really need any of them. Most statistical packages, such as SAS, SPSS, JMP, and R, can be used to analyze microarray data and will perform most of the functions the following tools do, because few statistical methods are 100% unique to expression studies. Nonetheless, many of the following tools are much easier to use and often have better visualization functions than the pure statistical programs. Typically, the tools have been designed for ease of use, often to the point of being too easy. Regardless of the tool you use, strive to understand the functions and analyses it provides, and the assumptions that are made, when you choose to use them for analysis. For example, in cluster analysis you need to choose linkage and weighting functions, and the resulting clusters can be quite different depending on the methods chosen. There are similar issues to learn and understand for all statistical methods and most visualization methods. Additional tools are listed in Table 4.
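To make the point about linkage choices concrete, the following Python sketch implements naive agglomerative clustering of one-dimensional values with either single or complete linkage. The six values are made up, distances are plain absolute differences rather than the correlation-based distances often used for expression profiles, and ties are broken by cluster order.

```python
from itertools import combinations


def agglomerate(values, n_clusters, linkage):
    """Merge the closest pair of clusters until n_clusters remain."""
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    # single linkage: closest members; complete linkage: farthest members
    combine = min if linkage == "single" else max
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: combine(abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return sorted(sorted(c) for c in clusters)


VALUES = [0, 3, 6, 9, 10, 11]

if __name__ == "__main__":
    print(agglomerate(VALUES, 2, "single"))
    print(agglomerate(VALUES, 2, "complete"))
```

On the same data the two rules disagree: single linkage chains the evenly spaced values 0, 3, and 6 into one cluster, while complete linkage, which penalizes wide clusters, attaches 6 to the tight group 9 to 11 instead.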
8.1 Bioconductor
Bioconductor is a multicenter effort to develop tools in the R programming environment for analyzing genomic data, especially microarray data. A large number of packages are available to conduct many types of analyses; currently there are over 115 microarray applications. Tools are still in very active development and are all freely available. Some of the most relevant tools are affy, maanova, genefilter, limma, multtest, annotate, geneplotter, and marray, to name a few. A couple of the packages are described elsewhere in this document; for more details on specific tools, see the Bioconductor website: http://www.bioconductor.org [22].
8.2 Biometric Research Branch (BRB) ArrayTools
BRB ArrayTools is a free, integrated package for the visualization and statistical analysis of DNA microarray gene expression data. It functions as an Excel add-in. It was developed by professional statisticians experienced in the analysis of microarray data. It is probably the best tool available for discriminant analysis and includes a variety of other statistical and cluster methods; see http://linus.nci.nih.gov/BRB-ArrayTools.html
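Class prediction (discriminant analysis) assigns samples to groups from their expression profiles, and its error rate should be estimated by cross-validation rather than on the samples used to build the rule. The Python sketch below shows one simple predictor, a nearest-centroid rule with leave-one-out cross-validation, on made-up two-gene data; it illustrates the idea only and is not how BRB ArrayTools implements its predictors. With real data, any gene selection must also be repeated inside each cross-validation fold.

```python
from math import dist
from statistics import fmean


def centroids(samples, labels):
    """Mean expression profile of each class."""
    return {
        c: [fmean(gene) for gene in zip(*(s for s, l in zip(samples, labels) if l == c))]
        for c in set(labels)
    }


def predict(sample, class_centroids):
    """Class whose centroid is nearest in Euclidean distance."""
    return min(class_centroids, key=lambda c: dist(sample, class_centroids[c]))


def loocv_error(samples, labels):
    """Leave-one-out cross-validated misclassification rate."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(samples[i], centroids(train_x, train_y)) != labels[i]:
            errors += 1
    return errors / len(samples)


# Hypothetical log-expression values for two genes in six samples.
SAMPLES = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
LABELS = ["control"] * 3 + ["treated"] * 3

if __name__ == "__main__":
    print(loocv_error(SAMPLES, LABELS))
```

Each left-out sample is classified by centroids computed without it, so the reported error rate is not inflated by testing on training data.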
8.3 Expression Profiler
Expression Profiler is a set of tools for cluster analysis, pattern discovery, pattern visualization, and the study of and search for Gene Ontology categories. The tool also generates sequence logos, extracts regulatory sequences, studies protein interactions, and links analysis results to external tools and databases; see http://ep.ebi.ac.uk
8.4 GenePattern
GenePattern puts sophisticated computational methods into the hands of the biomedical research community. A simple application interface gives a broad audience access to a growing repository of analytic tools for genomic data, while an API supports computational biologists. GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques; see http://www.broad.mit.edu/cancer/software/genepattern/index.html [23].
8.5 GeneXPress
GeneXPress is a visualization and analysis tool for gene expression data that integrates clustering, gene annotation, and sequence information. GeneXPress allows the user to load clustering results and automatically analyze them for significance of functional groups through correlation with functional annotations (e.g., Gene Ontology) and for enrichment of motif binding sites (e.g., TRANSFAC motifs); see http://genexpress.stanford.edu
8.6 GEPAS (gene expression pattern analysis suite)
GEPAS is an integrated web-based tool for the analysis of gene expression data. GEPAS includes tools for normalization, many clustering methods, supervised analysis, differential gene expression analysis, predictors, array CGH, and functional annotation; see http://gepas.bioinfo.cipf.es [24, 25].
8.7 High-dimensional biology statistics (HDBStat)
HDBStat is a free Java application that allows for the normalization, transformation, and statistical analysis of expression data. HDBStat also includes a number of unique quality control procedures. HDBStat implements a reproducible research design so that analyses can be readily repeated (http://www.ssg.uab.edu/hdbstat) [26].
8.8 JMP Genomics
JMP Genomics leverages many statistical tools in JMP, a statistical analysis package; as a result, it offers over 100 different analytical procedures. It also includes extensive visualization tools. Scripts can be written for the development of standard analytical procedures; see http://www.jmp.com/software/genomics
Table 4: Other useful statistical analysis and data-mining tools.
Tool name | Web site
Amiada (analyzing microarray data) | http://dambe.bio.uottawa.ca/amiada.asp [39]
ArrayAssist Enterprise | http://www.stratagene.com
caGEDA | http://bioinformatics.upmc.edu/GE2/GEDA.html
Cluster | http://rana.lbl.gov/EisenSoftware.htm
dChip | http://www.dchip.org
GeneMaths XT | http://www.applied-maths.com/genemaths/genemaths.htm
INCLUSive | http://homes.esat.kuleuven.be/~dna/BioI/Software.html
J-Express Pro | http://www.molmine.com/software.htm
MAExplorer | http://maexplorer.sourceforge.net
NIA Array Analysis | http://lgsun.grc.nia.nih.gov/ANOVA
Onto-Tools | http://vortex.cs.wayne.edu/projects.htm
Probe Profiler | http://www.corimbia.com/Pages/ProductOverview.htm
TableView | http://ccgb.umn.edu/software/java/apps/TableView
Venn Mapper | http://www.gatcplatform.nl/vennmapper/index.php
8.9 Onto-Tools
Onto-Tools is a series of freely available tools for the analysis of microarray data. Tools are available for array design (Onto-Design), gene class testing (Onto-Express), comparing the content of arrays (Onto-Compare), mapping gene information across databases (Onto-Translate), annotation (Onto-Miner), and pathway analysis (Pathway-Express); see http://www.vortex.cs.wayne.edu [27].
8.10 Partek Genomics Suite
Partek Genomics Suite can be used for gene expression analysis, exon expression analysis, chromosomal copy number analysis, promoter tiling array analysis, and analysis of SNP arrays. Partek includes a large number of statistical, visualization, and annotation tools that can be tied together using workflow tools for rapid repetition of analyses and for reproducible research; see http://www.partek.com/software
8.11 R/maanova
Maanova stands for MicroArray ANalysis Of VAriance. It provides a complete workflow for microarray data analysis, including data-quality checks and visualization, data transformation, ANOVA model fitting for both fixed- and mixed-effects models, statistical tests including permutation tests, confidence intervals with bootstrapping, and cluster analysis. R/maanova is available in Bioconductor; refer to http://www.jax.org/staff/churchill/labsite/software/Rmaanova/index.html [28].
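The idea behind a permutation test can be seen in its simplest case, an exact two-group permutation test on the difference in means for one gene, sketched below in Python with made-up values. It is not maanova's procedure, which operates on its fitted ANOVA models.

```python
from itertools import combinations
from statistics import fmean


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated, so this is only practical for small groups.
    """
    pooled = list(x) + list(y)
    observed = abs(fmean(x) - fmean(y))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [v for i, v in enumerate(pooled) if i not in chosen_set]
        total += 1
        # small tolerance so floating-point noise does not drop ties with the observed split
        if abs(fmean(a) - fmean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    print(exact_permutation_pvalue([8.1, 8.4, 8.3], [6.9, 7.2, 7.0]))
```

With three arrays per group there are only 20 possible splits, so the smallest attainable two-sided p-value is 2/20 = 0.1 however large the difference; this is another reason adequate sample sizes matter (Section 3).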
812 SAM (significant analysis of microarrays)
SAM can be used on any type of array data: oligonucleotide or cDNA arrays, SNP arrays, protein arrays, and so forth. Both parametric and nonparametric tests are available for correlating expression data with clinical parameters, including treatment, diagnostic categories, survival time, paired data, quantitative parameters (e.g., tumor volume), and one-class designs. SAM can also impute missing data using a nearest-neighbor algorithm; see http://www-stat.stanford.edu/~tibs/SAM.
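At the core of SAM is a per-gene "relative difference" statistic, d = (mean1 - mean2) / (s + s0), where s is a pooled standard error and s0 is a small positive constant that keeps genes with tiny variances from dominating the ranking [3]. The sketch below computes d for one gene in a two-class comparison. Treating s0 as a fixed, user-supplied value is an assumption (SAM chooses it from the data), and the permutation-based false discovery rate estimate that SAM builds on top of d is not shown.

```python
from math import sqrt
from statistics import mean


def sam_relative_difference(group1, group2, s0=0.0):
    """SAM-style relative difference for one gene in a two-class comparison:
        d = (mean1 - mean2) / (s + s0)
    where s is the pooled standard error of the difference in means and
    s0 is a small positive constant ('fudge factor')."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = sqrt(a * ss)
    if s + s0 == 0:
        raise ValueError("zero variance: use a positive s0")
    return (m1 - m2) / (s + s0)


if __name__ == "__main__":
    # Hypothetical log2 values for one gene: treated versus control arrays.
    treated, control = [2.0, 4.0, 6.0], [1.0, 2.0, 3.0]
    for s0 in (0.0, 0.5):
        print(s0, sam_relative_difference(treated, control, s0))
```

A positive s0 shrinks d most for genes whose s is small, which is the point of the constant.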
8.13 TM4
The TM4 suite consists of four major applications, namely Microarray Data Manager (MADAM), TIGR Spotfinder, Microarray Data Analysis System (MIDAS), and Multiexperiment Viewer (MeV), together with a MySQL database; all of these are freely available. Although these tools were developed for spotted two-color arrays, many of the components can easily be adapted to work with single-color formats such as filter arrays and GeneChips; see http://www.tm4.org/index.html.
9 DISSEMINATION
Early in the use of microarrays in research, it became common practice for many journals to require investigators to deposit their expression data in a public database as a condition of publication. This sharing of data has created rich resources that many investigators have mined to help their own research. A number of public databases contain and accept plant data.
9.1 ArrayExpress
ArrayExpress is a public repository for microarray data that aims to store MIAME-compliant data in accordance with MGED recommendations. This database is somewhat less biomedical in focus than GEO and has good representation of plant expression data; see http://www.ebi.ac.uk/arrayexpress [29, 30].
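Because both ArrayExpress and GEO expect MIAME-compliant submissions, it can help to check a study record against the MIAME checklist before submitting. The sketch below assumes a hypothetical in-house record stored as a Python dictionary; the key names are illustrative and are not the field names of either repository's submission format, while the six elements follow the MIAME checklist.

```python
# The six MIAME checklist elements. The dictionary keys are hypothetical
# field names for an in-house study record, not a repository format.
MIAME_ELEMENTS = {
    "raw_data": "raw data for each hybridization",
    "processed_data": "final processed (normalized) data",
    "sample_annotation": "essential sample annotation, including experimental factors and their values",
    "experimental_design": "experimental design, including sample-data relationships",
    "array_annotation": "sufficient annotation of the array",
    "protocols": "essential laboratory and data-processing protocols",
}


def missing_miame_elements(record):
    """Return descriptions of MIAME elements that are absent or empty in record."""
    return [desc for key, desc in MIAME_ELEMENTS.items() if not record.get(key)]


if __name__ == "__main__":
    # Made-up study record with the protocols still missing.
    study = {
        "raw_data": ["rep1.CEL", "rep2.CEL"],
        "processed_data": "normalized_matrix.txt",
        "sample_annotation": {"tissue": "leaf", "treatment": "drought"},
        "experimental_design": "2 treatments x 3 biological replicates",
        "array_annotation": "probe_annotation.csv",
        "protocols": "",
    }
    for item in missing_miame_elements(study):
        print("Missing:", item)
```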
9.2 GEO
Gene Expression Omnibus (GEO) is a gene expression and molecular abundance repository that supports MIAME-compliant data submissions, as well as a curated online resource for browsing, querying, and retrieving gene expression data. GEO is supported by the US National Library of Medicine but contains a good amount of plant expression data; see http://www.ncbi.nlm.nih.gov/projects/geo [31, 32].
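Data retrieved from GEO are often distributed in the line-oriented SOFT format. In this format, lines beginning with "^" open an entity, lines beginning with "!" carry attributes, and a tab-delimited data table sits between "!sample_table_begin" and "!sample_table_end" markers. The minimal parser below handles only that subset for a single sample record and is a sketch, not a complete SOFT reader; for real work, an established parser such as Bioconductor's GEOquery is preferable. The accession and values in the example record are invented placeholders.

```python
def parse_soft_sample(text):
    """Parse a single GEO SOFT sample record.

    Returns (attributes, rows): attributes maps each '^' or '!' key to a list of
    values (keys such as characteristics may repeat); rows is a list of dicts
    built from the tab-delimited table between the table begin/end markers.
    """
    attributes, header, rows = {}, None, []
    in_table = False
    for line in text.splitlines():
        if not line.strip():
            continue
        marker = line.strip().lower()
        if marker == "!sample_table_begin":
            in_table = True
            continue
        if marker == "!sample_table_end":
            in_table = False
            continue
        if in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
        elif line[0] in "^!" and " = " in line:
            key, value = line[1:].split(" = ", 1)
            attributes.setdefault(key.strip(), []).append(value.strip())
        # Lines starting with '#' (column descriptions) are ignored here.
    return attributes, rows


if __name__ == "__main__":
    record = "\n".join([
        "^SAMPLE = GSM000000",
        "!Sample_title = leaf, replicate 1",
        "!Sample_characteristics_ch1 = tissue: leaf",
        "#ID_REF = probe identifier",
        "!sample_table_begin",
        "ID_REF\tVALUE",
        "probe_1\t8.21",
        "probe_2\t5.07",
        "!sample_table_end",
    ])
    attrs, rows = parse_soft_sample(record)
    print(attrs["Sample_title"], rows)
```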
9.3 NASC (Nottingham Arabidopsis Stock Centre) Arrays
NASC runs a database of its own arrays as well as other data that have been deposited in it. The database primarily contains Arabidopsis array data; see http://affymetrix.arabidopsis.info [33].
9.4 Plant Expression Database (PLEXdb)
PLEXdb is a unified public resource for gene expression for plants and plant pathogens. PLEXdb serves as a portal to integrate gene expression profile data sets with structural genomics and phenotypic data. The database contains data from seven species; see http://www.plexdb.org/index.php [34].
10 CONCLUSIONS
We hope this listing of tools, which only scratches the surface of the available tools, will assist you in conducting, analyzing, and interpreting expression studies. We suggest exploring several tools in an area and understanding the principles of the methods they implement before settling on one or a few to use regularly. By exploring several tools, you will understand the potential of the various tools and how easy (or difficult) they are to use, and you will be able to determine what you really want and need for your microarray analysis.
ACKNOWLEDGMENT
This work was supported by NSF grant 0501890 and NIH grant U54 AT100949.
REFERENCES
[1] G. Gadbury, G. P. Page, J. Edwards, et al., “Power analysis and sample size estimation in the age of high dimensional biology,” Statistical Methods in Medical Research, vol. 13, pp. 325–338, 2004.
[2] G. P. Page, J. W. Edwards, G. L. Gadbury, et al., “The PowerAtlas: a power and sample size atlas for microarray experimental design and research,” BMC Bioinformatics, vol. 7, article 84, 2006.
[3] V. G. Tusher, R. Tibshirani, and G. Chu, “Significance analysis of microarrays applied to the ionizing radiation response,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
[4] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, “affy—analysis of Affymetrix GeneChip data at the probe level,” Bioinformatics, vol. 20, no. 3, pp. 307–315, 2004.
[5] H. Liu, B. R. Zeeberg, G. Qu, et al., “AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets,” Bioinformatics, vol. 23, no. 18, pp. 2385–2390, 2007.
[6] M. J. Dunning, M. L. Smith, M. E. Ritchie, and S. Tavaré, “beadarray: R classes and methods for Illumina bead-based data,” Bioinformatics, vol. 23, no. 16, pp. 2183–2184, 2007.
[7] A. I. Saeed, N. K. Bhagabati, J. C. Braisted, et al., “TM4 microarray software suite,” Methods in Enzymology, vol. 411, pp. 134–193, 2006.
[8] A. I. Saeed, V. Sharov, J. White, et al., “TM4: a free, open-source system for microarray data management and analysis,” BioTechniques, vol. 34, no. 2, pp. 374–378, 2003.
[9] F. M. McCarthy, S. M. Bridges, N. Wang, et al., “AgBase: a unified resource for functional analysis in agriculture,” Nucleic Acids Research, vol. 35, database issue, pp. D599–D603, 2007.
[10] F. M. McCarthy, N. Wang, G. B. Magee, et al., “AgBase: a functional genomics resource for agriculture,” BMC Genomics, vol. 7, article 229, 2006.
[11] Y. Lee, J. Tsai, S. Sunkara, et al., “The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes,” Nucleic Acids Research, vol. 33, database issue, pp. D71–D74, 2005.
[12] T. J. P. Hubbard, B. L. Aken, K. Beal, et al., “Ensembl 2007,” Nucleic Acids Research, vol. 35, database issue, pp. D610–D617, 2007.
[13] J. Quackenbush, J. Cho, D. Lee, et al., “The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species,” Nucleic Acids Research, vol. 29, no. 1, pp. 159–164, 2001.
[14] J. Quackenbush, F. Liang, I. Holt, G. Pertea, and J. Upton, “The TIGR Gene Indices: reconstruction and representation of expressed gene sequences,” Nucleic Acids Research, vol. 28, no. 1, pp. 141–145, 2000.
[15] M. Ashburner, C. A. Ball, J. A. Blake, et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[16] M. Kanehisa, “The KEGG database,” Novartis Foundation Symposium, vol. 247, pp. 91–101, 2002.
[17] M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, “The KEGG databases at GenomeNet,” Nucleic Acids Research, vol. 30, no. 1, pp. 42–46, 2002.
[18] M. Kanehisa, S. Goto, M. Hattori, et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, database issue, pp. D354–D357, 2006.
[19] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., “DAVID: database for annotation, visualization, and integrated discovery,” Genome Biology, vol. 4, no. 5, article P3, 2003.
[20] K. J. Bussey, D. Kane, M. Sunshine, et al., “MatchMiner: a tool for batch navigation among gene and gene product identifiers,” Genome Biology, vol. 4, no. 4, article R27, 2003.
[21] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, “MedMiner: an internet text-mining tool for biomedical information, with application to gene expression profiling,” BioTechniques, vol. 27, no. 6, pp. 1210–1217, 1999.
[22] R. C. Gentleman, V. J. Carey, D. M. Bates, et al., “Bioconductor: open software development for computational biology and bioinformatics,” Genome Biology, vol. 5, no. 10, article R80, 2004.
[23] M. Reich, T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov, “GenePattern 2.0,” Nature Genetics, vol. 38, no. 5, pp. 500–501, 2006.
[24] J. Herrero, F. Al-Shahrour, R. Díaz-Uriarte, et al., “GEPAS: a web-based resource for microarray gene expression data analysis,” Nucleic Acids Research, vol. 31, no. 13, pp. 3461–3467, 2003.
[25] J. M. Vaquerizas, L. Conde, P. Yankilevich, et al., “GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data,” Nucleic Acids Research, vol. 33, web server issue, pp. W616–W620, 2005.
[26] P. Trivedi, J. W. Edwards, J. Wang, et al., “HDBStat: a platform-independent software suite for statistical analysis of high dimensional biology data,” BMC Bioinformatics, vol. 6, article 86, 2005.
[27] P. Khatri, P. Bhavsar, G. Bawa, and S. Draghici, “Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments,” Nucleic Acids Research, vol. 32, web server issue, pp. W449–W456, 2004.
[28] M. K. Kerr, M. Martin, and G. A. Churchill, “Analysis of variance for gene expression microarray data,” Journal of Computational Biology, vol. 7, no. 6, pp. 819–837, 2000.
[29] A. Brazma, H. Parkinson, U. Sarkans, et al., “ArrayExpress—a public repository for microarray gene expression data at the EBI,” Nucleic Acids Research, vol. 31, no. 1, pp. 68–71, 2003.
[30] H. Parkinson, M. Kapushesky, M. Shojatalab, et al., “ArrayExpress—a public database of microarray experiments and gene expression profiles,” Nucleic Acids Research, vol. 35, database issue, pp. D747–D750, 2007.
[31] T. Barrett, D. B. Troup, S. E. Wilhite, et al., “NCBI GEO: mining tens of millions of expression profiles—database and tools update,” Nucleic Acids Research, vol. 35, database issue, pp. D760–D765, 2007.
[32] T. Barrett, T. O. Suzek, D. B. Troup, et al., “NCBI GEO: mining millions of expression profiles—database and tools,” Nucleic Acids Research, vol. 33, database issue, pp. D562–D566, 2005.
[33] D. J. Craigon, N. James, J. Okyere, J. Higgins, J. Jotham, and S. May, “NASCArrays: a repository for microarray data generated by NASC's transcriptomics service,” Nucleic Acids Research, vol. 32, database issue, pp. D575–D577, 2004.
[34] L. Shen, J. Gong, R. A. Caldo, et al., “BarleyBase—an expression profiling database for plant genomics,” Nucleic Acids Research, vol. 33, database issue, pp. D614–D618, 2005.
[35] W. Tong, S. Harris, X. Cao, et al., “Development of public toxicogenomics software for microarray data management and analysis,” Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 549, no. 1-2, pp. 241–253, 2004.
[36] W. Tong, X. Cao, S. Harris, et al., “ArrayTrack—supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research,” Environmental Health Perspectives, vol. 111, no. 15, pp. 1819–1826, 2003.
[37] P. J. Killion, G. Sherlock, and V. R. Iyer, “The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD),” BMC Bioinformatics, vol. 4, article 32, 2003.
[38] J. Demeter, C. Beauheim, J. Gollub, et al., “The Stanford Microarray Database: implementation of new analysis tools and open source release of software,” Nucleic Acids Research, vol. 35, database issue, pp. D766–D770, 2007.
[39] X. Xia and Z. Xie, “AMADA: analysis of microarray data,” Bioinformatics, vol. 17, no. 6, pp. 569–570, 2001.
Table 1: Probe design software packages.
Tool and website Cost and functions of the tool
Array Designer http://www.premierbiosoft.com/dnamicroarray/index.html
Designs primers and probes for oligo and cDNA expression microarrays. It can also design probes for SNP detection and for single-exon, whole-gene tiling, and resequencing arrays. The software is not free.
ArrayScribe http://www.nimblegen.com/products/software/arrayscribe.html
Free, but limited to designing NimbleGen arrays. The tool can design probes, specify mismatches at specific sequence positions, automatically generate mismatches, generate multiple probes for a gene, and design the placement of spots on an array.
eArray http://earray.chem.agilent.com/earray/login.do
Free, but limited to designing Agilent arrays. It can design probes for expression, CGH, and ChIP arrays for any species with genomic sequence.
Primer3Plus http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi
Free software that can design probes for expression detection on arrays, amplification/cloning, and sequencing/resequencing.
Sarani Oligo Design http://www.strandls.com/oligodesign.html
Probe design for expression analysis. The software is not free.
Visual OMP http://www.dnasoftware.com/Products/VisualOMP
Design software for RNA and DNA: single or multiple probe design, microarrays, TaqMan assays, genotyping, single and multiplex PCR, secondary structure simulation, and sequencing.
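The tools in Table 1 apply far more sophisticated criteria, but a first-pass probe filter often screens candidates on GC content, an approximate melting temperature, and long single-base runs. The sketch below uses the commonly quoted "basic" approximation Tm = 64.9 + 41(nG + nC - 16.4)/N for oligos longer than about 13 nt. The thresholds are hypothetical defaults, and production design tools typically use nearest-neighbor thermodynamic models instead.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq):
    """Approximate melting temperature (deg C) with the basic GC formula
    Tm = 64.9 + 41 * (nG + nC - 16.4) / N, meant for oligos longer than ~13 nt."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    runs = [len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper())]
    return max(runs, default=0)


def passes_filter(seq, gc_range=(0.40, 0.60), tm_range=(65.0, 80.0), max_run=5):
    """Hypothetical first-pass screen; the thresholds are illustrative defaults."""
    gc = gc_fraction(seq)
    tm = basic_tm(seq)
    return (gc_range[0] <= gc <= gc_range[1]
            and tm_range[0] <= tm <= tm_range[1]
            and longest_homopolymer(seq) <= max_run)


if __name__ == "__main__":
    candidates = ["ATGC" * 13, "A" * 30 + "GC" * 11]  # two made-up 52-mers
    for probe in candidates:
        print(gc_fraction(probe), basic_tm(probe), passes_filter(probe))
```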
Table 2: Other useful image analysis software packages.
Tool name Web site
Able Image Analyser http://able.mulabs.com
ArrayVision http://www.imagingresearch.com/products/ARV.asp
IconoClust http://www.clondiag.com/frame.php?page=products/sw/iconoclust/index.php
ImaGene http://www.biodiscovery.com/index/imagene
Koadarray http://www.koada.com/koadarray
MicroVigene http://www.vigenetech.com/MicroVigene.htm
ScanAlyze http://rana.lbl.gov/EisenSoftware.htm
Spot http://www.hca-vision.com/productspot.html
4.6 NimbleScan
NimbleScan is a NimbleGen product designed for extracting raw feature intensity values, linking the raw intensity values with the corresponding probe parameters, and generating analysis reports for expression, ChIP-chip, resequencing, and methylation analyses on NimbleGen arrays; see http://www.nimblegen.com/products/software/nimblescan.html.
4.7 TM4 Spotfinder
Spotfinder is part of TM4, a larger, freely available microarray analysis suite. Spotfinder is designed for rapid, reproducible, computer-aided analysis of microarray images and quantification of gene expression. It can read paired 16-bit or 8-bit TIFF image files generated by most microarray scanners. Grids can be constructed and adjusted automatically, semiautomatically, or manually, and two segmentation methods are available. Reusable grid geometry files and automatic grid adjustment allow users to analyze large quantities of images in a consistent and efficient manner. Quality control views allow the user to assess systematic biases in the data; see http://www.tm4.org/spotfinder.html [7, 8].
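Once an image analysis program such as Spotfinder has segmented each spot, the usual next step for a two-color array is to background-correct each channel and summarize the spot as a log ratio. The sketch below shows that generic arithmetic for one spot, returning M = log2(R/G) and A = (1/2)log2(RG). Simple subtraction of a local background and the red/green channel assignment are assumptions, and the function is not part of Spotfinder.

```python
from math import log2


def spot_ma(red_fg, red_bg, green_fg, green_bg):
    """Background-correct one two-color spot and return (M, A).

    M = log2(R / G) is the log ratio and A = 0.5 * log2(R * G) the average
    log intensity, where R and G are background-subtracted intensities.
    Returns None when either corrected intensity is not positive, because
    the log ratio is then undefined and the spot should be flagged.
    """
    r = red_fg - red_bg
    g = green_fg - green_bg
    if r <= 0 or g <= 0:
        return None
    return log2(r / g), 0.5 * log2(r * g)


if __name__ == "__main__":
    # Made-up median foreground/background intensities for three spots.
    spots = [(1100, 100, 300, 50), (520, 80, 600, 60), (100, 120, 300, 50)]
    for s in spots:
        print(s, spot_ma(*s))
```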
5 DATABASE TOOLS
Microarray experiments generate a huge amount of data, and the handling, storing, sharing, and distribution of these data can be quite complex. As a result, a variety of database tools have been developed to assist with this aspect of microarray studies. Some of the tools listed below are more than stand-alone database tools and may include extensive analysis and visualization functionality as well. The database tools differ widely in utility and platform requirements; Table 3 lists the tools and their websites.
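To make the data-management problem concrete, the sketch below stores probes, samples, and measurements in three related tables using Python's built-in sqlite3 module and then summarizes probe values by gene and treatment with a single query. The schema, table names, and values are hypothetical and much simpler than those of the systems in Table 3, which also track protocols, array designs, and MIAME annotation.

```python
import sqlite3

# Hypothetical minimal schema: probes, samples, and one measurement per pair.
SCHEMA = """
CREATE TABLE probe (probe_id TEXT PRIMARY KEY, gene_id TEXT NOT NULL);
CREATE TABLE sample (sample_id TEXT PRIMARY KEY, treatment TEXT NOT NULL);
CREATE TABLE measurement (
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    probe_id  TEXT NOT NULL REFERENCES probe(probe_id),
    value     REAL,
    PRIMARY KEY (sample_id, probe_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up values."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO probe VALUES (?, ?)",
                    [("p1", "geneA"), ("p2", "geneA"), ("p3", "geneB")])
    con.executemany("INSERT INTO sample VALUES (?, ?)",
                    [("s1", "control"), ("s2", "drought")])
    con.executemany("INSERT INTO measurement VALUES (?, ?, ?)",
                    [("s1", "p1", 7.0), ("s1", "p2", 9.0), ("s1", "p3", 5.0),
                     ("s2", "p1", 8.0), ("s2", "p2", 10.0), ("s2", "p3", 4.0)])
    con.commit()
    return con


def gene_means(con):
    """Average probe values per gene and treatment."""
    query = """
        SELECT p.gene_id, s.treatment, AVG(m.value)
        FROM measurement AS m
        JOIN probe AS p ON p.probe_id = m.probe_id
        JOIN sample AS s ON s.sample_id = m.sample_id
        GROUP BY p.gene_id, s.treatment
        ORDER BY p.gene_id, s.treatment
    """
    return con.execute(query).fetchall()


if __name__ == "__main__":
    for row in gene_means(build_demo_db()):
        print(row)
```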
6 DATABASES OF FUNCTIONAL INFORMATION
The amount of information about the functions of genes is beyond what any one person can know. Consequently, it is useful to pull in information on what others have discovered about genes in order to fully and correctly interpret an expression study. The following tools are databases of various types of information, such as published papers, gene sequences, pathways, and ontologies, that may be useful to an investigator who is interpreting an expression study.
6.1 AgBase
AgBase is a curated, open-source, web-accessible resource for functional analysis of agricultural plant and animal gene products. AgBase contains databases of Gene Ontology terms and annotations for poplar and pine, as well as for several animals, microbes, and parasites; see http://www.agbase.msstate.edu [9, 10].
6.2 Agricola
Agricola is the catalog and index to the collections of the National Agricultural Library. The database covers materials in all formats and periods, dating back to the 15th century, and its records cover all aspects of agriculture and related disciplines; see http://agricola.nal.usda.gov.
6.3 Eukaryotic Gene Orthologues (EGO)
EGO is generated by pairwise comparison of the tentative consensus (TC) sequences from individual organisms. Reciprocal best-match pairs are clustered into groups, and multiple sequence alignments are displayed for each group. EGO is very useful for connecting homologous genes across species; see http://compbio.dfci.harvard.edu/tgi/ego [11].
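The reciprocal best-match step described above can be sketched as follows. The sketch assumes that pairwise similarity scores (for example, BLAST bit scores) between the tentative consensus sequences of two species have already been computed. The TC identifiers and scores are hypothetical, ties are broken arbitrarily, and EGO's subsequent clustering of pairs across many species and its alignments are not shown.

```python
def best_hits(scores):
    """scores: {query_id: {subject_id: similarity}} -> {query_id: best subject_id}.
    Ties go to whichever subject appears first."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_pairs(a_vs_b, b_vs_a):
    """Pairs (a, b) in which b is a's best match in species B and a is
    b's best match in species A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical TC identifiers and similarity scores for two species.
    arabidopsis_vs_rice = {"AtTC1": {"OsTC1": 250, "OsTC2": 90},
                           "AtTC2": {"OsTC1": 120, "OsTC2": 80}}
    rice_vs_arabidopsis = {"OsTC1": {"AtTC1": 240, "AtTC2": 110},
                           "OsTC2": {"AtTC1": 85, "AtTC2": 70}}
    print(reciprocal_best_pairs(arabidopsis_vs_rice, rice_vs_arabidopsis))
```

AtTC2's best match (OsTC1) prefers AtTC1, so only one reciprocal pair survives; this is what makes the criterion stricter than one-way best hits.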
6.4 Ensembl
Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute to develop a software system that produces and maintains automatic annotation of selected eukaryotic genomes. Initially developed for vertebrates, Ensembl has been adapted for use by several plant groups, including the legume, Gramene, and Arabidopsis projects; see http://www.ensembl.org/index.html [12].
6.5 Entrez Gene
Entrez Gene is NCBI's database for gene-specific information. Entrez Gene focuses on genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. Records are assigned unique, stable, and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, protein homologs, protein domains, and external databases) is updated regularly. There is currently at least some gene information on 113 plant species; see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene.
6.6 Gene Index
The goal of the Gene Index Project is to use the available EST and gene sequences, along with the reference genomes, to provide an inventory of likely genes and their variants. Genes are linked to annotation regarding their functions. Currently, Gene Index databases have been constructed for 34 plant species (http://compbio.dfci.harvard.edu/tgi/plant.html) [13, 14].
6.7 Gene Ontology
The objective of the Gene Ontology (GO) is to provide controlled vocabularies for describing the molecular function, biological process, and cellular component of gene products. These terms are used as attributes of gene products by various collaborating databases, such as Gramene and TAIR; see http://www.geneontology.org [15].
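GO terms form a directed acyclic graph, and annotation to a specific term implies annotation to all of its more general ancestors (the "true path rule"). Annotations are therefore usually propagated up the graph before genes are counted per term, as in the sketch below. The miniature hierarchy is a simplified illustration that does not reproduce the actual GO graph, and the gene names are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of term, given a DAG as {term: [parent terms]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(annotations, parents):
    """Expand {gene: set of terms} so each gene also carries every ancestor term."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}


def genes_per_term(propagated):
    """Count annotated genes per term after propagation."""
    counts = {}
    for terms in propagated.values():
        for t in terms:
            counts[t] = counts.get(t, 0) + 1
    return counts


if __name__ == "__main__":
    # Simplified toy hierarchy (not the actual GO graph) and hypothetical genes.
    parents = {
        "response to water deprivation": ["response to stress"],
        "response to stress": ["response to stimulus"],
        "response to stimulus": ["biological_process"],
    }
    genes = {"gene1": {"response to water deprivation"},
             "gene2": {"response to stimulus"}}
    print(genes_per_term(propagate(genes, parents)))
```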
6.8 Gramene
Gramene is a curated, open-source data resource for genome analysis in the grasses. The information stored in the database is derived from public sources and includes genomes, EST sequencing, protein structure and function analysis, genetic and physical mapping, interpretation of biochemical pathways, Gene Ontology annotations, gene and QTL localization, and descriptions of phenotypic characters and mutations. Extensive information is provided for Oryza, Zea, Triticum, Hordeum, Avena, Setaria, Pennisetum, Secale, Sorghum, Zizania, and Brachypodium; see http://www.gramene.org.
6.9 Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a database of biological systems consisting of genes and proteins (KEGG GENES), endogenous and exogenous substances (KEGG LIGAND), pathways (KEGG PATHWAY), and hierarchies and relationships of biological objects (KEGG BRITE). It is very rich in data, with information on hundreds of species, including many plants; see http://www.genome.jp/kegg [16–18].
6.10 Plant-Associated Microbe Gene Ontology (PAMGO)
PAMGO is a database of the results of a multi-institutional collaborative effort aimed at developing new GO terms and relationships for gene products implicated in plant-pathogen interactions. GO terms are currently being developed for the following species: the bacteria Erwinia chrysanthemi, Pseudomonas syringae pv. tomato, and Agrobacterium tumefaciens; the fungus Magnaporthe grisea; the oomycetes Phytophthora sojae and Phytophthora ramorum; and the nematode Meloidogyne hapla; see http://pamgo.vbi.vt.edu.
Table 3: Database tools.
Tool name Web site
Acuity http://www.moleculardevices.com/pages/software/gnacuity.html
Array Results Manager (ARM) http://www.biodiscovery.com/index/arm
ArrayTrack http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack [35, 36]
BASE 2 http://base.thep.lu.se
caArray http://caarray.nci.nih.gov
Expressionist http://www.genedata.com/products/expressionist/index_eng.html
Gene Array Analyzer Software (GAAS) http://www.medinfopoli.polimi.it/GAAS
GeneDirector http://www.biodiscovery.com/index/genedirector
GeneSpring Workgroup http://www.chem.agilent.com/scripts/pds.asp?lpage=34668
GeneTraffic http://www.iobion.com/products/products_GENETRAFFIC.html
Genowiz http://www.ocimumbio.com
Longhorn Array Database (LAD) [37] http://www.longhornarraydatabase.org
MaxdLoad2 http://www.bioinf.man.ac.uk/microarray/maxd/index.html
PARTISAN arrayLIMS http://www.clondiag.com
Rosetta Resolver System http://www.rosettabio.com/products/resolver/default.htm
Stanford Microarray Database (SMD) http://smd-www.stanford.edu/download [38]
6.11 SWISS-PROT
SWISS-PROT is a curated protein sequence database that provides a high level of annotation, such as a description of the function of a protein, its domains, structure, post-translational modifications, and variants, along with good integration with other databases; see http://www.expasy.ch/sprot.
6.12 TAIR
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for Arabidopsis thaliana. Data available from TAIR include the complete genome sequence, along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, and publications; see http://www.arabidopsis.org.
7 ANNOTATION TOOLS
The databases described in Section 6 provide data in a variety of forms, which makes merging the annotations with the expression data difficult. To deal with this heterogeneity, a number of tools have been developed to make it easier to annotate genes in expression studies.
7.1 CiteXplore
CiteXplore combines literature search with text-mining tools for biology. Search results are cross-referenced to European Bioinformatics Institute applications based on publication identifiers, and links to full-text versions are provided where available; see http://www.ebi.ac.uk/citexplore.
7.2 Database for Annotation, Visualization, and Integrated Discovery (DAVID)
DAVID provides a large set of functional annotation tools that help investigators understand the biological meaning behind a long list of genes. Its core is the DAVID Knowledgebase, which provides a comprehensive, high-quality collection of gene annotation resources and the flexibility to cross-reference gene identifiers and heterogeneous annotations from almost all major databases. The DAVID tools can identify enriched biological themes, particularly GO terms; cluster redundant annotation terms; visualize genes on BioCarta and KEGG pathway maps; display related many-genes-to-many-terms relationships in a 2D view; search for other functionally related genes not in the list; list interacting proteins; highlight protein functional domains and motifs; redirect to related literature; and convert gene identifiers from one type to another; see http://david.abcc.ncifcrf.gov [19].
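DAVID's enrichment analysis is built around the EASE score, which is usually described as a conservative variant of Fisher's exact test in which one gene is removed from the count of list genes that fall in the category. The sketch below computes both the one-sided Fisher exact p-value and this penalized score from a 2x2 table. It is an illustration under that description, not DAVID's code, and the table layout (list genes versus the remaining background genes) is an assumption.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for over-representation in the 2x2 table
        a = list genes in category,        b = list genes not in category,
        c = other background genes in it,  d = other background genes not in it.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    upper = min(row1, col1)
    return sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)) / denom


def ease_score(a, b, c, d):
    """EASE-style conservative score: Fisher's test after removing one list hit."""
    if a < 1:
        return 1.0
    return fisher_greater(a - 1, b, c, d)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 list genes and 2 of 16 other genes are in the category.
    print(fisher_greater(3, 1, 2, 14), ease_score(3, 1, 2, 14))
```

Removing one hit penalizes categories supported by very few genes much more than categories with many hits, which is the intended conservatism.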
7.3 MatchMiner
MatchMiner translates between several gene identifier types for the same list of hundreds or thousands of genes. The gene identifier types supported by the tool include GenBank accession numbers, IMAGE clone IDs, common gene names, HUGO names, gene symbols, UniGene clusters, FISH-mapped BAC clones, Affymetrix identifiers, and chromosome locations. MatchMiner can also find the intersection of two lists of genes specified by different identifiers; see http://discover.nci.nih.gov/matchminer/index.jsp [20].
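The essential operations, translating each list to a shared identifier type and then intersecting, can be sketched as follows. The mapping tables are hypothetical dictionaries standing in for the annotation sources that MatchMiner draws on. Identifiers without a mapping are reported rather than silently dropped, and one-to-many mappings, which real identifier conversion must handle, are ignored for simplicity.

```python
def translate(ids, mapping):
    """Translate identifiers with a lookup table.
    Returns ({original_id: translated_id}, [ids with no mapping])."""
    mapped, unmapped = {}, []
    for identifier in ids:
        if identifier in mapping:
            mapped[identifier] = mapping[identifier]
        else:
            unmapped.append(identifier)
    return mapped, unmapped


def intersect_lists(list_a, map_a, list_b, map_b):
    """Intersect two gene lists given in different identifier types by first
    translating both to a shared identifier (here, a gene symbol)."""
    mapped_a, missing_a = translate(list_a, map_a)
    mapped_b, missing_b = translate(list_b, map_b)
    common = sorted(set(mapped_a.values()) & set(mapped_b.values()))
    return common, missing_a + missing_b


if __name__ == "__main__":
    # Hypothetical lookup tables: array probe IDs and accessions -> gene symbols.
    probe_to_symbol = {"probe_1": "GENE1", "probe_2": "GENE2"}
    accession_to_symbol = {"ACC1": "GENE1", "ACC3": "GENE3"}
    common, unmapped = intersect_lists(["probe_1", "probe_2", "probe_9"], probe_to_symbol,
                                       ["ACC1", "ACC3"], accession_to_symbol)
    print(common, unmapped)
```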
7.4 MedMiner
MedMiner searches and organizes the biomedical literature on genes, gene-gene relationships, and gene-drug relationships. It uses GeneCards, PubMed, syntactic analysis, truncated-keyword filtering of relations, and user-controlled sculpting of Boolean queries to extract key sentences from pertinent abstracts. Selected abstracts can be automatically entered into EndNote; see http://discover.nci.nih.gov/textmining/main.jsp [21].
8 DATA ANALYSIS SOFTWARE
There is an incredible breadth of tools in this area, many of which provide very slick interfaces and very useful functions; however, you do not strictly need any of them. Most statistical packages, such as SAS, SPSS, JMP, and R, can be used to analyze microarray data and will perform most of the functions that the following tools perform, because few statistical methods are entirely unique to expression studies. Nonetheless, many of the following tools are much easier to use and often have better visualization functions than general-purpose statistical programs. Typically, the tools have been designed for ease of use, often too much so. Regardless of the tool you use, strive to understand the functions and analyses it provides, and the assumptions it makes, before you choose to use it for analysis. For example, in cluster analysis you need to choose linkage and weighting functions, and the resulting clusters can be quite different depending on the methods chosen. There are similar issues to learn and understand for all statistical methods and most visualization methods. Additional tools are listed in Table 4.
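The effect of the linkage choice is easy to demonstrate. The naive sketch below performs agglomerative clustering with either single linkage (distance between the closest members of two clusters) or complete linkage (distance between the farthest members) and cuts the tree at two clusters. The one-dimensional "expression values" are made up, and the cubic-time loop is for illustration only.

```python
from itertools import combinations
from math import dist


def agglomerate(points, k, linkage="single"):
    """Naive agglomerative clustering of points (tuples of coordinates).

    linkage="single": cluster distance = closest pair of members
    linkage="complete": cluster distance = farthest pair of members
    Returns k clusters as sorted lists of point indices.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    combine = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        return combine(dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        # Find the two closest clusters under the chosen linkage and merge them.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


if __name__ == "__main__":
    # Made-up one-dimensional expression values for six genes.
    values = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (7.0,)]
    print("single:  ", agglomerate(values, 2, "single"))
    print("complete:", agglomerate(values, 2, "complete"))
```

With these six values, single linkage chains the first five points together and leaves the last one on its own, whereas complete linkage splits the data into {0.0, 1.0, 2.1, 3.3} and {4.6, 7.0}.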
8.1 Bioconductor
Bioconductor is a multicenter effort to develop tools in the R programming environment for analyzing genomic data, especially microarray data. A large number of packages are available to conduct many types of analyses; currently, there are over 115 microarray applications. The tools are still in very active development and are all freely available. Some of the most relevant packages are affy, maanova, genefilter, limma, multtest, annotate, geneplotter, and marray, to name a few. A couple of these packages are described elsewhere in this article; for more details on specific tools, see the Bioconductor web site, http://www.bioconductor.org [22].
8.2 Biometric Research Branch (BRB) ArrayTools
BRB ArrayTools is a free, integrated package for the visualization and statistical analysis of DNA microarray gene expression data. It functions as an Excel add-in. It was developed by professional statisticians experienced in the analysis of microarray data. It is probably the best tool available for discriminant analysis and includes a variety of other statistical and clustering methods; see http://linus.nci.nih.gov/BRB-ArrayTools.html.
8.3 Expression Profiler
Expression Profiler is a set of tools for cluster analysis, pattern discovery, pattern visualization, and the study of and search for Gene Ontology categories. The tool also generates sequence logos, extracts regulatory sequences, studies protein interactions, and links analysis results to external tools and databases; see http://ep.ebi.ac.uk.
8.4 GenePattern
GenePattern puts sophisticated computational methods into the hands of the biomedical research community. A simple application interface gives a broad audience access to a growing repository of analytic tools for genomic data, while an API supports computational biologists. GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques; see http://www.broad.mit.edu/cancer/software/genepattern/index.html [23].
8.5 GeneXPress
GeneXPress is a visualization and analysis tool for gene expression data that integrates clustering, gene annotation, and sequence information. GeneXPress allows the user to load clustering results and automatically analyze them both for significance of functional groups, through correlation with functional annotations (e.g., Gene Ontology), and for enrichment of motif binding sites (e.g., TRANSFAC motifs); see http://genexpress.stanford.edu.
8.6 GEPAS (Gene Expression Pattern Analysis Suite)
GEPAS is an integrated web-based tool for the analysis of gene expression data. GEPAS includes tools for normalization, many clustering methods, supervised analysis, differential gene expression analysis and predictors, array CGH, and functional annotation; see http://gepas.bioinfo.cipf.es [24, 25].
8.7 High-Dimensional Biology Statistics (HDBStat)
HDBStat is a free Java application for the normalization, transformation, and statistical analysis of expression data. It also includes a number of unique quality control procedures and implements a reproducible research design so that analyses can be readily repeated (http://www.ssg.uab.edu/hdbstat) [26].
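Quantile normalization is one widely used method for the normalization step that tools such as HDBStat provide; whether HDBStat implements exactly this variant is not stated here, so the sketch below is a generic illustration. Each array's values are replaced by the mean, across arrays, of the values with the same rank, so that all arrays end up with an identical distribution. Ties are broken by position, which is a simplification.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values that share
    its rank, so every array ends up with the same distribution. Ties are broken
    by position, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=a.__getitem__)  # probe indices, lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two made-up arrays with three probes each.
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```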
8.8 JMP Genomics
JMP Genomics leverages many statistical tools in JMP, a statistical analysis package; as a result, it offers over 100 different analytical procedures. It also includes
Table 4: Other useful statistical analysis and data-mining tools.
Tool name Web site
Amiada (analyzing microarray data) http://dambe.bio.uottawa.ca/amiada.asp [39]
ArrayAssist Enterprise http://www.stratagene.com
caGEDA httpbioinformaticsupmceduGE2GEDAhtml
Cluster httpranalblgovEisenSoftwarehtm
dChip httpwwwdchiporg
GeneMaths XT httpwwwapplied-mathscomgenemathsgenemathshtm
INCLUSive httphomesesatkuleuvenbesimdnaBiolSoftwarehtml
J-Express Pro httpwwwmolminecomsoftwarehtm
MAExplorer httpmaexplorersourceforgenet
NIA Array analysis httplgsungrcnianihgovANOVA
Onto-Tools httpvortexcswayneeduprojectshtm
Probe Profiler httpwwwcorimbiacomPagesProductOverviewhtm
TableView httpccgbumnedusoftwarejavaappsTableView
Venn Mapper httpwwwgatcplatformnlvennmapperindexphp
extensive visualization tools Scripts can be written forthe development of standard analytical procedures seehttpwwwjmpcomsoftwaregenomics
89 Onto-tools
Onto-Tools are a series of freely available tools for theanalysis of microarray data Tools are available for arraydesign (onto-design) gene class testing (onto-express) com-paring the content of arrays (onto-compare) mapping geneinformation across databases (onto-translate) annotation(onto-miner) and pathway analysis (pathway-express) seehttpwwwvortexcswayneedu [27]
810 Partek genomic suite
Partek Genomics Suite can be used for gene expression anal-ysis exon expression analysis chromosomal copy numberanalysis and promoter tiling array analysis and analysisof SNP arrays Partek includes a large number of statis-tical visualization and annotation tools that can be tiedtogether using workflow tools for rapid repetition of analysisand for reproducible research see httpwwwpartekcomsoftware
811 Rmaanova
Maanova stands for MicroArray ANalysis Of VAriance Itprovides a complete work flow for microarray data analysisincluding data-quality checks and visualization data trans-formation ANOVA model fitting for both fixed andmixed effects models statistical tests including permu-tation tests confidence interval with bootstrapping andcluster analysis Rmaanova is available in BioconductorRrefer to httpwwwjaxorgstaffchurchilllabsitesoftwareRmaanovaindexhtml [28]
812 SAM (significant analysis of microarrays)
SAM can be used on any type of array data oligo or cDNAarrays SNP arrays protein arrays and so forth Both para-metric and nonparametric tests are available for correlatingexpression data to clinical parameters including treatmentdiagnosis categories survival time paired data quantita-tive (eg tumor volume) and one-class SAM can alsoimplement imputation methods for missing data via near-est neighbor algorithm see httpwww-statstanfordedusimtibsSAM
813 TM4
The TM4 suite of tools consists of four major applicationsMicroarray Data Manager (MADAM) TIGR SpotfinderMicroarray Data Analysis System (MIDAS) and Multiex-periment Viewer (MeV) as well as a MySQL database allof which are freely available Although these software toolswere developed for spotted two-color arrays many of thecomponents can be easily adapted to work with single-colorformats such as filter arrays and GeneChips see httpwwwtm4orgindexhtml
9. DISSEMINATION
Early in the use of microarrays in research, it became common practice for many journals to require investigators to deposit their expression data in a public database as part of publication. This sharing of data has created rich resources that many investigators have mined to help their own research. A number of public databases contain and accept plant data.
9.1. ArrayExpress
ArrayExpress is a public repository for microarray data that is aimed at storing MIAME-compliant data in accordance with MGED recommendations. This database is somewhat less biomedical in focus than GEO and has a good representation of plant expression data; see http://www.ebi.ac.uk/arrayexpress [29, 30].
9.2. GEO
Gene Expression Omnibus (GEO) is a gene expression/molecular abundance repository supporting MIAME-compliant data submissions and a curated online resource for browsing, querying, and retrieving gene expression data. It is supported by the US National Library of Medicine but contains a good amount of plant expression data; see http://www.ncbi.nlm.nih.gov/projects/geo [31, 32].
9.3. NASC (Nottingham Arabidopsis Stock Centre) arrays
NASC runs a database of its own arrays as well as other data that have been deposited in it. The database primarily contains Arabidopsis array data; see http://affymetrix.arabidopsis.info [33].
9.4. Plant expression database (PLEXdb)
PLEXdb is a unified public resource for gene expression data from plants and plant pathogens. PLEXdb serves as a portal that integrates gene expression profile data sets with structural genomics and phenotypic data. The database contains data from seven species; see http://www.plexdb.org/index.php [34].
10. CONCLUSIONS
We hope this listing of tools, which only scratches the surface of what is available, will help you conduct, analyze, and interpret expression studies. We suggest exploring several tools in an area and understanding the principles of the methods they implement before settling on one or a few to use regularly. By exploring several tools, you will understand the potential of each, learn how easy (or difficult) they are to use, and be able to determine what you really want and need for your microarray analysis.
ACKNOWLEDGMENT
This work was supported by NSF grant 0501890 and NIH grant U54 AT100949.
REFERENCES
[1] G. Gadbury, G. P. Page, J. Edwards, et al., "Power analysis and sample size estimation in the age of high dimensional biology," Statistical Methods in Medical Research, vol. 13, pp. 325–338, 2004.
[2] G. P. Page, J. W. Edwards, G. L. Gadbury, et al., "The PowerAtlas: a power and sample size atlas for microarray experimental design and research," BMC Bioinformatics, vol. 7, article 84, 2006.
[3] V. G. Tusher, R. Tibshirani, and G. Chu, "Significance analysis of microarrays applied to the ionizing radiation response," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
[4] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, "Affy: analysis of Affymetrix GeneChip data at the probe level," Bioinformatics, vol. 20, no. 3, pp. 307–315, 2004.
[5] H. Liu, B. R. Zeeberg, G. Qu, et al., "AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets," Bioinformatics, vol. 23, no. 18, pp. 2385–2390, 2007.
[6] M. J. Dunning, M. L. Smith, M. E. Ritchie, and S. Tavaré, "Beadarray: R classes and methods for Illumina bead-based data," Bioinformatics, vol. 23, no. 16, pp. 2183–2184, 2007.
[7] A. I. Saeed, N. K. Bhagabati, J. C. Braisted, et al., "TM4 microarray software suite," Methods in Enzymology, vol. 411, pp. 134–193, 2006.
[8] A. I. Saeed, V. Sharov, J. White, et al., "TM4: a free, open-source system for microarray data management and analysis," BioTechniques, vol. 34, no. 2, pp. 374–378, 2003.
[9] F. M. McCarthy, S. M. Bridges, N. Wang, et al., "AgBase: a unified resource for functional analysis in agriculture," Nucleic Acids Research, vol. 35, database issue, pp. D599–D603, 2007.
[10] F. M. McCarthy, N. Wang, G. B. Magee, et al., "AgBase: a functional genomics resource for agriculture," BMC Genomics, vol. 7, article 229, 2006.
[11] Y. Lee, J. Tsai, S. Sunkara, et al., "The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes," Nucleic Acids Research, vol. 33, database issue, pp. D71–D74, 2005.
[12] T. J. P. Hubbard, B. L. Aken, K. Beal, et al., "Ensembl 2007," Nucleic Acids Research, vol. 35, database issue, pp. D610–D617, 2007.
[13] J. Quackenbush, J. Cho, D. Lee, et al., "The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species," Nucleic Acids Research, vol. 29, no. 1, pp. 159–164, 2001.
[14] J. Quackenbush, F. Liang, I. Holt, G. Pertea, and J. Upton, "The TIGR Gene Indices: reconstruction and representation of expressed gene sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 141–145, 2000.
[15] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[16] M. Kanehisa, "The KEGG database," Novartis Foundation Symposium, vol. 247, pp. 91–101, 2002.
[17] M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, "The KEGG databases at GenomeNet," Nucleic Acids Research, vol. 30, no. 1, pp. 42–46, 2002.
[18] M. Kanehisa, S. Goto, M. Hattori, et al., "From genomics to chemical genomics: new developments in KEGG," Nucleic Acids Research, vol. 34, database issue, pp. D354–D357, 2006.
[19] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., "DAVID: database for annotation, visualization, and integrated discovery," Genome Biology, vol. 4, no. 5, article P3, 2003.
[20] K. J. Bussey, D. Kane, M. Sunshine, et al., "MatchMiner: a tool for batch navigation among gene and gene product identifiers," Genome Biology, vol. 4, no. 4, article R27, 2003.
[21] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, "MedMiner: an internet text-mining tool for biomedical information, with application to gene expression profiling," BioTechniques, vol. 27, no. 6, pp. 1210–1217, 1999.
[22] R. C. Gentleman, V. J. Carey, D. M. Bates, et al., "Bioconductor: open software development for computational biology and bioinformatics," Genome Biology, vol. 5, no. 10, article R80, 2004.
[23] M. Reich, T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov, "GenePattern 2.0," Nature Genetics, vol. 38, no. 5, pp. 500–501, 2006.
[24] J. Herrero, F. Al-Shahrour, R. Díaz-Uriarte, et al., "GEPAS: a web-based resource for microarray gene expression data analysis," Nucleic Acids Research, vol. 31, no. 13, pp. 3461–3467, 2003.
[25] J. M. Vaquerizas, L. Conde, P. Yankilevich, et al., "GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data," Nucleic Acids Research, vol. 33, web server issue, pp. W616–W620, 2005.
[26] P. Trivedi, J. W. Edwards, J. Wang, et al., "HDBStat: a platform-independent software suite for statistical analysis of high dimensional biology data," BMC Bioinformatics, vol. 6, article 86, 2005.
[27] P. Khatri, P. Bhavsar, G. Bawa, and S. Draghici, "Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments," Nucleic Acids Research, vol. 32, web server issue, pp. W449–W456, 2004.
[28] M. K. Kerr, M. Martin, and G. A. Churchill, "Analysis of variance for gene expression microarray data," Journal of Computational Biology, vol. 7, no. 6, pp. 819–837, 2000.
[29] A. Brazma, H. Parkinson, U. Sarkans, et al., "ArrayExpress: a public repository for microarray gene expression data at the EBI," Nucleic Acids Research, vol. 31, no. 1, pp. 68–71, 2003.
[30] H. Parkinson, M. Kapushesky, M. Shojatalab, et al., "ArrayExpress: a public database of microarray experiments and gene expression profiles," Nucleic Acids Research, vol. 35, database issue, pp. D747–D750, 2007.
[31] T. Barrett, D. B. Troup, S. E. Wilhite, et al., "NCBI GEO: mining tens of millions of expression profiles, database and tools update," Nucleic Acids Research, vol. 35, database issue, pp. D760–D765, 2007.
[32] T. Barrett, T. O. Suzek, D. B. Troup, et al., "NCBI GEO: mining millions of expression profiles, database and tools," Nucleic Acids Research, vol. 33, database issue, pp. D562–D566, 2005.
[33] D. J. Craigon, N. James, J. Okyere, J. Higgins, J. Jotham, and S. May, "NASCArrays: a repository for microarray data generated by NASC's transcriptomics service," Nucleic Acids Research, vol. 32, database issue, pp. D575–D577, 2004.
[34] L. Shen, J. Gong, R. A. Caldo, et al., "BarleyBase: an expression profiling database for plant genomics," Nucleic Acids Research, vol. 33, database issue, pp. D614–D618, 2005.
[35] W. Tong, S. Harris, X. Cao, et al., "Development of public toxicogenomics software for microarray data management and analysis," Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 549, no. 1-2, pp. 241–253, 2004.
[36] W. Tong, X. Cao, S. Harris, et al., "ArrayTrack: supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research," Environmental Health Perspectives, vol. 111, no. 15, pp. 1819–1826, 2003.
[37] P. J. Killion, G. Sherlock, and V. R. Iyer, "The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD)," BMC Bioinformatics, vol. 4, article 32, 2003.
[38] J. Demeter, C. Beauheim, J. Gollub, et al., "The Stanford Microarray Database: implementation of new analysis tools and open source release of software," Nucleic Acids Research, vol. 35, database issue, pp. D766–D770, 2007.
[39] X. Xia and Z. Xie, "AMADA: analysis of microarray data," Bioinformatics, vol. 17, no. 6, pp. 569–570, 2001.
have been developed to assist with this aspect of microarray studies. Some of the tools listed below are more than stand-alone database tools and may include extensive analysis and visualization functionality as well. The available database tools differ widely in utility and platform requirements. Table 3 lists the tools and their websites.
6. DATABASES OF FUNCTIONAL INFORMATION
The amount of information about the functions of genes is beyond what any one person can know. Consequently, to interpret an expression study fully and correctly, it is useful to pull in what others have discovered about the genes involved. The following databases hold various types of information, such as published papers, gene sequences, pathways, and ontologies, that might be useful to an investigator interpreting an expression study.
6.1. AgBase
AgBase is a curated, open-source, web-accessible resource for functional analysis of agricultural plant and animal gene products. AgBase contains databases of poplar and pine Gene Ontology terms and annotations, as well as those for several animals, microbes, and parasites; see http://www.agbase.msstate.edu [9, 10].
6.2. Agricola
Agricola is the catalog and index to the collections of the National Agricultural Library. The database covers materials in all formats and periods, dating back to the 15th century. The records cover all aspects of agriculture and related disciplines; see http://agricola.nal.usda.gov.
6.3. Eukaryotic gene orthologues (EGO)
EGO is generated by pairwise comparison of the tentative consensus (TC) sequences from individual organisms. Reciprocal best-match pairs are clustered into groups, and multiple sequence alignments are displayed for each group. EGO is very useful for connecting homologous genes across species; see http://compbio.dfci.harvard.edu/tgi/ego [11].
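The reciprocal best-match criterion can be illustrated with a short sketch. It assumes that pairwise similarity scores (for example, BLAST bit scores) have already been computed in both directions and stored as nested dictionaries; the TC identifiers and scores are hypothetical. EGO's subsequent grouping of pairs across many organisms and its alignment step are not shown.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (ties: first encountered)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_pairs(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (higher is better) between tentative
# consensus sequences of species A (TC_A*) and species B (TC_B*).
a_vs_b = {"TC_A1": {"TC_B1": 50, "TC_B2": 20}, "TC_A2": {"TC_B1": 40}, "TC_A3": {"TC_B3": 30}}
b_vs_a = {"TC_B1": {"TC_A1": 50, "TC_A2": 40}, "TC_B2": {"TC_A1": 20}, "TC_B3": {"TC_A3": 30}}
print(reciprocal_best_pairs(a_vs_b, b_vs_a))
```

TC_A2 also hits TC_B1, but TC_B1's best hit is TC_A1, so the pair (TC_A2, TC_B1) is not reciprocal and is dropped. This filter is why reciprocal best matches are preferred over one-directional best hits when connecting genes across species.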
6.4. Ensembl
Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute to develop a software system that produces and maintains automatic annotation of selected eukaryotic genomes. Initially developed for vertebrates, Ensembl has been adapted by several plant groups, including the legume, Gramene, and Arabidopsis groups; see http://www.ensembl.org/index.html [12].
6.5. Entrez Gene
Entrez Gene is NCBI's database for gene-specific information. Entrez Gene focuses on genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. Records are assigned unique, stable, and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, protein homologs, protein domains, and external databases) is updated regularly. There is currently at least some gene information on 113 plant species; see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene.
6.6. Gene Index
The goal of The Gene Index Project is to use the available EST and gene sequences, along with the reference genomes, to provide an inventory of likely genes and their variants. Genes are linked to annotation regarding their functions. Currently, GI databases have been constructed for 34 plant species (http://compbio.dfci.harvard.edu/tgi/plant.html) [13, 14].
6.7. Gene ontology
The objective of GO is to provide controlled vocabularies for describing the molecular function, biological process, and cellular component of gene products. These terms are used as attributes of gene products by various collaborating databases, such as Gramene and TAIR; see http://www.geneontology.org [15].
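GO terms form a directed acyclic graph in which a term can have more than one parent. A gene annotated to a specific term is therefore implicitly annotated to all of that term's ancestors (the "true path rule"), so counts of genes per term are usually made after propagating annotations upward. The minimal sketch below does that propagation. The ontology and gene names are a hypothetical toy example, and real GO processing would also distinguish relationship types such as is_a and part_of.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links in the DAG."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(direct, parents):
    """Extend each gene's direct annotations with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct.items()
    }


# Hypothetical toy ontology: T4 -> T2 -> T1 and T4 -> T3 -> T1 (a DAG, not a tree).
parents = {"T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"], "T5": ["T3"]}
direct = {"gene_a": ["T4"], "gene_b": ["T5"]}
print(propagate(direct, parents))
```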
6.8. Gramene
Gramene is a curated, open-source data resource for genome analysis in the grasses. The information stored in the database is derived from public sources and includes genomes, EST sequencing, protein structure and function analysis, genetic and physical mapping, interpretation of biochemical pathways, Gene Ontologies, gene and QTL localization, and descriptions of phenotypic characters and mutations. Extensive information is provided for Oryza, Zea, Triticum, Hordeum, Avena, Setaria, Pennisetum, Secale, Sorghum, Zizania, and Brachypodium; see http://www.gramene.org.
6.9. Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a database of biological systems consisting of genes and proteins (KEGG GENES), endogenous and exogenous substances (KEGG LIGAND), pathways (KEGG PATHWAY), and hierarchies and relationships of biological objects (KEGG BRITE). This database is very rich, with information across hundreds of species, including many plants; see http://www.genome.jp/kegg [16–18].
6.10. Plant-associated microbe gene ontology (PAMGO)
PAMGO is a database of the results of a multi-institutional collaborative effort aimed at developing new GO terms and relationships for gene products implicated in plant-pathogen interactions. GO terms are currently being developed for the following species: the bacteria Erwinia chrysanthemi, Pseudomonas syringae pv. tomato, and Agrobacterium tumefaciens; the fungus Magnaporthe grisea; the oomycetes Phytophthora sojae and Phytophthora ramorum; and the nematode Meloidogyne hapla; see http://pamgo.vbi.vt.edu.
Table 3: Database tools.
Tool name | Web site
Acuity | http://www.moleculardevices.com/pages/software/gn_acuity.html
Array Results Manager (ARM) | http://www.biodiscovery.com/index/arm
ArrayTrack | http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack [35, 36]
BASE 2 | http://base.thep.lu.se
caArray | http://caarray.nci.nih.gov
Expressionist | http://www.genedata.com/products/expressionist/index_eng.html
Gene Array Analyzer Software (GAAS) | http://www.medinfopoli.polimi.it/GAAS
GeneDirector | http://www.biodiscovery.com/index/genedirector
GeneSpring Workgroup | http://www.chem.agilent.com/scripts/pds.asp?lpage=34668
GeneTraffic | http://www.iobion.com/products/products_GENETRAFFIC.html
Genowiz | http://www.ocimumbio.com
Longhorn Array Database (LAD) [37] | http://www.longhornarraydatabase.org
MaxdLoad2 | http://www.bioinf.man.ac.uk/microarray/maxd/index.html
PARTISAN arrayLIMS | http://www.clondiag.com
Rosetta Resolver System | http://www.rosettabio.com/products/resolver/default.htm
Stanford Microarray Database (SMD) | http://smd-www.stanford.edu/download [38]
6.11. SWISS-PROT
SWISS-PROT is a curated protein sequence database that provides a high level of annotation, such as descriptions of a protein's function, domain structure, post-translational modifications, and variants, along with good integration with other databases; see http://www.expasy.ch/sprot.
6.12. TAIR
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for Arabidopsis thaliana. Data available from TAIR include the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, and publications; see http://www.arabidopsis.org.
7. ANNOTATION TOOLS
The databases described in Section 6 provide data in a variety of forms, which makes merging their annotations with expression data difficult. To deal with this heterogeneity, a number of tools have been developed to make it easier to annotate genes in expression studies.
7.1. CiteXplore
CiteXplore combines literature search with text-mining tools for biology. Search results are cross-referenced to European Bioinformatics Institute applications based on publication identifiers. Links to full-text versions are provided where available; see http://www.ebi.ac.uk/citexplore.
7.2. Database for annotation, visualization, and integrated discovery (DAVID)
DAVID provides a large set of functional annotation tools to help investigators understand the biological meaning behind a long list of genes. Its core is the DAVID Knowledgebase, which provides a comprehensive, high-quality collection of gene annotation resources and the flexibility to cross-reference gene identifiers and heterogeneous annotations from almost all databases. The DAVID tools can identify enriched biological themes, particularly GO terms; cluster redundant annotation terms; visualize genes on BioCarta and KEGG pathway maps; display many-genes-to-many-terms relationships in a 2D view; search for other functionally related genes not in the list; list interacting proteins; highlight protein functional domains and motifs; redirect to related literature; and convert gene identifiers from one type to another; see http://david.abcc.ncifcrf.gov [19].
7.3. MatchMiner
MatchMiner translates between several gene identifier types for the same list of hundreds or thousands of genes. The supported identifier types include GenBank accession numbers, IMAGE clone IDs, common gene names, HUGO names, gene symbols, UniGene clusters, FISH-mapped BAC clones, Affymetrix identifiers, and chromosome locations. MatchMiner can also find the intersection of two lists of genes specified by different identifiers; see http://discover.nci.nih.gov/matchminer/index.jsp [20].
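Intersecting two gene lists that use different identifiers reduces to translating both lists to a shared key and intersecting the results. The sketch below assumes the lookup tables (here, identifier to gene symbol) have already been obtained, for example from an annotation resource; all identifiers are hypothetical. It is not MatchMiner's code. Real mappings are often many-to-one or many-to-many, so each key keeps every source identifier, and identifiers that cannot be mapped are reported rather than silently dropped.

```python
def to_common_key(ids, mapping):
    """Translate identifiers to a shared key; collect the ones that cannot be mapped."""
    mapped, unmapped = {}, []
    for identifier in ids:
        key = mapping.get(identifier)
        if key is None:
            unmapped.append(identifier)
        else:
            mapped.setdefault(key, []).append(identifier)
    return mapped, unmapped


def intersect_lists(ids_a, map_a, ids_b, map_b):
    """Genes (by shared key) present in both lists, plus the unmapped leftovers."""
    common_a, missing_a = to_common_key(ids_a, map_a)
    common_b, missing_b = to_common_key(ids_b, map_b)
    return sorted(common_a.keys() & common_b.keys()), missing_a, missing_b


# Hypothetical lookup tables: probe-set IDs and accessions both mapped to gene symbols.
map_probes = {"probe_1": "GENE1", "probe_2": "GENE2", "probe_3": "GENE3"}
map_accessions = {"acc_9": "GENE2", "acc_8": "GENE3", "acc_7": "GENE4"}
shared, missing_a, missing_b = intersect_lists(
    ["probe_1", "probe_2", "probe_3", "probe_x"], map_probes,
    ["acc_9", "acc_8", "acc_7"], map_accessions,
)
print(shared, missing_a, missing_b)
```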
7.4. MedMiner
MedMiner searches and organizes the biomedical literature on genes, gene-gene relationships, and gene-drug relationships. It uses GeneCards, PubMed, syntactic analysis, truncated-keyword filtering of relations, and user-controlled sculpting of Boolean queries to extract key sentences from pertinent abstracts. Selected abstracts can be automatically entered into EndNote; see http://discover.nci.nih.gov/textmining/main.jsp [21].
8. DATA ANALYSIS SOFTWARE
There is an incredible breadth of tools in this area, many with very slick interfaces and very useful functions; however, you do not really need any of them. Most statistical packages, such as SAS, SPSS, JMP, and R, can be used to analyze microarray data and will perform most of the functions of the following tools, because few statistical methods are entirely unique to expression studies. Nonetheless, many of the following tools are much easier to use and often have better visualization functions than general-purpose statistical programs. The tools have typically been designed for ease of use, often to the point of being too easy. Regardless of the tool you use, strive to understand the functions and analyses it provides and the assumptions you make when you choose to use them. For example, in cluster analysis you must choose linkage and weighting functions, and the resulting clusters can differ substantially depending on the methods chosen. There are similar issues to learn and understand for all statistical methods and most visualization methods. Additional tools are listed in Table 4.
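The warning about linkage choices is easy to demonstrate. In the sketch below (Python with SciPy; the six one-dimensional "expression values" are hypothetical), the same data are cut into two clusters with single linkage and with complete linkage. Single linkage chains the nearly evenly spaced values together and isolates only the most distant gene, whereas complete linkage splits the series into two more balanced groups.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def partition(labels):
    """Represent flat-cluster labels as a set of member-index sets (label-order independent)."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(members) for members in groups.values()}


# Hypothetical one-dimensional expression values for six genes (rows = genes).
genes = np.array([[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]])

for method in ("single", "complete"):
    tree = linkage(genes, method=method, metric="euclidean")
    labels = fcluster(tree, t=2, criterion="maxclust")
    print(method, sorted(sorted(c) for c in partition(labels)))
```

Neither partition is "correct"; the point is that the answer depends on a choice the analyst makes, often through a default setting in the software.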
8.1. Bioconductor
Bioconductor is a multicenter effort to develop tools in the R programming environment for analyzing genomic data, especially microarray data. A large number of packages are available for many types of analyses; currently there are over 115 microarray applications. The tools are still in very active development, and all are freely available. Some of the most relevant packages are affy, maanova, genefilter, limma, multtest, annotate, geneplotter, and marray, to name a few. A couple of these packages are described elsewhere in this document; for details of specific tools, see the Bioconductor web site, http://www.bioconductor.org [22].
8.2. Biometric Research Branch (BRB) ArrayTools
BRB ArrayTools is a free, integrated package for the visualization and statistical analysis of DNA microarray gene expression data. It functions as an Excel add-in and was developed by professional statisticians experienced in the analysis of microarray data. It is probably the best tool available for discriminant analysis, and it includes a variety of other statistical and clustering methods; see http://linus.nci.nih.gov/BRB-ArrayTools.html.
8.3. Expression Profiler
Expression Profiler is a set of tools for cluster analysis, pattern discovery, pattern visualization, and the study of and search for Gene Ontology categories. It also generates sequence logos, extracts regulatory sequences, studies protein interactions, and links analysis results to external tools and databases; see http://ep.ebi.ac.uk.
8.4. GenePattern
GenePattern puts sophisticated computational methods into the hands of the biomedical research community. A simple application interface gives a broad audience access to a growing repository of analytic tools for genomic data, while an API supports computational biologists. GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques; see http://www.broad.mit.edu/cancer/software/genepattern/index.html [23].
8.5. GeneXPress
GeneXPress is a visualization and analysis tool for gene expression data that integrates clustering, gene annotation, and sequence information. GeneXPress lets the user load clustering results and automatically analyze them for significance of functional groups, through correlation with functional annotations (e.g., Gene Ontology), and for enrichment of motif binding sites (e.g., TRANSFAC motifs); see http://genexpress.stanford.edu.
8.6. GEPAS (gene expression pattern analysis suite)
GEPAS is an integrated web-based tool for the analysis of gene expression data. GEPAS includes tools for normalization, many clustering methods, supervised analysis, differential gene expression analysis, predictors, array CGH, and functional annotation; see http://gepas.bioinfo.cipf.es [24, 25].
8.7. High-dimensional biology statistics (HDBStat)
HDBStat is a free Java application for the normalization, transformation, and statistical analysis of expression data. HDBStat also includes a number of unique quality-control procedures, and it implements a reproducible research design so that analyses can be readily repeated (http://www.ssg.uab.edu/hdbstat) [26].
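To illustrate what a normalization-plus-transformation step involves, the sketch below applies quantile normalization, one widely used normalization for expression arrays, followed by a log2 transformation. It is a generic Python example under that assumption, not HDBStat's code, and it does not claim which methods HDBStat offers. The intensity values are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (array) the same distribution: the mean of the sorted columns.

    Ties are broken by position (argsort order), a simplification; some
    implementations average the values assigned to tied ranks instead.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)         # mean of each row of the sorted matrix
    return reference[ranks]


# Hypothetical raw intensities: 4 genes (rows) x 3 arrays (columns).
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 6.0, 6.0],
                [4.0, 2.0, 8.0]])
normalized = quantile_normalize(raw)
log_expr = np.log2(normalized)  # a common variance-stabilizing transformation
print(np.round(normalized, 3))
```

After normalization, every array has exactly the same set of values, while each gene keeps its rank within its array; that is the defining property of the method.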
8.8. JMP Genomics
JMP Genomics leverages many statistical tools in JMP, a statistical analysis package; as a result, it offers over 100 different analytical procedures. It also includes extensive visualization tools, and scripts can be written to develop standard analytical procedures; see http://www.jmp.com/software/genomics.
Table 4: Other useful statistical analysis and data-mining tools.
Tool name | Web site
Amiada (analyzing microarray data) | http://dambe.bio.uottawa.ca/amiada.asp [39]
ArrayAssist Enterprise | http://www.stratagene.com
caGEDA | http://bioinformatics.upmc.edu/GE2/GEDA.html
Cluster | http://rana.lbl.gov/EisenSoftware.htm
dChip | http://www.dchip.org
GeneMaths XT | http://www.applied-maths.com/genemaths/genemaths.htm
INCLUSive | http://homes.esat.kuleuven.be/~dna/Biol/Software.html
J-Express Pro | http://www.molmine.com/software.htm
MAExplorer | http://maexplorer.sourceforge.net
NIA Array analysis | http://lgsun.grc.nia.nih.gov/ANOVA
Onto-Tools | http://vortex.cs.wayne.edu/projects.htm
Probe Profiler | http://www.corimbia.com/Pages/ProductOverview.htm
TableView | http://ccgb.umn.edu/software/java/apps/TableView
Venn Mapper | http://www.gatcplatform.nl/vennmapper/index.php
and bioinformaticsrdquo Genome Biology vol 5 no 10 articleR80 2004
[23] M Reich T Liefeld J Gould J Lerner P Tamayo and J PMesirov ldquoGenePattern 20rdquo Nature Genetics vol 38 no 5 pp500ndash501 2006
[24] J Herrero F Al-Shahrour R Dıaz-Uriarte et al ldquoGEPASa web-based resource for microarray gene expression dataanalysisrdquo Nucleic Acids Research vol 31 no 13 pp 3461ndash3467 2003
[25] J M Vaquerizas L Conde P Yankilevich et al ldquoGEPAS anexperiment-oriented pipeline for the analysis of microarraygene expression datardquo Nucleic Acids Research vol 33 webserver issue pp W616ndashW620 2005
[26] P Trivedi J W Edwards J Wang et al ldquoHDBStat aplatform-independent software suite for statistical analysis ofhigh dimensional biology datardquo BMC Bioinformatics vol 6article 86 2005
[27] P Khatri P Bhavsar G Bawa and S Draghici ldquoOnto-Toolsan ensemble of web-accessible ontology-based tools for thefunctional design and interpretation of high-throughput geneexpression experimentsrdquo Nucleic Acids Research vol 32 webserver issue pp W449ndashW456 2004
[28] M K Kerr M Martin and G A Churchill ldquoAnalysis ofvariance for gene expression microarray datardquo Journal ofComputational Biology vol 7 no 6 pp 819ndash837 2000
[29] A Brazma H Parkinson U Sarkans et al ldquoArrayExpressmdashapublic repository for microarray gene expression data at theEBIrdquo Nucleic Acids Research vol 31 no 1 pp 68ndash71 2003
[30] H Parkinson M Kapushesky M Shojatalab et alldquoArrayExpressmdasha public database of microarray experimentsand gene expression profilesrdquo Nucleic Acids Research vol 35database issue pp D747ndashD750 2007
[31] T Barrett D B Troup S E Wilhite et al ldquoNCBI GEOmining tens of millions of expression profilesmdashdatabase andtools updaterdquo Nucleic Acids Research vol 35 database issuepp D760ndashD765 2007
[32] T Barrett T O Suzek D B Troup et al ldquoNCBI GEO miningmillions of expression profilesmdashdatabase and toolsrdquo NucleicAcids Research vol 33 database issue pp D562ndashD566 2005
[33] D J Craigon N James J Okyere J Higgins J Jothamand S May ldquoNASCArrays a repository for microarray datagenerated by NASCrsquos transcriptomics servicerdquo Nucleic AcidsResearch vol 32 database issue pp D575ndashD577 2004
[34] L Shen J Gong R A Caldo et al ldquoBarleyBasemdashanexpression profiling database for plant genomicsrdquo NucleicAcids Research vol 33 database issue pp D614ndashD618 2005
[35] W Tong S Harris X Cao et al ldquoDevelopment of publictoxicogenomics software for microarray data managementand analysisrdquo Mutation ResearchFundamental and MolecularMechanisms of Mutagenesis vol 549 no 1-2 pp 241ndash2532004
[36] W Tong X Cao S Harris et al ldquoArray trackmdashsupportingtoxicogenomic research at the US Food and Drug Adminis-tration National Center for Toxicological Researchrdquo Environ-mental Health Perspectives vol 111 no 15 pp 1819ndash18262003
[37] P J Killion G Sherlock and V R Iyer ldquoThe LonghornArray Database (LAD) an open-source MIAME compliantimplementation of the Stanford Microarray Database (SMD)rdquoBMC Bioinformatics vol 4 article 32 2003
[38] J Demeter C Beauheim J Gollub et al ldquoThe StanfordMicroarray Database implementation of new analysis toolsand open source release of softwarerdquo Nucleic Acids Researchvol 35 database issue pp D766ndashD770 2007
[39] X Xia and Z Xie ldquoAMADA analysis of microarray datardquoBioinformatics vol 17 no 6 pp 569ndash570 2001
Table 3: Database tools.

| Tool name | Web site |
|---|---|
| Acuity | http://www.moleculardevices.com/pages/software/gnacuity.html |
| Array Results Manager (ARM) | http://www.biodiscovery.com/index/arm |
| ArrayTrack [35, 36] | http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack |
| BASE 2 | http://base.thep.lu.se |
| caArray | http://caarray.nci.nih.gov |
| Expressionist | http://www.genedata.com/products/expressionist/index_eng.html |
| Gene Array Analyzer Software (GAAS) | http://www.medinfopoli.polimi.it/GAAS |
| GeneDirector | http://www.biodiscovery.com/index/genedirector |
| GeneSpring Workgroup | http://www.chem.agilent.com/scripts/pds.asp?lpage=34668 |
| GeneTraffic | http://www.iobion.com/products/products_GENETRAFFIC.html |
| Genowiz | http://www.ocimumbio.com |
| Longhorn Array Database (LAD) [37] | http://www.longhornarraydatabase.org |
| MaxdLoad2 | http://www.bioinf.man.ac.uk/microarray/maxd/index.html |
| PARTISAN arrayLIMS | http://www.clondiag.com |
| Rosetta Resolver System | http://www.rosettabio.com/products/resolver/default.htm |
| Stanford Microarray Database (SMD) [38] | http://smd-www.stanford.edu/download |
relationships for gene products implicated in plant-pathogen interactions. GO terms are currently being developed for the following species: the bacteria Erwinia chrysanthemi, Pseudomonas syringae pv. tomato, and Agrobacterium tumefaciens; the fungus Magnaporthe grisea; the oomycetes Phytophthora sojae and Phytophthora ramorum; and the nematode Meloidogyne hapla (see http://pamgo.vbi.vt.edu).
6.1.1 SWISS-PROT
SWISS-PROT is a curated protein sequence database that provides a high level of annotation, such as a description of the function of a protein, its domain structure, post-translational modifications, variants, and so forth, along with good integration with other databases (see http://www.expasy.ch/sprot).
6.1.2 TAIR
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for Arabidopsis thaliana. Data available from TAIR include the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, and publications (see http://www.arabidopsis.org).
7 ANNOTATION TOOLS
The databases described in Section 6 provide data in a variety of forms, which makes merging their annotations with expression data difficult. To deal with this heterogeneity, a number of tools have been developed to make it easier to annotate the genes in expression studies.
7.1 CiteXplore
CiteXplore combines literature search with text-mining tools for biology. Search results are cross-referenced to European Bioinformatics Institute applications based on publication identifiers. Links to full-text versions are provided where available (see http://www.ebi.ac.uk/citexplore).
7.2 Database for annotation, visualization, and integrated discovery (DAVID)
DAVID provides a large set of functional annotation tools that help investigators understand the biological meaning behind a long list of genes. Its core is the DAVID Knowledgebase, a comprehensive, high-quality collection of gene annotation resources with the flexibility to cross-reference gene identifiers and heterogeneous annotations from a wide range of databases. The DAVID tools can identify enriched biological themes (particularly GO terms), cluster redundant annotation terms, visualize genes on BioCarta and KEGG pathway maps, display many-genes-to-many-terms relationships in a 2D view, search for other functionally related genes not in the list, list interacting proteins, highlight protein functional domains and motifs, redirect to related literature, and convert gene identifiers from one type to another (see http://david.abcc.ncifcrf.gov) [19].
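The "enriched biological themes" that DAVID and similar tools report rest on a simple question: given that K of the N genes on the array carry a GO term, how surprising is it that k of the n genes in your list carry it? The minimal sketch below answers it with the one-sided hypergeometric (Fisher exact) test. It is an illustration under that assumption, not DAVID's own code; DAVID reports a modified, more conservative variant of this test. The function name and the counts in the usage block are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes on the array (the background population)
    K: background genes annotated to the term
    n: genes in the list of interest
    k: genes in the list annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probability of every outcome at least as extreme as k.
    # math.comb returns 0 when its second argument exceeds the first,
    # so impossible outcomes contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 arrayed genes, 150 carry the term,
    # and 12 of 300 differentially expressed genes carry it.
    print(enrichment_p_value(k=12, n=300, K=150, N=20000))
```

Because a list is usually tested against hundreds of terms at once, p-values like this one need a multiple-testing correction before any theme is called enriched.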
7.3 MatchMiner
MatchMiner translates between several gene identifier types for the same list of hundreds or thousands of genes. The supported identifier types include GenBank accession numbers, IMAGE clone IDs, common gene names, HUGO names, gene symbols, UniGene clusters, FISH-mapped BAC clones, Affymetrix identifiers, and chromosome locations. MatchMiner can also find the intersection of two lists of genes specified by different identifiers (see http://discover.nci.nih.gov/matchminer/index.jsp) [20].
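Intersecting two gene lists that use different identifier types amounts to translating both into a shared key and taking a set intersection. The sketch below assumes the two mapping tables already exist (in practice they would come from an annotation resource); the function names and all identifiers are hypothetical placeholders, and nothing here reflects MatchMiner's internals.

```python
from typing import Dict, Iterable, Set


def translate(ids: Iterable[str], mapping: Dict[str, str]) -> Set[str]:
    """Map each identifier to a shared key, silently dropping unmapped ones."""
    return {mapping[i] for i in ids if i in mapping}


def intersect_lists(
    list_a: Iterable[str], map_a: Dict[str, str],
    list_b: Iterable[str], map_b: Dict[str, str],
) -> Set[str]:
    """Genes present in both lists once each is translated to the shared key."""
    return translate(list_a, map_a) & translate(list_b, map_b)


if __name__ == "__main__":
    # Hypothetical tables: probe-set IDs in one list, accession numbers in the
    # other, both mapped to a shared gene symbol.
    probes_to_symbol = {"PROBE_1": "GENE_A", "PROBE_2": "GENE_B", "PROBE_3": "GENE_C"}
    accessions_to_symbol = {"ACC_9": "GENE_B", "ACC_7": "GENE_C", "ACC_5": "GENE_D"}
    shared = intersect_lists(["PROBE_1", "PROBE_2", "PROBE_3"], probes_to_symbol,
                             ["ACC_9", "ACC_7", "ACC_5"], accessions_to_symbol)
    print(sorted(shared))  # ['GENE_B', 'GENE_C']
```

Dropping unmapped identifiers keeps the sketch short; a real workflow should report them, since a gene missing from the mapping is not the same as a gene absent from the list.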
7.4 MedMiner
MedMiner searches and organizes the biomedical literature on genes, gene-gene relationships, and gene-drug relationships. It draws on GeneCards and PubMed and combines syntactic analysis, truncated-keyword filtering of relational terms, and user-controlled sculpting of Boolean queries to extract key sentences from pertinent abstracts. Selected abstracts can be entered automatically into EndNote (see http://discover.nci.nih.gov/textmining/main.jsp) [21].
8 DATA ANALYSIS SOFTWARE
There is an enormous breadth of tools in this area, and many provide polished interfaces and very useful functions; strictly speaking, however, you do not need any of them. Most statistical packages, such as SAS, SPSS, JMP, and R, can analyze microarray data and perform most of the functions of the tools below, because few statistical methods are entirely unique to expression studies. Nonetheless, many of the following tools are much easier to use and often have better visualization functions than general-purpose statistical programs. The tools are typically designed for ease of use, sometimes too much so. Whichever tool you use, strive to understand the functions and analyses it provides and the assumptions you make when you choose to use them. For example, in cluster analysis you must choose linkage and weighting functions, and the resulting clusters can differ substantially depending on the methods chosen. There are similar issues to learn and understand for all statistical methods and most visualization methods. Additional tools are listed in Table 4.
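To make the point about linkage choice concrete, the sketch below runs a naive agglomerative clustering on five toy one-dimensional "profiles" whose gaps grow slowly. The data and function names are hypothetical, and real tools use far more efficient algorithms; the point is only that single linkage (distance between the closest members) and complete linkage (distance between the farthest members) split the same data differently.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(points: Sequence[Point], n_clusters: int, linkage: str) -> List[List[int]]:
    """Naive agglomerative clustering; returns clusters as sorted index lists."""
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for (ia, a), (ib, b) in combinations(enumerate(clusters), 2):
            dists = [euclidean(points[p], points[q]) for p in a for q in b]
            # Single linkage uses the closest pair of members, complete the farthest.
            d = min(dists) if linkage == "single" else max(dists)
            if best is None or d < best[0]:
                best = (d, ia, ib)
        _, ia, ib = best
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so position ia is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    profiles = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,)]
    print(agglomerate(profiles, 2, "single"))    # [[0, 1, 2, 3], [4]]
    print(agglomerate(profiles, 2, "complete"))  # [[0, 1], [2, 3, 4]]
```

Single linkage "chains" neighbouring points into one long cluster, while complete linkage prefers compact groups; neither answer is wrong, which is exactly why the choice should be deliberate.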
8.1 Bioconductor
Bioconductor is a multicenter effort to develop tools in the R programming environment for analyzing genomic data, especially microarray data. A large number of packages are available for many types of analyses; there are currently over 115 microarray applications. The tools are still in very active development, and all are freely available. Some of the most relevant packages are affy, maanova, genefilter, limma, multtest, annotate, geneplotter, and marray, to name a few. A few of these packages are described elsewhere in this review; for details of specific tools, see the Bioconductor web site (http://www.bioconductor.org) [22].
8.2 Biometric Research Branch (BRB) ArrayTools
BRB ArrayTools is a free, integrated package for the visualization and statistical analysis of DNA microarray gene expression data. It functions as an Excel add-in and was developed by professional statisticians experienced in the analysis of microarray data. It is probably the best tool available for discriminant analysis, and it includes a variety of other statistical and clustering methods (see http://linus.nci.nih.gov/BRB-ArrayTools.html).
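Discriminant analysis builds a rule that assigns a new sample to a class from its expression profile. As a minimal sketch of one simple form, and not BRB ArrayTools' own code, diagonal linear discriminant analysis (DLDA) computes each class centroid and a pooled within-class variance per gene, then assigns a sample to the class with the smallest variance-scaled distance, assuming equal class priors. The function names and toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[str, List[float]], List[float]]


def fit_dlda(samples: Sequence[Sequence[float]], labels: Sequence[str]) -> Model:
    """Class centroids plus pooled within-class variance for each gene."""
    classes = sorted(set(labels))
    if len(samples) <= len(classes):
        raise ValueError("need more samples than classes to pool variances")
    n_genes = len(samples[0])
    centroids = {
        c: [mean(s[g] for s, lab in zip(samples, labels) if lab == c) for g in range(n_genes)]
        for c in classes
    }
    dof = len(samples) - len(classes)
    variances = []
    for g in range(n_genes):
        ss = sum((s[g] - centroids[lab][g]) ** 2 for s, lab in zip(samples, labels))
        if ss == 0:
            raise ValueError(f"gene {g} has zero within-class variance")
        variances.append(ss / dof)
    return centroids, variances


def predict_dlda(model: Model, x: Sequence[float]) -> str:
    """Assign x to the class with the smallest variance-scaled distance (equal priors)."""
    centroids, variances = model

    def score(c: str) -> float:
        return sum((xg - mg) ** 2 / vg for xg, mg, vg in zip(x, centroids[c], variances))

    return min(centroids, key=score)


if __name__ == "__main__":
    train = [[0.0, 0.2], [0.2, 0.0], [5.0, 5.2], [5.2, 5.0]]  # toy 2-gene profiles
    model = fit_dlda(train, ["A", "A", "B", "B"])
    print(predict_dlda(model, [0.1, 0.3]))  # A
```

Ignoring correlations between genes (the "diagonal" part) is what keeps the method stable when there are far more genes than samples; in practice the predictor should be evaluated by cross-validation with gene selection repeated inside each fold.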
8.3 Expression Profiler
Expression Profiler is a set of tools for cluster analysis, pattern discovery, pattern visualization, and the study of and search for Gene Ontology categories. It also generates sequence logos, extracts regulatory sequences, studies protein interactions, and links analysis results to external tools and databases (see http://ep.ebi.ac.uk).
8.4 GenePattern
GenePattern puts sophisticated computational methods into the hands of the biomedical research community. A simple application interface gives a broad audience access to a growing repository of analytic tools for genomic data, while an API supports computational biologists. GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques (see http://www.broad.mit.edu/cancer/software/genepattern/index.html) [23].
8.5 GeneXPress
GeneXPress is a visualization and analysis tool for gene expression data that integrates clustering, gene annotation, and sequence information. It allows the user to load clustering results and automatically analyze them for the significance of functional groups, through correlation with functional annotations (e.g., Gene Ontology), and for enrichment of motif binding sites (e.g., TRANSFAC motifs) (see http://genexpress.stanford.edu).
8.6 GEPAS (Gene Expression Pattern Analysis Suite)
GEPAS is an integrated web-based tool for the analysis of gene expression data. It includes tools for normalization, many clustering methods, supervised analysis, differential gene expression analysis, predictors, array CGH, and functional annotation (see http://gepas.bioinfo.cipf.es) [24, 25].
8.7 High-dimensional biology statistics (HDBStat)
HDBStat is a free Java application for the normalization, transformation, and statistical analysis of expression data. It also includes a number of unique quality-control procedures. HDBStat implements a reproducible research design so that analyses can be readily repeated (http://www.ssg.uab.edu/hdbstat) [26].
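Normalization and transformation, mentioned above for both GEPAS and HDBStat, can take many forms. As one widely used example, and without claiming it is what either tool implements, the sketch below log2-transforms intensities and then applies quantile normalization, which forces every array to share the same distribution by replacing each value with the mean of the values at the same rank across arrays. Ties are broken by gene order, a simplification of this sketch; the function names and toy values are hypothetical.

```python
import math
from typing import List

Array = List[float]  # one array: a value for every gene, in a shared gene order


def log2_transform(arrays: List[Array]) -> List[Array]:
    """Log2-transform positive intensities, a common first step."""
    return [[math.log2(v) for v in arr] for arr in arrays]


def quantile_normalize(arrays: List[Array]) -> List[Array]:
    """Give every array the same empirical distribution (mean of sorted values)."""
    n_genes = len(arrays[0])
    if any(len(arr) != n_genes for arr in arrays):
        raise ValueError("all arrays must measure the same genes")
    # For each array, gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=arr.__getitem__) for arr in arrays]
    # Mean of the r-th smallest value across all arrays.
    rank_means = [
        sum(arr[order[r]] for arr, order in zip(arrays, orders)) / len(arrays)
        for r in range(n_genes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for r, gene in enumerate(order):
            out[gene] = rank_means[r]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # toy values: 3 genes on 2 arrays
    print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Quantile normalization assumes most genes do not change between arrays; when that assumption is doubtful, other normalizations may be more appropriate, which is one reason to understand what a tool's default actually does.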
8.8 JMP Genomics
JMP Genomics builds on the many statistical tools in JMP, a statistical analysis package; as a result, it offers over 100 different analytical procedures. It also includes extensive visualization tools, and scripts can be written to develop standard analytical procedures (see http://www.jmp.com/software/genomics).

Table 4: Other useful statistical analysis and data-mining tools.

| Tool name | Web site |
|---|---|
| Amiada (analyzing microarray data) [39] | http://dambe.bio.uottawa.ca/amiada.asp |
| ArrayAssist Enterprise | http://www.stratagene.com |
| caGEDA | http://bioinformatics.upmc.edu/GE2/GEDA.html |
| Cluster | http://rana.lbl.gov/EisenSoftware.htm |
| dChip | http://www.dchip.org |
| GeneMaths XT | http://www.applied-maths.com/genemaths/genemaths.htm |
| INCLUSive | http://homes.esat.kuleuven.be/~dna/Biol/Software.html |
| J-Express Pro | http://www.molmine.com/software.htm |
| MAExplorer | http://maexplorer.sourceforge.net |
| NIA Array Analysis | http://lgsun.grc.nia.nih.gov/ANOVA |
| Onto-Tools | http://vortex.cs.wayne.edu/projects.htm |
| Probe Profiler | http://www.corimbia.com/Pages/ProductOverview.htm |
| TableView | http://ccgb.umn.edu/software/java/apps/TableView |
| Venn Mapper | http://www.gatcplatform.nl/vennmapper/index.php |
8.9 Onto-Tools
Onto-Tools are a suite of freely available tools for the analysis of microarray data. Tools are available for array design (Onto-Design), gene class testing (Onto-Express), comparing the content of arrays (Onto-Compare), mapping gene information across databases (Onto-Translate), annotation (Onto-Miner), and pathway analysis (Pathway-Express) (see http://www.vortex.cs.wayne.edu) [27].
8.10 Partek Genomics Suite
Partek Genomics Suite can be used for gene expression analysis, exon expression analysis, chromosomal copy number analysis, promoter tiling array analysis, and analysis of SNP arrays. Partek includes a large number of statistical, visualization, and annotation tools that can be tied together with workflow tools for rapid repetition of analyses and for reproducible research (see http://www.partek.com/software).
8.11 R/maanova
MAANOVA stands for MicroArray ANalysis Of VAriance. It provides a complete workflow for microarray data analysis, including data-quality checks and visualization, data transformation, ANOVA model fitting for both fixed and mixed effects models, statistical tests including permutation tests, confidence intervals with bootstrapping, and cluster analysis. R/maanova is available in Bioconductor (see http://www.jax.org/staff/churchill/labsite/software/Rmaanova/index.html) [28].
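A permutation test, one of the options listed above, avoids relying on a theoretical null distribution by reshuffling the sample labels and recomputing the statistic. The sketch below is a deliberate simplification: a plain one-way ANOVA F statistic for a single gene with a label-permutation p-value. MAANOVA itself fits fixed and mixed effects models and offers other test statistics, so treat this only as an illustration of the permutation idea; the function names, seed, and toy data are hypothetical.

```python
import random
from typing import Hashable, Sequence


def f_statistic(values: Sequence[float], groups: Sequence[Hashable]) -> float:
    """One-way ANOVA F statistic for a single gene."""
    labels = sorted(set(groups), key=str)
    if len(labels) < 2:
        raise ValueError("need at least two groups")
    grand_mean = sum(values) / len(values)
    ss_between = ss_within = 0.0
    for label in labels:
        xs = [v for v, g in zip(values, groups) if g == label]
        m = sum(xs) / len(xs)
        ss_between += len(xs) * (m - grand_mean) ** 2
        ss_within += sum((x - m) ** 2 for x in xs)
    if ss_within == 0:
        return float("inf")  # degenerate case: no within-group spread
    df_between = len(labels) - 1
    df_within = len(values) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(values, groups, n_perm: int = 2000, seed: int = 0) -> float:
    """Fraction of label shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(values, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # one gene, six arrays
    grp = ["control"] * 3 + ["treated"] * 3
    print(f_statistic(expr, grp), permutation_p_value(expr, grp))
```

With only three arrays per group there are just 20 distinct label assignments, so no permutation p-value can fall much below 0.1; small designs limit what any test can show, whatever the software.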
8.12 SAM (Significance Analysis of Microarrays)
SAM can be used on any type of array data: oligonucleotide or cDNA arrays, SNP arrays, protein arrays, and so forth. Both parametric and nonparametric tests are available for correlating expression data with clinical parameters, including treatment, diagnosis categories, survival time, paired data, quantitative parameters (e.g., tumor volume), and one-class data. SAM can also impute missing data with a nearest-neighbor algorithm (see http://www-stat.stanford.edu/~tibs/SAM).
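Nearest-neighbor imputation fills a missing value for one gene using the genes whose profiles look most like it. The sketch below shows the idea under stated assumptions: distance is the root-mean-square difference over samples both genes have observed, and the missing value is replaced by the mean of that sample across the k nearest genes. SAM's own implementation may differ in its distance definition and defaults; the function names, k, and toy matrix are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one gene across samples; None marks a missing value


def _distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over samples observed in both genes."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing value with the mean of that sample over the k nearest genes."""
    imputed = []
    for i, row in enumerate(matrix):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have sample j observed themselves;
            # distances use the original matrix, never already-imputed values.
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d != math.inf]
            if not neighbours:
                raise ValueError(f"no usable neighbours for gene {i}, sample {j}")
            filled[j] = sum(neighbours) / len(neighbours)
        imputed.append(filled)
    return imputed


if __name__ == "__main__":
    data = [
        [1.0, 2.0, None],      # gene with one missing sample
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 5.0],
        [10.0, 20.0, 100.0],   # dissimilar gene, should not be used
    ]
    print(knn_impute(data, k=2)[0])  # [1.0, 2.0, 4.0]
```

The brute-force neighbour search scans every gene for every missing value, which is fine for illustration but slow for whole arrays.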
8.13 TM4
The TM4 suite consists of four major applications, Microarray Data Manager (MADAM), TIGR Spotfinder, Microarray Data Analysis System (MIDAS), and MultiExperiment Viewer (MeV), as well as a MySQL database, all of which are freely available. Although these tools were developed for spotted two-color arrays, many of the components can be easily adapted to work with single-color formats such as filter arrays and GeneChips (see http://www.tm4.org/index.html).
9 DISSEMINATION
Early in the use of microarrays in research, it became common practice for journals to require investigators, as a condition of publication, to submit their expression data to a public database. This sharing has allowed many investigators to mine these rich resources to help their own research. A number of public databases contain and accept plant data.
9.1 ArrayExpress
ArrayExpress is a public repository for microarray data that aims to store MIAME-compliant data in accordance with MGED recommendations. It is somewhat less biomedical in focus than GEO and has a good representation of plant expression data (see http://www.ebi.ac.uk/arrayexpress) [29, 30].
9.2 GEO
Gene Expression Omnibus is a gene expression/molecular abundance repository supporting MIAME-compliant data submissions and a curated online resource for gene expression data browsing, query, and retrieval. It is supported by the US National Library of Medicine but contains a good amount of plant expression data (see http://www.ncbi.nlm.nih.gov/projects/geo) [31, 32].
9.3 NASC (Nottingham Arabidopsis Stock Centre) arrays
NASC runs a database of its own arrays as well as other data that have been deposited there. The database primarily contains Arabidopsis array data (see http://affymetrix.arabidopsis.info) [33].
9.4 Plant expression database (PLEXdb)
PLEXdb is a unified public resource for gene expression data for plants and plant pathogens. It serves as a portal that integrates gene expression profile data sets with structural genomics and phenotypic data. The database contains data from seven species (see http://www.plexdb.org/index.php) [34].
10 CONCLUSIONS
We hope this listing of tools, which only scratches the surface of what is available, will help you conduct, analyze, and interpret expression studies. We suggest exploring several tools in an area and understanding the principles of the methods they implement before settling on one or a few to use regularly. By exploring several tools, you will understand their potential, learn how easy (or difficult) they are to use, and determine what you really want and need for your microarray analysis.
ACKNOWLEDGMENT
This work was supported by NSF grant 0501890 and NIH grant U54 AT100949.
REFERENCES
[1] G. Gadbury, G. P. Page, J. Edwards, et al., "Power analysis and sample size estimation in the age of high dimensional biology," Statistical Methods in Medical Research, vol. 13, pp. 325-338, 2004.
[2] G. P. Page, J. W. Edwards, G. L. Gadbury, et al., "The PowerAtlas: a power and sample size atlas for microarray experimental design and research," BMC Bioinformatics, vol. 7, article 84, 2006.
[3] V. G. Tusher, R. Tibshirani, and G. Chu, "Significance analysis of microarrays applied to the ionizing radiation response," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116-5121, 2001.
[4] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, "Affy - analysis of Affymetrix GeneChip data at the probe level," Bioinformatics, vol. 20, no. 3, pp. 307-315, 2004.
[5] H. Liu, B. R. Zeeberg, G. Qu, et al., "AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets," Bioinformatics, vol. 23, no. 18, pp. 2385-2390, 2007.
[6] M. J. Dunning, M. L. Smith, M. E. Ritchie, and S. Tavaré, "Beadarray: R classes and methods for Illumina bead-based data," Bioinformatics, vol. 23, no. 16, pp. 2183-2184, 2007.
[7] A. I. Saeed, N. K. Bhagabati, J. C. Braisted, et al., "TM4 microarray software suite," Methods in Enzymology, vol. 411, pp. 134-193, 2006.
[8] A. I. Saeed, V. Sharov, J. White, et al., "TM4: a free, open-source system for microarray data management and analysis," BioTechniques, vol. 34, no. 2, pp. 374-378, 2003.
[9] F. M. McCarthy, S. M. Bridges, N. Wang, et al., "AgBase: a unified resource for functional analysis in agriculture," Nucleic Acids Research, vol. 35, database issue, pp. D599-D603, 2007.
[10] F. M. McCarthy, N. Wang, G. B. Magee, et al., "AgBase: a functional genomics resource for agriculture," BMC Genomics, vol. 7, article 229, 2006.
[11] Y. Lee, J. Tsai, S. Sunkara, et al., "The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes," Nucleic Acids Research, vol. 33, database issue, pp. D71-D74, 2005.
[12] T. J. P. Hubbard, B. L. Aken, K. Beal, et al., "Ensembl 2007," Nucleic Acids Research, vol. 35, database issue, pp. D610-D617, 2007.
[13] J. Quackenbush, J. Cho, D. Lee, et al., "The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species," Nucleic Acids Research, vol. 29, no. 1, pp. 159-164, 2001.
[14] J. Quackenbush, F. Liang, I. Holt, G. Pertea, and J. Upton, "The TIGR Gene Indices: reconstruction and representation of expressed gene sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 141-145, 2000.
[15] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25-29, 2000.
[16] M. Kanehisa, "The KEGG database," Novartis Foundation Symposium, vol. 247, pp. 91-101, 2002.
[17] M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, "The KEGG databases at GenomeNet," Nucleic Acids Research, vol. 30, no. 1, pp. 42-46, 2002.
[18] M. Kanehisa, S. Goto, M. Hattori, et al., "From genomics to chemical genomics: new developments in KEGG," Nucleic Acids Research, vol. 34, database issue, pp. D354-D357, 2006.
[19] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., "DAVID: database for annotation, visualization, and integrated discovery," Genome Biology, vol. 4, no. 5, article P3, 2003.
[20] K. J. Bussey, D. Kane, M. Sunshine, et al., "MatchMiner: a tool for batch navigation among gene and gene product identifiers," Genome Biology, vol. 4, no. 4, article R27, 2003.
[21] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, "MedMiner: an internet text-mining tool for biomedical information, with application to gene expression profiling," BioTechniques, vol. 27, no. 6, pp. 1210-1217, 1999.
[22] R. C. Gentleman, V. J. Carey, D. M. Bates, et al., "Bioconductor: open software development for computational biology and bioinformatics," Genome Biology, vol. 5, no. 10, article R80, 2004.
[23] M. Reich, T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov, "GenePattern 2.0," Nature Genetics, vol. 38, no. 5, pp. 500-501, 2006.
[24] J. Herrero, F. Al-Shahrour, R. Díaz-Uriarte, et al., "GEPAS: a web-based resource for microarray gene expression data analysis," Nucleic Acids Research, vol. 31, no. 13, pp. 3461-3467, 2003.
[25] J. M. Vaquerizas, L. Conde, P. Yankilevich, et al., "GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data," Nucleic Acids Research, vol. 33, web server issue, pp. W616-W620, 2005.
[26] P. Trivedi, J. W. Edwards, J. Wang, et al., "HDBStat: a platform-independent software suite for statistical analysis of high dimensional biology data," BMC Bioinformatics, vol. 6, article 86, 2005.
[27] P. Khatri, P. Bhavsar, G. Bawa, and S. Draghici, "Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments," Nucleic Acids Research, vol. 32, web server issue, pp. W449-W456, 2004.
[28] M. K. Kerr, M. Martin, and G. A. Churchill, "Analysis of variance for gene expression microarray data," Journal of Computational Biology, vol. 7, no. 6, pp. 819-837, 2000.
[29] A. Brazma, H. Parkinson, U. Sarkans, et al., "ArrayExpress - a public repository for microarray gene expression data at the EBI," Nucleic Acids Research, vol. 31, no. 1, pp. 68-71, 2003.
[30] H. Parkinson, M. Kapushesky, M. Shojatalab, et al., "ArrayExpress - a public database of microarray experiments and gene expression profiles," Nucleic Acids Research, vol. 35, database issue, pp. D747-D750, 2007.
[31] T. Barrett, D. B. Troup, S. E. Wilhite, et al., "NCBI GEO: mining tens of millions of expression profiles - database and tools update," Nucleic Acids Research, vol. 35, database issue, pp. D760-D765, 2007.
[32] T. Barrett, T. O. Suzek, D. B. Troup, et al., "NCBI GEO: mining millions of expression profiles - database and tools," Nucleic Acids Research, vol. 33, database issue, pp. D562-D566, 2005.
[33] D. J. Craigon, N. James, J. Okyere, J. Higgins, J. Jotham, and S. May, "NASCArrays: a repository for microarray data generated by NASC's transcriptomics service," Nucleic Acids Research, vol. 32, database issue, pp. D575-D577, 2004.
[34] L. Shen, J. Gong, R. A. Caldo, et al., "BarleyBase - an expression profiling database for plant genomics," Nucleic Acids Research, vol. 33, database issue, pp. D614-D618, 2005.
[35] W. Tong, S. Harris, X. Cao, et al., "Development of public toxicogenomics software for microarray data management and analysis," Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 549, no. 1-2, pp. 241-253, 2004.
[36] W. Tong, X. Cao, S. Harris, et al., "ArrayTrack - supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research," Environmental Health Perspectives, vol. 111, no. 15, pp. 1819-1826, 2003.
[37] P. J. Killion, G. Sherlock, and V. R. Iyer, "The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD)," BMC Bioinformatics, vol. 4, article 32, 2003.
[38] J. Demeter, C. Beauheim, J. Gollub, et al., "The Stanford Microarray Database: implementation of new analysis tools and open source release of software," Nucleic Acids Research, vol. 35, database issue, pp. D766-D770, 2007.
[39] X. Xia and Z. Xie, "AMADA: analysis of microarray data," Bioinformatics, vol. 17, no. 6, pp. 569-570, 2001.
6 International Journal of Plant Genomics
of two lists of genes specified by different identifiers seehttpdiscoverncinihgovmatchminerindexjsp [20]
74 Medminer
MedMiner searches and organizes the biomedical literatureon genes gene-gene relationships and gene-drug relation-ships It uses GeneCards PubMed and syntactic analysistruncated-keyword filtering of relational and user-controlledsculpting of Boolean queries to generate key sentencesfrom pertinent abstracts Abstracts selected can be auto-matically entered into EndNote see httpdiscoverncinihgovtextminingmainjsp [21]
8 DATA ANALYSIS SOFTWARE
There is an incredible breadth of tools in this area withmany tools providing very slick interfaces and very usefulfunctions however you really do not need any of these toolsMost statistical packages such as SAS SPSS JMP and Rcan be used to analyze microarray data and will do mostof the functions the following tools will do for there arefew statistical methods that are 100 unique to expressionstudies Nonetheless many of the following tools are mucheasier to use and often have better visualization functionsthan the pure statistical programs Typically the tools havebeen designed for ease of use often too easy Regardless of thetool you use strive to understand the function and analysesprovided and the assumption that are made when you chooseto use them for analysis For example in cluster analysis youneed to make a choice of link and weight functions and theclusters that result will be quite different based on methodswhich are chosen There are similar issues to learn andunderstand for all statistical methods and most visualizationmethods Additional tools are listed in Table 4
8.1 Bioconductor
Bioconductor is a multicenter effort to develop tools in the R programming environment for analyzing genomic data, especially microarray data. A large number of packages are available to conduct many types of analyses; currently there are over 115 microarray applications. The tools are still in very active development, and all are freely available. Some of the most relevant tools are affy, maanova, genefilter, limma, multtest, annotate, geneplotter, and marray, to name a few. A couple of the packages are described elsewhere in this document; for more details on specific tools, see the Bioconductor web site, http://www.bioconductor.org [22].
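Packages such as multtest and limma can report p-values adjusted for testing thousands of genes at once. As a language-neutral illustration of one such adjustment, here is a minimal Python sketch of the Benjamini-Hochberg false discovery rate step-up procedure. It is not the code of either package, and the four p-values are invented.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # adjusted p at rank r = min over ranks s >= r of (m / s) * p_(s), capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-gene p-values
print([round(p, 4) for p in bh_adjust(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

At a 5% false discovery rate only the first gene would be called differentially expressed here, even though three raw p-values fall below 0.05.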
8.2 Biometric Research Branch (BRB) ArrayTools
BRB ArrayTools is a free integrated package for the visualization and statistical analysis of DNA microarray gene expression data. It functions as an Excel add-in and was developed by professional statisticians experienced in the analysis of microarray data. It is probably the best tool available for discriminant analysis (class prediction) and includes a variety of other statistical and clustering methods; see http://linus.nci.nih.gov/BRB-ArrayTools.html.
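Discriminant analysis here means building a rule that predicts a sample's class (for example, treated versus control) from its expression profile, then estimating how well that rule generalizes. The minimal Python sketch below uses a nearest-centroid classifier, one of the simplest such rules, together with leave-one-out cross-validation. It is not BRB ArrayTools' implementation, and the three-gene profiles are invented.

```python
import math


def centroid(rows):
    """Mean profile of a list of equal-length expression vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def predict(train, labels, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    centroids = {
        c: centroid([r for r, lab in zip(train, labels) if lab == c])
        for c in set(labels)
    }
    return min(sorted(centroids), key=lambda c: math.dist(x, centroids[c]))


def loocv_error(samples, labels):
    """Leave-one-out error rate: hold out each sample, train on the rest."""
    errors = 0
    for i, (x, true_label) in enumerate(zip(samples, labels)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict(train, train_labels, x) != true_label:
            errors += 1
    return errors / len(samples)


# Hypothetical log-expression of three genes in six arrays.
samples = [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1], [0.9, 5.1, 1.9],
           [3.0, 2.0, 2.0], [3.2, 2.1, 2.2], [2.9, 1.8, 1.9]]
labels = ["control"] * 3 + ["treated"] * 3

print(loocv_error(samples, labels))               # 0.0
print(predict(samples, labels, [1.1, 4.9, 2.0]))  # control
```

With real arrays, any gene-selection step must be repeated inside each cross-validation fold; selecting genes using all samples first makes the estimated error rate optimistically biased.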
8.3 Expression Profiler
Expression Profiler is a set of tools for cluster analysis, pattern discovery, pattern visualization, and the study of and search for Gene Ontology categories. The tool also generates sequence logos, extracts regulatory sequences, studies protein interactions, and links analysis results to external tools and databases; see http://ep.ebi.ac.uk.
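A sequence logo stacks the letters observed at each position of a set of aligned binding sites, with the height of each stack given by that position's information content. The minimal Python sketch below computes this height for DNA as 2 bits minus the Shannon entropy of the column. It deliberately omits the small-sample correction that logo programs may apply, and the aligned sites are hypothetical.

```python
import math
from collections import Counter


def column_information(column):
    """Information content (bits) of one aligned DNA column: 2 - entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def logo_heights(sites):
    """Per-position information content for equal-length aligned sites."""
    return [column_information(col) for col in zip(*sites)]


# Hypothetical aligned binding sites (illustrative only).
sites = ["TACGAT", "TATAAT", "GATACT", "CATATT"]
print([round(h, 2) for h in logo_heights(sites)])  # [0.5, 2.0, 1.19, 1.19, 0.5, 2.0]
```

Fully conserved positions reach the 2-bit maximum, while a position where all four bases are equally frequent would contribute nothing to the logo.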
8.4 GenePattern
GenePattern puts sophisticated computational methods into the hands of the biomedical research community. A simple application interface gives a broad audience access to a growing repository of analytic tools for genomic data, while an API supports computational biologists. GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques; see http://www.broad.mit.edu/cancer/software/genepattern/index.html [23].
8.5 GeneXPress
GeneXPress is a visualization and analysis tool for gene expression data that integrates clustering, gene annotation, and sequence information. GeneXPress allows the user to load clustering results and automatically analyze them for the significance of functional groups, through correlation with functional annotations (e.g., Gene Ontology), and for enrichment of motif binding sites (e.g., TRANSFAC motifs); see http://genexpress.stanford.edu.
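Testing a cluster for the significance of a functional group usually comes down to asking whether an annotation category (such as a GO term) is over-represented in the cluster relative to the whole array. A common way to score this, shown here as a minimal Python sketch rather than as GeneXPress's own method, is the one-sided hypergeometric test: with N genes on the array, K of them annotated, and a cluster of n genes containing k annotated ones, the p-value is the probability that a random cluster of the same size contains k or more annotated genes. All counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric.

    N: genes on the array        K: array genes annotated to the category
    n: genes in the cluster      k: cluster genes annotated to the category
    """
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


# Small worked case: 20 genes, 5 annotated, a 4-gene cluster with 3 annotated.
print(hypergeom_enrichment_p(20, 5, 4, 3))  # 155 / 4845, about 0.032

# A more array-sized (still hypothetical) case: 8 of 50 cluster genes annotated
# to a category that covers 200 of 10,000 array genes (about 1 expected by chance).
print(hypergeom_enrichment_p(10000, 200, 50, 8))
```

Because many categories are usually tested for each cluster, these p-values should themselves be corrected for multiple testing.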
8.6 GEPAS (gene expression pattern analysis suite)
GEPAS is an integrated web-based tool for the analysis of gene expression data. GEPAS includes tools for normalization, many clustering methods, supervised analysis, differential gene expression analysis, class predictors, array CGH analysis, and functional annotation; see http://gepas.bioinfo.cipf.es [24, 25].
8.7 High-dimensional biology statistics (HDBStat)
HDBStat is a free Java application that allows for the normalization, transformation, and statistical analysis of expression data. HDBStat also includes a number of unique quality control procedures. HDBStat implements a reproducible research design so that analyses can be readily repeated; see http://www.ssg.uab.edu/hdbstat [26].
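Normalization makes arrays comparable before any statistical test is applied. One widely used method for single-color arrays is quantile normalization, which forces every array to share the same intensity distribution. The minimal Python sketch below illustrates the idea; it is not HDBStat's code, it breaks ties by position rather than averaging them, and the intensities are invented.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe intensities).

    Each value is replaced by the mean, across arrays, of the values that
    have the same rank. Ties are broken by position, not averaged.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the r-th smallest value across all arrays, for each rank r.
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # two hypothetical arrays, three probes
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the same set of values, {1.5, 3.5, 5.5}; only the assignment of values to probes differs.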
8.8 JMP Genomics
JMP Genomics leverages many statistical tools in JMP, a statistical analysis package; as a result, it offers over 100 different analytical procedures. It also includes extensive visualization tools, and scripts can be written to develop standard analytical procedures; see http://www.jmp.com/software/genomics.
Table 4: Other useful statistical analysis and data-mining tools.
Tool name                            Web site
Amiada (analyzing microarray data)   http://dambe.bio.uottawa.ca/amiada.asp [39]
ArrayAssist Enterprise               http://www.stratagene.com
caGEDA                               http://bioinformatics.upmc.edu/GE2/GEDA.html
Cluster                              http://rana.lbl.gov/EisenSoftware.htm
dChip                                http://www.dchip.org
GeneMaths XT                         http://www.applied-maths.com/genemaths/genemaths.htm
INCLUSive                            http://homes.esat.kuleuven.be/~dna/Biol/Software.html
J-Express Pro                        http://www.molmine.com/software.htm
MAExplorer                           http://maexplorer.sourceforge.net
NIA Array analysis                   http://lgsun.grc.nia.nih.gov/ANOVA
Onto-Tools                           http://vortex.cs.wayne.edu/projects.htm
Probe Profiler                       http://www.corimbia.com/Pages/ProductOverview.htm
TableView                            http://ccgb.umn.edu/software/java/apps/TableView
Venn Mapper                          http://www.gatcplatform.nl/vennmapper/index.php
8.9 Onto-Tools
Onto-Tools are a series of freely available tools for the analysis of microarray data. Tools are available for array design (Onto-Design), gene class testing (Onto-Express), comparing the content of arrays (Onto-Compare), mapping gene information across databases (Onto-Translate), annotation (Onto-Miner), and pathway analysis (Pathway-Express); see http://www.vortex.cs.wayne.edu [27].
8.10 Partek Genomics Suite
Partek Genomics Suite can be used for gene expression analysis, exon expression analysis, chromosomal copy number analysis, promoter tiling array analysis, and analysis of SNP arrays. Partek includes a large number of statistical, visualization, and annotation tools that can be tied together using workflow tools for rapid repetition of analyses and for reproducible research; see http://www.partek.com/software.
8.11 R/maanova
Maanova stands for MicroArray ANalysis Of VAriance. It provides a complete workflow for microarray data analysis, including data-quality checks and visualization, data transformation, ANOVA model fitting for both fixed and mixed effects models, statistical tests including permutation tests, confidence intervals with bootstrapping, and cluster analysis. R/maanova is available through Bioconductor; refer to http://www.jax.org/staff/churchill/labsite/software/Rmaanova/index.html [28].
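Permutation tests avoid assuming normally distributed errors by comparing the observed statistic with its distribution over relabelings of the samples. R/maanova applies this idea to ANOVA F statistics; the minimal Python sketch below shows only the simplest two-group case for a single gene, enumerating every relabeling exactly. It is not maanova's code, and the expression values are invented.

```python
from itertools import combinations
from statistics import fmean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = total = 0
    # Every way of choosing which pooled samples are labeled "group A".
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so relabelings tied with the observed value count.
        if abs(fmean(a) - fmean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


treated = [5.1, 5.3, 5.2]  # hypothetical log-expression of one gene
control = [3.0, 3.2, 3.1]
print(permutation_p_value(treated, control))  # 0.1
```

With three arrays per group there are only 20 relabelings, so 0.1 is the smallest two-sided p-value this test can return, however large the difference: small experiments cannot reach conventional significance with it alone.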
8.12 SAM (significance analysis of microarrays)
SAM can be used on any type of array data: oligonucleotide or cDNA arrays, SNP arrays, protein arrays, and so forth. Both parametric and nonparametric tests are available for correlating expression data with clinical parameters, including treatment, diagnosis categories, survival time, paired data, quantitative parameters (e.g., tumor volume), and one-class data. SAM can also impute missing data with a nearest-neighbor algorithm; see http://www-stat.stanford.edu/~tibs/SAM.
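At the core of SAM is a modified t statistic, d = (mean_1 - mean_2) / (s + s0), where s is the usual pooled standard error of the difference and the "fudge factor" s0 keeps genes with tiny variances from dominating the ranking [3]. SAM itself chooses s0 from the data and estimates the false discovery rate by permutation; this minimal Python sketch instead takes s0 as a fixed, assumed input and only shows how it changes the ranking of two hypothetical genes.

```python
import math
from statistics import fmean


def sam_d(group_1, group_2, s0):
    """SAM-style relative difference d = (mean_1 - mean_2) / (s + s0)."""
    n1, n2 = len(group_1), len(group_2)
    m1, m2 = fmean(group_1), fmean(group_2)
    ss = sum((x - m1) ** 2 for x in group_1) + sum((x - m2) ** 2 for x in group_2)
    # Pooled standard error of the difference in means.
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


# Hypothetical genes: a tiny but very consistent change, and a large change.
tiny = ([2.00, 2.01, 2.02], [1.90, 1.91, 1.92])
large = ([3.0, 3.5, 4.0], [1.0, 1.5, 2.0])

# With s0 = 0.0 the tiny change has the larger d; with s0 = 0.1 the large one does.
for s0 in (0.0, 0.1):
    print(s0, round(sam_d(*tiny, s0), 2), round(sam_d(*large, s0), 2))
```

Setting s0 = 0 recovers the ordinary two-sample t statistic, which is exactly what lets the low-variance gene jump to the top of the list.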
8.13 TM4
The TM4 suite of tools consists of four major applications, Microarray Data Manager (MADAM), TIGR Spotfinder, Microarray Data Analysis System (MIDAS), and Multiexperiment Viewer (MeV), as well as a MySQL database, all of which are freely available. Although these tools were developed for spotted two-color arrays, many of the components can easily be adapted to work with single-color formats such as filter arrays and GeneChips; see http://www.tm4.org/index.html.
9 DISSEMINATION
Early in the use of microarrays in research, it became common practice for many journals to require investigators to deposit expression data in a public database as a condition of publication. This sharing of data has made these rich resources available for mining, and many investigators have used them to help their research. A number of public databases contain and accept plant data.
9.1 ArrayExpress
ArrayExpress is a public repository for microarray data aimed at storing MIAME-compliant data in accordance with MGED recommendations. This database is somewhat less biomedical in focus than GEO and has a good representation of plant expression data; see http://www.ebi.ac.uk/arrayexpress [29, 30].
9.2 GEO
Gene Expression Omnibus (GEO) is a gene expression/molecular abundance repository supporting MIAME-compliant data submissions and a curated online resource for gene expression data browsing, query, and retrieval. It is supported by the US National Library of Medicine but contains a good amount of plant expression data; see http://www.ncbi.nlm.nih.gov/projects/geo [31, 32].
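Data deposited in GEO can be downloaded for reanalysis. The minimal Python sketch below reads the expression table out of a GEO "series matrix" text file, assuming the tab-delimited layout in which metadata lines start with "!" and the table sits between !series_matrix_table_begin and !series_matrix_table_end lines, with a header row beginning ID_REF. The function name and the tiny embedded file (including the SAMPLE_A and SAMPLE_B identifiers) are hypothetical; check a real download before relying on this layout, and note that established readers such as Bioconductor's GEOquery package already exist.

```python
import io


def read_series_matrix(handle):
    """Parse the expression table of a GEO-style series matrix text file.

    Returns (sample_ids, {probe_id: [values, ...]}); empty fields become None.
    """
    samples, table, inside = [], {}, False
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            inside = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if not inside or not line:
            continue  # skip metadata lines and blank lines
        fields = [f.strip('"') for f in line.split("\t")]
        if not samples:
            samples = fields[1:]  # header row: ID_REF followed by sample IDs
        else:
            table[fields[0]] = [float(v) if v else None for v in fields[1:]]
    return samples, table


# A tiny hypothetical file; real files carry many more "!" metadata lines.
example = (
    '!Series_title\t"Hypothetical plant stress study"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"SAMPLE_A"\t"SAMPLE_B"\n'
    '"probe_1"\t7.25\t8.5\n'
    '"probe_2"\t\t6.0\n'
    "!series_matrix_table_end\n"
)
samples, table = read_series_matrix(io.StringIO(example))
print(samples)  # ['SAMPLE_A', 'SAMPLE_B']
print(table)    # {'probe_1': [7.25, 8.5], 'probe_2': [None, 6.0]}
```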
9.3 NASC (Nottingham Arabidopsis Stock Centre) arrays
NASC runs a database of its own arrays as well as other data that have been deposited in the database. The database primarily contains Arabidopsis array data; see http://affymetrix.arabidopsis.info [33].
9.4 Plant expression database (PLEXdb)
PLEXdb is a unified public resource for gene expression in plants and plant pathogens. PLEXdb serves as a portal that integrates gene expression profile data sets with structural genomics and phenotypic data. The database contains data from seven species; see http://www.plexdb.org/index.php [34].
10 CONCLUSIONS
We hope this listing of tools, which only scratches the surface of the possible tools, will assist you in conducting, analyzing, and interpreting expression studies. We suggest exploring several tools in an area and understanding the principles of the methods implemented before settling on one or a few to use regularly. By exploring several tools, you will understand the potential of the various tools and how easy (or difficult) they are to use, and you can determine what you really want and need for your microarray analysis.
ACKNOWLEDGMENT
This work was supported by NSF grant 0501890 and NIH grant U54 AT100949.
REFERENCES
[1] G. Gadbury, G. P. Page, J. Edwards, et al., “Power analysis and sample size estimation in the age of high dimensional biology,” Statistical Methods in Medical Research, vol. 13, pp. 325–338, 2004.
[2] G. P. Page, J. W. Edwards, G. L. Gadbury, et al., “The PowerAtlas: a power and sample size atlas for microarray experimental design and research,” BMC Bioinformatics, vol. 7, article 84, 2006.
[3] V. G. Tusher, R. Tibshirani, and G. Chu, “Significance analysis of microarrays applied to the ionizing radiation response,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
[4] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, “Affy--analysis of Affymetrix GeneChip data at the probe level,” Bioinformatics, vol. 20, no. 3, pp. 307–315, 2004.
[5] H. Liu, B. R. Zeeberg, G. Qu, et al., “AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets,” Bioinformatics, vol. 23, no. 18, pp. 2385–2390, 2007.
[6] M. J. Dunning, M. L. Smith, M. E. Ritchie, and S. Tavaré, “Beadarray: R classes and methods for Illumina bead-based data,” Bioinformatics, vol. 23, no. 16, pp. 2183–2184, 2007.
[7] A. I. Saeed, N. K. Bhagabati, J. C. Braisted, et al., “TM4 microarray software suite,” Methods in Enzymology, vol. 411, pp. 134–193, 2006.
[8] A. I. Saeed, V. Sharov, J. White, et al., “TM4: a free, open-source system for microarray data management and analysis,” BioTechniques, vol. 34, no. 2, pp. 374–378, 2003.
[9] F. M. McCarthy, S. M. Bridges, N. Wang, et al., “AgBase: a unified resource for functional analysis in agriculture,” Nucleic Acids Research, vol. 35, database issue, pp. D599–D603, 2007.
[10] F. M. McCarthy, N. Wang, G. B. Magee, et al., “AgBase: a functional genomics resource for agriculture,” BMC Genomics, vol. 7, article 229, 2006.
[11] Y. Lee, J. Tsai, S. Sunkara, et al., “The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes,” Nucleic Acids Research, vol. 33, database issue, pp. D71–D74, 2005.
[12] T. J. P. Hubbard, B. L. Aken, K. Beal, et al., “Ensembl 2007,” Nucleic Acids Research, vol. 35, database issue, pp. D610–D617, 2007.
[13] J. Quackenbush, J. Cho, D. Lee, et al., “The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species,” Nucleic Acids Research, vol. 29, no. 1, pp. 159–164, 2001.
[14] J. Quackenbush, F. Liang, I. Holt, G. Pertea, and J. Upton, “The TIGR Gene Indices: reconstruction and representation of expressed gene sequences,” Nucleic Acids Research, vol. 28, no. 1, pp. 141–145, 2000.
[15] M. Ashburner, C. A. Ball, J. A. Blake, et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[16] M. Kanehisa, “The KEGG database,” Novartis Foundation Symposium, vol. 247, pp. 91–101, 2002.
[17] M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, “The KEGG databases at GenomeNet,” Nucleic Acids Research, vol. 30, no. 1, pp. 42–46, 2002.
[18] M. Kanehisa, S. Goto, M. Hattori, et al., “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, database issue, pp. D354–D357, 2006.
[19] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., “DAVID: database for annotation, visualization, and integrated discovery,” Genome Biology, vol. 4, no. 5, article P3, 2003.
[20] K. J. Bussey, D. Kane, M. Sunshine, et al., “MatchMiner: a tool for batch navigation among gene and gene product identifiers,” Genome Biology, vol. 4, no. 4, article R27, 2003.
[21] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, “MedMiner: an internet text-mining tool for biomedical information, with application to gene expression profiling,” BioTechniques, vol. 27, no. 6, pp. 1210–1217, 1999.
[22] R. C. Gentleman, V. J. Carey, D. M. Bates, et al., “Bioconductor: open software development for computational biology and bioinformatics,” Genome Biology, vol. 5, no. 10, article R80, 2004.
[23] M. Reich, T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov, “GenePattern 2.0,” Nature Genetics, vol. 38, no. 5, pp. 500–501, 2006.
[24] J. Herrero, F. Al-Shahrour, R. Díaz-Uriarte, et al., “GEPAS: a web-based resource for microarray gene expression data analysis,” Nucleic Acids Research, vol. 31, no. 13, pp. 3461–3467, 2003.
[25] J. M. Vaquerizas, L. Conde, P. Yankilevich, et al., “GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data,” Nucleic Acids Research, vol. 33, web server issue, pp. W616–W620, 2005.
[26] P. Trivedi, J. W. Edwards, J. Wang, et al., “HDBStat: a platform-independent software suite for statistical analysis of high dimensional biology data,” BMC Bioinformatics, vol. 6, article 86, 2005.
[27] P. Khatri, P. Bhavsar, G. Bawa, and S. Draghici, “Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments,” Nucleic Acids Research, vol. 32, web server issue, pp. W449–W456, 2004.
[28] M. K. Kerr, M. Martin, and G. A. Churchill, “Analysis of variance for gene expression microarray data,” Journal of Computational Biology, vol. 7, no. 6, pp. 819–837, 2000.
[29] A. Brazma, H. Parkinson, U. Sarkans, et al., “ArrayExpress--a public repository for microarray gene expression data at the EBI,” Nucleic Acids Research, vol. 31, no. 1, pp. 68–71, 2003.
[30] H. Parkinson, M. Kapushesky, M. Shojatalab, et al., “ArrayExpress--a public database of microarray experiments and gene expression profiles,” Nucleic Acids Research, vol. 35, database issue, pp. D747–D750, 2007.
[31] T. Barrett, D. B. Troup, S. E. Wilhite, et al., “NCBI GEO: mining tens of millions of expression profiles--database and tools update,” Nucleic Acids Research, vol. 35, database issue, pp. D760–D765, 2007.
[32] T. Barrett, T. O. Suzek, D. B. Troup, et al., “NCBI GEO: mining millions of expression profiles--database and tools,” Nucleic Acids Research, vol. 33, database issue, pp. D562–D566, 2005.
[33] D. J. Craigon, N. James, J. Okyere, J. Higgins, J. Jotham, and S. May, “NASCArrays: a repository for microarray data generated by NASC’s transcriptomics service,” Nucleic Acids Research, vol. 32, database issue, pp. D575–D577, 2004.
[34] L. Shen, J. Gong, R. A. Caldo, et al., “BarleyBase--an expression profiling database for plant genomics,” Nucleic Acids Research, vol. 33, database issue, pp. D614–D618, 2005.
[35] W. Tong, S. Harris, X. Cao, et al., “Development of public toxicogenomics software for microarray data management and analysis,” Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 549, no. 1-2, pp. 241–253, 2004.
[36] W. Tong, X. Cao, S. Harris, et al., “ArrayTrack--supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research,” Environmental Health Perspectives, vol. 111, no. 15, pp. 1819–1826, 2003.
[37] P. J. Killion, G. Sherlock, and V. R. Iyer, “The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD),” BMC Bioinformatics, vol. 4, article 32, 2003.
[38] J. Demeter, C. Beauheim, J. Gollub, et al., “The Stanford Microarray Database: implementation of new analysis tools and open source release of software,” Nucleic Acids Research, vol. 35, database issue, pp. D766–D770, 2007.
[39] X. Xia and Z. Xie, “AMADA: analysis of microarray data,” Bioinformatics, vol. 17, no. 6, pp. 569–570, 2001.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
G P Page and I Coulibaly 7
Table 4 Other useful statistical analysis and data-mining tools
Tool name Web site
Amiada (analyzing microarray data) httpdambebiouottawacaamiadaasp [39]
ArrayAssist Enterprise httpwwwstratagenecom
caGEDA httpbioinformaticsupmceduGE2GEDAhtml
Cluster httpranalblgovEisenSoftwarehtm
dChip httpwwwdchiporg
GeneMaths XT httpwwwapplied-mathscomgenemathsgenemathshtm
INCLUSive httphomesesatkuleuvenbesimdnaBiolSoftwarehtml
J-Express Pro httpwwwmolminecomsoftwarehtm
MAExplorer httpmaexplorersourceforgenet
NIA Array analysis httplgsungrcnianihgovANOVA
Onto-Tools httpvortexcswayneeduprojectshtm
Probe Profiler httpwwwcorimbiacomPagesProductOverviewhtm
TableView httpccgbumnedusoftwarejavaappsTableView
Venn Mapper httpwwwgatcplatformnlvennmapperindexphp
extensive visualization tools Scripts can be written forthe development of standard analytical procedures seehttpwwwjmpcomsoftwaregenomics
89 Onto-tools
Onto-Tools are a series of freely available tools for theanalysis of microarray data Tools are available for arraydesign (onto-design) gene class testing (onto-express) com-paring the content of arrays (onto-compare) mapping geneinformation across databases (onto-translate) annotation(onto-miner) and pathway analysis (pathway-express) seehttpwwwvortexcswayneedu [27]
810 Partek genomic suite
Partek Genomics Suite can be used for gene expression anal-ysis exon expression analysis chromosomal copy numberanalysis and promoter tiling array analysis and analysisof SNP arrays Partek includes a large number of statis-tical visualization and annotation tools that can be tiedtogether using workflow tools for rapid repetition of analysisand for reproducible research see httpwwwpartekcomsoftware
811 Rmaanova
Maanova stands for MicroArray ANalysis Of VAriance Itprovides a complete work flow for microarray data analysisincluding data-quality checks and visualization data trans-formation ANOVA model fitting for both fixed andmixed effects models statistical tests including permu-tation tests confidence interval with bootstrapping andcluster analysis Rmaanova is available in BioconductorRrefer to httpwwwjaxorgstaffchurchilllabsitesoftwareRmaanovaindexhtml [28]
812 SAM (significant analysis of microarrays)
SAM can be used on any type of array data oligo or cDNAarrays SNP arrays protein arrays and so forth Both para-metric and nonparametric tests are available for correlatingexpression data to clinical parameters including treatmentdiagnosis categories survival time paired data quantita-tive (eg tumor volume) and one-class SAM can alsoimplement imputation methods for missing data via near-est neighbor algorithm see httpwww-statstanfordedusimtibsSAM
813 TM4
The TM4 suite of tools consists of four major applicationsMicroarray Data Manager (MADAM) TIGR SpotfinderMicroarray Data Analysis System (MIDAS) and Multiex-periment Viewer (MeV) as well as a MySQL database allof which are freely available Although these software toolswere developed for spotted two-color arrays many of thecomponents can be easily adapted to work with single-colorformats such as filter arrays and GeneChips see httpwwwtm4orgindexhtml
9 DISSEMINATION
Early in the use of microarray in research it became commonpractice for many journals to require investigators to submitexpression data for publication in a public database Thissharing of data has allowed the mining of these rich resourcesthat many investigators have used to help their research Anumber of the public databases exist that contain and acceptplant data
91 ArrayExpress
ArrayExpress is a public repository for microarray datawhich is aimed at storing MIAME-compliant data in
8 International Journal of Plant Genomics
accordance with MGED recommendations This database isa bit less biomedical in focus than GEO with a good repre-sentation of plant expression data see httpwwwebiacukarrayexpress [29 30]
92 GEO
Gene Expression Omnibus is a gene expressionmolecularabundance repository supporting MIAME compliant datasubmissions and a curated online resource for gene expres-sion data browsing query and retrieval This is supported bythe US National Library of Medicine but contains a goodamount of plant expression data see httpwwwncbinlmnihgovprojectsgeo [31 32]
93 NASC (nottingham arabidopsisstock center) arrays
NASC runs a database of its own arrays as well as other datathat has been deposited in the database The database pri-marily contains Arabidopsis array data see httpaffymetrixarabidopsisinfo [33]
94 Plant expression database (PlexDB)
PLEXdb is a unified public resource for gene expression forplants and plant pathogens PLEXdb serves as a portal tointegrate gene expression profile data sets with structuralgenomics and phenotypic data Data from seven speciesis contained in the database see httpwwwplexdborgindexphp [34]
10 CONCLUSIONS
We hope this listing of tools which only dip the surface ofthe possible tools will assist you in conducting analyzingand interpreting expression studies We suggest exploringseveral tools in an area and understanding the principles ofthe methods implemented before settling on one or a few touse regularly By exploring several tools you will understandthe potential of the various tools how easy (or difficult) theyare to use and determine what you really want and need foryour microarray analysis
ACKNOWLEDGMENT
The work on this grant was supported by NSF grant 0501890and NIH grant U54 AT100949
REFERENCES
[1] G Gadbury G P Page J Edwards et al ldquoPower analysis andsample size estimation in the age of high dimensional biologyrdquoStatistical Methods in Medical Research vol 13 pp 325ndash3382004
[2] G P Page J W Edwards G L Gadbury et al ldquoThePowerAtlas a power and sample size atlas for microarrayexperimental design and researchrdquo BMC Bioinformatics vol7 article 84 2006
[3] V G Tusher R Tibshirani and G Chu ldquoSignificance analysisof microarrays applied to the ionizing radiation responserdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 98 no 9 pp 5116ndash5121 2001
[4] L Gautier L Cope B M Bolstad and R A Irizarry ldquoAffymdashanalysis of Affymetrix GeneChip data at the probe levelrdquoBioinformatics vol 20 no 3 pp 307ndash315 2004
[5] H Liu B R Zeeberg G Qu et al ldquoAffyProbeMiner aweb resource for computing or retrieving accurately redefinedAffymetrix probe setsrdquo Bioinformatics vol 23 no 18 pp2385ndash2390 2007
[6] M J Dunning M L Smith M E Ritchie and S TavareldquoBeadarray R classes and methods for Illumina bead-baseddatardquo Bioinformatics vol 23 no 16 pp 2183ndash2184 2007
[7] A I Saeed N K Bhagabati J C Braisted et al ldquoTM4microarray software suiterdquo Methods in Enzymology vol 411pp 134ndash193 2006
[8] A I Saeed V Sharov J White et al ldquoTM4 a free open-source system for microarray data management and analysisrdquoBioTechniques vol 34 no 2 pp 374ndash378 2003
[9] F M McCarthy S M Bridges N Wang et al ldquoAgBase aunified resource for functional analysis in agriculturerdquo NucleicAcids Research vol 35 database issue pp D599ndashD603 2007
[10] F M McCarthy N Wang G B Magee et al ldquoAgBase afunctional genomics resource for agriculturerdquo BMC Genomicsvol 7 article 229 2006
[11] Y Lee J Tsai S Sunkara et al ldquoThe TIGR Gene Indicesclustering and assembling EST and know genes and integra-tion with eukaryotic genomesrdquo Nucleic Acids Research vol 33database issue pp D71ndashD74 2005
[12] T J P Hubbard B L Aken K Beal et al ldquoEnsembl 2007rdquoNucleic Acids Research vol 35 database issue pp D610ndashD6172007
[13] J Quackenbush J Cho D Lee et al ldquoThe TIGR GeneIndices analysis of gene transcipt sequences in highly sampledeukaryotic speciesrdquo Nucleic Acids Research vol 29 no 1 pp159ndash164 2001
[14] J Quackenbush F Liang I Holt G Pertea and J UptonldquoThe TIGR Gene Indices reconstruction and representationof expressed gene sequencesrdquo Nucleic Acids Research vol 28no 1 pp 141ndash145 2000
[15] M Ashburner C A Ball J A Blake et al ldquoGene ontologytool for the unification of biologyrdquo Nature Genetics vol 25no 1 pp 25ndash29 2000
[16] M Kanehisa ldquoThe KEGG databaserdquo Novartis FoundationSymposium vol 247 pp 91ndash101 2002
[17] M Kanehisa S Goto S Kawashima and A Nakaya ldquoTheKEGG databases at GenomeNetrdquo Nucleic Acids Research vol30 no 1 pp 42ndash46 2002
[18] M Kanehisa S Goto M Hattori et al ldquoFrom genomicsto chemical genomics new developments in KEGGrdquo NucleicAcids Research vol 34 database issue pp D354ndashD357 2006
[19] G Dennis Jr B T Sherman D A Hosack et al ldquoDAVIDdatabase for annotation visualization and integrated discov-eryrdquo Genome Biology vol 4 no 5 article P3 2003
[20] K J Bussey D Kane M Sunshine et al ldquoMatchMinera tool for batch navigation among gene and gene productidentifiersrdquo Genome Biology vol 4 no 4 article R27 2003
[21] L Tanabe U Scherf L H Smith J K Lee L Hunter andJ N Weinstein ldquoMedMiner an internet text-mining tool forbiomedical information with application to gene expressionprofilingrdquo BioTechniques vol 27 no 6 pp 1210ndash1217 1999
[22] R C Gentleman V J Carey D M Bates et al ldquoBiocon-ductor open software development for computational biology
G P Page and I Coulibaly 9
and bioinformaticsrdquo Genome Biology vol 5 no 10 articleR80 2004
[23] M Reich T Liefeld J Gould J Lerner P Tamayo and J PMesirov ldquoGenePattern 20rdquo Nature Genetics vol 38 no 5 pp500ndash501 2006
[24] J Herrero F Al-Shahrour R Dıaz-Uriarte et al ldquoGEPASa web-based resource for microarray gene expression dataanalysisrdquo Nucleic Acids Research vol 31 no 13 pp 3461ndash3467 2003
[25] J M Vaquerizas L Conde P Yankilevich et al ldquoGEPAS anexperiment-oriented pipeline for the analysis of microarraygene expression datardquo Nucleic Acids Research vol 33 webserver issue pp W616ndashW620 2005
[26] P Trivedi J W Edwards J Wang et al ldquoHDBStat aplatform-independent software suite for statistical analysis ofhigh dimensional biology datardquo BMC Bioinformatics vol 6article 86 2005
[27] P Khatri P Bhavsar G Bawa and S Draghici ldquoOnto-Toolsan ensemble of web-accessible ontology-based tools for thefunctional design and interpretation of high-throughput geneexpression experimentsrdquo Nucleic Acids Research vol 32 webserver issue pp W449ndashW456 2004
[28] M K Kerr M Martin and G A Churchill ldquoAnalysis ofvariance for gene expression microarray datardquo Journal ofComputational Biology vol 7 no 6 pp 819ndash837 2000
[29] A Brazma H Parkinson U Sarkans et al ldquoArrayExpressmdashapublic repository for microarray gene expression data at theEBIrdquo Nucleic Acids Research vol 31 no 1 pp 68ndash71 2003
[30] H Parkinson M Kapushesky M Shojatalab et alldquoArrayExpressmdasha public database of microarray experimentsand gene expression profilesrdquo Nucleic Acids Research vol 35database issue pp D747ndashD750 2007
[31] T Barrett D B Troup S E Wilhite et al ldquoNCBI GEOmining tens of millions of expression profilesmdashdatabase andtools updaterdquo Nucleic Acids Research vol 35 database issuepp D760ndashD765 2007
[32] T Barrett T O Suzek D B Troup et al ldquoNCBI GEO miningmillions of expression profilesmdashdatabase and toolsrdquo NucleicAcids Research vol 33 database issue pp D562ndashD566 2005
[33] D J Craigon N James J Okyere J Higgins J Jothamand S May ldquoNASCArrays a repository for microarray datagenerated by NASCrsquos transcriptomics servicerdquo Nucleic AcidsResearch vol 32 database issue pp D575ndashD577 2004
[34] L Shen J Gong R A Caldo et al ldquoBarleyBasemdashanexpression profiling database for plant genomicsrdquo NucleicAcids Research vol 33 database issue pp D614ndashD618 2005
[35] W Tong S Harris X Cao et al ldquoDevelopment of publictoxicogenomics software for microarray data managementand analysisrdquo Mutation ResearchFundamental and MolecularMechanisms of Mutagenesis vol 549 no 1-2 pp 241ndash2532004
[36] W Tong X Cao S Harris et al ldquoArray trackmdashsupportingtoxicogenomic research at the US Food and Drug Adminis-tration National Center for Toxicological Researchrdquo Environ-mental Health Perspectives vol 111 no 15 pp 1819ndash18262003
8 International Journal of Plant Genomics
accordance with MGED recommendations. This database is somewhat less biomedical in focus than GEO and has good representation of plant expression data; see http://www.ebi.ac.uk/arrayexpress [29, 30].
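The MGED recommendations that ArrayExpress (and GEO) follow are the MIAME (Minimum Information About a Microarray Experiment) guidelines, which ask for six core elements: raw data, final processed data, sample annotation, experimental design, array annotation, and protocols. Before depositing a plant study, it may help to check a submission record against that list. The Python sketch below does so. The dictionary keys, file names, and experiment are hypothetical labels chosen for illustration, not field names used by either repository.

```python
from typing import Dict, List

# The six core MIAME elements. The keys are hypothetical labels for this
# sketch; they are NOT field names used by ArrayExpress or GEO.
MIAME_ELEMENTS: Dict[str, str] = {
    "raw_data": "raw data for each hybridization",
    "processed_data": "final processed (normalized) data for the set of hybridizations",
    "sample_annotation": "essential sample annotation, including experimental factors and their values",
    "experimental_design": "experimental design, including sample-data relationships",
    "array_annotation": "sufficient annotation of the array design",
    "protocols": "essential laboratory and data-processing protocols",
}


def missing_miame_elements(submission: Dict[str, object]) -> List[str]:
    """Return the MIAME elements that are absent or empty in a submission record."""
    # An element counts as missing if its key is absent or its value is falsy
    # (None, "", or an empty list/dict).
    return [key for key in MIAME_ELEMENTS if not submission.get(key)]


# Example record for a hypothetical two-condition Arabidopsis experiment.
submission = {
    "raw_data": [
        "leaf_control_rep1.CEL", "leaf_control_rep2.CEL",
        "leaf_drought_rep1.CEL", "leaf_drought_rep2.CEL",
    ],
    "processed_data": "rma_normalized_matrix.txt",
    "sample_annotation": {"tissue": "leaf", "treatment": ["control", "drought"]},
    "experimental_design": "2 conditions x 2 biological replicates",
    "array_annotation": "",
    "protocols": None,
}

for key in missing_miame_elements(submission):
    print(f"missing {key}: {MIAME_ELEMENTS[key]}")
```

For this record the check flags `array_annotation` and `protocols`; a real submission would still need to meet each repository's own templates, which this sketch does not model.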
9.2 GEO
The Gene Expression Omnibus is a gene expression/molecular abundance repository that supports MIAME-compliant data submissions and a curated online resource for browsing, querying, and retrieving gene expression data. Although it is supported by the US National Library of Medicine, it contains a good amount of plant expression data; see http://www.ncbi.nlm.nih.gov/projects/geo [31, 32].
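GEO records can also be queried programmatically through NCBI's E-utilities, where GEO DataSets and Series are indexed in the Entrez database named `gds`. The Python sketch below only builds an ESearch query URL for plant Series records and does not contact the server. The search-field tags `[ORGN]` and `gse[ETYP]`, the helper name `geo_series_search_url`, and the example organism and keyword are assumptions for illustration that should be checked against NCBI's current documentation.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint. GEO DataSets, Series, Platforms, and
# Samples are indexed in the Entrez database named "gds".
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_series_search_url(organism: str, keyword: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for GEO Series records."""
    # [ORGN] restricts hits to one organism; gse[ETYP] keeps only Series (GSE)
    # entries. Both field tags are assumptions to verify against NCBI's docs.
    term = f'"{organism}"[ORGN] AND {keyword} AND gse[ETYP]'
    params = {"db": "gds", "term": term, "retmax": retmax}
    # urlencode percent-escapes quotes and brackets and turns spaces into "+".
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = geo_series_search_url("Arabidopsis thaliana", "drought")
print(url)
```

To run the search, the URL could be passed to `urllib.request.urlopen`, which returns an XML list of matching GEO identifiers; NCBI asks that clients without an API key send no more than three requests per second.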
9.3 NASC (Nottingham Arabidopsis Stock Centre) arrays
NASC runs a database of its own arrays as well as other data that have been deposited in it. The database primarily contains Arabidopsis array data; see http://affymetrix.arabidopsis.info [33].
9.4 Plant Expression Database (PLEXdb)
PLEXdb is a unified public resource for gene expression data from plants and plant pathogens. It serves as a portal that integrates gene expression profile data sets with structural genomics and phenotypic data. The database contains data from seven species; see http://www.plexdb.org/index.php [34].
10 CONCLUSIONS
We hope this listing of tools, which only scratches the surface of what is available, will help you conduct, analyze, and interpret expression studies. We suggest exploring several tools in an area and understanding the principles of the methods they implement before settling on one or a few to use regularly. Exploring several tools will show you their potential and how easy (or difficult) they are to use, and will help you determine what you really want and need for your microarray analysis.
ACKNOWLEDGMENT
This work was supported by NSF grant 0501890 and NIH grant U54 AT100949.
REFERENCES
[1] G. Gadbury, G. P. Page, J. Edwards, et al., "Power analysis and sample size estimation in the age of high dimensional biology," Statistical Methods in Medical Research, vol. 13, pp. 325–338, 2004.
[2] G. P. Page, J. W. Edwards, G. L. Gadbury, et al., "The PowerAtlas: a power and sample size atlas for microarray experimental design and research," BMC Bioinformatics, vol. 7, article 84, 2006.
[3] V. G. Tusher, R. Tibshirani, and G. Chu, "Significance analysis of microarrays applied to the ionizing radiation response," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
[4] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, "affy--analysis of Affymetrix GeneChip data at the probe level," Bioinformatics, vol. 20, no. 3, pp. 307–315, 2004.
[5] H. Liu, B. R. Zeeberg, G. Qu, et al., "AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets," Bioinformatics, vol. 23, no. 18, pp. 2385–2390, 2007.
[6] M. J. Dunning, M. L. Smith, M. E. Ritchie, and S. Tavaré, "beadarray: R classes and methods for Illumina bead-based data," Bioinformatics, vol. 23, no. 16, pp. 2183–2184, 2007.
[7] A. I. Saeed, N. K. Bhagabati, J. C. Braisted, et al., "TM4 microarray software suite," Methods in Enzymology, vol. 411, pp. 134–193, 2006.
[8] A. I. Saeed, V. Sharov, J. White, et al., "TM4: a free, open-source system for microarray data management and analysis," BioTechniques, vol. 34, no. 2, pp. 374–378, 2003.
[9] F. M. McCarthy, S. M. Bridges, N. Wang, et al., "AgBase: a unified resource for functional analysis in agriculture," Nucleic Acids Research, vol. 35, database issue, pp. D599–D603, 2007.
[10] F. M. McCarthy, N. Wang, G. B. Magee, et al., "AgBase: a functional genomics resource for agriculture," BMC Genomics, vol. 7, article 229, 2006.
[11] Y. Lee, J. Tsai, S. Sunkara, et al., "The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes," Nucleic Acids Research, vol. 33, database issue, pp. D71–D74, 2005.
[12] T. J. P. Hubbard, B. L. Aken, K. Beal, et al., "Ensembl 2007," Nucleic Acids Research, vol. 35, database issue, pp. D610–D617, 2007.
[13] J. Quackenbush, J. Cho, D. Lee, et al., "The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species," Nucleic Acids Research, vol. 29, no. 1, pp. 159–164, 2001.
[14] J. Quackenbush, F. Liang, I. Holt, G. Pertea, and J. Upton, "The TIGR Gene Indices: reconstruction and representation of expressed gene sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 141–145, 2000.
[15] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[16] M. Kanehisa, "The KEGG database," Novartis Foundation Symposium, vol. 247, pp. 91–101, 2002.
[17] M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya, "The KEGG databases at GenomeNet," Nucleic Acids Research, vol. 30, no. 1, pp. 42–46, 2002.
[18] M. Kanehisa, S. Goto, M. Hattori, et al., "From genomics to chemical genomics: new developments in KEGG," Nucleic Acids Research, vol. 34, database issue, pp. D354–D357, 2006.
[19] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., "DAVID: database for annotation, visualization, and integrated discovery," Genome Biology, vol. 4, no. 5, article P3, 2003.
[20] K. J. Bussey, D. Kane, M. Sunshine, et al., "MatchMiner: a tool for batch navigation among gene and gene product identifiers," Genome Biology, vol. 4, no. 4, article R27, 2003.
[21] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, "MedMiner: an internet text-mining tool for biomedical information, with application to gene expression profiling," BioTechniques, vol. 27, no. 6, pp. 1210–1217, 1999.
G. P. Page and I. Coulibaly 9
[22] R. C. Gentleman, V. J. Carey, D. M. Bates, et al., "Bioconductor: open software development for computational biology and bioinformatics," Genome Biology, vol. 5, no. 10, article R80, 2004.
[23] M. Reich, T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov, "GenePattern 2.0," Nature Genetics, vol. 38, no. 5, pp. 500–501, 2006.
[24] J. Herrero, F. Al-Shahrour, R. Díaz-Uriarte, et al., "GEPAS: a web-based resource for microarray gene expression data analysis," Nucleic Acids Research, vol. 31, no. 13, pp. 3461–3467, 2003.
[25] J. M. Vaquerizas, L. Conde, P. Yankilevich, et al., "GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data," Nucleic Acids Research, vol. 33, web server issue, pp. W616–W620, 2005.
[26] P. Trivedi, J. W. Edwards, J. Wang, et al., "HDBStat: a platform-independent software suite for statistical analysis of high dimensional biology data," BMC Bioinformatics, vol. 6, article 86, 2005.
[27] P. Khatri, P. Bhavsar, G. Bawa, and S. Draghici, "Onto-Tools, an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments," Nucleic Acids Research, vol. 32, web server issue, pp. W449–W456, 2004.
[28] M. K. Kerr, M. Martin, and G. A. Churchill, "Analysis of variance for gene expression microarray data," Journal of Computational Biology, vol. 7, no. 6, pp. 819–837, 2000.
[29] A. Brazma, H. Parkinson, U. Sarkans, et al., "ArrayExpress--a public repository for microarray gene expression data at the EBI," Nucleic Acids Research, vol. 31, no. 1, pp. 68–71, 2003.
[30] H. Parkinson, M. Kapushesky, M. Shojatalab, et al., "ArrayExpress--a public database of microarray experiments and gene expression profiles," Nucleic Acids Research, vol. 35, database issue, pp. D747–D750, 2007.
[31] T. Barrett, D. B. Troup, S. E. Wilhite, et al., "NCBI GEO: mining tens of millions of expression profiles--database and tools update," Nucleic Acids Research, vol. 35, database issue, pp. D760–D765, 2007.
[32] T. Barrett, T. O. Suzek, D. B. Troup, et al., "NCBI GEO: mining millions of expression profiles--database and tools," Nucleic Acids Research, vol. 33, database issue, pp. D562–D566, 2005.
[33] D. J. Craigon, N. James, J. Okyere, J. Higgins, J. Jotham, and S. May, "NASCArrays: a repository for microarray data generated by NASC's transcriptomics service," Nucleic Acids Research, vol. 32, database issue, pp. D575–D577, 2004.
[34] L. Shen, J. Gong, R. A. Caldo, et al., "BarleyBase--an expression profiling database for plant genomics," Nucleic Acids Research, vol. 33, database issue, pp. D614–D618, 2005.
[35] W. Tong, S. Harris, X. Cao, et al., "Development of public toxicogenomics software for microarray data management and analysis," Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 549, no. 1-2, pp. 241–253, 2004.
[36] W. Tong, X. Cao, S. Harris, et al., "ArrayTrack--supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research," Environmental Health Perspectives, vol. 111, no. 15, pp. 1819–1826, 2003.
[37] P. J. Killion, G. Sherlock, and V. R. Iyer, "The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD)," BMC Bioinformatics, vol. 4, article 32, 2003.
[38] J. Demeter, C. Beauheim, J. Gollub, et al., "The Stanford Microarray Database: implementation of new analysis tools and open source release of software," Nucleic Acids Research, vol. 35, database issue, pp. D766–D770, 2007.
[39] X. Xia and Z. Xie, "AMADA: analysis of microarray data," Bioinformatics, vol. 17, no. 6, pp. 569–570, 2001.