“ I think you should be more explicit here in step two” Figure omitted because of copyright...

Date posted: 18-Dec-2015

“ I think you should be more explicit here in step two”

Figure omitted for copyright reasons. A printed version can be found in:

Leung YF, Lam DSC, Pang CP. The miracle of microarray data analysis. Genome Biol. 2001 Aug 29; 2: 4021.1-4021.2.

~ Normal science consists largely of mopping-up operations. Experimentalists carry out modified versions of experiments that have been carried out many times before ~

Thomas S. Kuhn

The biologist's FAQ: What is the best microarray analysis software?

Different kinds of microarray software

Image analysis software
Data mining software
– Statistics software
  • R packages for microarray analysis
SNP analysis software
Database/ LIMS software
Public expression database
Primer design
Software for further data mining: annotation, promoter analysis & pathway reconstruction

Software not discussed today

Hardware control software
– Arrayer control (ArrayMaker)
– Scanner control/ image acquisition

Statistics on current microarray software

                      28 Feb 2002   Jan 2001
Image analysis             17          17
Data mining                39
R packages                 14
SNP analysis                1
Database/ LIMS             14           4
Public database            16           8
Accessory                   8           -
Further data mining         9           -
Total                     116          29

* Extracted from http://ihome.cuhk.edu.hk/~b400559/arraysoft.html

Image analysis software

Spot recognition
Segmentation
– Foreground calculation
– Background calculation
Spot quality measures

Major image analysis software

AIDA array, ArrayPro, ArrayVision, Dapple, F-scan, GenePix Pro 3.0.5, ImaGene 4.0, Iconoclust, Iplab, Lucidea Automated Spotfinder, Phoretix Array3, P-scan, QuantArray 3.0, ScanAlyze 2, Spot, TIGR Spotfinder, UCSF Spot

Examples of commonly used image analysis software

ScanAlyze 2 (Mike Eisen, LBNL)
GenePix Pro 3.0.5 (Axon Instruments)
QuantArray 3.0 (Packard Instrument)
ImaGene 4.0 (Biodiscovery)

Spot recognition

ArrayPro from Media Cybernetics
Automated and fast grid, subgrid and spot-finding algorithms

Segmentation

Purpose: classification between foreground and background
– Fixed circle
– Adaptive circle
– Adaptive shape
– Histogram method
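As a rough illustration of the simplest method listed above, the sketch below applies fixed-circle segmentation to one spot. The pixel values, the spot centre and the radius are made up, and the choice of mean for the foreground and median for the background is an assumption for illustration, not the behaviour of any particular package.

```python
from statistics import mean, median

def fixed_circle_segment(image, center_row, center_col, radius):
    """Split pixels into foreground (inside the circle) and background (outside)."""
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background

# Hypothetical 5x5 patch around one spot (made-up intensities).
patch = [
    [10, 12, 11, 10, 13],
    [11, 50, 90, 55, 12],
    [10, 95, 120, 92, 11],
    [12, 52, 88, 51, 10],
    [11, 10, 12, 13, 10],
]

fg, bg = fixed_circle_segment(patch, center_row=2, center_col=2, radius=1)
fg_signal = mean(fg)              # foreground calculation: mean of spot pixels
bg_signal = median(bg)            # background calculation: median of the rest
corrected = fg_signal - bg_signal # background-corrected spot intensity
```

With a radius of 1 the dimmer corner pixels of the spot fall into the background set, which is the kind of shape mismatch that the adaptive methods try to avoid.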

Segmentation

Using an extra dye (DAPI) avoids assumptions about spot morphology

UCSF Spot

Spot quality measures

E.g. QuantArray 3.0
– Diameter
– Spot area
– Footprint
– Circularity
– Spot signal/noise
– Spot uniformity
– Background uniformity
– Replicate uniformity

Problem: rigorous definitions of spot quality and experimental verification are lacking
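To make two of these measures concrete, here is a minimal sketch of a signal/noise ratio and a circularity score. Both formulas are common textbook definitions assumed for illustration; they are not taken from QuantArray, whose exact definitions are not given here, and the pixel values are made up.

```python
import math
from statistics import mean, pstdev

def signal_to_noise(foreground, background):
    """(mean foreground - mean background) / standard deviation of background."""
    noise = pstdev(background)
    if noise == 0:
        return math.inf
    return (mean(foreground) - mean(background)) / noise

def circularity(area, perimeter):
    """4*pi*area / perimeter**2: 1.0 for a perfect circle, lower otherwise."""
    return 4 * math.pi * area / perimeter ** 2

# Made-up pixel intensities for one spot and its local background.
fg_pixels = [90, 95, 120, 92, 88]
bg_pixels = [10, 12, 11, 10, 12, 11]

snr = signal_to_noise(fg_pixels, bg_pixels)
circle_score = circularity(area=math.pi * 5 ** 2, perimeter=2 * math.pi * 5)
square_score = circularity(area=100, perimeter=40)  # a 10x10 square scores pi/4
```

Different reasonable definitions give different numbers for the same spot, which is one reason a rigorous, experimentally verified definition matters.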

Future Image analysis software

Rigorous definition of quality measures
Extra dye for better segmentation
Automated analysis

Data mining software

Main purposes
1. Filtering and normalization
2. Statistical inference of differentially expressed genes
3. Identification of biologically meaningful patterns, i.e. expression profile; expression fingerprint/ signature
4. Visualization
5. Other analyses such as pathway reconstruction
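The first purpose in the list above can be sketched in a few lines. The example assumes two-channel (red/green) data, an arbitrary intensity cutoff and global median centring; the gene names, intensities and threshold are all made up, and real packages offer many other filtering and normalization schemes.

```python
import math
from statistics import median

# Hypothetical two-channel intensities (red, green) per gene; values are made up.
spots = {
    "geneA": (1200.0, 300.0),
    "geneB": (400.0, 410.0),
    "geneC": (150.0, 600.0),
    "geneD": (40.0, 35.0),     # both channels weak
    "geneE": (800.0, 400.0),
}

MIN_INTENSITY = 100.0  # assumed filtering threshold, not a recommended value

# 1. Filtering: drop spots where either channel is below the threshold.
kept = {g: rg for g, rg in spots.items() if min(rg) >= MIN_INTENSITY}

# 2. Log transform: log2(red / green) treats up- and down-regulation symmetrically.
log_ratios = {g: math.log2(r / gr) for g, (r, gr) in kept.items()}

# 3. Global normalization: subtract the median so the typical gene has log ratio 0.
centre = median(log_ratios.values())
normalized = {g: m - centre for g, m in log_ratios.items()}
```

Median centring rests on the assumption that most genes on the array are not differentially expressed.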

Different categories

Turnkey system
Comprehensive software
Specific analysis software
Extension/ accessory of other software

Major data mining software

AIDA Array, AMADA, ANOVA program for microarray data, ArrayMiner, arraySCOUT, ArrayStat, BRB ArrayTools, CHIPSpace, Cleaver, CIT, CLUSFAVOR, Cluster, Cyber T, DNA-arrays analysis tools, dchip, Expression Profiler, Expressionist, Freeview & FreeOView, Gene Cluster, GeneLinker Gold, GeneMaths, GeneSight, GeneSpring, Genesis, Genetraffic, J-Express, MAExplorer, Partek, R cluster, Rosetta Resolver, SAM, SpotFire Decision Site, SNOMAD, TIGR ArrayViewer, TIGR Multiple Experiment Viewer, TreeView, Xcluster, Xpression NTI

Turnkey system

Definition: A computer system that has been customized for a particular application. The term derives from the idea that the end user can just turn a key and the system is ready to go.

For microarrays, this includes everything from the OS, server software, database, client software and statistics software to even the hardware

Examples
– Genetraffic (Iobion)
  • Uses open-source software: Linux, the R statistical language, PostgreSQL and the Apache web server
– Rosetta Resolver (Rosetta Biosoftware)
  • Sun Fire server and drive array, Oracle 8i, Rosetta server and client-side software

Turnkey system

Advantages
– Performance
– Security
– Supports multiple users
– Incorporates the experiment and data standards in its design

Disadvantages
– Expensive
– Not suitable for small labs
– Requires dedicated supporting staff
– Closed system

Comprehensive software

Definition: Software that incorporates many different analyses for different stages in a single package.

Examples
– Cluster (Mike Eisen, LBNL)
– GeneMaths (Applied Maths)
– GeneSight (Biodiscovery)
– GeneSpring (Silicon Genetics)

Comprehensive software

Cluster
– Filter data
– Adjust data: normalization, log transform etc.
– Clustering
– Self-Organizing Maps (SOMs)
– Principal Component Analysis (PCA)

GeneSpring
– & Promoter analysis
– Gene annotation with public database information
– Scripting tools
– Access to Open DataBase Connectivity (ODBC) databases

Comprehensive software

GeneMaths
– & Bootstrap analysis for clustering
– Fast clustering algorithms
– Access to Open DataBase Connectivity (ODBC) databases

GeneSight
– & Confidence analysis for replicated data
– Statistical analysis for significant genes
– Graphical data set builder

Comprehensive software

Advantages
– Standardized operation
– Generate various analyses easily
– Shorter learning curve for biologists
– Script language for automated process control
– Some brilliant ideas or analyses within particular software
– “False” sense of security?

Comprehensive software

Disadvantages
– Inflexible to the latest analysis developments
– Generate various analyses too easily
– Implicit data analysis/ statistics background and definitions
– Proprietary script language
– Data compatibility with other software
– Necessity to design and maintain your own database
– Commercial software can be expensive!
– Analyses added for marketing purposes mean extra spending on unnecessary functions
– Sometimes only available on a few computing platforms

Specific analysis software

Definition: Software performing one or a few specific analyses

Examples
– GeneCluster (Whitehead Institute Center for Genome Research)
– INCLUSive - INtegrated CLustering, Upstream Sequence retrieval and motif Sampler (Katholieke Universiteit Leuven)
– SAM – Significance Analysis of Microarrays (Stanford University)

Specific analysis software

GeneCluster – performs normalization, filtering and SOM

Specific analysis software

INCLUSive - INtegrated CLustering, Upstream Sequence retrieval and motif Sampler

SAM – finds statistically significant differentially expressed genes
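To give a feel for what such a tool computes, the sketch below calculates a SAM-style relative difference for one gene: the difference in group means divided by a pooled standard error plus a small constant s0, with significance judged by relabelling the samples. This is a simplified illustration, not the SAM program itself: the expression values are made up, s0 is fixed arbitrarily here rather than estimated from the data, and only a single gene is tested.

```python
import math
from itertools import combinations
from statistics import mean

def d_statistic(group1, group2, s0):
    """Difference of means over (pooled standard error + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)

def permutation_p_value(group1, group2, s0):
    """Fraction of all relabellings whose |d| is at least the observed |d|."""
    values = list(group1) + list(group2)
    observed = abs(d_statistic(group1, group2, s0))
    hits = total = 0
    for idx in combinations(range(len(values)), len(group1)):
        chosen = set(idx)
        g1 = [values[i] for i in idx]
        g2 = [values[i] for i in range(len(values)) if i not in chosen]
        total += 1
        if abs(d_statistic(g1, g2, s0)) >= observed - 1e-9:
            hits += 1
    return hits / total

# Made-up log expression values for one gene, four replicates per condition.
treated = [8.1, 8.4, 7.9, 8.3]
control = [6.0, 6.2, 5.9, 6.1]
S0 = 0.1  # assumed constant; guards against tiny variances inflating d

d = d_statistic(treated, control, S0)
p = permutation_p_value(treated, control, S0)
```

With four replicates per group there are only 70 possible relabellings, so the smallest achievable two-sided p-value is 2/70, which shows how replicate number limits what any such test can claim.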

Specific analysis software

Advantages
– Better statistical background reference, usually with literature support

Disadvantages
– Non-standardized environment: Java, web, Excel… etc.
– Data compatibility problem
– Data preprocessing problem

Extension/ accessory of other software

Definition: extension of other software’s capability

Examples:
– Freeview: Visualization and Optimization of Gene Clustering Dendrograms for Cluster
– ArrayMiner: extension of GeneSpring

Statistics software

Excel, MATLAB, Octave, SAS, SPSS, S-PLUS, Statistica, R

Statistics software

Advantages
– Highly flexible
– High-level, multivariate analyses are either standard or easily programmable

Disadvantages
– Usually command-line driven, impossible to learn intuitively (a disadvantage??)
– Requires a much better understanding of statistical data analysis to follow the steps (a disadvantage??)

R-packages

A language and environment for statistical computing and graphics
Highly compatible with S/ S-PLUS
Open source under the GNU General Public Licence
Runs on many UNIX/ Linux/ Windows family and MacOS platforms
There is a growing number of microarray analysis software packages written in R

R-packages

Dedicated to microarray analysis
– affy
– Bioconductor
– SMA extension
– Cyber T
– GeneSOM
– Permax
– OOMAL (S-Plus)
– SMA
– YASMA

General packages
– cclust
– cluster
– mclust
– multiv
– mva
– …etc!

R-packages

SMA - Statistical Microarray Analysis (Terry Speed, UC Berkeley)

Bioconductor

R-packages

SMA
– Performs intensity- and spatially dependent normalization
– Replicated array data analysis by an empirical Bayes approach
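The idea behind intensity-dependent normalization can be sketched as follows. Each spot is expressed as M = log2(R/G) and A = 0.5 * log2(R*G), and the typical M at each intensity level is then removed. The sketch swaps in a crude binned median in place of the smooth curve fit that a real package would use, so it is not the SMA method; the intensities and the number of bins are made up.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A): M = log2(R/G), A = 0.5 * log2(R*G)."""
    return math.log2(red / green), 0.5 * math.log2(red * green)

def binned_median_normalize(points, n_bins):
    """Subtract the median M of each equal-count intensity bin from its spots."""
    order = sorted(range(len(points)), key=lambda i: points[i][1])  # sort by A
    normalized = [0.0] * len(points)
    size = math.ceil(len(points) / n_bins)
    for start in range(0, len(order), size):
        members = order[start:start + size]
        centre = median(points[i][0] for i in members)
        for i in members:
            normalized[i] = points[i][0] - centre
    return normalized

# Made-up (red, green) pairs with a dye bias that grows with intensity.
raw = [(110, 100), (130, 120), (95, 100),
       (600, 400), (650, 420), (560, 410),
       (4000, 2000), (4400, 2100), (3900, 2050)]

points = [ma_values(r, g) for r, g in raw]
m_normalized = binned_median_normalize(points, n_bins=3)
```

A single global correction would leave the bright spots biased here, because the red excess is larger at high A than at low A.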

R-packages

Output of the replicated data analysis: B vs M plot

R-packages

Bioconductor
– Open-source software project providing infrastructure, in terms of design and software, to assist biologists and statisticians in analysing genomic data, with primary emphasis on inference using DNA microarrays
– Most software produced by the Bioconductor project will be in the form of R libraries
  • Variation 1: provide basic infrastructure support that will help other developers produce high-quality software
  • Variation 2: provide innovative methodology for analyzing genomic data
– Provides some form of graphical user interface for selected libraries
– A mechanism for linking together different groups with common goals

Future Data mining software

Standardized, open-source (free) platform?
– EMBOSS - European Molecular Biology Open Software Suite

More supervised analysis packages and pathway prediction packages?

Plugin modules
– J-Express
– GeneSpring

Mutation analysis software

Chip based SNP or chromosomal aberration analysis (arrayCGH)

Various forms of protocols, e.g. primer extension, ligase chain reaction, MALDI-TOF-MS, hybridization, etc.

Result is in the form of base calling or allelic imbalance

Example – Genorama

Database

Definition: large collection of data organized especially for rapid search and retrieval

Two categories
– Within laboratory/ institute database; LIMS
– Public expression database

Standardized definition of data
– Minimum Information About a Microarray Experiment (MIAME)
  • Experimental design
  • Array design
  • Samples
  • Hybridizations
  • Measurements
  • Normalization controls
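As one way to picture the six MIAME sections listed above inside an in-house database, the sketch below models a single experiment as a record with one field per section and a check for sections that are still empty. The class, field names and example values are hypothetical; MIAME itself specifies what information to report, not how to store it.

```python
from dataclasses import dataclass, field, fields

@dataclass
class MiameRecord:
    """One experiment described under the six MIAME headings (hypothetical layout)."""
    experimental_design: str = ""
    array_design: str = ""
    samples: list = field(default_factory=list)
    hybridizations: list = field(default_factory=list)
    measurements: list = field(default_factory=list)
    normalization_controls: str = ""

    def missing_sections(self):
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# Made-up, partially filled record.
record = MiameRecord(
    experimental_design="treated vs. control, 4 replicates each",
    array_design="hypothetical 10k cDNA array",
    samples=["treated_1", "control_1"],
)

missing = record.missing_sections()
```

Running such a completeness check before data leave the lab is one small way in-house data management can support later submission to a public repository.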

Database/ LIMS software

The database within your lab/ institute
The quality of in-house data management will affect the quality of the final public data repository
Database structure may be relatively simple

Major database/ LIMS software

AMAD, ARGUS, ArrayDB, ArrayInformatics, Clonetracker, GeNet, Genetraffic, GeneX, MAD, Maxd, NOMAD, Partisan Array LIMS, Phoretix Array2 Database, Rosetta Resolver, SMD

Public Expression Database

Necessities
– Provide raw data to validate published array results and develop new analysis tools
– Further understanding of your data
– Compare among different groups, meta-data mining
– Source for specialty array design

Different categories
– Generic
– Species specific
– Disease specific

The importance of data standardization

Major public gene expression databases

3D-GeneExpression Database
ArrayExpress
BodyMap
ChipDB
ExpressDB
Gene Expression Omnibus (GEO)
Gene Expression Database (GXD)
Gene Resource Locator
GeneX
Human Gene Expression Index (HuGE Index)
RIKEN cDNA Expression Array Database (READ)
RNA Abundance Database (RAD)
Saccharomyces Genome Database (SGD)
Stanford Microarray Database (SMD)
TissueInfo
yeast Microarray Global Viewer (yMGV)

Primer/ probe design

Array designer
GAP (Genome-wide Automated Primer finder servers)
OligoArray
Primer3
ProbeWiz Server

Other useful software for further data mining

Data annotation
– DRAGON
– Gene Ontology
– PubGene
– Resourcerer

Promoter analysis
– AlignACE
– INCLUSive
– MEME
– Sequence Logo

Pathway reconstruction
– GenMAPP
– PathFinder

Data annotation
– Link GI to a particular name
– Literature mining to infer network

Network reconstruction
– Cluster + promoter analysis
– Statistical inference from experimental data

Some suggestions for biologists who are serious about microarray studies

Communicate or even collaborate with statisticians, mathematicians and bioinformaticians
Learn a high-level statistical language, e.g. R
Learn programming, e.g. C
Learn databases, e.g. SQL
Learn Linux
Revise your statistics, probability and maybe even calculus
Lucky…?!

Picture omitted for copyright reasons

Conclusion – the future

A unified open environment for standard analysis and development

The best microarray analysis software?

~ Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone -- as the first step. ~

John W. Tukey