UCI ICS IGB SISL Scientific Applications of Machine Learning Eric Mjolsness Scientific Inference...

22
UCI ICS IGB SISL Scientific Applications of Machine Learning Eric Mjolsness Scientific Inference Systems Laboratory Donald Bren School of Information and Computer Sciences, and Institute for Genomics and Bioinformatics University of California, Irvine
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    220
  • download

    1

Transcript of UCI ICS IGB SISL Scientific Applications of Machine Learning Eric Mjolsness Scientific Inference...

UCI ICS IGB SISL

Scientific Applications of Machine Learning

Eric MjolsnessScientific Inference Systems Laboratory

Donald Bren School of Information and Computer Sciences, and

Institute for Genomics and Bioinformatics

University of California, Irvine

UCI ICS IGB SISL

Scientific Imagery Applications

NGC 7331 - http://photojournal.jpl.nasa.gov/catalog/PIA06322Arabidopsis SAM - Meyerowitz Lab

UCI ICS IGB SISL

Some Basic Machine Learning Distinctions• Supervised vs. unsupervised learning

– Supervised e.g. classification and regression • Feature selection• regression for phenomenological model fitting e.g. GRN’s

– Unsupervised e.g. clustering; may be preprocessor

• Generative vs. Kernal methods– Generative (statistical inference) models– Kernal methods e.g Support Vector Machines

• Vector vs. Relationship data– Vector data: preprocessed image features log I, x, …– Images, time series, shifted spectra - semigroup actions– Sparse graph/relationship data - permutation actions

UCI ICS IGB SISL

Correspondence Problems• Extended sources - map morphologies

– Similar to biological imaging problems– Fewer sources but many pixels

• Moving or changing point sources– E.g. Ida and Dactyl / JPL MLS

• Dense point sources with instrument noise e.g. globular clusters (radial density function)

• Techniques: – soft permutations, geometric transformations via optimization &

continuation– Embedding inside a graph clustering (optimization) algorithm– Multiscale acceleration of optimization

UCI ICS IGB SISL

Mixture Models• Mixture of Gaussians, t-distributions, …

– Can do outlier detection

• Mixture of factor analyzers

• Mixture of time series models• * Problem-specific generative models

– Can formulate with a Stochastic Parameterized Grammar

– Clustering graphs

Frey et al. 1998

Utsugi and Kumagai 2000

UCI ICS IGB SISL

Stochastic Grammars for Data Modeling

UCI ICS IGB SISL

Text & Biology Models

UCI ICS IGB SISL

More Detailed Clustering Grammars

• Clusters generate data

• Priors on cluster centers & variances

• Iterative through levels in a hierarchy

• Recursive through hierarchy

σ

Pr

r y

Pr

r x −

r y ( ) /σ

Pr

DatasetCluster(1,y,σ)Cluster(2,y,σ)Cluster(3,y,σ)Datu ′ m (1,1),x(1,1)( )

Datu ′ m (1,2),x(1,2)( )

Datu ′ m (3,n),x(3,n)( )

Datumx1( )

DatumxN( )

Datumx2( )

1/ N!

UCI ICS IGB SISL

Rock Field Grammargrammar rockfield(){

start → {deposit(a,ya,ca,pa) |a∈A} , distractors

c

a

2/ 2σ 0

2

a∑ + ya

2/ 2σ 0

2

a∑ + pa −p̂

2/ 2σ1

2

a∑ + log2 σ̂ a /

)σ( ) deposit(a,c

a,p

a)→ {patch(a,b,xab,ca,pab) |a∈A,b∈Ba}

patch(a,b,x

ab,c

a,p

a) → {rock xabc,cabc,sabc( ) |a∈A,b∈Ba,c∈Cab}

yab−ya

a∑ 2

/ 2σ̂ a2

p

ab~ f p

a, x

ab−xa( )

c

abc−cab

ab∑ 2

/ 2σ 42 + yabc −yab

ab∑ 2

/ 2σ52

⇒MFT

E = ca

2/ 2σ 0

2

a∑ + ya

2/ 2σ 0

2

a∑ + pa −p̂

2/ 2σ1

2

a∑ + log2 σ̂ a /

)σ( )

+ yab −yaa∑ 2

/ 2σ̂ a2 + Piab ci −ca

2/ 2σ 4

2 + xi −yab

2/ 2σ5

2⎡⎣⎢

⎤⎦⎥iab

{rock x

abc,c

abc,s

abc( ) |a∈ ′A ,b∈ ′Ba,c∈Cab} → {visible_rock xi,ci ,si( ) |i ∈ I}

distractors → {rock x00d,c00d,s00d( ) |d∈D}

c

00dab∑ 2

/ 2σ 02 + x00d

ab∑ 2

/ 2σ 32

s

00d~ sizedistr p̂( )

{} ,,~uniform_permutations()iabciabcabicPPn=∑

x

i= Pi,abcxabc

ab∑

s

abc~ sizedistr p

a( )

UCI ICS IGB SISL

Transcriptional Gene Regulation Networks

• Gene Regulation Network (GRN) model

T

v

Extracellularcommunication

Drosophila eve stripe expression in model (right) and data (left). Green: eve expression, red: kni expression. From [Reinitz and Sharp, Mech. of Devel., 49:133-

158, 1995 ]. [Mjolsness et al. J. Theor. Biol. 152: 429-453, 1991]

UCI ICS IGB SISL

Gene Regulation + Signal Transduction Network

transcriptional regulation targets

receptors

ligands

cell

nucleus

d

dtv

a(t) =

1τ a

g(ua + ha) −λava⎡⎣ ⎤⎦,

where

ua(t) = Tabvb(t) +b∑ I T̂abvb

I (t)b∑

I∈Nbrs∑ + I %Tac

1 %Tcb2vc(t)vb

I

c∑

b∑

I∈Nbrs∑ (t)

T

UCI ICS IGB SISL

Software architectures for systems biology: Sigmoid & Cellerator

UCI ICS IGB SISL

3-tier architecture

Database Access

Model Translation

SigmoidPathway Representation/StorageDatabase

CelleratorSimulation/InferenceEngine

MENU

InteractiveGraphicModel(SVG/Applet)

Graphic Output

OJBAPI

JLinkAPI

PROPERTY

SOAP: Web Service

XML(Object),

Image,via HTTP

UCI ICS IGB SISL

Possible software support

• Machine learning (open source/academic)– CompClust (CIT/JPL):

• Scripting/GUI dichotomy data point; • dataset views

– WEKA data mining– Intel: PNL Probabilistic Networks Library– Future: stochastic grammar modeler

• + autogeneration (as in Cellerator)

• Image processing, data environments – Matlab, IDL, Mathematica, Khoros/VisiQuest, …– NIHImage/ImageJ, …

UCI ICS IGB SISL

Metadata in Systems Biology

• SBML

• Sigmoid UMLQuickTime™ and a

TIFF (Uncompressed) decompressorare needed to see this picture.

UCI ICS IGB SISL

CLV3CLV1

?

Fletcher et al., Science v. 283, 1999 Brand et. al., Science 289, 617-619, (2000)

WUS

UCI ICS IGB SISL

protein concentrations

Y

SAM gene network: Results

wus(init) and L1

X

UCI ICS IGB SISL

SAM: Gene Network Model

Z

wus

clv1

clv3

Xdiffusive

YL1diffusive

UCI ICS IGB SISL

SAM growth imagery

PIN1 cell

walls

QuickTime™ and aTIFF decompressor

are needed to see this picture.

UCI ICS IGB SISL

Venu Gonehal

UCI ICS IGB SISL

Basic Machine Learning Distinctions• Supervised vs. unsupervised learning

– Supervised e.g. classification and regression • Feature selection• regression for phenomenological model fitting e.g. GRN’s

– Unsupervised e.g. clustering; may be preprocessor

• Generative vs. Kernal methods– Generative (statistical inference) models– Kernal methods e.g Support Vector Machines

• Vector vs. Relationship data– Vector data: preprocessed image features log I, x, …– Images, time series, shifted spectra - semigroup actions– Sparse graph/relationship data - permutation actions

UCI ICS IGB SISL

Contacts

• Wayne Hayes, UCI ICS faculty– scientific computing

• UCI ICS Maching Learning– Padhraic Smyth– Pierre Baldi

• Chris Hart, Caltech Biology grad student