Machine Learning & Bioinformatics

Tien-Hao Chang (Darby Chang)

Page 1: Machine  Learning     & Bioinformatics

Machine Learning & Bioinformatics

Tien-Hao Chang (Darby Chang)

Page 2: Machine  Learning     & Bioinformatics

PPI: Protein-Protein Interaction

Page 3: Machine  Learning     & Bioinformatics

http://www.biomol.de/details/RL/Akt-Signaling-Pathway.jpg

Page 4: Machine  Learning     & Bioinformatics

Notes on the Akt signaling pathway

Akt is a kinase
A kinase acts on specific molecules (usually other proteins)
– a type of enzyme, thus a type of protein
An enzyme catalyzes a reaction but is not changed by it (it is neither reactant nor product)
– like a molecular machine/tool
– a type of protein
A cytokine carries signals between cells
– a type of protein
Protein is a class of molecules with a specific chemical structure
– this naming strategy is widely adopted, e.g. carbohydrate and lipid

Page 5: Machine  Learning     & Bioinformatics

Various PPIs

By contact type
– physical interaction (complex, transient touch, …)
– genetic association (co-functional, co-expressed, …)
By role
– co-work
– work individually (mutually redundant)
– regulate (activate, repress, …)
– act on (catalyze, inhibit, …)
– participate in the same pathway (downstream, upstream, …)

Page 6: Machine  Learning     & Bioinformatics

Gene?

Page 7: Machine  Learning     & Bioinformatics

http://www.uic.edu/classes/bios/bios100/lectures/geneticsignaling.jpg

Page 8: Machine  Learning     & Bioinformatics

Notes on gene expression

DNA → RNA → protein
– DNA is the blueprint; it is hard to damage and thus hard to manipulate
– RNA is the transcript; it is very similar to DNA but more active
– protein is the final product
A gene is a DNA sequence that can perform a specific function
– it usually becomes functional after being transcribed and translated into protein
These terms are sometimes used interchangeably
– some PPIs are defined by the interactions among the corresponding DNAs/RNAs
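
The DNA → RNA → protein flow can be sketched in a few lines of Python. This is only an illustration and is not from the slides: the function names `transcribe` and `translate` and the toy sequence are hypothetical, the input is assumed to be the coding strand, and the standard genetic code is assumed.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order (assumed table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the transcript equals the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from position 0 until a stop codon ('*')."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCAAGTAA"  # hypothetical toy gene: ATG GCC AAG TAA
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna, protein)
```

Real genes add complications that this sketch ignores, such as introns, the choice of reading frame, and alternative genetic codes.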

Page 9: Machine  Learning     & Bioinformatics

Experimental Techniques

Since there are various PPIs…

Page 10: Machine  Learning     & Bioinformatics

Shoemaker and Panchenko, 2007

Page 11: Machine  Learning     & Bioinformatics

Notes on experimental techniques

(A) Yeast two-hybrid (Y2H) detects interactions between proteins X and Y, where X is linked to the BD (DNA-binding) domain, which binds to the upstream activating sequence (UAS) of a promoter
(B) Mass spectrometry (MS) identifies polypeptide sequences
(C) Tandem affinity purification (TAP) purifies protein complexes and removes contaminant molecules
(D) Gene co-expression analysis produces a correlation matrix in which the dark areas show high correlation between the expression levels of the corresponding genes
(E) Protein microarrays (protein chips) can detect interactions between actual proteins rather than genes: target proteins immobilized on the solid support are probed with a fluorescently labeled protein
(F) The synthetic lethality method describes the genetic interaction in which two individually nonlethal mutations result in lethality when combined (a-b-)

(all of these are high-throughput)
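
As a rough illustration of (D), the correlation matrix can be computed from expression profiles with Pearson correlation. The gene names and expression values below are made up for the example, and Pearson is only one possible choice of correlation measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises together with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

genes = sorted(expression)
# corr[g][h] is the correlation between the profiles of genes g and h.
corr = {g: {h: pearson(expression[g], expression[h]) for h in genes} for g in genes}

for g in genes:
    print(g, [round(corr[g][h], 2) for h in genes])
```

A strong positive correlation (geneA and geneB here) is what would show up as a dark area in the matrix; a strong negative one (geneA and geneC) indicates opposite expression patterns.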

Page 12: Machine  Learning     & Bioinformatics

http://www.informaworld.com/ampp/image?path=/713599661/793610806/tfac_a_300921_o_f0001g.png

We can “see” the interaction

Page 13: Machine  Learning     & Bioinformatics

Computational Approaches

What we can do, and will do

Page 14: Machine  Learning     & Bioinformatics

Shoemaker and Panchenko, 2007

Page 15: Machine  Learning     & Bioinformatics

Notes on computational approaches

(A) Gene cluster and gene neighborhood methods, with different boxes showing different genes
(B) Phylogenetic profile method, showing the presence/absence of four proteins in three genomes
(C) Rosetta Stone method
(D) Sequence co-evolution method, which looks for similarity between two phylogenetic trees/distance matrices
(E) Classification methods, shown with the example of the random forest decision (RFD) method, where five different features/domains are used and each interacting protein pair is encoded as a string of 0, 1 and 2
– the decision trees are constructed from a training set of interacting protein pairs, and they decide whether the proteins in question interact ("yes" for interacting, "no" for non-interacting)
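
The phylogenetic profile idea in (B) can be sketched as follows. The profiles are hypothetical, and requiring identical profiles (Hamming distance 0) is a simplifying assumption; practical methods usually tolerate some mismatches or use a statistical score.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of four proteins in three genomes.
profiles = {
    "P1": (1, 0, 1),
    "P2": (1, 0, 1),
    "P3": (0, 1, 1),
    "P4": (1, 1, 0),
}


def hamming(p, q):
    """Number of genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


# Proteins that are present and absent in exactly the same genomes
# are predicted to be functionally linked.
linked = [
    (a, b)
    for a, b in combinations(sorted(profiles), 2)
    if hamming(profiles[a], profiles[b]) == 0
]
print(linked)
```

With only three genomes many unrelated proteins would share a profile by chance, so the method becomes informative only when many genomes are compared.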

Page 16: Machine  Learning     & Bioinformatics

http://www.grin.com/object/external_document.250856/959745461da5e2c263045729f234e1b6_LARGE.png

A small variation this year

Page 17: Machine  Learning     & Bioinformatics

Classification approaches

Also called machine learning-based approaches
– classification is a form of so-called "supervised learning"
The most critical step is
– to encode a protein pair as a vector
– (to extract appropriate features)
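
One way to encode a protein pair as a vector is sketched below. It follows one plausible reading of the 0/1/2 encoding mentioned for the RFD example: for each domain, count how many proteins of the pair contain it. The domain names, the proteins and the helper `encode_pair` are hypothetical.

```python
# Hypothetical domain vocabulary (five features) and per-protein annotations.
DOMAINS = ["D1", "D2", "D3", "D4", "D5"]
protein_domains = {
    "X": {"D1", "D3"},
    "Y": {"D3", "D5"},
}


def encode_pair(a, b):
    """Encode a protein pair as one value per domain:
    0 = neither protein has the domain, 1 = exactly one has it, 2 = both have it."""
    return [
        int(d in protein_domains[a]) + int(d in protein_domains[b])
        for d in DOMAINS
    ]


vector = encode_pair("X", "Y")
print(vector)  # [1, 0, 2, 0, 1]
```

The encoding is symmetric, so (X, Y) and (Y, X) map to the same vector, which suits an undirected interaction. A classifier would then be trained on such vectors labeled "yes" or "no".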

Page 18: Machine  Learning     & Bioinformatics

http://www.sagennext.com/wp-content/uploads/2010/02/Business-Man-and-Woman1.jpg

How do you recognize man and woman?

Page 19: Machine  Learning     & Bioinformatics

Notes on feature encoding

Know the problem (domain knowledge)
You may not know which feature is important (e.g. hair length vs. eyesight)
You may not have the key feature
– e.g. no height when given only mug shots
– e.g. collecting body fat data is much more difficult
– carefully define the problem and what materials are available
You (usually) may not know the key feature
– e.g. suppose that the sex chromosomes are unknown
– depicting the mechanism is much more important than just predicting
The key features may change (e.g. hair length)
There are always exceptions (e.g. intersex)

Page 20: Machine  Learning     & Bioinformatics

Materials that we can provide

– Biological process
– Cellular compartment
– DNA sequence
– Domain
– Expression
– Genomic location
– No. of references
– Molecular function
– Orthologous information
– Pathway
– Protein sequence
– TATA box
– Transcription boundaries
– TF binding
– TFBS
– TF knockout expression