Session i overview bioinfo dm and app mmc

Data Manipulation: Molecular Online and Server Tools & BioExtract Server. Theme: FXN Gene and Pancreatic Cancer. Etienne Z. Gnimpieba, BRIN WS 2013, Mount Marty College – June 24th, 2013

Transcript of Session i overview bioinfo dm and app mmc

Page 1: Session i overview bioinfo dm and app mmc

Data Manipulation: Molecular Online and Server Tools & BioExtract Server

Theme: FXN Gene and Pancreatic Cancer.

Etienne Z. Gnimpieba, BRIN WS 2013

Mount Marty College – June 24th, 2013

Page 2: Session i overview bioinfo dm and app mmc

Data Manipulation. Molecular Online Tools: BioExtract Server. Review: Databases


Metabolic:

• SABIO-RK (check with Brent)
• KEGG (check with Brent)
• HMDB (hmdb.ca, contact for API)
• SMPDB (http://www.smpdb.ca)
• BioModels
• drugDB
• BRENDA (check with Brent)
• [Mathi's project]

Protein:

• ExPASy DB collection (UniProt, )
• PDB
• SBKB
• STRING

Genomic:

• GEO
• GenBank
• GO
• EBI ArrayExpress & Gene Atlas

Phenomic:

• PhenomicDB
• Phenoscape

Page 3: Session i overview bioinfo dm and app mmc


Active Network Extraction & Analysis

1. Start from the Reactome Functional Interaction network.
2. Extract mutated, overexpressed, underexpressed, expanded/deleted genes.
3. Add linker genes to obtain the disease subnetwork.
4. Apply community clustering algorithms to obtain disease “modules”.
5. Use the modules for disease gene prediction, sample classification, and hypothesis generation.
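The extraction steps on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the Reactome FI implementation: the gene names and edges are toy placeholders, a "linker" is assumed here to be an unaltered gene that interacts with at least two altered genes, and connected components stand in for a real community clustering algorithm.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected interaction graph from (gene, gene) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def disease_subnetwork(graph, altered):
    """Keep altered genes plus linker genes (assumed rule: an unaltered
    gene that touches at least two altered genes)."""
    altered = set(altered)
    linkers = {gene for gene, nbrs in graph.items()
               if gene not in altered and len(nbrs & altered) >= 2}
    keep = (altered & set(graph)) | linkers
    sub = {gene: graph[gene] & keep for gene in keep}
    return sub, linkers


def modules(sub):
    """Group the subnetwork into connected components.
    This is a simple stand-in for community clustering."""
    seen, result = set(), []
    for start in sorted(sub):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(sub[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Toy data: letters are hypothetical gene names, not real genes.
EDGES = [("A", "L"), ("L", "B"), ("B", "C"), ("D", "E"), ("E", "X"), ("F", "Y")]
ALTERED = {"A", "B", "C", "D", "E"}

graph = build_graph(EDGES)
sub, linkers = disease_subnetwork(graph, ALTERED)
print(linkers)
print(modules(sub))
```

In this toy network the gene L is pulled in as a linker because it connects the altered genes A and B, which joins them into one module.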

Page 4: Session i overview bioinfo dm and app mmc


Pancreatic Cancer Module Map (43 Cases)

Modules shown on the map:

• p53, SMAD, TGFβ, TNF signaling
• KRAS, MAPK signaling
• Heterotrimeric G-protein signaling
• Rho GTPase signaling
• Transcription & translation
• Cell cycle
• Wnt & Cadherin signaling
• Hedgehog signaling
• Transcription
• Zinc fingers
• Ca2+ signaling

Non-silent mutations:

• blue – in primary tumour only
• green – in xenograft only
• red – in primary & xenograft

Christina Yung / Bioinformatics.ca

Page 5: Session i overview bioinfo dm and app mmc

Review: Databases

Molecular Biology Databases

Bibliographic:

• MEDLINE, PubMed, EMBASE, BIOSIS, CAB International, AGRICOLA

Taxonomic:

• NEWT, The Tree of Life, Species 2000, IOPI, ITIS

Nucleotide:

• INSDC, EMBL, DDBJ, NCBI, GenBank

Genomic:

• SPGP, AceDB, HIV-SD, Ensembl, WormBase, FlyBase, MGD, SGD, EBI (Genome server, Karyn’s genome), RGD

Protein (sub-categories on the slide: primary protein sequence, specialized protein sequence, secondary and structure protein):

• UniProt/Swiss-Prot, PIR
• PROSITE, PRINT, Pfam, BLOCKS, SBASE
• GOA, ENZYME, InterPro, PDB, Integr8, MEROPS, LIGAN, EMP, DCHGR

Metabolic pathway:

• KEGG, EcoCyc, BRENDA, ENZYME, BioModels, Reactome

Page 6: Session i overview bioinfo dm and app mmc

Review: data format

Sequence type: accession number format (example)

• DNA sequence from GenBank, EMBL or DDBJ: 1 letter + 5 digits (U43752) or 2 letters + 6 digits (AF462052)
• GenPept sequence from GenBank, EMBL or DDBJ: 3 letters + 5 digits (AAF46449)
• Protein sequence from Swiss-Prot: 1 letter + 5 digits (Q16595)
• Protein sequence from the Protein Research Foundation: 6 or 7 digits + 1 letter (2808353A)
• RefSeq sequence: 2 letters + _ + 6 or more digits (mRNA: NM_******, protein: NP_******)
• Protein sequence from Protein Data Bank (PDB): 1 digit + 3 alphanumeric characters (2EFF)
• Protein sequence from Molecular Modeling DataBase (MMDB): "MMDB ID" + more than 4 digits (MMDB ID 767744)

FASTA header lines: >gi|XXXX|XXX and >sp|XXXX|XXX. Fields labelled on the slide: Gene Info number, species reference, accession number.
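The accession formats on this slide can be turned into a small pattern matcher. The sketch below is illustrative only: the regular expressions follow the slide's rules (plus an optional RefSeq version suffix, which is an addition), the function names are hypothetical, and real accession formats have more variants than are listed here. Because the slide gives the same "1 letter + 5 digits" rule for nucleotide and Swiss-Prot entries, the classifier returns every matching type instead of a single answer.

```python
import re

# (label, pattern) pairs derived from the formats listed on the slide.
PATTERNS = [
    ("RefSeq mRNA", re.compile(r"NM_\d{6,}(\.\d+)?")),
    ("RefSeq protein", re.compile(r"NP_\d{6,}(\.\d+)?")),
    ("GenBank/EMBL/DDBJ nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("GenPept protein", re.compile(r"[A-Z]{3}\d{5}")),
    ("Swiss-Prot protein", re.compile(r"[A-Z]\d{5}")),
    ("PRF protein", re.compile(r"\d{6,7}[A-Z]")),
    ("PDB structure", re.compile(r"\d[A-Z0-9]{3}")),
]


def classify_accession(accession):
    """Return every sequence type whose format matches the accession."""
    acc = accession.strip().upper()
    return [label for label, pattern in PATTERNS if pattern.fullmatch(acc)]


def parse_fasta_header(line):
    """Split a '>db|field|field' FASTA header into its pipe-separated parts."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    db, *fields = [part.strip() for part in line[1:].split("|")]
    return {"db": db, "fields": fields}


for example in ["AF462052", "AAF46449", "Q16595", "2808353A", "2EFF"]:
    print(example, classify_accession(example))
print(parse_fasta_header(">sp|Q16595|XXX"))
```

The ambiguous result for Q16595 shows why the database prefix in a FASTA header (gi, sp, ...) is useful: the accession format alone does not always identify the source database.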

Page 7: Session i overview bioinfo dm and app mmc

Review: Online Programs & Algorithms

Biological sequences and data can be analyzed in many ways with bioinformatics tools. They can be read, assembled, compared, mapped, predicted, designed, modeled…

1. Nucleotide and protein sequence searching (blastall, SSEARCH for local FASTA search, GLSEARCH for global)
2. Multiple sequence alignment (ClustalW2, MView, …)
3. Pairwise sequence alignment (Needle for global, LALIGN for local)
4. Protein functional analysis (SMART, Phobius, InterProScan)
5. Functional genomic tools (R-tools, SAIL, EFOtools)
6. Molecular structure analysis (PDBeFold, QuaternaryStructure, …)
7. Scientific literature text mining (EBIMed, Whatizit)
8. Sequence translation (Transeq, Readseq, Backtranseq, …)
9. Data retrieval and ID mapping (dbfetch, ENA/SRA, SRS, PICR)
10. Protein structure prediction tools
11. …
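As an illustration of what a sequence translation tool from the list above does, here is a minimal Python sketch that translates DNA into protein with the standard genetic code. It is not the Transeq implementation; the function name and the choice of "X" for unknown codons are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA (or RNA) string in the given forward frame (0, 1 or 2).
    Stop codons become '*'; codons with unknown bases become 'X'."""
    seq = dna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))
print(translate("AATGGCC", frame=1))
```

Trailing bases that do not fill a complete codon are ignored, which is one of several possible conventions.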

Page 8: Session i overview bioinfo dm and app mmc

Boolean operators and symbols

• AND = term1 AND term2 must both exist in the searched documents
• OR = term1 OR term2 must exist
• NOT = term1 must not be present in any of the displayed documents
• ALL = term1 must not be present in all of the displayed documents
• + term1 = document must contain term1
• - term1 = document must not contain term1
• XXX* = any characters are accepted after XXX
• XX?YX = any character is accepted instead of Y

Examples:

• FXN [AND] gene [NOT] Frataxin: all data related to the FXN gene except those concerning the frataxin protein
• ataxia + apraxia + gene: all genes related to ataxia and apraxia
• Ada* [AUTH]: all authors whose names begin with Ada
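To make the operators concrete, the following Python sketch applies AND, OR, NOT and the * and ? wildcards to a handful of toy documents. It is a simplified model written for this example, not how PubMed or any other database actually evaluates queries; the document texts and the function names are hypothetical, and field tags such as [AUTH] are not modelled.

```python
import re
from fnmatch import fnmatchcase


def tokens(text):
    """Lower-case a document and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def has(term, words):
    """True if any word matches the term; '*' and '?' act as wildcards."""
    term = term.lower()
    return any(fnmatchcase(word, term) for word in words)


def search(docs, must=(), any_of=(), must_not=()):
    """AND over `must`, OR over `any_of`, NOT over `must_not`."""
    hits = []
    for doc_id, text in docs.items():
        words = tokens(text)
        if not all(has(term, words) for term in must):
            continue
        if any_of and not any(has(term, words) for term in any_of):
            continue
        if any(has(term, words) for term in must_not):
            continue
        hits.append(doc_id)
    return hits


# Toy documents, invented for illustration.
DOCS = {
    1: "FXN gene located on chromosome 9",
    2: "Frataxin protein encoded by the FXN gene",
    3: "Ataxia and apraxia gene review",
    4: "Adams et al. pancreatic cancer",
}

print(search(DOCS, must=["FXN", "gene"], must_not=["Frataxin"]))
print(search(DOCS, must=["ataxia", "apraxia", "gene"]))
print(search(DOCS, must=["Ada*"]))
```

Matching is done on whole words, so the query term ataxia does not match the word Frataxin unless a wildcard is used.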

Page 9: Session i overview bioinfo dm and app mmc

Similarity search tools

BLAST (Basic Local Alignment Search Tool): compares a protein or DNA sequence to other sequences

FASTA (FAST-ALL): fast protein or nucleotide comparison

Page 10: Session i overview bioinfo dm and app mmc

Review: Sequence Analysis

Global match: align all residues of one sequence with all residues of the other sequence

Local match: find a region in one sequence that matches a region of the other

Motif match: find matches of a short sequence in one or more regions internal to a longer sequence. It could be a perfect match, or a match with deletions, insertions, or mismatches.

Multiple alignment: a mutual alignment of many sequences
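The difference between a global and a local match can be seen by scoring the same pair of sequences both ways. The sketch below uses the classic dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local) with a simple scoring scheme chosen for illustration: +1 for a match, -1 for a mismatch, -1 per gap position. The tools named in this deck use substitution matrices and affine gap penalties, so their scores will differ.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: every residue of a and b must be aligned."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of sub-regions of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


# Toy sequences sharing the region GATTACA, with unrelated flanks.
s1 = "AAAGATTACATTT"
s2 = "CCGATTACAGG"
print("global:", global_score(s1, s2))
print("local: ", local_score(s1, s2))
```

The local score rewards only the shared region, while the global score is pulled down by the unrelated flanking residues.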

Page 11: Session i overview bioinfo dm and app mmc

Review: Sequence Analysis

Sequence alignment: assignment of residue-residue correspondence. It is used to:

• Determine phylogenetic relationships by analyzing similarity and homology
  - Similarity: observation or measurement of resemblance and difference
  - Homology: the sequences and the organisms in which they occur are descended from a common ancestor. Homology must be an inference from observation of similarity.
• Determine if a protein (or a gene) is related to a larger group of proteins
• Verify if a mutated residue is conserved within species
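Similarity, unlike homology, can be measured directly from an alignment. As a minimal sketch, the Python below computes percent identity between two aligned sequences and checks how conserved one alignment column is. The sequences are toy strings invented for this example, the function names are hypothetical, and gapped positions are simply excluded from the identity count, which is only one of several conventions.

```python
from collections import Counter


def percent_identity(a, b):
    """Percent of identical residues over aligned, ungapped positions."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def column_conservation(aligned, position):
    """Most common residue in one alignment column and its frequency."""
    column = [seq[position] for seq in aligned.values()]
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


# Toy alignment: three made-up sequences of equal length.
ALIGNED = {
    "seq1": "MKTLG",
    "seq2": "MKALG",
    "seq3": "MRTLG",
}

print(percent_identity(ALIGNED["seq1"], ALIGNED["seq2"]))
print(column_conservation(ALIGNED, 0))
print(column_conservation(ALIGNED, 1))
```

A column where one residue has frequency 1.0 is fully conserved in this toy alignment; a mutation at such a position is the kind of case the last bullet above asks about.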

Page 12: Session i overview bioinfo dm and app mmc

Context

0. Specification & Aims


Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and the muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.

Molecular Online Tools and Server

Keywords:
Bio: FXN, frataxin, pancreatic cancer, CDKN4
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data

Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?

Aim: The purpose of this lab is to introduce online biological exploration tools for large-scale study of human data (metabolic, proteomic, genomic, …). We simulate their application to the FXN gene and pancreatic cancer, to show how a researcher can identify and cross-reference the biological knowledge available in data banks.

Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): alignment (showalign, ClustalW2), similarity, …
- Manage data results (select, keep, map, export)
- Build and reuse workflows

Biological Hypothesis

Figures: FXN on chromosome 9; frataxin molecule structure (PyMOL); pancreatic cancer; pancreas anatomy.

Biological DB Tools

Resolution Process

T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
T1.1. Finding the enzyme and pathway related to frataxin using KEGG
T1.2. Finding the reaction involving frataxin using Reactome
T1.3. Using BRENDA for enzyme data on frataxin
T1.4. Using the collected data for analysis
T1.5. Redo the process with pancreatic cancer

T2. Genome exploration
Objective: Use Ensembl to localize FXN on the human genome and identify the genes implicated in pancreatic cancer.
T2.1. Locate a given gene on the human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in the data folder

T3. Sequence manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using the BLAST tool
T3.2. Align the generated sequences with the ClustalW tool
T3.3. Visualize the result as a phylogenetic tree in Jalview

T4. Protein Data and Structural Biology Knowledge
Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).
T4.1. Structural knowledge on frataxin using SBKB
T4.2. Using UniProt for frataxin protein study
T4.3. Protein-protein interaction using STRING
T4.4. Using the same method for pancreatic cancer and comparing

T5. BioExtract Server
Objective: Use server tools to optimize the data manipulation process, applied on the BioExtract Server.
T5.1. Server initialization
T5.2. Pancreatic cancer & frataxin (FXN)
T5.3. Mapping, alignment
T5.4. Workflow save & reuse

Results
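The retrieval and saving tasks (T2.2 to T2.4) are done through web pages in the lab, but the same steps can be scripted. The sketch below is an assumption-laden illustration, not part of the original session: it builds request URLs for the NCBI E-utilities EFetch service and the UniProt REST service as they are documented at the time of writing, and the helper names and the data folder layout are hypothetical. Check the current service documentation and usage policies before relying on it.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoints assumed from the public NCBI and UniProt documentation.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
UNIPROT = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def ncbi_fasta_url(accession, db="nuccore"):
    """URL that asks NCBI EFetch for one record in FASTA text format."""
    query = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(query)


def uniprot_fasta_url(accession):
    """URL of the FASTA file for one UniProtKB accession."""
    return UNIPROT.format(accession=accession)


def fetch_text(url, timeout=30):
    """Download a URL and return its body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def save_fasta(text, folder, name):
    """Write FASTA text to folder/name.fasta, creating the folder if needed."""
    path = Path(folder) / f"{name}.fasta"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path


# Accessions taken from the examples earlier in this deck.
print(ncbi_fasta_url("U43752"))
print(uniprot_fasta_url("Q16595"))
# To actually download and save (requires network access):
# save_fasta(fetch_text(uniprot_fasta_url("Q16595")), "data", "Q16595")
```

Only the URL construction runs by default; the download is left commented out so the example does not depend on network access.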