Post on 02-Apr-2015
07/07/2010
Bioinformatics and statistical tools for biomedical research
Data analysis with Ingenuity Pathways
Alex Sánchez
Unitat d’Estadística i Bioinformàtica
We are drowning in information and starved for knowledge
John Naisbitt
Who on efficient work is bent, must choose the fittest instrument.
Goethe (Faust)
Presentation outline
• Beyond microarrays…
• Ingenuity Pathways Analysis
– Overview
– Components
– Types of studies
• Usage examples
– Exploring and searching for information
– Data analysis
A microarray experiment...
Lists of selected identifiers (genes, miRNAs, …)
So Where do we go from here? Or, How To Drive A Biologist Crazy?
Ted Slater
Proteomics Center of Emphasis
Pfizer Global R&D Michigan
• gi|84939483 • gi|39893845 • gi|27394934 • gi|18890092 • gi|10192893 • gi|11243007 • gi|20119252 • gi|19748300
• gi|44308356 • gi|50021874 • gi|10003001 • gi|27762947 • gi|24537303 • gi|27284958 • gi|37373499 • …
From lists to biology
• Traditional approach to analyzing gene lists: one gene at a time
– Literature, databases, ...
• Problem:
– A slow, tedious task and, what is worse ...
– It ignores possible interactions
• Alternative approach: functional analysis, or analysis of “biological significance”.
Functional analysis methods
• They are automated methods for
– Identifying biological processes associated with the experimental results.
– Determining the functional themes shared by groups of selected genes.
– Analyzing the connections between genes, molecules, and diseases by automatically mining the literature to uncover associations relevant to the experimental results.
• They make it easier to use auxiliary information.
• They help in understanding the underlying biological phenomena.
Functional analysis tools
• Dozens of programs in the last 10 years
http://estbioinfo.stat.ub.es/resources/index.html
• Direct study of gene lists
– Based on GO or other databases (KEGG, ...)
  • fatiGO, DAVID, GSEA, Babelomics ... [SerbGO]
  • Ingenuity Pathways Analysis
• Exploring relationships in the literature
– PubMed, Scopus, HighWire, GOPubMed, …
– Ingenuity Pathways Analysis
• Study of the pathways associated with the lists
– Pathway Explorer, GenMAPP
– Ingenuity Pathways Analysis
Courses and materials
• CNIO
– 4th Course on Functional Analysis of Gene Expression
• Canadian Bioinformatics Workshop
– Interpreting gene lists from omics sets
• EADGENE and SABRE
– Post-analyses Workshop
Example 1
• The Polycomb group protein EZH2 is involved in progression of prostate cancer (Nature, 419 (10) 624-629)
– Varambally et al. (2002) study the differences between localized (PCA) and metastatic (MET) prostate cancer
  • EZH2 is overexpressed in MET
  • PCA cases with high EZH2 have a worse prognosis
– They suggest that EZH2 may
  • Be involved in the progression PCA → MET
  • Distinguish benign PCA from PCA with a poor prognosis.
Example 1
• Microarray analysis
– Lists of up-regulated (55) and down-regulated (438) genes.
• A functional analysis makes it possible to study
– Which biological processes (pathways) are related to the genes in the lists
  • Annotation databases
– Which functions appear in the lists with a frequency different from that among all the genes studied
  • Enrichment analysis
– The tools available in Babelomics are a good option for this analysis.
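The idea behind enrichment analysis (a function appearing in the selected list more or less often than among all the genes studied) can be illustrated with a few lines of Python. This is only a sketch: the gene names, the annotations, and the helper `function_frequencies` are hypothetical toy values, not data from Varambally et al. or output from Babelomics.

```python
from collections import Counter

def function_frequencies(genes, annotations):
    """Return, for each function, the fraction of `genes` annotated with it."""
    counts = Counter(f for g in genes for f in annotations.get(g, ()))
    return {f: c / len(genes) for f, c in counts.items()}

# Hypothetical toy annotations: gene -> set of functions.
annotations = {
    "g1": {"cell cycle"},
    "g2": {"cell cycle", "apoptosis"},
    "g3": {"apoptosis"},
    "g4": {"metabolism"},
    "g5": {"metabolism"},
    "g6": {"metabolism", "cell cycle"},
    "g7": {"transport"},
    "g8": {"transport"},
}
background = sorted(annotations)   # all genes studied
selected = ["g1", "g2", "g6"]      # hypothetical list of selected genes

list_freq = function_frequencies(selected, annotations)
bg_freq = function_frequencies(background, annotations)

# Ratio > 1: the function is more frequent in the list than in the background.
fold = {f: list_freq[f] / bg_freq[f] for f in list_freq}

for f in sorted(fold, key=fold.get, reverse=True):
    print(f"{f}: list {list_freq[f]:.2f}, background {bg_freq[f]:.2f}, ratio {fold[f]:.2f}")
```

A ratio on its own says nothing about statistical significance; for that, a test on the underlying counts is needed.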
Genes are grouped by function
Functions are associated with pathways
Expression changes are projected onto the pathway
Ingenuity Pathways Analysis
• Ingenuity Pathways Analysis (IPA) is an all-in-one software application that enables researchers to model, analyze, and understand the complex biological and chemical systems at the core of life science research
IPA Challenge
Integrate, Interpret, Gain Therapeutic Insight from Experimental Data
[Diagram]
Experimental Platforms: Expression Arrays, Proteomics, Traditional Assays
Molecules: FAS, VEGFA, bevacizumab
Cellular Processes: Apoptosis, Angiogenesis
Disease Processes: Cancer
• Disease/physiological response
• Overlapping cellular processes/pathways
• Molecular Interactions
• Molecular Perturbation
IPA Challenge
Gain Rapid Understanding of Experimental Systems
[Diagram]
Experimental Platforms: Expression Arrays, Proteomics, Traditional Assays
Molecules: FAS, VEGFA, bevacizumab
Cellular Processes: Angiogenesis, Apoptosis
Disease Processes: Cancer
• Guide in vivo/in vitro assays
• Search for genes implicated in disease
• Identify related cellular processes/pathways
• Generate hypothesis
Ingenuity Platform
• Findings manually extracted from full text
• Extensive libraries of metabolic and signaling pathways
• Chemical and drug information
• Scalable best-in-class content acquisition processes
• Designed to enable computation
• Consists of biological objects and processes organized into major branches
• Robust, up-to-date synonym library
• Knowledge infrastructure tools and processes for structuring biological and chemical knowledge
Ingenuity Knowledge Base
Content Ontology
Ingenuity Knowledge Base: Content
Expert Extraction: Full text from top journals
• Coverage of peer-reviewed journals, plus review articles and textbooks
• Manually extracted by Ph.D. scientists
Import Annotations, Findings:
• OMIM, GO, Entrez Gene
• Tissue and Fluid Expression Location
• Molecular Interactions (e.g. BIND, DIP, TarBase)
Internally curated knowledge:
• Signaling & Metabolic Pathways
• Drug/Target/Disease relationships
• Toxicity Lists
All findings are structured for computation and updated regularly
Installation and getting started
• IPA runs online.
– It is not installed; you simply access it
• An account is needed to use it
– Trial (15 days).
– Access (IRHUVH and HVH) by prior booking with the UEB, in morning or afternoon slots.
• It works on Windows or Mac, but not on Linux
My Pathway & Lists
• Build custom libraries of pathways representing mechanism of action and mechanism of toxicity. Create custom, literature-supported signaling pathways with proteins of interest. Store collections of custom pathways and lists for subsequent core, IPA-Tox™, IPA-Biomarker™, or IPA-Metabolomics™ analyses.
• Use the Grow and Connect tools to edit and expand networks based on the molecular relationships most relevant to the project:
– Transcriptional networks
– Phosphorylation cascades
– Protein-Protein or Protein-DNA interaction networks
– microRNA-mRNA target networks
– Chemical effects on proteins
• Use Search results as building blocks for custom pathways
– Identify cross-talk between biological processes and pathways
– Understand whether gene lists and signatures are tightly connected at the molecular level
Analyze & Interpret data
• IPA Core Analysis
• IPA Tox Analysis
• IPA Biomarker Analysis
• IPA Metabolomic Analysis
IPA-Biomarker™ Analysis
• IPA-Biomarker identifies the most promising and relevant biomarker candidates within experimental data.
– Prioritize molecular biomarker candidates based on key biological characteristics.
– Elucidate mechanism linking potential markers to a disease or biological process of interest.
– Perform analysis across biomarker lists to find biomarker candidates unique to a disease stage or common across all stages.
– Understand the molecular differences between patient populations.
IPA-Tox Analysis
• IPA-Tox delivers a focused toxicity and safety assessment of candidate compounds.
– Enables assessment of the toxicity and safety of compounds early in the development process.
– Provides expert molecular toxicology data interpretation to non-expert users.
– Reveals clinical pathology endpoints associated with a dataset.
– Generates new hypotheses that may not have been revealed using traditional toxicology approaches.
– Elucidates mechanism of toxicity and identifies potential markers of toxicity.
IPA-Metabolomics Analysis
• IPA-Metabolomics extracts rich pathway information from metabolomics data.
– Overcomes the metabolomics data analysis challenge by integrating transcriptomics, proteomics, and metabolomics data to enable a complete systems biology approach.
– Provides the critical context necessary to gain insights into cell physiology and metabolism from metabolite data.
Communicate & Collaborate
• Share
• Report
• Interactive Pathways
• Integrate with other software
Summary
• Functional analysis improves the understanding of biological phenomena through the simultaneous study of groups of values.
• Ingenuity Pathways makes it possible to
– Explore
– Analyze
– Communicate and share
Advantages and drawbacks
Advantages
• Intuitive and easy to use
• Integrates all functions
• Very powerful for human data and cancer
Drawbacks
• Not as powerful for other species or diseases.
• It is not free; it has to be paid for
• It does not include powerful advanced algorithms such as GSEA
Networks
• A network is a set of terms (“nodes”) related by a set of relations (“edges”).
• IPA transforms a list of genes into a set of relevant networks based on information maintained in the Ingenuity Pathways Knowledge Base (IPKB)
• This knowledge base has been abstracted into a large network, called the Global Molecular Network, composed of thousands of genes and gene products that interact with each other.
Networks in IPA
• Purpose:
– To show as many interactions as possible between user-specified molecules in a given dataset, and how they might work together at the molecular level
• Why are Ingenuity networks biologically interesting?
– Highly interconnected networks are likely to represent significant biological function
Key Terminology
• Focus Molecules:
– Molecules that come from the uploaded list, pass the applied filters, and are available for generating networks
• Networks:
– Generated de novo based upon input data
– Do not have directionality
– Contain molecules involved in a variety of Canonical Pathways
• Canonical Pathways (Signaling and Metabolic)
– Are generated prior to data input, based on the literature
– Do NOT change upon data input
– Do have directionality (proceed “from A to Z”)
• My Pathways and Path Designer Pathways
– Custom-built pathways manually created based on user input
How Networks Are Generated
1. Focus molecules are “seeds”
2. Focus molecules with the most interactions to other focus molecules are then connected together to form a network
3. Non-focus molecules from the dataset are then added
4. Molecules from Ingenuity’s KB are added
5. Resulting networks are scored and then sorted based on the score
35 molecules per network for visualization purposes
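The steps above can be sketched as a simple greedy procedure. The code below is not Ingenuity's algorithm, whose details the slides do not give; it is a loose illustration that assumes a non-empty set of focus molecules and undirected interactions given as pairs, and builds a single network without scoring it (step 5). The function `grow_network`, its parameters, and the molecule names are all hypothetical.

```python
def grow_network(focus, dataset_other, interactions, max_size=35):
    """Greedy, simplified illustration of steps 1-4 for a single network.

    focus          -- focus molecules (assumed non-empty)
    dataset_other  -- non-focus molecules from the dataset
    interactions   -- iterable of undirected (a, b) pairs; molecules that
                      appear only here stand in for the knowledge base
    """
    focus = set(focus)
    dataset_other = set(dataset_other)

    # Build an undirected neighbour map from the interaction pairs.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def links_to(molecule, group):
        return len(neighbours.get(molecule, set()) & group)

    # Steps 1-2: seed with the focus molecule most connected to other focus molecules.
    seed = max(sorted(focus), key=lambda m: links_to(m, focus))
    network = [seed]

    # Steps 2-4: add focus molecules, then other dataset molecules, then the rest.
    kb_only = set(neighbours) - focus - dataset_other
    for pool in (focus, dataset_other, kb_only):
        while len(network) < max_size:
            members = set(network)
            candidates = [m for m in sorted(pool - members) if links_to(m, members) > 0]
            if not candidates:
                break
            network.append(max(candidates, key=lambda m: links_to(m, members)))
    return network

# Hypothetical toy data.
toy_interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "G")]
toy_network = grow_network({"A", "B", "C"}, {"D"}, toy_interactions)
print(toy_network)
```

In this toy run the unconnected pair F and G is never added, because each candidate must interact with a molecule already in the network.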
Calculation of Score for Networks in IPA
• Based on the right-tailed Fisher's Exact Test
• Used as a means to rank/sort networks so that those with the most focus molecules are at the top of the list
• Takes into account the number of focus molecules in the network and the size of the network
• Not an indication of the quality or biological significance of the network
Significance Calculations
• Measures the likelihood that a function is over-represented by the molecules in your dataset
• Expressed as a p-value calculated by using the right-tailed Fisher's Exact Test
• Range runs from the most significant low-level function to the least significant low-level function
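For a 2×2 table, the right-tailed Fisher's Exact Test p-value is the upper tail of a hypergeometric distribution: the probability of seeing at least k annotated molecules in a dataset of n, when K of the N reference molecules carry the annotation. The sketch below computes that tail with the standard library only. The function name `fisher_right_tailed` and the example counts are hypothetical, and how IPA chooses the reference set is not stated in the slides.

```python
from math import comb

def fisher_right_tailed(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    k -- dataset molecules annotated with the function
    n -- molecules in the dataset
    K -- reference molecules annotated with the function
    N -- molecules in the reference set
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 3 of 4 dataset molecules are annotated with a function
# that covers 5 of the 20 reference molecules.
p_example = fisher_right_tailed(k=3, n=4, K=5, N=20)
print(f"p = {p_example:.4f}")
```

A small p-value means that so many annotated molecules would rarely appear in a dataset of that size by chance alone.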
Multiple Testing Correction
• Benjamini-Hochberg method of multiple testing correction
• Calculates False Discovery Rate
– Threshold indicates the fraction of false positives among significant functions
– At a threshold of 0.05 (on a scale from 0 to 1.0), 5% (1/20) of the significant functions may be false positives
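The Benjamini-Hochberg adjustment itself is short enough to write out. The sketch below is a plain implementation of the standard step-up procedure, not code taken from IPA; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four functions.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adj_p) if q <= 0.05]
print(adj_p, significant)
```

Functions whose adjusted value is at or below the chosen threshold are reported as significant, and the threshold is the expected fraction of false positives among them.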
Which p-value calculation should I use?
• What is the significance of function X relative to the dataset?
– Use right-tailed Fisher’s Exact test result
• What is the significance of function X relative to all the other functions in the dataset?
– Use Benjamini-Hochberg multiple testing correction