Introduction to BioConductor

Friday 23rd November 2007
Ståle Nygård ([email protected])
Course in Statistical methods and bioinformatics for the analysis of microarray data

What is BioConductor?

An open source and open development software project for the analysis and comprehension of genomic data.

The project started in 2001. The core team is based primarily at the Fred Hutchinson Cancer Research Center.

It is primarily based on the R programming language. There are two releases of Bioconductor every year. In addition, a large number of metadata packages are available, mainly, but not solely, oriented towards different types of microarrays.

Goals of the Bioconductor Project

– Provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data.
– Facilitate the integration of biological metadata in the analysis of experimental data, e.g. literature data from PubMed and annotation data from LocusLink.
– Allow the rapid development of extensible, scalable, and interoperable software.
– Promote high-quality documentation and reproducible research.
– Provide training in computational and statistical methods for the analysis of genomic data.

Main features of the Bioconductor Project

– Use of R
– Documentation and reproducible research
– Statistical and graphical methods
– Annotation
– Bioconductor short courses
– Open source
– Open development

Use of R

R and the R package system are the main vehicles for designing and releasing software.

Documentation and reproducible research

Each package contains at least one vignette, which is a document that provides a textual, task-oriented description of the package's functionality and that can be used interactively.

In the future: looking towards vignettes not specifically tied to a package, but rather demonstrating more complex concepts.
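To show what using a vignette interactively can look like, here is a minimal sketch with base R's vignette() function, which lists the vignettes of an installed package and opens one of them. The package "grid" is used only as an assumption of convenience because it ships with R; substitute the name of an installed Bioconductor package.

```r
# List the vignettes shipped with an installed package. "grid" is only a
# stand-in that ships with base R; use any installed package name instead.
vigs <- vignette(package = "grid")
print(vigs$results[, c("Item", "Title")])

# Open one vignette and its R code. Guarded by interactive() because both
# calls launch an external viewer or editor.
if (interactive()) {
  v <- vignette("grid", package = "grid")
  print(v)  # opens the rendered vignette document
  edit(v)   # opens the R code extracted from the vignette in an editor
}
```

Stepping through the extracted code is what lets a reader rerun the analysis described in the vignette text.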

Bioconductor FAQ: http://www.bioconductor.org/docs/faq/index.html#Open%20source

Book:

Statistical and graphical methods

Bioconductor analysis packages:
– Preprocessing Affymetrix and cDNA array data
– Identifying differentially expressed genes
– Graph theoretical analyses
– Plotting genomic data

In addition, R itself provides implementations of a broad range of state-of-the-art statistical and graphical techniques, including:
– Linear and non-linear modeling
– Cluster analysis
– Prediction
– Resampling
– Survival analysis
– Time series analysis

(Screenshots: http://www.bioconductor.org/whatisit/screenshots/)
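As a minimal sketch of the base-R techniques listed above, the code below simulates a small expression matrix (random numbers, not real array data), runs a per-gene two-sample t-test to rank genes, adjusts the p-values for multiple testing, and clusters the arrays hierarchically. It uses only functions from R's stats package (t.test, p.adjust, dist, hclust, cutree); the Bioconductor analysis packages provide more specialised methods for real microarray data.

```r
set.seed(1)

# Simulated log-expression matrix: 100 genes (rows) x 6 arrays (columns),
# three arrays per group. The first 10 genes are shifted up by 3 in group B.
n_genes <- 100
group <- factor(rep(c("A", "B"), each = 3))
expr <- matrix(rnorm(n_genes * 6), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("array", 1:6)))
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 3

# Per-gene two-sample t-test (Welch's version is the t.test default)
p_raw <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"])$p.value
})

# Benjamini-Hochberg adjustment, since 100 genes are tested at once
p_adj <- p.adjust(p_raw, method = "BH")
print(head(sort(p_adj), 10))

# Hierarchical clustering of the arrays (columns) on Euclidean distance
hc <- hclust(dist(t(expr)))
print(cutree(hc, k = 2))
```

The multiple-testing step matters because with thousands of genes on an array, unadjusted p-values below 0.05 would occur by chance alone for many genes.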

Annotation

– The Bioconductor project provides software for associating genomic data in real time with biological metadata from web databases such as GenBank, LocusLink, and PubMed (annotate package).
– It provides functions for incorporating the results in HTML reports with links to annotation web resources.
– It provides software tools for assembling and processing genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, LocusLink, UniGene, and the UCSC Human Genome Project (AnnBuilder package).
– Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, PubMed).
– Customized annotation libraries can also be assembled.
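The mapping idea behind the data packages can be pictured as a lookup table from probe identifiers to other identifiers. The sketch below is purely illustrative: the probe IDs and gene symbols are hypothetical placeholders, the helper lookup_symbols() is hypothetical, and a plain R environment stands in for a real annotation data package, which would supply the mapping itself. Base R's mget() looks up several probes at once and returns NA for probes without an entry.

```r
# Hypothetical probe-to-symbol map; IDs and symbols are placeholders, not
# taken from any real array or annotation package.
probe2symbol <- new.env()
assign("probe_001", "GENE_A", envir = probe2symbol)
assign("probe_002", "GENE_B", envir = probe2symbol)
assign("probe_003", "GENE_C", envir = probe2symbol)

# Hypothetical helper: look up several probes at once; unknown IDs map to NA
# instead of raising an error.
lookup_symbols <- function(ids, map) {
  unlist(mget(ids, envir = map, ifnotfound = NA))
}

symbols <- lookup_symbols(c("probe_002", "probe_999"), probe2symbol)
print(symbols)
```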

Bioconductor short courses

The Bioconductor project has developed a program of short courses on software and statistical methods for the analysis of genomic data (course materials etc. at: http://www.bioconductor.org/services/workshops).

Open source

There are many reasons why open-source software is beneficial to the analysis of microarray data, and to computational biology in general. Open-source software:
– facilitates full access to algorithms and their implementation
– makes it possible to fix bugs and to extend and improve the supplied software
– encourages good scientific computing and statistical practice by providing appropriate tools and instruction
– provides a workbench of tools that allow researchers to explore and expand the methods used to analyze biological data
– ensures that the international scientific community is the owner of the software tools needed to carry out research
– leads to and encourages commercial support and development of those tools that are successful
– promotes reproducible research by providing open and accessible tools with which to carry out that research

Open development

Users are encouraged to become developers, either by contributing Bioconductor-compliant packages or by contributing documentation.

Installation of Bioconductor

Install R.

Install Bioconductor packages: http://www.bioconductor.org/docs/install-howto.html

Installation tailored for this course:

http://sfi.nr.no/sfi/index.php/Click_here

To check that your packages really are installed, type library().
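The same check can be scripted. The sketch below is an example under stated assumptions, not the course's own procedure: the package names in wanted are examples to replace, and the install step uses BiocManager, the installer of current Bioconductor releases (it replaced the older biocLite() script). For this course's setup, follow the course link above.

```r
# Example package names; replace them with the ones your analysis needs.
# "stats" ships with R; "Biobase" and "limma" are Bioconductor packages.
wanted <- c("stats", "Biobase", "limma")

# Names of the packages found in the library paths (roughly what library()
# lists when called without arguments)
available <- rownames(installed.packages())
is_installed <- setNames(wanted %in% available, wanted)
print(is_installed)

# Install whatever is missing through BiocManager. Guarded by interactive()
# so that sourcing this script never starts a download by accident.
to_install <- wanted[!is_installed]
if (length(to_install) > 0 && interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(to_install)
}
```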