Introduction to Bioconductor
Friday 23rd Nov 2007
Ståle Nygård ([email protected])
Course in Statistical methods and bioinformatics for the analysis of microarray data
What is Bioconductor?
An open source and open development software project for the analysis and comprehension of genomic data.
Started in 2001. The core team is based primarily at the Fred Hutchinson Cancer Research Center.
Is primarily based on the R programming language.
There are two releases of Bioconductor every year.
In addition, a large number of metadata packages are available, mainly but not solely oriented towards different types of microarrays.
Goals of the Bioconductor Project
Provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data.
Facilitate the integration of biological metadata in the analysis of experimental data, e.g. literature data from PubMed, annotation data from LocusLink.
Allow the rapid development of extensible, scalable, and interoperable software.
Promote high-quality documentation and reproducible research.
Provide training in computational and statistical methods for the analysis of genomic data.
Main features of the Bioconductor Project
Use of R
Documentation and reproducible research
Statistical and graphical methods
Annotation
Bioconductor short courses
Open source
Open development
Use of R
R and the R package system are the main vehicles for designing and releasing software.
Documentation and reproducible research
Each package contains at least one vignette, which is a document that provides a textual, task-oriented description of the package's functionality and that can be used interactively.
In the future: looking towards vignettes not specifically tied to a package, but rather demonstrating more complex concepts.
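As a minimal sketch of how vignettes can be found from within R, the snippet below lists the vignettes shipped with whatever packages happen to be installed. It uses only `vignette()` from base R's utils package; the `grid` vignette named in the comment is just an illustrative choice, not something taken from the course material.

```r
# vignette() with no arguments returns a listing of every vignette
# found in the installed packages (base R, package 'utils').
v <- vignette()

# The listing is stored as a character matrix in v$results.
vig <- v$results[, c("Package", "Item", "Title"), drop = FALSE]
print(head(vig))

# To open a specific vignette, pass its name and package, e.g.
#   vignette("grid", package = "grid")
# or browse all of them in a web browser with browseVignettes().
```

Once Bioconductor packages are installed, their vignettes should appear in the same listing alongside those of the base R packages.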
Bioconductor FAQ: http://www.bioconductor.org/docs/faq/index.html#Open%20source
Book:
Statistical and graphical methods
Bioconductor analysis packages
– Preprocessing Affymetrix and cDNA array data
– Identifying differentially expressed genes
– Graph theoretical analyses
– Plotting genomic data
In addition, R itself provides implementations for a broad range of state-of-the-art statistical and graphical techniques, including
– Linear and non-linear modeling
– Cluster analysis
– Prediction
– Resampling
– Survival analysis
– Time series analysis
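To give a feel for the techniques R provides out of the box, here is a small sketch of cluster analysis using only base R (`dist`, `hclust`, `cutree`). The matrix is simulated purely for illustration and stands in for an expression matrix; it is not real microarray data, and the choice of average linkage and two clusters is an assumption made for the example.

```r
set.seed(1)

# Simulated stand-in for an expression matrix: 20 "genes" x 6 "samples".
expr <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20),
                               paste0("sample", 1:6)))

# Shift the first 10 genes upward in samples 4-6 so two groups exist.
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3

# Hierarchical clustering of genes on Euclidean distances.
d  <- dist(expr)
hc <- hclust(d, method = "average")

# Cut the tree into two clusters and tabulate cluster sizes.
groups <- cutree(hc, k = 2)
print(table(groups))

# plot(hc) would draw the dendrogram in an interactive session.
```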
(Screenshots: http://www.bioconductor.org/whatisit/screenshots/)
Annotation
The Bioconductor project provides software for associating genomic data in real time with biological metadata from web databases such as GenBank, LocusLink, and PubMed (annotate package).
Provides functions for incorporating the results in HTML reports with links to annotation WWW resources.
Provides software tools for assembling and processing genomic annotation from databases such as GenBank, the Gene Ontology Consortium, LocusLink, UniGene, and the UCSC Human Genome Project (AnnBuilder package).
Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, PubMed).
Customized annotation libraries can also be assembled.
Bioconductor short courses
The Bioconductor project has developed a program of short courses on software and statistical methods for the analysis of genomic data. (Course materials etc. at: http://www.bioconductor.org/services/workshops)
Open source
Open-source software benefits the analysis of microarray data, and computational biology in general, because it
– facilitates full access to algorithms and their implementation
– enables users to fix bugs and to extend and improve the supplied software
– encourages good scientific computing and statistical practice by providing appropriate tools and instruction
– provides a workbench of tools that allow researchers to explore and expand the methods used to analyze biological data
– ensures that the international scientific community is the owner of the software tools needed to carry out research
– leads and encourages commercial support and development of those tools that are successful
– promotes reproducible research by providing open and accessible tools with which to carry out that research
Open development
Users are encouraged to become developers by contributing either Bioconductor-compliant packages or documentation.
Installation of Bioconductor
Install R
Install Bioconductor packages: http://www.bioconductor.org/docs/install-howto.html
Installation tailored for this course: http://sfi.nr.no/sfi/index.php/Click_here
To check whether your packages really are installed, type library().
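Besides scanning the output of `library()` by eye, the check can be scripted. The sketch below uses base R's `installed.packages()`. The package names in `wanted` are a hypothetical selection (`affy` and `limma` are real Bioconductor packages, but whether this course uses them is an assumption), so substitute the packages listed in the course instructions.

```r
# Names of all packages installed in the current library paths.
installed <- rownames(installed.packages())

# Hypothetical list of packages to check; 'stats' ships with R itself.
wanted <- c("stats", "affy", "limma")

# TRUE/FALSE for each wanted package, labelled by package name.
status <- setNames(wanted %in% installed, wanted)
print(status)

# Report anything that is missing.
missing_pkgs <- wanted[!status]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Note: on current Bioconductor releases, missing packages would
# typically be installed with BiocManager::install(missing_pkgs);
# the course's own installation page may describe a different procedure.
```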