Genome browsers:
Discovering biology through genomics
BaRC Hot Topics – April 2013
George Bell, Ph.D.
http://jura.wi.mit.edu/bio/education/hot_topics/
Today's outline
• Genome browser introduction
• Popular types of genome browsers
– UCSC Genome Browser
– Integrative Genomics Viewer (IGV)
– Ensembl
– GBrowse (SGD, FlyBase, WormBase, TAIR, ZFIN, HapMap, Planarian at Whitehead)
• Browser file formats for custom data tracks
• Throughout the talk: Mining the genome
Resources
• Genome browser tutorial materials
– http://genome.ucsc.edu/training.html
– http://www.broadinstitute.org/igv/UserGuide
– http://www.ensembl.org/info/website/tutorials/index.html
– http://gmod.org/wiki/GBrowse
• Browser file formats: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks
• Previous Hot Topics (http://jura.wi.mit.edu/bio/education/hot_topics/)
• OpenHelix training materials (some free)
– http://www.openhelix.com/cgi/freeTutorials.cgi
• BaRC scientists
Genome browser components
• Genome sequence (partially or fully assembled)
• Graphics + data browsing/searching system
• Collection of data (qualitative and quantitative) linked to
– genome coordinates
– genome features linked to genome coordinates
• System to view custom data
• Algorithm to align sequences to genome
Practical hints
• Take careful notes of genome assembly for
– All coordinates
– All custom browser files
• Genome is updated infrequently
• Data in genome browser can be updated as often as daily
• Data displayed in genome browser is often generated by others
• Try out different genome browsers
UCSC Genome Browser
UCSC: Demo and exercise 1
• Does the RefSeq gene catalog contain the correct isoforms of your favorite human gene?
• Provide evidence from primary sequence
• Examples: WASH2P, BMP4
UCSC: Demo and exercise 2
• Get the promoter of your favorite gene (defined as 2kb upstream to 2kb downstream of the transcription start site)
• Examples: BMP4, SERPIND1
• According to ENCODE, do any transcription factors bind this promoter?
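The promoter definition used in this exercise (2 kb upstream to 2 kb downstream of the transcription start site) can be turned into coordinates with a few lines of Python. This is only a sketch: the function name `promoter_window`, the 1-based inclusive coordinate convention, and the example positions are assumptions for illustration and do not come from any browser.

```python
def promoter_window(tss, strand, upstream=2000, downstream=2000):
    """Return (start, end) of a promoter window, 1-based inclusive.

    tss is the 1-based position of the transcription start site.
    On the minus strand, "upstream" means higher coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the start of the chromosome.
    return max(1, start), end


# Hypothetical TSS positions, for illustration only.
print(promoter_window(100000, "+"))
print(promoter_window(5000, "-", upstream=1000, downstream=200))
```

With a symmetric window the coordinates are the same on both strands, but the strand still determines which end is upstream and how the extracted sequence should be oriented.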
Integrative Genomics Viewer (IGV)
IGV: Demo and exercise 3
• Using the Illumina Body Map RNA-Seq data on IGV,
– Is GATA4 really expressed at a higher level in heart than in skeletal muscle?
– Why isn't this comparison of mapped reads quantitative?
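One likely part of the answer to the last question is that the two libraries may differ in sequencing depth, so raw read pileups are not on the same scale. The sketch below shows the simplest correction, scaling to counts per million mapped reads. The function name and all numbers are made up for illustration, and this is not how IGV itself draws coverage.

```python
def counts_per_million(gene_count, total_mapped_reads):
    """Scale a raw read count by library size (reads per million mapped)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return gene_count * 1_000_000 / total_mapped_reads


# Made-up numbers: the same raw count in two libraries of different depth.
heart = counts_per_million(500, 80_000_000)
muscle = counts_per_million(500, 40_000_000)
print(heart, muscle)
```

The same raw count corresponds to a lower normalized value in the deeper library, which is why comparing read pileups by eye can mislead.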
IGV: Demo and exercise 4
• Using the Illumina Body Map RNA-Seq data on IGV,
– Does the heart subject have any variants in GATA4? Where?
– Center the variant(s) in the display, zoom in all the way, and save that view as a session.
– Beyond IGV: Is this variant a known SNP?
Ensembl: more than a browser
• An automated genome annotation pipeline
• Includes thorough homology analysis via Compara
• Hosts hand-curated gene annotation projects (Vega; Havana)
• All data can be downloaded in a variety of ways
• BioMart is a powerful web interface to the Ensembl databases
Ensembl gene pages
Ensembl: Demo and exercise 5
• Go to the Ensembl page for mouse Uox (urate oxidase)
• Download Uox homologs (in fasta format) from as many species as possible
• Is this gene missing in any primates?
Ensembl: Demo and exercise 6
• Use BioMart to get a list of all human genes on chromosome 1 and corresponding mouse homologs
GBrowse (many MODs)
GBrowse: Demo and exercise 7
• Go to TAIR (The Arabidopsis Information Resource)
• Find GBrowse (under Tools)
• Find gene AT2G19420
• What non-coding gene overlaps it?
• Download a GFF file of these genes and view it in Excel.
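A GFF file opens cleanly in Excel because it is tab-delimited text with nine fixed columns. The same file can be read in Python, as in the sketch below. The helper names `read_gff` and `overlaps` and the two example features are hypothetical; they are not real TAIR annotations.

```python
import io

GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def read_gff(handle):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(GFF_COLUMNS, line.rstrip("\n").split("\t")))
        # GFF coordinates are 1-based and inclusive.
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        yield record


def overlaps(a, b):
    """True if two features on the same sequence share at least one base."""
    return (a["seqid"] == b["seqid"]
            and a["start"] <= b["end"] and b["start"] <= a["end"])


# Made-up features, for illustration only.
example = io.StringIO(
    "##gff-version 3\n"
    "Chr2\texample\tgene\t1000\t2500\t.\t+\t.\tID=geneA\n"
    "Chr2\texample\tgene\t2000\t3000\t.\t-\t.\tID=geneB\n"
)
features = list(read_gff(example))
print(overlaps(features[0], features[1]))
```

The overlap test ignores strand, so it also catches genes on opposite strands that cover the same bases.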
Viewing custom data
• Just about any data can be viewed in a genome browser as long as it is
– Linked to genome coordinates
– Organized in a standard format that is
  • qualitative (ex: bed, bam), or
  • quantitative (ex: wig, bedgraph)
• Different formats use different counting schemes (starting at 0 or 1), so off-by-one bugs are easy to make
• BAM files need to be sorted and indexed first
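To make the counting-scheme point concrete: BED and bedGraph use 0-based, half-open intervals (the start is counted from 0 and the end is not included), while GFF and wig positions are 1-based. A minimal sketch of the conversion follows; the function names are hypothetical.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED interval (0-based, half-open) to 1-based inclusive."""
    return chrom_start + 1, chrom_end


def one_based_to_bed(start, end):
    """Convert a 1-based inclusive interval to BED (0-based, half-open)."""
    return start - 1, end


def bed_length(chrom_start, chrom_end):
    """Length of a BED interval; no +1 is needed with half-open intervals."""
    return chrom_end - chrom_start


# The first 100 bases of a chromosome.
print(bed_to_one_based(0, 100))
print(one_based_to_bed(1, 100))
print(bed_length(0, 100))
```

Only the start changes between the two schemes. Shifting the end as well is the usual source of off-by-one errors.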
Demo and exercise 8
• Go to UCSC (http://membrane.wi.mit.edu/ - WI only) or IGV
• Locate track files in \\BaRC_Public\Hot_Topics\Genome_browsers_Apr_2013
• Add the 4 tracks to the browser (mm9)
– TargetScanMouse6_mm9.chr3.bed
– TargetScanMouse6_mm9.chr3.bedgraph
– CGH.mm9.chr3-4.wig
– track type=bam name="Heart BAM" bigDataUrl=http://tak.wi.mit.edu/barc_ucsc/Hot_Topics/Genome_browsers_Apr_2013/HeartCellRNASeq.bam
• Look at some chr3 genes (ex: Pfn2, Serp1, Ssr3, Hdgf)
• Optimize the display modes of your custom tracks
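To try the custom-track mechanism with your own numbers, a small bedGraph track can be generated as text and pasted or uploaded into the browser. The sketch below builds a UCSC-style `track` line followed by data lines. The function name `bedgraph_track`, the track name, and the intervals and scores are all made up for illustration; they are not the contents of the exercise files.

```python
def bedgraph_track(name, description, intervals):
    """Build the text of a bedGraph custom track.

    intervals is an iterable of (chrom, start, end, value) tuples using
    BED-style 0-based, half-open coordinates.
    """
    lines = [f'track type=bedGraph name="{name}" description="{description}"']
    for chrom, start, end, value in intervals:
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"


# Made-up intervals and scores on chr3.
text = bedgraph_track(
    "Demo",
    "Made-up scores",
    [("chr3", 1000, 1100, 0.5), ("chr3", 1100, 1200, 1.25)],
)
print(text, end="")
```

As the practical hints stress, the coordinates only make sense for one assembly (mm9 in this exercise), so record the assembly alongside any file produced this way.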
Other notable browsers
• JBrowse
• Golden Helix GenomeBrowse
• WashU Epigenome Browser
• UCSC Cancer Genome Browser
• 1000 Genomes Browser
Summary
• Genome browser introduction
• Popular types of genome browsers
– UCSC Genome Browser
– Integrative Genomics Viewer (IGV)
– Ensembl
– GBrowse (SGD, FlyBase, WormBase, TAIR, ZFIN, Planarian at Whitehead)
• Browser file formats for custom data tracks
Browser locations
• UCSC Genome Browser:
– http://genome.ucsc.edu/
– http://membrane.wi.mit.edu/ (inside Whitehead network)
• IGV: http://www.broadinstitute.org/software/igv/download
• Ensembl: http://www.ensembl.org/
• GBrowse: http://gmod.org/wiki/GBrowse