UCSC Genome Browser Workshop (pdf)

Intro to the Browser

• Interactive website providing access to genome data from >45 species

• Multiple annotation datasets (“tracks”) available for each genome

– Include information on known genes, disease associations, variants, expression, regulation, conservation…

– Can search by gene/region/accession numbers or upload your own data

Intro to the Site

• Also includes:

– Information mining tools (Table Browser)

– Fast sequence alignment (BLAT)

– Visualization of GWAS results (Genome Graphs)

– Gene grouping by shared features, e.g. tissue of expression or pathways (Gene Sorter)

– Predicted amplicons given primers (in silico PCR)

Brief Basics

• http://genome.ucsc.edu

• Select Genome Browser

• Option to select any genome/species (for simplicity we’ll use the default)

• In the “search term” box, enter a gene, region, or accession number and click submit
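The same search can also be written as a link, which is convenient for sharing a view. The sketch below assumes the hgTracks CGI's `db` and `position` URL parameters and the hg38 assembly (the workshop simply used the default genome); `browser_url` is a hypothetical helper, not part of any UCSC library.

```python
from urllib.parse import urlencode

# Display CGI of the Genome Browser; db and position are its URL parameters.
HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(position: str, db: str = "hg38") -> str:
    """Build a link that opens the Genome Browser at `position`.

    `position` is whatever you would type as a search term: a gene symbol,
    a region such as "chr1:1000000-1010000", or an accession number.
    The hg38 default is an assumption; pick the assembly you need.
    """
    return HGTRACKS + "?" + urlencode({"db": db, "position": position})


if __name__ == "__main__":
    print(browser_url("GPR89A"))
    print(browser_url("chr1:1000000-1010000"))
```

`urlencode` percent-encodes the colon in a region (`%3A`), which the browser decodes as usual.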

Pick a gene…

Example: GPR89A

Interface:

Gene Features

• Click “Base” in the zoom bar to get to the sequence level

Track Highlights

• Click on a gene name to get information on chemical interactions, haplotypes, related genes, expression, and protein domains

• Mapping and Sequencing:

– Base position

– BAC end pairs: find BACs containing your gene

• Genes and Gene Predictions

– UCSC Genes (includes CCDS)

– Pfam domains (functional regions of the protein)

– MGC Genes (Mammalian Gene Collection): use to order clones

Track Highlights

• Phenotype and Literature: we’ll come back to this

• mRNA and EST: useful for identifying exons and transcript variants not shown in the Genes track

• Comparative Genomics

– Conservation (darker shading/taller histogram bars = higher conservation scores)

– Primate/vertebrate alignments

Track Highlights

• Regulation

– ENCODE regulation: shows multiple regulatory features

• Histone modifications

• Transcription factor binding sites

• DNase I hypersensitive sites

These features complement one another, highlighting regions that are likely to be regulatory in nature

– CpG islands: associated with transcription start sites, often near promoters

– ENCODE methylation: gene regulation via silencing, enriched in regulatory regions

Pick a region…

Example: 1q21.1

Track Highlights

• Expression

– RNA-seq expression data (example: Burge RNA-seq)

– Sestan Brain data

– Exon array expression data (skip for now but keep in mind for later lectures)

– Proteogenomics and peptide data (expressed proteins)

– qPCR pre-designed primers

Track Highlights

• Variation

– SNPs

– Structural Variation

– SNP/CNV arrays

• Repeats

– RepeatMasker

– Segmental Duplications

• Clinically oriented tracks: Phenotype and Literature

– DECIPHER

– OMIM

Other Site Functions

• BLAT: alignment tool – https://genome.ucsc.edu/cgi-bin/hgBlat

– Paste a sequence directly into the search box, or upload a text file containing the sequence

– Example:

AGGGAGATGCAGAAGGCTGAAGAAAAGGAAGTCCCTGAGGACTCACTGGAGGAATGTGCCATCACTTGTTCAAATAGCCACGGCCCTTATGACTCCAACCAGCCTCACAGGAACACCAAAATCACATTTGAGGAAGACAAAGTCGACTCAACTCTGGTTGTAGA
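BLAT takes a sequence pasted into the box or uploaded as a text file, and FASTA is a common layout for such a file. A minimal sketch, assuming a DNA query (BLAT also accepts protein, which this check would reject); `to_fasta` and `SLIDE_QUERY` are hypothetical names.

```python
# Query sequence from the slide above.
SLIDE_QUERY = "AGGGAGATGCAGAAGGCTGAAGAAAAGGAAGTCCCTGAGGACTCACTGGAGGAATGTGCCATCACTTGTTCAAATAGCCACGGCCCTTATGACTCCAACCAGCCTCACAGGAACACCAAAATCACATTTGAGGAAGACAAAGTCGACTCAACTCTGGTTGTAGA"

VALID_DNA = set("ACGTN")


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Return `seq` as one FASTA record, wrapped at `width` bases per line."""
    # Drop spaces and line breaks that often sneak in when copying sequence.
    clean = "".join(seq.split())
    # Validate case-insensitively but keep the original case in the output.
    bad = set(clean.upper()) - VALID_DNA
    if bad:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(bad)}")
    lines = [clean[i:i + width] for i in range(0, len(clean), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta("workshop_query", SLIDE_QUERY), end="")
```

Saving the printed record to a `.fa` or `.txt` file gives a file suitable for the upload option.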

Other Site Functions

• In Silico PCR:

– Predicts amplicons based on defined primers

– http://genome.ucsc.edu/cgi-bin/hgPcr

– Example

• Left: GCCTTATTAGCATCCCAAGACAA

• Right: CCCTGAACAGCCTTTCCTTCT
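To make “predicts amplicons” concrete, here is a deliberately simplified sketch: it looks for exact matches of the left primer and of the reverse complement of the right primer on one strand of a short template string. The real In Silico PCR tool searches a whole genome and tolerates some mismatches, which this toy does not; `find_amplicons`, `revcomp`, the template, and the 4000 bp size cap are illustrative assumptions.

```python
# Map each base to its complement (upper- and lowercase).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicons(template: str, left: str, right: str, max_size: int = 4000):
    """Return (start, end, product) for products on the given strand of `template`.

    The left primer is read 5'->3' on this strand; the right primer is written
    5'->3' on the opposite strand, so its reverse complement must occur
    downstream. Coordinates are 0-based with an exclusive end. Only exact
    matches are reported; run again on revcomp(template) for the other strand.
    """
    template = template.upper()
    fwd = left.upper()
    rev_site = revcomp(right.upper())
    products = []
    start = template.find(fwd)
    while start != -1:
        site = template.find(rev_site, start + len(fwd))
        while site != -1:
            end = site + len(rev_site)
            if end - start > max_size:
                break  # later sites would only give longer products
            products.append((start, end, template[start:end]))
            site = template.find(rev_site, site + 1)
        start = template.find(fwd, start + 1)
    return products


if __name__ == "__main__":
    left = "GCCTTATTAGCATCCCAAGACAA"
    right = "CCCTGAACAGCCTTTCCTTCT"
    # Hypothetical template: filler, left primer, 30-base insert, right-primer site.
    template = "TTTT" + left + "A" * 30 + revcomp(right) + "GGGG"
    for start, end, product in find_amplicons(template, left, right):
        print(start, end, len(product))
```

Note that the right primer is matched through its reverse complement: that is why primer pairs are entered 5'->3' on opposite strands.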

Other Site Functions

• Creating Custom Tracks (your data!):

– Annotation data can be in GFF, GTF, PSL, BED, bedGraph, bigBed, WIG, bigWig, BAM, VCF, MAF, BED detail, Personal Genome SNP, broadPeak, narrowPeak, or microarray (BED15) format

– Upload files directly, type the data into the custom track input box, or link to URLs containing the data of interest

Custom Tracks: Examples

browser position chr22:20100000-20100900
track name=coords description="Chromosome coordinates list" visibility=2
chr22 20100000 20100100
chr22 20100011 20100200
chr22 20100215 20100400
chr22 20100350 20100500
chr22 20100700 20100800
chr22 20100700 20100900

browser position chr22:20100000-20140000
track name=spacer description="Blue ticks every 10000 bases" color=0,0,255
chr22 20100000 20100001
chr22 20110000 20110001
chr22 20120000 20120001
track name=even description="Red ticks every 100 bases, skip 100" color=255,0,0
chr22 20100000 20100100 first
chr22 20100200 20100300 second
chr22 20100400 20100500 third

browser position chr21:33,031,597-33,041,570
track type=bigBed name="bigBed Example One" description="A bigBed file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb

Custom Tracks: Key Points

browser position chr22:10000000-10020000
browser hide all
track name=clones description="Clones" visibility=2 color=0,128,0 useScore=1 url="http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
chr22 10000000 10004000 cloneA 960
chr22 10002000 10006000 cloneB 200
chr22 10005000 10009000 cloneC 700
chr22 10006000 10010000 cloneD 600
chr22 10011000 10015000 cloneE 300
chr22 10012000 10017000 cloneF 100

• Browser lines: tell the browser how to display built-in features such as genome position and track visibility

– Basic format: browser attribute_name attribute_value

• Track lines: tell the browser how to display your own data, such as name, color, file type, and score

– Basic format: track followed by attribute=value pairs (e.g. name=clones color=0,128,0)
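The browser-line and track-line rules above are regular enough to generate from code. A minimal sketch with hypothetical helpers `custom_track` and `format_track_line`; quoting only values that contain spaces and separating BED fields with tabs are choices of this sketch, not requirements stated on the slide.

```python
def format_track_line(attrs: dict) -> str:
    """Build a track line from attribute=value pairs, quoting values with spaces."""
    parts = []
    for key, value in attrs.items():
        text = str(value)
        if " " in text:
            text = '"' + text + '"'
        parts.append(f"{key}={text}")
    return "track " + " ".join(parts)


def custom_track(position: str, attrs: dict, features) -> str:
    """Assemble custom-track text: a browser line, a track line, then BED lines.

    `features` holds tuples of (chrom, start, end, optional extra BED fields);
    BED starts are 0-based and ends are exclusive.
    """
    lines = ["browser position " + position, format_track_line(attrs)]
    for feature in features:
        lines.append("\t".join(str(field) for field in feature))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = custom_track(
        "chr22:10000000-10020000",
        {"name": "clones", "description": "Clones", "visibility": 2,
         "color": "0,128,0", "useScore": 1},
        [("chr22", 10000000, 10004000, "cloneA", 960),
         ("chr22", 10002000, 10006000, "cloneB", 200)],
    )
    print(text, end="")
```

The resulting text can be pasted into the custom track input box or saved to a file for upload.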

Other Site Functions: Genome Graphs

• Visualizes genome-wide association (GWA) data (SNP, linkage, etc.)

• Your data must be in a UCSC-readable format; supported marker/location types include:

– chromosome base: e.g. chr1 130000 (note that the first base in a chromosome is considered position 0)

– STS marker: e.g. RH75228

– dbSNP rsID: e.g. rs12345

– Affymetrix 500k GeneChip: e.g. SNP_A-1780270

– Affymetrix Genome-Wide SNP Array 6: e.g. SNP_A-8575125

– Affymetrix SNP Array 6 Structural-Variation: e.g. CN_47396

– Illumina HumanHap300 BeadChip: e.g. rs3934834

– Illumina HumanHap550 BeadChip: e.g. rs3094315

– Illumina HumanHap650 BeadChip: e.g. rs3094315

– Agilent CGH 244A: e.g. A_14_P112718
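Because the chromosome-base format counts the first base as position 0, positions reported 1-based (as many association-testing tools do; check yours) need shifting by one. A minimal sketch, assuming a tab-separated layout of chromosome, base, and one value column; `genome_graphs_rows` is a hypothetical helper and the column layout is an assumption to verify against the Genome Graphs upload help.

```python
def genome_graphs_rows(records) -> str:
    """Format (chrom, pos_1based, value) records as chromosome-base lines.

    Input positions are assumed to be 1-based; each is shifted down by one
    because the chromosome-base format treats a chromosome's first base as 0.
    Output columns (chromosome, base, value) are tab-separated.
    """
    lines = []
    for chrom, pos, value in records:
        if pos < 1:
            raise ValueError(f"expected a 1-based position, got {pos}")
        lines.append(f"{chrom}\t{pos - 1}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical association results: (chromosome, 1-based position, value).
    print(genome_graphs_rows([("chr1", 130001, 5.2), ("chr2", 1, 0.7)]), end="")
```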

Genome Graphs

• Example: ChIP-seq data

– Go to Genome Graphs, click upload, and paste the following into the URL box:

• http://genome-test.cse.ucsc.edu/ABRF2010/chr21_extended.txt_redbin.sgr

• Key features: set a significance threshold, browse significant hits, and use Gene Sorter for information on the genes involved
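The significance-threshold step can be pictured as a simple filter. A minimal sketch, assuming the example .sgr file holds three whitespace-separated columns (chromosome, position, score) and that higher scores mean stronger signal; `significant_hits` is a hypothetical helper, and Genome Graphs applies its threshold interactively rather than through code like this.

```python
def significant_hits(sgr_text: str, threshold: float):
    """Return (chrom, pos, score) rows whose score is at or above `threshold`."""
    hits = []
    for line in sgr_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or line.startswith("#"):
            continue  # skip blank, comment, or malformed lines
        try:
            chrom, pos, score = fields[0], int(fields[1]), float(fields[2])
        except ValueError:
            continue  # e.g. a header row with non-numeric columns
        if score >= threshold:
            hits.append((chrom, pos, score))
    return hits


if __name__ == "__main__":
    # Hypothetical signal values in the assumed three-column layout.
    sample = "chr21\t9411000\t1.5\nchr21\t9411100\t7.0\nchr21\t9411200\t3.0\n"
    for hit in significant_hits(sample, 3.0):
        print(*hit, sep="\t")
```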

Other Site Functions: Table Browser

• Retrieve the data behind tracks, intersect datasets, retrieve sequences, and export the output

• Go to Tools > Table Browser:

– Group: “Genes and gene predictions”

– Table: “Known Genes”

– Click [paste list]

– Copy and paste the list at: http://genome-test.cse.ucsc.edu/~kuhn/workshops/ashg2014/genelist

– Click [get output]

– Select the fields you want
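Intersecting data in the Table Browser keeps features from one table that overlap features from another. A minimal sketch of the “any overlap” case on 0-based, end-exclusive (BED-style) intervals, using a naive pairwise scan; `intersect` and `overlaps` are hypothetical helpers, the intervals are made up, and the Table Browser's other intersection options are not modelled.

```python
def overlaps(a, b) -> bool:
    """True if intervals a and b, given as (chrom, start, end, ...), share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(primary, secondary):
    """Keep features in `primary` that overlap at least one feature in `secondary`."""
    return [a for a in primary if any(overlaps(a, b) for b in secondary)]


if __name__ == "__main__":
    # Hypothetical gene and peak intervals.
    genes = [("chr1", 100, 200, "geneA"), ("chr1", 300, 400, "geneB"),
             ("chr2", 100, 200, "geneC")]
    peaks = [("chr1", 150, 160), ("chr2", 200, 250)]
    # The chr2 peak starts exactly where geneC ends, so it does not overlap (end is exclusive).
    print([g[3] for g in intersect(genes, peaks)])
```

The pairwise scan is quadratic; sorting both lists by chromosome and start would scale better for genome-wide data.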

Other Site Functions

• Gene Sorter: sort genes by similarity measures, tissue expression features, etc.

• VisiGene: find IHC/other imaging data on genes of interest

• Utilities: include liftOver tools, formatting utilities, and code downloads

Additional Resources

• http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html

• http://www.nature.com/scitable/ebooks/guide-to-the-ucsc-genome-browser-16569863/contents

• http://genomewiki.ucsc.edu/index.php/ABRF2010_Tutorial