
Genomic Analysis with Genome Browsers

http://barc.wi.mit.edu/hot_topics/


Outline

• Genome browsers overview
• UCSC Genome Browser
  – Navigating: view your list of regions in the browser
  – Available tracks (e.g. ENCODE, GTEx)
  – Table Browser to download and annotate genome features
  – Convert coordinates/features between genomes
  – Use Public Hubs to display tracks hosted on non-UCSC servers
  – Saving your session
• Ensembl Browser
  – Gene Tree
  – BioMart


Genome Browsing

• Facilitate genomic analysis in the context of genomic sequences, such as

– Alignments (eg. conservation)

– Experimental/Annotation Data (eg. TFBS)

• Commonly used browsers:

UCSC Genome Browser (UCSC)

EMBL-EBI Ensembl

NCBI Map Viewer

Cline, M.S. and Kent, W.J. Understanding genome browsing. Nat Biotechnol (2009)


UCSC Genome Browser

• Created in 2001
• Browse almost 100 genomes (as of 2017)
• Web-based and open source platform
• Many datasets are hosted on UCSC
• Visualize your own genome-mapped datasets or custom data
  – Save images as PDF/EPS for publication

• Easily download genome features with Table Browser


UCSC Genome Browser: Home Page

Genomes* from:
• Vertebrates
• Deuterostomes
• Insects
• Nematodes
• Yeast
• Viruses

Detailed assembly info

*UCSC Genome Browser currently doesn't host any plant genomes


UCSC Genome Browser: genome.ucsc.edu
Local Whitehead Mirror: membrane.wi.mit.edu

Genome Viewer or Display

Tracks (group of data)

Choosing a Genome/Assembly

Enter a position (e.g. chr:start-end or chr:position) or a search term (e.g. a gene or SNP ID). The box auto-completes on an exact match; otherwise press Enter to list all matching terms.
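The position syntax can be checked before pasting a long list of regions into the browser. The following is a minimal sketch, not part of the browser itself: `parse_position` and its regular expression are hypothetical helpers, and the coordinates are treated as 1-based and inclusive, as in the browser's position box.

```python
import re

# Hypothetical helper (not a UCSC API): accepts "chr:start-end" or "chr:position".
# Commas inside numbers are tolerated, as in "chr22:20,000,000-20,100,000".
POSITION_RE = re.compile(
    r"^(?P<chrom>[\w.]+):(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?$"
)


def parse_position(text):
    """Return (chrom, start, end) using 1-based, inclusive coordinates."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a position string: {text!r}")
    start = int(match.group("start").replace(",", ""))
    end_text = match.group("end")
    # A single position is treated as a one-base region.
    end = int(end_text.replace(",", "")) if end_text else start
    if end < start:
        raise ValueError(f"end is smaller than start: {text!r}")
    return match.group("chrom"), start, end


print(parse_position("chr22:20,000,000-20,100,000"))
print(parse_position("chrX:15560138"))
```

Anything that does not match, such as a gene symbol, raises `ValueError` and should be treated as a search term instead.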


Navigation

Click: item description

Left click: track description

Right click: track configuration

Drag track and move to new position

Move horizontally by 10%, 50%, 95%

Ideogram

Zoom In/Out

Move end by X bases

Move start by X bases


Navigation: Zoom-in with Drag-and-select


Drag/select at the top near chromosome position

Zoom-in to see base and amino acid


Browser Tracks

Click links for detailed information

The current genome build might have fewer tracks than the previous one. For example, hg38 has considerably fewer tracks than hg19.


Tracks*

• Mapping and Sequencing
• Genes and Gene Predictions
• Phenotype and Literature
• mRNA and EST
• Expression
• Regulation
• Comparative Genomics
• Variation
• Repeats

*Some tracks may not be available in your genome of interest

Track Settings and Description


left-click


Item description


left-click anywhere on item/feature


Configure: Image and Tracks


Annotation Track Display Mode

Hide: the track is not displayed at all.

Dense: the track is displayed with all features collapsed into a single line.

Squish: the track is displayed with each annotation feature shown separately, but at 50% of the height of full mode. Features are unlabeled.

Pack: the track is displayed with each annotation feature shown separately and labeled.

Full: the track is displayed with each annotation feature on a separate line.
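Display modes can also be set in a link to the browser, by adding `trackName=mode` pairs to the `hgTracks` URL. The sketch below only builds such a URL and makes no request; `browser_url` is a hypothetical helper, and the track names `knownGene` and `cons100way` are assumptions that should be checked against the track list of your assembly.

```python
from urllib.parse import urlencode

# The five display modes listed above.
VALID_MODES = {"hide", "dense", "squish", "pack", "full"}


def browser_url(db, position, track_modes,
                base="https://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a browser link that sets a display mode for each named track."""
    for track, mode in track_modes.items():
        if mode not in VALID_MODES:
            raise ValueError(f"unknown display mode for {track}: {mode}")
    params = {"db": db, "position": position}
    params.update(track_modes)
    # Keep ':' and '-' readable in the position instead of percent-encoding them.
    return base + "?" + urlencode(params, safe=":-")


# Track names here are assumptions; look them up in the track settings page.
url = browser_url("hg38", "chr22:20000000-20100000",
                  {"knownGene": "pack", "cons100way": "dense"})
print(url)
```

The same function works for a mirror by passing a different `base`.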


Gene Structure

5' UTR, 3' UTR, exon, intron

Commonly used gene models:
• GENCODE
• Ensembl
• RefSeq

Multi-region: "Slice" to view or highlight exons


multi-region


Demo and Exercise 1

• Finding your gene of interest

– number of transcripts

• Comparing RefSeq, Ensembl and GENCODE models

– EST evidence

• Codons and amino acid


UCSC Tracks*

• Regulation - ENCODE

• Expression - GTEx

• Variation

• Comparative Genomics

*Incomplete listing

ENCODE

• Encyclopedia of DNA Elements
  – Identify functional genomic elements
  – DNA elements: TF, histone modification, RNA-binding protein, etc.
  – Various tissues and cell lines
• Extended to other model organisms (modENCODE)
  – UCSC has human/mouse: hg19 contains more data/tracks than hg38


ENCODE track: use settings to select specific samples/datasets

Click to get more info

ENCODE track: configuration


ENCODE: More Info

genome.ucsc.edu/ENCODE

ENCODE: search tracks


Demo and Exercise 2

• Use ENCODE data to look for TFBS, and gene expression (RNA-Seq) for your favorite gene


Expression*: GTEx

• Genotype-Tissue Expression (GTEx) project: studies ~8,600 tissue samples from ~600 post-mortem adult donors
• Track is displayed by default in hg38 and is also available in hg19

Tissues are color-coded

Left-click to configure the track: select specific tissues

Click to see expression across all tissues in a boxplot

*Other expression tracks are available as well, e.g. RNA-seq from ENCODE and array data

Variation

• Variants from several resources available: dbSNP, 1000 Genomes Project and ExAC

dbSNP track is color-coded:
• Blue: UTR or ncRNA variant
• Red: coding non-synonymous
• Green: coding synonymous

Comparative Genomics: Conservation

• Vertebrate Multiz Alignment using 100 species
• Fly 27-way alignment

Left-click to configure the track: select species

PhyloP basewise conservation: scores range from -20 (least conserved) to 7.532 (most conserved); other options include PhastCons and GERP

Add custom track


OR

track name='sample peaks' description='sample peaks'

track name='User Track' description='User Supplied Track'

File formats: see the links above, or view them at http://genome.ucsc.edu/FAQ/FAQformat.html

WI UCSC Browser: membrane.wi.mit.edu

URL for viewing files: http://tak.wi.mit.edu/solexa_ucsc/
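A custom track is plain text: a `track` line such as the ones shown above, followed by the data lines. The following is a minimal sketch that assembles a BED custom track as a string ready to paste into the "Add custom track" box. `make_custom_track` is a hypothetical helper and the peak coordinates are made up for illustration.

```python
def make_custom_track(peaks, name="sample peaks", description="sample peaks"):
    """Return custom-track text: one track line followed by BED lines.

    peaks is a list of (chrom, start, end, label) tuples in BED coordinates
    (zero-based start, end not included).
    """
    lines = [f"track name='{name}' description='{description}'"]
    for chrom, start, end, label in peaks:
        if start < 0 or end <= start:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Made-up example peaks, for illustration only.
example_peaks = [
    ("chr22", 20000000, 20000500, "peak1"),
    ("chr22", 20010000, 20010800, "peak2"),
]
track_text = make_custom_track(example_peaks)
print(track_text)
```

Saving the text to a file on a web-accessible server and pasting its URL is the alternative shown on the slide.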


File Formats

• BED (zero-based start coordinate)
  – bigBed
• GFF/GTF (one-based start coordinate)
• WIG
  – bedGraph
  – bigWig
• BAM

Typical uses:

BED: regions, e.g. enriched ChIP-seq signal for TF binding

Wig(gle): continuous signal, e.g. ChIP-seq signal

BAM: alignment of reads, e.g. RNA-seq alignment
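The difference between zero-based and one-based starts is a common source of off-by-one errors when moving features between BED and GFF/GTF. As a small sketch (the function names are hypothetical), only the start coordinate changes, because a BED end is exclusive while a GFF end is inclusive:

```python
def bed_to_gff(start, end):
    """BED (zero-based start, end excluded) -> GFF/GTF (one-based, end included)."""
    return start + 1, end


def gff_to_bed(start, end):
    """GFF/GTF (one-based, end included) -> BED (zero-based start, end excluded)."""
    return start - 1, end


def bed_length(start, end):
    """Number of bases covered by a BED interval."""
    return end - start


# The first 100 bases of a chromosome:
print(bed_to_gff(0, 100))    # BED 0-100 is GFF 1-100
print(gff_to_bed(1, 100))
print(bed_length(0, 100))
```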


UCSC Genome Browser: Tools

• Blat

• Table Browser

• Gene Sorter

• LiftOver


Tools: Table Browser


Demo and exercise 3

• Load peaks (BED format) derived from ChIP-seq:
  – GM12878 H3K36me3 Histone Mods by ChIP-seq Peaks from ENCODE/Broad
  – To save time, only peaks in chr22
• Identify the RefSeq genes that could be regulated by H3K36me3
  – Go to the Table Browser
  – Choose the RefGene table
  – Intersect with the uploaded track above
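The intersection step in this exercise is done by the Table Browser on the server. To make clear what it computes, here is a minimal sketch of the same idea in Python. It is an illustration under assumptions, not the Table Browser's implementation: the gene and peak lists are made-up toy data, and all intervals use BED-style coordinates.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_overlapping_peaks(genes, peaks):
    """Return names of genes that overlap at least one peak on the same chromosome."""
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end, name in genes:
        chrom_peaks = peaks_by_chrom.get(chrom, [])
        if any(overlaps(start, end, p_start, p_end) for p_start, p_end in chrom_peaks):
            hits.append(name)
    return hits


# Toy data for illustration only (not real annotation).
genes = [
    ("chr22", 1000, 5000, "geneA"),
    ("chr22", 8000, 9000, "geneB"),
    ("chr21", 1000, 5000, "geneC"),
]
peaks = [("chr22", 4500, 4800), ("chr22", 9000, 9500)]
print(genes_overlapping_peaks(genes, peaks))
```

With these toy intervals only `geneA` is reported: the second peak starts exactly where `geneB` ends, and touching half-open intervals do not overlap.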


Tools: LiftOver Genome/Build Conversion


Save to file


LiftOver: Genome/Build Conversion – Multiple Regions

Tools: BLAT (align sequences)

view alignment in browser


Tools: Gene Sorter*

Enter gene of interest

Sort by criteria: e.g. similarity, gene expression

*Not available for all genomes

Track Data Hubs


Share/Save Session


Ensembl ensembl.org

• Website was launched in 2000

• Joint project by EMBL-EBI and the Wellcome Trust Sanger Institute

• Current Ensembl release: 89 (May 2017)

• Not just a browser: an automated genome annotation pipeline
  – Includes hand-curated gene annotation (e.g. Vega and Havana)

• BioMart allows easy access to (download) most of the Ensembl data


Ensembl Browser


Select species or genome

Enter gene


Ensembl: Gene Tree


BioMart

Choose Database:
• Ensembl Genes
• Mouse Strains
• Ensembl Variation
• Ensembl Regulation

Select Genome


BioMart: Filter for Input

Filter by
• Region
• Gene
• Phenotype
• GO
• Multi-species comparison
• Protein domains/families
• Variant

BioMart: Attributes for Output
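The same choices made in the web form (database, genome, filters, attributes) can be written down as a BioMart XML query, which the BioMart web interface can also display for a query built by hand. The sketch below only builds the XML text and sends nothing. `build_biomart_query` is a hypothetical helper, and the dataset, filter and attribute names (`hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id`, `external_gene_name`) are assumptions to verify in the BioMart interface for your release.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return BioMart query XML for one dataset.

    filters maps filter name -> value (strings); attributes lists output columns.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Names below are assumptions; confirm them in the BioMart web interface.
xml_text = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "22"},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_text)
```

Keeping the query as text alongside your results also records which filters and attributes produced a given download.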


Practical Tips

• Take careful note of the genome assembly (e.g. hg19 vs hg38) for
  – coordinates
  – custom browser files
• Genome assemblies are usually updated every several years; annotation, however, may be updated frequently (e.g. RefSeq)
• Data displayed, especially on the UCSC Genome Browser, may be generated from other sources (click on the track description for more info or citation)
• Try different genome browsers based on your questions and needs
  – Turning on all available tracks/data will be too noisy: use relevant tracks only

More Information

• UCSC Browser Tutorials
  – https://genome.ucsc.edu/training/index.html
  – Free videos: https://genome.ucsc.edu/training/vids/index.html
  – OpenHelix: http://www.openhelix.com/ucsc

• FAQ: https://genome.ucsc.edu/FAQ/

• Genomewiki: http://genomewiki.ucsc.edu/index.php/Main_Page

• Kent Utils (command-line tools – already installed on tak): http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER

• Ensembl Tutorials: http://www.ensembl.org/info/website/tutorials/index.html
