Michael 2011 japan

66
A Grammar of Graphics for Genomics The ggbio Package Michael Lawrence Genentech August 29, 2012 Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 1 / 18

Transcript of Michael 2011 japan

Page 1: Michael 2011 japan

A Grammar of Graphics for GenomicsThe ggbio Package

Michael Lawrence

Genentech

August 29, 2012

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 1 / 18

Page 2: Michael 2011 japan

Outline

1 Motivation

2 High-level Plots

3 Grammar Components

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 2 / 18

Page 3: Michael 2011 japan

Outline

1 Motivation

2 High-level Plots

3 Grammar Components

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 3 / 18

Page 4: Michael 2011 japan

Data on the Genome

• Comes in two flavors:• Annotations (genes, TF binding sites, ...)• Experimental measurements (sequence reads)

• Both types are tied to genomic coordinates, providing a common axisthat permits cross-dataset comparison and inference

• Typically stored as a table, with the range as a fundamental variabletype, plus metadata

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 4 / 18

Page 5: Michael 2011 japan

Data on the Genome

• Comes in two flavors:• Annotations (genes, TF binding sites, ...)• Experimental measurements (sequence reads)

• Both types are tied to genomic coordinates, providing a common axisthat permits cross-dataset comparison and inference

• Typically stored as a table, with the range as a fundamental variabletype, plus metadata

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 4 / 18

Page 6: Michael 2011 japan

Data on the Genome

• Comes in two flavors:• Annotations (genes, TF binding sites, ...)• Experimental measurements (sequence reads)

• Both types are tied to genomic coordinates, providing a common axisthat permits cross-dataset comparison and inference

• Typically stored as a table, with the range as a fundamental variabletype, plus metadata

0

10

20

30

40

50

60

120928000 120930000 120932000 120934000 120936000 120938000

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 4 / 18

Page 7: Michael 2011 japan

Data on the Genome

• Comes in two flavors:• Annotations (genes, TF binding sites, ...)• Experimental measurements (sequence reads)

• Both types are tied to genomic coordinates, providing a common axisthat permits cross-dataset comparison and inference

• Typically stored as a table, with the range as a fundamental variabletype, plus metadata

0

10

20

30

40

50

60

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 4 / 18

Page 8: Michael 2011 japan

Data on the Genome

• Comes in two flavors:• Annotations (genes, TF binding sites, ...)• Experimental measurements (sequence reads)

• Both types are tied to genomic coordinates, providing a common axisthat permits cross-dataset comparison and inference

• Typically stored as a table, with the range as a fundamental variabletype, plus metadata

seqnames start end strand exon id tx id10 120927215 120928045 - 129230 14886,1488710 120928689 120928854 - 129229 14886,1488710 120931894 120931997 - 129228 14886,1488710 120933249 120933384 - 129227 14886,1488710 120933963 120934069 - 129226 1488610 120933963 120934104 - 119757 1488710 120936533 120936665 - 119756 1488710 120936552 120936665 - 129225 1488610 120938267 120938345 - 129224 14886,14887

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 4 / 18

Page 9: Michael 2011 japan

ChallengesBig data, wide spaces

• Need summaries that are efficiently computed, communicate morewith less and expose the most interesting aspects of the data

• Need different ways of viewing the data, depending on the density andscale, from whole genome to single basepair

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 5 / 18

Page 10: Michael 2011 japan

ChallengesBig data, wide spaces

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

• Need summaries that are efficiently computed, communicate morewith less and expose the most interesting aspects of the data

• Need different ways of viewing the data, depending on the density andscale, from whole genome to single basepair

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 5 / 18

Page 11: Michael 2011 japan

ChallengesBig data, wide spaces

0

10

20

30

40

50

60

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

• Need summaries that are efficiently computed, communicate morewith less and expose the most interesting aspects of the data

• Need different ways of viewing the data, depending on the density andscale, from whole genome to single basepair

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 5 / 18

Page 12: Michael 2011 japan

ChallengesBig data, wide spaces

0

50000

100000

150000

200000

250000

300000

0 Mb 50 Mb 100 Mb

• Need summaries that are efficiently computed, communicate morewith less and expose the most interesting aspects of the data

• Need different ways of viewing the data, depending on the density andscale, from whole genome to single basepair

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 5 / 18

Page 13: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 14: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 15: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 16: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 17: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 18: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 19: Michael 2011 japan

Existing Tools

UCSC IGB IGV Circos GViz

Limitations

• Limited to one type of view (linear or circular)

• Not tightly integrated with an analysis environment throughstandard, abstract data structures (except GViz)

• No low-level toolkit for prototyping new types of graphics

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 6 / 18

Page 20: Michael 2011 japan

Grammars of Graphics

• A grammar of graphics is alanguage for expressing plots

• Graphics are constructedthrough the combination ofvarious types of primitives;like legos for graphics

• The most prominentgrammar was introduced byWilkinson’s book TheGrammar of Graphics

• Wilkinson’s grammar wasextended by Wickham andthe ggplot2 package

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 7 / 18

Page 21: Michael 2011 japan

Grammars of Graphics

• A grammar of graphics is alanguage for expressing plots

• Graphics are constructedthrough the combination ofvarious types of primitives;like legos for graphics

• The most prominentgrammar was introduced byWilkinson’s book TheGrammar of Graphics

• Wilkinson’s grammar wasextended by Wickham andthe ggplot2 package

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 7 / 18

Page 22: Michael 2011 japan

Grammars of Graphics

• A grammar of graphics is alanguage for expressing plots

• Graphics are constructedthrough the combination ofvarious types of primitives;like legos for graphics

• The most prominentgrammar was introduced byWilkinson’s book TheGrammar of Graphics

• Wilkinson’s grammar wasextended by Wickham andthe ggplot2 package

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 7 / 18

Page 23: Michael 2011 japan

Grammars of Graphics

• A grammar of graphics is alanguage for expressing plots

• Graphics are constructedthrough the combination ofvarious types of primitives;like legos for graphics

• The most prominentgrammar was introduced byWilkinson’s book TheGrammar of Graphics

• Wilkinson’s grammar wasextended by Wickham andthe ggplot2 package

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 7 / 18

Page 24: Michael 2011 japan

The ggbio Package

• An R/Bioconductor package that extends the Wilkinson/Wickhamgrammar for applications in genomics

• Integrated with Bioconductor• Operates on standard, abstract genomic data structures• Leverages efficient range-based algorithms

• Programming interface has two levels of abstraction:

autoplot Maps Bioconductor data structures to plotsgrammar Mix and match to create custom plots

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 8 / 18

Page 25: Michael 2011 japan

Outline

1 Motivation

2 High-level Plots

3 Grammar Components

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 9 / 18

Page 26: Michael 2011 japan

Basic Plots

Gene Structures Read Alignments Sequence Multiple

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 10 / 18

Page 27: Michael 2011 japan

Basic Plots

Gene Structures Read Alignments Sequence Multiple

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 10 / 18

Page 28: Michael 2011 japan

Basic Plots

Gene Structures Read Alignments Sequence Multiple

0

10

20

30

40

50

60

120928000 120930000 120932000 120934000 120936000 120938000

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 10 / 18

Page 29: Michael 2011 japan

Basic Plots

Gene Structures Read Alignments Sequence Multiple

120928700 120928750 120928800 120928850

A

C

G

T

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 10 / 18

Page 30: Michael 2011 japan

Basic Plots

Gene Structures Read Alignments Sequence Multiple

CGTAGGAGAATCCGGTGTCCAGTTCGCTGGGCAGACTTCTCCATGTGTTT

120928690 120928700 120928710 120928720 120928730 120928740

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 10 / 18

Page 31: Michael 2011 japan

Basic Plots

Gene Structures Read Alignments Sequence Multiple

0

10

20

30

40

50

60

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 10 / 18

Page 32: Michael 2011 japan

Overview Plots

Grand Linear Karyogram Circular

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 11 / 18

Page 33: Michael 2011 japan

Overview Plots

Grand Linear Karyogram Circular

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 11 / 18

Page 34: Michael 2011 japan

Overview Plots

Grand Linear Karyogram Circular

12

34

56

78

910

1112

1314

1516

1718

1920

2122

X

5.0e+07 1.0e+08 1.5e+08 2.0e+08

seqReg

Exon

Intron

Other

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 11 / 18

Page 35: Michael 2011 japan

Overview Plots

Grand Linear Karyogram Circular

●●

●●

●●

0M 50M

100M

150M

200M

0M

50M

100M

150M

200M

0M

50M

100M

150M

0M

50M

100M

150M

0M

50M

100M

150M

0M

50M

100M

150M

0M

50M

100M150M0M50M100M

0M

50M

100M

0M

50M

100M

0M

50M

100M

0M

50M

100M

0M

50M

100M

0M

50M

100M

0M

50M

100M

0M

50M

0M

50M

0M

50M

0M

50M 0M

50M 0M

0M 50M

1

2

3

45

6

7

8910

11

12

1314

1516

17

18

1920

21 22

rearrangements

interchromosomal

intrachromosomal

tumreads●

4

6

8

10

12

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 11 / 18

Page 36: Michael 2011 japan

Specialized Plots

Mismatch summary + VCF Edge-linked Intervals

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 12 / 18

Page 37: Michael 2011 japan

Specialized Plots

Mismatch summary + VCF Edge-linked Intervals

0

5

10

15

Cou

nts

read

A

C

G

T

T

A T T A A G A A A G T A C C G T G T G A C A T C A C A G G C T G G G A G C T T G A G A G

25235720 25235725 25235730 25235735 25235740 25235745 25235750 25235755

mis

mat

chsn

pre

fere

nce

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 12 / 18

Page 38: Michael 2011 japan

Specialized Plots

Mismatch summary + VCF Edge-linked IntervalsE

xpre

ssio

n

200

400

600

800

1000

group

GM12878

K562

0

uc002rau.2

uc010yjg.1

uc002rav.2

uc010yjh.1

uc002raw.2

10930000 10940000 10950000 10960000 10970000 10980000

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 12 / 18

Page 39: Michael 2011 japan

Outline

1 Motivation

2 High-level Plots

3 Grammar Components

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 13 / 18

Page 40: Michael 2011 japan

The Wilkinson/Wickham Grammar of Graphics

Geom The shape used for drawing the data

Stat Transforms the data before plotting

Scale Maps data to geom aesthetics, guides like legends and axes

Coord Maps from geom space to device space

Facet Small multiples of data subsets (trellis)Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 14 / 18

Page 41: Michael 2011 japan

The Wilkinson/Wickham Grammar of Graphics

Geom The shape used for drawing the data

Stat Transforms the data before plotting

Scale Maps data to geom aesthetics, guides like legends and axes

Coord Maps from geom space to device space

Facet Small multiples of data subsets (trellis)Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 14 / 18

Page 42: Michael 2011 japan

The Wilkinson/Wickham Grammar of Graphics

Geom The shape used for drawing the data

Stat Transforms the data before plotting

Scale Maps data to geom aesthetics, guides like legends and axes

Coord Maps from geom space to device space

Facet Small multiples of data subsets (trellis)Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 14 / 18

Page 43: Michael 2011 japan

The Wilkinson/Wickham Grammar of Graphics

Geom The shape used for drawing the data

Stat Transforms the data before plotting

Scale Maps data to geom aesthetics, guides like legends and axes

Coord Maps from geom space to device space

Facet Small multiples of data subsets (trellis)Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 14 / 18

Page 44: Michael 2011 japan

The Wilkinson/Wickham Grammar of Graphics

Geom The shape used for drawing the data

Stat Transforms the data before plotting

Scale Maps data to geom aesthetics, guides like legends and axes

Coord Maps from geom space to device space

Facet Small multiples of data subsets (trellis)Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 14 / 18

Page 45: Michael 2011 japan

The Wilkinson/Wickham Grammar of Graphics

Geom The shape used for drawing the data

Stat Transforms the data before plotting

Scale Maps data to geom aesthetics, guides like legends and axes

Coord Maps from geom space to device space

Facet Small multiples of data subsets (trellis)Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 14 / 18

Page 46: Michael 2011 japan

A Grammar of Graphics for GenomicsExtensions are marked in red

1

2

48.245 Mb 48.250 Mb 48.255 Mb 48.260 Mb 48.265 Mb 48.270 Mb

strand

+

statistical transformation:

geometric object: chevron

geometric object:alignment

Y scale: discrete from stepping

geometric object: rect

stepping

X scale: sequence

color scale: discretefrom strand

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 15 / 18

Page 47: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 48: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

NM_006793(GeneID:10935)

NM_014098(GeneID:10935)

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 49: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

NM_006793(GeneID:10935)

NM_014098(GeneID:10935)

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 50: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

NM_006793(GeneID:10935)

NM_014098(GeneID:10935)

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 51: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

NM_006793(GeneID:10935)

NM_014098(GeneID:10935)

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 52: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

NM_006793(GeneID:10935)

NM_014098(GeneID:10935)

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 53: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

orig

inal

redu

ced

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 54: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

step

ping

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 55: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

0

10

20

30

40

50

60

120928000 120930000 120932000 120934000 120936000 120938000

Cov

erag

e

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 56: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

0

10

20

30

40

50

60

120928000 120930000 120932000 120934000 120936000 120938000

Cou

nts

A

C

G

T

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 57: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

0

10

20

30

40

50

60

Cou

nts

A

C

G

T

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 58: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

orig

inal

trun

cate

d

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 59: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

0

500

1000

1500

2000

0

500

1000

1500

2000

normaltumor

score

500

1000

1500

novel

FALSE

TRUE

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 60: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

10 11

0e+00

2e+05

4e+05

6e+05

0e+00 5e+07 1e+08 0e+00 5e+07 1e+08

Cov

erag

e

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 61: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

0e+00

2e+05

4e+05

6e+05

10 11

Cov

erag

e

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 62: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

50

100

150

200

1 2 3 4 5 6Samples

Fea

ture

s

−5000

0

5000

value

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 63: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

chr10 chr10

chr1

0

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 64: Michael 2011 japan

Components of the Genomic Grammar

Geom: alignment chevron arch arrow arrowrectStat: gene reduce stepping coverage mismatch tableScale: sequence genome fold-change giemsaCoord: truncate-gapsLayout: tracks range-facet

chr10 chr10

chr1

0

0

10

20

30

40

50

60

Cou

nts

A

C

G

T

120.928 Mb 120.93 Mb 120.932 Mb 120.934 Mb 120.936 Mb 120.938 Mb

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 16 / 18

Page 65: Michael 2011 japan

Summary

• The ggbio package is a toolkit for plotting genomic data andannotations

• Available as part of the Bioconductor project

• Easy to use and flexible enough to handle the diverse use casesencountered in genomics

• Useful plots are automatically generated from Bioconductor datastructures using reasonable defaults

• New types of plots can be constructed from grammar primitivesspecially designed for genomics

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 17 / 18

Page 66: Michael 2011 japan

Acknowledgements

Tengfei YinDi Cook

Robert Gentleman

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 18 / 18