Variation structure

Variation structure Gabor T. Marth Department of Biology, Boston College [email protected] BI420 – Introduction to Bioinformatics


Page 1: Variation structure

Variation structure

Gabor T. Marth

Department of Biology, Boston College [email protected]

BI420 – Introduction to Bioinformatics

Page 2: Variation structure

Human variation structure is heterogeneous

chromosomal averages

polymorphism density along chromosomes

Page 3: Variation structure

Heterogeneity at the level of distributions

[Figure: two distributions. Marker density, ranging from “sparse” to “dense” (x-axis ticks at 4 kb, 8 kb, 12 kb, 16 kb). Allele frequency, ranging from “rare” to “common” (x-axis categories 1 to 10).]

Page 4: Variation structure

What explains nucleotide diversity?

[Figure, three panels: SNP rate [per 10,000 bp] plotted against G+C content [%], CpG content [%], and recombination rate [per Mb].]

G+C nucleotide content

CpG di-nucleotide content

recombination rate

functional constraints

3’ UTR          5.00 x 10^-4
5’ UTR          4.95 x 10^-4
Exon, overall   4.20 x 10^-4
Exon, coding    3.77 x 10^-4

synonymous 366 / 653
non-synonymous 287 / 653

The variance is so high that these quantities are poor predictors of nucleotide diversity in local regions. Hence random processes are likely to govern the basic shape of the genome variation landscape: (random) genetic drift.

Page 5: Variation structure

Components of drift: Genealogy

present generation

in a randomly mating population, the genealogy evolves in a non-deterministic fashion

Page 6: Variation structure

Components of drift: Mutation

mutations randomly “drift”: they die out, rise to higher frequency, or become fixed
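The drifting behavior described on this slide can be illustrated with a small simulation. This is a minimal sketch assuming a Wright-Fisher style model (constant population size, random mating, no selection), which the slides do not name explicitly. The function name `wright_fisher`, the population of 100 gene copies, the seed, and the number of runs are all hypothetical choices made for illustration.

```python
import random

def wright_fisher(n_copies, start_count, rng, max_generations=100_000):
    """Follow one allele until it is lost (count 0) or fixed (count n_copies)."""
    count = start_count
    trajectory = [count]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:
            break  # the allele has died out or become fixed
        freq = count / n_copies
        # Each gene copy in the next generation picks its parent at random,
        # so it carries the allele with probability equal to the current frequency.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count)
    return trajectory

rng = random.Random(420)  # arbitrary seed, only for repeatability
runs = [wright_fisher(n_copies=100, start_count=1, rng=rng) for _ in range(2000)]
lost = sum(1 for t in runs if t[-1] == 0)
fixed = sum(1 for t in runs if t[-1] == 100)
print(f"lost: {lost}, fixed: {fixed}")
```

Under these assumptions most new mutations are lost within a few generations, and only a small minority drift all the way to fixation.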

Page 7: Variation structure

Modulators: Changing population size

mutations randomly “drift”: they die out, rise to higher frequency, or become fixed

genetic bottleneck

Page 8: Variation structure

Modulators: Population subdivision

subdivision

subdivision promotes private polymorphisms and skews allele frequencies

Page 9: Variation structure

Modulators: Recombination

accgttatgcaga acagttatgtaga

acagttatgcaga

accgttatgtaga

accgttatgcaga acagttatgtaga

recombination

different nucleotide sites within the same DNA segment no longer share the same genealogy
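The four sequences on this slide are consistent with a single crossover between two parental haplotypes. The sketch below reproduces that reading; the function name `crossover` and the breakpoint position are assumptions chosen for illustration, and any breakpoint between the two variable sites gives the same result.

```python
def crossover(hap_a, hap_b, breakpoint):
    """Single crossover: exchange the segments to the right of `breakpoint`."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have equal length")
    return (hap_a[:breakpoint] + hap_b[breakpoint:],
            hap_b[:breakpoint] + hap_a[breakpoint:])

parent_1 = "accgttatgcaga"
parent_2 = "acagttatgtaga"

# The two haplotypes differ at index 2 (c/a) and index 9 (c/t);
# breakpoint=6 is a hypothetical position between those sites.
rec_1, rec_2 = crossover(parent_1, parent_2, breakpoint=6)
print(rec_1, rec_2)
```

After the crossover, the left and right variable sites on each recombinant descend from different parental chromosomes, which is why they no longer share one genealogy.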

Page 10: Variation structure

Modulators: Natural selection

negative (purifying) selection

positive selection

the genealogy is no longer independent of (and hence cannot be decoupled from) the mutation process

Page 11: Variation structure

Modeling ancestral processes

“forward simulations”

the “Coalescent” process

By focusing on a small sample, complexity of the relevant part of the ancestral process is greatly reduced. There are, however, limitations.
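To make the reduction in complexity concrete, here is a minimal sketch of the standard neutral coalescent for a sample of n lineages. It assumes constant population size, no recombination, and no selection, and measures time in units of 2N generations. The function name `coalescent_times`, the sample size, the seed, and the replicate count are hypothetical. Only the sample's lineages are tracked, never the whole population.

```python
import random

def coalescent_times(n_samples, rng):
    """Waiting times (units of 2N generations) while k = n, n-1, ..., 2 lineages remain."""
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that could coalesce
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(420)  # arbitrary seed, only for repeatability
n = 10
replicates = 20_000
# Time to the most recent common ancestor (TMRCA) is the sum of the waiting times.
mean_tmrca = sum(sum(coalescent_times(n, rng)) for _ in range(replicates)) / replicates
expected_tmrca = 2 * (1 - 1 / n)  # analytical expectation under the same assumptions
print(f"simulated mean TMRCA: {mean_tmrca:.3f}, expected: {expected_tmrca:.3f}")
```

Each replicate needs only n - 1 random draws, whereas a forward simulation has to follow every individual in every generation.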

Page 12: Variation structure

Inferences from variation data

larger population size (N) -> more mutations -> higher diversity (θ)

larger mutation rate (μ) -> more mutations -> higher diversity (θ)

higher diversity -> larger population size OR higher mutation rate (θ = 4Nμ)
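A short numerical sketch of θ = 4Nμ shows why diversity alone cannot separate the two explanations. The values of N and μ below are hypothetical, chosen only to make the arithmetic visible, and the function names are illustrative.

```python
def theta(pop_size, mutation_rate):
    """Diversity parameter: theta = 4 * N * mu."""
    return 4 * pop_size * mutation_rate

def pop_size_from_theta(theta_value, mutation_rate):
    """Invert theta = 4 * N * mu for N, which requires an assumed mu."""
    return theta_value / (4 * mutation_rate)

# Hypothetical baseline values, not taken from the slides.
n_base = 10_000
mu_base = 2e-8

baseline = theta(n_base, mu_base)
larger_population = theta(2 * n_base, mu_base)   # double N, same mu
higher_mutation = theta(n_base, 2 * mu_base)     # same N, double mu

print(baseline, larger_population, higher_mutation)
print(pop_size_from_theta(larger_population, mu_base))
```

Doubling N and doubling μ give the same θ, so inferring population size from observed diversity requires an independent estimate of the mutation rate.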

Page 13: Variation structure

Ancestral inference: modeling

[Figure: four demographic histories, drawn from past to present: stationary, expansion, collapse, bottleneck. For each history the figure shows the AFS (direct form), plotted over 1 to 10, and the MD (simulation), plotted over 0 to 10.]
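For the stationary history, the allele frequency spectrum (AFS) has a well-known closed form under the standard neutral model: the expected number of sites at which the derived allele appears i times in a sample is proportional to 1/i. The sketch below computes that spectrum and folds it into minor allele counts. The sample size of 20 chromosomes is a hypothetical choice that yields minor allele counts 1 to 10, and the function names are illustrative. The formulas for the expansion, collapse, and bottleneck histories are not reproduced here.

```python
def expected_afs(n_chromosomes):
    """Expected share of SNPs with derived-allele count i = 1..n-1 (stationary, neutral)."""
    weights = [1 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]

def fold(afs):
    """Merge derived counts i and n - i into a minor-allele-count spectrum."""
    n = len(afs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        # When i == n - i the class is its own mirror image and is counted once.
        folded.append(afs[i - 1] if i == j else afs[i - 1] + afs[j - 1])
    return folded

afs = expected_afs(20)       # hypothetical sample of 20 chromosomes
folded_afs = fold(afs)       # minor allele counts 1..10
for count, share in enumerate(folded_afs, start=1):
    print(count, round(share, 3))
```

Departures from this stationary shape, such as an excess or deficit of rare alleles, are the signal used to distinguish the other histories.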

Page 14: Variation structure

Ancestral inference: model fitting

[Figure: allele frequency spectrum fit; x-axis: minor allele count (1 to 10), y-axis: 0 to 0.15. Models shown: bottleneck; modest but uninterrupted expansion.]

Page 15: Variation structure

Allelic association

accgttatgcaga

acagttatgtaga

acagttatgcaga

accgttatgtaga

possible allele combinations (2-marker haplotypes)

higher recombination rate (r)

Page 16: Variation structure

Allelic association: LD

[Figure: r² (0.0 to 0.8) as a function of recombination fraction and distance (kb), on a logarithmic axis, for European, Asian, and African American samples.]

measure of allelic association: “linkage disequilibrium (LD)”
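One common way to quantify LD between two biallelic markers is r², the squared correlation between alleles at the two sites. The sketch below computes it from a list of haplotypes, using the sequences from the allelic association slide. The function name `r_squared` and the marker positions (indices 2 and 9, the two sites where those sequences differ) are assumptions for illustration.

```python
from collections import Counter

def r_squared(haplotypes):
    """r^2 between two biallelic markers, given 2-marker haplotypes such as "cc" or "at"."""
    n = len(haplotypes)
    a_allele = haplotypes[0][0]  # reference allele at marker 1
    b_allele = haplotypes[0][1]  # reference allele at marker 2
    counts = Counter(haplotypes)
    p_ab = counts[a_allele + b_allele] / n
    p_a = sum(1 for h in haplotypes if h[0] == a_allele) / n
    p_b = sum(1 for h in haplotypes if h[1] == b_allele) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic in the sample")
    return d * d / denom

def two_marker(seq):
    """Extract the alleles at the two variable sites (assumed indices 2 and 9)."""
    return seq[2] + seq[9]

parental = [two_marker(s) for s in ("accgttatgcaga", "acagttatgtaga")]
all_four = [two_marker(s) for s in
            ("accgttatgcaga", "acagttatgtaga", "acagttatgcaga", "accgttatgtaga")]

print(r_squared(parental))   # only the two parental haplotypes present
print(r_squared(all_four))   # all four combinations equally frequent
```

With only the two parental haplotypes the markers are perfectly associated, and once recombination has produced all four combinations in equal proportion the association disappears.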

Page 17: Variation structure

Haplotype structure

“haplotype block”