SS 2008 lecture 4

Biological Sequence Analysis

V4 Genome of Arabidopsis thaliana

Review of lecture V3 ...

- What are Tandem repeats?

- How does one find CpG islands?

- What are the Gardiner-Garden/Frommer and Takai/Jones parameters?

- Why do we need t-tests?

- What are the findings of Hutter et al. (2006)?
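As a reminder for the CpG island questions above, the following is a minimal sketch of the usual window test, not the code used in lecture V3. It assumes the commonly quoted thresholds (Gardiner-Garden and Frommer: length at least 200 bp, GC content above 50%, observed/expected CpG above 0.6; Takai and Jones: length at least 500 bp, GC content at least 55%, observed/expected CpG at least 0.65). The function names and the uniform use of `>=` at the thresholds are illustrative assumptions.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = c * g / n           # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


# Threshold sets as commonly quoted (assumption: boundaries treated as >=).
CRITERIA = {
    "gardiner_garden_frommer": {"min_len": 200, "min_gc": 0.50, "min_oe": 0.60},
    "takai_jones": {"min_len": 500, "min_gc": 0.55, "min_oe": 0.65},
}


def looks_like_cpg_island(seq, criteria_name):
    """Check one candidate region against one of the two threshold sets."""
    crit = CRITERIA[criteria_name]
    gc_fraction, obs_exp = cpg_stats(seq)
    return (
        len(seq) >= crit["min_len"]
        and gc_fraction >= crit["min_gc"]
        and obs_exp >= crit["min_oe"]
    )
```

A real scan would slide such a window along the genome and merge adjacent hits. The stricter Takai/Jones set mainly excludes short GC-rich repeats.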

Arabidopsis thaliana

Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology.

Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish.

Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology.

TAIR

Some useful statistics for Arabidopsis thaliana

– Small genome (about 125 Mb in total, of which 114.5 Mb had been sequenced by 2000).

– Extensive genetic and physical maps of all 5 chromosomes.

– A rapid life cycle (about 6 weeks from germination to mature seed).

– Prolific seed production and easy cultivation in restricted space.

– Efficient transformation methods utilizing Agrobacterium tumefaciens.

– A large number of mutant lines and genomic resources, many of which are available from Stock Centers.

– Multinational research community of academic, government and industry laboratories.

• Such advantages have made Arabidopsis a model organism for studies of the cellular and molecular biology of flowering plants. TAIR collects and makes available the information arising from these efforts.

TAIR

Arabidopsis thaliana genome sequence

Representation of the Arabidopsis chromosomes. Sequenced portions are red, telomeric and centromeric regions are light blue, heterochromatic knobs are shown black and the rDNA repeat regions are magenta.

Left: DAPI-stained chromosomes.

Gene density ('Genes') ranged from 38 per 100 kb to 1 gene per 100 kb; expressed sequence tag matches ('ESTs') ranged from more than 200 per 100 kb to 1 per 100 kb.

Transposable element densities ('TEs') ranged from 33 per 100 kb to 1 per 100 kb.

Mitochondrial and chloroplast insertions ('MT/CP') were assigned black and green tick marks, respectively.

Transfer RNAs and small nucleolar RNAs ('RNAs') were assigned black and red tick marks, respectively.

Nature 408, 796 (2000)
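The per-100-kb densities in this figure are simple window counts. The sketch below shows one way such a track could be computed from feature start coordinates. It is an illustration only, not the procedure used for the published figure; the function name, the 0-based coordinates and the choice of assigning each feature to the window containing its start position are assumptions.

```python
from collections import Counter


def density_per_window(start_positions, chrom_length, window=100_000):
    """Count features per fixed-size window along one chromosome.

    start_positions: 0-based start coordinates of features (genes, ESTs, TEs).
    Returns a list with one count per window, in chromosome order.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(
        pos // window for pos in start_positions if 0 <= pos < chrom_length
    )
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical coordinates, for illustration only.
example_starts = [0, 50_000, 99_999, 100_000, 250_000]
example_track = density_per_window(example_starts, chrom_length=300_000)
```

With 100 kb windows, a value of 38 in the gene track corresponds to the densest regions described above, and values near 1 to gene-poor regions.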

Arabidopsis thaliana genome sequence

Nature 408, 796 (2000)

The proportion of Arabidopsis proteins having related counterparts in eukaryotic genomes varies by a factor of 2 to 3 depending on the functional category. Only 8–23% of Arabidopsis proteins involved in transcription have related genes in other eukaryotic genomes, reflecting the independent evolution of many plant transcription factors. In contrast, 48–60% of genes involved in protein synthesis have counterparts in the other eukaryotic genomes, reflecting highly conserved gene functions. The relatively high proportion of matches between Arabidopsis and bacterial proteins in the categories 'metabolism' and 'energy' reflects both the acquisition of bacterial genes from the ancestor of the plastid and high conservation of sequences across all species. Finally, a comparison between unicellular and multicellular eukaryotes indicates that Arabidopsis genes involved in cellular communication and signal transduction have more counterparts in multicellular eukaryotes than in yeast, reflecting the need for sets of genes for communication in multicellular organisms.

Many genes were duplicated

Nature 408, 796 (2000)

Segmental duplication

Nature 408, 796 (2000)

Segmentally duplicated regions in the Arabidopsis genome.

Individual chromosomes are depicted as horizontal grey bars (with chromosome 1 at the top); centromeres are marked black.

Coloured bands connect corresponding duplicated segments. Similarities between the rDNA repeats are excluded. Duplicated segments in reversed orientation are connected with twisted coloured bands.

Membrane channels and transporters

Nature 408, 796 (2000)

Transporters in the plasma and intracellular membranes of Arabidopsis are responsible for the acquisition, redistribution and compartmentalization of organic nutrients and inorganic ions, as well as for the efflux of toxic compounds and metabolic end products, energy and signal transduction. Unlike animals, which use a sodium ion P-type ATPase pump to generate an electrochemical gradient across the plasma membrane, plants and fungi use a proton P-type ATPase pump to form a large membrane potential. Plant secondary transporters are therefore typically coupled to protons rather than to sodium.

- Almost half of the Arabidopsis channel proteins are aquaporins, which emphasizes the importance of hydraulics in a wide range of plant processes.

- Compared with other sequenced organisms, Arabidopsis has 10-fold more predicted peptide transporters, primarily of the proton-dependent oligopeptide transport (POT) family, emphasizing the importance of peptide transport or indicating that there is broader substrate specificity than previously realized.

- There are nearly 1,000 Arabidopsis genes encoding Ser/Thr protein kinases, suggesting that peptides may have an important role in plant signalling.

What is TAIR*?

• NSF-funded project begun in 1999

• Web resource for Arabidopsis data and stocks

• Literature-based manual annotation of gene function

• Genome annotation (gene structure, computational gene function)

*

URL

The following slides were borrowed from a talk at the TAIR7 workshop by Eva Huala & Donghui Li.

Portals

Tools

Search

Names

Description

GO annotations

Expression

Sequences

Maps

Mutations

Seed lines

Seed lines

Links to other sites

Comments

References

GBrowse - coming soon

Overview of releases to date

Release                        Nature   TIGR1    TIGR2    TIGR3    TIGR4    TIGR5    TAIR6    TAIR7
Date                           12/00    8/01     1/02     8/02     4/03     1/04     11/05    4/07
Protein coding genes           25,498   25,554   26,156   27,117   27,170   26,207   26,541   26,819
Transposons and pseudogenes    NA       1,274    1,305    1,967    2,218    3,786    3,818    3,889
Alternatively spliced genes    NA       0        28       162      1,267    2,330    3,159    3,866
Gene density (kb per gene)     4.50     4.55     4.48     4.32     4.38     4.54     4.48     4.44
Avg. exons per gene            5.20     5.23     5.25     5.24     5.31     5.42     5.64     5.79
Avg. exon length (bp)          250      256      265      266      279      276      269      268
Avg. intron length (bp)        168      168      167      166      166      164      164      165

TAIR7: 26,819 protein coding genes; 3,866 alternatively spliced.
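To make the trend in the release overview easier to read, the sketch below computes the share of protein coding genes annotated as alternatively spliced in each release. The numbers are copied from the table; the variable and function names are illustrative assumptions, and `None` stands in for the "NA" entry.

```python
# Values copied from the release overview table; None marks "NA".
RELEASES = ["Nature", "TIGR1", "TIGR2", "TIGR3", "TIGR4", "TIGR5", "TAIR6", "TAIR7"]
PROTEIN_CODING = [25498, 25554, 26156, 27117, 27170, 26207, 26541, 26819]
ALT_SPLICED = [None, 0, 28, 162, 1267, 2330, 3159, 3866]


def alt_spliced_fractions():
    """Map each release to the fraction of protein coding genes with splice variants."""
    fractions = {}
    for name, genes, spliced in zip(RELEASES, PROTEIN_CODING, ALT_SPLICED):
        fractions[name] = None if spliced is None else spliced / genes
    return fractions


fractions = alt_spliced_fractions()
for name in RELEASES:
    value = fractions[name]
    label = "NA" if value is None else f"{100 * value:.1f}%"
    print(f"{name:7s} {label}")
```

For TAIR7 this comes to roughly 14.4% (3,866 of 26,819). The gene count changes little between releases, so the rise mostly reflects improved annotation of splice variants.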