SS 2008 lecture 4
Biological Sequence Analysis1
V4 Genome of Arabidopsis thaliana
Review of lecture V3 ...
- What are tandem repeats?
- How does one find CpG islands?
- What are the Gardiner-Garden & Frommer and Takai & Jones parameters?
- Why do we need t-tests?
- What are the findings of Hutter et al. (2006)?
Arabidopsis thaliana
Arabidopsis thaliana is a small
flowering plant that is widely
used as a model organism in
plant biology.
Arabidopsis is a member of the
mustard (Brassicaceae) family,
which includes cultivated
species such as cabbage and
radish.
Arabidopsis is not of major
agronomic significance, but it
offers important advantages for
basic research in genetics and
molecular biology.
TAIR
Some useful statistics for Arabidopsis thaliana
– Small genome (114.5 Mb of the 125 Mb total), sequenced in 2000.
– Extensive genetic and physical maps of all 5 chromosomes.
– A rapid life cycle (about 6 weeks from germination to mature seed).
– Prolific seed production and easy cultivation in restricted space.
– Efficient transformation methods utilizing Agrobacterium tumefaciens.
– A large number of mutant lines and genomic resources, many of which are available from Stock Centers.
– Multinational research community of academic, government and industry laboratories.
• Such advantages have made Arabidopsis a model organism for studies of the cellular and molecular biology of flowering plants. TAIR collects and makes available the information arising from these efforts.
TAIR
Arabidopsis thaliana genome sequence
Representation of the Arabidopsis
chromosomes. Sequenced portions are
red, telomeric and centromeric regions
are light blue, heterochromatic knobs are
shown black and the rDNA repeat regions
are magenta.
Left: DAPI-stained chromosomes.
Gene density ("Genes") ranged from 38
per 100 kb to 1 gene per 100 kb;
expressed sequence tag matches
("ESTs") ranged from more than 200 per
100 kb to 1 per 100 kb.
Transposable element densities ("TEs")
ranged from 33 per 100 kb to 1 per 100
kb.
Mitochondrial and chloroplast insertions
("MT/CP") were assigned black and green
tick marks, respectively.
Transfer RNAs and small nucleolar RNAs
("RNAs") were assigned black and red
tick marks, respectively.
Nature 408, 796 (2000)
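The density tracks in this figure (genes, ESTs and TEs per 100 kb) boil down to counting features in fixed-size windows along each chromosome. The following is a minimal sketch, assuming each feature is represented by a single 0-based start coordinate in bp on one chromosome and that windows are non-overlapping; the function name `window_density` and the toy coordinates are hypothetical, and the original figure may have used a different (e.g. sliding) window scheme.

```python
from typing import List


def window_density(starts: List[int], chrom_length: int, window: int = 100_000) -> List[int]:
    """Count features per non-overlapping window along one chromosome.

    starts: 0-based start coordinates (bp) of features such as genes, ESTs or TEs.
    Each feature is assigned to the window containing its start position.
    The last window may be shorter than `window`.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for s in starts:
        if 0 <= s < chrom_length:  # ignore coordinates outside the chromosome
            counts[s // window] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 350 kb chromosome with a gene-dense first window.
    gene_starts = [1_000, 5_000, 20_000, 99_999, 150_000, 310_000]
    print(window_density(gene_starts, 350_000))  # [4, 1, 0, 1]
```

Plotting such counts for each chromosome gives tracks like "Genes" or "TEs"; with a 100 kb window the counts can be read directly as features per 100 kb.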
Arabidopsis thaliana genome sequence
Nature 408, 796 (2000)
The proportion of Arabidopsis proteins having related counterparts in eukaryotic genomes varies by a factor of 2 to 3 depending on the functional category. Only 8 to 23% of Arabidopsis proteins involved in transcription have related genes in other eukaryotic genomes, reflecting the independent evolution of many plant transcription factors. In contrast, 48 to 60% of genes involved in protein synthesis have counterparts in the other eukaryotic genomes, reflecting highly conserved gene functions. The relatively high proportion of matches between Arabidopsis and bacterial proteins in the categories 'metabolism' and 'energy' reflects both the acquisition of bacterial genes from the ancestor of the plastid and the high conservation of these sequences across all species. Finally, a comparison between unicellular and multicellular eukaryotes indicates that Arabidopsis genes involved in cellular communication and signal transduction have more counterparts in multicellular eukaryotes than in yeast, reflecting the need for sets of communication genes in multicellular organisms.
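Percentages like "8 to 23% of transcription proteins" come from counting, within each functional category, how many proteins have a significant match in another genome. A minimal sketch, assuming each protein already carries a category label and that the set of proteins with hits has been computed beforehand (for example by a similarity search with some score cutoff); the protein IDs, categories and the function `fraction_with_homologs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def fraction_with_homologs(category_of: Dict[str, str], has_hit: Set[str]) -> Dict[str, float]:
    """Return, per functional category, the fraction of proteins that have a hit.

    category_of: protein ID -> functional category label.
    has_hit: IDs of proteins with a related counterpart in the other genome(s).
    """
    total: Dict[str, int] = defaultdict(int)
    matched: Dict[str, int] = defaultdict(int)
    for protein, category in category_of.items():
        total[category] += 1
        if protein in has_hit:
            matched[category] += 1
    # matched is a defaultdict, so categories without any hit yield 0.0
    return {c: matched[c] / total[c] for c in total}


if __name__ == "__main__":
    categories = {
        "p1": "transcription", "p2": "transcription",
        "p3": "transcription", "p4": "transcription",
        "p5": "protein synthesis", "p6": "protein synthesis",
    }
    print(fraction_with_homologs(categories, {"p1", "p5"}))
    # {'transcription': 0.25, 'protein synthesis': 0.5}
```

Running the same count against several target genomes (e.g. other eukaryotes vs. bacteria) would give one fraction per category and genome, which is the kind of comparison summarized above.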
Many genes were duplicated
Nature 408, 796 (2000)
Segmental duplication
Nature 408, 796 (2000)
Segmentally duplicated regions in the Arabidopsis genome.
Individual chromosomes are depicted as horizontal grey bars (with chromosome 1
at the top), centromeres are marked black.
Coloured bands connect corresponding duplicated segments. Similarities between
the rDNA repeats are excluded. Duplicated segments in reversed orientation are
connected with twisted coloured bands.
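Duplicated segments like those in this figure show up as runs of homologous gene pairs whose order is conserved between two regions, either in the same orientation or reversed (the "twisted" bands). The sketch below is a deliberately simplified greedy chainer, assuming genes are indexed by their position along region A and region B and that homologous pairs are already known; the function `collinear_blocks` and its `max_gap` and `min_pairs` thresholds are hypothetical illustrations, not the procedure used in the Nature paper. Because it scans pairs in a single pass, interleaved blocks would break it; dedicated tools use more careful scoring.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene index in region A, gene index in region B)


def collinear_blocks(pairs: List[Pair], max_gap: int = 3,
                     min_pairs: int = 3) -> List[Tuple[str, List[Pair]]]:
    """Greedily chain homologous gene pairs into collinear blocks.

    A pair extends the current block if the A index advances by 1..max_gap and
    the B index either advances by 1..max_gap ('forward', same orientation)
    or decreases by 1..max_gap ('reverse', reversed orientation).
    Blocks with fewer than min_pairs pairs are discarded.
    """
    blocks = []
    for orientation, sign in (("forward", 1), ("reverse", -1)):
        # Order by A position; ties broken by B in the direction of the pass.
        ordered = sorted(pairs, key=lambda p: (p[0], sign * p[1]))
        current: List[Pair] = []
        for a, b in ordered:
            if current:
                last_a, last_b = current[-1]
                da = a - last_a
                db = sign * (b - last_b)
                if 1 <= da <= max_gap and 1 <= db <= max_gap:
                    current.append((a, b))
                    continue
                # Chain broken: keep it only if it is long enough.
                if len(current) >= min_pairs:
                    blocks.append((orientation, current))
            current = [(a, b)]
        if len(current) >= min_pairs:
            blocks.append((orientation, current))
    return blocks


if __name__ == "__main__":
    # Hypothetical pairs: one forward block and one reversed block.
    pairs = [(1, 101), (2, 102), (3, 104), (10, 210), (11, 209), (13, 207)]
    for orientation, block in collinear_blocks(pairs):
        print(orientation, block)
    # forward [(1, 101), (2, 102), (3, 104)]
    # reverse [(10, 210), (11, 209), (13, 207)]
```

A "reverse" block corresponds to a segment duplicated in reversed orientation, i.e. what the figure draws as a twisted band.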
Membrane channels and transporters
Nature 408, 796 (2000)
Transporters in the plasma and intracellular membranes of Arabidopsis are responsible for the acquisition, redistribution and compartmentalization of organic nutrients and inorganic ions, as well as for the efflux of toxic compounds and metabolic end products, and for energy and signal transduction. Unlike animals, which use a sodium ion P-type ATPase pump to generate an electrochemical gradient across the plasma membrane, plants and fungi use a proton P-type ATPase pump to form a large membrane potential. Accordingly, plant secondary transporters are typically coupled to protons rather than to sodium.
- Almost half of the Arabidopsis channel proteins are aquaporins, which emphasizes the importance of hydraulics in a wide range of plant processes.
- Compared with other sequenced organisms, Arabidopsis has 10-fold more predicted peptide transporters, primarily of the proton-dependent oligopeptide transport (POT) family, emphasizing the importance of peptide transport or indicating that there is broader substrate specificity than previously realized.
- Arabidopsis has nearly 1,000 genes encoding Ser/Thr protein kinases, suggesting that peptides may have an important role in plant signalling.
What is TAIR*?
• NSF-funded project begun in 1999
• Web resource for Arabidopsis data and stocks
• Literature-based manual annotation of gene function
• Genome annotation (gene structure, computational gene function)
* TAIR: The Arabidopsis Information Resource (URL)
The following slides were borrowed from a talk at the TAIR7 workshop by Eva Huala & Donghui Li
Portals
Tools
Search
Names
Description
GO annotations
Expression
Sequences
Maps
Mutations
Seed lines
Seed lines
Links to other sites
Comments
References
GBrowse - coming soon
Overview of releases to date

Release                        Nature   TIGR1    TIGR2    TIGR3    TIGR4    TIGR5    TAIR6    TAIR7
                               (12/00)  (8/01)   (1/02)   (8/02)   (4/03)   (1/04)   (11/05)  (4/07)
Protein coding genes           25,498   25,554   26,156   27,117   27,170   26,207   26,541   26,819
Transposons and pseudogenes    NA       1,274    1,305    1,967    2,218    3,786    3,818    3,889
Alternatively spliced genes    NA       0        28       162      1,267    2,330    3,159    3,866
Gene density (kb per gene)     4.50     4.55     4.48     4.32     4.38     4.54     4.48     4.44
Avg. exons per gene            5.20     5.23     5.25     5.24     5.31     5.42     5.64     5.79
Avg. exon length (bp)          250      256      265      266      279      276      269      268
Avg. intron length (bp)        168      168      167      166      166      164      164      165

26,819 protein coding genes
3,866 alternatively spliced
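Several rows of the release overview (alternatively spliced genes, average exons per gene, average exon and intron length) are simple summaries over the gene models of a release. A minimal sketch, assuming each gene is given as a list of transcripts and each transcript as a list of (start, end) exon coordinates, 1-based and inclusive; the data layout, the use of the first transcript as the representative model and the function `release_summary` are hypothetical, and the published releases may have counted exons and introns differently (for example over all isoforms).

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive coordinates in bp
Transcript = List[Exon]


def release_summary(genes: Dict[str, List[Transcript]]) -> Dict[str, float]:
    """Summarize gene models in the style of the release overview rows.

    Uses the first transcript of each gene as its representative model
    (an assumption). A gene counts as alternatively spliced if it has
    more than one transcript.
    """
    n_exons = exon_bp = n_introns = intron_bp = alt_spliced = 0
    for transcripts in genes.values():
        if len(transcripts) > 1:
            alt_spliced += 1
        exons = sorted(transcripts[0])
        n_exons += len(exons)
        exon_bp += sum(end - start + 1 for start, end in exons)
        # Introns are the gaps between consecutive exons of the same transcript.
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            n_introns += 1
            intron_bp += next_start - prev_end - 1
    n_genes = len(genes)
    return {
        "genes": n_genes,
        "alternatively_spliced_genes": alt_spliced,
        "avg_exons_per_gene": n_exons / n_genes,
        "avg_exon_length": exon_bp / n_exons,
        "avg_intron_length": intron_bp / n_introns if n_introns else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy annotation: geneA has two isoforms, geneB is single-exon.
    genes = {
        "geneA": [[(1, 100), (201, 300)], [(1, 100), (251, 300)]],
        "geneB": [[(1001, 1250)]],
    }
    print(release_summary(genes))
    # {'genes': 2, 'alternatively_spliced_genes': 1, 'avg_exons_per_gene': 1.5,
    #  'avg_exon_length': 150.0, 'avg_intron_length': 100.0}
```

Applying such a function to successive annotation releases would produce one column of the overview per release, which makes trends such as the rising number of alternatively spliced genes easy to track.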