Bioinformatics and Sequencing Relevant to SolCAP
![Page 1: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/1.jpg)
C. Robin Buell
Department of Plant Biology
Michigan State University
East Lansing, MI 48824
![Page 2: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/2.jpg)
An Overview of DNA Sequencing
![Page 3: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/3.jpg)
Prokaryotic DNA
http://en.wikipedia.org/wiki/Image:Prokaryote_cell_diagram.svg
Plasmid
![Page 4: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/4.jpg)
Eukaryotic DNA
http://en.wikipedia.org/wiki/Image:Plant_cell_structure_svg.svg
![Page 5: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/5.jpg)
DNA Structure
The two strands of a DNA molecule are held together by weak bonds (hydrogen bonds) between the nitrogenous bases, which are paired in the interior of the double helix.
The two strands of DNA are antiparallel: they run in opposite directions. The carbon atoms of the deoxyribose sugars are numbered for orientation.
http://en.wikipedia.org/wiki/Image:DNA_chemical_structure.png
![Page 6: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/6.jpg)
Sequencing DNA
The goal of sequencing DNA is to determine the order of the bases, or nucleotides, that form the inside of the double-helix molecule.
High-throughput sequencing methods:
-Sanger/Dideoxy
-Next Generation (NextGen)
![Page 7: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/7.jpg)
Whole Genome Shotgun Sequencing
• Start with a whole genome
• Shear the DNA into many different, random segments.
• Sequence each of the random segments.
• Then, put the pieces back together again in their original order using a computer
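The last step, putting the pieces back together by computer, can be illustrated with a toy greedy overlap assembler. This is only a sketch under simplifying assumptions (error-free reads, all from the same strand, no repeats); the function names and the `min_len` threshold are illustrative and do not describe any particular assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Hypothetical 25-base "genome" sheared into four overlapping fragments.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
fragments = [genome[12:22], genome[0:10], genome[17:25], genome[6:16]]
contigs = greedy_assemble(fragments)
```

With real data, sequencing errors and repeated sequence make the problem much harder, which is one reason high coverage is needed.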
![Page 8: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/8.jpg)
![Page 9: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/9.jpg)
SEQUENCER OUTPUT
ASSEMBLE FRAGMENTS
TAGCTAGC
AGCTAGC
AGCTAGGCTC
AGCTCGCTAGCTAGCTAGCTAGCTAGGCTC
AGCTCGCTATAGCTAGCTA
CTAGCTAGCTAGGCTCGCTAGCTAGCT
CTCGCTAGCTAG
AGCTCGCTAGCTAGCTAGCTAGC
AGCTAGGCTC AGCTCGCTA
CTAGCTAGCTAGGCTC
GCTAGCTAGCT
AGCTCGCTAGCTA
TAGCTAGCTAAGCTAGC
CTCGCTAGCTAGTAGCTAGC
GCTAGCTAGC
Fill in any gaps
Annotate genes: Gene 1, Gene 2, Gene 3, …
![Page 10: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/10.jpg)
Theory Behind Shotgun Sequencing
Haemophilus influenzae (1.83 Mb)
Base coverage   Unsequenced (%)
1X              37%
2X              13%
5X              0.67%
6X              0.25%
7X              0.09%
For a 1.83 Mb genome, 6X coverage is 10.98 Mb of sequence, or about 22,000 sequencing reactions (500 bp average read) from 11,000 clones (1.5-2.0 kb insert).
[Plot: number of gaps (y-axis, 0 to 3,000) versus number of sequences (x-axis, 0 to 80,000)]
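The unsequenced percentages on this slide are consistent with a simple Poisson model in which the fraction of bases never covered at coverage c is e^(-c). The slide does not name the model, so treat that as an assumption; the function names below are illustrative. The sketch reproduces the percentages and the read count for 6X coverage.

```python
import math


def unsequenced_fraction(coverage):
    """Expected fraction of the genome with no read, assuming a Poisson model."""
    return math.exp(-coverage)


def reads_needed(genome_bp, coverage, read_len_bp):
    """Number of reads needed to reach the given average coverage."""
    return math.ceil(genome_bp * coverage / read_len_bp)


if __name__ == "__main__":
    for c in (1, 2, 5, 6, 7):
        print(f"{c}X: {100 * unsequenced_fraction(c):.2f}% unsequenced")
    # 1.83 Mb genome, 6X coverage, 500 bp average read
    print(reads_needed(1_830_000, 6, 500), "reads")
```

At 6X this gives 21,960 reads, which the slide rounds to 22,000 sequencing reactions.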
![Page 11: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/11.jpg)
Sanger Dideoxy Sequencing Reactions
-Initial dideoxy sequencing used radioactive dATP and 4 separate reactions (ddATP, ddTTP, ddCTP, ddGTP), separated in 4 lanes on an acrylamide gel and detected by autoradiography
-Newer technologies use 4 fluorescently labeled bases, separation in capillaries, and detection with a CCD camera
![Page 12: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/12.jpg)
Sanger Dideoxy DNA sequencing
![Page 13: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/13.jpg)
Data Analysis
• A chromatogram is produced and the bases are called
• Software assigns a quality value to each base
• Phred & TraceTuner:
  • Read DNA sequencer traces
  • Call bases
  • Assign base quality values
  • Write basecalls and quality values to output files
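Phred-style quality values relate to the estimated probability P that a base call is wrong by Q = -10 log10(P), so Q20 means a 1 in 100 chance of error and Q30 means 1 in 1,000. The sketch below converts between the two and trims a low-quality read tail; the cutoff of 20 and the function names are illustrative assumptions, not settings from the slides.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality value for a given error probability."""
    return -10 * math.log10(p)


def trim_low_quality_tail(bases, quals, min_q=20):
    """Remove trailing bases whose quality is below min_q (hypothetical cutoff)."""
    end = len(bases)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return bases[:end], quals[:end]


# Hypothetical read whose quality drops off at the 3' end.
read, quality = trim_low_quality_tail("ACGTAC", [30, 32, 35, 28, 12, 8])
```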
![Page 14: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/14.jpg)
[Figure: example chromatograms, labeled GOOD and BAD]
![Page 15: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/15.jpg)
454 Genome Sequencing System
• Library prep, amplification and sequencing: 2-4 days
• Single sample preparation from bacterial to human genomic DNA
• Single amplification per genome with no cloning or cloning artifacts
• Picoliter volume molecular biology
• 400 Mb per run (4-5 hr); less than $15,000 per run
• Read lengths 200-230 bases; new Titanium platform, 400 Mb per run, 400-500 bases per read
• Massively parallel imaging, fluidics and data analysis
• Requires high genome coverage for good assembly
• Error rate of 1-2%
• Problem with homopolymers
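Homopolymers (runs of the same base) are difficult for pyrosequencing because the length of a run has to be inferred from signal intensity. One way to see where a sequence may be affected is to list its homopolymer runs. This is a minimal sketch; the minimum run length of 4 is an arbitrary illustrative choice.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of identical bases >= min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        n = len(list(group))
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs


# Hypothetical sequence with a run of five T's and a run of four A's.
runs = homopolymer_runs("ACGTTTTTACAAAAG")
```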
![Page 16: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/16.jpg)
454-Pyrosequencing
1. Construct single-stranded, adaptor-ligated DNA
2. Perform emulsion PCR
3. Deposit DNA beads into the PicoTiter™Plate
4. Sequencing by synthesis: simultaneous sequencing of the entire genome in hundreds of thousands of picoliter-size wells
5. Pyrophosphate signal generation
![Page 17: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/17.jpg)
Solexa/Illumina Sequencing
• Sequencing by synthesis (not chain termination)
• Generate up to 12 Gb per run
![Page 18: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/18.jpg)
![Page 19: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/19.jpg)
Other “Next Generation” Sequencing Technologies
-SOLiD by Applied Biosystems: short reads (~25-75 nucleotides)
-Helicos: short reads (<50 nucleotides)
-Pacific Biosciences: long reads (several kilobases)
![Page 20: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/20.jpg)
How much sequence is needed to assemble a eukaryotic genome?
-Depends on the genome size of the organism (genes plus repeats), ploidy level, heterozygosity, and desired quality
Genome sizes shown in the figure:
Wheat: 16,000 Mb
Arabidopsis: 130 Mb
John Doe: 2,500 Mb
5 Mb
Rice: 430 Mb
Potato: 850 Mb
![Page 21: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/21.jpg)
Eukaryotic Genomes and Gene Structures
[Figure: genes along a chromosome, separated by intergenic regions]
![Page 22: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/22.jpg)
Expressed Sequence Tags (ESTs): Sampling the Transcriptome and Genic Regions
What is an EST? A single-pass sequence from cDNA of a specific tissue, stage, environment, etc.
With multiple tissues and states, and enough sequences, one can ask quantitative questions.
Workflow: cDNA library in E. coli → pick individual clones → template prep
[Figure: insert in pBluescript, flanked by T7 and T3 sites]
![Page 23: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/23.jpg)
Uses of EST sequencing:
-Gene discovery
-Digital northerns/insights into transcriptome
-Genome analyses, especially annotation of genomic DNA
-SNP discovery in genic regions
Issues with EST sequencing:
-Inherent low quality due to single-pass nature
-Not 100% full-length cDNA clones
-Redundant sequencing of abundant transcripts
Address through clustering/assembly to build consensus sequences = Gene Index, Unigene Set, Transcript Assembly
![Page 24: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/24.jpg)
Locus/Gene
Gene models
Full length cDNAs
Expressed Sequence Tags
![Page 25: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/25.jpg)
Types of Genomic/DNA-based Diagnostic Markers
1. Restriction Fragment Length Polymorphisms (RFLPs)
2. Random Amplification of Polymorphic DNA (RAPDs)
3. Cleaved Amplified Polymorphic Sequences (CAPS)
4. Amplified Fragment Length Polymorphisms (AFLPs)
5. Simple Sequence Repeats (SSRs; microsatellites)
6. Single Nucleotide Polymorphisms (SNPs)
![Page 26: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/26.jpg)
SSRs
-Specific primers that flank a simple sequence repeat (mono-, di-, tri-, tetra-, etc.), which has a higher likelihood of a polymorphism
-Amplify genomic DNA
-Separate on gel
-Look for size polymorphisms
http://www.nal.usda.gov/pgdic/Probe/v2n1/chart.gif
http://cropandsoil.oregonstate.edu/classes/css430/images/0902.jpg
![Page 27: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/27.jpg)
SSRs
-Computational prediction of SSRs in potato transcriptome data
-http://solanaceae.plantbiology.msu.edu/analyses_ssr_query.php
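Computational SSR prediction can be as simple as scanning each transcript for a short motif repeated in tandem. The sketch below uses a regular expression with a backreference; the minimum repeat counts per motif length are hypothetical values chosen for illustration and are not the thresholds used for the potato analysis.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for tandem repeats in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # Group 2 is the motif; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            # Skip motifs that are repeats of a shorter motif (e.g. "AA", "ATAT").
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)


# Hypothetical transcript with an (AT)7 and an (AAG)5 repeat.
transcript = "GGCC" + "AT" * 7 + "CCGA" + "AAG" * 5 + "TTC"
ssrs = find_ssrs(transcript)
```

Primers would then be designed in the unique sequence flanking each predicted repeat.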
![Page 28: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/28.jpg)
SNPs
-Specific primers
-Amplify genomic DNA
-Detect mismatch (many methods for this)
http://cmbi.bjmu.edu.cn/cmbidata/snp/images/SNP.gif
![Page 29: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/29.jpg)
Potato SNPs: Intra-varietal and inter-varietal
-Bulk of sequence data from ESTs (Sanger derived)
-Use computational methods to identify SNPs within existing potato ESTs
-http://solanaceae.plantbiology.msu.edu/analyses_snp.php
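One way to identify candidate SNPs computationally is to look down each column of a set of aligned ESTs and report positions where two or more bases are each seen in several reads. The sketch below assumes the reads are already aligned to the same length with "-" for gaps; the minimum count of 2 per allele is an illustrative assumption meant to guard against single-read sequencing errors, not the criterion used for the potato ESTs.

```python
from collections import Counter


def call_snps(aligned_reads, min_allele_count=2):
    """Return (position, {base: count}) for columns with >= 2 supported alleles."""
    snps = []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base in "ACGT")
        supported = {b: n for b, n in counts.items() if n >= min_allele_count}
        if len(supported) >= 2:
            snps.append((pos, supported))
    return snps


# Hypothetical alignment: column 2 is a G/C SNP; the single T in the last
# column is treated as a likely sequencing error.
alignment = ["ACGTAC", "ACGTAC", "ACCTAC", "ACCTAC", "ACGTAT"]
candidates = call_snps(alignment)
```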
![Page 30: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/30.jpg)
Illumina Paired End RNA-Seq
• Potato varieties: Atlantic, Premier, Snowden
• Two paired-end RNA-Seq runs were performed.
• Reads are 61 bp long
• Insert sizes:
  • Atlantic: 350 bp
  • Premier: 300 bp
  • Snowden: 300 bp
• Paired-end sequencing is carried out by an Illumina module that regenerates the clusters after the first run and sequences the clusters from the other end.
![Page 31: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/31.jpg)
Velvet Assemblies of Potato Illumina Sequences
• With a hash (k-mer) length of 31 and a minimum contig length of 150 bp:
  • Atlantic: 45,214 contigs
  • Premier: 54,917 contigs
  • Snowden: 58,754 contigs
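Velvet is a de Bruijn graph assembler: it breaks reads into overlapping words of length k and links words that overlap. Assuming the value 31 above is that word length, each 61 bp read contributes 31 k-mers. The sketch below shows a textbook de Bruijn construction ((k-1)-mers as nodes, k-mers as edges); it is not Velvet's internal data structure, and the function names are illustrative.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# A 61 bp read with k = 31 yields 61 - 31 + 1 = 31 k-mers.
n_kmers = len(kmers("A" * 61, 31))

# Tiny hypothetical example with k = 3.
graph = de_bruijn_edges(["ACGTAC", "GTACG"], 3)
```

Contigs correspond to unbranched paths through this graph.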
![Page 32: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/32.jpg)
Sequence quality: Viewing an Atlantic potato contig from the Velvet assembly
![Page 33: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/33.jpg)
Identify intra-varietal SNPs
Query          SNPs      Filtered SNPs
Atlantic Asm   224,748   150,669
Premier Asm    265,673   181,800
Snowden Asm    258,872   166,253
[Figure: alignment view showing an A/C SNP]
![Page 34: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/34.jpg)
Hawkeye Viewer – Visualizing SNPs
G/T SNP
![Page 35: Bioinformatics and Sequencing Relevant to SolCAP](https://reader034.fdocuments.in/reader034/viewer/2022051516/56812b46550346895d8f5fd6/html5/thumbnails/35.jpg)
Analyses in progress
SNP Identification:
-Identify inter-varietal SNPs using draft genome sequence from S. phureja
-Identify only biallelic SNPs
-Identify high confidence SNPs
-Identify SNPs that meet Infinium design requirements
SNP Selection:
-Annotate transcripts for gene function
-Identify candidate genes within SNP set
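The identification filters listed above can be sketched as a small pipeline step. The example keeps only biallelic SNPs and, as a stand-in for array design requirements, drops SNPs that have another candidate variant within a given distance. The 50-base spacing is a hypothetical placeholder and not the actual Infinium specification; the input format (position plus allele counts) is also an assumption.

```python
def filter_snps(snps, min_spacing=50):
    """Keep biallelic SNPs with no other candidate SNP within min_spacing bases.

    snps: list of (position, {base: read_count}) on a single contig.
    min_spacing: hypothetical design constraint, not a published requirement.
    """
    positions = [pos for pos, _ in snps]
    kept = []
    for pos, alleles in snps:
        if len(alleles) != 2:
            continue  # not biallelic
        if any(other != pos and abs(pos - other) < min_spacing
               for other in positions):
            continue  # another candidate variant lies too close
        kept.append((pos, alleles))
    return kept


# Hypothetical candidates: two SNPs too close together, one triallelic,
# and one isolated biallelic SNP.
candidate_snps = [
    (100, {"A": 5, "C": 4}),
    (120, {"G": 6, "T": 3}),
    (400, {"A": 4, "G": 3, "T": 3}),
    (800, {"C": 7, "T": 5}),
]
selected = filter_snps(candidate_snps)
```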