MES7594-01 Genome Informatics I - Lecture IV. NGS basics Sangwoo Kim, Ph.D. Assistant Professor,...
Genome Informatics I (2015 Spring)
MES7594-01 Genome Informatics I
- Lecture IV. NGS basics
Sangwoo Kim, Ph.D.
Assistant Professor,
Severance Biomedical Research Institute, Yonsei University College of Medicine
Overview
• Goal of this lecture
– You will learn the basic technologies and properties of Next Generation Sequencing
• Sequencing technologies
– Sanger sequencing
– Next generation sequencing
  • Illumina sequencing
  • 454/Ion Torrent sequencing
  • Other sequencing
– Raw data (fastq)
  • Format/Phred Quality
– Practice
  • meet the raw data
SEQUENCING TECHNOLOGIES
Traditional Sequencing
1. Genomic DNA is fragmented, then cloned to a plasmid vector and used to transform E. coli
2. For each sequencing reaction, a single bacterial colony is picked and plasmid DNA isolated
3. Each cycle sequencing reaction takes place within a microliter-scale volume
Sanger Sequencing
Next Generation Sequencing
• No cloning
– DNA to be sequenced is used to construct a library of fragments that have synthetic DNAs (adapters) added covalently to each fragment end by use of DNA ligase
• Amplification can be done in parallel
– Library fragments are amplified in situ on a solid surface
• Sequencing can be done in parallel (in 3 iterative steps)
– a nucleotide addition step
– a detection step
– a wash step
Illumina Sequencing
https://www.youtube.com/watch?v=HMyCqWhwB8E
Ion Torrent Sequencing
1. DNA capture on beads
2. Single bead in a well
3. Attach one nucleotide (A/T/G/C) at a time
4. Detect pH change
   1. Measure the level of change for homopolymer detection
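The flow-and-detect steps above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the function names, the fixed flow order "TACG", and the noise-free signal are all hypothetical, and the strand being synthesized is given directly as a string. The signal for each flow is the number of bases incorporated, which is how the level of the pH change reflects a homopolymer run.

```python
def simulate_flows(synth_strand, flow_order="TACG", max_flows=400):
    """Toy model: flow one nucleotide at a time and record how many
    bases are incorporated (the idealized pH-change level)."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(synth_strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in a single flow,
        # so the signal level grows with the run length.
        while pos < len(synth_strand) and synth_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Turn (nucleotide, level) pairs back into a base sequence."""
    return "".join(base * level for base, level in signals)


signals = simulate_flows("TTAGC")
print(signals)              # level 2 on the first T flow marks the "TT" run
print(call_bases(signals))  # TTAGC
```

On a real instrument the level is a noisy analog measurement, which is why long homopolymers are harder to call than this sketch suggests.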
PacBio SMRT sequencing
zero-mode waveguide (ZMW)
http://www.pacificbiosciences.com/products/smrt-technology/
Nanopore sequencing
https://www.youtube.com/watch?v=3UHw22hBpAk
Comparison
NGS DATA
raw data (FASTQ)
FASTA format
A format for DNA (or protein) sequences
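As a minimal sketch of how FASTA is laid out: each record is a header line starting with ">" followed by one or more sequence lines. The reader below and the two example records are hypothetical illustrations, not course material.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            # Sequence may be wrapped over several lines; join them later.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Hypothetical example records (one DNA, one protein).
example = """>seq1 hypothetical DNA record
ACGTACGT
ACGT
>seq2 hypothetical protein record
MKVL
"""

records = list(read_fasta(example.splitlines()))
for name, seq in records:
    print(name, len(seq), seq)
```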
FASTQ format (NGS raw data)
A format for NGS reads (FASTA + quality)
[Figure: one read, with its sequence line and quality line labeled]
Practice
• First look at NGS data
    cd /scratch/2015_GenomeInformatics/public/fastq
    ls
    less sample1.fastq
@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
#1:BDDDDF?FF@B>:ACFIBCGB3BF@C<?F9?DFBFCFEBFEFIFEIFFFDC>@ABBBB?BBBBBBBB?@:?AA@B@?(:4:>?<AB@:B@@B>>ABBB
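A read like the one above occupies four lines: an "@" header, the sequence, a "+" separator line, and one quality character per base. The sketch below reads records four lines at a time. The function name read_fastq is hypothetical, and it assumes a well-formed file with unwrapped sequence lines and no blank lines.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-read FASTQ."""
    lines = iter(handle)
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


# The read shown on the slide, used here as in-memory test input.
SAMPLE = (
    "@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101\n"
    "NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC\n"
    "+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101\n"
    "#1:BDDDDF?FF@B>:ACFIBCGB3BF@C<?F9?DFBFCFEBFEFIFEIFFFDC>@ABBBB?BBBBBBBB?@:?AA@B@?(:4:>?<AB@:B@@B>>ABBB\n"
)

reads = list(read_fastq(io.StringIO(SAMPLE)))
for name, seq, qual in reads:
    print(name)
    print(len(seq), "bases,", len(qual), "quality characters")
```

To run it on the practice file instead, pass an open file handle, e.g. open("sample1.fastq").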
Quality
• Each basecall (a call for a nucleotide: 'A', 'T', 'C', 'G') has its own quality
– quality is the machine's confidence in the call
Phred scale
@SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
NCTCTCACCGAGCTCCACGAACGATAAGGGAATCAGTCTTAAAAGAGCCGCGAGTTACAGGCACACCTGAGAGAAAGAGATGTTTGTATTCACCTTAGAAC
+SRR1798798.1 D4LHBFN1:204:D1B2UACXX:6:1101:1156:1996 length=101
#1:BDDDDF?FF@B>:ACFIBCGB3BF@C<?F9?DFBFCFEBFEFIFEIFFFDC>@ABBBB?BBBBBBBB?@:?AA@B@?(:4:>?<AB@:B@@B>>ABBB

Q = -10 log10(e), where e is the probability of the base call being wrong

Probability of the base call being wrong (e):   10%, 1%, 0.1%, 0.01%, ...
Quality score (Q):                              10, 20, 30, 40, ...
Q + 33 (ASCII code):                            43, 53, 63, 73, ...
Character in the ASCII code table:              +, 5, ?, I, ...
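The chain on this slide (error probability, then quality score, then +33, then ASCII character) can be reproduced with a few lines of Python. This is a small sketch; the variable names are illustrative only.

```python
PHRED_OFFSET = 33  # the "+33" step on the slide

rows = []
for q in (10, 20, 30, 40):
    e = 10 ** (-q / 10)           # invert Q = -10 log10(e)
    char = chr(q + PHRED_OFFSET)  # character stored in the FASTQ quality line
    rows.append((q, e, char))

for q, e, char in rows:
    print(f"Q={q}  e={e:.4%}  ASCII={q + PHRED_OFFSET}  char={char!r}")
```

The four characters printed are '+', '5', '?' and 'I', matching the slide.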
practice
• pick any sequence and find out where it is from
• calculate the probability that a basecall with quality 'D' is wrong
• (advanced) write Python code that transforms Q to e (or vice versa)
– hint: function chr(i) converts the integer i to its matching ASCII character, e.g. chr(65) = 'A'
– function ord(c) converts the character c to its matching ASCII code integer, e.g. ord('A') = 65
– math.log(x, 10) calculates the log10 value of x (math.log10(x) does the same)
• You must import the math library on the first line (import math)
– answer is in public/script/qtoe.py
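One possible approach to the exercise is sketched below. It is not the course's qtoe.py, whose contents are not shown here; the function names are hypothetical, and the +33 offset is the one from the Phred scale slide.

```python
import math

PHRED_OFFSET = 33  # assumed offset, as on the Phred scale slide


def char_to_error(c):
    """Quality character -> probability that the base call is wrong."""
    q = ord(c) - PHRED_OFFSET
    return 10 ** (-q / 10)


def error_to_char(e):
    """Error probability -> quality character (Q rounded to an integer)."""
    q = round(-10 * math.log10(e))
    return chr(q + PHRED_OFFSET)


# 'D' has ASCII code 68, so Q = 68 - 33 = 35 and e = 10^-3.5, about 0.0316%.
print(ord("D") - PHRED_OFFSET, char_to_error("D"))
print(error_to_char(0.001))  # Q = 30, which is '?'
```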