BIOINFORMATICS LAB Episode IV Next Generation Sequencing · 2019-04-08
BIOINFORMATICS LAB
Episode IV – Next Generation Sequencing
Federico M. Giorgi, PhD
Department of Pharmacy and Biotechnology
First Cycle Degree in Genomics
Sequencing Techniques
[Figure: sequencing platforms arranged by read length (nt; axis ticks 20, 100, 300, 600, 2000, 10000), quality and throughput: Illumina HiSeq 2000, Illumina NextSeq 500, Roche 454, Illumina MiSeq 500, Oxford Nanopore, Sanger, Solexa]
FASTQ format
Phred+33 Quality encoding
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ
Quality scores run from 0 to 41: "!" = 0, ";" = 26, "J" = 41
The numeric Quality Score (Q) is then converted to the error probability (P) using this formula:
$Q = -10 \log_{10}(P)$
$P = 10^{-Q/10}$
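The Phred+33 decoding and the Q/P conversion can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function names are hypothetical, and the only assumption is the Phred+33 offset described above (ASCII code minus 33).

```python
import math


def phred33_to_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores (Q)."""
    # Each character encodes Q as its ASCII code minus the offset 33.
    return [ord(char) - 33 for char in quality_string]


def score_to_error_probability(q):
    """Convert a quality score Q into an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_score(p):
    """Convert an error probability P into a quality score: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# The first, 27th and last characters of the Phred+33 alphabet shown above.
print(phred33_to_scores("!;J"))  # [0, 26, 41]
for q in (10, 20, 30, 40):
    print(q, score_to_error_probability(q))
```

In practice this means Q = 20 corresponds to one expected error in 100 bases, and Q = 30 to one in 1000.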
FastQC
Read Trimming
• Quality
• Adapters
Read Trimming
Barplots indicating the performance of nine read trimming tools at different quality thresholds on a Homo sapiens RNA-Seq dataset.
Read Trimming
• Benefits for
– RNA-Seq (higher quality reads)
– Variant/Mutation Calling (lower error rate)
– Genome Assembly (faster, with lower RAM requirements at similar quality levels)
PCR duplicates removal
• Generated during library preparation (sequence amplification)
• Detected by FastQC
• Taken care of by most trimming tools (e.g. PRINSEQ)
Aligning Reads on a Genome
• Input: FASTQ
• Tools
– DNA: BWA, Bowtie, Bowtie2
– RNA: TopHat, STAR
– Both: HISAT2
• Output: SAM
The SAM format
• Format used to store information on read alignment on a reference genome
• Can be compressed (BAM)
• Can contain only aligned reads (SAM < FASTQ)
• Can contain all reads (you can then delete the original FASTQ files)
https://samtools.github.io/hts-specs/SAMv1.pdf
The SAM Flag Column
FLAG
The number is an unambiguous sum of individual flags, such as:
• Read paired: 1
• Both reads in pair are aligned: 2
• Read not aligned: 4
• Read in reverse strand: 16
• Supplementary alignment: 2048
The number is an unambiguous sum of individual flags, written in hexadecimal format (0x prefix), such as:
• Read paired: 0x1
• Both reads in pair are aligned: 0x2
• Read not aligned: 0x4
• Read in reverse strand: 0x10
• Second in pair: 0x80
• Supplementary alignment: 0x800
…etc
E.g.
• Read paired: 0x1 = 1
• Both reads in pair are aligned: 0x2 = 2
• Read in reverse strand: 0x10 = 16
• Second in pair: 0x80 = 128
Total: 128 + 16 + 2 + 1 = 147
Trick: if this column is an odd number, the read is paired (bit 0x1 is set), so the dataset has paired reads
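The flag arithmetic from the worked example (147 = 128 + 16 + 2 + 1) can be reproduced with bitwise operations. The sketch below is illustrative only: the dictionary covers just a subset of the flag bits, the descriptions follow the wording of the slides, and the function names are hypothetical.

```python
# A subset of SAM flag bits; descriptions follow the wording used on the slides.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "both reads in pair are aligned",
    0x4: "read not aligned",
    0x10: "read in reverse strand",
    0x80: "second in pair",
}


def decode_flag(flag):
    """Return the descriptions of all known bits that are set in a FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_paired(flag):
    """The 'odd number' trick: bit 0x1 is set exactly when the FLAG is odd."""
    return flag % 2 == 1


print(decode_flag(147))
print(is_paired(147))  # True
```

Because every flag is a distinct power of two, a given sum can only be produced by one combination of flags, which is why the FLAG value can always be decoded unambiguously.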
The SAM CIGAR Column
CIGAR
• A string describing how the read aligns with the reference
• It consists of one or more components
• Each component comprises an operator and the number of bases which the operator applies to
CIGAR string operators:
D Deletion; the nucleotide is present in the reference but not in the read.
H Hard Clipping; the clipped nucleotides are not present in the read sequence stored in the SAM record.
I Insertion; the nucleotide is present in the read but not in the reference.
M Match; can be either an alignment match or mismatch. The nucleotide is present in both the read and the reference.
N Skipped region; a region of the reference is not present in the read (e.g. an intron).
P Padding; padded area in the read and not in the reference.
S Soft Clipping; the clipped nucleotides are present in the read sequence but are not part of the alignment.
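A CIGAR string can be split into its components with a regular expression. The sketch below handles only the seven operators listed above (the SAM specification defines a few more) and assumes the usual notation in which the number precedes the operator, e.g. 20M. The function names and the example string are made up for illustration.

```python
import re

# One component = a base count followed by one operator letter.
CIGAR_COMPONENT = re.compile(r"(\d+)([DHIMNPS])")

# Operators that consume bases of the stored read sequence / of the reference.
CONSUMES_READ = set("MIS")
CONSUMES_REFERENCE = set("MDN")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (count, operator) components."""
    components = [(int(count), op) for count, op in CIGAR_COMPONENT.findall(cigar)]
    # Reject strings containing anything other than well-formed components.
    if "".join(f"{count}{op}" for count, op in components) != cigar:
        raise ValueError(f"Unsupported or malformed CIGAR string: {cigar!r}")
    return components


def read_length(cigar):
    """Number of read bases covered by the CIGAR string."""
    return sum(count for count, op in parse_cigar(cigar) if op in CONSUMES_READ)


def reference_span(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(count for count, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


example = "5S20M2I10M3D15M"  # hypothetical alignment
print(parse_cigar(example))
print(read_length(example), reference_span(example))  # 52 48
```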
Working on SAM files
Common Operations:
• Converting to BAM (binary zipped SAM: smaller)
• Sort BAM (required by BAM visualizers for faster navigation)
• Index BAM (generates a BAI, makes the BAM faster to read by tools)
• Merge BAMs (e.g. from technical replicates)
Common Tools:
• samtools (the old classic: fast and reliable)
• Picard Tools (the Broad Institute alternative: it performs more operations and has several more parameters to play with)
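As a rough sketch, the four common operations map onto the samtools subcommands view, sort, index and merge. The Python helper below only assembles the command lines, and running them is optional and requires samtools to be installed. All file names are placeholders, and the helper functions are hypothetical, not part of samtools.

```python
import subprocess


def sam_to_indexed_bam_commands(sam_path, prefix):
    """Build the samtools commands to convert, sort and index an alignment."""
    unsorted_bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Convert SAM to BAM (-b selects BAM output, -o names the output file).
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        # Sort the BAM by genomic coordinate.
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # Index the sorted BAM (writes a .bai file next to it).
        ["samtools", "index", sorted_bam],
    ]


def merge_bams_command(output_bam, input_bams):
    """Build the samtools command that merges several BAMs into one."""
    return ["samtools", "merge", output_bam, *input_bams]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


for command in sam_to_indexed_bam_commands("sample.sam", "sample"):
    print(" ".join(command))
print(" ".join(merge_bams_command("merged.bam", ["rep1.bam", "rep2.bam"])))
```

Sorting comes before indexing because the index is built over coordinate-sorted records.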
Visualizing BAMs
• samtools tview
– Command line
– Fast
– Limited features
• Tablet
– The first beautiful GUI
• SeqMonk
– For ChIP-Seq
• Integrative Genomics Viewer (IGV)
– Everyone uses this
Getting NGS data from public databases
• GEO – Gene Expression Omnibus
– American (NCBI, Bethesda, Maryland)
– Largest repository of high-throughput data in the world
• NGS
• Microarrays
Getting NGS data from public databases
• ArrayExpress
– European (EBI, Hinxton, United Kingdom)
– More recent than GEO (better search tools)
– GEO and ArrayExpress are partially redundant
Getting NGS data from public databases
• Sequence Read Archive (SRA)
– Subset of NCBI GEO specifically for NGS data (no microarrays)
– Raw data is available
– Essentially FASTQ files
– Compressed and optionally encrypted in the SRA format
The SRA Toolkit
• Common pipeline when you start from a public dataset
– Find a suitable dataset (ArrayExpress is the best)
– Find a link to the sample IDs (in SRA format)
– Download SRA files
– Convert SRA files to FASTQ files
– Quality control of FASTQ files
– Optional FASTQ Trimming/Adapter removal
– FASTQ alignment on reference genome (BAM)
– BAM visualization
– Downstream Analysis
The download and conversion of SRA files are handled by NCBI’s SRA Toolkit.
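The two SRA Toolkit steps of the pipeline (download, then conversion to FASTQ) could look roughly like this. The sketch assumes the toolkit's prefetch and fastq-dump programs are installed and on the PATH. The accession below is a placeholder, not a real dataset, and the helper functions are hypothetical.

```python
import subprocess


def sra_to_fastq_commands(accession):
    """Build the SRA Toolkit commands to download a run and convert it to FASTQ."""
    return [
        # Download the run in SRA format.
        ["prefetch", accession],
        # Convert to FASTQ; --split-files writes paired reads to separate files,
        # --gzip compresses the output.
        ["fastq-dump", "--split-files", "--gzip", accession],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# "SRR_EXAMPLE" is a placeholder accession; replace it with a real run ID.
for command in sra_to_fastq_commands("SRR_EXAMPLE"):
    print(" ".join(command))
```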
Three Datasets
We will now download and analyze 3 different datasets.
Each one represents one of the three major classes of NGS experiments:
• DNA-Seq
– Whole Genome Sequencing (WGS)
– Whole Exome Sequencing (WXS)
• RNA-Seq
• ChIP-Seq
Stark
Lannister
Baratheon
Converting BAM to gene expression
Most reads in a BAM originating from an RNA-Seq experiment derive from messenger RNAs
RNA-seq reads
• Short (36-250 bases)
• High error rates (1%)
• Hundreds of millions of reads
• Many reads span exon-exon junctions
Converting BAM to gene expression
Peculiarities of RNA-Seq short reads:
• Coverage is not uniform across transcripts (it is proportional to transcript expression)
• Coverage within the same transcript is not uniform (exonucleases cut from 5’ and 3’)
• When aligned on the genome, eukaryotic RNA-Seq reads can span across introns
• Alternative isoforms
• RNA editing
The GFF format
1. seqid - Chromosome/Scaffold/Reference name
2. source - Source that annotated this feature
3. type - Type of feature (e.g. gene, transcript, exon)
4. start - Start position of the feature
5. end - End position of the feature
6. score - A floating point value (can be used for e.g. peak intensity for ChIP-Seq features)
7. strand - Defined as + (forward) or - (reverse)
8. phase - 0, 1 or 2. For coding sequences. “0” means “in frame”, 1 and 2 mean that the codon is shifted 1 or 2 bases
9. attributes - A semicolon-separated list of tag-value pairs, providing additional information about each feature, e.g. ID, Parent, gene_type, gene_name
Tab-separated
Empty columns denoted with “.”
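The nine-column layout can be read with a few lines of Python. The sketch below assumes GFF3-style attributes written as tag=value pairs. The example line is invented for illustration, and the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: dict


def parse_gff_line(line):
    """Parse one tab-separated GFF feature line into a GffFeature."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"Expected 9 tab-separated columns, found {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, raw_attributes = fields

    # Column 9: semicolon-separated tag=value pairs ("." means empty).
    attributes = {}
    if raw_attributes != ".":
        for pair in raw_attributes.split(";"):
            if pair.strip():
                tag, _, value = pair.partition("=")
                attributes[tag.strip()] = value.strip()

    return GffFeature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),   # "." marks an empty column
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


# Hypothetical feature line, for illustration only.
example = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1;gene_name=ABC"
feature = parse_gff_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["gene_name"])
```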
Getting counts from RNA-Seq
GFF3 annotation + BAM alignment → htseq-count → Gene Counts
Exon Counts
Transcript Counts
Anything Counts
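To give an idea of what the counting step does, here is a toy sketch that assigns aligned reads to annotated features by coordinate overlap. It is not htseq-count's actual implementation: it ignores strand, spliced reads and mapping quality. The "__no_feature" and "__ambiguous" labels are borrowed from htseq-count's output naming. The gene and read coordinates are made up.

```python
from collections import Counter


def count_reads_per_feature(features, reads):
    """Count reads overlapping exactly one feature.

    features: dict mapping feature ID -> (seqid, start, end), 1-based inclusive
    reads:    iterable of (seqid, start, end) aligned intervals
    """
    counts = Counter({feature_id: 0 for feature_id in features})
    for read_seqid, read_start, read_end in reads:
        # Collect every feature on the same sequence that overlaps the read.
        hits = [
            feature_id
            for feature_id, (seqid, start, end) in features.items()
            if seqid == read_seqid and read_start <= end and read_end >= start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return dict(counts)


# Hypothetical annotation and alignments, for illustration only.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300)}
reads = [
    ("chr1", 90, 120),   # overlaps geneA only
    ("chr1", 160, 180),  # overlaps both genes
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 100, 120),  # overlaps nothing
]
print(count_reads_per_feature(genes, reads))
```

The same function yields exon or transcript counts if exon or transcript intervals are passed as features instead of genes.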
Terminal
Let’s open the Terminal!
Reminders:
• userid student
• password 4genomics4
Sequences Exercises (Open exercises_04_NGS.pdf)
Turn Unix off CORRECTLY
• Please turn it off nicely
Click on the mouse
Click Again
Last Click
www.giorgilab.org
Federico M. Giorgi, PhD
Department of Pharmacy and Biotechnology