Alexis Dereeper, François Sabot
Analysis of NGS raw data with Galaxy
Cleaning, data control, alignment, polymorphism
Alexis Dereeper CIBA courses – Brasil 2011
Aim of the Tutorial classes:
1- Galaxy vs Command line
2- Understand FASTQ files
3- Cleaning of Illumina data (FASTQ)
4- Perform an assembly
5- Perform a mapping of Illumina reads on a reference sequence
6- Cleaning of a multiple SAM file
1- Galaxy
Main server: http://main.g2.bx.psu.edu/
CIRAD Server : http://gohelle.cirad.fr/galaxy/
TOOLS DATA
WEB APPLICATION
- “Click'n'Play” system
- Transparent for the user
MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added
MULTIPLE
- Based on a web server (Apache...)
- On a single machine, or a cluster...
BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analyses
- Only for limited amounts of data
CONNECTION FOR THE TUTORIAL CLASSES:
http://gohelle.cirad.fr/galaxy/
Connecting...
Add data...
Import data from Galaxy libraries
FASTQ file → TEXT file
STRUCTURE:
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeeceeeecccYeedded`ceec]dddde^a`deeeec\`dddcbaadadYd`]]Jc_^bc^^\
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
Line 1 (starts with @): SEQUENCE NAME
Line 2: IUPAC SEQUENCE
Line 3 (starts with +): separator
Line 4: Quality in ASCII, one character per base of the sequence
f → Quality = 38 (102 – 64): the ASCII code of the character (102 for f) minus an offset of 64
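The four-line layout and the "f → 38" conversion can be checked with a short Python sketch. The function names (`parse_fastq_record`, `decode_qualities`) are hypothetical helpers written for this example, and the offset of 64 is the one used on the slide; other FASTQ variants use an offset of 33, so the offset is left as a parameter.

```python
# Minimal sketch: split one FASTQ record into its four lines and decode
# the quality string. The offset 64 comes from the slide (102 - 64 = 38).

RECORD = """@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb"""


def parse_fastq_record(text):
    """Return (name, sequence, quality) for a single four-line record."""
    name_line, sequence, separator, quality = text.strip().split("\n")
    if not name_line.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return name_line[1:], sequence, quality


def decode_qualities(quality, offset=64):
    """Convert each quality character to an integer score."""
    return [ord(character) - offset for character in quality]


name, sequence, quality = parse_fastq_record(RECORD)
scores = decode_qualities(quality)
print(name)
print(len(sequence), "bases")
print("first scores:", scores[:5])
```

The length check reflects the rule that every base has exactly one quality character.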
WHAT IS QUALITY ?
Quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
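Assuming the standard Phred definition, Q = -10 × log10(p) (an assumption here: the slide only states that Q is an integer mapping of p), the conversion in both directions can be sketched as follows. The function names are illustrative.

```python
import math


def error_probability(q):
    """Probability that the base call is wrong, for a Phred quality q."""
    return 10 ** (-q / 10)


def phred_quality(p):
    """Integer Phred quality for an error probability p (0 < p <= 1)."""
    return round(-10 * math.log10(p))


for q in (10, 20, 30, 38):
    print(q, error_probability(q))
```

Under this definition, Q = 20 corresponds to one wrong call in 100 and Q = 30 to one in 1000.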
FASTQC: quality control
http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc
Why do we need to clean ?
To remove remaining adapters/primers and low-quality sequences
→ CutAdapt
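To make the two cleaning operations concrete, here is a simplified Python sketch of 3' adapter removal and 3' quality trimming. It is not CutAdapt's implementation: adapter matching is exact (no mismatches allowed), and the adapter sequence, function names, and thresholds used in the example are made up for illustration.

```python
def trim_adapter(sequence, adapter, min_overlap=3):
    """Remove a 3' adapter: a full copy anywhere, or a partial copy at the read end."""
    # Full adapter found inside the read: cut from its first occurrence.
    index = sequence.find(adapter)
    if index != -1:
        return sequence[:index]
    # Otherwise look for the start of the adapter hanging off the 3' end,
    # trying the longest possible overlap first.
    longest = min(len(adapter) - 1, len(sequence))
    for overlap in range(longest, min_overlap - 1, -1):
        if sequence.endswith(adapter[:overlap]):
            return sequence[:len(sequence) - overlap]
    return sequence


def trim_low_quality(scores, cutoff):
    """Return how many bases to keep after cutting the low-quality 3' end."""
    # Walk from the 3' end summing (score - cutoff) and cut where the
    # running sum is lowest; stop as soon as the sum becomes positive.
    total, lowest, keep = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        total += scores[i] - cutoff
        if total > 0:
            break
        if total < lowest:
            lowest, keep = total, i
    return keep


def clean_read(sequence, scores, adapter, cutoff, min_length):
    """Quality-trim, then adapter-trim; return None if the read is too short."""
    keep = trim_low_quality(scores, cutoff)
    trimmed = trim_adapter(sequence[:keep], adapter)
    return trimmed if len(trimmed) >= min_length else None


ADAPTER = "AGATCGGAAG"  # hypothetical adapter sequence for the example
print(trim_adapter("ACGTACGTACAGATCG", ADAPTER))
print(trim_low_quality([30, 30, 30, 10, 25, 5], cutoff=20))
print(clean_read("ACGTACGTACAGATCG", [30] * 16, ADAPTER, cutoff=20, min_length=5))
```

Reads that become shorter than `min_length` are discarded (returned as `None`), which is the usual reason a minimum length is set when cleaning.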
[Screenshot: cleaning tool settings; values shown: 20, 7, 70]
Your data are now ready to be analyzed...
Concatenate files
Untested Tools → NGS → Assembly → Assemble with MIRA
BLAST of putative contigs against reference
Separate sequences by original individuals RC1, RC2...
Using regular expressions in Galaxy:
→ RC[13456789] & remove matching reads => keep RC2
→ RC[123456789]_ & remove matching reads => keep RC10
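The effect of the two patterns can be checked outside Galaxy with Python's `re` module. The read names below are invented for the example (the names in the course data may look different); only the two patterns come from the slide.

```python
import re

# Hypothetical read names, each tagged with the individual it came from.
read_names = ["RC1_read1", "RC2_read1", "RC3_read7", "RC9_read2", "RC10_read4"]


def remove_matching(names, pattern):
    """Keep only the names that do NOT match the pattern ('remove reads')."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


print(remove_matching(read_names, r"RC[13456789]"))
print(remove_matching(read_names, r"RC[123456789]_"))
```

The first pattern also removes RC10, because `RC10` begins with `RC1`. The trailing underscore in the second pattern is what separates the single-digit individuals from RC10.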
Mapping: map paired-end reads on a reference
1- Compute the positions of each read
2- Associate the positions of the two members of each pair
3- Select the most probable position that respects the conditions
4- Write a SAM output file
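Steps 2 and 3 can be illustrated with a toy Python sketch. It does not reproduce how a real mapper such as BWA works internally: candidate positions and scores are supplied by hand, and the only condition checked is a maximum distance between the two mates (`max_insert`, a made-up parameter for this example).

```python
from itertools import product


def best_pair(candidates_1, candidates_2, max_insert=500):
    """Pick the best-scoring combination of positions for the two mates.

    Each candidate is a (position, score) tuple; higher scores are better.
    Returns ((pos1, pos2), total_score), or None if no combination fits.
    """
    best = None
    for (pos1, score1), (pos2, score2) in product(candidates_1, candidates_2):
        if abs(pos2 - pos1) > max_insert:
            continue  # mates too far apart: the condition is not respected
        total = score1 + score2
        if best is None or total > best[1]:
            best = ((pos1, pos2), total)
    return best


# Read 1 maps equally well at two places; read 2 maps at one place only.
read1_hits = [(1000, 60), (52000, 60)]
read2_hits = [(52250, 55)]
print(best_pair(read1_hits, read2_hits))
```

In this example the ambiguity of read 1 is resolved by its mate: only the position near 52000 is compatible with where read 2 maps.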
Reference From History: Shared data/Formation/PreProcess/reference.fasta
Library: Paired-end
FASTQ files: From your history
BWA setting to use: Commonly Used
Unselect “Suppress the header in the output SAM file”
Click
SAM output file (Sequence Alignment/Map)
Sort the SAM file by coordinate
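A coordinate sort keeps the header lines (those starting with `@`) on top and orders the alignments by reference sequence, then by leftmost position (SAM columns 3 and 4). The sketch below does this in plain Python on a tiny hand-made SAM text; the records are invented, and in practice a dedicated tool does this job. Two choices here are assumptions of the sketch: references are ordered as they appear in the `@SQ` header lines, and unmapped reads (reference `*`) are placed last.

```python
def sort_sam_by_coordinate(sam_text):
    """Return the SAM text with alignment lines sorted by (reference, position)."""
    lines = sam_text.strip().split("\n")
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]

    # Reference order follows the @SQ lines of the header.
    ref_order = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order[field[3:]] = len(ref_order)

    def sort_key(record):
        fields = record.split("\t")
        reference, position = fields[2], int(fields[3])
        # Unknown or unmapped references ('*') go to the end.
        return (ref_order.get(reference, len(ref_order)), position)

    # Note: the SO tag of the @HD line is left untouched in this sketch.
    return "\n".join(header + sorted(records, key=sort_key))


EXAMPLE = "\n".join([
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:800",
    "readC\t0\tchr2\t15\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "readA\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "readD\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "readB\t0\tchr1\t42\t60\t4M\t*\t0\t0\tACGT\tIIII",
])
print(sort_sam_by_coordinate(EXAMPLE))
```

Keeping the header is the reason the slide on BWA settings asks not to suppress it: the `@SQ` lines carry the reference names that the sort relies on.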
Creation of Workflow for automated analysis
Workflow: how to avoid running the whole process by hand