Alexis Dereeper, François Sabot

Post on 07-Jan-2016

Analysis of NGS raw data with Galaxy

Cleaning, data control, alignment, polymorphism

Alexis Dereeper CIBA courses – Brasil 2011

Aim of the Tutorial classes:

1- Galaxy vs Command line

2- Understand FASTQ files

3- Cleaning of Illumina data (FASTQ)

4- Perform an assembly

5- Perform a mapping of Illumina reads on a reference sequence

6- Cleaning of a multiple SAM file

1- Galaxy

Main server: http://main.g2.bx.psu.edu/

CIRAD Server : http://gohelle.cirad.fr/galaxy/

TOOLS DATA

WEB APPLICATION
- “Click'n'Play” system
- Transparent for the user

MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added

MULTIPLE
- Based on a web server (Apache...)
- Runs on a single machine or on a cluster...

BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analyses
- Only for limited amounts of data

CONNECTION FOR THE TUTORIAL CLASSES:

http://gohelle.cirad.fr/galaxy/

Connecting...

Add data...

Import data from Galaxy libraries

FASTQ file → TEXT file

STRUCTURE:

@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeeceeeecccYeedded`ceec]dddde^a`deeeec\`dddcbaadadYd`]]Jc_^bc^^\

Each record is made of four lines:

@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb

Line 1 (starts with @): SEQUENCE NAME

Line 2: IUPAC SEQUENCE

Line 3: + (separator)

Line 4: Quality in ASCII
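The four-line layout of a FASTQ record can be read with a few lines of Python. The sketch below is only an illustration: the names `FastqRecord` and `parse_fastq` are hypothetical, and it assumes each record occupies exactly four non-empty lines, as in the example shown in the slides.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # line 1, without the leading "@"
    sequence: str  # line 2, the IUPAC sequence
    quality: str   # line 4, one ASCII character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from text lines, assuming four lines per record."""
    non_empty = [line.rstrip("\n") for line in lines if line.strip()]
    if len(non_empty) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(non_empty), 4):
        header, sequence, separator, quality = non_empty[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], sequence, quality)


example = """\
@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
"""

records = list(parse_fastq(example.splitlines()))
print(records[0].name, len(records[0].sequence))
```

The length check reflects the fact that the quality line carries exactly one character per base of the sequence line.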

f → Quality = 38 (ASCII code 102 – 64)
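The same subtraction can be applied to every character of the quality line. This is a minimal sketch using the offset of 64 shown on the slide; the function name `decode_quality` is hypothetical, and other FASTQ variants use a different offset (33), so the offset is left as a parameter.

```python
from typing import List


def decode_quality(quality: str, offset: int = 64) -> List[int]:
    """Convert each ASCII quality character to its integer quality value."""
    return [ord(character) - offset for character in quality]


quality_line = (
    "cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb"
)
scores = decode_quality(quality_line)
print(scores[:4])   # [35, 38, 38, 38]
print(min(scores))  # 30, from the character "^" (ASCII 94)
```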

WHAT IS QUALITY?

A quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect.
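The slide does not give the mapping itself. Assuming the standard Phred definition, Q = -10 log10(p), the conversion in both directions can be sketched as follows (the function names are hypothetical):

```python
import math


def error_probability(q: int) -> float:
    """Probability that a base call is wrong, assuming Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def phred_quality(p: float) -> int:
    """Integer quality value for an error probability p (same assumption)."""
    return round(-10 * math.log10(p))


for q in (10, 20, 30, 38):
    print(q, error_probability(q))
```

Under this assumption a quality of 20 corresponds to one expected error in 100 base calls, and a quality of 30 to one in 1000.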

FastQC: quality control

http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc

Why do we need to clean?

To remove remaining adapters/primers and low-quality sequences

→ CutAdapt
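To make the idea of cleaning concrete, here is a simplified sketch of the two operations: removing an adapter from the 3' end of a read, then removing low-quality bases from the end. It is not the algorithm used by CutAdapt (which, among other things, tolerates mismatches); it only uses exact matching. The adapter sequence, the quality cutoff of 20, the offset of 64 and all function names are assumptions chosen for illustration.

```python
from typing import Tuple


def trim_adapter(sequence: str, quality: str, adapter: str,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read at an exact adapter match (full, or partial at the 3' end)."""
    position = sequence.find(adapter)
    if position == -1:
        # Look for a prefix of the adapter overhanging the end of the read.
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                position = len(sequence) - overlap
                break
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def trim_low_quality_tail(sequence: str, quality: str, cutoff: int = 20,
                          offset: int = 64) -> Tuple[str, str]:
    """Drop bases from the 3' end while their quality is below the cutoff."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < cutoff:
        end -= 1
    return sequence[:end], quality[:end]


adapter = "AGATCGGAAGAGC"          # illustrative adapter, not from the slides
read = "ACGTACGTAC" + "AGATCG"     # read ending with the start of the adapter
qual = "ffffffffBB" + "ffffff"     # "B" decodes to quality 2 with offset 64

seq, qual_trimmed = trim_adapter(read, qual, adapter)
seq, qual_trimmed = trim_low_quality_tail(seq, qual_trimmed)
print(seq, qual_trimmed)
```

In a real cleaning step a minimum read length would usually be enforced afterwards, so that reads that became too short are discarded.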

(Screenshot; highlighted values: 20, 7, 70)

Your data are now ready to be analyzed...

Concatenate files

Untested Tools → NGS → Assembly → Assemble with MIRA

BLAST of putative contigs against reference

Separate sequences by original individuals RC1, RC2...

Using regular expressions in Galaxy:

→ Remove reads matching RC[13456789] => keeps RC2

→ Remove reads matching RC[123456789]_ => keeps RC10
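The effect of these two patterns can be checked outside Galaxy. The sketch below assumes read names that start with the individual's code followed by an underscore (for example `RC10_read4`); the read names and the helper `remove_matching` are hypothetical.

```python
import re
from typing import List

# Hypothetical read names: individual code, underscore, read identifier.
read_names = [
    "RC1_read1", "RC2_read1", "RC3_read7",
    "RC9_read2", "RC10_read4", "RC2_read9",
]


def remove_matching(names: List[str], pattern: str) -> List[str]:
    """Keep only the names in which the pattern is NOT found."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


# RC[13456789] also matches RC10 (through "RC1"), so only RC2 survives.
keep_rc2 = remove_matching(read_names, r"RC[13456789]")
# RC[123456789]_ needs a single digit followed by "_", so RC10_ survives.
keep_rc10 = remove_matching(read_names, r"RC[123456789]_")

print(keep_rc2)   # ['RC2_read1', 'RC2_read9']
print(keep_rc10)  # ['RC10_read4']
```

The trailing underscore in the second pattern is what distinguishes RC1 from RC10: without it, RC10 would be removed together with RC1.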

Mapping: map paired-end reads on a reference

1- Compute positions for each read

2- Associate the positions of the two members of each pair

3- Select the most probable position that satisfies the conditions

4- Write a SAM output file
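Steps 2 and 3 can be illustrated with a toy example. The sketch below is not how BWA works internally: it assumes step 1 has already produced a list of candidate positions with a score for each read of the pair, and that the "conditions" are simply "same reference sequence" and "insert size within given bounds" (read orientation is ignored). All names and values are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    reference: str  # name of the reference sequence
    position: int   # 1-based leftmost position of the read
    score: int      # higher is better (hypothetical scoring)


def best_pair(hits1: List[Hit], hits2: List[Hit], read_length: int,
              min_insert: int, max_insert: int) -> Optional[Tuple[Hit, Hit]]:
    """Return the best-scoring combination of hits that satisfies the conditions."""
    best, best_score = None, None
    for hit1, hit2 in product(hits1, hits2):        # step 2: associate positions
        if hit1.reference != hit2.reference:
            continue
        insert = abs(hit2.position - hit1.position) + read_length
        if not min_insert <= insert <= max_insert:  # step 3: check the conditions
            continue
        total = hit1.score + hit2.score
        if best_score is None or total > best_score:
            best, best_score = (hit1, hit2), total
    return best


hits_read1 = [Hit("chr1", 100, 60), Hit("chr1", 5000, 60)]
hits_read2 = [Hit("chr1", 300, 55), Hit("chr2", 120, 58)]
pair = best_pair(hits_read1, hits_read2, read_length=76,
                 min_insert=200, max_insert=500)
print(pair)
```

Here the first read has two equally good positions on its own; only the pairing information makes it possible to choose position 100, which lies at a compatible distance from the second read.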

Reference From History: Shared data/Formation/PreProcess/reference.fasta

Library: Paired-end

FASTQ files: From your history

BWA setting to use: Commonly Used

Unselect “Suppress the header in the output SAM file”

Click
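For comparison with the command line, the settings above correspond roughly to the classic BWA short-read commands (`bwa index`, `bwa aln` for each FASTQ file, then `bwa sampe` for the pair). That the Galaxy tool runs exactly these steps is an assumption, and the read and output file names below are hypothetical. The sketch only builds the commands; `run` needs BWA installed and is not called here.

```python
import subprocess
from typing import List, Optional, Tuple

# (arguments, file receiving standard output, or None if not redirected)
Command = Tuple[List[str], Optional[str]]


def bwa_paired_end_commands(reference: str, fastq1: str, fastq2: str,
                            sam_out: str) -> List[Command]:
    """Build the index / aln / sampe commands for one paired-end library."""
    sai1, sai2 = fastq1 + ".sai", fastq2 + ".sai"
    return [
        (["bwa", "index", reference], None),
        (["bwa", "aln", reference, fastq1], sai1),
        (["bwa", "aln", reference, fastq2], sai2),
        (["bwa", "sampe", reference, sai1, sai2, fastq1, fastq2], sam_out),
    ]


def run(commands: List[Command]) -> None:
    """Execute the commands in order (requires BWA to be installed)."""
    for arguments, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(arguments, check=True)
        else:
            with open(stdout_path, "wb") as output:
                subprocess.run(arguments, check=True, stdout=output)


commands = bwa_paired_end_commands(
    "reference.fasta", "reads_1.fastq", "reads_2.fastq", "mapping.sam"
)
for arguments, stdout_path in commands:
    redirect = f" > {stdout_path}" if stdout_path else ""
    print(" ".join(arguments) + redirect)
```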

SAM output file (Sequence Alignment/Map)
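A SAM file is a tab-separated text file: header lines start with `@`, and each alignment line has eleven mandatory fields followed by optional tags. The sketch below splits one alignment line into those fields; the example line and the function name `parse_sam_line` are hypothetical.

```python
from typing import Dict, Optional

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Optional[Dict[str, object]]:
    """Return the mandatory fields of an alignment line, or None for a header line."""
    if line.startswith("@"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("an alignment line needs at least 11 columns")
    record: Dict[str, object] = dict(zip(SAM_FIELDS, columns[:11]))
    for name in INTEGER_FIELDS:
        record[name] = int(columns[SAM_FIELDS.index(name)])
    record["TAGS"] = columns[11:]  # optional fields, left unparsed
    return record


# Hypothetical alignment line for a 4-base read.
line = "read1\t99\treference\t100\t60\t4M\t=\t300\t204\tACGT\tffff"
record = parse_sam_line(line)
flag = record["FLAG"]
print(record["RNAME"], record["POS"])
print("paired:", bool(flag & 0x1), "unmapped:", bool(flag & 0x4),
      "reverse strand:", bool(flag & 0x10))
```

The FLAG field is a bit field, which is why individual properties are tested with a bitwise AND.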

Sort the SAM file by coordinate
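Sorting by coordinate means ordering the alignment lines by reference sequence and then by position, while keeping the header lines at the top. A minimal in-memory sketch follows, assuming the reference order is taken from the `@SQ` header lines when they are present and that unmapped reads go last; the example lines are hypothetical, and the sketch does not update the sort-order tag of the `@HD` header line.

```python
from typing import Dict, List


def sort_sam_by_coordinate(lines: List[str]) -> List[str]:
    """Return header lines followed by alignment lines sorted by (reference, position)."""
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if line and not line.startswith("@")]

    # Reference order as declared in the @SQ header lines (SN: tag).
    order: Dict[str, int] = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    order.setdefault(field[3:], len(order))

    def sort_key(line: str):
        columns = line.split("\t")
        reference, position = columns[2], int(columns[3])
        unmapped = reference == "*"
        return (unmapped, order.get(reference, len(order)), reference, position)

    return header + sorted(records, key=sort_key)


sam_lines = [
    "@SQ\tSN:reference\tLN:1000",
    "r2\t0\treference\t300\t60\t4M\t*\t0\t0\tACGT\tffff",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tffff",
    "r1\t0\treference\t100\t60\t4M\t*\t0\t0\tACGT\tffff",
]
sorted_lines = sort_sam_by_coordinate(sam_lines)
print([line.split("\t")[0] for line in sorted_lines])
```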

Creating a workflow for automated analysis

Workflow: how to avoid running the whole process by hand
