Alexis Dereeper, François Sabot


Analysis of NGS raw data with Galaxy

Cleaning, data control, alignment, polymorphism

Alexis Dereeper CIBA courses – Brasil 2011


Aims of the tutorial classes:

1- Galaxy vs Command line

2- Understand FASTQ files

3- Cleaning of Illumina data (FASTQ)

4- Perform an assembly

5- Perform a mapping of Illumina reads on a reference sequence

6- Cleaning of a multiple SAM file


1- Galaxy

Main server: http://main.g2.bx.psu.edu/

CIRAD server: http://gohelle.cirad.fr/galaxy/


[Screenshot of the Galaxy interface: TOOLS and DATA panels]


WEB APPLICATION
- “Click'n'Play” system
- Transparent for the user


MODULAR
- Numerous default bricks (already integrated)
- Customizable bricks can be added


MULTIPLE
- Based on a web server (Apache...)
- Runs on a single machine or on a cluster...


BUT
- Simple support
- Much less powerful than the terminal
- Only for routine analyses
- Only for limited data volumes


CONNECTION FOR THE TUTORIAL CLASSES:

http://gohelle.cirad.fr/galaxy/


Connecting...


Add data...


Import data from Galaxy libraries


FASTQ file → TEXT file

STRUCTURE:

@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
@HWUSI-EAS454_0006:1:37:16314:3410#CTTGTA
AGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGTGGTGGCCG
+
`bTbbccccceeeeeceeeecccYeedded`ceec]dddde^a`deeeec\`dddcbaadadYd`]]Jc_^bc^^\
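Each FASTQ record spans four lines: the name (starting with @), the sequence, a + separator, and one quality character per base. As an illustration only, here is a minimal Python sketch that reads such records, assuming strict four-line records without line wrapping; read_fastq and FastqRecord are hypothetical names, not part of Galaxy.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # line 1, without the leading '@'
    sequence: str  # line 2
    quality: str   # line 4, one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream made of strict 4-line blocks (hypothetical helper)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or trailing blank line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


example = """@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA
CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT
+
cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb
"""

for record in read_fastq(io.StringIO(example)):
    print(record.name, len(record.sequence))
```

The length check matters: a sequence and its quality line must always have the same number of characters, which is a quick way to detect truncated files.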


@HWUSI-EAS454_0006:1:112:14105:5498#CTTGTA

CGCCAAGAAGTGTAGCAAAACGGCAGAGCTCGTGGATTAAACAAACAGAGGATTTCGGTGAGGATTGAGGGGGAGT

+

cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb


SEQUENCE NAME (line 1, starting with @)


IUPAC SEQUENCE (line 2)


Quality in ASCII (line 4, one character per base; line 3 is the + separator)


f → ASCII code 102 → Quality = 102 - 64 = 38 (quality offset of 64, as in Illumina 1.3 to 1.7 FASTQ files)
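The same subtraction can be applied to a whole quality line. A minimal sketch, assuming the offset of 64 used on this slide; Sanger and Illumina 1.8+ files use an offset of 33 instead, hence the parameter. decode_qualities is a hypothetical helper name.

```python
def decode_qualities(quality_line: str, offset: int = 64) -> list[int]:
    """Convert each ASCII character of a FASTQ quality line to an integer score.

    offset=64 matches the example file (assumption); use 33 for Sanger/Illumina 1.8+.
    """
    return [ord(char) - offset for char in quality_line]


qualities = decode_qualities(
    "cfffcfeffdeefefffcffffffffcffeffffdffffafcfffffdffffdfefeddf^eececfffdfcbffb"
)
print(qualities[:4])                   # → [35, 38, 38, 38]
print(min(qualities), max(qualities))  # → 30 38
```

Using the wrong offset shifts every score by 31, so checking the range of characters present is a cheap sanity test before any filtering.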


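WHAT IS QUALITY?

The quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect. On the Phred scale, Q = -10 × log10(p): Q = 20 means a 1 in 100 chance that the base is wrong, Q = 30 a 1 in 1,000 chance.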
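The mapping between Q and p can be checked numerically. A small sketch, assuming the Phred definition Q = -10 × log10(p); the two function names are hypothetical.

```python
import math


def phred_to_error_probability(q: int) -> float:
    """Probability that a base call with Phred quality q is wrong: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> int:
    """Inverse mapping, rounded to the nearest integer: Q = -10 * log10(p)."""
    return round(-10 * math.log10(p))


for q in (10, 20, 30, 38):
    print(q, phred_to_error_probability(q))
```

Each gain of 10 in Q divides the error probability by 10, which is why cleaning thresholds such as 20 or 30 are common round values.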

Alexis Dereeper CIBA courses – Brasil 2011

Page 26: Alexis Dereeper, François Sabot

FastQC: quality control

http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#fastqc
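One of the main FastQC plots summarises, for each position in the reads, the quality scores observed across all reads (FastQC draws box plots). A minimal sketch of the simplest version of that summary, the mean only, assuming offset-64 quality strings as in the example file; mean_quality_per_position is a hypothetical name.

```python
def mean_quality_per_position(quality_lines: list[str], offset: int = 64) -> list[float]:
    """Mean quality score at each read position, over the reads that reach it."""
    totals: list[int] = []
    counts: list[int] = []
    for line in quality_lines:
        for position, char in enumerate(line):
            if position == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


# 'h' = 40, 'd' = 36, 'T' = 20 with offset 64
print(mean_quality_per_position(["hhh", "hd", "T"]))
```

A drop of this mean towards the 3' end of the reads is the typical pattern that motivates the cleaning step.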


Why do we need to clean?

To remove remaining adapters/primers and low-quality sequences

→ CutAdapt
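The two cleaning operations can be sketched in a few lines. This is a simplified illustration, not CutAdapt itself: qualities are assumed to be already decoded to integers, and the adapter is searched by exact matching (CutAdapt tolerates errors in the adapter match). The trimming function follows the BWA-style 3' algorithm that the cutadapt documentation describes for its quality cutoff. Function names, the example adapter and min_overlap are hypothetical.

```python
def quality_trim_index(qualities: list[int], cutoff: int) -> int:
    """BWA-style 3' quality trimming: return the index at which to cut the read.

    Walk from the 3' end summing (cutoff - q); cut where the running sum is
    largest, and stop as soon as the sum becomes negative.
    """
    running_sum = 0
    best_sum = 0
    cut_index = len(qualities)
    for i in reversed(range(len(qualities))):
        running_sum += cutoff - qualities[i]
        if running_sum < 0:
            break
        if running_sum > best_sum:
            best_sum = running_sum
            cut_index = i
    return cut_index


def remove_3prime_adapter(sequence: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a full adapter, or an adapter prefix at the read end (exact match only)."""
    position = sequence.find(adapter)
    if position != -1:
        return sequence[:position]
    # The read may end with only the beginning of the adapter: try longest first.
    for length in range(min(len(adapter) - 1, len(sequence)), min_overlap - 1, -1):
        if sequence.endswith(adapter[:length]):
            return sequence[: len(sequence) - length]
    return sequence


print(quality_trim_index([40, 40, 40, 40, 40, 30, 10, 5], cutoff=20))  # → 6
print(remove_3prime_adapter("ACGTACGTAGATCGG", "AGATCGGAAGAGC"))       # → ACGTACGT
```

The minimum overlap is a trade-off: a small value also trims genuine bases that happen to look like the start of the adapter, a large one leaves short adapter remnants.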


[Screenshot; values shown: 20, 7, 70]

Your data are now ready to be analyzed...


Concatenate files


Untested Tools → NGS → Assembly → Assemble with MIRA


BLAST of putative contigs against reference


Separate sequences by individual of origin (RC1, RC2...)


Use a regular expression in Galaxy:

→ RC[13456789] & remove reads => keep RC2

→ RC[123456789]_ & remove reads => keep RC10
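These patterns work because a regular expression matches anywhere in the read name: RC[13456789] also matches RC10 through its RC1 prefix, while the trailing _ in RC[123456789]_ excludes RC10. A minimal Python check, assuming hypothetical read names of the form RC<individual>_<number>; remove_matching is a hypothetical helper that mimics "remove reads".

```python
import re

# Hypothetical read names following the RC<individual>_<number> convention.
read_names = ["RC1_001", "RC2_002", "RC3_003", "RC10_004", "RC2_005", "RC9_006"]


def remove_matching(names: list[str], pattern: str) -> list[str]:
    """Drop every name in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.search(name)]


print(remove_matching(read_names, r"RC[13456789]"))    # → ['RC2_002', 'RC2_005']
print(remove_matching(read_names, r"RC[123456789]_"))  # → ['RC10_004']
```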


Mapping: map paired-end reads on a reference

1- Compute positions for each read


2- Associate positions of each member of the pair


3- Select the most probable position that respects the pairing conditions (expected orientation and insert size)


4- Write a SAM output file
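Steps 2 and 3 can be illustrated with a toy model. This is a simplified sketch, not how BWA scores pairs: the total number of mismatches stands in for the probabilistic model of a real aligner, and the distance between the two start positions stands in for the insert size. Hit, best_pair and max_insert are hypothetical names.

```python
from itertools import product
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One candidate alignment of a read (hypothetical, simplified)."""
    position: int    # leftmost coordinate on the reference
    reverse: bool    # True if the read aligns to the reverse strand
    mismatches: int  # stands in for the alignment score; lower is better


def best_pair(hits1: list[Hit], hits2: list[Hit], max_insert: int) -> Optional[tuple[Hit, Hit]]:
    """Return the (mate 1, mate 2) combination with the fewest mismatches
    among those respecting orientation and distance, or None."""
    best: Optional[tuple[int, Hit, Hit]] = None
    for hit1, hit2 in product(hits1, hits2):        # step 2: pair up candidate positions
        if hit1.reverse == hit2.reverse:
            continue                                 # mates must be on opposite strands
        forward, reverse = (hit2, hit1) if hit1.reverse else (hit1, hit2)
        distance = reverse.position - forward.position
        if distance < 0 or distance > max_insert:
            continue                                 # wrong order or too far apart
        score = hit1.mismatches + hit2.mismatches
        if best is None or score < best[0]:         # step 3: keep the most probable pair
            best = (score, hit1, hit2)
    return None if best is None else (best[1], best[2])


mate1 = [Hit(1000, False, 0), Hit(52000, False, 1)]
mate2 = [Hit(1250, True, 1), Hit(90000, True, 0)]
print(best_pair(mate1, mate2, max_insert=500))
```

Here mate 2 has a perfect hit at 90000, but only the combination 1000/1250 respects the pairing conditions, so it is kept: pairing can rescue the correct position of a read that has several candidates.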


Reference From History: Shared data/Formation/PreProcess/reference.fasta

Library: Paired-end

FASTQ files: From your history

BWA setting to use: Commonly Used

Unselect “Suppress the header in the output SAM file”

Click
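For comparison with the command line, a rough equivalent of this paired-end run is sketched below, assuming the classic BWA workflow of that period (bwa index, then bwa aln for each mate, then bwa sampe to pair the hits and write SAM). The file names and the helper bwa_paired_end_commands are hypothetical; the helper only builds the shell commands and does not run them.

```python
import shlex


def bwa_paired_end_commands(reference: str, reads1: str, reads2: str, output_sam: str) -> list[str]:
    """Return the shell commands for a classic BWA paired-end run (not executed here)."""
    ref, r1, r2 = (shlex.quote(path) for path in (reference, reads1, reads2))
    sai1, sai2 = shlex.quote(reads1 + ".sai"), shlex.quote(reads2 + ".sai")
    return [
        f"bwa index {ref}",              # build the reference index once
        f"bwa aln {ref} {r1} > {sai1}",  # candidate positions for mate 1
        f"bwa aln {ref} {r2} > {sai2}",  # candidate positions for mate 2
        # pair the hits, choose the most probable positions, write SAM (with header)
        f"bwa sampe {ref} {sai1} {sai2} {r1} {r2} > {shlex.quote(output_sam)}",
    ]


for command in bwa_paired_end_commands("reference.fasta", "reads_1.fastq", "reads_2.fastq", "aln.sam"):
    print(command)
```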


SAM output file (Sequence Alignment/Map)
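A SAM alignment line has 11 mandatory tab-separated columns, optionally followed by tags, and the FLAG column packs yes/no properties into bits. A minimal sketch that splits one line and decodes the most common flag bits; the example line is invented and parse_sam_line and describe_flag are hypothetical names.

```python
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def parse_sam_line(line: str) -> dict:
    """Split one alignment line into its 11 mandatory columns plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for column in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[column] = int(record[column])
    record["TAGS"] = fields[11:]
    return record


def describe_flag(flag: int) -> list[str]:
    """List the meaning of every bit set in the FLAG column."""
    return [meaning for bit, meaning in FLAG_BITS.items() if flag & bit]


line = "read1\t99\tref\t100\t60\t4M\t=\t300\t204\tACGT\thhhh\tNM:i:0"
record = parse_sam_line(line)
print(record["RNAME"], record["POS"], describe_flag(record["FLAG"]))
# → ref 100 ['paired', 'proper pair', 'mate reverse strand', 'first in pair']
```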


Sort the SAM file by coordinate
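Sorting by coordinate means ordering alignments by reference sequence, in the order declared in the @SQ header lines, and then by position. A minimal in-memory sketch, assuming the whole file fits in memory; unplaced reads (RNAME "*") go last, and the @HD SO: field is not updated. In practice the Galaxy sort tool or samtools sort does this; sort_sam_by_coordinate is a hypothetical name.

```python
def sort_sam_by_coordinate(lines: list[str]) -> list[str]:
    """Keep header lines first, then order alignments by reference and position."""
    header = [line for line in lines if line.startswith("@")]
    alignments = [line for line in lines if not line.startswith("@")]

    # Reference sequences are ranked in the order of the @SQ header lines.
    reference_rank: dict[str, int] = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    reference_rank[field[3:]] = len(reference_rank)

    def sort_key(line: str) -> tuple[int, int]:
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        # Unplaced reads (RNAME "*") and unknown references go to the end.
        return (reference_rank.get(rname, len(reference_rank)), pos)

    return header + sorted(alignments, key=sort_key)


sam = [
    "@SQ\tSN:chr1\tLN:5000",
    "@SQ\tSN:chr2\tLN:3000",
    "r1\t0\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\thhhh",
    "r2\t0\tchr1\t900\t60\t4M\t*\t0\t0\tACGT\thhhh",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\thhhh",
    "r4\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\thhhh",
]
for line in sort_sam_by_coordinate(sam):
    print(line.split("\t")[0])
```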


Creation of a workflow for automated analysis


Workflow: how to avoid running the whole process by hand
