Next Generation Sequencing & Transcriptome Analysis

description

How to use next generation sequencing in transcriptomics and how to analyse those data.

Transcript of Next Generation Sequencing & Transcriptome Analysis

Page 1: Next Generation Sequencing & Transcriptome Analysis

NEXT GENERATION SEQUENCING

Page 2: Next Generation Sequencing & Transcriptome Analysis

NEXT GENERATION SEQUENCING

AND HOW TO USE THE DATA GENERATED

FOR TRANSCRIPTOMICS

Page 3: Next Generation Sequencing & Transcriptome Analysis

METHODS

Page 4: Next Generation Sequencing & Transcriptome Analysis

METHODS

454 SEQUENCING

SOLEXA / ILLUMINA

SOLID

Page 5: Next Generation Sequencing & Transcriptome Analysis

454 SEQUENCING

SEQUENCING BY SYNTHESIS

PYROSEQUENCING

> 400 BASEPAIRS IN A SINGLE READ

Page 6: Next Generation Sequencing & Transcriptome Analysis

454 SEQUENCING

Page 9: Next Generation Sequencing & Transcriptome Analysis

454 SEQUENCING

REPEATS OF SINGLE NUCLEOTIDES ARE DETECTED BY SIGNAL STRENGTH

WORKS FOR UP TO 8 CONSECUTIVE BASES
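The homopolymer rule above can be illustrated with a minimal sketch. It assumes a fixed nucleotide flow order and idealised, already normalised signals; `FLOW_ORDER`, `decode_flowgram` and the example intensities are hypothetical and not taken from the 454 software.

```python
# Hypothetical flow order; a real instrument run defines its own.
FLOW_ORDER = "TACG"


def decode_flowgram(signals, flow_order=FLOW_ORDER, max_run=8):
    """Turn per-flow signal intensities into a base sequence.

    Each flow offers one nucleotide. A signal near 1.0 means one base was
    incorporated, near 2.0 means two identical bases, and so on.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        run_length = int(signal + 0.5)  # round to the nearest whole count
        # Assumption: longer runs cannot be resolved, so cap them at max_run.
        run_length = min(run_length, max_run)
        bases.append(base * run_length)
    return "".join(bases)


read = decode_flowgram([1.02, 0.05, 2.10, 0.97, 0.0, 3.05])
print(read)
```

The rounding step is where long homopolymers become unreliable: the relative difference between neighbouring signal levels shrinks as the run grows.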

Page 10: Next Generation Sequencing & Transcriptome Analysis

SOLEXA / ILLUMINA

AGAIN: SEQUENCING BY SYNTHESIS

A DIFFERENT DETECTION APPROACH (FLUORESCENTLY LABELLED REVERSIBLE TERMINATORS)

UP TO 100 BASEPAIRS IN A SINGLE READ
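As a rough sketch of per-cycle detection: one base is added per cycle and each of the four bases carries its own fluorescent label, so a base call can be approximated by picking the brightest channel. The channel order, `call_bases` and the intensities are illustrative assumptions; real base callers also correct for cross-talk and phasing.

```python
# Assumed channel order for the four fluorescent labels (illustrative only).
CHANNELS = "ACGT"


def call_bases(cycles):
    """Call one base per cycle by choosing the brightest of four channels."""
    read = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda k: intensities[k])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Three cycles, each with intensities for the channels A, C, G, T.
example_cycles = [
    (0.1, 0.0, 0.2, 0.9),
    (0.8, 0.1, 0.1, 0.0),
    (0.0, 0.7, 0.2, 0.1),
]
print(call_bases(example_cycles))
```

Because every cycle yields exactly one base, the read length equals the number of cycles.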

Page 11: Next Generation Sequencing & Transcriptome Analysis

SOLEXA / ILLUMINA

[FIGURE: SEQUENCE LETTERS T A C C G G ... SHOWN ON SLIDE]

Page 12: Next Generation Sequencing & Transcriptome Analysis

SOLEXA / ILLUMINA

[FIGURE: SEQUENCE LETTERS TG C AT A C C G G ... SHOWN ON SLIDE]

Page 15: Next Generation Sequencing & Transcriptome Analysis

ADVANTAGES OF NGS

CAN RUN IN PARALLEL

PREPARATION CAN BE AUTOMATED

MUCH CHEAPER WHEN COMPARED TO TRADITIONAL SEQUENCING

Page 16: Next Generation Sequencing & Transcriptome Analysis

TRANSCRIPTOME ANALYSIS

ALLOWS DETECTING EXPRESSION CHANGES BETWEEN:

DIFFERENT CELL TYPES

DIFFERENT CONDITIONS OF THE ENVIRONMENT

DISEASES

DIFFERENT DEVELOPMENTAL STAGES

Page 17: Next Generation Sequencing & Transcriptome Analysis

TRANSCRIPTOME ANALYSIS

CAN BE USED TO IDENTIFY NEW GENES

CAN BE APPLIED TO NON-MODEL ORGANISMS

Page 18: Next Generation Sequencing & Transcriptome Analysis

HOW TO ANALYSE TRANSCRIPTOMES

TRADITIONALLY: EXPRESSED SEQUENCE TAGS (ESTS)

USING NGS: RNA-SEQ

FIRST STEP: GET THE DATA

Page 19: Next Generation Sequencing & Transcriptome Analysis

ESTS

DONE USING SHOTGUN-SEQUENCING

TAKES CLONES OF EXPRESSED MRNA

CHEAP TO PRODUCE

Page 21: Next Generation Sequencing & Transcriptome Analysis

RNA-SEQ

SAME PRINCIPLE:

GET AVAILABLE MRNA

THEN SEQUENCING IN PARALLEL VIA NGS

RNA-SEQ == EST + NGS

Page 22: Next Generation Sequencing & Transcriptome Analysis

HOW TO ANALYSE TRANSCRIPTOMES

ASSEMBLY OF READS

DETECTION OF SNPS

GENE ANNOTATION

DETECTION OF OPEN READING FRAMES

DETECTION OF HOMOLOGOUS GENES

Page 23: Next Generation Sequencing & Transcriptome Analysis

ASSEMBLY

AVAILABLE TOOLS:

CAP3

MIRA

...

Page 24: Next Generation Sequencing & Transcriptome Analysis

CAP3

SMITH-WATERMAN TO CLIP LOW-QUALITY READ ENDS

GLOBAL ALIGNMENT TO FIND FALSE OVERLAPS

Page 25: Next Generation Sequencing & Transcriptome Analysis

MIRA

COMBINES ASSEMBLY & SNP-DETECTION

USES:

TRACE FILES

TEMPLATE INSERT INFORMATION

REDUNDANCY

Page 26: Next Generation Sequencing & Transcriptome Analysis

MIRA

FAST READ COMPARISON TO DETECT POTENTIAL OVERLAPS

CONFIRMS OVERLAPS USING SMITH-WATERMAN AND CREATES ALIGNMENTS

ASSEMBLES READ-PAIRS BY FINDING BEST PATH

CHECKS ASSEMBLIES FOR ERRORS AND BEGINS AGAIN
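The overlap-confirmation step above relies on Smith-Waterman local alignment. The following is a minimal textbook sketch of that algorithm, not MIRA's implementation; the scoring values (`match`, `mismatch`, `gap`) and the two example reads are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Two hypothetical reads whose ends overlap in "CGGATC".
print(smith_waterman("TTACGGATC", "CGGATCAAG"))
```

The full matrix costs time proportional to the product of the read lengths, which is why a fast pre-filter for candidate overlaps runs first.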

Page 27: Next Generation Sequencing & Transcriptome Analysis

MIRA: THE WORKFLOW

Page 28: Next Generation Sequencing & Transcriptome Analysis

MIRA

RESULTS:

CONSENSUS CONTIGS MADE OF READS THAT OVERLAP

SNPS THAT ARE CALLED DURING ASSEMBLY PROCESS

Page 29: Next Generation Sequencing & Transcriptome Analysis

SNP DETECTION

TOOLS:

MIRA

QUALITYSNP

AND SOME MORE

Page 30: Next Generation Sequencing & Transcriptome Analysis

QUALITYSNP

USES CAP3-FILES

INPUT: CLUSTERS OF POTENTIAL HAPLOTYPES

CALCULATES SIMILARITY BETWEEN SEQUENCES TO CONSTRUCT HAPLOTYPES AND REMOVES PARALOGS

Page 31: Next Generation Sequencing & Transcriptome Analysis

QUALITYSNP

REMOVES HAPLOTYPES THAT CONSIST OF ONLY ONE SEQUENCE

DETECTS SYNONYMOUS AND NON-SYNONYMOUS SNPS

PROVIDES A WEB-FRONTEND CALLED HAPLOSNPER
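The distinction between synonymous and non-synonymous SNPs can be sketched as follows: a SNP is synonymous if the changed codon still encodes the same amino acid. This is a minimal illustration using the standard genetic code, not QualitySNP's own code; `classify_snp` and its parameters are hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0-2) within `codon`."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


print(classify_snp("GAA", 2, "G"))  # GAA and GAG both encode glutamate
print(classify_snp("GAG", 1, "T"))  # GAG (glutamate) becomes GTG (valine)
```

The classification depends on the reading frame, so it presupposes that the coding region of the contig is already known.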

Page 32: Next Generation Sequencing & Transcriptome Analysis

HOMOLOGY DETECTION

ALLOWS FINDING GENES THAT SHARE A COMMON ANCESTOR

USUALLY ONE SEARCHES AGAINST A DATABASE

Page 33: Next Generation Sequencing & Transcriptome Analysis

HOMOLOGY DETECTION

DIFFERENT KINDS OF SEARCHES:

PROTEIN AGAINST PROTEIN

NUCLEOTIDE AGAINST NUCLEOTIDE

PROTEIN AGAINST NUCLEOTIDE

NUCLEOTIDE AGAINST PROTEIN

Page 34: Next Generation Sequencing & Transcriptome Analysis

HOMOLOGY DETECTION

TOOLS:

BLAST

FASTX / FASTY

HMMER

PATTERNHUNTER

Page 35: Next Generation Sequencing & Transcriptome Analysis

BLAST

AVAILABLE FOR ALL TYPES OF COMPARISONS

ONE OF THE OLDEST ALGORITHMS

WIDELY USED

SPEED OVER SENSITIVITY

Page 36: Next Generation Sequencing & Transcriptome Analysis

FASTX / FASTY

PART OF THE FASTA PACKAGE

COMPARE NUCLEOTIDES AGAINST PROTEINS

DETERMINES A HYPOTHESIZED CODING REGION (HCR)

FASTX IS FASTER, FASTY IS MORE ACCURATE

Page 37: Next Generation Sequencing & Transcriptome Analysis

HMMER

PROTEIN-QUERIES AGAINST PROTEIN-DATABASE

USES HIDDEN MARKOV MODELS

MAPS SMITH-WATERMAN PARAMETERS ONTO A PROBABILISTIC MODEL

IMPROVES ACCURACY

Page 38: Next Generation Sequencing & Transcriptome Analysis

PATTERNHUNTER

NUCLEOTIDE-QUERIES AGAINST OTHER NUCLEOTIDE-SEQUENCES

USES NON-CONSECUTIVE SEEDS FOR INCREASED SENSITIVITY

COMPARES HUMAN GENOME TO MOUSE GENOME IN 20 CPU-DAYS
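The idea behind non-consecutive (spaced) seeds can be sketched like this: only the positions marked "1" in the seed must match, so a hit survives a mismatch at a "0" position. The short seed `"1101"` and the function names are illustrative assumptions, not PatternHunter's actual seed or code.

```python
from collections import defaultdict


def spaced_key(seq, start, seed):
    """Collect the bases at the '1' positions of the seed, starting at `start`."""
    return "".join(seq[start + i] for i, flag in enumerate(seed) if flag == "1")


def spaced_seed_hits(query, target, seed="1101"):
    """Return (query_pos, target_pos) pairs that agree at all '1' positions."""
    span = len(seed)

    # Index every seed-length window of the target by its spaced key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[spaced_key(target, t, seed)].append(t)

    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(spaced_key(query, q, seed), []):
            hits.append((q, t))
    return hits


# The sequences differ at the third base, which the spaced seed ignores.
print(spaced_seed_hits("ACGT", "ACTT", seed="1101"))
print(spaced_seed_hits("ACGT", "ACTT", seed="1111"))
```

With the contiguous seed `"1111"` the same pair of sequences produces no hit, which is the sensitivity gain the slide refers to.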

Page 39: Next Generation Sequencing & Transcriptome Analysis

ORF DETECTION

READING FRAMES CAN BE DETECTED IN EST-DATA

ALLOWS SCREENING FOR PREVIOUSLY UNKNOWN GENES

ALLOWS DERIVING A PUTATIVE PROTEIN SEQUENCE
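A naive way to detect reading frames is to scan all six frames for the longest stretch from a start codon to a stop codon. The sketch below does exactly that; it assumes error-free, full-length sequence, which EST data often is not, and all names in it are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq):
    """Collect ATG...stop stretches in the three frames of one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                found.append(seq[start:i + 3])
                start = None
    return found


def longest_orf(seq):
    """Return the longest complete ORF over all six frames, or ''."""
    seq = seq.upper()
    candidates = orfs_in_strand(seq) + orfs_in_strand(reverse_complement(seq))
    return max(candidates, key=len, default="")


print(longest_orf("CCATGAAATGATT"))
```

A single inserted or deleted base breaks this scan, because the frame shifts at that point and the start or stop codon is no longer read in frame.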

Page 40: Next Generation Sequencing & Transcriptome Analysis

ORF DETECTION

TOOLS:

ESTSCAN

ORFPREDICTOR

...

Page 41: Next Generation Sequencing & Transcriptome Analysis

ESTSCAN

USES HIDDEN MARKOV MODELS

ROBUST FOR FRAMESHIFT ERRORS

SENSITIVE (5 % FALSE NEGATIVES, 18 % FALSE POSITIVES)

Page 42: Next Generation Sequencing & Transcriptome Analysis

ORFPREDICTOR

WEB-BASED

USES BLASTX AS GUIDELINE IF POSSIBLE

USES A DEFINED RULESET FOR DEFINING ORFS

Page 43: Next Generation Sequencing & Transcriptome Analysis

ORFPREDICTOR

Page 44: Next Generation Sequencing & Transcriptome Analysis

GENE ANNOTATION

BLAST2GO VIA GENE ONTOLOGY

FINDS HOMOLOGOUS GENES TO ANNOTATE THE FUNCTIONS OF A GENE OF INTEREST

Page 45: Next Generation Sequencing & Transcriptome Analysis

GENE ONTOLOGY

3 ONTOLOGIES:

MOLECULAR FUNCTION

CELLULAR COMPONENT

BIOLOGICAL PROCESS

Page 46: Next Generation Sequencing & Transcriptome Analysis

CONCLUSIONS

NGS PROVIDES A FAST AND CHEAP WAY TO GENERATE DATA

TONS OF TOOLS EXIST TO ANALYSE TRANSCRIPTOME DATA

ALL TOOLS HAVE THEIR OWN PROS & CONS

MOST OF THOSE TOOLS ARE UNSUITABLE FOR A "NORMAL USER"