Next Generation Sequencing & Transcriptome Analysis
Uploaded by bastian-greshake
NEXT GENERATION SEQUENCING
AND HOW TO USE THE DATA GENERATED
FOR TRANSCRIPTOMICS
METHODS
454 SEQUENCING
SOLEXA / ILLUMINA
SOLID
454 SEQUENCING
SEQUENCING BY SYNTHESIS
PYROSEQUENCING
> 400 BASEPAIRS IN A SINGLE READ
454 SEQUENCING
REPEATS OF SINGLE NUCLEOTIDES ARE DETECTED BY SIGNAL STRENGTH
WORKS FOR UP TO 8 CONSECUTIVE BASES
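The idea that homopolymer length is read from signal strength can be sketched in a few lines. This is a simplified illustration, not the vendor's base caller: the flow order, the signal values and the function name `call_bases` are assumptions made up for the example, and real flowgrams need noise correction.

```python
def call_bases(flow_order, signals):
    """Turn per-flow light signals into a base sequence (toy model)."""
    bases = []
    for nucleotide, signal in zip(flow_order, signals):
        # Signal strength is taken as proportional to the homopolymer length,
        # so rounding to the nearest integer gives the number of repeats.
        run_length = int(signal + 0.5)
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical flowgram: two cycles of the flow order T, A, C, G.
flows = "TACG" * 2
signals = [1.02, 0.05, 2.10, 0.96, 0.03, 3.05, 0.00, 1.01]
print(call_bases(flows, signals))
```

The longer the run, the closer neighbouring signal levels sit relative to the noise, which is one way to see why the slide gives a limit of about 8 consecutive bases.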
SOLEXA / ILLUMINA
AGAIN: SEQUENCING BY SYNTHESIS
ANOTHER DETECTION-APPROACH
UP TO 100 BASEPAIRS IN A SINGLE READ
[FIGURE SLIDES: SOLEXA / ILLUMINA SEQUENCING STEPS, IMAGES NOT INCLUDED IN TRANSCRIPT]
ADVANTAGES OF NGS
CAN RUN IN PARALLEL
PREPARATION CAN BE AUTOMATED
MUCH CHEAPER WHEN COMPARED TO TRADITIONAL SEQUENCING
TRANSCRIPTOME ANALYSIS
ALLOWS DETECTION OF EXPRESSION CHANGES BETWEEN:
DIFFERENT CELL TYPES
DIFFERENT CONDITIONS OF THE ENVIRONMENT
DISEASES
DIFFERENT DEVELOPMENTAL STAGES
TRANSCRIPTOME ANALYSIS
CAN BE USED TO IDENTIFY NEW GENES
CAN BE APPLIED TO NON-MODEL ORGANISMS
HOW TO ANALYSE TRANSCRIPTOMES
TRADITIONALLY: EXPRESSED SEQUENCE TAGS (ESTS)
USING NGS: RNA-SEQ
FIRST STEP: GET THE DATA
ESTS
DONE USING SHOTGUN-SEQUENCING
TAKES CLONES OF EXPRESSED MRNA
CHEAP TO PRODUCE
RNA-SEQ
SAME PRINCIPLE:
GET AVAILABLE MRNA
THEN SEQUENCING IN PARALLEL VIA NGS
RNA-SEQ == EST + NGS
HOW TO ANALYSE TRANSCRIPTOMES
ASSEMBLY OF READS
DETECTION OF SNPS
GENE ANNOTATION
DETECTION OF OPEN READING FRAMES
DETECTION OF HOMOLOGOUS GENES
ASSEMBLY
AVAILABLE TOOLS:
CAP3
MIRA
...
CAP3
SMITH-WATERMAN TO CLIP BAD ENDINGS
GLOBAL ALIGNMENT TO FIND FALSE OVERLAPS
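Smith-Waterman is a local alignment: it scores the best-matching region of two sequences and ignores poorly matching ends, which is why it suits end clipping. A minimal scoring sketch follows. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions for illustration and are not CAP3's parameters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere: that makes it local.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared core ACGT contributes; the mismatching ends are ignored.
print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```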
MIRA
COMBINES ASSEMBLY & SNP-DETECTION
USES:
TRACE FILES
TEMPLATE INSERT INFORMATION
REDUNDANCY
MIRA
FAST READ COMPARISON TO DETECT POTENTIAL OVERLAPS
CONFIRMS OVERLAPS USING SMITH-WATERMAN AND CREATES ALIGNMENTS
ASSEMBLES READ-PAIRS BY FINDING BEST PATH
CHECKS ASSEMBLIES FOR ERRORS AND BEGINS AGAIN
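The first two workflow steps (fast screening, then confirmation of overlaps) can be sketched as follows. This is a toy version and not MIRA's implementation: the screen uses shared k-mers, the confirmation uses exact suffix-prefix matching instead of Smith-Waterman, and all function names and parameters are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(reads, k=4):
    """Fast screen: read pairs sharing at least one k-mer may overlap."""
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(read_id)
    pairs = set()
    for ids in index.values():
        pairs.update((a, b) for a in ids for b in ids if a != b)
    return pairs


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b, else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


reads = ["ATGGCGTAC", "CGTACGGA", "TTTTTTTT"]
for a, b in sorted(candidate_pairs(reads)):
    overlap = suffix_prefix_overlap(reads[a], reads[b])
    if overlap:
        # Merge the two reads into a contig over the confirmed overlap.
        print(a, b, overlap, reads[a] + reads[b][overlap:])
```

Only the candidate pairs reach the more expensive confirmation step, which is the point of screening first.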
MIRA: THE WORKFLOW
MIRA
RESULTS:
CONSENSUS CONTIGS MADE OF READS THAT OVERLAP
SNPS THAT ARE CALLED DURING ASSEMBLY PROCESS
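How a SNP can fall out of overlapping reads is easiest to see column by column. The sketch below assumes the reads are already aligned and padded to equal length, and it reports a position when a second allele is supported by at least two reads. The threshold and the function name are assumptions; MIRA's actual calling also uses quality information.

```python
from collections import Counter


def call_snps(aligned_reads, min_minor_count=2):
    """Return (position, major_allele, minor_allele) for candidate SNPs."""
    snps = []
    for position, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # every read agrees at this column
        (major, _), (minor, minor_count) = counts.most_common(2)
        # A variant seen in a single read is treated as a sequencing error.
        if minor_count >= min_minor_count:
            snps.append((position, major, minor))
    return snps


reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTA", "ATGTA"]
print(call_snps(reads))
```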
SNP DETECTION
TOOLS:
MIRA
QUALITYSNP
AND SOME MORE
QUALITYSNP
USES CAP3-FILES
INPUT: CLUSTERS OF POTENTIAL HAPLOTYPES
CALCULATES SIMILARITY BETWEEN SEQUENCES TO CONSTRUCT HAPLOTYPES AND REMOVES PARALOGS
QUALITYSNP
REMOVES HAPLOTYPES THAT CONSIST OF ONLY ONE SEQUENCE
DETECTS SYNONYMOUS AND NON-SYNONYMOUS SNPS
PROVIDES A WEB-FRONTEND CALLED HAPLOSNPER
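Whether a SNP is synonymous depends only on whether the changed codon still encodes the same amino acid. A minimal sketch using the standard genetic code follows. It is not QualitySNP's code, and the function name and inputs are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0-2) within `codon`."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "non-synonymous"


print(classify_snp("GAA", 2, "G"))  # GAA and GAG both encode glutamate
print(classify_snp("GAA", 0, "A"))  # AAA encodes lysine instead
```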
HOMOLOGY DETECTION
FINDS GENES THAT SHARE A COMMON ANCESTOR
USUALLY ONE SEARCHES AGAINST A DATABASE
HOMOLOGY DETECTION
DIFFERENT KINDS OF SEARCHES:
PROTEIN AGAINST PROTEIN
NUCLEOTIDE AGAINST NUCLEOTIDE
PROTEIN AGAINST NUCLEOTIDE
NUCLEOTIDE AGAINST PROTEIN
HOMOLOGY DETECTION
TOOLS:
BLAST
FASTX / FASTY
HMMER
PATTERNHUNTER
BLAST
AVAILABLE FOR ALL TYPES OF COMPARISONS
ONE OF THE OLDEST ALGORITHMS
WIDELY USED
SPEED OVER SENSITIVITY
FASTX / FASTY
PART OF THE FASTA PACKAGE
COMPARE NUCLEOTIDES AGAINST PROTEINS
DETERMINES A HYPOTHESIZED CODING REGION (HCR)
FASTX IS FASTER, FASTY IS MORE ACCURATE
HMMER
PROTEIN-QUERIES AGAINST PROTEIN-DATABASE
USES HIDDEN MARKOV MODELS
MAPS SMITH-WATERMAN PARAMETERS ONTO A PROBABILISTIC MODEL
IMPROVES ACCURACY
PATTERNHUNTER
NUCLEOTIDE-QUERIES AGAINST OTHER NUCLEOTIDE-SEQUENCES
USES NON-CONSECUTIVE SEEDS FOR INCREASED SENSITIVITY
COMPARES HUMAN GENOME TO MOUSE GENOME IN 20 CPU-DAYS
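A non-consecutive (spaced) seed only requires matches at the positions marked 1 and ignores the positions marked 0. The sketch below uses a short toy seed, `1101`, chosen for the example; it is not the seed PatternHunter uses, and the function names are hypothetical.

```python
def seed_key(seq, start, seed):
    """Characters of seq under the '1' positions of the seed."""
    return "".join(seq[start + i] for i, flag in enumerate(seed) if flag == "1")


def spaced_seed_hits(query, target, seed="1101"):
    """Return (query_pos, target_pos) pairs where the seed matches."""
    span = len(seed)
    index = {}
    for j in range(len(target) - span + 1):
        index.setdefault(seed_key(target, j, seed), []).append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(query, i, seed), []):
            hits.append((i, j))
    return hits


# Both seeds require three matching bases, but only the spaced one
# tolerates the mismatch at the third position (G versus T).
print(spaced_seed_hits("ACGT", "ACTT", seed="1101"))
print(spaced_seed_hits("ACGT", "ACTT", seed="111"))
```

At the same number of required matches, the spaced seed finds a hit that the consecutive seed misses, which illustrates where the sensitivity gain comes from.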
ORF DETECTION
READING FRAMES CAN BE DETECTED IN EST-DATA
ALLOWS SCREENING FOR PREVIOUSLY UNKNOWN GENES
CAN PROVIDE A POTENTIAL PROTEIN SEQUENCE
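The basic idea of ORF detection can be sketched as a scan for ATG-to-stop stretches in all six reading frames. This naive version assumes error-free sequence and is far simpler than the dedicated tools; the function names and the `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) for ATG...stop ORFs in six frames.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

A single inserted or deleted base shifts the frame and breaks this scan, which is why robustness against frameshift errors matters for EST data.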
ORF DETECTION
TOOLS:
ESTSCAN
ORFPREDICTOR
...
ESTSCAN
USES HIDDEN MARKOV MODELS
ROBUST AGAINST FRAMESHIFT ERRORS
SENSITIVE (5 % FALSE NEGATIVES, 18 % FALSE POSITIVES)
ORFPREDICTOR
WEB-BASED
USES BLASTX RESULTS AS A GUIDE WHERE AVAILABLE
USES A DEFINED RULESET FOR DEFINING ORFS
GENE ANNOTATION
BLAST2GO VIA GENE ONTOLOGY
FINDS HOMOLOGOUS GENES TO ANNOTATE THE FUNCTIONS OF A GENE OF INTEREST
GENE ONTOLOGY
3 ONTOLOGIES:
MOLECULAR FUNCTION
CELLULAR COMPONENT
BIOLOGICAL PROCESS
CONCLUSIONS
NGS PROVIDES A FAST AND CHEAP WAY TO GENERATE DATA
TONS OF TOOLS EXIST TO ANALYSE TRANSCRIPTOME DATA
ALL TOOLS HAVE THEIR OWN PROS & CONS
MOST OF THOSE TOOLS ARE UNSUITABLE FOR A "NORMAL USER"