Detection of chimeric sequences from PCR artefacts


Page 1: Detection of chimeric sequences from PCR artefacts

Detection of chimeric sequences from PCR artefacts

Thomas Huber [email protected]

Computational Biology and Bioinformatics Environment (ComBinE)

Departments of Biochemistry & Mathematics

The University of Queensland

Page 2: Detection of chimeric sequences from PCR artefacts

What are PCR-generated chimeric sequences?

• Prematurely terminated amplicon
• Re-annealing with foreign DNA
• Copied to completion in the following PCR cycle
• Artificial sequence from 2 parent sequences

From: http://www.gnis-pedagogie.org
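The mechanism on this slide can be mimicked in a few lines. The following is only an illustrative sketch: the function name `make_chimera`, the toy sequences and the assumption that both parents are already aligned and of equal length are not from the talk.

```python
def make_chimera(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Join the 5' part of parent_a to the 3' part of parent_b.

    Models a prematurely terminated amplicon of parent_a (its first
    `breakpoint` bases) that re-anneals to parent_b and is copied to
    completion in the following cycle.
    Assumption: both parents are aligned and have the same length.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal (aligned) length")
    if not 0 < breakpoint < len(parent_a):
        raise ValueError("breakpoint must fall strictly inside the sequence")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


# Toy parents, chosen so that the origin of every base is obvious.
parent_a = "AAAAAAAAAA"
parent_b = "CCCCCCCCCC"
print(make_chimera(parent_a, parent_b, 4))  # AAAACCCCCC
```

The resulting sequence matches neither parent over its full length, which is why it can be mistaken for a new organism.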

Page 3: Detection of chimeric sequences from PCR artefacts

Are chimeric sequences a problem?

• Culture-independent surveys of microbial communities
  – Chimeric sequences suggest non-existing organisms
  – 0.5-5% of all sequences are PCR artefacts
• Why bother with such a small artefact?
  – Signal vs noise
    • Repeating the same survey 100 times (5% chimeras): ratio of existing:non-existing organisms = 1:5
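One way to arrive at the 1:5 figure is sketched below. It rests on two assumptions that the slide does not state: every repetition recovers the same set of real organisms, and every chimera is a new, unique artefact. The library size of 100 sequences per survey is also just an illustrative choice.

```python
def count_unique_sequences(n_surveys: int, seqs_per_survey: int, chimera_rate: float):
    """Return (unique real organisms, unique chimeras) after repeated surveys.

    Assumptions (not from the slide): real organisms recur in every survey,
    whereas each chimera is a one-off artefact that never recurs.
    """
    chimeras_per_survey = round(seqs_per_survey * chimera_rate)
    real_organisms = seqs_per_survey - chimeras_per_survey  # same set each time
    unique_chimeras = n_surveys * chimeras_per_survey       # accumulate forever
    return real_organisms, unique_chimeras


real, fake = count_unique_sequences(n_surveys=100, seqs_per_survey=100, chimera_rate=0.05)
print(real, fake)  # 95 500
```

Under these assumptions 95 real organisms stand against 500 distinct artefacts, roughly the 1:5 ratio quoted on the slide: the signal stays constant while the noise keeps growing.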

Page 4: Detection of chimeric sequences from PCR artefacts

Detection of chimeras: 1. Alignment to reference sequences

• Each target sequence in turn
  – Align to reference sequences
  – If alignment to a single sequence gives a better match than alignment to two sequences: no chimera
  – Else: chimera!

(Cole et al., 2003; Komatsoulis and Waterman, 1997, …)
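The decision rule above can be sketched as follows. This is not the method of the cited tools: it assumes the target and all references are already aligned to equal length, scores a match as simple percent identity, and introduces a hypothetical `margin` parameter; all function names are made up for illustration.

```python
from itertools import permutations


def matches(a: str, b: str) -> int:
    """Number of identical positions between two aligned fragments."""
    return sum(x == y for x, y in zip(a, b))


def best_single_parent(target: str, refs: dict) -> float:
    """Best identity of the target against any one reference."""
    return max(matches(target, ref) for ref in refs.values()) / len(target)


def best_two_parents(target: str, refs: dict) -> float:
    """Best identity when the left part follows one reference and the
    right part a different one, over all split points."""
    n = len(target)
    best = 0.0
    for left, right in permutations(refs, 2):
        for bp in range(1, n):
            score = (matches(target[:bp], refs[left][:bp])
                     + matches(target[bp:], refs[right][bp:])) / n
            best = max(best, score)
    return best


def is_putative_chimera(target: str, refs: dict, margin: float = 0.0) -> bool:
    """Flag the target if two parents explain it better than one.
    `margin` is a hypothetical safety threshold, not from the talk."""
    return best_two_parents(target, refs) > best_single_parent(target, refs) + margin


refs = {"A": "AAAAAAAAAA", "B": "CCCCCCCCCC"}
print(is_putative_chimera("AAAACCCCCC", refs))  # True
print(is_putative_chimera("AAAAAAAAAA", refs))  # False
```

The sketch only works when both parents are present in `refs`.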

Page 5: Detection of chimeric sequences from PCR artefacts

Problems

• Database contamination
  – More and more chimeras accumulate
• Database coverage
  – Parent sequences are not necessarily in the database

Page 6: Detection of chimeric sequences from PCR artefacts

2. Partial tree building approach

• Align sequence to existing sequences (build MSA)
• Divide MSA at postulated conversion point
• Construct 2 trees
• Compare consistency of phylogeny

(Wang and Wang, 1997; Hugenholtz, 2003)

[Figure: two trees over sequences 1 to 5, one for each part of the MSA]

Page 7: Detection of chimeric sequences from PCR artefacts

3. Bellerophon approach

• Just like “partial tree building”, but:
  – MSA from PCR library
    • More likely to contain parent sequence
  – No trees are actually built
  – All possible conversion points are tested

Page 8: Detection of chimeric sequences from PCR artefacts

How Bellerophon works

• Compute MSA
• For each conversion point:
  – 2 windows left/right
    • Calculate all “distances” between sequences
  – Instead of comparing trees, compare distance matrices

$$dme = \sum_{i}^{n} \sum_{j}^{n} \left| dm_{left}[i][j] - dm_{right}[i][j] \right|$$
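The distance matrix error can be sketched in plain Python. The slide leaves the “distance” unspecified, so this sketch assumes the fraction of differing positions (p-distance); the function names and the toy alignment are likewise illustrative only.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned fragments
    (an assumed distance measure; the slide does not name one)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(fragments):
    """All pairwise distances between the given fragments."""
    return [[p_distance(a, b) for b in fragments] for a in fragments]


def dme(msa, point: int, window: int) -> float:
    """Distance matrix error at one conversion point: sum of absolute
    differences between the left-window and right-window matrices."""
    if window <= 0 or point - window < 0 or point + window > len(msa[0]):
        raise ValueError("windows must be non-empty and fit inside the alignment")
    dm_left = distance_matrix([s[point - window:point] for s in msa])
    dm_right = distance_matrix([s[point:point + window] for s in msa])
    n = len(msa)
    return sum(abs(dm_left[i][j] - dm_right[i][j])
               for i in range(n) for j in range(n))


msa = ["AAAAAAAA",   # parent 1
       "CCCCCCCC",   # parent 2
       "AAAACCCC"]   # chimera of parent 1 and parent 2
print(dme(msa, point=4, window=4))      # 4.0
print(dme(msa[:2], point=4, window=4))  # 0.0
```

Without the chimera the two windows agree on all distances and dme is zero; adding it makes the left and right matrices disagree.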

Page 9: Detection of chimeric sequences from PCR artefacts

How Bellerophon works (cont.)

• Chimeric sequence will result in large dme
• Chimera detection:
  – Exclude sequence
  – Observe change of dme

$$preference[i] = \frac{dme}{dme[i]}$$

Page 10: Detection of chimeric sequences from PCR artefacts

How Bellerophon works (cont.)

• Chimeric sequence will result in large dme
• Chimera detection:
  – Exclude sequence
  – Observe change of dme

$$preference[i] = \frac{dme}{dme[i]}$$

• Expensive to calculate (O(n^3))
• Speedy way

$$dme = \sum_{i}^{n} \sum_{j}^{n} \left| dm_{left}[i][j] - dm_{right}[i][j] \right|$$

$$col[i] = \sum_{j}^{n} \left| dm_{left}[i][j] - dm_{right}[i][j] \right|$$

$$preference[i] = \frac{dme}{dme - 2\,col[i]}$$
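The shortcut works because excluding sequence i removes row i and column i of the difference matrix, each of which sums to col[i] when the matrices are symmetric with a zero diagonal. A small sketch comparing both routes follows; the function names and the distance values are made up for illustration.

```python
def abs_diff(dm_left, dm_right):
    """Element-wise |dm_left - dm_right|."""
    n = len(dm_left)
    return [[abs(dm_left[i][j] - dm_right[i][j]) for j in range(n)]
            for i in range(n)]


def preference_naive(dm_left, dm_right):
    """Recompute dme with each sequence excluded in turn: O(n^3)."""
    diff = abs_diff(dm_left, dm_right)
    n = len(diff)
    total = sum(map(sum, diff))
    scores = []
    for k in range(n):
        reduced = sum(diff[i][j] for i in range(n) for j in range(n)
                      if i != k and j != k)
        scores.append(total / reduced if reduced > 0 else float("inf"))
    return scores


def preference_fast(dm_left, dm_right):
    """Use dme[i] = dme - 2 * col[i]: O(n^2).
    Assumes symmetric matrices with zeros on the diagonal."""
    diff = abs_diff(dm_left, dm_right)
    total = sum(map(sum, diff))
    col = [sum(row) for row in diff]
    return [total / (total - 2 * c) if total - 2 * c > 0 else float("inf")
            for c in col]


# Made-up distances (counts of differing positions); sequence 2 is the odd one.
dm_left = [[0, 4, 0], [4, 0, 4], [0, 4, 0]]
dm_right = [[0, 3, 4], [3, 0, 0], [4, 0, 0]]
print(preference_naive(dm_left, dm_right))  # [2.25, 2.25, 9.0]
print(preference_fast(dm_left, dm_right))   # [2.25, 2.25, 9.0]
```

Both routes give the same scores, and the sequence whose removal lowers dme the most receives the highest preference.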

Page 11: Detection of chimeric sequences from PCR artefacts

Bellerophon user interface

Page 12: Detection of chimeric sequences from PCR artefacts

Example output

Title line

Page 13: Detection of chimeric sequences from PCR artefacts

Example output

Title line

Job parameter

Page 14: Detection of chimeric sequences from PCR artefacts

Example output

Title line

Job parameter

!! Advice !!

Chimera output

Page 15: Detection of chimeric sequences from PCR artefacts

Example output

Title line

Job parameter

!! Advice !!

Chimera output

Preference score (only relative)

Conversion points

Sequence identities across windows

IDs of chimera and parents

Page 16: Detection of chimeric sequences from PCR artefacts

Server usage

[Chart: “Bellerophon: Number of jobs processed”, monthly from Mar-03 to Aug-05; y-axis from 0 to 500]

http://foo.maths.uq.edu.au/~huber/bellerophon.pl

Page 17: Detection of chimeric sequences from PCR artefacts

Who uses Bellerophon?

Page 18: Detection of chimeric sequences from PCR artefacts

What Bellerophon does/does not do!

• Bellerophon does not determine chimeric sequences!
• It merely indicates putative chimeras
• You must confirm them!

Page 19: Detection of chimeric sequences from PCR artefacts

Current developments

• Bellerophon 2
  – For large PCR libraries (or single sequences)
    • A smaller library of related sequences is selected for each target sequence
  – Cost reduction from O(n^3) to something more tractable
  – Cleaning up sequence databases
• Web services
• Large scale data statistics on chimeras

Page 20: Detection of chimeric sequences from PCR artefacts

Bellerophon web services

• Sporadic user (web page interface)
  – Interactive / manual use
  – Easy to understand, convenient to use
• Large scale users have different needs
  – E.g. JGI’s microbial ecology pipeline
  – Easy to implement/use interface that allows automatic submission and processing of data

Web services

• Standardised protocol (SOAP, WSDL)
• Remote service calls from own scripts and programs
• Not a mirror. All Bellerophon services are maintained in Brisbane

Page 21: Detection of chimeric sequences from PCR artefacts

Large scale data statistics on chimeras

• How many chimeras to expect in a PCR library?
  – Differences between phyla?
• Is recombination in 16S rRNA a random event?
  – Structural bias?