
Research Article

Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

Shea N. Gardner,1 Crystal J. Jaing,2 Maher M. Elsheikh,2 José Peña,2 David A. Hysom,1 and Monica K. Borucki2

1 Computations, Lawrence Livermore National Laboratory (LLNL), Livermore, CA 94550, USA
2 Physical and Life Sciences/Global Security, Lawrence Livermore National Laboratory (LLNL), Livermore, CA 94550, USA

Correspondence should be addressed to Shea N. Gardner; gardner26@llnl.gov

Received 30 May 2014; Accepted 14 July 2014; Published 3 August 2014

Academic Editor: Paul Harrison

Copyright © 2014 Shea N. Gardner et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse, with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

1. Background

Sequencing whole genomes of potentially heterogeneous or divergent viruses can be challenging from a small or complex sample with low viral concentrations. Deep sequencing to detect rare viral variants or metagenomic sequencing to genotype viruses from a complex background requires targeted viral amplification. Techniques such as consensus PCR, Ion Ampliseq (Life Technologies) [1], TruSeq Amplicon (Illumina), and Haloplex (Agilent) [2] apply highly multiplexed PCR for target enrichment. Targeted enrichment should preferentially amplify the target virus over host or environmental DNA/RNA, in contrast to random amplification commonly used prior to whole genome sequencing. Primers designed to tile amplicons across a set of related viral genomes prior to sequencing can enrich whole viral genomes or large regions. However, high levels of intraspecific sequence variation combined with low virus concentrations mean that standard PCR primer design from a reference may fail due to mutations in the sample virus that prevent primer binding. To address this problem, we added a capability to the PriMux software distribution (http://sourceforge.net/projects/primux/) called run_tiled_primers that applies the PriMux software [3] to automate PCR primer design to achieve a near-minimal set of conserved, degenerate, multiplex-compatible primers designed to tile overlapping regions across multiple related whole genomes or regions.

Hindawi Publishing Corporation, Advances in Bioinformatics, Volume 2014, Article ID 101894, 8 pages, http://dx.doi.org/10.1155/2014/101894

Table 1: Summary of average lengths, number of sequences, and percentage of conserved bases in a multiple sequence alignment (with MUSCLE [5]), and number of tiled primers required for the short and long amplicon settings.

Organism      No. of sequences  Avg. length  Consensus (%)  No. of primers (~3000 bp amplicons)  No. of primers (~10000 bp amplicons)
CCHF S        56                1668         39             6                                    6
CCHF M        49                5314         24             46                                   16
CCHF L        31                12113        46             69                                   27
RVF S         89                1684         53             2                                    2
RVF M         69                3885         78             4                                    6
RVF L         62                6404         83             6                                    4
Ebola         22                18659        5              116                                  35
Marburg       31                19115        70             34                                   8
Hendra        10                18234        97             12                                   4
Nipah         9                 18247        91             18                                   6
Junin L       12                7114         96             6                                    2
Machupo L     5                 7141         88             10                                   2
Junin S       26                3410         80             4                                    4
Machupo S     13                3432         76             4                                    4
JEV           144               10968        56             26                                   6
NW Arena S    100               3396         18             64                                   42
NW Arena L    42                7107         18             83                                   19
OW Arena S    54                3547         8              116                                  32
OW Arena L    45                7199         21             110                                  35
TBEV          67                10840        36             56                                   10

Abbreviations: CCHF = Crimean-Congo hemorrhagic fever; RVF = Rift Valley fever; JEV = Japanese encephalitis virus; NW Arena = New World Arenavirus; OW Arena = Old World Arenavirus; TBEV = tick-borne encephalitis virus; L = L segment; S = S segment.

JCVI has an automated degenerate PCR primer design system called JCVI Primer Designer, which is similar to run_tiled_primers in that it designs degenerate primers to tile across viral genomes [4]. The major difference is that it begins with a consensus sequence containing degenerate bases and selects primers with fewer than 3 or 4 degenerate bases, so that in the end a majority of strains are amplified, but it does not require primers to amplify all strains. In their examples, most of the primer pairs could amplify >75% of isolates. Each primer pair for a given region is intended to be run as a specific pair, not as a multiplex with multiple pairs. Consensus sequences with too little conservation, that is, <90% consensus, are divided manually in a preprocessing step into subgroups, which can be run separately through the pipeline. The method here differs in that it takes the full multiple sequence alignment as input rather than a consensus, and it seeks to automatically design a minimal, degenerate set of multiplex compatible primers to amplify all the strains for a given region in a single reaction. The major operational difference of run_tiled_primers compared to the JCVI pipeline is that run_tiled_primers does not require manual subdivision of the target sequences into high consensus groups to be run separately by the user, and run_tiled_primers attempts to cover 100% of the target sequences in a single pass using a greedy minimal set algorithm.
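The greedy minimal set idea mentioned above can be illustrated with a generic greedy set cover. This is a minimal sketch under stated assumptions, not the PriMux implementation: the function name `greedy_primer_cover` and the input (a hypothetical mapping from each candidate primer to the set of target sequences it is predicted to amplify) are invented here for illustration.

```python
def greedy_primer_cover(coverage, targets):
    """Greedy set cover: repeatedly pick the candidate covering the most
    still-uncovered targets.

    coverage: dict mapping a candidate primer (hypothetical label) to the
              set of target IDs it is predicted to amplify.
    targets:  iterable of all target IDs that should be covered.
    Returns (chosen primers in pick order, targets left uncovered).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered and coverage:
        # Candidate that adds the most new targets (ties: first in dict order).
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # no candidate can cover the remaining targets
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


if __name__ == "__main__":
    demo = {"p1": {"a", "b", "c"}, "p2": {"c", "d"}, "p3": {"d"}}
    print(greedy_primer_cover(demo, "abcd"))
```

A greedy choice of this kind gives a near-minimal rather than a provably minimal set, which matches the "near-minimal" wording used in the Background.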

Some regions of high conservation may have only one primer pair predicted to amplify all strain variants, while other regions may require many primers to cover all known variants. If multiple strains are present at once, or if multiple forward and/or reverse primers in the multiplex amplify the strain present, the reaction will generate multiple overlapping amplicons spanning the same region, which could be problematic if exactly one amplicon sequence is needed, for example, for Sanger sequencing. In this case the JCVI Primer Designer would be preferable, since it designs primer pairs each to be run in singleplex reactions rather than as a multiplex, with the risk that outlier strains may not be amplified. However, when multiple overlapping reads with different endpoints or from different strains are acceptable, as in high throughput sequencing, run_tiled_primers should be suitable and could serve as a good alternative to random amplification when more specific enrichment is needed and amplification of outliers is desired.

For the viral groups we used here, the target sets included up to hundreds of sequences, and in many cases consensus was extremely low, as little as 5% of the bases in the multiple sequence alignment (Table 1). The JCVI Primer Designer pipeline, with a manual approach of subdividing the sequences into groups with 90% consensus and running each group separately, could be a labor-intensive endeavor and would certainly result in a large number of singleplex reactions to cover each genome.

Possible applications include target enrichment for viral discovery of new members in a viral family from a complex host background, improving high throughput sequencing sensitivity and coverage of a rapidly evolving virus, or enriched coverage of variants in a gene family. We demonstrate the scalability of this software for designing whole genome amplification primers for a number of highly pathogenic viral groups which display very high levels of sequence variation and for which we anticipate that targeted enrichment would be needed to obtain adequate sensitivity and genome coverage when sequencing from a clinical or environmental sample.

[Figure 1 diagram labels: split size; overlap; 0.5 overlap; regions from which to select primers; FP; RP; continue tiling across genome.]

Figure 1: Diagram showing how the multiple sequence alignment is split into overlapping sections and conserved, degenerate sets of primers are designed near the ends of the overlapping pieces, so that overlapping amplicons should be produced which tile across the viral genome. FP = forward primer; RP = reverse primer.

2. Implementation

2.1. Process. The run_tiled_primers process can be summarized as follows: split a multiple sequence alignment into overlapping regions and, for each region, design a degenerate multiplex set of primers that in combination amplify that region in all strains with as few primers as possible. Run_tiled_primers takes as input a multiple sequence alignment (MSA). Run_tiled_primers splits the alignment into regions of size s bases that overlap by x bases (Figure 1).

When splitting the alignment into regions of size s, if the last "remainder" piece of an alignment is less than half of s, then s is increased by the amount that evenly divides the alignment without any remainder, to s′, and the split regions are recalculated with s′. If a user desires to tile across only selected regions instead of tiling across the entire sequence, then an optional regions file may be specified, which contains the regions (e.g., genes) and their start and end positions in the alignment.
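The splitting rule can be sketched as follows. This is an illustration only, not the run_tiled_primers code: the function name `split_alignment` is hypothetical, coordinates are 0-based half-open alignment columns, and the formula used for s′ (the smallest size for which the same number of regions, each overlapping its neighbor by x, spans the whole alignment) is one plausible reading of "evenly divides the alignment" and is an assumption.

```python
import math


def split_alignment(aln_len, s, x):
    """Split alignment columns [0, aln_len) into regions of size s that
    overlap by x. A final remainder piece shorter than s/2 is absorbed by
    enlarging s to s' (assumed formula, see lead-in)."""
    if not 0 <= x < s:
        raise ValueError("overlap x must satisfy 0 <= x < s")
    if aln_len <= s:
        return [(0, aln_len)]
    step = s - x
    n_full = (aln_len - s) // step + 1            # full-size regions that fit
    leftover = aln_len - ((n_full - 1) * step + s)
    if leftover == 0:
        n = n_full                                # alignment divides exactly
    elif leftover + x >= s / 2:
        n = n_full + 1                            # remainder is its own region
    else:
        n = n_full                                # absorb remainder: s -> s'
        s = math.ceil((aln_len + (n - 1) * x) / n)
        step = s - x
    return [(i * step, min(i * step + s, aln_len)) for i in range(n)]


if __name__ == "__main__":
    print(split_alignment(10000, 3000, 500))
    print(split_alignment(8500, 3000, 500))
```

In the second call the 1000-column remainder is shorter than half of s = 3000, so s grows to 3167 and three regions cover the alignment instead of leaving a short fourth piece.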

For each region, the PriMux software [3] is used to search for conserved, degenerate, and multiplex compatible primer sets to amplify that region in all target sequences with as few primers as possible. The PriMux "max" algorithm is used. Primers should be multiplex compatible, since the primers for a given region are predicted not to form primer dimers and all to have Tm's in a range specified by the user. As run_tiled_primers is a wrapper script around the PriMux workhorse, all the primer design characteristics are specified in a PriMux options file. The minimum and maximum amplicon lengths are determined by the (s, x) parameters to run_tiled_primers (Table 2), so these parameters may be omitted in the input options file, or, if they are present, their values will be replaced with values appropriate for the specified values of (s, x). Run_tiled_primers requires that primers must anneal within 0.5x of either end of the region. If the value of x is 36 bp or less, it is too short for two nonoverlapping primers, typically at least 18 bp long. In this case the code does not require that adjacent regions overlap, and amplicons are allowed from anywhere in each region. Small overlaps (e.g., 40–80) do not leave much room to find good priming regions that pass the filters on Tm, entropy, free energy, and homopolymers as specified in the options file, and consequently it may not be possible to find primers for all targets. When this happens, increasing the overlap and relaxing the primer specifications may be necessary.
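The placement rule (primers within 0.5x of either end of a region, with the constraint dropped when x is 36 bp or less) can be sketched as below. The function name `primer_windows`, the half-open coordinates, and the `min_primer_len` parameter are assumptions made for illustration; the 18 bp value comes from the "typically at least 18 bp long" remark above.

```python
def primer_windows(region_start, region_end, x, min_primer_len=18):
    """Return (forward_window, reverse_window) as half-open column ranges.

    When the overlap x cannot hold two nonoverlapping primers
    (x <= 2 * min_primer_len, i.e. 36 bp for 18-mers), primers may come
    from anywhere in the region; otherwise each primer must lie within
    0.5x of its end of the region.
    """
    if x <= 2 * min_primer_len:
        whole = (region_start, region_end)
        return whole, whole
    half = x // 2
    forward = (region_start, region_start + half)
    reverse = (region_end - half, region_end)
    return forward, reverse


if __name__ == "__main__":
    print(primer_windows(0, 3000, 500))     # region 0
    print(primer_windows(2500, 5500, 500))  # region 1, overlapping region 0
```

With s = 3000 and x = 500, the reverse window of region 0 is columns 2750 to 3000 and the forward window of region 1 is columns 2500 to 2750, so the region 1 amplicon starts upstream of where the region 0 amplicon ends, which is what makes adjacent amplicons overlap.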

Requiring that primers fall within 0.5x bases of the ends of each region facilitates the creation of amplicons which should overlap across a genome, allowing full genome assembly from the amplified products. There may not be amplicons covering the extreme 5′ and 3′ ends of a target sequence, since the first and last primers may be located some distance (maximum of x/2) from the ends. Rapid Amplification of cDNA Ends (RACE) PCR would be necessary to amplify the genome ends not covered by an overlapping region, priming with the reverse complement of the run_tiled_primers primers closest to the end so as to prime toward the edge of the genome.
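Taking the reverse complement of a degenerate primer requires complementing the IUPAC ambiguity codes as well as A, C, G, and T. The helper below is a small standalone sketch (the name `reverse_complement` is ours and is not part of PriMux); it relies only on the standard IUPAC pairings R/Y, K/M, B/V, and D/H, with S, W, and N self-complementary.

```python
# IUPAC nucleotide codes and their complements.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(primer):
    """Reverse complement of a (possibly degenerate) primer sequence."""
    return primer.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTR"))  # R = A or G, complement Y = C or T
```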

Because this split size is based on the alignment, and since dashes in the alignment are not counted in amplicon length, actual amplicons may be substantially shorter than the split size s. This is likely to happen for poorly aligning regions or regions in which there are insertions or deletions in a subset of the sequences. To compensate for this, one should select s that is larger than the actual amplicon lengths desired, particularly if the length of the MSA is much larger than the average genome length.
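To see how much shorter the real amplicons may be than s, one can count the non-gap characters of each aligned sequence within a region. The sketch below assumes the MSA is held as a plain dictionary of equal-length gapped strings with "-" as the gap character; the function name and the toy sequences are illustrative only.

```python
def ungapped_lengths(msa, start, end):
    """Per-sequence length of alignment columns [start, end) with gaps removed.

    msa: dict mapping sequence name to its gapped, aligned sequence.
    """
    return {name: len(seq[start:end].replace("-", ""))
            for name, seq in msa.items()}


if __name__ == "__main__":
    toy = {"strainA": "ACGT--ACGT", "strainB": "ACGTTTACGT"}
    print(ungapped_lengths(toy, 0, 10))
```

In the toy alignment the ten-column region corresponds to only 8 bases in strainA, illustrating why s should be chosen larger than the amplicon length actually wanted when the alignment is gappy.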

Run_tiled_primers labels each overlapping region as #part, where # indicates the order of the regions; for example, 0part, 1part, and 2part are the three regions shown in Figure 1. For each region, sets of conserved, degenerate primers are designed to ensure amplification of all the targets, if possible given the primer specifications.

The primers can be run in separate singleplex reactions for each split region, or alternatively primers for all regions can be combined in a large multiplex after the large set is checked for primer dimers that could occur between primers from different regions. Combining primers for all regions in multiplex should facilitate whole genome amplification in a single reaction. It may yield longer amplicons from the reaction of forward and reverse primers from different parts (FP from 0part reacting with RP from 1part gives product ~2 times the split size), depending on the polymerase processivity and the duration of the extension step, and should facilitate assembly across amplified regions. This helps alleviate cases where a primer cannot be found for one part in an outlier genome due to Tm, homopolymers, primer dimer ΔG, and so forth, since primers from different parts may amplify across the region. However, since primers of overlapping regions can also produce amplicons shorter (less than x bp) than the desired amplicon of length between s − x and s bp (e.g., RP of 0part with the FP from 1part), a step to remove short amplicons before sequencing may be desired. In our experimental test with MHV, the primers from parts 0, 2, and 4 were combined in one reaction and the primers from parts 1 and 3 were combined in another, so that short products would not be produced.

Table 2: Parameters used for primer design in in silico examples and MHV example presented here.

Parameter                                                        In silico primer settings  MHV primer settings
Primer length range                                              18–25                      18–27
Tm range allowed¹                                                60–65°C                    58–65°C
Number degenerate bases allowed per primer                       5                          3
Minimum distance of degenerate base to 3′ end of primer          3 nt                       3 nt
Minimum trimer entropy allowed (to avoid repetitive sequence)²   3.5                        3.3
Maximum length of homopolymer allowed                            4 nt                       5 nt
GC% range allowed                                                20–80                      20–80
Minimum primer dimer ΔG                                          −6 kcal/mol                −15 kcal/mol
Minimum hairpin ΔG                                               −5 kcal/mol                −12 kcal/mol
Primer selection iterations                                      1                          3

¹ Tm is calculated using Unafold [6].
² Low complexity regions (repetitive sequence) are excluded from consideration as primers by setting a minimum entropy threshold for a primer candidate. The entropy $S_i$ of a sequence was computed by counting the numbers of occurrences $n_{AAA}, n_{AAC}, \ldots, n_{TTT}$ of the 64 possible trimers in the probe sequence and dividing by the total number of trimers, yielding the corresponding frequencies $f_{AAA}, \ldots, f_{TTT}$. The entropy is then given by $S_i = -\sum_{t} f_t \log_2 f_t$, where the sum is over the trimers $t$ with $f_t \neq 0$.
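The trimer entropy defined in footnote 2 of Table 2 can be computed directly from overlapping 3-mers. The function below is a sketch of that definition rather than the PriMux code; the name `trimer_entropy` is ours, and counting overlapping trimers with a step of one base is an assumption about how the trimers are enumerated.

```python
import math
from collections import Counter


def trimer_entropy(seq):
    """Shannon entropy (bits) of the overlapping trimer frequencies in seq.

    Only trimers that actually occur contribute, which matches summing
    over trimers t with nonzero frequency f_t.
    """
    seq = seq.upper()
    trimers = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not trimers:
        return 0.0
    total = len(trimers)
    entropy = 0.0
    for count in Counter(trimers).values():
        f = count / total
        entropy -= f * math.log2(f)
    return entropy


if __name__ == "__main__":
    print(trimer_entropy("AAAAAAAAAAAAAAAAAA"))  # homopolymer: one trimer only
    print(trimer_entropy("ACGT"))                # two distinct trimers
```

A primer of length n contains n − 2 overlapping trimers, so under this enumeration its entropy cannot exceed log2(n − 2) (4 bits for an 18-mer); a candidate would be rejected when its entropy falls below the minimum set in the options file.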

We used the script simulate_PCR.pl (https://sourceforge.net/projects/simulatepcr/ [7]) to predict all PCR amplicons from the multiplex degenerate primers compared to the target sequences and to the NCBI nt database. This script is run automatically from the run_tiled_primers code after it predicts primers. It is set to predict amplicons up to twice the maximum amplicon length specified by the user.

2.2. Computational Examples. Computationally predicted tiled primer sets were generated for the viruses and primer specifications provided in Table 1. MSAs were created with MUSCLE [5]. Two settings of split size s and overlap size x were used: long amplicons with s = 10,000, x = 500 or short amplicons of s = 3000, x = 500. The choice of which set to use could depend upon the product lengths the polymerase can amplify and the duration of the extension step of PCR. These fairly long amplicons are provided as theoretical examples. Users may run run_tiled_primers with shorter amplicons (e.g., s = 400 bp) to divide the MSA into many more parts.

One amplicon per target sequence per region was desired (PriMux option file with -primer_selection_iterations = 1). Table 1 shows the average genome or segment length, the number of genomes available for each target, the consensus among those sequences, and the total number of primers to amplify all overlapping regions of all genomes. All products from the nt database under 7800 bp (shorter amplicon) or 26 kb (longer amplicon) were predicted with simulate_PCR to identify potential amplification of nontarget organisms (Tables 3 and 4).

2.3. Murine Hepatitis Virus Example. Run_tiled_primers was used to design primers for selected regions of the coronavirus murine hepatitis virus (strain MHV-1) genome following passage in the lab, for a separate project in which deep sequencing of selected regions following lab passage was performed. In other work attempting to amplify passaged RNA viruses, finding robust primers based on the original genome was difficult due to mutations which modified primer binding sites [8]. It was hoped that run_tiled_primers would help avoid selecting primers in mutational hotspots by taking into account strain variation across multiple available genomes for the species, since run_tiled_primers seeks maximally conserved primers in the available sequences.

Input to run_tiled_primers was an alignment of 22 MHV genomes (genome identities provided as supplementary information) created using MUSCLE [5]. Regions tiled were the Nsp1, Nsp3, Nsp14, and several genes at the 3′ end of the genome (regions file provided in supplementary information), using the primer parameters in Table 2. Primer sets were predicted to produce overlapping amplicons for these regions from all MHV genomes, and a subset of primers predicted to amplify the MHV-1 or MHV strain JHM genome was selected. Some primers that were predicted to amplify the JHM strain but not the MHV-1 strain were included in the multiplex to check for possible evolutionary change of the original sequence toward the annotated reference JHM sequence or cross reactions with primer-genome mismatches.


Table 3: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 3 kb amplicons. In a multiplex of the 3 kb-amplicon tiled primers for a given organism, of the possible reactions producing products, only a small number of primer combinations are predicted to amplify regions in nontarget organisms. Counts show the number of unique primer combinations in a multiplex that yield products for any sequence in the NCBI nt nucleotide database. The numerator is for any nontarget organism in nt, and the denominator is for any target or nontarget organism in nt, that is, nonspecific/total of the possible primer combinations in the multiplex predicted to yield product when compared against nt. Vastly more amplicons are produced from target organisms, indicating any contaminating nontarget species should be a small minority of amplified product.

Organism      Nontarget amplicons/total amplicons  Nontarget amplicon source organism
CCHF S        0/160                                -
CCHF M        0/1934                               -
CCHF L        0/3753                               -
RVF S         0/137                                -
RVF M         0/356                                -
RVF L         0/753                                -
Ebola         1/2657                               Zea mays clone BAC ZMMBBb0342E21
Marburg       0/1511                               -
Hendra        0/206                                -
Nipah         0/286                                -
Junin L       0/69                                 -
Machupo L     0/153                                -
Junin S       0/84                                 -
Machupo S     0/32                                 -
JEV           79515                                Rocio, West Nile
NW Arena S    561543                               Ippy, Lassa, Luna, Lymphocytic choriomeningitis, Mobala, Mopeia
NW Arena L    0/819                                -
OW Arena S    732509                               Allpahuayo, Amapari, Bear Canyon, Chapare, Cupixi, Dandenong, Flexal, Guanarito, Junin, Latino, Lujo, Machupo, Methylococcus capsulatus str. Bath, Parana, Pirital, Sabia, Tamiami, Whitewater Arroyo
OW Arena L    11826                                Dandenong
TBEV          0/4925                               -


Table 4: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons. As in Table 3, but for the multiplexes of the 10 kb-amplicon tiled primers.

Organism      Nontarget amplicons/total amplicons  Nontarget amplicon source organism
CCHF S        0/160                                -
CCHF M        0/261                                -
CCHF L        0/253                                -
RVF S         0/137                                -
RVF M         0/487                                -
RVF L         0/195                                -
Ebola         0/534                                -
Marburg       0/123                                -
Hendra        0/50                                 -
Nipah         0/74                                 -
Junin L       0/12                                 -
Machupo L     0/7                                  -
Junin S       0/95                                 -
Machupo S     0/32                                 -
JEV           0/1554                               -
NW Arena S    1/337                                Human chromosome 14 BAC C-2555K7 of library CalTech-D
NW Arena L    0/86                                 -
OW Arena S    0/316                                -
OW Arena L    0/131                                -
TBEV          0/189                                -

Samples from MHV-1 infected mice were provided by Dr. Richard Bowen at Colorado State University. The MHV-1 strain used to infect the mice was obtained from American Type Culture Collection (Manassas, VA), and viral stock was propagated in murine fibroblast 17Cl-1 cells, then used to infect C3H mice via intranasal route. Mice were sacrificed four days after inoculation, and bronchoalveolar lavage (BAL) fluid was collected. RNA was extracted from the BAL samples using Invitrogen TRIZOL reagent as per the manufacturer's instructions. RNA was converted to cDNA using Superscript III (Invitrogen) and random hexamers according to the manufacturer's protocol.

Multiplexed primer sets were designed to cover the Nsp3 and 3′ genes with 3 primer pairs per genomic region amplified when possible (total number of primers tested in two multiplex reactions was 53; Table S1). The primers were tested in the lab, first by testing the primer pairs in individual reactions, then as multiplexed reactions. No effort was made to optimize the PCR cycling conditions. RT-PCR conditions were as follows: reverse transcription was performed using random hexamers and the Superscript III RT reverse transcriptase kit (Invitrogen). The MHV-1 cDNA templates were amplified using the Q5 Hot Start High-Fidelity DNA Polymerase kit (New England BioLabs, Ipswich, MA) following manufacturer's instructions. PCR conditions consisted of 98°C for 30 s, followed by 35 cycles of 98°C for 10 s, 60°C for 20 s, and 72°C for 1 min. The final cycle was 72°C for 2 min.

Two multiplex reactions were set up, with each containing a group of nonoverlapping primer sets (Figure 2). For example, multiplex "A" included primer sets A, C, E, G, and I, and multiplex "B" had primer sets B, D, F, and H. By staggering the primer sets into different multiplex reactions, the amplification of overlapping primer regions created by the reverse primer from one set with the forward primer of the overlapping adjacent primer set was eliminated. Without this strategy, these overlapping primer sets would dominate the PCR reaction due to the small size of these amplicons.
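The staggering scheme amounts to assigning alternate regions, in genome order, to two pools so that no pool contains primers for two adjacent (overlapping) regions. A minimal sketch, with a hypothetical function name and the region labels taken from the example above:

```python
def stagger_pools(region_labels):
    """Split regions, listed in genome order, into two multiplex pools so
    that adjacent regions never share a pool."""
    pool_a = list(region_labels[0::2])  # 1st, 3rd, 5th, ... regions
    pool_b = list(region_labels[1::2])  # 2nd, 4th, 6th, ... regions
    return pool_a, pool_b


if __name__ == "__main__":
    print(stagger_pools(list("ABCDEFGHI")))
```

Because only neighboring regions overlap, two pools are enough regardless of how many regions are tiled.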

The amplification of each primer pair in the multiplex was tested using a seminested PCR strategy to verify that the correct specific amplicons were being produced from each multiplex of primers for a given region (Figure 2, Table S2). The multiplex PCR products served as templates for PCR reactions with primer pairs that included the reverse primer of one region paired with the forward primer from the downstream adjacent region to determine if the template generated from the multiplex was present. To ensure that the PCR product was generated from the multiplex product template rather than genomic DNA carried over from the initial sample, the multiplex product template was diluted 1:10,000 or excised from a gel and purified prior to use as a template.

3. Results and Discussion

All the primers for both (s, x) settings are provided as Supplementary data, as are the predicted amplicon start and end positions in each target genome from a multiplex of the primers for a given viral target set. Tiled amplification of these viruses required from 2 to 116 primers (Table 1). Primers are predicted to be specific to the target organisms for the most part, although not exclusively (Tables 3 and 4).

[Figure 2 diagram: MHV genome drawn 5′ to 3′ (ORF1a, ORF1b, 2a, HE, Spike, 4, 5, E, M, N). Primer sets included in PCR mix A: AF/AR, CF/CR, EF/ER, GF/GR, IF/IR. Primer sets included in PCR mix B: BF/BR, DF/DR, FF/FR, HF/HR. In separate seminested PCR reactions, forward primers were paired with a reverse primer from an overlapping reaction to verify that product was generated for each overlapping region.]

Figure 2: Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were tested. The approximate position of each region amplified by primer sets is shown (MHV genome is not drawn to scale). Each multiplex reaction consisted of primer sets that do not overlap in regions amplified. Each region is amplified using 3 forward primers and 3 reverse primers (Table S1; see Supplementary Material available online at http://dx.doi.org/10.1155/2014/101894). For example, the A primer set consists of 3 forward primers (A1F, A2F, and A3F) and 3 reverse primers (A1R, A2R, and A3R). To verify that each region is amplified in the multiplex reaction, a second set of seminested PCRs was performed using the amplicons from the multiplex reaction as a template. For example, to ensure region A was amplified, the PCR product from the A mix multiplex was diluted 1:10,000 and used as template in a PCR reaction with AR1 primer paired with BF2 (Table S2). Primers are labeled according to genome region (A-I) and primer direction (F = forward, R = reverse).

The few cases of off-target amplification come from closely related organisms in the same family, such as Old World (OW) and New World (NW) Arenaviruses, or other Flaviviruses amplified by the Japanese encephalitis virus (JEV) multiplex. The three exceptions were a single amplicon of 2830 bp from a BAC clone of Zea mays (maize) from the Ebola 3 kb multiplex, a single amplicon of 3610 bp from Methylococcus capsulatus str. Bath from the OW Arena S segment 3 kb multiplex, and a single amplicon of 851 bp from a human BAC from a library at CalTech (NW Arena S segment 10 kb multiplex; Table 4). All three of these predicted nontarget amplicons result from a single primer in each of those reactions performing as both forward primer (FP) and reverse primer (RP). Nonetheless, the primer multiplexes described here should strongly favor the preferential enrichment of desired targets.

Deriving each primer set required multiple sequence alignment and a call to run_tiled_primers in the current PriMux software distribution (http://sourceforge.net/projects/primux/). In comparison, primer design with the JCVI pipeline for any of these target sets would require the following steps: (1) inspection of a phylogeny for the full target set to build multiple smaller clade-level sets with no more than 10% sequence variation, (2) realignment of the clade-level sets, (3) running of the JCVI pipeline on each clade set, (4) assessing which target sequences are not amplified after one design round and rerunning the pipeline on those sequences for each clade, and (5) repeating step 4 until all target sequences are predicted to be amplified.

4. MHV Results

Multiplexed primers were tested in the lab as primer pairs in individual reactions, then as multiplexed reactions. Twenty-two of the primer pairs worked, and four failed to give a product and were paired with other primers in subsequent testing or, if necessary, replaced with an alternative primer. Amplicons were detected in the expected size ranges, confirming amplification of the expected regions from the multiplexed sets (Figure S1). In some cases extra bands were present, but they were generally smaller than the targeted size; this was common when the template cDNA was obtained from a clinical sample rather than high titer cell culture derived viral stock from this study. The PCR products generated with these highly multiplexed assays were then sequenced using Illumina ultradeep sequencing with a high fidelity polymerase. These primers yielded high coverage, averaging 150,000x, of the genomic regions amplified by the multiplex primers.

5 ConclusionsSoftware is described to generate tiled multiplex and degen-erate amplification primers to span entire genomes or regionsof many variant sequences This tool should facilitate theamplification of overlapping products across whole genomesor user-specified regions of target sets with high levels ofvariation Applications include target enrichment for viraldiscovery of new members in a viral family from a complexhost background improving high throughput sequencingsensitivity and coverage of a rapidly evolving virus orenriched coverage of variants in a gene family

Conflict of InterestsThe authors declare that there is no conflict of interestsregarding the publication of this paper

AcknowledgmentsThis work was supported by the Department of HomelandSecurity Bioforensics Program through contract HSHQPM-10-X-00078P00001 and the Defense Threat Reduction

8 Advances in Bioinformatics

Agency through Contract DTRA10027IA-3497 to LawrenceLivermore National Laboratory This work performed underthe auspices of the US Department of Energy by LawrenceLivermore National Laboratory under Contract DE-AC52-07NA27344

References

[1] M M H Yang A Singhal S R Rassekh S Yip P Eydoux andC Dunham ldquoPossible differentiation of cerebral glioblastomainto pleomorphic xanthoastrocytoma an unusual case in aninfantrdquo Journal of Neurosurgery Pediatrics vol 9 no 5 pp 517ndash523 2012

[2] E Schulz A Valentin P Ulz et al ldquoGermline mutations in theDNAdamage response genes BRCA1 BRCA2 BARD1 andTP53in patients with therapy related myeloid neoplasmsrdquo Journal ofMedical Genetics vol 49 no 7 pp 422ndash428 2012

[3] D A Hysom P Naraghi Arani M Elsheikh A C Carrillo P LWilliams and S N Gardner ldquoSkip the alignment degeneratemultiplex primer and probe design using k-mer matchinginstead of alignmentsrdquo PLoS ONE vol 7 no 4 Article IDe34560 2012

[4] K Li S Shrivastava A Brownley et al ldquoAutomated degeneratePCR primer design for high-throughput sequencing improvesefficiency of viral sequencingrdquo Virology Journal vol 9 article261 2012

[5] R C Edgar ldquoMUSCLE multiple sequence alignment with highaccuracy and high throughputrdquo Nucleic Acids Research vol 32no 5 pp 1792ndash1797 2004

[6] N R Markham and M Zuker ldquoUNAFold software for nucleicacid folding and hybridizationrdquo Methods in Molecular Biologyvol 453 pp 3ndash31 2008

[7] S N Gardner and T Slezak ldquoSimulate PCR for ampliconprediction and annotation from multiplex degenerate primersand probesrdquo BMC Bioinformatics vol 15 article 237 2014

[8] M K Borucki J E Allen H Chen-Harris et al ldquoThe role ofviral population diversity in adaptation of bovine coronavirusto new host environmentsrdquo PLoS ONE vol 8 no 1 Article IDe52752 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology


Table 1: Summary of average lengths, number of sequences, and percentage of conserved bases in a multiple sequence alignment (with MUSCLE [5]), and number of tiled primers required for the short and long amplicon settings.

Organism      Number of    Avg.      Consensus   Number of primers for   Number of primers for
              sequences    length    (%)         ~3000 bp amplicons      ~10000 bp amplicons
CCHF S         56           1668      39            6                       6
CCHF M         49           5314      24           46                      16
CCHF L         31          12113      46           69                      27
RVF S          89           1684      53            2                       2
RVF M          69           3885      78            4                       6
RVF L          62           6404      83            6                       4
Ebola          22          18659       5          116                      35
Marburg        31          19115      70           34                       8
Hendra         10          18234      97           12                       4
Nipah           9          18247      91           18                       6
Junin L        12           7114      96            6                       2
Machupo L       5           7141      88           10                       2
Junin S        26           3410      80            4                       4
Machupo S      13           3432      76            4                       4
JEV           144          10968      56           26                       6
NW Arena S    100           3396      18           64                      42
NW Arena L     42           7107      18           83                      19
OW Arena S     54           3547       8          116                      32
OW Arena L     45           7199      21          110                      35
TBEV           67          10840      36           56                      10

Abbreviations: CCHF = Crimean-Congo hemorrhagic fever; RVF = Rift Valley fever; JEV = Japanese encephalitis virus; NW Arena = New World Arenavirus; OW Arena = Old World Arenavirus; TBEV = tick-borne encephalitis virus; L = L segment; S = S segment.
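The consensus column of Table 1 can be illustrated with a short sketch. The exact definition used for the table is not spelled out here, so the function below assumes that an alignment column counts as conserved only when every sequence carries the same non-gap base at that position. The function name and the toy alignment are hypothetical.

```python
def percent_consensus(aligned_seqs):
    """Percentage of alignment columns in which all sequences share one base.

    Assumption (not stated in the paper): a column that contains any gap
    character ('-') is counted as not conserved.
    """
    if not aligned_seqs or not aligned_seqs[0]:
        raise ValueError("alignment is empty")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    conserved = 0
    for column in zip(*aligned_seqs):
        bases = {base.upper() for base in column}
        if len(bases) == 1 and "-" not in bases:
            conserved += 1
    return 100.0 * conserved / length


# Toy alignment of three sequences, eight columns long (hypothetical data).
toy_alignment = [
    "ACGTAC-T",
    "ACGTTCGT",
    "ACCTACGT",
]
print(percent_consensus(toy_alignment))
```

Under this definition, a single divergent or gapped sequence is enough to mark a column as variable, which is one way very low consensus values can arise in large, diverse target sets.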

but it does not require primers to amplify all strains. In their examples, most of the primer pairs could amplify >75% of isolates. Each primer pair for a given region is intended to be run as a specific pair, not as a multiplex with multiple pairs. Consensus sequences with too little conservation, that is, <90% consensus, are divided manually in a preprocessing step into subgroups, which can be run separately through the pipeline. The method here differs in that it takes the full multiple sequence alignment as input rather than a consensus, and it seeks to automatically design a minimal, degenerate set of multiplex compatible primers to amplify all the strains for a given region in a single reaction. The major operational difference of run_tiled_primers compared to the JCVI pipeline is that run_tiled_primers does not require manual subdivision of the target sequences into high consensus groups to be run separately by the user, and run_tiled_primers attempts to cover 100% of the target sequences in a single pass using a greedy minimal set algorithm.

Some regions of high conservation may have only one primer pair predicted to amplify all strain variants, while other regions may require many primers to cover all known variants. If multiple strains are present at once, or if multiple forward and/or reverse primers in the multiplex amplify the strain present, the reaction will generate multiple overlapping amplicons spanning the same region, which could be problematic if exactly one amplicon sequence is needed, for example, for Sanger sequencing. In this case, the JCVI Primer Designer would be preferable, since it designs primer pairs each to be run in singleplex reactions rather than as a multiplex, with the risk that outlier strains may not be amplified. However, when multiple overlapping reads with different endpoints or from different strains are acceptable, as in high throughput sequencing, run_tiled_primers should be suitable and could serve as a good alternative to random amplification when more specific enrichment is needed and amplification of outliers is desired.

For the viral groups we used here, the target sets included from 5 to 144 sequences, and in many cases consensus was extremely low, as little as 5% of the bases in the multiple sequence alignment (Table 1). The JCVI Primer Designer pipeline, with a manual approach of subdividing the sequences into groups with 90% consensus and running each group separately, could be a labor-intensive endeavor and would certainly result in a large number of singleplex reactions to cover each genome.

Possible applications include target enrichment for viral discovery of new members in a viral family from a complex host background, improving high throughput sequencing sensitivity and coverage of a rapidly evolving virus, or enriched coverage of variants in a gene family. We demonstrate the scalability of this software for designing whole genome amplification primers for a number of highly pathogenic viral groups which display very high levels of sequence variation and for which we anticipate that targeted enrichment would be needed to obtain adequate sensitivity and genome coverage when sequencing from a clinical or environmental sample.

[Figure 1 diagram labels: split size; overlap; 0.5 overlap; regions from which to select primers; FP; RP; continue tiling across genome.]

Figure 1: Diagram showing how the multiple sequence alignment is split into overlapping sections and conserved, degenerate sets of primers are designed near the ends of the overlapping pieces, so that overlapping amplicons should be produced which tile across the viral genome. FP = forward primer; RP = reverse primer.

2. Implementation

2.1. Process. The run_tiled_primers process can be summarized as follows: split a multiple sequence alignment into overlapping regions and, for each region, design a degenerate multiplex set of primers that, in combination, amplify that region in all strains with as few primers as possible. Run_tiled_primers takes as input a multiple sequence alignment (MSA) and splits the alignment into regions of size $s$ bases that overlap by $x$ bases (Figure 1).

When splitting the alignment into regions of size $s$, if the last "remainder" piece of an alignment is less than half of $s$, then $s$ is increased by the amount that evenly divides the alignment without any remainder, to $s'$, and the split regions are recalculated with $s'$. If a user desires to tile across only selected regions instead of tiling across the entire sequence, then an optional regions file may be specified, which contains the regions (e.g., genes) and their start and end positions in the alignment.
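The splitting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the run_tiled_primers source: the function name is hypothetical, regions are taken to start every $s - x$ columns, the "remainder" piece is taken to be the trailing region including its overlap, and $s'$ is computed as the smallest size that lets the existing number of regions span the whole alignment.

```python
def split_regions(aln_len, s, x):
    """Return half-open (start, end) alignment intervals of size s overlapping by x.

    Assumed interpretation of the remainder rule: if the trailing piece is
    shorter than s / 2, s is enlarged to s' so that the same number of
    full-size regions covers the alignment with nothing left over.
    """
    if not 0 <= x < s:
        raise ValueError("need 0 <= x < s")
    if aln_len <= s:
        return [(0, aln_len)]

    step = s - x
    n_full = (aln_len - s) // step + 1       # regions of full size s that fit
    covered = (n_full - 1) * step + s        # last column reached by them

    if covered == aln_len:
        n_regions = n_full
    else:
        last_piece = aln_len - n_full * step  # trailing region, overlap included
        if last_piece < s / 2:
            # Enlarge s to s' (ceiling division) and recompute the step.
            s = -(-(aln_len + (n_full - 1) * x) // n_full)
            step = s - x
            n_regions = n_full
        else:
            n_regions = n_full + 1

    return [(i * step, min(i * step + s, aln_len)) for i in range(n_regions)]


# A 10,000-column alignment keeps s = 3000; an 8,600-column one grows s to 3200.
print(split_regions(10000, 3000, 500))
print(split_regions(8600, 3000, 500))
```

In the second call the trailing piece would be only 1100 columns, less than half of 3000, so the three regions are widened to 3200 columns each rather than adding a short fourth region.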

For each region, the PriMux software [3] is used to search for conserved, degenerate, and multiplex compatible primer sets to amplify that region in all target sequences with as few primers as possible. The PriMux "max" algorithm is used. Primers should be multiplex compatible, since the primers for a given region are predicted not to form primer dimers and all have $T_m$'s in a range specified by the user. As run_tiled_primers is a wrapper script around the PriMux workhorse, all the primer design characteristics are specified in a PriMux options file. The minimum and maximum amplicon lengths are determined by the $(s, x)$ parameters to run_tiled_primers (Table 2), so these parameters may be omitted in the input options file, or, if they are present, their values will be replaced with values appropriate for the specified values of $(s, x)$. Run_tiled_primers requires that primers anneal within $0.5x$ of either end of the region. If the value of $x$ is 36 bp or less, it is too short for two nonoverlapping primers, typically at least 18 bp long. In this case the code does not require that adjacent regions overlap, and amplicons are allowed from anywhere in each region. Small overlaps (e.g., 40–80) do not leave much room to find good priming regions that pass the filters on $T_m$, entropy, free energy, and homopolymers as specified in the options file, and consequently it may not be possible to find primers for all targets. When this happens, increasing the overlap and relaxing the primer specifications may be necessary.
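The placement rule for primers can be made concrete with a small sketch. The function name and the `min_primer_len` parameter are hypothetical; the 18 bp default and the fallback to the whole region when $x$ is 36 bp or less follow the description above, with the threshold assumed to be twice the minimum primer length.

```python
def primer_windows(region_start, region_end, x, min_primer_len=18):
    """Return ((fwd_start, fwd_end), (rev_start, rev_end)) candidate windows.

    Forward primers are sought within 0.5 * x of the region start and reverse
    primers within 0.5 * x of the region end. If x cannot hold two
    nonoverlapping primers, the whole region is allowed for both.
    """
    if x <= 2 * min_primer_len:
        whole = (region_start, region_end)
        return whole, whole
    half = x // 2
    return (region_start, region_start + half), (region_end - half, region_end)


# Second region of a split with s = 3000 and x = 500.
print(primer_windows(2500, 5500, 500))
```

With $x = 500$, each primer has a 250-column window to satisfy the filters; with $x = 40$ the window would be only 20 columns, which is why small overlaps often fail to yield primers for all targets.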

Requiring that primers fall within $0.5x$ bases of the ends of each region facilitates the creation of amplicons which should overlap across a genome, allowing full genome assembly from the amplified products. There may not be amplicons covering the extreme 5′ and 3′ ends of a target sequence, since the first and last primers may be located some distance (maximum of $x/2$) from the ends. Rapid Amplification of cDNA Ends (RACE) PCR would be necessary to amplify the genome ends not covered by an overlapping region, priming with the reverse complement of the run_tiled_primers primers closest to the end, so as to prime toward the edge of the genome.

Because this split size is based on the alignment, and since dashes in the alignment are not counted in amplicon length, actual amplicons may be substantially shorter than the split size $s$. This is likely to happen for poorly aligning regions or regions in which there are insertions or deletions in a subset of the sequences. To compensate for this, one should select $s$ that is larger than the actual amplicon lengths desired, particularly if the length of the MSA is much larger than the average genome length.
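A quick way to see the effect of alignment gaps is to count the non-gap bases each sequence contributes to a region. The helper below is a hypothetical illustration with made-up sequences; it only shows why the same alignment interval can correspond to different amplicon lengths in different strains.

```python
def ungapped_lengths(aligned_seqs, start, end):
    """Map each sequence name to its number of non-gap bases in [start, end)."""
    return {
        name: len(seq[start:end].replace("-", ""))
        for name, seq in aligned_seqs.items()
    }


# Toy alignment: strain_a carries a two-base deletion relative to strain_b.
toy_msa = {
    "strain_a": "ACGT--ACGT",
    "strain_b": "ACGTTTACGT",
}
print(ungapped_lengths(toy_msa, 0, 10))
```

Comparing these per-strain lengths with the chosen split size gives a rough idea of how much larger $s$ should be set to reach a desired amplicon length.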

Run_tiled_primers labels each overlapping region with its position in the tiling order followed by "part"; for example, 0part, 1part, and 2part are the three regions shown in Figure 1. For each region, sets of conserved, degenerate primers are designed to ensure amplification of all the targets, if possible, given the primer specifications.
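The idea of covering all targets with as few primers as possible can be illustrated with a generic greedy set cover. This is a minimal sketch, not the PriMux "max" algorithm itself, whose internals are not described here. The function name, the primer labels, and the coverage table are all hypothetical.

```python
def greedy_primer_set(targets, coverage):
    """Greedily pick primers until every coverable target is covered.

    targets  : set of target sequence identifiers
    coverage : dict mapping primer name -> set of targets it is predicted to bind
    Returns (chosen_primers, uncovered_targets).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered),
                   default=None)
        if best is None or not coverage[best] & uncovered:
            break  # no remaining primer helps; some targets stay uncovered
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen, uncovered


# Hypothetical candidate forward primers for one region and five strains.
targets = {"s1", "s2", "s3", "s4", "s5"}
coverage = {
    "P1": {"s1", "s2", "s3"},
    "P2": {"s3", "s4"},
    "P3": {"s4", "s5"},
    "P4": {"s1"},
}
print(greedy_primer_set(targets, coverage))
```

Greedy selection does not guarantee the smallest possible set, but it is fast and usually close. In a highly conserved region a single candidate may cover every strain, while divergent regions need many.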

The primers can be run in separate singleplex reactions for each split region, or alternatively primers for all regions can be combined in a large multiplex after the large set is checked for primer dimers that could occur between primers from different regions. Combining primers for all regions in multiplex should facilitate whole genome amplification in a single reaction. It may yield longer amplicons from the reaction of forward and reverse primers from different parts (FP from 0part reacting with RP from 1part gives a product ~2 times the split size), depending on the polymerase processivity and the duration of the extension step, and should facilitate assembly across amplified regions. This helps alleviate cases where a primer cannot be found for one part in an outlier genome due to $T_m$, homopolymers, primer dimer $\Delta G$, and so forth, since primers from different parts may amplify across the region. However, since primers of overlapping regions can also produce amplicons shorter (less than $x$ bp) than the desired amplicon of length between $s - x$ and $s$ bp (e.g., RP of 0part with the FP from 1part), a step to remove short amplicons before sequencing may be desired. In our experimental test with MHV, the primers from parts 0, 2, and 4 were combined in one reaction and the primers from parts 1 and 3 were combined in another, so that short products would not be produced.

Table 2: Parameters used for primer design in in silico examples and MHV example presented here.

Parameter                                                        In silico primer settings   MHV primer settings
Primer length range                                              18–25                       18–27
$T_m$ range allowed (1)                                          60–65°C                     58–65°C
Number of degenerate bases allowed per primer                    5                           3
Minimum distance of degenerate base to 3′ end of primer          3 nt                        3 nt
Minimum trimer entropy allowed (to avoid repetitive sequence) (2)  3.5                       3.3
Maximum length of homopolymer allowed                            4 nt                        5 nt
GC range allowed                                                 20–80%                      20–80%
Minimum primer dimer $\Delta G$                                  −6 kcal/mol                 −15 kcal/mol
Minimum hairpin $\Delta G$                                       −5 kcal/mol                 −12 kcal/mol
Primer selection iterations                                      1                           3

(1) $T_m$ is calculated using Unafold [6].
(2) Low complexity regions (repetitive sequence) are excluded from consideration as primers by setting a minimum entropy threshold for a primer candidate. The entropy $S_i$ of a sequence was computed by counting the numbers of occurrences $n_{AAA}, n_{AAC}, \ldots, n_{TTT}$ of the 64 possible trimers in the probe sequence and dividing by the total number of trimers, yielding the corresponding frequencies $f_{AAA}, \ldots, f_{TTT}$. The entropy is then given by $S_i = -\sum_{t} f_t \log_2 f_t$, where the sum is over the trimers $t$ with $f_t \neq 0$.
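The trimer entropy filter defined in footnote 2 of Table 2 is simple to compute. The sketch below follows that definition directly (overlapping trimers, base-2 logarithm); the function name and the example sequences are illustrative, not taken from PriMux.

```python
import math
from collections import Counter


def trimer_entropy(seq):
    """Shannon entropy (bits) of the overlapping trimer frequencies in seq."""
    seq = seq.upper()
    trimers = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not trimers:
        raise ValueError("sequence must be at least 3 bases long")
    total = len(trimers)
    counts = Counter(trimers)
    # Only trimers that actually occur are summed, so f_t is never zero.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# A homopolymer has a single trimer type; a 4-base repeat has only four.
low_complexity = "AAAAAAAAAAAAAAAAAA"
short_repeat = "ACGTACGTACGTACGTAC"
print(trimer_entropy(low_complexity), trimer_entropy(short_repeat))
```

A candidate primer would be rejected when its entropy falls below the minimum in Table 2. An 18-mer has 16 overlapping trimers, so its entropy cannot exceed $\log_2 16 = 4$ bits, and repetitive sequences score well below that.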

We used the script simulate_PCR.pl (https://sourceforge.net/projects/simulatepcr/ [7]) to predict all PCR amplicons from the multiplex degenerate primers compared to the target sequences and to the NCBI nt database. This script is run automatically from the run_tiled_primers code after it predicts primers. It is set to predict amplicons up to twice the maximum amplicon length specified by the user.
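The general idea of predicting amplicons from degenerate primers can be shown with a toy exact-match search. This is a simplified sketch and not how simulate_PCR.pl is implemented. It assumes perfect matches only, expands IUPAC ambiguity codes into regular expressions, and lets every primer act as either forward or reverse primer. All names and sequences below are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _sites(pattern_seq, target):
    """(start, end) of non-overlapping exact matches of a degenerate sequence."""
    regex = re.compile("".join(IUPAC[base] for base in pattern_seq.upper()))
    return [(m.start(), m.end()) for m in regex.finditer(target)]


def predict_amplicons(target, primers, max_len):
    """Return sorted (start, end, fwd_name, rev_name) for predicted products.

    Any primer may serve as forward or reverse primer, so a single primer
    that binds both strands can yield a product on its own.
    """
    target = target.upper()
    amplicons = []
    for fwd_name, fwd_seq in primers.items():
        for f_start, _ in _sites(fwd_seq, target):
            for rev_name, rev_seq in primers.items():
                for _, r_end in _sites(reverse_complement(rev_seq), target):
                    length = r_end - f_start
                    if len(fwd_seq) + len(rev_seq) <= length <= max_len:
                        amplicons.append((f_start, r_end, fwd_name, rev_name))
    return sorted(amplicons)


toy_target = "TTTACGTACGGGGGGGGGGCATGAATTT"
toy_primers = {"F1": "ACGYAC", "R1": "TTCATG"}  # Y matches C or T
print(predict_amplicons(toy_target, toy_primers, max_len=100))
```

Running the same search against nontarget sequences gives a rough specificity check. A real tool would also tolerate mismatches and search a large database efficiently.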

2.2. Computational Examples. Computationally predicted tiled primer sets were generated for the viruses listed in Table 1 using the in silico primer specifications in Table 2. MSAs were created with MUSCLE [5]. Two settings of split size $s$ and overlap size $x$ were used: long amplicons with $s = 10{,}000$, $x = 500$, or short amplicons with $s = 3000$, $x = 500$. The choice of which set to use could depend upon the product lengths the polymerase can amplify and the duration of the extension step of PCR. These fairly long amplicons are provided as theoretical examples. Users may run run_tiled_primers with shorter amplicons (e.g., $s = 400$ bp) to divide the MSA into many more parts.

One amplicon per target sequence per region was desired (PriMux options file with primer selection iterations = 1). Table 1 shows the average genome or segment length, the number of genomes available for each target, the consensus among those sequences, and the total number of primers to amplify all overlapping regions of all genomes. All products from the nt database under 7800 bp (shorter amplicon) or 26 kb (longer amplicon) were predicted with simulate_PCR to identify potential amplification of nontarget organisms (Tables 3 and 4).

2.3. Murine Hepatitis Virus Example. Run_tiled_primers was used to design primers for selected regions of the coronavirus murine hepatitis virus (strain MHV-1) genome following passage in the lab, for a separate project in which deep sequencing of selected regions following lab passage was performed. In other work attempting to amplify passaged RNA viruses, finding robust primers based on the original genome was difficult due to mutations which modified primer binding sites [8]. It was hoped that run_tiled_primers would help avoid selecting primers in mutational hotspots by taking into account strain variation across multiple available genomes for the species, since run_tiled_primers seeks maximally conserved primers in the available sequences.

Input to run_tiled_primers was an alignment of 22 MHV genomes (genome identities provided as supplementary information) created using MUSCLE [5]. Regions tiled were the Nsp1, Nsp3, Nsp14, and several genes at the 3′ end of the genome (regions file provided in supplementary information), using the primer parameters in Table 2. Primer sets were predicted to produce overlapping amplicons for these regions from all MHV genomes, and a subset of primers predicted to amplify the MHV-1 or MHV strain JHM genome was selected. Some primers that were predicted to amplify the JHM strain but not the MHV-1 strain were included in the multiplex to check for possible evolutionary change of the original sequence toward the annotated reference JHM sequence or cross reactions with primer-genome mismatches.


Table 3: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 3 kb amplicons. In a multiplex of the 3 kb-amplicon tiled primers for a given organism, of the possible reactions producing products, only a small number of primer combinations are predicted to amplify regions in nontarget organisms. Counts show the number of unique primer combinations in a multiplex that yield products for any sequence in the NCBI nt nucleotide database. The numerator is for any nontarget organism in nt, and the denominator is for any target or nontarget organism in nt, that is, nonspecific/total of the possible primer combinations in the multiplex predicted to yield product when compared against nt. Vastly more amplicons are produced from target organisms, indicating any contaminating nontarget species should be a small minority of amplified product.

Organism      Nontarget amplicons/   Nontarget amplicon source organism
              total amplicons
CCHF S        0/160                  -
CCHF M        0/1934                 -
CCHF L        0/3753                 -
RVF S         0/137                  -
RVF M         0/356                  -
RVF L         0/753                  -
Ebola         1/2657                 Zea mays clone BAC ZMMBBb0342E21
Marburg       0/1511                 -
Hendra        0/206                  -
Nipah         0/286                  -
Junin L       0/69                   -
Machupo L     0/153                  -
Junin S       0/84                   -
Machupo S     0/32                   -
JEV           79515                  Rocio, West Nile
NW Arena S    561543                 Ippy, Lassa, Luna, Lymphocytic choriomeningitis, Mobala, Mopeia
NW Arena L    0/819                  -
OW Arena S    732509                 Allpahuayo, Amapari, Bear Canyon, Chapare, Cupixi, Dandenong, Flexal,
                                     Guanarito, Junin, Latino, Lujo, Machupo, Methylococcus capsulatus str. Bath,
                                     Parana, Pirital, Sabia, Tamiami, Whitewater Arroyo
OW Arena L    11826                  Dandenong
TBEV          0/4925                 -


Table 4: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons. As in Table 3, but for the multiplexes of the 10 kb-amplicon tiled primers.

Organism      Nontarget amplicons/   Nontarget amplicon source organism
              total amplicons
CCHF S        0/160                  -
CCHF M        0/261                  -
CCHF L        0/253                  -
RVF S         0/137                  -
RVF M         0/487                  -
RVF L         0/195                  -
Ebola         0/534                  -
Marburg       0/123                  -
Hendra        0/50                   -
Nipah         0/74                   -
Junin L       0/12                   -
Machupo L     0/7                    -
Junin S       0/95                   -
Machupo S     0/32                   -
JEV           0/1554                 -
NW Arena S    1/337                  Human chromosome 14 BAC C-2555K7 of library CalTech-D
NW Arena L    0/86                   -
OW Arena S    0/316                  -
OW Arena L    0/131                  -
TBEV          0/189                  -

Samples from MHV-1 infected mice were provided by Dr. Richard Bowen at Colorado State University. The MHV-1 strain used to infect the mice was obtained from American Type Culture Collection (Manassas, VA), and viral stock was propagated in murine fibroblast 17Cl-1 cells, then used to infect C3H mice via the intranasal route. Mice were sacrificed four days after inoculation, and bronchoalveolar lavage (BAL) fluid was collected. RNA was extracted from the BAL samples using Invitrogen TRIZOL reagent as per the manufacturer's instructions. RNA was converted to cDNA using Superscript III (Invitrogen) and random hexamers according to the manufacturer's protocol.

Multiplexed primer sets were designed to cover the Nsp3 and 3′ genes with 3 primer pairs per genomic region amplified when possible (the total number of primers tested in two multiplex reactions was 53; Table S1). The primers were tested in the lab, first as primer pairs in individual reactions, then as multiplexed reactions. No effort was made to optimize the PCR cycling conditions. RT-PCR conditions were as follows: reverse transcription was performed using random hexamers and the Superscript III RT reverse transcriptase kit (Invitrogen). The MHV-1 cDNA templates were amplified using the Q5 Hot Start High-Fidelity DNA Polymerase kit (New England BioLabs, Ipswich, MA) following the manufacturer's instructions. PCR conditions consisted of 98°C for 30 s, followed by 35 cycles of 98°C for 10 s, 60°C for 20 s, and 72°C for 1 min. The final extension was 72°C for 2 min.

Two multiplex reactions were set up, each containing a group of nonoverlapping primer sets (Figure 2). For example, multiplex "A" included primer sets A, C, E, G, and I, and multiplex "B" had primer sets B, D, F, and H. By staggering the primer sets into different multiplex reactions, the amplification of overlapping primer regions created by the reverse primer from one set with the forward primer of the overlapping adjacent primer set was eliminated. Without this strategy, these overlapping primer sets would dominate the PCR reaction due to the small size of these amplicons.
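The staggering scheme amounts to alternating consecutive regions between two pools. The helper names below are hypothetical; the region labels A through I are the ones used for the MHV primer sets.

```python
def stagger_pools(region_labels):
    """Alternate consecutive (overlapping) regions between two multiplex pools."""
    return list(region_labels[0::2]), list(region_labels[1::2])


def has_adjacent_regions(pool, region_labels):
    """True if the pool contains two regions that are neighbors in tiling order."""
    order = {label: i for i, label in enumerate(region_labels)}
    positions = sorted(order[label] for label in pool)
    return any(b - a == 1 for a, b in zip(positions, positions[1:]))


regions = list("ABCDEFGHI")
mix_a, mix_b = stagger_pools(regions)
print(mix_a, mix_b)
print(has_adjacent_regions(mix_a, regions), has_adjacent_regions(mix_b, regions))
```

Because no pool holds two neighboring regions, the reverse primer of one region never shares a reaction with the forward primer of the next, so the short overlap-sized products cannot form.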

The amplification of each primer pair in the multiplex was tested using a seminested PCR strategy to verify that the correct specific amplicons were being produced from each multiplex of primers for a given region (Figure 2, Table S2). The multiplex PCR products served as templates for PCR reactions with primer pairs that included the reverse primer of one region paired with the forward primer from the downstream adjacent region, to determine if the template generated from the multiplex was present. To ensure that the PCR product was generated from the multiplex product template rather than genomic DNA carried over from the initial sample, the multiplex product template was diluted 1:10,000 or excised from a gel and purified prior to use as a template.

3. Results and Discussion

All the primers for both $(s, x)$ settings are provided as Supplementary data, as are the predicted amplicon start and end positions in each target genome from a multiplex of the primers for a given viral target set. Tiled amplification of these viruses required from 2 to 116 primers (Table 1). Primers are predicted to be specific to the target organisms for the most part, although not exclusively (Tables 3 and 4).


[Figure 2 diagram: MHV genome drawn 5′ to 3′ with ORF1a, ORF1b, 2a, HE, Spike, 4, 5, E, M, and N. Primer sets included in PCR mix A: AF/AR, CF/CR, EF/ER, GF/GR, and IF/IR. Primer sets included in PCR mix B: BF/BR, DF/DR, FF/FR, and HF/HR. In separate seminested PCR reactions, forward primers were paired with a reverse primer from an overlapping reaction to verify that product was generated for each overlapping region.]

Figure 2: Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were tested. The approximate position of each region amplified by primer sets is shown (MHV genome is not drawn to scale). Each multiplex reaction consisted of primer sets that do not overlap in regions amplified. Each region is amplified using 3 forward primers and 3 reverse primers (Table S1; see Supplementary Material available online at http://dx.doi.org/10.1155/2014/101894). For example, the A primer set consists of 3 forward primers (A1F, A2F, and A3F) and 3 reverse primers (A1R, A2R, and A3R). To verify that each region is amplified in the multiplex reaction, a second set of seminested PCRs was performed using the amplicons from the multiplex reaction as a template. For example, to ensure region A was amplified, the PCR product from the A mix multiplex was diluted 1:10,000 and used as template in a PCR reaction with the AR1 primer paired with BF2 (Table S2). Primers are labeled according to genome region (A–I) and primer direction (F = forward, R = reverse).

The few cases of off-target amplification come from closely related organisms in the same family, such as Old World (OW) and New World (NW) Arenaviruses, or other Flaviviruses amplified by the Japanese encephalitis virus (JEV) multiplex. The three exceptions were a single amplicon of 2830 bp from a BAC clone of Zea mays (maize) from the Ebola 3 kb multiplex, a single amplicon of 3610 bp from Methylococcus capsulatus str. Bath from the OW Arena S segment 3 kb multiplex, and a single amplicon of 851 bp from a human BAC from a library at CalTech (NW Arena S segment 10 kb multiplex, Table 4). All three of these predicted nontarget amplicons result from a single primer in each of those reactions performing as both forward primer (FP) and reverse primer (RP). Nonetheless, the primer multiplexes described here should strongly favor enrichment of the desired targets.

Deriving each primer set required a multiple sequence alignment and a call to run_tiled_primers in the current PriMux software distribution (http://sourceforge.net/projects/primux). In comparison, primer design with the JCVI pipeline for any of these target sets would require the following steps: (1) inspecting a phylogeny for the full target set to build multiple smaller clade-level sets with no more than 10% sequence variation, (2) realigning the clade-level sets, (3) running the JCVI pipeline on each clade set, (4) assessing which target sequences are not amplified after one design round and rerunning the pipeline on those sequences for each clade, and (5) repeating step 4 until all target sequences are predicted to be amplified.

4. MHV Results

Multiplexed primers were tested in the lab as primer pairs in individual reactions, then as multiplexed reactions. Twenty-two of the primer pairs worked, and four failed to give a product and were paired with other primers in subsequent testing or, if necessary, replaced with an alternative primer. Amplicons were detected in the expected size ranges, confirming amplification of the expected regions from the multiplexed sets (Figure S1). In some cases extra bands were present, but they were generally smaller than the targeted size; this was common when the template cDNA was obtained from a clinical sample rather than the high titer cell culture derived viral stock from this study. The PCR products generated with these highly multiplexed assays were then sequenced using Illumina ultradeep sequencing with a high fidelity polymerase. These primers yielded high coverage, averaging 150,000x, of the genomic regions amplified by the multiplex primers.

5. Conclusions

Software is described to generate tiled, multiplex, and degenerate amplification primers to span entire genomes or regions of many variant sequences. This tool should facilitate the amplification of overlapping products across whole genomes or user-specified regions of target sets with high levels of variation. Applications include target enrichment for viral discovery of new members in a viral family from a complex host background, improving high throughput sequencing sensitivity and coverage of a rapidly evolving virus, or enriched coverage of variants in a gene family.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Department of Homeland Security Bioforensics Program through contract HSHQPM-10-X-00078P00001 and the Defense Threat Reduction Agency through Contract DTRA10027IA-3497 to Lawrence Livermore National Laboratory. This work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

References

[1] M. M. H. Yang, A. Singhal, S. R. Rassekh, S. Yip, P. Eydoux, and C. Dunham, “Possible differentiation of cerebral glioblastoma into pleomorphic xanthoastrocytoma: an unusual case in an infant,” Journal of Neurosurgery: Pediatrics, vol. 9, no. 5, pp. 517–523, 2012.

[2] E. Schulz, A. Valentin, P. Ulz et al., “Germline mutations in the DNA damage response genes BRCA1, BRCA2, BARD1 and TP53 in patients with therapy related myeloid neoplasms,” Journal of Medical Genetics, vol. 49, no. 7, pp. 422–428, 2012.

[3] D. A. Hysom, P. Naraghi Arani, M. Elsheikh, A. C. Carrillo, P. L. Williams, and S. N. Gardner, “Skip the alignment: degenerate, multiplex primer and probe design using k-mer matching instead of alignments,” PLoS ONE, vol. 7, no. 4, Article ID e34560, 2012.

[4] K. Li, S. Shrivastava, A. Brownley et al., “Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing,” Virology Journal, vol. 9, article 261, 2012.

[5] R. C. Edgar, “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004.

[6] N. R. Markham and M. Zuker, “UNAFold: software for nucleic acid folding and hybridization,” Methods in Molecular Biology, vol. 453, pp. 3–31, 2008.

[7] S. N. Gardner and T. Slezak, “Simulate_PCR for amplicon prediction and annotation from multiplex, degenerate primers and probes,” BMC Bioinformatics, vol. 15, article 237, 2014.

[8] M. K. Borucki, J. E. Allen, H. Chen-Harris et al., “The role of viral population diversity in adaptation of bovine coronavirus to new host environments,” PLoS ONE, vol. 8, no. 1, Article ID e52752, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology


Advances in Bioinformatics 3

[Figure 1 diagram: the alignment is divided into regions of the split size, with adjacent regions sharing an overlap. Primers are selected from a zone of 0.5 × overlap at each end of a region, with a forward primer (FP) at one end and a reverse primer (RP) at the other. Tiling continues across the genome.]

Figure 1: Diagram showing how the multiple sequence alignment is split into overlapping sections and conserved, degenerate sets of primers are designed near the ends of the overlapping pieces, so that overlapping amplicons should be produced which tile across the viral genome. FP = forward primer; RP = reverse primer.

enrichment would be needed to obtain adequate sensitivity and genome coverage when sequencing from a clinical or environmental sample.

2. Implementation

2.1. Process. The run_tiled_primers process can be summarized as follows: split a multiple sequence alignment into overlapping regions and, for each region, design a degenerate multiplex set of primers that in combination amplify that region in all strains with as few primers as possible. Run_tiled_primers takes as input a multiple sequence alignment (MSA) and splits it into regions of size $s$ bases that overlap by $x$ bases (Figure 1).

When splitting the alignment into regions of size $s$, if the last "remainder" piece of an alignment is less than half of $s$, then $s$ is increased, by the amount needed to divide the alignment evenly without any remainder, to $s'$, and the split regions are recalculated with $s'$. If a user desires to tile across only selected regions instead of tiling across the entire sequence, then an optional regions file may be specified, which contains the regions (e.g., genes) and their start and end positions in the alignment.
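The splitting rule can be illustrated with a short sketch. This is not the run_tiled_primers code; the function name `split_alignment`, the half-open coordinate convention, and the exact way the enlarged size $s'$ is rounded are assumptions made for illustration. Regions start every $s - x$ columns, and a short trailing remainder is absorbed by enlarging $s$.

```python
import math


def split_alignment(aln_len, s, x):
    """Tile an alignment of aln_len columns with regions of size s overlapping by x.

    Returns half-open (start, end) column ranges. Hypothetical sketch: the
    remainder rule follows the text, but rounding details are assumed.
    """
    if not 0 <= x < s:
        raise ValueError("overlap x must satisfy 0 <= x < s")
    if aln_len <= s:
        return [(0, aln_len)]
    step = s - x                                  # distance between region starts
    n_full = (aln_len - s) // step + 1            # regions of full size s that fit
    remainder = aln_len - (s + (n_full - 1) * step)
    if remainder == 0:
        n = n_full
    elif remainder < s / 2:
        # Remainder too short to stand alone: enlarge s to s' so that
        # n_full regions overlapping by x cover the whole alignment.
        n = n_full
        s = math.ceil((aln_len + (n - 1) * x) / n)
        step = s - x
    else:
        n = n_full + 1                            # keep the remainder as a last, shorter region
    return [(i * step, min(i * step + s, aln_len)) for i in range(n)]


if __name__ == "__main__":
    print(split_alignment(10000, 3000, 500))
    print(split_alignment(9000, 3000, 500))
```

With a 9000-column alignment, $s = 3000$, and $x = 500$, the 1000-column remainder is shorter than $s/2$, so this sketch enlarges the split size to 3334 and returns three regions instead of four.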

For each region, the PriMux software [3] is used to search for conserved, degenerate, and multiplex-compatible primer sets to amplify that region in all target sequences with as few primers as possible. The PriMux "max" algorithm is used. Primers should be multiplex compatible, since the primers for a given region are predicted not to form primer dimers and all have $T_m$'s in a range specified by the user. As run_tiled_primers is a wrapper script around the PriMux workhorse, all the primer design characteristics are specified in a PriMux options file. The minimum and maximum amplicon lengths are determined by the $(s, x)$ parameters to run_tiled_primers (Table 2), so these parameters may be omitted in the input options file; if they are present, their values will be replaced with values appropriate for the specified $(s, x)$. Run_tiled_primers requires that primers anneal within $0.5x$ of either end of the region.

If the value of $x$ is 36 bp or less, it is too short for two nonoverlapping primers, each typically at least 18 bp long. In this case the code does not require that adjacent regions overlap, and amplicons are allowed from anywhere in each region. Small overlaps (e.g., 40–80) do not leave much room to find good priming regions that pass the filters on $T_m$, entropy, free energy, and homopolymers as specified in the options file, and consequently it may not be possible to find primers for all targets. When this happens, increasing the overlap and relaxing the primer specifications may be necessary.
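The positional constraint on primers can be sketched as follows. The function name `primer_windows` and the half-open coordinates are hypothetical; only the $0.5x$ rule and the 36 bp cutoff come from the text above.

```python
def primer_windows(start, end, x, min_overlap=36):
    """Return (forward_window, reverse_window) for one region [start, end).

    Hypothetical sketch: primers must anneal within 0.5*x of either end of
    the region; if x <= 36 bp the constraint is dropped and primers may be
    chosen anywhere in the region.
    """
    if x <= min_overlap:
        return (start, end), (start, end)
    half = x // 2
    return (start, start + half), (end - half, end)


if __name__ == "__main__":
    # A 3000-column region with a 500-column overlap:
    print(primer_windows(0, 3000, 500))
    # Overlap too short for two 18-mers, so the whole region is allowed:
    print(primer_windows(0, 3000, 36))
```

Under these assumptions, a region of 3000 columns with $x = 500$ leaves two 250-column zones in which candidate primers must pass every filter, which is why small overlaps can leave some targets without primers.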

Requiring that primers fall within $0.5x$ bases of the ends of each region facilitates the creation of amplicons which should overlap across a genome, allowing full genome assembly from the amplified products. There may not be amplicons covering the extreme 5′ and 3′ ends of a target sequence, since the first and last primers may be located some distance (maximum of $x/2$) from the ends. Rapid Amplification of cDNA Ends (RACE) PCR would be necessary to amplify the genome ends not covered by an overlapping region, priming with the reverse complement of the run_tiled_primers primers closest to the end so as to prime toward the edge of the genome.

Because this split size is based on the alignment, and since dashes in the alignment are not counted in amplicon length, actual amplicons may be substantially shorter than the split size $s$. This is likely to happen for poorly aligning regions or regions in which there are insertions or deletions in a subset of the sequences. To compensate for this, one should select $s$ that is larger than the actual amplicon lengths desired, particularly if the length of the MSA is much larger than the average genome length.
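The effect of alignment gaps on amplicon length can be checked directly from the MSA. The helper below is a hypothetical illustration (not part of PriMux) that counts the non-gap bases of one aligned sequence within a region.

```python
def ungapped_length(aligned_seq, start, end):
    """Number of real bases (non '-') of an aligned sequence in columns [start, end)."""
    return sum(1 for base in aligned_seq[start:end] if base != "-")


if __name__ == "__main__":
    # Toy alignment row: 11 columns, 4 of which are gaps.
    row = "ACG---TTA-C"
    print(ungapped_length(row, 0, len(row)))
```

Comparing this count with the region width for each sequence gives a quick indication of how much larger than the desired amplicon length $s$ should be set.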

Run_tiled_primers labels each overlapping region as #part, where # indicates the order of the regions; for example, 0part, 1part, and 2part are the three regions shown in Figure 1. For each region, sets of conserved, degenerate primers are designed to ensure amplification of all the targets, if possible given the primer specifications.

The primers can be run in separate singleplex reactions for each split region or, alternatively, primers for all regions can be combined in a large multiplex after the large set is checked for primer dimers that could occur between primers from different regions. Combining primers for all regions in multiplex should facilitate whole genome amplification in a single reaction. It may yield longer amplicons from the reaction of forward and reverse primers from different parts (FP from 0part reacting with RP from 1part gives a product ∼2 times the split size), depending on the polymerase processivity and the duration of the extension step, and should facilitate assembly across amplified regions. This helps alleviate cases where a primer cannot be found for one part in an outlier genome due to $T_m$, homopolymers, primer dimer ΔG, and so forth, since primers from different parts may amplify across the region. However, since primers of overlapping regions can also produce amplicons shorter (less than $x$ bp) than the desired amplicon of length between $s - x$ and $s$ bp (e.g., RP of 0part with the FP from 1part), a step to remove short amplicons before sequencing may be desired. In our experimental test with MHV, the primers from parts 0, 2, and 4 were combined in one reaction and the primers from parts 1 and 3 were combined in another so that short products would not be produced.

Table 2: Parameters used for primer design in in silico examples and MHV example presented here.

| Parameter | In silico primer settings | MHV primer settings |
|---|---|---|
| Primer length range | 18–25 | 18–27 |
| $T_m$ range allowed¹ | 60–65°C | 58–65°C |
| Number degenerate bases allowed per primer | 5 | 3 |
| Minimum distance of degenerate base to 3′ end of primer | 3 nt | 3 nt |
| Minimum trimer entropy allowed (to avoid repetitive sequence)² | 3.5 | 3.3 |
| Maximum length of homopolymer allowed | 4 nt | 5 nt |
| GC% range allowed | 20–80 | 20–80 |
| Minimum primer dimer ΔG | −6 kcal/mol | −15 kcal/mol |
| Minimum hairpin ΔG | −5 kcal/mol | −12 kcal/mol |
| Primer selection iterations | 1 | 3 |

¹ $T_m$ is calculated using Unafold [6].

² Low complexity regions (repetitive sequence) are excluded from consideration as primers by setting a minimum entropy threshold for a primer candidate. The entropy $S_i$ of a sequence was computed by counting the numbers of occurrences $n_{AAA}, n_{AAC}, \ldots, n_{TTT}$ of the 64 possible trimers in the probe sequence and dividing by the total number of trimers, yielding the corresponding frequencies $f_{AAA}, \ldots, f_{TTT}$. The entropy is then given by $S_i = -\sum_{t:\, f_t \neq 0} f_t \log_2 f_t$, where the sum is over the trimers $t$ with $f_t \neq 0$.
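The trimer entropy filter described in footnote 2 of Table 2 can be written in a few lines. This is a sketch of the stated formula rather than the PriMux source; the function name `trimer_entropy` is an assumption, and overlapping trimers are counted along the candidate primer.

```python
from collections import Counter
from math import log2


def trimer_entropy(seq):
    """Shannon entropy (bits) of the overlapping trimer frequencies in seq."""
    trimers = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not trimers:
        raise ValueError("sequence must be at least 3 bases long")
    total = len(trimers)
    counts = Counter(trimers)
    # Only trimers that occur (f_t != 0) contribute to the sum.
    return -sum((n / total) * log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(trimer_entropy("AAAAAAAAAAAAAAAAAA"))   # homopolymer: one trimer only
    print(trimer_entropy("ACGTAC"))               # four distinct trimers
```

An 18-mer contains 16 overlapping trimers, so its entropy cannot exceed $\log_2 16 = 4$ bits; thresholds such as those in Table 2 therefore reject candidates in which only a few trimers recur.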

We used the script simulate_PCR.pl (https://sourceforge.net/projects/simulatepcr/ [7]) to predict all PCR amplicons from the multiplex degenerate primers compared to the target sequences and to the NCBI nt database. This script is run automatically from the run_tiled_primers code after it predicts primers. It is set to predict amplicons up to twice the maximum amplicon length specified by the user.
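To make the idea of amplicon prediction from degenerate primers concrete, the sketch below performs a much simpler in silico PCR than simulate_PCR. It is an illustration only and does not reproduce that tool's method: it assumes exact matches of IUPAC-expanded primers on a single target, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including degenerate bases."""
    return seq.translate(COMPLEMENT)[::-1]


def sites(primer, target):
    """Start positions of all (possibly overlapping) exact matches of a degenerate primer."""
    pattern = "(?=" + "".join(IUPAC[b] for b in primer) + ")"
    return [m.start() for m in re.finditer(pattern, target)]


def predict_amplicons(fp, rp, target, max_len):
    """Half-open (start, end) amplicons on the plus strand, primers included."""
    fwd = sites(fp, target)
    rev_ends = [p + len(rp) for p in sites(revcomp(rp), target)]
    return [(f, r) for f in fwd for r in rev_ends
            if len(fp) + len(rp) <= r - f <= max_len]


if __name__ == "__main__":
    target = "TTACGTACGATTTTTTTTGGCCTAGGTT"
    print(predict_amplicons("ACGTRCG", "CCTAGGCY", target, 100))
```

Real primers tolerate some mismatches, which is why exact matching of this kind would under-predict off-target products.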

2.2. Computational Examples. Computationally predicted tiled primer sets were generated for the viruses and primer specifications provided in Table 1. MSAs were created with MUSCLE [5]. Two settings of split size $s$ and overlap size $x$ were used: long amplicons with $s = 10{,}000$, $x = 500$, or short amplicons with $s = 3000$, $x = 500$. The choice of which set to use could depend upon the product lengths the polymerase can amplify and the duration of the extension step of PCR. These fairly long amplicons are provided as theoretical examples. Users may run run_tiled_primers with shorter amplicons (e.g., $s = 400$ bp) to divide the MSA into many more parts.

One amplicon per target sequence per region was desired (PriMux option file with -primer_selection_iterations = 1). Table 1 shows the average genome or segment length, the number of genomes available for each target, the consensus among those sequences, and the total number of primers to amplify all overlapping regions of all genomes. All products from the nt database under 7800 bp (shorter amplicon) or 26 kb (longer amplicon) were predicted with simulate_PCR to identify potential amplification of nontarget organisms (Tables 3 and 4).

2.3. Murine Hepatitis Virus Example. Run_tiled_primers was used to design primers for selected regions of the coronavirus murine hepatitis virus (strain MHV-1) genome following passage in the lab, for a separate project in which deep sequencing of selected regions following lab passage was performed. In other work attempting to amplify passaged RNA viruses, finding robust primers based on the original genome was difficult due to mutations which modified primer binding sites [8]. It was hoped that run_tiled_primers would help avoid selecting primers in mutational hotspots by taking into account strain variation across multiple available genomes for the species, since run_tiled_primers seeks maximally conserved primers in the available sequences.

Input to run_tiled_primers was an alignment of 22 MHV genomes (genome identities provided as supplementary information) created using MUSCLE [5]. Regions tiled were the Nsp1, Nsp3, Nsp14, and several genes at the 3′ end of the genome (regions file provided in supplementary information), using the primer parameters in Table 2. Primer sets were predicted to produce overlapping amplicons for these regions from all MHV genomes, and a subset of primers predicted to amplify the MHV-1 or MHV strain JHM genome was selected. Some primers that were predicted to amplify the JHM strain but not the MHV-1 strain were included in the multiplex to check for possible evolutionary change of the original sequence toward the annotated reference JHM sequence or cross reactions with primer-genome mismatches.

Table 3: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 3 kb amplicons. In a multiplex of the 3 kb-amplicon tiled primers for a given organism, of the possible reactions producing products, only a small number of primer combinations are predicted to amplify regions in nontarget organisms. Counts show the number of unique primer combinations in a multiplex that yield products for any sequence in the NCBI nt nucleotide database. The numerator is for any nontarget organism in nt and the denominator is for any target or nontarget organism in nt, that is, nonspecific/total of the possible primer combinations in the multiplex predicted to yield product when compared against nt. Vastly more amplicons are produced from target organisms, indicating any contaminating nontarget species should be a small minority of amplified product.

| Organism | Nontarget amplicons/total amplicons | Nontarget amplicon source organism |
|---|---|---|
| CCHF S | 0/160 | none |
| CCHF M | 0/1934 | none |
| CCHF L | 0/3753 | none |
| RVF S | 0/137 | none |
| RVF M | 0/356 | none |
| RVF L | 0/753 | none |
| Ebola | 1/2657 | Zea mays clone BAC ZMMBBb0342E21 |
| Marburg | 0/1511 | none |
| Hendra | 0/206 | none |
| Nipah | 0/286 | none |
| Junin L | 0/69 | none |
| Machupo L | 0/153 | none |
| Junin S | 0/84 | none |
| Machupo S | 0/32 | none |
| JEV | 79515 | Rocio, West Nile |
| NW Arena S | 561543 | Ippy, Lassa, Luna, Lymphocytic choriomeningitis, Mobala, Mopeia |
| NW Arena L | 0/819 | none |
| OW Arena S | 732509 | Allpahuayo, Amapari, Bear Canyon, Chapare, Cupixi, Dandenong, Flexal, Guanarito, Junin, Latino, Lujo, Machupo, Methylococcus capsulatus str. Bath, Parana, Pirital, Sabia, Tamiami, Whitewater Arroyo |
| OW Arena L | 11826 | Dandenong |
| TBEV | 0/4925 | none |

Table 4: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons. As in Table 3, but for the multiplexes of the 10 kb-amplicon tiled primers.

| Organism | Nontarget amplicons/total amplicons | Nontarget amplicon source organism |
|---|---|---|
| CCHF S | 0/160 | none |
| CCHF M | 0/261 | none |
| CCHF L | 0/253 | none |
| RVF S | 0/137 | none |
| RVF M | 0/487 | none |
| RVF L | 0/195 | none |
| Ebola | 0/534 | none |
| Marburg | 0/123 | none |
| Hendra | 0/50 | none |
| Nipah | 0/74 | none |
| Junin L | 0/12 | none |
| Machupo L | 0/7 | none |
| Junin S | 0/95 | none |
| Machupo S | 0/32 | none |
| JEV | 0/1554 | none |
| NW Arena S | 1/337 | Human chromosome 14 BAC C-2555K7 of library CalTech-D |
| NW Arena L | 0/86 | none |
| OW Arena S | 0/316 | none |
| OW Arena L | 0/131 | none |
| TBEV | 0/189 | none |

Samples from MHV-1 infected mice were provided by Dr. Richard Bowen at Colorado State University. The MHV-1 strain used to infect the mice was obtained from American Type Culture Collection (Manassas, VA), and viral stock was propagated in murine fibroblast 17Cl-1 cells and then used to infect C3H mice via the intranasal route. Mice were sacrificed four days after inoculation and bronchoalveolar lavage (BAL) fluid was collected. RNA was extracted from the BAL samples using Invitrogen TRIZOL reagent as per the manufacturer's instructions. RNA was converted to cDNA using Superscript III (Invitrogen) and random hexamers according to the manufacturer's protocol.

Multiplexed primer sets were designed to cover the Nsp3 and 3′ genes, with 3 primer pairs per genomic region amplified when possible (total number of primers tested in two multiplex reactions was 53, Table S1). The primers were tested in the lab first by testing the primer pairs in individual reactions, then as multiplexed reactions. No effort was made to optimize the PCR cycling conditions. RT-PCR conditions were as follows: reverse transcription was performed using random hexamers and the Superscript III RT reverse transcriptase kit (Invitrogen). The MHV-1 cDNA templates were amplified using the Q5 Hot Start High-Fidelity DNA Polymerase kit (New England BioLabs, Ipswich, MA) following the manufacturer's instructions. PCR conditions consisted of 98°C for 30 s followed by 35 cycles of 98°C for 10 s, 60°C for 20 s, and 72°C for 1 min. The final cycle was 72°C for 2 min.

Two multiplex reactions were set up, each containing a group of nonoverlapping primer sets (Figure 2). For example, multiplex "A" included primer sets A, C, E, G, and I, and multiplex "B" had primer sets B, D, F, and H. By staggering the primer sets into different multiplex reactions, the amplification of overlapping primer regions created by the reverse primer from one set with the forward primer of the overlapping adjacent primer set was eliminated. Without this strategy, these overlapping primer sets would dominate the PCR reaction due to the small size of these amplicons.
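The staggering strategy amounts to assigning alternate regions to alternate pools. The sketch below is a hypothetical helper, not code from the study; it takes primer sets in genome order and returns the two pools, so that no pool contains primer sets from adjacent, overlapping regions.

```python
def stagger_pools(ordered_sets):
    """Split primer sets, listed in genome order, into two nonoverlapping pools."""
    return ordered_sets[0::2], ordered_sets[1::2]


if __name__ == "__main__":
    pool_a, pool_b = stagger_pools(list("ABCDEFGHI"))
    print("Multiplex A:", pool_a)
    print("Multiplex B:", pool_b)
```

For the nine regions A through I this reproduces the grouping described above.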

The amplification of each primer pair in the multiplex was tested using a seminested PCR strategy to verify that the correct specific amplicons were being produced from each multiplex of primers for a given region (Figure 2, Table S2). The multiplex PCR products served as templates for PCR reactions with primer pairs that included the reverse primer of one region paired with the forward primer from the downstream adjacent region, to determine if the template generated from the multiplex was present. To ensure that the PCR product was generated from the multiplex product template rather than genomic DNA carried over from the initial sample, the multiplex product template was diluted 1:10,000 or excised from a gel and purified prior to use as a template.

3. Results and Discussion

All the primers for both $(s, x)$ settings are provided as Supplementary data, as are the predicted amplicon start and end positions in each target genome from a multiplex of the primers for a given viral target set. Tiled amplification of these viruses required from 2 to 116 primers (Table 1). Primers are predicted to be specific to the target organisms for the most part, although not exclusively (Tables 3 and 4).

[Figure 2 diagram: MHV genome map, 5′ to 3′, with genes ORF1a, ORF1b, 2a, HE, Spike, 4, 5, E, M, and N. Primer sets included in PCR mix A: AF/AR, CF/CR, EF/ER, GF/GR, and IF/IR. Primer sets included in PCR mix B: BF/BR, DF/DR, FF/FR, and HF/HR. In separate seminested PCR reactions, forward primers were paired with a reverse primer from an overlapping reaction to verify that product was generated for each overlapping region.]

Figure 2: Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were tested. The approximate position of each region amplified by primer sets is shown (MHV genome is not drawn to scale). Each multiplex reaction consisted of primer sets that do not overlap in regions amplified. Each region is amplified using 3 forward primers and 3 reverse primers (Table S1; see Supplementary Material available online at http://dx.doi.org/10.1155/2014/101894). For example, the A primer set consists of 3 forward primers (A1F, A2F, and A3F) and 3 reverse primers (A1R, A2R, and A3R). To verify that each region is amplified in the multiplex reaction, a second set of seminested PCRs was performed using the amplicons from the multiplex reaction as a template. For example, to ensure region A was amplified, the PCR product from the A mix multiplex was diluted 1:10,000 and used as template in a PCR reaction with AR1 primer paired with BF2 (Table S2). Primers are labeled according to genome region (A–I) and primer direction (F = forward, R = reverse).

The few cases of off-target amplification come from closely related organisms in the same family, such as Old World (OW) and New World (NW) Arenaviruses, or other Flaviviruses amplified by the Japanese encephalitis virus (JEV) multiplex. The three exceptions were a single amplicon of 2830 bp from a BAC clone of Zea mays (maize) from the Ebola 3 kb multiplex, a single amplicon of 3610 bp from Methylococcus capsulatus str. Bath from the OW Arena S segment 3 kb multiplex, and a single amplicon of 851 bp from a human BAC from a library at CalTech. All three of these predicted nontarget amplicons result from a single primer in each of those reactions performing as both forward primer (FP) and reverse primer (RP). Nonetheless, the primer multiplexes described here should strongly favor the preferential enrichment of desired targets.

Deriving each primer set required multiple sequence alignment and a call to run_tiled_primers in the current PriMux software distribution (http://sourceforge.net/projects/primux). In comparison, primer design with the JCVI pipeline for any of these target sets would require the following steps: (1) inspection of a phylogeny for the full target set to build multiple smaller clade-level sets with no more than 10% sequence variation; (2) realignment of the clade-level sets; (3) running of the JCVI pipeline on each clade set; (4) assessing which target sequences are not amplified after one design round and rerunning the pipeline on those sequences for each clade; and (5) repeating step 4 until all target sequences are predicted to be amplified.

4. MHV Results

Multiplexed primers were tested in the lab as primer pairs in individual reactions, then as multiplexed reactions. Twenty-two of the primer pairs worked; four failed to give a product and were paired with other primers in subsequent testing or, if necessary, replaced with an alternative primer. Amplicons were detected in the expected size ranges, confirming amplification of the expected regions from the multiplexed sets (Figure S1). In some cases extra bands were present, but they were generally smaller than the targeted size; this was common when the template cDNA was obtained from a clinical sample rather than from the high titer, cell culture derived viral stock from this study. The PCR products generated with these highly multiplexed assays were then sequenced using Illumina ultradeep sequencing with a high fidelity polymerase. These primers yielded high coverage, averaging 150,000x, of the genomic regions amplified by the multiplex primers.

5. Conclusions

Software is described to generate tiled, multiplex, and degenerate amplification primers to span entire genomes or regions of many variant sequences. This tool should facilitate the amplification of overlapping products across whole genomes or user-specified regions of target sets with high levels of variation. Applications include target enrichment for viral discovery of new members in a viral family from a complex host background, improving high throughput sequencing sensitivity and coverage of a rapidly evolving virus, or enriched coverage of variants in a gene family.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Department of Homeland Security Bioforensics Program through contract HSHQPM-10-X-00078P00001 and the Defense Threat Reduction Agency through Contract DTRA10027IA-3497 to Lawrence Livermore National Laboratory. This work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

References

[1] M. M. H. Yang, A. Singhal, S. R. Rassekh, S. Yip, P. Eydoux, and C. Dunham, "Possible differentiation of cerebral glioblastoma into pleomorphic xanthoastrocytoma: an unusual case in an infant," Journal of Neurosurgery: Pediatrics, vol. 9, no. 5, pp. 517–523, 2012.

[2] E. Schulz, A. Valentin, P. Ulz et al., "Germline mutations in the DNA damage response genes BRCA1, BRCA2, BARD1 and TP53 in patients with therapy related myeloid neoplasms," Journal of Medical Genetics, vol. 49, no. 7, pp. 422–428, 2012.

[3] D. A. Hysom, P. Naraghi-Arani, M. Elsheikh, A. C. Carrillo, P. L. Williams, and S. N. Gardner, "Skip the alignment: degenerate, multiplex primer and probe design using k-mer matching instead of alignments," PLoS ONE, vol. 7, no. 4, Article ID e34560, 2012.

[4] K. Li, S. Shrivastava, A. Brownley et al., "Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing," Virology Journal, vol. 9, article 261, 2012.

[5] R. C. Edgar, "MUSCLE: multiple sequence alignment with high accuracy and high throughput," Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004.

[6] N. R. Markham and M. Zuker, "UNAFold: software for nucleic acid folding and hybridization," Methods in Molecular Biology, vol. 453, pp. 3–31, 2008.

[7] S. N. Gardner and T. Slezak, "Simulate_PCR for amplicon prediction and annotation from multiplex, degenerate primers and probes," BMC Bioinformatics, vol. 15, article 237, 2014.

[8] M. K. Borucki, J. E. Allen, H. Chen-Harris et al., "The role of viral population diversity in adaptation of bovine coronavirus to new host environments," PLoS ONE, vol. 8, no. 1, Article ID e52752, 2013.


Page 4: Research Article Multiplex Degenerate Primer Design for ...

4 Advances in Bioinformatics

Table 2 Parameters used for primer design in in silico examples and MHV example presented here

In silico primer settings MHV primer settingsPrimer length range 18ndash25 18ndash27119879

119898range allowed1 60ndash65∘C 58ndash65∘C

Number degenerate bases allowed per primer 5 3Minimum distance of degenerate base to 31015840 end of primer 3 nt 3 ntMinimum trimer entropy allowed (to avoid repetitive sequence)2 35 33Maximum length of homopolymer allowed 4 nt 5 ntGC range allowed 20ndash80 20ndash80Minimum primer dimer Δ119866 minus6 kcalmol minus15 kcalmolMinimum hairpin Δ119866 minus5 kcalmol minus12 kcalmolPrimer selection iterations 1 31119879119898is calculated using Unafold [6]

2Low complexity regions (repetitive sequence) are excluded from consideration as primers by setting a minimum entropy threshold for a primer candidateThe entropy 119878

119894of a sequence was computed by counting the numbers of occurrences of 119899

119860119860119860 119899119860119860119862 119899

119879119879119879of the 64 possible trimers in the probe sequence

and dividing by the total number of trimers yielding the corresponding frequencies 119891119860119860119860 119891

119879119879119879 The entropy is then given by the sum of minus119891

119905log2119891119905where

the sum is over the trimers t with 119891119905= 0

in multiplex should facilitate whole genome amplificationin a single reaction It may yield longer amplicons fromthe reaction of forward and reverse primers from differentparts (FP from 0part reacting with RP from 1part givesproductsim2 times the split size) depending on the polymeraseprocessivity and the duration of the extension step andshould facilitate assembly across amplified regionsThis helpsalleviate cases where a primer cannot be found for one partin an outlier genome due to 119879

119898 homopolymers primer

dimer Δ119866 and so forth since primers from different partsmay amplify across the region However since primers ofoverlapping regions can also produce amplicons shorter (lessthan 119909 bp) than the desired amplicon of length between 119904 minus 119909and 119904 bp (eg RP of 0part with the FP from 1part) a step toremove short amplicons before sequencing may be desiredIn our experimental test with MHV the primers from parts0 2 and 4 were combined in one reaction and the primersfrom parts 1 and 3 were combined in another so that shortproducts would not be produced

We used the script simulate PCRpl (httpssourceforgenetprojectssimulatepcr [7]) to predict all PCR ampliconsfrom the multiplex degenerate primers compared to thetarget sequences and to the NCBI nt database This scriptis run automatically from the run tiled primers code after itpredicts primers It is set to predict amplicons up to twice themaximum amplicon length specified by the user

22 Computational Examples Computationally predictedtiled primer sets were generated for the viruses and primerspecifications provided in Table 1 MSAs were created withMUSCLE [5] Two settings of split size 119904 and overlap size 119909were used long amplicons with 119904 = 10 000 119909 = 500 or shortamplicons of 119904 = 3000119909 = 500The choice of which set to usecould depend upon the product lengths the polymerase canamplify and the duration of the extension step of PCRThesefairly long amplicons are provided as theoretical examplesUsers may run run tiled primers with shorter amplicons(eg 119904 = 400 bp) to divide the MSA into many more parts

One amplicon per target sequence per region was desired(PriMux option file with - primer selection iterations = 1)Table 1 shows the average genome or segment length thenumber of genomes available for each target the consensusamong those sequences and the total number of primers toamplify all overlapping regions of all genomes All productsfrom the nt database under 7800 bp (shorter amplicon) or26 kb (longer amplicon) were predicted with simulate PCRto identify potential amplification of nontarget organisms(Tables 3 and 4)

23 Murine Hepatitis Virus Example Run tiled primers wasused to design primers for selected regions of the coronavirusmurine hepatitis virus (strain MHV-1) genome followingpassage in the lab for a separate project in which deepsequencing of selected regions following lab passage was per-formed In other work attempting to amplify passaged RNAviruses finding robust primers based on the original genomewas difficult due tomutationswhichmodified primer bindingsites [8] It was hoped that run tiled primers would helpavoid selecting primers in mutational hotspots by taking intoaccount strain variation across multiple available genomesfor the species since run tiled primers seeks maximallyconserved primers in the available sequences

Input to run tiled primers was an alignment of 22MHVgenomes (genome identities provided as supplementaryinformation) created using MUSCLE [5] Regions tiled werethe Nsp1 Nsp3 Nsp14 and several genes at the 31015840 end ofthe genome (regions file provided in supplementary infor-mation) using the primer parameters in Table 2 Primer setswere predicted to produce overlapping amplicons for theseregions from all MHV genomes and a subset of primerspredicted to amplify theMHV-1 orMHV strain JHMgenomewas selected Some primers that were predicted to amplifythe JHM strain but not the MHV-1 strain were included inthe multiplex to check for possible evolutionary change ofthe original sequence toward the annotated reference JHMsequence or cross reactionswith primer-genomemismatches

Advances in Bioinformatics 5

Table 3: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 3 kb amplicons. In a multiplex of the 3 kb-amplicon tiled primers for a given organism, of the possible reactions producing products, only a small number of primer combinations are predicted to amplify regions in nontarget organisms. Counts show the number of unique primer combinations in a multiplex that yield products for any sequence in the NCBI nt nucleotide database. The numerator is for any nontarget organism in nt, and the denominator is for any target or nontarget organism in nt, that is, nonspecific/total of the possible primer combinations in the multiplex predicted to yield product when compared against nt. Vastly more amplicons are produced from target organisms, indicating that any contaminating nontarget species should be a small minority of amplified product.

Organism      Nontarget amplicons/total amplicons   Nontarget amplicon source organism
CCHF S        0/160                                 -
CCHF M        0/1934                                -
CCHF L        0/3753                                -
RVF S         0/137                                 -
RVF M         0/356                                 -
RVF L         0/753                                 -
Ebola         1/2657                                Zea mays clone BAC ZMMBBb0342E21
Marburg       0/1511                                -
Hendra        0/206                                 -
Nipah         0/286                                 -
Junin L       0/69                                  -
Machupo L     0/153                                 -
Junin S       0/84                                  -
Machupo S     0/32                                  -
JEV           7/9515                                Rocio, West Nile
NW Arena S    56/1543                               Ippy, Lassa, Luna, Lymphocytic choriomeningitis, Mobala, Mopeia
NW Arena L    0/819                                 -
OW Arena S    73/2509                               Allpahuayo, Amapari, Bear Canyon, Chapare, Cupixi, Dandenong, Flexal, Guanarito, Junin, Latino, Lujo, Machupo, Methylococcus capsulatus str. Bath, Parana, Pirital, Sabia, Tamiami, Whitewater Arroyo
OW Arena L    1/1826                                Dandenong
TBEV          0/4925                                -


Table 4: Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons. As in Table 3, but for the multiplexes of the 10 kb-amplicon tiled primers.

Organism      Nontarget amplicons/total amplicons   Nontarget amplicon source organism
CCHF S        0/160                                 -
CCHF M        0/261                                 -
CCHF L        0/253                                 -
RVF S         0/137                                 -
RVF M         0/487                                 -
RVF L         0/195                                 -
Ebola         0/534                                 -
Marburg       0/123                                 -
Hendra        0/50                                  -
Nipah         0/74                                  -
Junin L       0/12                                  -
Machupo L     0/7                                   -
Junin S       0/95                                  -
Machupo S     0/32                                  -
JEV           0/1554                                -
NW Arena S    1/337                                 Human chromosome 14 BAC C-2555K7 of library CalTech-D
NW Arena L    0/86                                  -
OW Arena S    0/316                                 -
OW Arena L    0/131                                 -
TBEV          0/189                                 -

Samples from MHV-1 infected mice were provided by Dr. Richard Bowen at Colorado State University. The MHV-1 strain used to infect the mice was obtained from American Type Culture Collection (Manassas, VA), and viral stock was propagated in murine fibroblast 17Cl-1 cells and then used to infect C3H mice via the intranasal route. Mice were sacrificed four days after inoculation, and bronchoalveolar lavage (BAL) fluid was collected. RNA was extracted from the BAL samples using Invitrogen TRIZOL reagent as per the manufacturer's instructions. RNA was converted to cDNA using Superscript III (Invitrogen) and random hexamers according to the manufacturer's protocol.

Multiplexed primer sets were designed to cover the Nsp3 and 3′ genes with 3 primer pairs per genomic region amplified when possible (the total number of primers tested in two multiplex reactions was 53; Table S1). The primers were tested in the lab first as primer pairs in individual reactions and then as multiplexed reactions. No effort was made to optimize the PCR cycling conditions. RT-PCR conditions were as follows: reverse transcription was performed using random hexamers and the Superscript III RT reverse transcriptase kit (Invitrogen). The MHV-1 cDNA templates were amplified using the Q5 Hot Start High-Fidelity DNA Polymerase kit (New England BioLabs, Ipswich, MA) following the manufacturer's instructions. PCR conditions consisted of 98°C for 30 s, followed by 35 cycles of 98°C for 10 s, 60°C for 20 s, and 72°C for 1 min. The final cycle was 72°C for 2 min.
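As a quick check on the cycling program just described, the programmed hold times can be added up. The sketch below is illustrative only: the variable names are hypothetical, and ramp times between temperatures are ignored (an assumption), so the real run would take somewhat longer.

```python
# Hold times (seconds) taken from the PCR conditions stated in the text.
# Ramp times between steps are NOT included (assumption for illustration).
INITIAL_DENATURATION_S = 30          # 98 C for 30 s
CYCLES = 35
CYCLE_STEPS_S = {
    "denature_98C": 10,
    "anneal_60C": 20,
    "extend_72C": 60,
}
FINAL_STEP_S = 120                   # final 72 C step, 2 min


def total_hold_time_s():
    """Sum of all programmed hold times for the run, in seconds."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_STEP_S


if __name__ == "__main__":
    seconds = total_hold_time_s()
    print(f"{seconds} s = {seconds / 60:.0f} min of programmed hold time")
```

With a 1 min extension per cycle, the cycling itself accounts for most of the programmed time.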

Two multiplex reactions were set up, each containing a group of nonoverlapping primer sets (Figure 2). For example, multiplex "A" included primer sets A, C, E, G, and I, and multiplex "B" had primer sets B, D, F, and H. Staggering the primer sets into different multiplex reactions eliminated amplification of the short overlap regions bounded by the reverse primer from one set and the forward primer of the overlapping adjacent set. Without this strategy, these overlap products would dominate the PCR reaction due to the small size of these amplicons.
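The staggering scheme can be sketched in a few lines. This is a minimal illustration, not part of PriMux: the function name is hypothetical, and it assumes that the primer sets are listed in order along the genome and that only neighboring sets overlap.

```python
def stagger_primer_sets(ordered_sets):
    """Split primer sets into two pools so that no pool holds two
    neighboring (overlapping) sets.

    ordered_sets: primer set labels sorted by genome position
    (assumption: only adjacent sets in this list overlap).
    """
    pool_a = ordered_sets[0::2]   # 1st, 3rd, 5th, ... set
    pool_b = ordered_sets[1::2]   # 2nd, 4th, 6th, ... set
    return pool_a, pool_b


if __name__ == "__main__":
    mix_a, mix_b = stagger_primer_sets(list("ABCDEFGHI"))
    print("PCR mix A:", mix_a)
    print("PCR mix B:", mix_b)
```

Alternating assignment is enough when overlaps occur only between neighbors; if a set overlapped more than its immediate neighbors, more than two pools would be needed.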

The amplification of each primer pair in the multiplex was tested using a seminested PCR strategy to verify that the correct specific amplicons were being produced from each multiplex of primers for a given region (Figure 2, Table S2). The multiplex PCR products served as templates for PCR reactions with primer pairs that included the reverse primer of one region paired with the forward primer from the downstream adjacent region, to determine if the template generated from the multiplex was present. To ensure that the PCR product was generated from the multiplex product template rather than genomic DNA carried over from the initial sample, the multiplex product template was diluted 1:10,000 or excised from a gel and purified prior to use as a template.
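The pairing rule for the seminested checks can be written out explicitly. The sketch below is a hypothetical helper, assuming regions are labeled in genome order and using the region-level names from Figure 2 (for example "AR", "BF"); the specific primers actually used are those listed in Table S2, which is not reproduced here.

```python
def seminested_pairs(ordered_regions):
    """For each region, pair its reverse primer with the forward primer
    of the downstream adjacent region.

    Returns a list of (template_region, forward_primer, reverse_primer).
    The template is the multiplex product containing template_region.
    """
    checks = []
    for upstream, downstream in zip(ordered_regions, ordered_regions[1:]):
        checks.append((upstream, f"{downstream}F", f"{upstream}R"))
    return checks


if __name__ == "__main__":
    for region, fwd, rev in seminested_pairs(list("ABCDEFGHI")):
        print(f"verify region {region}: {fwd} + {rev}")
```

The last region has no downstream neighbor, so under this rule it is not covered and would need a separate check.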

3. Results and Discussion

All the primers for both (s, x) settings are provided as Supplementary data, as are the predicted amplicon start and end positions in each target genome from a multiplex of the primers for a given viral target set. Tiled amplification of these viruses required from 2 to 116 primers (Table 1). Primers are predicted to be specific to the target organisms for the most part, although not exclusively (Tables 3 and 4).


[Figure 2 schematic: MHV genome map, 5′ to 3′ (ORF1a, ORF1b, 2a, HE, Spike, 4, 5, E, M, N), with primer sets A, C, E, G, and I included in PCR mix A and primer sets B, D, F, and H included in PCR mix B. In separate seminested PCR reactions, forward primers were paired with a reverse primer from an overlapping reaction to verify that product was generated for each overlapping region.]

Figure 2: Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were tested. The approximate position of each region amplified by primer sets is shown (MHV genome is not drawn to scale). Each multiplex reaction consisted of primer sets that do not overlap in regions amplified. Each region is amplified using 3 forward primers and 3 reverse primers (Table S1; see Supplementary Material available online at http://dx.doi.org/10.1155/2014/101894). For example, the A primer set consists of 3 forward primers (A1F, A2F, and A3F) and 3 reverse primers (A1R, A2R, and A3R). To verify that each region is amplified in the multiplex reaction, a second set of seminested PCRs was performed using the amplicons from the multiplex reaction as a template. For example, to ensure region A was amplified, the PCR product from the A mix multiplex was diluted 1:10,000 and used as template in a PCR reaction with AR1 primer paired with BF2 (Table S2). Primers are labeled according to genome region (A-I) and primer direction (F = forward, R = reverse).

The few cases of off-target amplification come from closely related organisms in the same family, such as Old World (OW) and New World (NW) Arenaviruses, or other Flaviviruses amplified by the Japanese encephalitis virus (JEV) multiplex. The three exceptions were a single amplicon of 2830 bp from a BAC clone of Zea mays (maize) from the Ebola 3 kb multiplex, a single amplicon of 3610 bp from Methylococcus capsulatus str. Bath from the OW Arena S segment 3 kb multiplex, and a single amplicon of 851 bp from a human BAC from a library at CalTech. All three of these predicted nontarget amplicons result from a single primer in each of those reactions performing as both forward primer (FP) and reverse primer (RP). Nonetheless, the primer multiplexes described here should strongly favor enrichment of the desired targets.
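To illustrate how a single primer can act as both FP and RP, the following sketch searches a template for a site matching the primer and a downstream site matching its reverse complement. It is a simplified stand-in and not the simulate PCR algorithm: it uses exact string matching only, with no mismatches and no degenerate bases, and all function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def single_primer_amplicons(template, primer, max_len):
    """Find products primed at both ends by the same primer.

    A product exists when the primer matches the top strand at position i
    and its reverse complement matches downstream at position j, so the
    same oligo also primes the bottom strand.
    Returns a list of (start, length) tuples with length <= max_len.
    """
    k = len(primer)
    rc = revcomp(primer)
    last = len(template) - k + 1
    fwd_sites = [i for i in range(last) if template[i:i + k] == primer]
    rev_sites = [j for j in range(last) if template[j:j + k] == rc]
    products = []
    for i in fwd_sites:
        for j in rev_sites:
            length = j + k - i
            if j > i and length <= max_len:
                products.append((i, length))
    return products


if __name__ == "__main__":
    primer = "ACGTAC"
    template = primer + "TTTTT" + revcomp(primer)   # toy sequence
    print(single_primer_amplicons(template, primer, max_len=100))
```

In a large database such as nt, chance occurrences of an inverted repeat of one primer within the maximum amplicon length are what this kind of check is meant to catch.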

Deriving each primer set required a multiple sequence alignment and a call to run tiled primers in the current PriMux software distribution (http://sourceforge.net/projects/primux). In comparison, primer design with the JCVI pipeline for any of these target sets would require the following steps: (1) inspecting a phylogeny for the full target set to build multiple smaller clade-level sets with no more than 10% sequence variation, (2) realigning the clade-level sets, (3) running the JCVI pipeline on each clade set, (4) assessing which target sequences are not amplified after one design round and rerunning the pipeline on those sequences for each clade, and (5) repeating step 4 until all target sequences are predicted to be amplified.

4. MHV Results

Multiplexed primers were tested in the lab as primer pairs in individual reactions and then as multiplexed reactions. Twenty-two of the primer pairs worked, and four failed to give a product and were paired with other primers in subsequent testing or, if necessary, replaced with an alternative primer. Amplicons were detected in the expected size ranges, confirming amplification of the expected regions from the multiplexed sets (Figure S1). In some cases extra bands were present, but they were generally smaller than the targeted size; this was common when the template cDNA was obtained from a clinical sample rather than the high titer cell culture derived viral stock from this study. The PCR products generated with these highly multiplexed assays were then sequenced using Illumina ultradeep sequencing with a high fidelity polymerase. These primers yielded high coverage, averaging 150,000x, of the genomic regions amplified by the multiplex primers.

5. Conclusions

Software is described to generate tiled, multiplex, and degenerate amplification primers to span entire genomes or regions of many variant sequences. This tool should facilitate the amplification of overlapping products across whole genomes or user-specified regions of target sets with high levels of variation. Applications include target enrichment for viral discovery of new members in a viral family from a complex host background, improving high throughput sequencing sensitivity and coverage of a rapidly evolving virus, or enriched coverage of variants in a gene family.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Department of Homeland Security Bioforensics Program through contract HSHQPM-10-X-00078P00001 and the Defense Threat Reduction Agency through Contract DTRA10027IA-3497 to Lawrence Livermore National Laboratory. This work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

References

[1] M. M. H. Yang, A. Singhal, S. R. Rassekh, S. Yip, P. Eydoux, and C. Dunham, "Possible differentiation of cerebral glioblastoma into pleomorphic xanthoastrocytoma: an unusual case in an infant," Journal of Neurosurgery: Pediatrics, vol. 9, no. 5, pp. 517–523, 2012.

[2] E. Schulz, A. Valentin, P. Ulz et al., "Germline mutations in the DNA damage response genes BRCA1, BRCA2, BARD1 and TP53 in patients with therapy related myeloid neoplasms," Journal of Medical Genetics, vol. 49, no. 7, pp. 422–428, 2012.

[3] D. A. Hysom, P. Naraghi Arani, M. Elsheikh, A. C. Carrillo, P. L. Williams, and S. N. Gardner, "Skip the alignment: degenerate, multiplex primer and probe design using k-mer matching instead of alignments," PLoS ONE, vol. 7, no. 4, Article ID e34560, 2012.

[4] K. Li, S. Shrivastava, A. Brownley et al., "Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing," Virology Journal, vol. 9, article 261, 2012.

[5] R. C. Edgar, "MUSCLE: multiple sequence alignment with high accuracy and high throughput," Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004.

[6] N. R. Markham and M. Zuker, "UNAFold: software for nucleic acid folding and hybridization," Methods in Molecular Biology, vol. 453, pp. 3–31, 2008.

[7] S. N. Gardner and T. Slezak, "Simulate PCR for amplicon prediction and annotation from multiplex, degenerate primers and probes," BMC Bioinformatics, vol. 15, article 237, 2014.

[8] M. K. Borucki, J. E. Allen, H. Chen-Harris et al., "The role of viral population diversity in adaptation of bovine coronavirus to new host environments," PLoS ONE, vol. 8, no. 1, Article ID e52752, 2013.


Page 5: Research Article Multiplex Degenerate Primer Design for ...

Advances in Bioinformatics 5

Table 3 Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 3 kb amplicons In a multiplex of the 3 kb-amplicon tiled primers for a given organism of the possible reactions producing products only a small number of primer combinationsare predicted to amplify regions in nontarget organisms Counts show the number of unique primer combinations in a multiplex that yieldproducts for any sequence in the NCBI nt nucleotide database The numerator is for any nontarget organism in nt and the denominator isfor any target or nontarget organism in nt that is nonspecifictotal of the possible primer combinations in the multiplex predicted to yieldproduct when compared against nt Vastly more amplicons are produced from target organisms indicating any contaminating nontargetspecies should be a small minority of amplified product

Organism Nontarget ampliconstotal amplicons Nontarget amplicon source organismCCHF S 0160 mdashCCHF M 01934 mdashCCHF L 03753 mdashRVF S 0137 mdashRVF M 0356 mdashRVF L 0753 mdashEbola 12657 Zea mays clone BAC ZMMBBb0342E21Marburg 01511 mdashHendra 0206 mdashNipah 0286 mdashJunin L 069 mdashMachupo L 0153 mdashJunin S 084 mdashMachupo S 032 mdash

JEV 79515 RocioWest Nile

NW Arena S 561543

IppyLassaLuna

Lymphocytic choriomeningitisMobalaMopeia

NW Arena L 0819 mdash

OW Arena S 732509

AllpahuayoAmapari

Bear canyonChapareCupixi

DandenongFlexal

GuanaritoJuninLatinoLujo

MachupoMethylococcus capsulatus str Bath

ParanaPiritalSabia

TamiamiWhitewater Arroyo

OW Arena L 11826 DandenongTBEV 04925 mdash

6 Advances in Bioinformatics

Table 4 Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons As in Table 3 but for themultiplexes of the 10 kb-amplicon tiled primers

Organism Nontarget ampliconstotal amplicons Nontarget amplicon source organismCCHF S 0160 mdashCCHF M 0261 mdashCCHF L 0253 mdashRVF S 0137 mdashRVF M 0487 mdashRVF L 0195 mdashEbola 0534 mdashMarburg 0123 mdashHendra 050 mdashNipah 074 mdashJunin L 012 mdashMachupo L 07 mdashJunin S 095 mdashMachupo S 032 mdashJEV 01554 mdashNW Arena S 1337 Human chromosome 14 BAC C-2555K7 of library CalTech-DNW Arena L 086 mdashOW Arena S 0316 mdashOW Arena L 0131 mdashTBEV 0189 mdash

Samples from MHV-1 infected mice were provided byDr Richard Bowen at Colorado State University TheMHV-1strain used to infect the mice was obtained from AmericanType Culture Collection (Manassas VA) and viral stock waspropagated in murine fibroblast 17Cl-1 cells then used toinfect C3H mice via intranasal route Mice were sacrificedfour days after inoculation and bronchoalveolar lavage (BAL)fluid was collected RNAwas extracted from the BAL samplesusing Invitrogen TRIZOL reagent as per the manufacturerrsquosinstructions RNA was converted to cDNA using SuperscriptIII (Invitrogen) and random hexamers according to themanufacturerrsquos protocol

Multiplexed primer sets were designed to cover theNsp3 and 31015840 genes with 3 primer pairs per genomic regionamplified when possible (total number of primers testedin two multiplex reactions was 53 Table S1) The primerswere tested in the lab first by testing the primer pairsin individual reactions then as multiplexed reactions Noeffort was made to optimize the PCR cycling conditionsRT-PCR conditions were as follows reverse transcriptionwas performed using random hexamers and the SuperscriptIII RT reverse transcriptase kit (Invitrogen) The MHV-1 cDNA templates were amplified using the Q5 Hot StartHigh-Fidelity DNA Polymerase kit (New England BioLabsIpswich MA) following manufacturerrsquos instructions PCRconditions consisted of 98∘C for 30 s followed by 35 cyclesof 98∘C for 10 s 60∘C for 20 s and 72∘C for 1min The finalcycle was 72∘C for 2min

Two multiplex reactions were set up with each contain-ing a group of nonoverlapping primer sets (Figure 2) Forexample multiplex ldquoArdquo included primer sets A C E G and

I and multiplex ldquoBrdquo had primer sets B D F and H Bystaggering the primer sets into different multiplex reactionsthe amplification of overlapping primer regions created bythe reverse primer from one set with the forward primer ofthe overlapping adjacent primer set was eliminatedWithoutthis strategy these overlapping primer sets would dominatethe PCR reaction due to the small size of these amplicons

The amplification of each primer pair in the multiplexwas tested using a seminested PCR strategy to verify thatthe correct specific amplicons were being produced fromeach multiplex of primers for a given region (Figure 2 TableS2) The multiplex PCR products served as templates forPCR reactions with primer pairs that included the reverseprimer of one region paired with the forward primer fromthe downstream adjacent region to determine if the templategenerated from the multiplex was present To ensure thatthe PCR product was generated from the multiplex producttemplate rather than genomic DNA carried over from theinitial sample the multiplex product template was diluted1 10000 or excised from a gel and purified prior to use asa template

3 Results and Discussion

All the primers for both (119904 119909) settings are provided asSupplementary data as are the predicted amplicon start andend positions in each target genome from a multiplex ofthe primers for a given viral target set Tiled amplificationof these viruses required from 2 to 116 primers (Table 1)Primers are predicted to be specific to the target organismsfor the most part although not exclusively (Tables 3 and 4)

Advances in Bioinformatics 7

ORF1a ORF1b 2a HE Spike 4 5 E M N

AF AR CF CR

BF BR DF DR

Primer sets included in PCR mix A

Primer sets included in PCR mix B

EF ER GF GR IF IR

HF HR FF FR

In separate seminested PCR reactions forward primers were paired with a reverse primer from an overlapping reaction to verify that product was generated for each overlapping region

5998400

3998400

Figure 2 Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were testedThe approximate position of eachregion amplified by primer sets is shown (MHV genome is not drawn to scale) Each multiplex reaction consisted of primer sets that do notoverlap in regions amplified Each region is amplified using 3 forward primers and 3 reverse primers (Table S1 see Supplementary Materialavailable online at httpdxdoiorg1011552014101894) For example the A primer set consists of 3 forward primers (A1F A2F and A3F)and 3 reverse primers (A1R A2R and A3R) To verify that each region is amplified in the multiplex reaction a second set of seminestedPCRs were performed using the amplicons from themultiplex reaction as a template For example to ensure region A was amplified the PCRproduct from the A mix multiplex was diluted 1 10000 and used as template in a PCR reaction with AR1 primer paired with BF2 (Table S2)Primers are labeled according to genome region (A-I) and primer direction (F = forward R = reverse)

The few cases of off-target amplification come from closelyrelated organisms in the same family such asOldWorld (OW)and New World (NW) Arenaviruses or other Flavivirusesamplified by the Japanese encephalitis virus (JEV) multiplexThe three exceptions were a single amplicon of 2830 bpfrom a BAC clone of Zea mays (maize) from the Ebola 3 kbmultiplex a single amplicon of 3610 bp from Methylococcuscapsulatus str Bath from the OW Arena S segment 3 kbmultiplex and a single amplicon of 851 bp from a humanBAC from a library at CalTech All three of these predictednontarget amplicons result from a single primer in each ofthose reactions performing as both forward primer (FP) andreverse primer (RP) Nonetheless the primer multiplexesdescribed here should strongly favor the preferential enrich-ment of desired targets

Deriving each primer set required multiple sequencealignment and a call to run tile primers in the currentPriMux software distribution (httpsourceforgenetpro-jectsprimux) In comparison primer design with the JCVIpipeline for any of these target sets would require the follow-ing steps (1) inspection of a phylogeny for the full target setto build multiple smaller clade-level sets with no more than10 sequence variation (2) realignment of the clade-levelsets (3) running of the JCVI pipeline on each clade set (4)assessing which target sequences are not amplified after onedesign round and rerun the pipeline on those sequences foreach clade (5) and repeating step 4 until all target sequencesare predicted to be amplified

4 MHV ResultsMultiplexed primers were tested in the lab as primer pairs inindividual reactions then as multiplexed reactions Twenty-two of the primer pairs worked and four failed to give a prod-uct and were paired with other primers in subsequent testing

or if necessary replaced with an alternative primer Ampli-cons were detected in the expected size ranges confirmingamplification of the expected regions from the multiplexedsets (Figure S1) In some cases extra bands were presentbut they were generally smaller than the targeted size thiswas common when the template cDNA was obtained from aclinical sample rather than high titer cell culture derived viralstock from this studyThe PCR products generated with thesehighly multiplexed assays were then sequenced using Illu-mina ultradeep sequencing with a high fidelity polymeraseThese primers yielded high coverage averaging 150000x ofthe genomic regions amplified by the multiplex primers

5 ConclusionsSoftware is described to generate tiled multiplex and degen-erate amplification primers to span entire genomes or regionsof many variant sequences This tool should facilitate theamplification of overlapping products across whole genomesor user-specified regions of target sets with high levels ofvariation Applications include target enrichment for viraldiscovery of new members in a viral family from a complexhost background improving high throughput sequencingsensitivity and coverage of a rapidly evolving virus orenriched coverage of variants in a gene family

Conflict of InterestsThe authors declare that there is no conflict of interestsregarding the publication of this paper

AcknowledgmentsThis work was supported by the Department of HomelandSecurity Bioforensics Program through contract HSHQPM-10-X-00078P00001 and the Defense Threat Reduction

8 Advances in Bioinformatics

Agency through Contract DTRA10027IA-3497 to LawrenceLivermore National Laboratory This work performed underthe auspices of the US Department of Energy by LawrenceLivermore National Laboratory under Contract DE-AC52-07NA27344

References

[1] M M H Yang A Singhal S R Rassekh S Yip P Eydoux andC Dunham ldquoPossible differentiation of cerebral glioblastomainto pleomorphic xanthoastrocytoma an unusual case in aninfantrdquo Journal of Neurosurgery Pediatrics vol 9 no 5 pp 517ndash523 2012

[2] E Schulz A Valentin P Ulz et al ldquoGermline mutations in theDNAdamage response genes BRCA1 BRCA2 BARD1 andTP53in patients with therapy related myeloid neoplasmsrdquo Journal ofMedical Genetics vol 49 no 7 pp 422ndash428 2012

[3] D A Hysom P Naraghi Arani M Elsheikh A C Carrillo P LWilliams and S N Gardner ldquoSkip the alignment degeneratemultiplex primer and probe design using k-mer matchinginstead of alignmentsrdquo PLoS ONE vol 7 no 4 Article IDe34560 2012

[4] K Li S Shrivastava A Brownley et al ldquoAutomated degeneratePCR primer design for high-throughput sequencing improvesefficiency of viral sequencingrdquo Virology Journal vol 9 article261 2012

[5] R C Edgar ldquoMUSCLE multiple sequence alignment with highaccuracy and high throughputrdquo Nucleic Acids Research vol 32no 5 pp 1792ndash1797 2004

[6] N R Markham and M Zuker ldquoUNAFold software for nucleicacid folding and hybridizationrdquo Methods in Molecular Biologyvol 453 pp 3ndash31 2008

[7] S N Gardner and T Slezak ldquoSimulate PCR for ampliconprediction and annotation from multiplex degenerate primersand probesrdquo BMC Bioinformatics vol 15 article 237 2014

[8] M K Borucki J E Allen H Chen-Harris et al ldquoThe role ofviral population diversity in adaptation of bovine coronavirusto new host environmentsrdquo PLoS ONE vol 8 no 1 Article IDe52752 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Multiplex Degenerate Primer Design for ...

6 Advances in Bioinformatics

Table 4 Number of nontarget amplicons predicted in a multiplex reaction of tiled primers for 10 kb amplicons As in Table 3 but for themultiplexes of the 10 kb-amplicon tiled primers

Organism Nontarget ampliconstotal amplicons Nontarget amplicon source organismCCHF S 0160 mdashCCHF M 0261 mdashCCHF L 0253 mdashRVF S 0137 mdashRVF M 0487 mdashRVF L 0195 mdashEbola 0534 mdashMarburg 0123 mdashHendra 050 mdashNipah 074 mdashJunin L 012 mdashMachupo L 07 mdashJunin S 095 mdashMachupo S 032 mdashJEV 01554 mdashNW Arena S 1337 Human chromosome 14 BAC C-2555K7 of library CalTech-DNW Arena L 086 mdashOW Arena S 0316 mdashOW Arena L 0131 mdashTBEV 0189 mdash

Samples from MHV-1 infected mice were provided byDr Richard Bowen at Colorado State University TheMHV-1strain used to infect the mice was obtained from AmericanType Culture Collection (Manassas VA) and viral stock waspropagated in murine fibroblast 17Cl-1 cells then used toinfect C3H mice via intranasal route Mice were sacrificedfour days after inoculation and bronchoalveolar lavage (BAL)fluid was collected RNAwas extracted from the BAL samplesusing Invitrogen TRIZOL reagent as per the manufacturerrsquosinstructions RNA was converted to cDNA using SuperscriptIII (Invitrogen) and random hexamers according to themanufacturerrsquos protocol

Multiplexed primer sets were designed to cover theNsp3 and 31015840 genes with 3 primer pairs per genomic regionamplified when possible (total number of primers testedin two multiplex reactions was 53 Table S1) The primerswere tested in the lab first by testing the primer pairsin individual reactions then as multiplexed reactions Noeffort was made to optimize the PCR cycling conditionsRT-PCR conditions were as follows reverse transcriptionwas performed using random hexamers and the SuperscriptIII RT reverse transcriptase kit (Invitrogen) The MHV-1 cDNA templates were amplified using the Q5 Hot StartHigh-Fidelity DNA Polymerase kit (New England BioLabsIpswich MA) following manufacturerrsquos instructions PCRconditions consisted of 98∘C for 30 s followed by 35 cyclesof 98∘C for 10 s 60∘C for 20 s and 72∘C for 1min The finalcycle was 72∘C for 2min

Two multiplex reactions were set up with each contain-ing a group of nonoverlapping primer sets (Figure 2) Forexample multiplex ldquoArdquo included primer sets A C E G and

I and multiplex ldquoBrdquo had primer sets B D F and H Bystaggering the primer sets into different multiplex reactionsthe amplification of overlapping primer regions created bythe reverse primer from one set with the forward primer ofthe overlapping adjacent primer set was eliminatedWithoutthis strategy these overlapping primer sets would dominatethe PCR reaction due to the small size of these amplicons

The amplification of each primer pair in the multiplexwas tested using a seminested PCR strategy to verify thatthe correct specific amplicons were being produced fromeach multiplex of primers for a given region (Figure 2 TableS2) The multiplex PCR products served as templates forPCR reactions with primer pairs that included the reverseprimer of one region paired with the forward primer fromthe downstream adjacent region to determine if the templategenerated from the multiplex was present To ensure thatthe PCR product was generated from the multiplex producttemplate rather than genomic DNA carried over from theinitial sample the multiplex product template was diluted1 10000 or excised from a gel and purified prior to use asa template

3 Results and Discussion

All the primers for both (119904 119909) settings are provided asSupplementary data as are the predicted amplicon start andend positions in each target genome from a multiplex ofthe primers for a given viral target set Tiled amplificationof these viruses required from 2 to 116 primers (Table 1)Primers are predicted to be specific to the target organismsfor the most part although not exclusively (Tables 3 and 4)


[Figure 2 diagram: MHV genome map from 5′ to 3′ (ORF1a, ORF1b, 2a, HE, Spike, 4, 5, E, M, N) with the positions of primer sets A to I. PCR mix A contains primer sets A, C, E, G, and I; PCR mix B contains primer sets B, D, F, and H. In separate seminested PCR reactions, forward primers were paired with a reverse primer from an overlapping reaction to verify that product was generated for each overlapping region.]

Figure 2: Diagram of the murine hepatitis virus (MHV) genome regions for which primer sets were tested. The approximate position of each region amplified by primer sets is shown (MHV genome is not drawn to scale). Each multiplex reaction consisted of primer sets that do not overlap in regions amplified. Each region is amplified using 3 forward primers and 3 reverse primers (Table S1; see Supplementary Material available online at http://dx.doi.org/10.1155/2014/101894). For example, the A primer set consists of 3 forward primers (A1F, A2F, and A3F) and 3 reverse primers (A1R, A2R, and A3R). To verify that each region is amplified in the multiplex reaction, a second set of seminested PCRs was performed using the amplicons from the multiplex reaction as a template. For example, to ensure region A was amplified, the PCR product from the A mix multiplex was diluted 1:10,000 and used as template in a PCR reaction with the AR1 primer paired with BF2 (Table S2). Primers are labeled according to genome region (A–I) and primer direction (F = forward, R = reverse).

The few cases of off-target amplification come from closely related organisms in the same family, such as Old World (OW) and New World (NW) Arenaviruses, or other Flaviviruses amplified by the Japanese encephalitis virus (JEV) multiplex. The three exceptions were a single amplicon of 2830 bp from a BAC clone of Zea mays (maize) from the Ebola 3 kb multiplex, a single amplicon of 3610 bp from Methylococcus capsulatus str. Bath from the OW Arena S segment 3 kb multiplex, and a single amplicon of 851 bp from a human BAC from a library at CalTech. All three of these predicted nontarget amplicons result from a single primer in each of those reactions performing as both forward primer (FP) and reverse primer (RP). Nonetheless, the primer multiplexes described here should strongly favor enrichment of the desired targets.
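
Predicted amplicons in which one primer serves as both FP and RP are easy to flag once amplicon predictions are in tabular form. The sketch below uses a hypothetical record layout and toy records with made-up primer names and lengths; it does not reproduce the output format of any amplicon prediction tool or the predictions reported here.

```python
from typing import List, NamedTuple


class PredictedAmplicon(NamedTuple):
    """Hypothetical record for one predicted amplicon."""
    target: str
    length_bp: int
    forward_primer: str
    reverse_primer: str


def single_primer_amplicons(amplicons: List[PredictedAmplicon]) -> List[PredictedAmplicon]:
    """Return amplicons where the same primer acts as forward and reverse primer."""
    return [a for a in amplicons if a.forward_primer == a.reverse_primer]


# Toy records for illustration only.
toy_predictions = [
    PredictedAmplicon("intended_target", 3000, "primer_01", "primer_02"),
    PredictedAmplicon("nontarget_hit", 900, "primer_07", "primer_07"),
]
flagged = single_primer_amplicons(toy_predictions)
print(flagged)
```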

Deriving each primer set required multiple sequence alignment and a call to run_tiled_primers in the current PriMux software distribution (http://sourceforge.net/projects/primux). In comparison, primer design with the JCVI pipeline for any of these target sets would require the following steps: (1) inspection of a phylogeny for the full target set to build multiple smaller clade-level sets with no more than 10% sequence variation, (2) realignment of the clade-level sets, (3) running of the JCVI pipeline on each clade set, (4) assessing which target sequences are not amplified after one design round and rerunning the pipeline on those sequences for each clade, and (5) repeating step 4 until all target sequences are predicted to be amplified.
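
Steps 4 and 5 of the clade-by-clade procedure amount to a loop that reruns a design round on whatever targets remain unamplified. The sketch below illustrates only that control flow: `design_round` is a hypothetical callable supplied by the user, not an interface of the JCVI pipeline or of PriMux, and the toy round is a stand-in that covers two targets at a time.

```python
def design_until_covered(targets, design_round, max_rounds=10):
    """Repeat design rounds on the still-unamplified targets.

    design_round(remaining) must return (new_primers, amplified_targets).
    Stops when every target is covered, when a round makes no progress,
    or after max_rounds rounds.
    """
    remaining = set(targets)
    primers = []
    for _ in range(max_rounds):
        if not remaining:
            break
        new_primers, amplified = design_round(remaining)
        newly_covered = remaining & set(amplified)
        if not newly_covered:
            break  # no progress; leave the rest for manual inspection
        primers.extend(new_primers)
        remaining -= newly_covered
    return primers, remaining


def toy_round(remaining):
    """Stand-in design round: pretends to cover two targets per call."""
    chosen = sorted(remaining)[:2]
    return [f"primer_for_{t}" for t in chosen], set(chosen)


primers, uncovered = design_until_covered(["s1", "s2", "s3", "s4", "s5"], toy_round)
print(len(primers), "primers;", len(uncovered), "targets left uncovered")
```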

4 MHV Results

Multiplexed primers were tested in the lab as primer pairs in individual reactions, then as multiplexed reactions. Twenty-two of the primer pairs worked; four failed to give a product and were paired with other primers in subsequent testing or, if necessary, replaced with an alternative primer. Amplicons were detected in the expected size ranges, confirming amplification of the expected regions from the multiplexed sets (Figure S1). In some cases extra bands were present, but they were generally smaller than the targeted size; this was common when the template cDNA was obtained from a clinical sample rather than from the high titer, cell culture derived viral stock from this study. The PCR products generated with these highly multiplexed assays were then sequenced using Illumina ultradeep sequencing with a high fidelity polymerase. These primers yielded high coverage, averaging 150,000x, of the genomic regions amplified by the multiplex primers.

5 Conclusions

Software is described to generate tiled, multiplex, and degenerate amplification primers to span entire genomes or regions of many variant sequences. This tool should facilitate the amplification of overlapping products across whole genomes or user-specified regions of target sets with high levels of variation. Applications include target enrichment for viral discovery of new members in a viral family from a complex host background, improving high throughput sequencing sensitivity and coverage of a rapidly evolving virus, or enriched coverage of variants in a gene family.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Department of Homeland Security Bioforensics Program through contract HSHQPM-10-X-00078P00001 and the Defense Threat Reduction Agency through Contract DTRA10027IA-3497 to Lawrence Livermore National Laboratory. This work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

References

[1] M. M. H. Yang, A. Singhal, S. R. Rassekh, S. Yip, P. Eydoux, and C. Dunham, “Possible differentiation of cerebral glioblastoma into pleomorphic xanthoastrocytoma: an unusual case in an infant,” Journal of Neurosurgery: Pediatrics, vol. 9, no. 5, pp. 517–523, 2012.

[2] E. Schulz, A. Valentin, P. Ulz et al., “Germline mutations in the DNA damage response genes BRCA1, BRCA2, BARD1, and TP53 in patients with therapy related myeloid neoplasms,” Journal of Medical Genetics, vol. 49, no. 7, pp. 422–428, 2012.

[3] D. A. Hysom, P. Naraghi Arani, M. Elsheikh, A. C. Carrillo, P. L. Williams, and S. N. Gardner, “Skip the alignment: degenerate, multiplex primer and probe design using k-mer matching instead of alignments,” PLoS ONE, vol. 7, no. 4, Article ID e34560, 2012.

[4] K. Li, S. Shrivastava, A. Brownley et al., “Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing,” Virology Journal, vol. 9, article 261, 2012.

[5] R. C. Edgar, “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004.

[6] N. R. Markham and M. Zuker, “UNAFold: software for nucleic acid folding and hybridization,” Methods in Molecular Biology, vol. 453, pp. 3–31, 2008.

[7] S. N. Gardner and T. Slezak, “Simulate_PCR for amplicon prediction and annotation from multiplex, degenerate primers and probes,” BMC Bioinformatics, vol. 15, article 237, 2014.

[8] M. K. Borucki, J. E. Allen, H. Chen-Harris et al., “The role of viral population diversity in adaptation of bovine coronavirus to new host environments,” PLoS ONE, vol. 8, no. 1, Article ID e52752, 2013.



