Research Article A mathematical model for suppression subtractive...

19
Comparative and Functional Genomics Comp Funct Genom 2002; 3: 405–422. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.206 Research Article A mathematical model for suppression subtractive hybridization Chetan Gadgil 1 , Anette Rink 2 , Craig Beattie 2 and Wei-Shou Hu 1 * 1 Department of Chemical Engineering and Materials Science, 421 Washington Avenue SE, University of Minnesota, Minneapolis, MN 55455, USA 2 Department of Animal Biotechnology, School of Veterinary Medicine, University of Nevada, Reno, NV 89557, USA *Correspondence to: Wei-Shou Hu, Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455, USA. E-mail: [email protected] Received: 27 February 2002 Accepted: 1 August 2002 Abstract Suppression subtractive hybridization (SSH) is frequently used to unearth differen- tially expressed genes on a whole-genome scale. Its versatility is based on combining cDNA library subtraction and normalization, which allows the isolation of sequences of varying degrees of abundance and differential expression. SSH is a complex process with many adjustable parameters that affect the outcome of gene isolation. We present a mathematical model of SSH based on DNA hybridization kinetics for assessing the effect of various parameters to facilitate its optimization. We derive an equation for the probability that a particular differentially expressed species is successfully isolated and use this to quantify the effect of the following parameters related to the cDNA sample: (a) mRNA abundance; (b) partial sequence complementarity to other species; and (3) degree of differential expression. We also evaluate the effect of parameters related to the process, including: (a) reaction times; and (b) extent of driver excess used in the two hybridization reactions. The optimum set of pro- cess parameters for successful isolation of differentially expressed species depends on transcript abundance. We show that the reaction conditions have a significant effect on the occurrence of false-positives and formulate strategies to isolate specific subsets of differentially expressed genes. We also quantify the effect of non-specific hybridization on the false-positive results and present strategies for spiking cDNA sequences to address this problem. Copyright 2002 John Wiley & Sons, Ltd. Keywords: suppression subtractive hybridization; mathematical model; hybridiza- tion kinetics Introduction Developmental processes such as ageing, metamor- phosis, and embryo development are associated with changes in gene expression (Hill et al., 2000; Lee et al., 1999; White et al., 1999). Cells exposed to different extracellular environments, at differ- ent metabolic levels or pathophysiologic states, also exhibit different profiles of gene expression (Alizadeh and Staudt, 2000; Bittner et al., 2000; DeRisi et al., 1997; Oh and Liao, 2000). The trans- formation of genotype to a variety of phenotypes is characterized by differential gene expression from the same repertoire of sequence information. An important first step in the elucidation of the molec- ular mechanisms responsible for altered physiolog- ical states or developmental pathways is the iden- tification of genes that are differentially regulated at the transcriptional level. Several methods to profile gene expression have been developed. The expression profile for each sample may be estimated separately and then compared by methods that depend on specific hybridization of probes to DNA microarrays (Lip- shutz et al., 1999) or on the counting of tags or signatures of DNA fragments (Brenner et al., 2000; Velculescu et al., 1995). Differences in gene expression between two samples can be compared Copyright 2002 John Wiley & Sons, Ltd.

Transcript of Research Article A mathematical model for suppression subtractive...

Page 1: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Comparative and Functional GenomicsComp Funct Genom 2002; 3: 405–422.Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.206

Research Article

A mathematical model for suppressionsubtractive hybridization

Chetan Gadgil1, Anette Rink2, Craig Beattie2 and Wei-Shou Hu1*1 Department of Chemical Engineering and Materials Science, 421 Washington Avenue SE, University of Minnesota, Minneapolis, MN55455, USA2 Department of Animal Biotechnology, School of Veterinary Medicine, University of Nevada, Reno, NV 89557, USA

*Correspondence to:Wei-Shou Hu, Department ofChemical Engineering andMaterials Science, University ofMinnesota, Minneapolis, MN55455, USA.E-mail: [email protected]

Received: 27 February 2002Accepted: 1 August 2002

AbstractSuppression subtractive hybridization (SSH) is frequently used to unearth differen-tially expressed genes on a whole-genome scale. Its versatility is based on combiningcDNA library subtraction and normalization, which allows the isolation of sequencesof varying degrees of abundance and differential expression. SSH is a complex processwith many adjustable parameters that affect the outcome of gene isolation. We presenta mathematical model of SSH based on DNA hybridization kinetics for assessing theeffect of various parameters to facilitate its optimization. We derive an equationfor the probability that a particular differentially expressed species is successfullyisolated and use this to quantify the effect of the following parameters related tothe cDNA sample: (a) mRNA abundance; (b) partial sequence complementarity toother species; and (3) degree of differential expression. We also evaluate the effectof parameters related to the process, including: (a) reaction times; and (b) extentof driver excess used in the two hybridization reactions. The optimum set of pro-cess parameters for successful isolation of differentially expressed species dependson transcript abundance. We show that the reaction conditions have a significanteffect on the occurrence of false-positives and formulate strategies to isolate specificsubsets of differentially expressed genes. We also quantify the effect of non-specifichybridization on the false-positive results and present strategies for spiking cDNAsequences to address this problem. Copyright 2002 John Wiley & Sons, Ltd.

Keywords: suppression subtractive hybridization; mathematical model; hybridiza-tion kinetics

Introduction

Developmental processes such as ageing, metamor-phosis, and embryo development are associatedwith changes in gene expression (Hill et al., 2000;Lee et al., 1999; White et al., 1999). Cells exposedto different extracellular environments, at differ-ent metabolic levels or pathophysiologic states,also exhibit different profiles of gene expression(Alizadeh and Staudt, 2000; Bittner et al., 2000;DeRisi et al., 1997; Oh and Liao, 2000). The trans-formation of genotype to a variety of phenotypes ischaracterized by differential gene expression fromthe same repertoire of sequence information. An

important first step in the elucidation of the molec-ular mechanisms responsible for altered physiolog-ical states or developmental pathways is the iden-tification of genes that are differentially regulatedat the transcriptional level.

Several methods to profile gene expression havebeen developed. The expression profile for eachsample may be estimated separately and thencompared by methods that depend on specifichybridization of probes to DNA microarrays (Lip-shutz et al., 1999) or on the counting of tagsor signatures of DNA fragments (Brenner et al.,2000; Velculescu et al., 1995). Differences in geneexpression between two samples can be compared

Copyright 2002 John Wiley & Sons, Ltd.

Page 2: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

406 C. Gadgil et al.

directly by methods such as differential display(Liang and Pardee, 1992), two-colour microarrayhybridization (Brown and Botstein, 1999), subtrac-tive cloning techniques (Sagerstrom et al., 1997),and combinations of these (Pardinas et al., 1998;Yang et al., 1999). These approaches have beensuccessfully used to identify genes differentiallyexpressed in two populations that exhibit largechanges in expression levels, or genes that areexpressed at high concentrations in terms of num-ber of copies per cell. Closed systems such as DNAmicroarrays require genomic sequence informationin order to identify differentially expressed tran-scripts. Open systems have the flexibility of iden-tifying uncatalogued sequences. However, manytechniques have a low efficiency of identifying raregenes that are differentially expressed (Martin andPardee, 2000). This problem is exacerbated whenthe change in expression level of rare transcriptsis small. Since genes expressed at low levels alsoplay a role in establishing differentiated pheno-types, their identification is essential for a completemechanistic understanding of cellular changes.

Suppression subtractive hybridization

Suppression subtractive hybridization (SSH), atechnique to identify a set of genes differentiallyexpressed in two cell samples, has the promise ofovercoming some of these difficulties (Diatchenkoet al., 1996). The singular advantage of SSH liesin the ability to identify differentially expressedgenes, irrespective of the level of expression, inthe absence of sequence information. SSH has beenused to investigate differential expression in a vari-ety of experimental systems, including malignantmelanoma (Hipfel et al., 2000), liver regeneration(Groenink and Aad, 1996), embryo development(Simpson et al., 1999) and honeybee larval devel-opment (Evans and Wheeler, 1999). SSH identifieddifferentially expressed sequences with no matchesin the public databases in all these systems.

The SSH process normalizes the levels of rareand abundant genes, and subtracts genes expressedin both samples. Genes upregulated in one sample(referred to as tester) relative to the other sample(called the driver) can be identified. The SSH pro-cess (Figure 1) entails two rounds of hybridizationfollowed by two PCR reactions (Diatchenko et al.,1996). Poly A+ mRNA is isolated from total RNAand reverse-transcribed to give a double-stranded

cDNA pool. The cDNA is digested by RsaI, result-ing in fragments 0.1–2 kb long. This step reducesthe size distribution of cDNA species and createsblunt ends for adaptor ligation. The tester is dividedinto two equal parts (referred to as tester A andtester B) and ligated with different adaptors (adap-tor A and adaptor B) at the 5′ end of each fragment.In the first set of hybridizations (hybridization 1Aand hybridization 1B), an excess of driver sam-ple is added to each tester fraction separately, andthe reactions are allowed to proceed under identi-cal conditions. Among species present at the sameconcentration in the tester, those present in similaror higher levels in the driver will form duplexesat a faster rate than those whose concentrationin the driver is lower. This leads to an enrich-ment of single-stranded species that are present ata higher level in the tester. Due to the second-orderhybridization process, normalization of the concen-tration of single-stranded species is also achieved,as abundant species form duplexes at a higher ratethan rare species. In the second hybridization, theend products of hybridization 1A and hybridization1B are mixed and additional excess single-strandeddriver is added for further subtraction. Unsub-tracted single stranded species from hybridization1A and hybridization 1B form duplexes in whichone strand has adaptor A and the other strand hasadaptor B. The duplex species formed during thetwo hybridization steps are shown in Figure 1.

After the hybridization reactions, end-filling ofduplexes with adaptor overhangs is carried out toform blunt-ended DNA. The duplexes are thenamplified by PCR using adaptor A and adaptor B asprimers. This leads to a differential amplification,depending on the nature of the duplex. Thoseduplexes in which the two strands have differentadaptors are exponentially amplified in the PCRreactions. Duplexes in which both strands haveidentical adaptors at both ends form panhandle-like structures because of the self-complementarynature of the adaptors and are not amplified.Duplexes with an adaptor only at one end arelinearly amplified.

The PCR products are then ligated into vectorsthat are used to transform Escherichia coli. In asuccessful SSH the frequency of the sequencesisolated from E. coli clones is greater for genesthat are expressed at a higher level in the tester. Toidentify genes that are downregulated in the sample

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 3: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 407

Two rounds of PCR with primers and

Tester

Adaptor A Adaptor B

Tester A

Hybridization 1B Products

Driverx E1

Tester B

Driverx E2

LinearAmplification No Amplification

ExponentialAmplification

Tester-Driver Driver-Driver

Tester Homoduplex

Tester Heteroduplex

Hybridization 1A Products

thyb1B

68 °C

thyb2

68 °C

thyb1A

68 °C

MBiMA

i MDi

[MBi MD

i ]

[MBi MD

i ]

[MAi MD

i ]

[MAi MD

i ]

[MAi MA

i ]

[MAi MB

i ]

[MAi MA

i ]

[MBi MB

i ]

[MBi MB

i ]

[MDi MD

i ][MDi MD

i ]

[MDi MD

i ]

Single Strands

Driver-Driver

Single Strands

Ligation into vectors

Hybridization 2 Products

End-filling Reaction for duplexes

Figure 1. Schematic of the SSH process depicting the cDNA species formed during the hybridization reactions

used as the tester, a reverse SSH is carried out byswitching the samples used as tester and driver.

SSH accomplishes normalization and subtractionby taking advantage of the different rates ofhybridization of cDNA strands for different genes,depending on their abundance level and the degree

of (differential) expression. The extent of hybridiza-tion is governed by the hybridization temperature,hybridization times and the driver : tester ratio.The effect of these operating parameters on theefficiency of normalization and subtraction, andthus the probability that a particular differentially

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 4: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

408 C. Gadgil et al.

expressed gene is isolated, differs depending oneach gene’s abundance level and extent of differ-ential expression. As the conditions that lead tothe highest probability of successful isolation varywith the abundance, level of differential expression,length of the transcript and degree of sequencesimilarity to other transcripts, there may not be aunique optimal condition of SSH for isolating alldifferentially expressed genes. The large number ofparameters that affect the outcome of SSH suggestthe use of a mathematical model to facilitate theselection of experimental conditions.

Mathematical models for subtractivehybridization

Attempts to model various techniques of isolatingdifferentially expressed transcripts have assumedDNA hybridization to occur as a simple irreversiblebimolecular reaction, and used the hybridizationmodel developed by Wetmur and co-workers (Wet-mur, 1976; Wetmur and Davidson, 1968). Ermo-laeva et al. (1996; Ermolaeva and Wagner, 1995)developed a mathematical model for a subtrac-tive hybridization process where the tester con-tains only three differentially expressed transcripts,which are completely absent in the driver, andpresented an analytical solution for the case oflarge driver excess. The subtractive hybridizationmodel of Cho and Park (1998) explored the effectof differing tester : driver ratios on target identi-fication in cases where target sequences presentin the tester are totally absent in the driver. Mil-ner et al. (1995) have presented a model for sub-tractive hybridization of genomic deletion mutantsand enrichment of upregulated sequences, and pre-dicted target enrichment in genomic subtractionsconsistent with experimental results. This model,however, does not present an analysis of the prob-ability of isolation of a particular differentiallyexpressed sequence.

Since it is more likely that genes are differ-entially regulated from a basal level rather thanbeing switched on or off (Gurskaya et al., 1996),a mathematical model for a process that identi-fies sequences differentially expressed between twocell samples should consider all possible levelsof regulation. Microarray experiments and studiesusing serial analysis of gene expression (SAGE)(Zhang et al., 1997) have clearly demonstrated arange in the level of gene expression, with the

majority of genes exhibiting limited if not neg-ligible differential expression (Sagerstrom et al.,1997). Therefore, any attempt to model the SSHprocess should also consider the concentrations ofnon-differentially expressed species that remain atthe end of the process. This concentration can thenbe used to estimate the number of false-positives,and the probability that the process will isolatea particular differentially expressed species. In acDNA pool, there exist several sequences that havea partial sequence homology to each other. The for-mation of chimeric cDNAs during the SSH processhas been observed (Zhang et al., 2000). A compre-hensive mathematical model for a process basedon DNA hybridization should account for non-specific hybridizations between strands that havepartial sequence homology.

In this report, we present a model for theSSH process that enables analysis of the effectsof process variables, such as the amount ofdriver excess and the hybridization times on theprobability of identifying differentially expressedcDNA species having a certain abundance, rel-ative expression level, and degree of sequencesimilarity to other sequences in the hybridiza-tion mix.

Model development

We represent single-stranded cDNA for species i(i = 1, 2, . . . nT) by MA

i , MBi or MD

i where thesuperscript denotes the type of adaptor (adaptorA, adaptor B, or no adaptor) and nT is the totalnumber of distinct sequences. After hybridizationreactions, each single-stranded species Mp

i (p = A,B or D) may form three types of duplexes Mp

i Mqj :

(a) homo-duplexes with complete sequence match(i = j) and with the same adaptors on both strandsor both strands from the driver (p = q); (b) homo-duplexes of strands that are perfectly complemen-tary (i = j) except for the adaptors (p �= q); and (c)heteroduplexes of partially complementary strands(i �= j). We assume that the behaviour of the senseand antisense strands of each species will be com-pletely symmetrical. Therefore, the model focuseson one strand.

The mass balance equations for the singlestrands Mp

i and duplex Mpi Mq

j during the hybridiza-tion reactions are given by Equations 1 and 2,

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 5: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 409

respectively:

d [Mpi ]

dt= −

∑q=A,B,D

nT∑j=1

(kfp,qi,j [Mp

i ][Mqj ])

+∑

q=A,B,D

nT∑j=1

(kbp,qi,j [Mp

i Mqj ]) (1)

d [Mpi Mq

j ]

dt= 1

1 + δijδpq(kfp,q

i,j [Mpi ][Mq

j ]

− kbp,qi,j [Mp

i Mqj ]) (2)

where δ is the Kroneker delta function (δ = 1 ifsubscripts are equal, 0 otherwise). The first termin each equation represents duplex formation (eachspecies Mp

i may react with all other single strandsMq

j to form the corresponding duplex Mpi Mq

j ) andthe second term represents the reverse reaction,i.e. duplex melting. Such equations are formu-lated for all species and solved simultaneously toobtain the concentrations of various single-strandedand duplex molecules during the course of thehybridization reactions.

Model development for ideal hybridization

In our initial analysis, it is assumed that only per-fectly complementary strands react irreversibly toform duplexes (i.e. kfp,q

i,j = 0 for i �= j and kbp,qi,j =

0∀i,j). The rate constant for the duplex forma-tion reaction is assumed to be the same for allspecies, i.e. kfp,q

i,i = constant = kf. Initial concen-trations for the single-stranded and duplex speciesat the beginning of the hybridization reactions aregiven in Table 1. Using these assumptions to sim-plify Equations 1 and 2, the mass balance equationsfor each step of the hybridization processes areformulated and listed in the Appendix. The con-centration of duplexes with two different adaptors,A and B, at the end of the hybridization processcan be expressed in terms of the initial concen-tration of the single-stranded species [MA

i ]0,1, therelative expression ratio κi, the hybridization rateconstant kf and the hybridization time (thyb1, thyb2)and driver excess (E1, E2) for the two reactions as:

[MAi MB

i ]|thyb2

= [MAi ]2

0,1kfthyb2κ3i

6(2κi + [MAi ]0,1 kf thyb1(E1 + κi))([MA

i ]20,1

×E2thyb1thyb2(E1 + κi) k2f + [MA

i ]0,1κi×(3(E1 + κi) thyb1 + 2(E1 + E2 + κi)

× thyb2) kf + 6 κ2i )

(3)The duplex concentration remains unchanged dur-ing the end-filling reaction. During the finalPCR reaction, species are amplified exponentially,amplified linearly, or not amplified depending onthe nature of the adaptors. After the PCR step, the

Table 1. Initial conditions for hybridization reactions

Species Hybridization 1A Hybridization 1B Hybridization 2

[MAi ] [MA

i ]0,1 = [MAi ]tester2 0

[MAi ]∗3

[MBi ] 0

[MAi ]tester2

[MAi ]∗3

[MDi ]

[MAi ]tester2 κi

E1[MA

i ]tester2 κi

E1

2[MD

i ]0,1

κiE2 + 2 [MD

i ]∗

3

[MAi MA

i ] 0 0[MA

i MAi ]∗

3

[MBi MB

i ] 0 0[MA

i MAi ]∗

3

[MDi MD

i ] 0 02 [MD

i MDi ]∗

3

[MAi MD

i ] 0 0[MA

i MDi ]∗

3

[MBi MD

i ] 0 0[MB

i MDi ]∗

3[MA

i MBi ] 0 0 0

∗ Concentration at the end of hybridization 1, calculated from equations A.6–A.10.

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 6: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

410 C. Gadgil et al.

total DNA corresponding to gene i available forligation is:

[Di] = [MAi MB

i ]|thyb2 × 2nPCR + ([MAi MD

i ]|thyb2

+ [MBi MD

i ]|thyb2) × nPCR + [MAi ]|thyb2

+ [MBi ]|thyb2 (4)

In a typical SSH, the number of PCR cycles (nPCR)is high, and hence 2nPCR � nPCR. The right-handside of Equation 4 is dominated by the first term,and can be expressed as:

[Di] = [MAi MB

i ]|thyb2 × 2nPCR (5)

After the PCR reaction, a subtracted cDNA libraryis constructed and Ncol colonies are picked forfurther analysis as putative differentially expressedgenes. The probability ps that species s is amongthose Ncol colonies depends on the fraction (fs) ofthe PCR product corresponding to species s. Theprobability that, of Ncol colonies, none correspondsto species s is (1 − fs)

Ncol . Hence, the probabilitythat at least one colony corresponds to s is:

ps = 1 − (1 − fs)Ncol (6)

where fs is the ratio of the DNA concentrationcorresponding to species s available for ligationto the total DNA concentration for all nT speciesavailable for ligation:

fs = DsnT∑

i=1

Di

(7)

Substituting Equation 5 in Equation 7, we get theexpression for the probability of identification ofspecies s for ideal hybridizations as:

fs =[MA

s MBs ]

∣∣thyb2

nT∑i=1

[MAi MB

i ]

∣∣∣∣thyb2

(8)

The mathematical framework outlined here pro-vides an analytical expression for the concentrationof the different duplex species that are formed afterthe two hybridization steps as a function of theinitial concentrations of the cDNA single strands

and the reaction conditions (Equation 3). Substi-tuting this expression in Equation 8, the probabilityof isolation of a particular differentially expressedgene can be obtained.

Non-specific hybridization of partiallycomplementary strands

To account for non-specific hybridizations in whichsingle strands that are not perfectly complemen-tary hybridize to form heteroduplexes of the typeMp

i Mqj (i �= j), we consider the case where there

exist two species, i and j, with varying degreesof sequence complementarity. Equations 1 and 2are solved simultaneously to simulate the system.If the number of species that have partially com-plementary sequences is Ns , it can be shown thatwe have to solve 4.5 Ns (Ns + 1) simultaneousordinary differential equations to fully simulate thesystem. Since this number scales as the square ofthe species involved, the problem quickly becomescomputationally intractable with just a few species.

Through numerical simulations of these modelequations, the concentration of duplex species thatare formed from the hybridization of one strandwith adaptor A with another strand with adaptorB ([MA

i MBj ]) can be simulated. An assumption is

made that such a heteroduplex is not dissociatedduring the end-filling step. The heteroduplexes willthen be amplified by the PCR process. The totalamount of DNA available for ligation after the PCRsteps, given for ideal hybridizations by Equation 4,can be rewritten for this situation as:

[Di] =

Ns∑j=1

[MAi MB

j ]

∣∣∣∣thyb2

+ [MAi MB

i ]

∣∣∣∣thyb2

× 2nPCR

+

Ns∑j=1

[MAi MD

j ]

∣∣∣∣thyb2

+Ns∑j=1

[MBi MD

j ]

∣∣∣∣thyb2

× nPCR + [MAi ]

∣∣thyb2

+ [MBi ]

∣∣thyb2

(9)

As 2nPCR � nPCR, the value of Di can be approxi-mated as:

[Di] =

Ns∑j=1

[MAi MB

j ]

∣∣∣∣thyb2

+ [MAi MB

i ]

∣∣∣∣thyb2

× 2nPCR (10)

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 7: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 411

This value can then be substituted in Equation 7to obtain:

fs =

Ns∑j=1

[MAi MB

j ]

∣∣∣∣thyb2

+ [MAi MB

i ]

∣∣∣∣thyb2

nT∑i=1

Ns∑j=1

[MAi MB

j ]

∣∣∣∣thyb2

+ [MAi MB

i ]

∣∣∣∣thyb2

(11)

From the numerically computed values of [MAi

MBj ]|thyb2 , Equation 11 can be used to calculate the

fraction of DNA corresponding to the species iand hence the probability of identification of atleast one colony containing the sequence can beestimated using Equation 6.

Simulation parameters

To use the developed equations for simulationof the SSH process, the reaction rate constantsand initial concentrations of Mi have to be deter-mined. The cDNA is digested with RsaI, lead-ing to single strands with an average molecu-lar mass of ∼150 kDa, corresponding to a lengthof 470 bp. The experimentally observed range ofstrand lengths is 200–2000 bp. Using a total cDNAconcentration in the tester of 2 µg cDNA and takingreagent dilution into account, the total cDNA con-centration is calculated to be 1 × 10−7 M. PolyA+mRNA in a typical mammalian cell is dividedinto three abundance classes: abundant, intermedi-ate and rare species (Hastie and Bishop, 1976). Thenumber of species in each class and their relativeabundance is shown in Table 2. This classificationis also computationally convenient as it enables theestimation of the average initial concentration of aspecies in a particular class from the total cDNAconcentration (Table 2).

Table 2. Abundance classes of mRNA in a typicalmammalian cell

mRNAClass

Number ofsequences

Abundance(copies/cell)

Conc. of eachin tester

Abundant 10 12 500 2.5 × 10−9 M

Intermediate 750 300 5.5 × 10−11 M

Rare 12 000 15 3 × 10−12 M

The concentrations of species Mi in the driverare the corresponding tester concentrations mul-tiplied by the excess ratio E and divided by thedifferential expression ratio κi. As equal volumesof tester and driver are mixed at the start ofthe hybridization process, the initial concentrationof tester and driver species in the hybridizationmixture is half that in each fraction. The secondorder rate constant (kf) was taken to be 1 × 106

M−1s−1 (Craig et al., 1971; Ermolaeva and Wag-ner, 1995). The renaturation rate constant for non-specific hybridization depends on the percentagesequence identity. Vernier and co-workers (Vernieret al., 1996) report that the values of the renatura-tion rate constant decreases to 98%, 80%, and 77%of the rate for renaturation of perfectly comple-mentary strands, respectively, for sequences shar-ing 94%, 83% and 77% sequence identity. Thesevalues are used for simulation of the associationrates for non-specific hybridization. Anderson andYoung (1985) report that the duplex dissociationrate increases by a factor of two for every 10% mis-match of the single-strand sequences. Based on thisdata and results on melting of chimeric duplexesreported elsewhere (Gotoh et al., 1995; Spiegel-man et al., 1973), the values of the dissociationconstant for 500 bp strands having partial homolo-gies of 94% and 77% have been estimated as 1×10−5 s−1 and 5 ×10−3 s−1, respectively, andused for simulating non-specific DNA hybridiza-tion.

The DNA hybridization process is never com-plete for finite hybridization times, and there is afinite probability of isolating a species that is notdifferentially expressed. The denominator of Equa-tion 8 is the sum of the concentrations of duplexesof the type MA

i MBi . Some of these duplexes corre-

spond to cDNA present in a higher concentrationin the tester and others represent cDNA presentin equal or lower concentration in the tester thanthe driver. The latter category of genes can leadto false-positive results. To assess the probabilityof obtaining false-positive results, we divide thetotal number of genes nT into genes that are dif-ferentially expressed (nA, nI and nR, correspondingto differentially expressed abundant, intermediateand rare species, respectively), and genes that arenot differentially expressed (n∗

A, n∗I and n∗

R). Theduplexes with different adaptors but from genesthat are not differentially expressed (MA

i∗MBj∗) give

rise to false-positive results. The fraction of the

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 8: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

412 C. Gadgil et al.

tester homoduplex with different adaptors A and Bon the two strands for a particular species s (Equa-tion 8) can be rewritten as:

fs = [MAs MB

s ]nT∑

i=1

[MAi MB

i ]

= [MAs MB

s ]

n∗R[MA

R∗MBR∗] + n∗

I [MAI∗MB

I∗] + n∗A[MA

A∗MBA∗]

+nR[MARMB

R] + nI[MAI MB

I ] + nA[MAAMB

A]

= [MAs MB

s ]

baseline + nR[MARMB

R] + nI[MAI MB

I ]+na[MA

a MBa ]

;

baseline = n∗R[MA

R∗MBR∗] + n∗

I [MAI∗MB

I∗]

+ n∗A[MA

A∗MBA∗] (12)

The value baseline is the total concentration oftester homoduplex with different adaptors A andB on the two strands from genes that are not dif-ferentially regulated. These sequences lead to theformation of false-positives. The value of the base-line depends on the number of genes whose expres-sion levels in the tester and driver are the same.A survey of the literature reveals a large varia-tion in the number of differentially expressed genesamong samples from different sources. Expressionanalysis of 8740 rat genes using high-density DNAarray technology revealed that 873 genes exhibitstatistically significant differences in gene expres-sion levels during nephrogenesis (Stuart et al.,2001). An analysis of publicly available microarraydata (http://ep.ebi.ac.uk/EP/EPCLUST/) showsthat approximately 20% of genes selected formicroarray construction to investigate differencesin gene expression between normal and cancer cellswere differentially expressed. In closely related celltypes (B and T lymphocytes), 2% of the genes werefound to be differentially expressed using a sub-tractive hybridization approach (Sagerstrom et al.,1997). In a survey of the whole genome usingSAGE, approximately 1.5% of expressed geneswere found to be unequally expressed in normaland cancer cells (Zhang et al., 1997).

Like SAGE, SSH is an open system where allexpressed transcripts are probed to isolate differ-entially expressed genes. We are interested in thestudy of cells in different metabolic states and

spheroid formation in hepatocytes, i.e. probing dif-ferences in closely related cell types. Hence weassume that ∼1.5% of genes will be differentiallyexpressed and estimate the number of genes thatare not differentially expressed. We have assumedthat 11 800 rare, 740 intermediate and 10 abun-dant genes are present in equal concentrations inthe tester and driver samples, i.e. nR∗ = 11800,nI∗ = 740 and nA∗ = 10. These values are used tocalculate the baseline concentration.

All symbolic calculations of partial derivativeswere carried out using Mathematica 4.0 (WolframResearch, Champaign, IL). All numerical calcula-tions were carried out using Matlab 5.3.1 (TheMathWorks, Natick, MA).

Results

Equation 3 is used to estimate the concentration oftester homoduplex with different adaptors on thetwo strands MA

i MBi relative to the total concentra-

tion of false-positives, and to investigate the effectof a change in the reaction conditions. The base-line value varies with the conditions used for SSH.For easy comparison among different conditions,the concentrations MA

i MBi are normalized to the

baseline and denoted as �i. A large value of �i(�i � 1) indicates a high likelihood of isolatinga clone corresponding to species i . Conversely, alower �i represents a high likelihood that a largenumber of false-positive clones will be obtainedbefore species i is isolated. The results shown arefor strands of various initial concentrations cor-responding to rare, intermediate-abundance, andabundant mRNA. Results are shown for levels ofdifferential regulation corresponding to κi = 1, 10,100 and 1000. The initial concentration in thedriver sample is calculated by dividing the ini-tial concentration in the tester sample by κi. Asthe number of sequences that are not differentiallyexpressed does not change in the reverse subtrac-tion process, the results for an abundant specieswith κi = 1000 correspond to both an abundantsequence that is downregulated 1000-fold, and arare sequence that is upregulated 1000-fold.

Ideal hybridization

Figure 2 shows the normalized concentration (�i)of MA

i MBi duplexes at the end of the second

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 9: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 413

κ = 1 κ = 10 κ = 100 κ = 10000

log(

ψ)

-7

-1

-2

-3

-4

-5

-6

Figure 2. Ratio of concentration duplex with differentadaptors A and B on each strands to concentration ofbaseline (ψ) for sequences belonging to the abundant( ), intermediate-abundance ( ) and rare ( ) genes asa function of the relative differential expression ratio (κ).Simulations are carried out for thyb1 = 8h, thyb2 = 12h, andE1 = E2 = 30

hybridization as a function of the relative differen-tial expression (κi) for values of the reaction param-eters recommended by the Clontech PCR-Select

protocol (E = 30, thyb1 = 8 h, thyb2 = 12h) (Clon-tech Manual, 1999). It is seen from the figure thatthe value of �i for differentially expressed speciesis less than 1 (i.e. <0 in the log scale, as shownin Figure 2) for all abundance classes. In otherwords the probability of obtaining a false-positiveclone is higher than that of obtaining a particu-lar true positive. As is seen from the graph, thevalue of �i, and hence the probability that a par-ticular species i is identified by the SSH procedure,depends greatly on κi and the initial concentration.For example, for abundant species that are not dif-ferentially expressed (κi = 1), the value of �i ismuch lower than that of rare species with κi = 1.Thus, in the absence of non-specific hybridization,abundant species are efficiently eliminated by theSSH process under recommended process condi-tions. This leads to the conclusion that under theseconditions, the bulk of the false-positive sequencesresulting from the SSH process will consist ofrare sequences that are not differentially expressed.However, for abundant sequences that are down-regulated by a large extent (κi = 1000), the valueof �i is close to one, and higher than the corre-sponding value for intermediate and rare sequences.

This implies that the SSH process is biased towardssequences that are differentially expressed to alarge extent. However, there is poor efficiency ofidentifying rare sequences that are downregulatedeven further. Even an on–off regulation of raresequences (as approximated by the bar correspond-ing to rare sequences with κi = 1000) does notlead to an improvement in the efficiency. For mod-erate differential expression levels (κi = 10–100),the SSH process is most successful in identifyingrare sequences that are upregulated 100-fold, orintermediate abundance sequences that are down-regulated 100-fold.

The effect of hybridization times and excessratios on the relative concentrations of the MA

i MBi

duplex was evaluated by varying the value of onereaction parameter while keeping all the others con-stant. The normalized partial derivative of �i withrespect to the parameter being changed, Zparameter =(parameter)

�i

∂�i∂(parameter) , describes how the prob-

ability of isolating a true differentially expressedtranscript varies with a change in the parameter. Avalue of Z = +1 implies a 100% increase in therelative concentration of MA

i MBi to baseline due to

a 100% increase in the parameter value. Shown inFigure 3 are plots of such partial derivatives withrespect to the excess ratios (E1 and E2) and reac-tion times (thyb1 and thyb2) for the first and secondhybridization as a function of κi. Figure 3a showsthat, except for transcripts with a low (κi < 20)level of differential expression, ZE1 is positive, andtherefore increasing E1 is beneficial as it results inan increase in the �i corresponding to differen-tially expressed species and leads to fewer false-positives.

The effect of changing E2 on the relative MAi MB

iconcentration is shown in Figure 3b. For raretranscripts, ZE2 > 0 for all values of κi, and thevalue of �i increases with an increase in the excessratio for hybridization 2. However, for abundanttranscripts, the trend is opposite, i.e. �i decreasesas the relative concentration as E2 is increased froma value of 30. For intermediate-abundance species,the effect of changing E2 depends on the degree towhich they are differentially expressed (κi). Thereis a beneficial effect for highly upregulated species(κi > 30) but a negative effect on species with alower differential expression ratio. The magnitudeof this increase is not high (<0.5), showing thatthe number of false-positives is not very sensitive

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 10: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

414 C. Gadgil et al.

1.5

-1

-0.5

0

ZE

1

0 200 400 600κ

800 1000

0.5

1

1.5

-1

-0.5

0

Zt h

yb1

0 200 400 600κ

800 1000

0.5

1

1

-0.5

0

Zt h

yb2

0 200 400 600κ

800 1000

0.5

1

-1

-0.5

0ZE

2

0 200 400 600κ

800 1000

0.5

(a)

(c)

(b)

(d)

Figure 3. Normalized partial derivatives (Z) of the duplex : baseline ratio with respect to process parameters processconditions thyb1, thyb2, E1 and E2 plotted as a function of the differential expression ratio (κ). The three lines in each plotrepresent abundant ( ) intermediate abundance ( . . . . ), and rare (- - - - ) species

to changes in E2 from the value of 30. Thus,doubling the excess ratio will increase the relativeconcentration of rare transcripts by a maximumof 50% from their original level. Increasing theratio beyond this value is impractical in situationswith a constraint on the amount of cDNA availablefor analysis.

Figures 3c and 3d are plots of the normalizedpartial derivatives with respect to thyb1 and thyb2.Increasing thyb1 leads to an improvement in �ifor rare species and a decrease in �i for differ-entially expressed abundant species. For interme-diate abundance species with a low κi (<50), theeffect of increasing thyb1 is to decrease �i, but forhighly upregulated intermediate abundance species,there is a small increase in �i with increase inthyb1. A similar effect is seen from the plot of thenormalized partial derivative with respect to thyb2(Figure 3d), except that the magnitude is muchsmaller, indicating that the relative concentration(�i) is less sensitive to changes in thyb2 from itsvalue of 12h.

Significantly larger excess ratios are used inother subtractive hybridization based processessuch as cDNA-representational difference analy-sis (cDNA-RDA; Hubank and Schatz, 1994) andnormalized library construction (Ko, 1990; Patan-jali, 1991; Sasaki et al., 1994). Simulation resultsfor SSH with hybridization times and excessratios similar to those used in these proceduresare presented in Figure 4. The effect of signifi-cant decreases in hybridization time and excessratios are also presented. The beneficial effect ofincreasing the excess ratios on �i for differentiallyexpressed species irrespective of abundance levelsis clear from the increase observed, even at a lowhybridization time of 15 min. On the other hand,increasing thyb1 and thyb2 alone does not yield a sig-nificant improvement in �i. The graphs illustratethe beneficial effect of decreasing thyb1 and thyb2for better identification of differentially expressedabundant species, and increasing thyb1 and thyb2for better detection of differentially expressed rarespecies. Similar calculations for other combinations

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 11: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 415

0

0

thyb1 = thyb2 = 15 min

E1

= E

2 =

1E

1 =

E2

= 30

E1

= E

2 =

300

log(

ψi)

log(

ψi)

log(

ψi)

log(

ψi)

log(

ψi)

log(

ψi)

log(

ψi)

log(

ψi)

log(

ψi)

thyb1 = thyb2 = 12 h thyb1 = thyb2 = 24 h

log(κ) log(κ) log(κ)

321 0 321 0 321

0 321 0 321 0 321

0 321 0 321 0 321

-5

-1

-2

-3

-4

0

-1

-2

-3

-4

0

-1

-2

-3

-4

-5

0

0

-2

-4

-1

-2

-3

-4

0

0

-2

-4

-6

-2

-6

-4

0

0

-2

-4

-6

-2

-4

-6

Figure 4. Effect of large changes in excess ratios and hybridization times on ψ for abundant ( ), intermediate-abundance( ), and rare ( ) genes as a function of the relative differential expression ratio (κ)

of reaction conditions can be easily carried outusing this model.

Effect of non-specific hybridization

The effect of non-specific hybridization is exploredusing this model. The analysis presented for idealhybridization can be repeated for conditions wherethe presence of partially complementary sequencesleads to the formation of chimeric duplexes (het-eroduplexes). These sequences may each belong todifferent abundance classes and have different κvalues. Here we present a detailed analysis of onepossible situation where both non-specifically inter-acting sequences Mα and Mβ are not differentiallyexpressed (κα = κα = 1) and hence contribute tothe false-positives obtained in the SSH process.Thus, the objective of changing process conditionshere is to minimize �α and �β , or at least toreduce those to values corresponding to the false-positives for ideal hybridizations (�i for sequenceswith κi = 1). In the following discussion, subscripts

α and β are used to represent sequences with partialsequence similarity.

Figure 5a shows the effect of the presence ofan abundant species, Mα , with 94% sequencehomology to another abundant sequence, Mβ , onthe false-positives that are obtained. The false-positives that are obtained in the absence of non-specific hybridization are also shown for compari-son. As seen in Figure 5a, in the presence of non-specific hybridization, the concentration of abun-dant sequences that are not differentially expressed(κα = 1) is much higher than that of rare sequenceswith κα = 1. This is the exact opposite of thecase where there is no non-specific interaction,where the concentration of false-positive sequencesthat are abundant is much lower than that of raresequences. It is also seen that in the presenceof non-specific hybridization, �α for an abundantsequence with κα = 1 is more than 100 timesgreater than the �i value for any rare species withκi = 1 in the absence of non-specific interactions.

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 12: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

416 C. Gadgil et al.

-6Present

Nonspecific Interactions

log(

ψα)

Absent

-5

-4

-3

-2

-1

0(a)

0 10 100Spike Ratio

log(

ψα)

1000

-5

-4

-3

-2

-1

0(b)

0 10 100

Spike Ratio

log(

ψα)

1000

-4

-3

-2

-1

0(c)

0 10 100

Spike Ratio

log(

ψα)

= lo

g(ψ

β)

1000

-5

-4

-3

-2

-1

0(d)

Figure 5. Effect of non-specific interactions on SSH performance. (a) Effect on the nature of false-positives of the presenceand absence of non-specific hybridizations. (b) Effect of spiking one of the interacting species α on �α . (c) Effect of spikingone of the interacting species β on the other species �α . (d) Effect of spiking both interacting species α and β on �α (or�β). Figures show ψ for abundant ( ), intermediate-abundance ( ) and rare ( ) genes. Simulations are carried out forthyb1 = 8h, thyb2 = 12h, and E1 = E2 = 30

The simulation results presented in Figure 5ashow that the false-positive pool obtained will bedominated by such an abundant sequence withκα = 1 and partial homology to another abundantsequence. It should be noted that although resultsshown in this figure are for strands correspondingto the sequence Mα , both the sequences Mα andMβ that are partially complementary are affectedthe same way during the SSH process if they areboth present in the same concentrations in the testerand driver. Hence, any analysis presented here forthe sequence Mα is equally applicable to Mβ andboth sequences will occur to the same extent in thefalse-positive pool.

Non-specific hybridization results in false-positive sequences and decreased probabilityof isolating genes that are truly differentiallyexpressed. This masking effect of partiallycomplementary, abundant sequences can bereduced by increasing the subtraction efficiency

through addition of the specific sequence Mα tothe driver. The increase in the concentration of thedriver strands for that sequence increases the rateof formation of tester–driver duplexes. The effectof such spiking is shown in Figure 5b. The figuredepicts �α values that are obtained when the driveris spiked with increasing amounts (no spiking, 10,100 and 1000-fold spiking) of the sequence Mα . Itis seen that there is a beneficial effect of spikingMα . At a level of a 1000-fold spiking, �α decreasesto a level comparable to that in the absence of non-specific interaction.

As indicated above, there is a symmetrical rela-tionship between species Mα and Mβ allowing usto interpret these results for the sequence Mβ aswell. Thus, when Mβ is spiked, �β will decrease.However, the effect of spiking one sequence (sayMα) on false-positives arising from another non-specifically interacting sequence (Mβ) may notalways be positive. Such effects are simulated using

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 13: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 417

the model and results are presented in Figure 5c,which shows the effects of spiking Mβ on �α. Thevalues of �α for species of different abundance lev-els and κα = 1 either remains the same or increaseswith increasing spiking of Mβ . Thus, the simulationresults show that there is a detrimental effect onfalse-positives arising from Mα when Mβ is spikedat increasing levels. With an increase in the amountof Mβ that is spiked, the concentration of Mα false-positives increases slightly.

The obvious solution to this problem is spikingboth Mα and Mβ in the driver. The effect of such aspiking is depicted in Figure 5d. It is seen that thevalue of �α decreases when increasing amounts ofMα and Mβ are spiked. The decrease is less than thedecrease seen when only Mα is spiked to the sameextent. This will also be true for �β , and the false-positives arising from both Mα and Mβ will bereduced by this spiking strategy. Although spikingboth sequences clearly achieves a beneficial effectby decreasing the number of false-positives withincreased spiking levels, the effect is attenuatedcompared to that observed in Figure 5b and anyincrease in concentration of spike must be morethan 1000-fold to achieve a reduction in the numberof false-positives to a level comparable to that inthe absence of any non-specific interactions.

Discussion

The probability that a differentially expressedspecies s will be correctly identified as such bysuppression subtractive hybridization depends onfs , the fraction of the MA

i MBi duplexes correspond-

ing to this species to all such duplexes present at theend of the second hybridization. We have derivedan analytical expression to calculate this concentra-tion. The concentration, and hence the probabilityof identification, depends on two categories of fac-tors: system factors, such as abundance, degree ofdifferential expression (κs) of the gene, the concen-tration and percentage similarity of non-specificallyinteracting sequences, and abundance and numberof genes that are not differentially expressed in thetwo samples; and reaction conditions, such as thedriver excess (E1 and E2) used in each hybridiza-tion, and the hybridization times thyb1 and thyb2.

We have used previously reported results onthe number of differentially expressed species toestimate the concentration of DNA that will lead to

false-positives. The results for a different numberof genes with unchanged expression level can beeasily computed. The nature of the results presentedhere does not change appreciably if a different(lower) number of species is assumed to be presentin equal concentrations in the tester and driver(results not shown). We have used reported valuesof the hybridization rate constants to carry out thesimulations. Simulations using values that differby one order of magnitude from this value showthat the results are qualitatively identical to thosepresented here.

We used the model to predict the effect ofchanges in the reaction conditions on the probabil-ity of identification of genes from different abun-dance classes that are differentially expressed atvarious levels. The model predicts that, for the pro-cess conditions typically used, the SSH methodwill lead to the identification of some differen-tially expressed genes, but will also yield a highnumber of false-positives. A number of studieshave reported false-positive results as high as 90%(Nemeth et al., 2000a, 2000b).

The probability of identification is also a func-tion of transcript length. Each cDNA is digestedwith RsaI, which results in multiple fragments ofthe same gene being present. As these fragmentsundergo independent hybridization reactions, theprobability of identification of a particular gene isapproximately proportional to the number of frag-ments, and hence to transcript length. Conversely,a long gene that is not differentially expressed willhave a higher contribution to the number of false-positives than is estimated by the model. A num-ber of studies have reported that all the isolatedsequences at the end of the SSH process are unique(Glienke et al., 2000; Grillari et al., 2000; Sandhuet al., 2000; Shen and Gudas, 2000; Wang et al.,1999). This study suggests that if there is a highefficiency of subtraction, redundant clones will beobtained, either in the form of multiple colonieshaving the same insert, or colonies with differentfragments of the same gene.

From the analytical solution to the coupled set ofdifferential equations, it is possible to calculate thenormalized partial derivative of the relative concen-tration of the MA

i MBi duplex with respect to process

parameters. To increase the probability of isolat-ing a differentially expressed gene i , the processparameter values should be optimized to increase�i. The value of the partial derivative provides a

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 14: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

418 C. Gadgil et al.

quantitative estimate of the effect of a proposedchange in one process parameter. It should be notedthat the value of the partial derivative, by definition,is true only in a small interval near the values ofthe parameters at which it is estimated. Using theanalytical expression presented here, such a cal-culation is easy to implement for values differentfrom those presented here.

In addition to determining the extent of the false-positives that result from the SSH process, thenature of the false-positives can be explored usingour model. In the absence of non-specific inter-actions, �i for abundant species with κi = 1 isless than the corresponding concentration for rarespecies. The basis of this counterintuitive observa-tion lies in the slower kinetics of duplex formationfor residual rare single-stranded species with κi = 1compared to the corresponding abundant sequencesand does appear in the transient concentration pro-files (data not shown).

Effect of non-specific hybridization

In the cDNA mixture used for the SSH pro-cess, sequences corresponding to proteins with con-served motifs exhibit partial complementarity toeach other. Chimeric duplexes formed betweenstrands having partial sequence homology havebeen observed at the end of the SSH process, insome cases at levels as high as 2% of all duplexes(Zhang et al., 2000). Such non-specifically inter-acting sequences include those that are differen-tially expressed, and those that are present at equalconcentrations in the tester and driver. In our sim-ulation we specifically address the issue of false-positives caused by the latter type of sequences.Simulation results predict that if there exist twoabundant sequences with at least 94% homology,the false-positive pool will be dominated by thesesequences. Experimental results from an investi-gation (Korke et al., unpublished results) of theexpressed species in a hybridoma cell culture revealthat the predominant false-positive obtained dur-ing a SSH belonged to a class of moleculescalled intracisternal A particles, reiterated murineretrovirus-like elements (Dupressoir et al., 1999)with high intrasequence similarity (Leib-Moschet al., 1992; Rynditch et al., 1998). Microarray andNorthern blot analysis reveals that these sequencesare equally expressed in the hybridoma samplesunder consideration in the SSH study (Korke

et al., unpublished results). The observation that alarge fraction of the false-positive pool consists ofIAP sequences supports the simulation results pre-sented here.

The addition of specific sequences to the driverin order to reduce the concentration of a particularsequence in the final products has been reported inother subtractive hybridization approaches such ascDNA RDA (Hubank and Schatz, 1994). There areseveral possibilities in the choice of the sequencethat is to be added or ‘spiked’ to the driver. Either,or both partially complementary sequences, or aconsensus sequence, or a concatenated sequencemay be used. We used the mathematical model tosimulate the efficacy of these approaches in reduc-ing the concentration of the target sequence in thefinal product mix. The effect of spiking any oneof the two sequences has a detrimental effect onfalse-positives resulting from the sequence that isnot spiked. However, the concentration of duplexescorresponding to the spiked sequence decreasesappreciably. Spiking a consensus sequence is akinto spiking an interacting sequence, and hence theeffect on both sequences will be detrimental. Spik-ing a sequence that is a concatenation of the twosequences will have an effect that depends on thestability of the duplex containing a large danglingend that will be formed. If this duplex is as stable asa perfectly complementary duplex, the effect willbe beneficial. Otherwise, as is thermodynamicallymore likely, if the duplex thus formed has a sig-nificant melting rate, the effect will be analogousto adding a partially complementary sequence. Thebest result is obtained when both sequences arespiked. Higher spiking levels (>1000-fold) haveto be used, but a reduction in the levels of bothsequences to levels representative of false-positivesin the absence of non-specific interactions canbe achieved.

Simulation of non-specific interactions has beencarried out assuming a homology of 94% betweenthe two sequences. Simulations were also carriedout assuming a homology of 77%, for which ratedata was available. The rate of the forward reactionis one-hundredth that of perfectly complementarysequences, and the melting rate is high. The false-positive rate obtained is the same as in the caseof ideal hybridization (results not shown). Thus,non-specific hybridization between sequences withpartial sequence homology of less than 80% doesnot affect the SSH performance.

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 15: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 419

In an actual experiment, there might be morethan two sequences with a high homology. Fora particular sequence α, the key step that deter-mines �α is the formation and slow dissociationof heteroduplexes formed from MA

α (or MBα ) and

a driver strand of the interacting sequence MDβ ,

as is observed from an examination of the tem-poral kinetics during the hybridization processes(results not shown). The exact composition of theinteracting sequence is not important, and there-fore we contend that the simulations for a one-interactor case may be taken as representative ofa situation where multiple non-specifically inter-acting sequences are present. The effect of non-specific interactions on species that are differen-tially expressed and/or under different process con-ditions can be easily simulated using the modelpresented here. It is seen that for extreme processconditions (thyb = 48 h, E = 300), the presence ofan abundant non-differentially expressed sequencewith 94% sequence complementarity increases �α

values for differentially expressed rare and inter-mediate abundance species. However, the con-centration of false-positives also increases (resultsnot shown).

It is seen that there is a differential effect, bothqualitative and quantitative, of changing processconditions on the probability of isolating differ-entially expressed transcripts depending on theirabundance, degree of differential expression andthe presence of non-specific hybridization. Forsome transcripts, �i increases with a positivechange in the process parameter and for some therelative concentration decreases. Thus, the modelaids in probing the effect of a proposed changeon the transcript class of interest. This mathemat-ical model will serve as a tool to carry out virtualSSH experiments to determine the best conditionsfor the particular sample under consideration. Theframework presented here can also be used for theanalysis of the efficiency of other procedures forthe isolation of differentially expressed genes.

Acknowledgements

This work was supported in part by grants from the NationalScience Foundation (BES-97272), National Aeronauticsand Space Administration (#NAG8-134), and the AcademicHealth Center, University of Minnesota. We wish to thankthe Minnesota Supercomputing Institute for use of comput-ing resources.

References

Alizadeh AA, Staudt LM. 2000. Genomic-scale gene expressionprofiling of normal and malignant immune cells. Curr OpinImmunol 12: 219–225.

Anderson MLM, Young BD. 1985. Quantitative filter hybridiza-tion. In Nucleic Acid Hybridisation: a Practical Approach,BD Hames, SJ Higgins (eds). IRL Press: Oxford and Washing-ton, DC; 73–111.

Bittner M, Meltzer P, Chen Y. et al. 2000. Molecular classificationof cutaneous malignant melanoma by gene expression profiling.Nature 406: 536–540.

Brenner S, Johnson M, Bridgham J. et al. 2000. Gene expressionanalysis by massively parallel signature sequencing (MPSS) onmicrobead arrays. Nature Biotechnol 18: 630–634.

Brown PO, Botstein D. 1999. Exploring the new world of thegenome with DNA microarrays. Nature Genet 21: 33–37.

Cho TJ, Park SS. 1998. A simulation of subtractive hybridization.Nucleic Acids Res 26: 1440–1448.

Clontech Manual. 1999. Clontech PCR-Select cDNA SubtractionKit User Manual. Clontech Laboratories: Palo Alto, CA.

Craig ME, Crothers DM, Doty P. 1971. Relaxation kinetics ofdimer formation by self-complementary oligonucleotides. J MolBiol 62: 383–401.

DeRisi JL, Iyer VR, Brown PO. 1997. Exploring the metabolicand genetic control of gene expression on a genomic scale.Science 28: 681–685.

Diatchenko L, Lau YF, Campbell AP. et al. 1996. Suppressionsubtractive hybridization: a method for generating differentiallyregulated or tissue-specific cDNA probes and libraries. Proc NatlAcad Sci USA 93: 6025–6030.

Dupressoir A, Barbot W, Loireau MP, Heidmann T. 1999.Characterization of a mammalian gene related to the yeast ccr4general transcription factor and revealed by transposon insertion.J Biol Chem 274: 31068–31075.

Ermolaeva OD, Lukyanov SA, Sverdlov ED. 1996. The mathe-matical model of subtractive hybridization and its practicalapplication. Proc Int Conf Intell Syst Mol Biol 4: 52–58.

Ermolaeva OD, Wagner MC. 1995. SUBTRACT: a computerprogram for modeling the process of subtractive hybridization.Comput Appl Biosci 11: 457–462.

Evans JD, Wheeler DE. 1999. Differential gene expressionbetween developing queens and workers in the honey bee. Apismellifera. Proc Natl Acad Sci USA 96: 5575–5580.

Glienke J, Schmitt AO, Pilarsky C. et al. 2000. Differential geneexpression by endothelial cells in distinct angiogenic states. EurJ Biochem 267: 2820–2830.

Gotoh M, Hasegawa Y, Shinohara Y, Shimizu M, Tosu M. 1995.A new approach to determine the effect of mismatches on kineticparameters in DNA hybridization using an optical biosensor.DNA Res 2: 285–293.

Grillari J, Hohenwarter O, Grabherr RM, Katinger H. 2000.Subtractive hybridization of mRNA from early passage andsenescent endothelial cells. Exp Gerontol 35: 187–197.

Groenink M, Aad CJ. 1996. Isolation of delayed early genesassociated with liver regeneration using the Clontech PCR-Select subtraction technique. CLONTECHniques XI: 7–8.

Gurskaya NG, Diatchenko L, Chenchik A. et al. 1996. EqualizingcDNA subtraction based on selective suppression of polymerasechain reaction: cloning of Jurkat cell transcripts induced by

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 16: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

420 C. Gadgil et al.

phytohemagglutinin and phorbol 12-myristate 13-acetate. AnalBiochem 240: 90–97.

Hastie ND, Bishop JO. 1976. The expression of three abundanceclasses of messenger RNA in mouse tissues. Cell 9: 761–774.

Hill AA, Hunter CP, Tsung BT, Tucker-Kellogg G, Brown EL.2000. Genomic analysis of gene expression in C. elegans .Science 290: 809–812.

Hipfel R, Schittek B, Bodingbauer Y, Garbe C. 2000. Specificallyregulated genes in malignant melanoma tissues identified bysubtractive hybridization. Br J Cancer 82: 1149–1157.

Hubank M, Schatz DG. 1994. Identifying differences in mRNAexpression by representational difference analysis of cDNA.Nucleic Acids Res 22: 5640–5648.

Ko MS. 1990. An ‘equalized cDNA library’ by the reassociationof short double-stranded cDNAs. Nucleic Acids Res 18:5705–5711.

Lee CK, Klopp RG, Weindruch R, Prolla TA. 1999. Geneexpression profile of ageing and its retardation by caloricrestriction. Science 285: 1390–1393.

Leib-Mosch C, Bachmann M, Brack-Werner R. et al. 1992.Expression and biological significance of human endogenousretroviral sequences. Leukemia 6: 72S–75S.

Liang P, Pardee AB. 1992. Differential display of eukaryoticmessenger RNA by means of the polymerase chain reaction.Science 257: 967–971.

Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. 1999. Highdensity synthetic oligonucleotide arrays. Nature Genet 21:20–24.

Martin KJ, Pardee AB. 2000. Identifying expressed genes. ProcNatl Acad Sci USA 97: 3789–3791.

Milner JJ, Cecchini E, Dominy PJ. 1995. A kinetic model forsubtractive hybridization. Nucleic Acids Res 23: 176–187.

Nemeth E, Millar LK, Bryant-Greenwood G. 2000a. Fetal mem-brane distention — II. Differentially expressed genes regulatedby acute distention in vitro. Am J Obstet Gynecol 182: 60–67.

Nemeth E, Tashima LS, Yu ZX, Bryant-Greenwood GD. 2000b.Fetal membrane distention — I. Differentially expressed genesregulated by acute distention in amniotic epithelial (Wish) cells.Am J Obstet Gynecol 182: 50–59.

Oh MK, Liao JC. 2000. DNA microarray detection of metabolicresponses to protein overproduction in Escherichia coli . MetabEng 2: 201–209.

Pardinas JR, Combates NJ, Prouty SM, Stenn KS, Parimoo S.1998. Differential subtraction display: a unified approach forisolation of cDNAs from differentially expressed genes. AnalBiochem 257: 161–168.

Patanjali SR, Parimoo S, Weissman SM. 1991. Construction of auniform-abundance (normalized) cDNA library. Proc Natl AcadSci USA 88: 1943–1947.

Rynditch AV, Zoubak S, Tsyba L, Tryapitsina-Guley N, Ber-nardi G. 1998. The regional integration of retroviral sequencesinto the mosaic genomes of mammals. Gene 222: 1–16.

Sagerstrom CG, Sun BI, Sive HL. 1997. Subtractive cloning: past,present, and future. Ann Rev Biochem 66: 751–783.

Sandhu H, Dehnen W, Roller M, Abel J, Unfried K. 2000. mRNAexpression patterns in different stages of asbestos-inducedcarcinogenesis in rats. Carcinogenesis 21: 1023–1029.

Sasaki YF, Ayusawa D, Oishi M. 1994. Construction of anormalized cDNA library by introduction of a semi-solidmRNA–cDNA hybridization system. Nucleic Acids Res 22:987–992.

Shen J, Gudas LJ. 2000. Molecular cloning of a novel retinoicacid-responsive gene, Ha1r-62, which is also upregulated inHoxA-1-overexpressing cells. Cell Growth Diff 11: 11–17.

Simpson KS, Adams MH, Behrendt-Adam CY, Ben Baker C,McDowell KJ. 1999. Identification and initial characterizationof calcyclin and phospholipase a(2) in equine conceptuses. MolReprod Dev 53: 179–187.

Spiegelman GB, Haber JE, Halvorson HO. 1973. Kinetics ofribonucleic acid-deoxyribonucleic acid membrane filter hybridi-zation. Biochemistry 12: 1234–1242.

Stuart RO, Bush KT, Nigam SK. 2001. Changes in global geneexpression patterns during development and maturation of therat kidney. Proc Natl Acad Sci USA 98: 5649–5654.

Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. 1995. Serialanalysis of gene expression. Science 270: 484–487.

Vernier P, Mastrippolito R, Helin C. et al. 1996. Radioimagerquantification of oligonucleotide hybridization with DNAimmobilized on transfer membrane: application to theidentification of related sequences. Anal Biochem 235: 11–19.

Wang XK, Li X, Yaish-Ohad S. et al. 1999. Molecular cloning andexpression of the rat monocyte chemotactic protein-3 gene: apossible role in stroke. Mol Brain Res 71: 304–312.

Wetmur JG. 1976. Hybridization and renaturation kinetics ofnucleic acids. Ann Rev Biophys Bioeng 5: 337–361.

Wetmur JG, Davidson N. 1968. Kinetics of renaturation of DNA.J Mol Biol 31: 349–370.

White KP, Rifkin SA, Hurban P, Hogness DS. 1999. Microarrayanalysis of Drosophila development during metamorphosis.Science 286: 2179–2184.

Yang GP, Ross DT, Kuang WW, Brown PO, Weigel RJ. 1999.Combining SSH and cDNA microarrays for rapid identificationof differentially expressed genes. Nucleic Acids Res 27:1517–1523.

Zhang J, Underwood LE, D’Ercole AJ. 2000. Formation ofchimeric cDNAs during suppression subtractive hybridizationand subsequent polymerase chain reaction. Anal Biochem 282:259–262.

Zhang L, Zhou W, Velculescu VE. et al. 1997. Gene expressionprofiles in normal and cancer cells. Science 276: 1268–1272.

Notation

baseline Total concentration of tester homod-uplex with different adaptors A andB on the two strands correspondingto sequences that are not differentiallyexpressed

Di Total DNA for sequence i available forligation

E1 Fold-excess driver added for the firstset of hybridizations

E2 Fold-excess driver added for the sec-ond hybridization

kbp,qi,j Rate constant of melting of the duplex

Mpi Mq

j (s−1)

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 17: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Model for suppression subtractive hybridization 421

kfp,qi,j Rate constant of duplex formation

when Mpi and Mq

i form (Mpi Mq

j )

l/(moles s)Mp

i Concentration of single-strandedcDNA species i labelled with adaptorp (moles of strands/l)

Mpi Mq

j Concentration of duplex formed by thehybridization of Mp

i and Mqj . (moles/l)

nPCR Number of PCR cyclesnT Total number of speciesps Probability of identification of a partic-

ular sequence s as being differentiallyexpressed

t Time (s)thyb1 Time that the first hybridization is

allowed to proceedthyb2 Time that the second hybridization is

allowed to proceedZparameter Normalized partial derivative of �

with respect to the parameterκi Differential expression ratio (ratio of

concentrations of a particular species iin tester and driver)

�i Duplex : baseline ratio[MA

i MBi ]

[baseline]

Subscripts

i and j Species i and j, respectively; i, j =1, . . . , nT

α and β Species α and β that have partialsequence homology

0,1 Concentrations at the start of the firsthybridization

0,2 Concentrations at the start of the sec-ond hybridization

Superscripts

A Tester species with adaptor AB Tester species with adaptor BD Species present in driver (no adaptor)* Sequences with κi = 1

Appendix: ideal hybridization model

The balance equations for species i during hybridi-zation 1A are:

d[MAi ]

dt= −kf[M

Ai ]2 − kf[M

Ai ][MD

i ] (A.1)

d[MDi ]

dt= −kf[M

Di ]2 − kf[M

Ai ][MD

i ]

(A.2)

d([MAi MA

i ]

dt= kf[M

Ai ]2

2(A.3)

d([MDi MD

i ])

dt= kf[M

Di ]2

2(A.4)

d([MAi MD

i ])

dt= kf[M

Ai ][MD

i ] (A.5)

At the start of this hybridization (t = 0), all speciesare in the form of single strands. The initialconditions are listed in Table 1. A key factoraffecting the outcome of SSH is the extent ofdifferential expression between the tester and driversamples. We define the differential expression ratioκi as the ratio of the mRNA concentration ofspecies i in the tester to that in the driver. Theexcess ratio E is defined as the ratio of theconcentration of total cDNA present in the driverto the total cDNA concentration in the tester.These equations can be solved for [MA

i ] and [MDi ]

to yield the concentrations of single-stranded tester(A.6), driver (A.7), and duplex species (A.8–A.10).

[MAi ] = [MA

i ]0,1

([MAi ]0,1 + [MD

i ]0,1)kf t + 1;

[MDi ] = [MD

i ]0,1

([MAi ]0,1 + [MD

i ]0,1)kf t + 1(A.6,7)

[MAi MA

i ] = {[MAi ]0,1}2kf t

([MAi ]0,1 + [MD

i ]0,1)kf t + 1(A.8)

[MAi MD

i ] = [MAi ]0,1[MD

i ]0,1kf t

([MAi ]0,1 + [MD

i ]0,1)kf t + 1;

[MDi MD

i ] = {[MDi ]0,1}2kf t

([MAi ]0,1 + [MD

i ]0,1)kf t + 1(A.9,10)

Setting t = thyb1 in Equations A.6–A.10, the con-centrations at the end of the first hybridization canbe obtained. In hybridizations 1A and 1B, the initialconcentration of corresponding species is identical,since the tester is divided into two equal parts. Asthe reaction conditions for both hybridizations areidentical, hybridization 1B can be represented bythe same set of equations (Equations A.1–A.10)with the superscript B replacing A.

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 18: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

422 C. Gadgil et al.

The balance equations for single stranded species,homoduplexes with identical adaptors, and homod-uplexes with different adaptors during the secondhybridization are given by Equations A.11, A.12and A.13, respectively:

d[Mpi ]

dt= −kf[M

Ai ][Mp

i ] − kf[MDi ][Mp

i ]

− kf[MBi ][Mp

i ]; p = A,B,D (A.11)

d([Mpi Mp

i ])

dt= kf[M

pi ]2

2; p = A,B,D (A.12)

d[Mpi Mq

i ]

dt= kf[M

pi ][Mq

i ]; p, q = A, B, D; p �= q

(A.13)At the start of the second hybridization, equalvolumes of the products of hybridizations 1A and1B (without melting), and fresh melted single-stranded driver are mixed. The concentration isthus reduced by a factor of three. Substituting[MB

i ] = [MAi ] in Equation A.11, we can write the

balance equation for species MAi and MD

i as:

d[MAi ]

dt= −2kf[M

Ai ]2 − kf[M

Ai ][MD

i ];d[MD

i ]

dt= −kf[M

Di ]2 − 2 kf[M

Ai ][MD

i ]

(A.14; 15)

Solving for MAi and MD

i , with initial conditionslisted in Table 1, we obtain:

[MAi ] = [MA

i ]0,2

(2[MAi ]0,2 + [MD

i ]0,2)kf t + 1;

[MDi ] = [MD

i ]0,2

(2[MAi ]0,2 + [MD

i ]0,2)kf t + 1(A.16; 17)

The rate expression for the concentration of MAi MB

iis given by Equation A.13. As [MB

i ] = [MAi ], we

can write:

d[MAi MB

i ]

dt= kf[M

Ai ][MB

i ] = kf[MAi ][MA

i ] (A.18)

Substituting Equation A.16 in Equation A.18 andintegrating using the initial conditions given inTable 1, we get:

[MAi MB

i ] = {[MAi ]0,2}2kf t

(2[MAi ]0,2 + [MD

i ]0,2)kf t + 1(A.19)

Substituting for [MAi ]0,2 and [MD

i ]0,2 in Equa-tion (A.19) an analytical expression for [MA

i MBi ]

at the end of the second hybridization (t = thyb2) interms of the initial concentration, κi, E1, E2, thyb1and thyb2 can be obtained (Equation 3).

Copyright 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 405–422.

Page 19: Research Article A mathematical model for suppression subtractive hybridizationdownloads.hindawi.com/journals/ijg/2002/590612.pdf · 2019-08-01 · Suppression subtractive hybridization

Submit your manuscripts athttp://www.hindawi.com

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporation http://www.hindawi.com

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttp://www.hindawi.com

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

International Journal of

Microbiology