Mutation Research 573 (2005) 83–95

Review

A single nucleotide polymorphism based approach for the identification and characterization of gene expression modulation using MassARRAY

Christian Jurinke*, Mikhail F. Denissenko, Paul Oeth, Matthias Ehrich, Dirk van den Boom, Charles R. Cantor

SEQUENOM Inc., 3595 John Hopkins Court, San Diego, CA, USA

Received 24 September 2004; accepted 10 January 2005
Available online 2 March 2005

Abstract

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation. Their abundance and the ease with which they can be assayed have led to their use in applications beyond simple genotyping. One such application is the quantitative determination of transcript levels associated with distinct alleles or haplotypes found in promoters and coding regions of genes. These changes in expression due to allelic variation are often associated with additional genomic or transcript modifications such as DNA methylation or RNA editing. Here, we describe the use of an integrated genetic analysis platform, based on matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF), to first, discover coding SNPs (cSNPs); second, use these cSNPs to identify and analyze allele-specific expression; and third, use this knowledge to further analyze methylation patterns as a putative cause for the allele-specific expression. An established model involving allele-specific expression profiles of the human tumor protein 73 (TP73) gene is presented as an example to outline and validate data obtained from the MassARRAY platform. The availability of a single integrated platform to assay stable and dynamic variation at the genomic and transcript level greatly simplifies complex functional genomic studies.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Allele-specific expression; SNP discovery; SNP genotyping; Methylation; Gene expression; TP73; MALDI-TOF MS; MassARRAY

* Corresponding author. Tel.: +1 858 202 9173.
E-mail address: [email protected] (C. Jurinke).

0027-5107/$ – see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.mrfmmm.2005.01.007



Contents

1. Introduction
2. SNP discovery
3. Genotyping and allelotyping
4. Analysis of allele-specific expression
5. Analysis of the methylation status of 5′-UTR and promoter region
6. Conclusion
References

1. Introduction

The matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS) based MassARRAY platform is widely known for its ability to analyze single nucleotide polymorphisms (SNPs) accurately. In the last several years, however, the range of applications for this platform has expanded significantly. Added functionality includes applications for the estimation of allele frequencies in pooled DNA samples [1], discovery of SNPs [2] and quantitative analysis of gene expression [3]. Here, we demonstrate the utility of this platform for identification and characterization of mechanisms that modulate gene expression levels.

Allelic imbalance in gene expression levels can be caused by a variety of different factors, such as cis-acting regulatory polymorphisms in coding, intronic or regulatory sequences [4] as well as DNA methylation or histone acetylation [5,6]. A recent study by Lo et al. has demonstrated that among 602 analyzed genes that were heterozygous in several individuals, in 54% one allele was preferentially expressed, and over 50% of those cases showed greater than four-fold variation in expression [7]. Previously, Yan et al. reported that 6 out of 13 human genes showed significant differences in expression between the two alleles, one of the analyzed genes being the TP73 gene [8]. In addition, the same group has also demonstrated that even small changes in the allelic distribution of the APC gene play an important role in colon cancer development [9]. More recently, Pastinen et al. reported allele-specific expression in 18% of 129 genes assayed in a human lymphoblastoid cell line [10]. Based on these studies, it is clear that detection and quantitation of allele-specific expression is poised to play an increasingly important role in functional genomics studies. SNPs in coding regions are the most easily accessible markers to analyze allele-specific expression. Consequently, there is a growing need for sensitive and precise techniques to discover coding SNPs and to analyze and quantify allele-specific expression.

Our approach to provide an integrated tool for the study of allele-specific expression is based on SNPs as surrogate markers. We followed the scheme outlined in Fig. 1. In the first step, MassCLEAVE, a high-throughput tool for discovering SNPs and other sequence changes, is used to identify coding SNPs. In a second step, MassEXTEND, a primer-extension genotyping assay, is used to analyze individual samples to identify informative cases (that is, individuals heterozygous for the cSNP). In the third step, the selected samples are further analyzed on the transcript level using a quantitative MassEXTEND assay to measure the allelic ratios. The final step of the experimental procedure includes the analysis of CpG islands in the 5′-untranslated region (UTR) of the gene to explore their cytosine-methylation status, and the analysis of SNPs in the promoter region.

In this review, we present data from the analysis of TP73 expression to exemplify our approach. The TP73 gene was first described by Kaghad et al. [11]. The authors identified a gene with remarkable sequence similarity to the DNA-binding, transactivation, and oligomerization domains of the tumor suppressor gene TP53. These and other data prompted researchers to classify TP73 as a tumor suppressor gene. This classification implies that down-regulation of TP73 might be a contributing factor in the chain of transforming events leading to the creation of cancerous cells. The aim of this contribution, however, is not to investigate and describe the function and regulation of TP73 expression in detail, but to use TP73 as an example while describing our step-wise experimental approach.

Fig. 1. Workflow diagram for MassARRAY based identification and characterization of gene expression modulation.

2. SNP discovery

MassCLEAVE, the SNP discovery method of the MassARRAY platform, employs four individual base-specific cleavage reactions and their subsequent analysis by MALDI-TOF MS. Sequence variations are deduced from discrepancies between the experimentally determined masses in each individual reaction and calculated masses from in silico generated cleavage patterns of a reference sequence.

To generate the base-specific cleavage reactions, target sequences are first PCR amplified using primers that incorporate T7 promoter sequences as non-annealing 5′ extensions. Then in vitro transcription is performed using a mutant T7 RNA polymerase, which has lost the ability to discriminate between rNTP and dNTP substrates [12]. The base-specific cleavage products are generated by enzymatic cleavage of transcripts in which one specific base is present in the 2′-deoxy form. This chimeric structure permits cleavage in a definite base-specific manner with RNase A, which exhibits base-specificity for C and U: because the 2′-deoxy base is not cleaved, cleavage is restricted to the other pyrimidine. When transcription is performed in forward and reverse directions, this scheme allows base-specific cleavage after each of the four bases (cleavages at A and G are mimicked by cleavages at C and U of the reverse strand). The MassCLEAVE SNP discovery methodology is schematically outlined in Fig. 2.
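The comparison between an observed cleavage pattern and the in silico pattern of a reference sequence can be illustrated with a minimal Python sketch. It is not the MassCLEAVE software: the function names and the short sequences are hypothetical, and fragments are compared as sequences rather than as calculated masses, which simplifies the mass-based comparison described above.

```python
from collections import Counter


def cleave_after(transcript: str, base: str) -> list[str]:
    """Cut a transcript 3' of every occurrence of `base`.

    Each fragment therefore ends with `base`, except possibly the last one.
    """
    fragments, current = [], ""
    for nucleotide in transcript:
        current += nucleotide
        if nucleotide == base:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def pattern_difference(reference: str, sample: str, base: str):
    """Return (fragments lost, fragments gained) relative to the reference."""
    expected = Counter(cleave_after(reference, base))
    observed = Counter(cleave_after(sample, base))
    lost = sorted((expected - observed).elements())
    gained = sorted((observed - expected).elements())
    return lost, gained


# Hypothetical 12-nt transcripts that differ by a single C->U change.
reference = "GGACUGCAAUGC"
variant = "GGACUGUAAUGC"

lost, gained = pattern_difference(reference, variant, "U")
print(lost, gained)  # ['GCAAU'] ['AAU', 'GU']
```

In this toy case the C to U change creates an additional U-specific cleavage site, so one reference fragment disappears and two shorter fragments appear. A heterozygous sample would be expected to show both patterns at once.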

With this method, we analyzed an amplicon generated from exon 5 of the TP73 gene. Fig. 3 depicts details of the assay design. Human lymphoblast cell lines specified by Yan et al. in their previous study of TP73 allele-specific expression (CCC) were cultured, and total RNA and genomic DNA were isolated. Fig. 4 provides raw data of MassCLEAVE reactions obtained from six different samples of genomic DNA. Three of the analyzed samples (see Fig. 4(B)) are heterozygous for a C/T SNP at position 629 (rs1801174). This SNP is reported in dbSNP (build 122) to have a high heterozygosity and has also been used by Yan et al. in their previous investigation of TP73 allele-specific expression [8]. Our results confirm the report of Yan et al. on the genotype of those cell lines. These results, together with previous reports on the MassCLEAVE methodology [13], validate this experimental approach for investigative projects where data on cSNPs are at best only partially available.

Fig. 2. Schematic representation of the MassCLEAVE process. Details are provided in the text.

3. Genotyping and allelotyping

If no information on the SNP frequency is available, or larger populations need to be screened for multiple cSNPs, two efficient MassARRAY approaches can generate this information. The first approach is to genotype each individual in the population of interest. The concept of genotyping with MassARRAY is schematically outlined in Fig. 5. Genotype discrimination is based on a limited primer-extension reaction. This reaction generates allele-specific termination products that are subsequently analyzed in a MALDI-TOF mass spectrometer. The MassEXTEND method enables individual genotyping on a high-throughput scale with high multiplexing levels (including the PCR); up to 12-plex is routinely obtained (see Fig. 6). The MassARRAY system has two available preset configurations. The benchtop configuration allows for a daily throughput of up to eight 384-well plates (or 36,864 genotypes); the high-throughput configuration handles up to twenty 384-well plates (92,160 genotypes).
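The quoted daily throughputs are consistent with every well carrying a 12-plex assay. The short Python check below makes that arithmetic explicit; the constant and function names are hypothetical, and the uniform 12-plex level is an assumption inferred from the figures rather than a stated specification.

```python
WELLS_PER_PLATE = 384
PLEX_LEVEL = 12  # assumption: every well runs a 12-plex assay


def daily_genotypes(plates_per_day: int,
                    wells: int = WELLS_PER_PLATE,
                    plex: int = PLEX_LEVEL) -> int:
    """Genotypes per day = plates x wells per plate x assays per well."""
    return plates_per_day * wells * plex


benchtop = daily_genotypes(8)
high_throughput = daily_genotypes(20)
print(benchtop, high_throughput)  # 36864 92160
```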

While individual genotyping seems to be the most straightforward method, a different approach should also be considered, depending on the population size and the number of SNPs to be analyzed. This second approach relies on the equimolar combination of different individual DNAs into one sample pool that is amplified by PCR followed by MassEXTEND analysis. The allele frequency is calculated from the ratio of the two alleles, by determination of their respective peak areas in the mass spectrum. This methodology has been thoroughly validated and successfully used in a number of studies [14–17], most recently to investigate genetic variations that predict Type 2 Diabetes susceptibility [18]. A recent publication by Downes et al. [19] provides a variance component analysis of this process. The authors found that the MassARRAY process generates significantly less variance than the pyrosequencing approach, and that the use of smaller sub pools is more cost efficient than the use of large pools and replications.
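The pooled-DNA calculation described above amounts to normalizing the two allele peak areas. The following Python sketch illustrates it under simplifying assumptions: the function names and peak areas are hypothetical, no correction for unequal allele response is applied, and replicate averaging is shown only as one plausible way to reduce measurement variance.

```python
def allele_frequencies(peak_area_a: float, peak_area_b: float) -> tuple[float, float]:
    """Estimate both allele frequencies from the two extension-product peak areas."""
    total = peak_area_a + peak_area_b
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return peak_area_a / total, peak_area_b / total


def mean_frequency(replicates) -> float:
    """Average the first-allele frequency across replicate measurements."""
    frequencies = [allele_frequencies(a, b)[0] for a, b in replicates]
    return sum(frequencies) / len(frequencies)


# Hypothetical peak areas (allele A, allele B) from three replicate spectra.
replicate_areas = [(1500.0, 500.0), (1440.0, 560.0), (1560.0, 440.0)]
pool_frequency = mean_frequency(replicate_areas)
print(round(pool_frequency, 3))  # 0.75
```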

To further confirm and solidify the results obtained by MassCLEAVE, we also used genotyping by MassEXTEND to analyze the samples and obtained confirmatory results (data not shown). For small studies, this step could be omitted, but for studies where larger populations are to be analyzed or the frequency of a non-characterized cSNP needs to be established, this step is mandatory.

The choice of SNPs for a particular study needs to be based on the allele frequency in the actual population of interest. One or several SNPs can be selected to investigate potential allele-specific expression patterns. If the aim of the study is to obtain the most complete knowledge of the allelic expression patterns (for example, to compare between smaller case and control groups), it is desirable to select multiple SNPs with high allele frequency in order to ensure that as many individuals as possible are included in the analysis.

4. Analysis of allele-specific expression

Fig. 3. Gene structure of TP73 and assay design for the SNP discovery with MassCLEAVE. PCR primer binding sequences are underlined, the position of rs1801174 is indicated in red.

The process for analyzing allele-specific expression is similar to that used for conducting disease-association studies with SNPs on pooled DNA populations [14–17,20], with the exception that cDNA as opposed to genomic DNA is used as a template for PCR. As with pooled DNA studies, allele frequencies are calculated based on the peak area representing each respective primer extension product under investigation. The ratio of alleles from one population of cDNA samples is compared to that of another population to determine if any statistically significant difference exists between them. Individual samples may be tested for allele-specific expression on a case-by-case basis. When assaying for allele-specific expression, it is important to verify that no extreme assay bias exists at the genomic DNA level for the loci under investigation. This is achieved by assaying both cDNA and genomic DNA samples from the same individual for the same cSNP. Since genomic DNA alleles exist at a 1:1 ratio in heterozygote individuals (assuming no loss of heterozygosity, LOH), the frequency of each allele should be equal (0.5:0.5). If any allelic bias is observed on the genomic level, then it should be corrected for when comparing allele ratios in the cDNA, as is done for pooled DNA studies [21]. Allele-specific expression is thus indicated by an allelic ratio in the cDNA that significantly differs from the ratio obtained from genomic DNA after correcting for any allelic bias. Fig. 7 schematically depicts the procedure for measuring allele-specific expression with MassARRAY. The spectra in Fig. 7 represent cell line GM10834, which also has been reported to show TP73 allele-specific expression by Yan et al. [8].
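One simple way to picture the genomic-DNA correction is to divide the cDNA allele ratio by the ratio measured in genomic DNA from the same heterozygous individual. The Python sketch below does this; it assumes the assay bias is a constant multiplicative factor, which is an illustrative assumption and not necessarily the correction procedure of [21]. All names and peak areas are hypothetical.

```python
def allele_ratio(area_c: float, area_t: float) -> float:
    """C:T ratio from the peak areas of the two extension products."""
    if area_t <= 0:
        raise ValueError("T-allele peak area must be positive")
    return area_c / area_t


def corrected_cdna_ratio(cdna_areas, gdna_areas) -> float:
    """Divide the cDNA ratio by the genomic ratio of the same individual.

    Assumption: assay bias acts as a constant multiplicative factor, so a
    heterozygous genomic DNA sample (true ratio 1:1) measures it directly.
    """
    return allele_ratio(*cdna_areas) / allele_ratio(*gdna_areas)


# Hypothetical peak areas (C allele, T allele) for one heterozygous individual.
gdna = (1100.0, 1000.0)  # genomic DNA reads 1.1 instead of the expected 1.0
cdna = (2200.0, 500.0)   # raw cDNA ratio of 4.4

ratio = corrected_cdna_ratio(cdna, gdna)
print(round(ratio, 2))  # 4.0
```

A corrected ratio near 1 would indicate balanced expression; a value well above or below 1 would point to allele-specific expression, subject to the statistical testing mentioned above.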

Fig. 4. Raw data from MassCLEAVE analysis of six different samples. The spectra in box A are T-forward reaction results from the analysis of three homozygote samples. The spectra in box B are T-forward reaction results from heterozygote samples (red: GM 10864; green: GM 10834; blue: GM 108616). Dotted lines indicate expected mass signals: orange: C allele; pink: T allele; grey: salt adduct; green: main signal.

Fig. 5. Schematic representation of the MassEXTEND genotyping assay.

In addition to comparing the ratio of alleles to determine differences in transcript levels between individuals and/or populations, the number of molecules of each cDNA species present in the reaction can also be determined using MassARRAY. This simultaneous quantitative and allele-specific expression analysis is achieved using competitive PCR combined with hME and MALDI-TOF MS [22]. This highly accurate methodology has been successfully used for quantitative gene expression analysis (QGE) on the MassARRAY system [20,23,24]. For QGE, a competitor or internal standard is designed to match the PCR amplicon from the cDNA strand under investigation (60–100 bp) except that an artificial third allele is introduced relative to the two SNP alleles. This third allele allows for the resolution of a primer extension product distinct from the two cDNA alleles. The competitive template is titrated over a range of concentrations. A standard curve is then generated based on the relationship of competitor concentration to cDNA frequency. Fig. 8 provides the titration curves for sample GM10834 showing allele-specific expression for TP73. The point at which the cDNA frequency intersects the standard curve at 0.5 represents the concentration at which the competitor and cDNA are present at equimolar ratio and indicates the concentration of the cDNA. Nonlinear regression analysis is used to calculate this point (EC50) for each standard curve. Fig. 9 and Table 1 show the allele-specific expression results for TP73 in three individuals. Fig. 9 also shows the C:T allele ratio for cDNA relative to the respective genomic DNA. The three samples show clear allele-specific expression for TP73.
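The EC50 determination from a competitor titration can be sketched in Python as a one-parameter least-squares fit. The model below, in which the cDNA frequency equals cDNA / (cDNA + competitor), is an assumption made for illustration (it presumes equal amplification and extension efficiency for both templates); the text does not specify the regression model that was used. The function names, the golden-section search and the synthetic titration data are likewise hypothetical.

```python
import math


def expected_cdna_frequency(competitor_conc: float, cdna_conc: float) -> float:
    """Assumed model: fraction of signal from cDNA at a given competitor level."""
    return cdna_conc / (cdna_conc + competitor_conc)


def sum_squared_error(log_conc: float, concs, freqs) -> float:
    cdna_conc = 10 ** log_conc
    return sum((expected_cdna_frequency(x, cdna_conc) - f) ** 2
               for x, f in zip(concs, freqs))


def fit_ec50(concs, freqs, lo: float = -3.0, hi: float = 6.0, tol: float = 1e-10) -> float:
    """Golden-section search over log10(concentration) for the best-fit EC50."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sum_squared_error(c, concs, freqs) < sum_squared_error(d, concs, freqs):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 10 ** ((a + b) / 2)


# Synthetic, noise-free titration generated from the assumed model (cDNA = 40 fM).
competitor_fm = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
observed = [expected_cdna_frequency(x, 40.0) for x in competitor_fm]

ec50 = fit_ec50(competitor_fm, observed)
print(round(ec50, 3))  # 40.0
```

With this model the fitted EC50 is the competitor concentration at which the cDNA frequency equals 0.5, which corresponds to the equimolar point described above.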

Fig. 6. Sample spectrum from a 12-plex genotyping assay obtained with MassEXTEND. Dotted vertical lines represent expected alleles.


Fig. 7. Schematic representation of MassARRAY workflow for allele-specific expression analysis.

Mechanisms of differential allelic expression not related to genomic imprinting are currently not completely understood. Based on recent studies it appears that expression of allelic variants of any given gene may differ [7–10]. New high-throughput technologies for conducting such studies, such as MassARRAY, should help to clarify this biological phenomenon.

Table 1
Concentration and number of molecules for C and T alleles of rs1801174 in cDNA of three different samples

cDNA sample   C allele             T allele             C allele number   T allele number   Allele ratio
              concentration (fM)   concentration (fM)   of molecules      of molecules      (C:T)
GM 10864      28.3                 6.0                  84,900            17,970            4.72
GM 10834      17                   39                   51,300            117,000           0.44
GM 12616      16.3                 11.6                 48,900            34,800            1.41
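The allele ratios in Table 1 follow directly from the two concentrations, as the short Python check below shows. The variable names are hypothetical. The tabulated molecule counts also correspond to roughly 3,000 molecules per fM, which would be consistent with a volume of about 5 µL; that volume is an inference from the table and is not stated in the text.

```python
# (sample, C allele fM, T allele fM, C molecules, T molecules) as listed in Table 1.
table_1 = [
    ("GM 10864", 28.3, 6.0, 84_900, 17_970),
    ("GM 10834", 17.0, 39.0, 51_300, 117_000),
    ("GM 12616", 16.3, 11.6, 48_900, 34_800),
]

# C:T ratio recomputed from the concentrations, rounded as in the table.
ratios = {sample: round(c_fm / t_fm, 2) for sample, c_fm, t_fm, _, _ in table_1}

# Molecules per fM for the C allele, to show the near-constant conversion factor.
molecules_per_fm = {sample: round(c_mol / c_fm) for sample, c_fm, _, c_mol, _ in table_1}

print(ratios)
print(molecules_per_fm)
```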

Fig. 8. Results for SNP (rs1801174) analysis of cDNA from sample GM10834 using competitive PCR. The dotted line represents the point at which competitor and the respective allele are present at the same concentration. The T allele (line with square) is expressed at a higher level than the C allele (line with circle) in this particular sample.

However, differential expression of alleles has been studied intensively with regard to genomic imprinting (for review see [5]). The gametes each contribute a set of DNA sequences to the zygote, but epigenetic imprinting of a subset of genes determines that only one of the two parental alleles is expressed, depending on its parental origin. Genomic imprinting is an epigenetic form of gene regulation, which controls the parent-dependent gene expression of imprinted genes during gametogenesis and embryonic development. Imprinting involves differential methylation of alleles in one sex cell lineage but not in the other. Methylation may also play an important role in allele-specific expression of genes involved in disease pathways, such as cancer. An important role in carcinogenesis is very likely since the differential methylation of promoter regions and exons is known to regulate transcription. Based on this hypothesis, we assayed for methylation of the 5′-UTR of TP73 in those samples exhibiting differential allele expression at the 629C/T site in exon 5.

Fig. 9. Ratios of C:T alleles for cDNA from samples GM 10864, GM 10834, and GM 12616 in direct comparison to ratios obtained from genomic DNA of the same samples using rs1801174.

5. Analysis of the methylation status of 5′-UTR and promoter region

Genomes store information not only in the sequence of the nucleotide bases, but also in the covalent modifications of selected bases. One of these modifications is cytosine methylation. Cytosine methylation occurs within CpG dinucleotides, which are generally underrepresented in the human genome and accumulated in distinct areas called CpG islands. A large proportion of CpG islands are found in the promoter regions of genes. The conversion of cytosine to 5-methylcytosine in promoter-associated CpG islands has been linked to changes in chromatin structure and usually results in transcriptional silencing of the associated gene. On a broader scale, alteration of cytosine methylation has been linked to mammalian development, parental imprinting and X-chromosome inactivation [25], suppression of parasitic DNA [26] and numerous cancer types [27–30]. Several methods for analysis of DNA methylation have been developed and among these bisulfite treatment of genomic DNA has been the most prominent. This method takes advantage of the sequence change introduced by selective chemical conversion of non-methylated cytosine. Treatment of DNA with sodium bisulfite deaminates non-methylated cytosine to uracil, whereas methylated cytosine remains unaffected. Various methods can then be employed to identify the resulting C → T changes, such as Sanger sequencing and methylation-specific PCR (MSP) [31–35].
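The sequence change produced by bisulfite treatment can be illustrated with a few lines of Python. This is a minimal sketch of the conversion logic only: the function names and the 8-nt example strand are hypothetical, conversion is assumed to be complete, and the result is written with T because the uracil produced by deamination is read as T after PCR.

```python
def bisulfite_convert(sequence: str, methylated_positions=()) -> str:
    """Return the sequence read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and amplified as T; a methylated C
    (given by its 0-based index) is protected and stays C.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in protected else base
        for index, base in enumerate(sequence)
    )


def remaining_cytosines(converted: str) -> list[int]:
    """Positions that still read C, i.e. candidate methylated cytosines."""
    return [index for index, base in enumerate(converted) if base == "C"]


# Hypothetical 8-nt strand with CpG cytosines at indices 1 and 5.
strand = "ACGTCCGA"
unmethylated = bisulfite_convert(strand)
partly_methylated = bisulfite_convert(strand, methylated_positions=[5])

print(unmethylated)                            # ATGTTTGA
print(partly_methylated)                       # ATGTTCGA
print(remaining_cytosines(partly_methylated))  # [5]
```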

For methylation analysis of the 5′-UTR and promoter region of the TP73 gene, we utilized a new approach employing the concept of base-specific cleavage as introduced earlier in this review for the discovery of sequence polymorphisms. The region of interest, however, is now PCR-amplified from bisulfite treated genomic DNA. The conversion of cytosine to uracil during bisulfite treatment leads to a distinct base-specific cleavage pattern that allows for the identification of methylation during the MALDI-TOF MS analysis. This concept allows the discovery of methylation sites as well as the semi-quantitative analysis of methylation at single or multiple CpG positions [36].

Methylation events have a distinct effect on the cleavage pattern, because of the C–T sequence change introduced during bisulfite treatment. In general, methylation events fall into three categories: (1) methylation introduces a new cleavage site resulting in new, shorter products; (2) a non-cleavable nucleotide replaces an existing cleavage site, resulting in a new, longer product; and (3) methylation generates a sequence change in an existing cleavage product that does not affect cleavage, but generates a mass shift.

For discovery of methylation sites in the 5′-UTR of TP73 and assessment of the degree of methylation, we focused on two specific cleavage reactions. Methylation sites are readily discovered in those cleavage reactions, since each methylated cytosine generates a new cleavage site. A completely non-methylated DNA, for example, will not exhibit any cytosine residues in the corresponding PCR product and RNA transcript. Hence, a C-specific cleavage will not generate any cleavage products and corresponding mass signals. A methylated DNA, on the contrary, will result in an RNA transcript that contains several cleavable nucleotides (the methylated cytosine residues are protected from conversion) and hence generates defined cleavage products, which can be used for the identification of methylated CpGs in the amplified region.

Table 2
Percentage methylation per CpG site in the 5′-UTR of TP73

DNA        CpG
           2    12   13   14   17   18   19   24   27   32
GM 10864   0    45   45   10   0    40   40   10   10   10
GM 10834   0    49   40   30   0    0    20   10   0    30
GM 12616   30   0    0    0    0    0    10   0    10   0

Fig. 10. Results from the methylation analysis of four samples (GM 10864: blue; GM 108699: green; GM 10834: yellow; GM 108616: red). The results represent data for CpG 12 and 13. The signal at 4653 Da represents non-methylated DNA, whereas the mass signal at 4685 Da represents DNA that is methylated at CpG sites #12 and 13. The depicted spectra are the sum of 20 individual shots from one location; the quantitative analysis is performed from the sum of spectra from five different locations. Table 2 summarizes the data for all methylated CpGs.

While the C-specific cleavage reaction is an efficient method for discovery of methylated CpGs, this reaction does not allow for relative quantitation. To assess the degree of methylation in our cell-culture samples, we used a T-specific reaction on the reverse strand transcript. Here methylation events are identified as G–A sequence changes (complementary to the C–T conversion introduced during bisulfite treatment). The G–A sequence changes lead to a 16 Da mass shift of those cleavage products harboring methylated CpGs (or multiples of 16 Da when more than one CpG is enclosed in one cleavage product). A relative quantitation can be achieved by using the peak area ratio of those mass signals representing non-methylated and methylated DNA.
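The mass-shift rule and the peak-area quantitation can be combined in a small Python sketch. The 16 Da shift per methylated CpG and the 4653 Da signal are taken from the text; the function names and the peak areas are hypothetical, and taking the methylated fraction as methylated area over total area is one plausible reading of "peak area ratio", stated here as an assumption.

```python
MASS_SHIFT_PER_CPG_DA = 16.0  # per methylated CpG, as stated in the text


def methylated_signal_mass(unmethylated_mass_da: float, methylated_cpgs: int) -> float:
    """Expected mass of a cleavage product carrying the given number of methylated CpGs."""
    return unmethylated_mass_da + MASS_SHIFT_PER_CPG_DA * methylated_cpgs


def percent_methylation(area_methylated: float, area_unmethylated: float) -> float:
    """Methylated share of the combined peak area, in percent (assumed definition)."""
    total = area_methylated + area_unmethylated
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * area_methylated / total


# A product with two CpGs: the non-methylated signal at 4653 Da shifts by 2 x 16 Da.
expected_mass = methylated_signal_mass(4653.0, methylated_cpgs=2)

# Hypothetical peak areas for the methylated and non-methylated signals.
methylation = percent_methylation(area_methylated=450.0, area_unmethylated=550.0)

print(expected_mass, methylation)  # 4685.0 45.0
```

The computed 4685 Da matches the signal assigned to methylation at CpG 12 and 13 in Fig. 10.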

As an initial screen, we amplified a 410 bp region of the 5′-untranslated region of TP73 (nts 1828-2237, UCSC entry chr1:3349230-3352819). This region harbors 36 CpG sites. With the base-specific cleavage assay described above, we identified 10 CpGs subject to methylation in the cell culture samples analyzed. Table 2 provides an overview of the methylated CpGs. The CpGs are numbered according to their position in the amplified region. We also quantitated the relative amount of methylation in these samples. The percentage of methylation determined in each of the samples is listed for each of the CpG sites. Fig. 10 exemplifies the identification and relative quantitation for CpG 12 and 13 within the 5′-UTR of the TP73 gene. The mass signal indicating methylation is marked with an arrow. The peak area ratio between this mass signal (at 4685 Da) and the corresponding mass signal (at 4653 Da) allows the percentage of methylation to be estimated. Although the methylation of CpGs in the 5′-UTR varies between the samples, the methylation status of this region of the TP73 gene could not be correlated with the function and degree of allele-specific expression of the same samples. Additional CpG islands in the promoter/5′-UTR or coding region will need to be analyzed to further explore the potential influence of methylation on allele-specific expression of TP73, including experiments to analyze the methylation haplotype. Furthermore, analyzing the 5′-UTR for the presence of SNPs that affect transcription efficiency is another path forward. Since these SNPs are not detectable in the transcript, a different method needs to be used for their analysis. Knight et al. have devised such a method based on the MassARRAY platform [37]. This haplotype-specific procedure is based on chromatin immunoprecipitation (ChIP) to analyze the amount of phosphorylated RNA polymerase bound to each of the two alleles of a gene in vivo. A schematic overview of this procedure, called Haplo–ChIP, is provided in Fig. 11. Knight et al. recently reported allele-specific repression of lymphotoxin-α by activated B cell factor-1 [38] and increased in vivo transcription of an IL-8 haplotype associated with respiratory syncytial virus disease-susceptibility [39] using this MassARRAY procedure.


Fig. 11. Schematic representation of the Haplo–ChIP assay with MassARRAY.

6. Conclusion

Allele-specific expression is an important factor in our understanding of inherited inter-individual differences and disease susceptibility. The MassARRAY platform provides an integrated system to perform SNP discovery, genotyping, gene expression and allele-specific expression studies. In addition, the SNP discovery tool can be used to screen and analyze the methylation status of CpG islands.

References

[1] K.L. Mohlke, M.R. Erdos, L.J. Scott, T.E. Fingerlin, A.U. Jackson, K. Silander, P. Hollstein, M. Boehnke, F.S. Collins, High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools, Proc. Natl. Acad. Sci. U.S.A. 99 (2002) 16928–16933.

[2] R. Hartmer, N. Storm, S. Boecker, C.P. Rodi, F. Hillenkamp, C. Jurinke, D. van den Boom, RNase T1 mediated base-specific cleavage and MALDI-TOF MS for high-throughput comparative sequence analysis, Nucleic Acids Res. 31 (9) (2003) e47.

[3] C. Ding, C.R. Cantor, A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS, Proc. Natl. Acad. Sci. U.S.A. 100 (2003) 3059–3064.

[4] R.C. Hardison, J. Oeltjen, W. Miller, Long human–mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome, Genome Res. 7 (1997) 959–966.

[5] R. Jaenisch, A. Bird, Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals, Nat. Genet. 33 (2003) 245–254.

[6] N. Dillon, R. Festenstein, Unraveling heterochromatin: competition between positive and negative factors regulates accessibility, Nat. Genet. 32 (2002) 432–437.

[7] H.S. Lo, Z. Wang, Y. Hu, H.H. Yang, S. Gere, K.H. Buetow, M.P. Lee, Allelic variation in gene expression is common in the human genome, Genome Res. 13 (2003) 1855–1862.

[8] H. Yan, W. Yuan, V.E. Velculescu, B. Vogelstein, K.W. Kinzler, Allelic variation in human gene expression, Science 297 (2002) 1143.

[9] H. Yan, et al., Small changes in expression affect predisposition to tumorigenesis, Nat. Genet. 30 (2002) 25–26.

[10] T. Pastinen, et al., A survey of genetic and epigenetic variation affecting human gene expression, Physiol. Genomics 2 (2004) 184–193.

[11] M. Kaghad, et al., Monoallelically expressed gene related to p53 at 1p36, a region frequently deleted in neuroblastoma and other human cancers, Cell 90 (1997) 809–819.

[12] R. Sousa, R. Padilla, A mutant T7 RNA polymerase as a DNA polymerase, EMBO J. 14 (1995) 4609–4621.


[13] P. Stanssens, et al., High-throughput MALDI-TOF discovery of genomic sequence polymorphisms, Genome Res. 14 (2004) 339–346.

[14] K.H. Buetow, M. Edmonson, R. MacDonald, R. Clifford, P. Yip, J. Kelley, D.P. Little, R. Strausberg, H. Koester, C.R. Cantor, A. Braun, High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, Proc. Natl. Acad. Sci. U.S.A. 98 (2001) 581–584.

[15] M. Werner, M. Sych, N. Herbon, T. Illig, I.R. Konig, M. Wjst, Large-scale determination of SNP allele frequencies in DNA pools using MALDI-TOF mass spectrometry, Hum. Mutat. 20 (2002) 57–64.

[16] K.L. Mohlke, M.R. Erdos, L.J. Scott, T.E. Fingerlin, A.U. Jackson, K. Silander, P. Hollstein, M. Boehnke, F.S. Collins, High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools, Proc. Natl. Acad. Sci. U.S.A. 99 (2002) 16928–16933.

[17] A. Bansal, D. van den Boom, S. Kammerer, C. Honisch, G. Adam, C.R. Cantor, P. Kleyn, A. Braun, Association testing by DNA pooling: an effective initial screen, Proc. Natl. Acad. Sci. U.S.A. 99 (2002) 16871–16874.

[18] K. Silander, et al., Genetic variation near the hepatocyte nuclear factor-4α gene predicts susceptibility to type-2 diabetes, Diabetes 53 (2004) 1141–1149.

[19] K. Downes, B.J. Barratt, P. Akan, S.J. Bumpstead, S.D. Taylor, D.G. Clayton, P. Deloukas, SNP allele frequency estimation in DNA pools and variance components analysis, Biotechniques 36 (2004) 840–845.

[20] K. Tang, P. Oeth, S. Kammerer, M.F. Denissenko, J. Ekblom, C. Jurinke, D. van den Boom, A. Braun, C.R. Cantor, Mining disease susceptibility genes through SNP analyses and expression profiling using MALDI-TOF mass spectrometry, J. Proteome Res. 3 (2004) 218–227.

[21] C. Jurinke, P. Oeth, D. van den Boom, MALDI-TOF mass spectrometry: a versatile tool for high-performance DNA analysis, Mol. Biotech. 26 (2004) 147–163.

[22] C. Ding, E. Maier, A.A. Roscher, A. Braun, C.R. Cantor, Simultaneous quantitative and allele-specific expression analysis with real competitive PCR, BMC Genet. 5 (2004) 8.

[23] C. Ding, C.R. Cantor, A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS, Proc. Natl. Acad. Sci. U.S.A. 100 (2003) 3059–3064.

[24] C. Ding, C.R. Cantor, Quantitative analysis of nucleic acids - the last few years of progress, J. Biochem. Mol. Biol. 37 (2004) 1–10.

[25] E. Li, Chromatin modification and epigenetic reprogramming in mammalian development, Nat. Rev. Genet. 3 (2002) 662–673.

[26] C.P. Walsh, J.R. Chaillet, T.H. Bestor, Transcription of IAP endogenous retroviruses is constrained by cytosine methylation, Nat. Genet. 20 (1998) 116–117.

[27] J.F. Costello, et al., Aberrant CpG-island methylation has non-random and tumour-type-specific patterns, Nat. Genet. 24 (2000) 132–138.

[28] P.A. Jones, S.B. Baylin, The fundamental role of epigenetic events in cancer, Nat. Rev. Genet. 3 (2002) 415–428.

[29] J.F. Costello, C. Plass, Methylation matters, J. Med. Genet. 38 (2001) 285–303.

[30] A.P. Feinberg, Cancer epigenetics takes center stage, Proc. Natl. Acad. Sci. U.S.A. 98 (2001) 392–394.

[31] M. Frommer, et al., A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands, Proc. Natl. Acad. Sci. U.S.A. 89 (1992) 1827–1831.

[32] S.J. Clark, et al., High sensitivity mapping of methylated cytosines, Nucleic Acids Res. 22 (1994) 2990–2997.

[33] R. Paulin, et al., Urea improves efficiency of bisulfite-mediated sequencing of 5′-methylcytosine in genomic DNA, Nucleic Acids Res. 26 (1998) 5009–5010.

[34] C. Grunau, S.J. Clark, A. Rosenthal, Bisulfite genomic sequencing: systematic investigation of critical experimental parameters, Nucleic Acids Res. 29 (2001) E65–E75.

[35] J.G. Herman, et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands, Proc. Natl. Acad. Sci. U.S.A. 93 (1996) 9821–9826.

[36] Ehrich et al., submitted for publication.

[37] J.C. Knight, B.J. Keating, K.A. Rockett, D.P. Kwiatkowski, In vivo characterization of regulatory polymorphisms by allele-specific quantitation of RNA polymerase loading, Nat. Genet. 33 (2003) 469–475.

[38] J.C. Knight, B.J. Keating, D.P. Kwiatkowski, Allele-specific repression of lymphotoxin-alpha by activated B cell factor-1, Nat. Genet. 36 (2004) 394–399.

[39] D. Hacking, J.C. Knight, K. Rockett, H. Brown, J. Frampton, D.P. Kwiatkowski, J. Hull, I.A. Udalova, Increased in vivo transcription of an IL-8 haplotype associated with respiratory syncytial virus disease-susceptibility, Genes Immun. 4 (2004) 274–282.