
Screening 5/2002

DNA-Microarray Analyses

Marcel J. Scheideler
Jörg D. Hoheisel

DNA-microarray technology is dramatically changing studies in molecular biology, permitting analyses on a global scale. There is a lot of enthusiasm about its power, since it brings us a few (of still very many) steps closer to the understanding of the very complex and interrelated biological processes that define cellular life. However, despite – or rather because of – all the well-founded enthusiasm, one should also be aware of the technique's limitations in order to take proper advantage of the great chances offered by this technology.

One important accomplishment in the bio-technical and bio-medical area is the development and diversification of microarray technologies. Basically, microarray assays are nothing else but parallelised blotting techniques. In contrast to earlier methods, the status of blotted and free-floating partners is generally turned on its head in the analysis in order to achieve parallelism. Currently, widespread applications include, for example, studies on transcriptional variations and genotyping experiments for diagnostic purposes. Various processes for production and detection exist. The principle, however, remains the same. Pieces of nucleic acids are arranged in an ordered grid that allows an immediate allocation of a binding event to the relevant sequence. Upon incubation with a usually rather complex mixture of sample molecules, mostly again nucleic acids, a sorting process takes place by formation of a duplex between complementary molecules. The result is visualised as an intensity pattern produced by attached labels such as fluorescent dyes. A thorough quality assessment of data sets of this size and complexity is not a trivial task, especially if done by others. Therefore, false or only partially correct information could quickly accumulate. In such a scenario, the technical advance would not necessarily enhance science but – in the extreme case – could actually even slow it down. Careful consideration of microarray data, alertness to the problems attached, defined standards and the free exchange of the full data sets by means of a central depository are therefore necessary to make use of the technique's full potential.
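As a toy illustration of this grid principle (all coordinates, gene names and intensity values below are hypothetical, not taken from the article), the ordered layout is exactly what lets a bare intensity pattern be translated back into sequence identities:

```python
# The ordered grid: each spot position is tied to one capture sequence,
# so a spot's coordinates alone identify the sequence behind a binding event.
# (Grid layout and values are made-up examples.)
grid = {
    (0, 0): "geneA",
    (0, 1): "geneB",
    (1, 0): "geneC",
    (1, 1): "control",
}

# Hypothetical background-corrected fluorescence intensities per spot.
intensities = {
    (0, 0): 1520.0,
    (0, 1): 80.0,
    (1, 0): 640.0,
    (1, 1): 75.0,
}

def called_sequences(grid, intensities, threshold=100.0):
    """Map each above-threshold binding event back to its sequence."""
    return sorted(
        grid[pos] for pos, signal in intensities.items() if signal > threshold
    )

print(called_sequences(grid, intensities))  # → ['geneA', 'geneC']
```

The real analysis is of course far more involved (background correction, normalisation, replicates), but the immediate spot-to-sequence allocation is the core of the read-out.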

Microarray design

Microarrays come in various formats, differing mainly in the kind of sensor molecule. Each design has its own characteristics in sensitivity, accuracy and reproducibility, but also with respect to the kind of assays that are sensibly possible. However, an evaluation of microarray systems and their results requires not only a close look at design and production – still frequently the emphasis of quality-assuring measures – but at the complete workflow involved (Fig. 1). Short oligonucleotides have a high specificity. For quantification, however, redundant information is necessary, since individual oligomers differ strongly in their hybridisation behaviour. Also, their selection can be anything but trivial. These effects are less pronounced with oligonucleotides longer than about 50 nucleotides, but at a cost of specificity, rendering them useless for typing experiments. PCR products or other DNA fragments, finally, are least affected by stability and kinetic differences but most prone to cross-hybridisation events.
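How strongly equally long oligomers can differ in hybridisation behaviour can be illustrated with the classic Wallace rule of thumb (roughly 2 °C per A/T and 4 °C per G/C base, a crude estimate that is only meaningful for short oligonucleotides); the probe sequences below are made-up examples, not probes from any real array:

```python
def wallace_tm(probe: str) -> int:
    """Rough melting-temperature estimate for a short oligonucleotide
    (Wallace rule of thumb): 2 degC per A or T plus 4 degC per G or C.
    Crude, but it shows why same-length probes behave very differently."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc

# Two hypothetical 16-mers of identical length but different composition:
print(wallace_tm("ATATATATATATATAT"))  # → 32
print(wallace_tm("GCGCGCGCGCGCGCGC"))  # → 64
```

A factor-of-two spread in estimated melting temperature at identical probe length is exactly why redundant probes per gene are needed for quantification.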

Another difference in design is set by the actual application. Global analyses mostly aim at the accumulation of new (basic) knowledge, such as information on genes involved in the establishment of a disease. Alternatively, functionally defined sets of probes may be assembled on an array in order to cut cost and simplify analysis. However, pre-selection carries the risk that important and relevant information is missed. For routine screening of large sample numbers, focussed arrays with a well-defined probe set of little complexity are usually being used.

A third categorisation factor is whether or not the design is based on known sequence. To date, 95 completed genome sequences are available in public databases, of which 16 are archaeal, 69 bacterial and 10 eukaryotic genomes, and another 529 projects are underway (http://ergo.integratedgenomics.com/GOLD). Knowledge of genomic sequence allows the conscious selection of capture probes, if the annotation is sufficient. Alternatively, one can design arrays without sequence information. For transcriptional profiling, for example, this comprises arrays made of cDNAs or genomic shotgun clones.
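A minimal sketch of what sequence-based capture-probe selection can look like is shown below; the filtering criteria and thresholds are simplified assumptions for illustration, not the procedure of any particular design tool:

```python
def candidate_probes(sequence, length=25, gc_min=0.4, gc_max=0.6, max_run=4):
    """Very simplified probe selection: slide a window over a known
    sequence and keep windows with moderate GC content and no long
    single-base runs. (Criteria and thresholds are illustrative only.)"""
    sequence = sequence.upper()
    probes = []
    for i in range(len(sequence) - length + 1):
        window = sequence[i : i + length]
        gc = (window.count("G") + window.count("C")) / length
        runs_ok = all(base * (max_run + 1) not in window for base in "ACGT")
        if gc_min <= gc <= gc_max and runs_ok:
            probes.append((i, window))
    return probes

# Hypothetical input sequence; every 25-mer window here passes both filters.
print(len(candidate_probes("ATGC" * 7)))  # → 4
```

Real design pipelines apply many more criteria – above all uniqueness of each probe against the whole genome, melting temperature and secondary structure – which is precisely why probe selection "can be anything but trivial".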

Fig. 1: Listing of parameters that affect microarray-based analyses. The representation is far from being complete.

Benefits, Promises and Problems



Pre-selection by partial (EST) sequence produces a viable, intermediate format. As a matter of fact, a minimal tiling path of genomic fragments might well be the optimal format for such studies, especially on microbial organisms, since by definition no open reading frame is missing and intergenic regions could be analysed on the very same microarray. For genotyping experiments, the equivalent is represented by comprehensive oligomer arrays consisting, for instance, of all 16,384 possible heptamers.

Production matters

The above sub-heading can be read two ways, both to be covered briefly. Production consists of two aspects: the solid support as well as capture-probe synthesis and immobilisation. Nylon membranes were initially used. Mainly for reasons of miniaturisation and automation, glass became the favourite surface. With other detection modes – mass spectrometry, measurement of impedance or the electric potential – other surfaces will become more appropriate. While monolayers are superior in most aspects, the amount of attached probe molecule can be limiting, thus influencing kinetics and equilibrium, which in turn affect sensitivity and accuracy.

Another important aspect in production is the purity of the probe molecules. Protocols are available, for example, for avoiding purification of PCR products prior to spotting, without a significant loss in performance. This not only avoids the expense of purification but also the tremendous loss of material during this process. Oligonucleotide pre-fabrication allows for purification, either prior to or during attachment, the latter achieved by taking advantage of active groups present only in full-length molecules, a method that is also applicable to in situ synthesised oligomers. For some assays, impurity can be neglected anyway. Shorter oligonucleotide derivatives, for example, usually do not affect typing quality other than in terms of sensitivity. The dynamic range of a measurement, on the other hand, is strongly influenced.

Throughput and flexibility are other critical factors. Application of pre-fabricated probes limits flexibility merely because of the cost involved in producing new molecules. In situ synthesis can offer extra flexibility, if the control mechanism does. Photo-chemistry, for example, permits high-throughput production. In the case of photolithography, however, flexibility is limited by the expensive masks. Mask-free in situ synthesis permits a fully flexible design, coming on-line in central, large-scale facilities and even for on-site microarray production, the ultimate in flexibility (Fig. 2).

Fig. 2: Typical results of transcriptional profiling (top right) or SNP-typing experiments (bottom) obtained on an oligomer array that was created by on-site in situ synthesis.

Hybridisation samples

In many microarray assays, genomic DNA is considered a static medium. However, there are differences of a dynamic nature, introduced by re-arrangements, for example, or reversible modifications such as methylation. The latter, as many other sequence features, may also have an effect only in combination with structural variants. Topological constraints in DNA, for instance, can be essential regulative factors. At present, however, they cannot be analysed. Therefore, some data obtained from microarray analyses might only be descriptive or even not open to interpretation at all for the lack of an appropriate assay.

For analyses on dynamically varying parameters – e.g. the amount of RNA present in a tissue – sample handling has a large effect on the eventual analysis. Only an appropriate data annotation, which in turn requires a commonly accepted ontology, will really make data comparable, irrespective of the problems caused by the various procedures and platforms that are in use. The MGED standards (www.mged.org) formulated for transcriptional analyses, for example, only represent an initial step toward that goal. What is eventually needed for quantitative analyses are algorithms that permit the determination of the absolute amount of each individual molecule type, an achievement not beyond imagination.

Data analysis

Microarray experiments provide unprecedented quantities of data. Their interpretation should therefore be based on three main processes: statistical quality analysis, data integration and a presentation that makes the information accessible to human thinking. Because of the sheer wealth of information, only statistical procedures allow an assessment and filtering of microarray data. Data integration is essential, since it ties the connections that make subsequent interpretation feasible. This requires a modular data-warehouse concept combining the storage of data such as raw signal intensities, gene annotations and their functional categories as well as experimental annotations. For later queries, the annotations should be determined by a pre-defined and catalogued vocabulary. Sophisticated computational tools for data visualisation and reduction of data complexity are important to make the information accessible to a human mind, at least as long as there are no automatic expert systems. Clustering, such as correspondence analysis, and algorithms that project the data onto biologically relevant information like pathways, molecular functions, cellular components or promoter sites represent tools to that end.

The capability of freely combining individual experiments gets increasingly important the more data become available. Only then are new queries made possible, which had not been considered when producing the results in the first instance. Currently, most information remains unused, even if the results are published, since most data are irrelevant to the specific cellular phenomenon that is discussed in detail. Only a recycling of data will fully release the gigantic potential of microarray analyses.

Personalised medicine

Frequently, it is claimed that microarrays will soon allow a more personal disease diagnosis followed by a tailor-made medical treatment. Expression analyses on cancer tissues, for example, demonstrated that patients who seemingly suffered from the same form of cancer actually fall into distinct molecular sub-classes. Chip-based comparative genome hybridisation (CGH) produced patterns of amplification or loss of genomic regions, which were different between groups of patients and indicative of disease development, therefore being useful for prognostic purposes. While such approaches are very valuable and exciting, one should be aware that more comprehensive analyses – based on the sample sizes, collection procedures and annotation practices of epidemiologists – are needed in many cases before results of wider implication can be expected. The same is true for typing analyses. In addition, a set of single nucleotide polymorphisms (SNPs) needs to be defined for each assay. Millions of SNPs are known in the human genome. Which of these are highly relevant, however, is still being determined. And also the production and – once applied simultaneously to the arrays – detection of large sample numbers is not trivial, but nevertheless needed for taking advantage of the parallelism of microarrays. Advanced procedures such as an on-chip polymerase reaction are bound to make this profiling routinely possible. In an adaptation of SNP-typing, even cellular regulative processes can be studied. Un-methylated cytosine, for example, is transformed into uracil and then – upon PCR amplification – thymine when treated with bisulfite, while methylated cytosine remains unaffected. Thereby, a polymorphism is artificially introduced and made detectable by comparing treated and untreated samples.

However, will the above-mentioned studies immediately lead to personal disease treatments or even drugs? The answer is "no". For one, no company currently has the capacity to create all these drugs. Second, even if individualised drugs could be produced, most likely they would not be, because of the astronomical cost involved. Solving this dilemma, possibly by means of combinatorial techniques, is certainly a promising field for innovative developments. Without an immediate personal remedy, is having a personal molecular diagnosis therefore a waste of time? Again, the answer is a clear and emphatic "no". While tailor-made drugs for individual patients might still be far off or even never materialise, knowledge about molecular sub-groups will be useful right away. One only needs to think about the huge number of drugs that never made it to the patients due to side-effects. Frequently enough, however, these side-effects were restricted to few probands or patients, while the vast majority experienced good results. The ability to identify the group of patients who should not be treated with a particular drug could therefore increase the number of useful drugs immensely. Both the health of the patients and the commercial success of drug companies will gain from this, the latter being able eventually to make revenues from drugs they had invested in heavily without return.

Prospects

Despite some limitations, the power and breadth of DNA-microarray technology have made it a predominant factor in genomics, transcriptomics, pharmacogenomics and systems biology, while it is simultaneously getting ever more important in pre-clinical research and meanwhile even clinical studies. Additionally, the principle is also put to good use in other areas of investigation, including studies on proteins, metabolic compounds or small chemical entities.

References are available from the authors.

Dr Marcel J. Scheideler
Dr Jörg D. Hoheisel
Functional Genome Analysis
Deutsches Krebsforschungszentrum
Im Neuenheimer Feld 506
69120 Heidelberg
Germany
Phone +49 6221 42 4680
Fax +49 6221 42 [email protected]/funct_genome