
RECOMB 2005, Poster Session A, Bay 43

Integrated Design Flow for Universal DNA Tag Arrays

N. Hundewale¹, I. Mandoiu², L. Perelygina³, C. Prajescu², and A. Zelikovsky¹

¹CS Department, GSU; ²CSE Department, UCONN; ³Department of Biology, GSU

Abstract

• DNA microarrays provide a tool for answering a wide variety of questions about the dynamics of cells
– In which tissues and under which environmental conditions is each gene active?
– How does the activity level of a gene change with cell-cycle stage, environmental conditions, disease, etc.?
– Which genes appear to be regulated together?

• Universal tag array (UTA) technology
– Provides unprecedented flexibility in assay customization while maintaining a high degree of multiplexing and low unit cost

• In this poster we describe an integrated design flow for genomic assays based on UTAs
– We use the proposed flow to design UTA-based assays for measuring herpes B virus gene expression in cells derived from macaque and human hosts
– After a “B virus molecular signature” is defined, the assay can provide a sensitive tool for early diagnosis of B virus infection and for differentiating B virus from the closely related herpes simplex viruses

Universal DNA Tag Arrays

• “Programmable” Array Format [Brenner 97, Morris et al. 98]
– The array consists of application-independent oligonucleotides called tags
– Two-part reporter probes: application-specific primers ligated to antitags
– Detection is carried out by a sequence of reactions that separately involve the primer part and the antitag part of the reporter probes

• Tag/Antitag Hybridization Constraints
(H1) Antitags hybridize strongly to their complementary tags
(H2) No antitag hybridizes to a non-complementary tag
(H3) Antitags do not cross-hybridize to each other

Figure: Generic UTA-Based Assay (panels: mix reporter probes with genomic DNA; solution-phase hybridization; solid-phase hybridization to array tags t1, t2; single-base extension)

Figure: Design Flow (nodes: genomic IDs; Bioperl; sequences in FASTA format; GeneMark/ORF Finder; ORFs in FASTA format; Promide; probe pools; PerTags; tag/antitag sequences; assay parameters; PrimerDel+; reporter probes; hybridization experiment and analysis)

Tag Set Design

Conservative formalization of (H1)-(H3) based on nucleation complex theory and the 2-4 rule:

Find: a maximum-cardinality set of tags such that no tag/tag or tag/antitag pair shares a substring of weight c,
where weight(A) = weight(T) = 1, weight(C) = weight(G) = 2, and c is a given hybridization stringency constant

Cycle Packing Algorithm [Mandoiu&Trinca 05]
• T ← ∅
1. For each cycle C in the c-token factor graph G, in increasing order of cycle length, do
– If C has no c-tokens in common with T, then add the tag defined by C to T and remove C from G
2. Return T
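The cycle-packing algorithm itself works on the c-token factor graph, which is not reproduced here. As a simpler illustration of the constraint it enforces, the Python sketch below (our own, not the PerTags implementation) uses the 2-4 rule weights: the rule estimates duplex melting temperature as 2 °C per A/T plus 4 °C per G/C, so the weight above is half that estimate. It scans an explicit, hypothetical list of candidate tags in increasing length and keeps a tag only if none of its c-tokens, or those of its antitag, are already used. We assume "a substring of weight c" means weight at least c, and take a c-token to be a substring of weight at least c whose proper suffixes all weigh less than c; all function names are hypothetical.

```python
from typing import Iterable

# 2-4 rule weights: A/T count 1, C/G count 2 (half the 2-4 rule Tm estimate).
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def weight(s: str) -> int:
    return sum(WEIGHT[b] for b in s)


def antitag(tag: str) -> str:
    """Reverse complement of a tag, written 5'->3'."""
    return tag.translate(COMPLEMENT)[::-1]


def c_tokens(s: str, c: int) -> set[str]:
    """Substrings of weight >= c whose proper suffixes all weigh < c (assumed definition).

    Two strings share a substring of weight >= c exactly when they share
    one of these tokens, so comparing token sets is sufficient.
    """
    tokens = set()
    for end in range(1, len(s) + 1):
        w = 0
        # Extend leftwards from `end` until the weight first reaches c.
        for start in range(end - 1, -1, -1):
            w += WEIGHT[s[start]]
            if w >= c:
                tokens.add(s[start:end])
                break
    return tokens


def greedy_tag_set(candidates: Iterable[str], c: int) -> list[str]:
    """Greedy filter, NOT the factor-graph cycle packing.

    Scans candidates by increasing length and keeps a tag only if its
    c-tokens and its antitag's c-tokens are disjoint from all tokens used
    so far. Only pairs of distinct tags are checked.
    """
    selected: list[str] = []
    used: set[str] = set()
    for tag in sorted(candidates, key=len):
        tokens = c_tokens(tag, c) | c_tokens(antitag(tag), c)
        if tokens.isdisjoint(used):
            selected.append(tag)
            used |= tokens
    return selected
```

For example, with c = 3 the candidate "AAAT" is rejected after "AAAA" has been chosen, because both contain the c-token "AAA".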

Tag Assignment

Primer-to-tag hybridization constraints: if primer p hybridizes with tag t, then either p or t must be left unassigned, unless p is assigned to t
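The primer-to-tag constraint above can be checked mechanically. This is a minimal sketch, assuming hybridization is given as a set of (primer, tag) pairs and an assignment as a dict from primer to tag; both representations and the name `violations` are hypothetical.

```python
def violations(hybridizes: set[tuple[str, str]],
               assignment: dict[str, str]) -> list[tuple[str, str]]:
    """Return the hybridizing (primer, tag) pairs that break the constraint.

    A pair (p, t) is a violation when p and t are both assigned but
    p is assigned to some tag other than t.
    """
    assigned_tags = set(assignment.values())
    return [
        (p, t)
        for p, t in sorted(hybridizes)
        if p in assignment and t in assigned_tags and assignment[p] != t
    ]
```

With hybridizes = {("p1", "t2")}, assigning p1 to t1 and p2 to t2 is a violation, whereas assigning p1 to t2, or leaving t2 unused, is not.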

Maximum Assignable Primer Set Problem: given a primer set P and a tag set T, find a maximum-size assignable subset of P

• Greedy primer deletion heuristic [Ben-Dor 04]
– Repeatedly delete a primer of maximum weight until P becomes assignable, where the weight of p is the sum of the potentials of the tags to which it hybridizes, and the potential of a tag hybridizing with k primers is $2^{-k}$

• PrimerDel+ [Mandoiu et al. 05]
– Modified primer deletion heuristic that exploits the availability of several primer candidates with equivalent functionality
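The greedy primer deletion heuristic stated above can be sketched in Python as follows; this is an illustration, not the authors' PrimerDel+ code. We assume a primer set is assignable when every primer can receive its own tag without breaking the primer-to-tag constraint. Under that reading, a tag hybridized by no remaining primer is free for anyone, a tag hybridized by exactly one primer can only go to that primer, and a tag hybridized by two or more is unusable, so the assignability test reduces to bipartite matching (done here with simple augmenting paths). The function names and the dict-of-sets input format are hypothetical.

```python
from collections import defaultdict


def _hyb_counts(primers, hyb):
    """k[t] = number of primers in `primers` that hybridize with tag t."""
    k = defaultdict(int)
    for p in primers:
        for t in hyb.get(p, ()):
            k[t] += 1
    return k


def is_assignable(primers, tags, hyb):
    """True if every primer can get its own tag without violating the constraint."""
    k = _hyb_counts(primers, hyb)

    def usable(p):
        # Free tags (k == 0) or tags that only p hybridizes with.
        return [t for t in tags if k[t] == 0 or (k[t] == 1 and t in hyb.get(p, ()))]

    match = {}  # tag -> primer

    def augment(p, seen):
        # One augmenting-path search (Kuhn's algorithm) for primer p.
        for t in usable(p):
            if t not in seen:
                seen.add(t)
                if t not in match or augment(match[t], seen):
                    match[t] = p
                    return True
        return False

    return all(augment(p, set()) for p in primers)


def greedy_primer_deletion(primers, tags, hyb):
    """Delete a maximum-weight primer until the remaining set is assignable."""
    remaining = list(primers)
    while remaining and not is_assignable(remaining, tags, hyb):
        k = _hyb_counts(remaining, hyb)

        # weight(p) = sum of potentials 2**(-k[t]) over tags t that p hybridizes with.
        def weight(p):
            return sum(2.0 ** -k[t] for t in hyb.get(p, ()))

        remaining.remove(max(remaining, key=weight))
    return remaining
```

For instance, if p1 hybridizes with every tag in {t1, t2, t3}, no other primer can be placed, so p1 (weight 1.5) is deleted first and p2, p3 remain assignable.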

Experimental Results

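        |         |           | 2000 tags          | 1000 tags          | 500 tags
Tm (°C) | # pools | Pool size | % Util. | # arrays | % Util. | # arrays | % Util. | # arrays
70      | 1522    | 5         | 76.10   | 1        | 99.80   | 2        | 97.80   | 4
70      | 1522    | 1         | 76.10   | 1        | 98.90   | 2        | 96.73   | 4
67      | 1560    | 5         | 78.00   | 1        | 99.90   | 2        | 98.00   | 4
67      | 1560    | 1         | 78.00   | 1        | 98.70   | 2        | 96.53   | 4
60      | 1446    | 5         | 72.30   | 1        | 100.00  | 2        | 96.13   | 4
60      | 1446    | 1         | 72.30   | 1        | 97.20   | 2        | 94.06   | 4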

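        |         |           | 2000 tags          | 1000 tags          | 500 tags
Tm (°C) | # pools | Pool size | % Util. | # arrays | % Util. | # arrays | % Util. | # arrays
70      | 1522    | 5         | 70.30   | 2        | 91.10   | 2        | 92.26   | 4
70      | 1522    | 1         | 65.40   | 2        | 73.65   | 3        | 88.46   | 4
67      | 1560    | 5         | 67.20   | 2        | 76.00   | 3        | 91.86   | 4
67      | 1560    | 1         | 61.15   | 2        | 69.70   | 3        | 86.33   | 4
60      | 1446    | 5         | 63.55   | 2        | 70.95   | 3        | 88.26   | 4
60      | 1446    | 1         | 57.05   | 2        | 65.35   | 3        | 82.26   | 4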

Tag sets compared: GenFlex Tags and Periodic Tags

Conclusions

• We have described a suite of software tools for designing genomic assays based on UTAs
– Integrating the optimization steps of the design flow yields higher multiplexing rates and leads to reduced assay costs

• In future work we will make the entire software suite available as an online web server

References

• Affymetrix, Inc. GenFlex tag array probe set. Available at the NetAffx™ Analysis Center, http://www.affymetrix.com/analysis/
• M. Atlas, N. Hundewale, L. Perelygina, and A. Zelikovsky. Proc. International Conf. of the IEEE Engineering in Medicine and Biology (EMBC), pp. 172-175, 2004.
• A. Ben-Dor, T. Hartman, B. Schwikowski, R. Sharan, and Z. Yakhini. Towards optimally multiplexed applications of universal DNA tag systems. Proc. 7th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 48-56, 2003.
• S. Brenner. Methods for sorting polynucleotides using oligonucleotide tags. US Patent 5,604,097, 1997.
• I.I. Mandoiu and D. Trinca. Exact and approximation algorithms for DNA tag set design. Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 383-393, 2005.
• I.I. Mandoiu, C. Prajescu, and D. Trinca. Improved tag set design and multiplexing algorithms for universal arrays. Proc. 5th Int. Conf. on Computational Science (ICCS 2005), Part II, pp. 994-1002, 2005.
• M. Borodovsky. GeneMark. http://opal.biology.gatech.edu/GeneMark
• ORF Finder. http://www.ncbi.nih.gov/gorf/gorf.html
• S. Rahmann. Rapid large-scale oligonucleotide selection for microarrays. Proc. IEEE Computer Society Bioinformatics Conference (CSB), 2002.

ORF and Primer Selection

• Open reading frames (ORFs)
– ORFs are regions of genetic material that begin with a start codon, end with a stop codon, and might code for a protein
– ORFs can be extracted from a genome's sequence or ID using ORF Finder; a second approach is to use the GeneMark family of statistical gene prediction programs [Borodovsky]

• Primer selection
– Constraints:
  – Homogeneity: each primer must hybridize to its target site at the temperature selected for the experiment
  – Sensitivity: primers must not self-hybridize or form secondary structures
  – Specificity: each primer must hybridize to one particular ORF
– Selection tools:
  – Primer and microarray probe selection are well studied; we use the Promide tool [Rahmann 03] to select, for each ORF, pools of primer candidates that meet the above constraints
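The ORF definition and the three primer constraints above can be illustrated with a small Python sketch. It is a deliberate simplification, not ORF Finder, GeneMark, or Promide. ORFs are scanned on the forward strand only, from an ATG to the first in-frame stop codon. Homogeneity is approximated by the 2-4 rule melting-temperature estimate lying within a tolerance of a target; sensitivity by a crude test for self-complementary k-mers; and specificity by requiring the primer sequence to occur verbatim, on either strand, in exactly one ORF. All thresholds and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Forward-strand ORFs as 0-based half-open (start, end) intervals.

    Each ORF runs from an ATG to the first in-frame stop codon (inclusive);
    ATGs inside an open ORF are not reported as separate ORFs.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def wallace_tm(primer: str) -> int:
    """2-4 rule: 2 degrees C per A/T plus 4 degrees C per G/C (short oligos only)."""
    at = sum(b in "AT" for b in primer)
    gc = sum(b in "GC" for b in primer)
    return 2 * at + 4 * gc


def self_complementary(primer: str, k: int = 4) -> bool:
    """Crude hairpin/self-dimer proxy: some k-mer and its reverse complement both occur."""
    kmers = {primer[i:i + k] for i in range(len(primer) - k + 1)}
    return any(revcomp(x) in kmers for x in kmers)


def filter_primers(candidates, orf_seqs, target_tm, tol=2, k=4):
    """Keep candidates passing the hypothetical homogeneity/sensitivity/specificity tests."""
    kept = []
    for p in candidates:
        homogeneous = abs(wallace_tm(p) - target_tm) <= tol
        sensitive = not self_complementary(p, k)
        hits = sum(1 for s in orf_seqs if p in s or p in revcomp(s))
        if homogeneous and sensitive and hits == 1:  # specificity: exactly one ORF
            kept.append(p)
    return kept
```

In use, ORF sequences would be sliced from a genome string as `[genome[s:e] for s, e in find_orfs(genome)]` and passed to `filter_primers` together with candidate primers.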