Engineering a Scalable Placement Heuristic for DNA Probe Arrays
![Page 1: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/1.jpg)
Engineering a Scalable Placement Heuristic for DNA Probe Arrays
A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU)
![Page 2: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/2.jpg)
Outline
• DNA probe arrays and unwanted illumination
• Synchronous array design (2-D placement)
• Asynchronous array design (3-D placement)
• Experimental results
• Extensions
• Conclusions
![Page 4: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/4.jpg)
DNA Probe Arrays
• Used in a wide range of genomic analyses
  – Gene expression monitoring, SNP mapping, sequencing by hybridization, …
• Arrays with up to 1000x1000 probes in commercial use; 10^8 probes envisioned for next-generation arrays
  – Highly scalable algorithms required for array design
![Page 5: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/5.jpg)
Simplified DNA Array Flow
[Figure: flow diagram with the stages Probe Selection; Mask Design: Placement & Embedding; Mask Manufacturing; Array Manufacturing; Hybridization Experiment; Analysis of Hybridization Intensities. Input: gene sequences, position of SNPs, etc. The stages are grouped into a Soft/Computational Domain and a Hard/Biochemistry Domain.]
![Page 6: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/6.jpg)
Array Manufacturing Process
Very Large-Scale Immobilized Polymer Synthesis:
1. Treat substrate with chemically protected “linker” molecules, creating rectangular array
– Site size = approx. 10x10 microns
2. Selectively expose array sites to light
– Light deprotects exposed molecules, activating further synthesis
3. Flush chip surface with solution of protected A,C,G,T
– Binding occurs at previously deprotected sites
4. Repeat steps 2&3 until desired probes are synthesized
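The expose-and-flush loop in steps 2 to 4 can be mimicked in a few lines. This is only an illustrative sketch: the function name `synthesize` and the representation of a mask as the set of exposed site indices are assumptions, not something taken from the slides.

```python
def synthesize(deposition_sequence, masks, n_sites):
    """Grow one probe per site.

    masks[i] is the (hypothetical) set of site indices exposed to light
    before nucleotide deposition_sequence[i] is flushed over the chip.
    """
    probes = [""] * n_sites
    for nucleotide, exposed in zip(deposition_sequence, masks):
        for site in exposed:
            # Only deprotected (exposed) sites bind the new nucleotide.
            probes[site] += nucleotide
    return probes


# Three sites, deposition sequence ACG, one mask per step.
probes = synthesize("ACG", [{0, 1}, {0, 2}, {1, 2}], n_sites=3)
print(probes)
```

Each site ends up with exactly the nucleotides of the steps in which it was exposed, in deposition order.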
![Page 7: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/7.jpg)
Photo-Deprotection Step
Our concern: diffraction → unwanted illumination → yield decrease
![Page 8: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/8.jpg)
Probe Synthesis
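[Figure: nine placed probes (CG, AC, CG, AC, ACG, AG, G, AG, C) synthesized with nucleotide deposition sequence ACG, using mask M1 for A, M2 for C, and M3 for G; each mask exposes exactly the sites whose probe contains that nucleotide.]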
![Page 9: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/9.jpg)
Measuring Unwanted Illumination
[Figure: the same placed probes (CG, AC, CG, AC, ACG, AG, G, AG, C) and masks M1 (A), M2 (C), M3 (G) for nucleotide deposition sequence ACG, with the border between exposed and masked sites highlighted in each mask.]
Unwanted illumination is measured by border length.
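Border length can be computed directly from the masks. The sketch below is an illustration under stated assumptions: the nine probes from the slide are arranged row by row into a 3x3 array (the arrangement is a guess from the extraction order), and the helper names `embed` and `mask_border_lengths` are hypothetical.

```python
def embed(probe, deposition_sequence):
    """Leftmost embedding; '-' marks steps in which the site stays masked."""
    out, i = [], 0
    for s in deposition_sequence:
        if i < len(probe) and probe[i] == s:
            out.append(s)
            i += 1
        else:
            out.append("-")
    if i != len(probe):
        raise ValueError(f"{probe} is not a subsequence of {deposition_sequence}")
    return "".join(out)


def mask_border_lengths(grid, deposition_sequence):
    """For each mask, count adjacent site pairs where exactly one site is exposed."""
    emb = [[embed(p, deposition_sequence) for p in row] for row in grid]
    n_rows, n_cols = len(emb), len(emb[0])
    borders = []
    for step in range(len(deposition_sequence)):
        exposed = [[e[step] != "-" for e in row] for row in emb]
        count = 0
        for r in range(n_rows):
            for c in range(n_cols):
                if c + 1 < n_cols and exposed[r][c] != exposed[r][c + 1]:
                    count += 1
                if r + 1 < n_rows and exposed[r][c] != exposed[r + 1][c]:
                    count += 1
        borders.append(count)
    return borders


# Assumed 3x3 arrangement of the probes shown on the slide.
grid = [["CG", "AC", "CG"],
        ["AC", "ACG", "AG"],
        ["G", "AG", "C"]]
borders = mask_border_lengths(grid, "ACG")
print(borders, sum(borders))
```

The total over all masks equals the sum of Hamming distances between the embedded probes at adjacent sites, which is the quantity the placement algorithms try to minimize.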
![Page 10: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/10.jpg)
Synchronous vs. Asynchronous Synthesis
(a) Periodic deposition sequence
(b) Synchronous embedding of CTG
(c) Asynchronous leftmost embedding of CTG
(d) Another asynchronous embedding
[Figure: four panels (a) to (d) showing a periodic deposition sequence divided into 4-groups of nucleotides and three different embeddings of the probe CTG into it.]
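The difference between panels (b) and (c) can be made concrete with a small sketch. It assumes the period is ACGT (the order is an assumption), that a synchronous embedding deposits the i-th nucleotide of every probe during the i-th period, and it uses hypothetical helper names.

```python
def synchronous_embedding(probe, period="ACGT"):
    """The i-th nucleotide of the probe is deposited during the i-th period."""
    groups = []
    for nucleotide in probe:
        if nucleotide not in period:
            raise ValueError(f"{nucleotide} is not in the period {period}")
        groups.append("".join(s if s == nucleotide else "-" for s in period))
    return "".join(groups)


def leftmost_embedding(probe, deposition_sequence):
    """Asynchronous embedding that uses the earliest possible step for each nucleotide."""
    out, i = [], 0
    for s in deposition_sequence:
        if i < len(probe) and probe[i] == s:
            out.append(s)
            i += 1
        else:
            out.append("-")
    if i != len(probe):
        raise ValueError("probe does not fit into the deposition sequence")
    return "".join(out)


sequence = "ACGT" * 3
print(sequence)
print(synchronous_embedding("CTG"))         # one nucleotide per 4-group
print(leftmost_embedding("CTG", sequence))  # finishes as early as possible
```

Both strings spell CTG once the '-' placeholders are dropped; they differ only in which deposition steps are used, which is the freedom the asynchronous formulation exploits.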
![Page 12: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/12.jpg)
Problem Formulation (Synchronous Case)
Synchronous Array Design (2-D Placement) Problem:
• Minimize placement cost of Hamming graph H (vertices = probes, distance = Hamming)
• On 2-dimensional grid graph G2 (N x N array, edges between distance-1 neighbors)
[Figure: each probe (vertex of H) is mapped to a site of G2.]
![Page 13: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/13.jpg)
2-D Placement Lower Bound
• Sum, over all probes, of Hamming distances to the 4 closest neighbors, minus the weight of the 4N heaviest arcs
[Figure: probes of H mapped to sites of G2.]
![Page 14: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/14.jpg)
TSP+1-Threading Placement
Hubbell, 1990s
• Find TSP tour/path over given probes with respect to Hamming distance
• Thread TSP path in the grid row by row
Hannenhalli, Hubbell, Lipshutz, Pevzner '02
• Place the probes according to 1-Threading
• Further decreases total border by 20%
![Page 15: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/15.jpg)
Lexicographical Sorting +1-Threading
• Radix-sort the probes in lexicographical order
• Thread on the chip
[Figure: example probes sorted column by column (positions 1, 2, 3) and then threaded onto the array.]
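A minimal sketch of the sort-then-thread idea follows. Two simplifications are assumptions made for brevity: Python's built-in `sorted` stands in for the radix sort, and a plain row-by-row serpentine stands in for the 1-threading pattern, whose exact shape is not recoverable from the transcript.

```python
def thread_row_by_row(ordered_probes, n_rows, n_cols):
    """Lay an ordered probe list onto the grid, reversing every other row
    so that consecutive probes in the order stay adjacent on the chip."""
    if len(ordered_probes) != n_rows * n_cols:
        raise ValueError("number of probes must equal number of sites")
    grid = []
    for r in range(n_rows):
        row = ordered_probes[r * n_cols:(r + 1) * n_cols]
        grid.append(row[::-1] if r % 2 else row)
    return grid


def sort_and_thread(probes, n_rows, n_cols):
    # sorted() on equal-length strings yields lexicographical order.
    return thread_row_by_row(sorted(probes), n_rows, n_cols)


print(sort_and_thread(["GT", "AC", "CA", "AG"], 2, 2))
```

Lexicographically adjacent probes share long prefixes, so placing them next to each other tends to keep Hamming distances between neighbors small at very low cost in runtime.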
![Page 16: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/16.jpg)
Matching Based Probe Placement
• Select an independent (mutually nonadjacent) set of placed probes
• Re-embed using optimal perfect matching
• Total cost can only decrease or remain the same
• Runtime: roughly proportional to square of independent set size
[Figure: numbered probes of an independent set before and after re-assignment.]
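Because the selected sites are mutually nonadjacent, the cost of putting a probe on one of them depends only on neighbors that stay fixed, so the re-assignment is an assignment problem. The sketch below illustrates this on a tiny instance; brute force over permutations stands in for an optimal matching solver (an assumption made to keep the example dependency-free), and all function names are hypothetical.

```python
from itertools import permutations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def neighbor_sites(grid, r, c):
    cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [(i, j) for i, j in cells
            if 0 <= i < len(grid) and 0 <= j < len(grid[0])]


def placement_cost(grid):
    """Sum of Hamming distances over all horizontally/vertically adjacent pairs."""
    total = 0
    for r, row in enumerate(grid):
        for c, probe in enumerate(row):
            if c + 1 < len(row):
                total += hamming(probe, row[c + 1])
            if r + 1 < len(grid):
                total += hamming(probe, grid[r + 1][c])
    return total


def rematch(grid, sites):
    """Optimally re-assign the probes sitting on an independent set of sites."""
    chosen = set(sites)
    for r, c in sites:
        if chosen & set(neighbor_sites(grid, r, c)):
            raise ValueError("sites must be mutually nonadjacent")
    probes = [grid[r][c] for r, c in sites]

    def cost(probe, site):
        return sum(hamming(probe, grid[i][j]) for i, j in neighbor_sites(grid, *site))

    # The identity permutation is among the candidates, so cost cannot increase.
    best = min(permutations(range(len(sites))),
               key=lambda perm: sum(cost(probes[p], s) for p, s in zip(perm, sites)))
    for p, (r, c) in zip(best, sites):
        grid[r][c] = probes[p]
    return grid


grid = [["AA", "CC", "AA", "CC", "CC"]]
before = placement_cost(grid)
rematch(grid, [(0, 0), (0, 2), (0, 4)])
print(before, placement_cost(grid), grid)
```

Brute force is only workable for a handful of sites; a polynomial-time matching algorithm is what makes the method practical for window-sized independent sets.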
![Page 17: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/17.jpg)
Sliding Window Matching
There is a trade-off between solution quality and size/overlap of windows
Iterate SlidingWindowMatching over the chip until improvement drops below 0.1%
![Page 18: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/18.jpg)
Effect of Window Size on Solution Quality
Increased window size/overlap decreases number of conflicts, but increases runtime
![Page 19: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/19.jpg)
Epitaxial Placement Algorithm
• Simulates crystal growth
• Start with arbitrary probe placed at center
• Maintain a best probe candidate (i.e., a probe with min number of conflicts to the already placed neighbors) for each border site
• Iteratively fill the border site with minimum increase in border length
  – give priority to sites with more neighbors filled
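The growth process can be sketched as follows. This is a deliberately naive version under stated assumptions: candidates are recomputed at every step instead of being maintained per border site, and the priority rule (average conflicts per filled neighbor, ties broken in favor of more filled neighbors) is one possible reading of the slide, not the authors' exact weighting.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def epitaxial_placement(probes, n):
    """Grow a placement outward from the center of an n x n array."""
    if len(probes) != n * n:
        raise ValueError("need exactly n*n probes")
    grid = [[None] * n for _ in range(n)]
    unplaced = list(probes)
    grid[n // 2][n // 2] = unplaced.pop(0)  # arbitrary seed probe at the center

    def placed_neighbors(r, c):
        cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        return [grid[i][j] for i, j in cells
                if 0 <= i < n and 0 <= j < n and grid[i][j] is not None]

    while unplaced:
        best = None
        for r in range(n):
            for c in range(n):
                if grid[r][c] is not None:
                    continue
                nbrs = placed_neighbors(r, c)
                if not nbrs:
                    continue  # not a border site yet
                # Best candidate for this site: fewest conflicts with placed neighbors.
                cost, idx = min((sum(hamming(p, q) for q in nbrs), i)
                                for i, p in enumerate(unplaced))
                key = (cost / len(nbrs), -len(nbrs))  # assumed priority rule
                if best is None or key < best[0]:
                    best = (key, r, c, idx)
        _, r, c, idx = best
        grid[r][c] = unplaced.pop(idx)
    return grid


probes = ["ACGT", "ACGA", "TTTT", "ACCT", "TTTA", "GGGG", "GGGA", "ACGG", "TTAT"]
for row in epitaxial_placement(probes, 3):
    print(row)
```

Scanning every unplaced probe for every border site is what makes the plain algorithm expensive on large chips, which motivates the tiled and row-restricted variants.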
![Page 20: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/20.jpg)
Tile- and Row-Epitaxial
• Tile-epitaxial
  – Divide array into 100x100 tiles
  – Run Epitaxial within each tile
  – Take into account border of already placed tiles
• Row-epitaxial
  – Place probes by a fast method, e.g., sort + 1-thread
  – Re-place probes row by row, sequentially filling sites within a row
  – Assign to each site a probe with min number of conflicts among the unplaced probes from following K rows
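A compact sketch of the row-epitaxial pass is given below. It assumes that the candidate pool for a site consists of the not-yet-finalized probes in the rest of the current row plus the next K rows, and that conflicts are counted against the already finalized upper and left neighbors; both choices, and the function name, are assumptions.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def row_epitaxial(grid, k):
    """Re-place probes site by site in row-major order with a K-row lookahead."""
    grid = [row[:] for row in grid]  # work on a copy of the initial placement
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            placed = []
            if r > 0:
                placed.append(grid[r - 1][c])
            if c > 0:
                placed.append(grid[r][c - 1])
            best_cost, best_site = None, None
            for rr in range(r, min(n_rows, r + k + 1)):
                for cc in range(c if rr == r else 0, n_cols):
                    cost = sum(hamming(grid[rr][cc], q) for q in placed)
                    if best_cost is None or cost < best_cost:
                        best_cost, best_site = cost, (rr, cc)
            rr, cc = best_site
            # Swap the chosen probe into the current site.
            grid[r][c], grid[rr][cc] = grid[rr][cc], grid[r][c]
    return grid


print(row_epitaxial([["AA", "CC", "AA"]], 0))
print(row_epitaxial([["AA", "CC"], ["AA", "CC"]], 1))
```

Limiting the search to K rows bounds the work per site, so K trades solution quality against runtime.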
![Page 21: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/21.jpg)
2-D Placement Algorithm Comparison: Border Conflict
[Figure: border conflict (axis from 0 to 14,000,000) versus chip size (100, 200, 300, 500) for LB, Row-EPTX, EPTX, Tile-EPTX, TSP+1Thr, and SWM 6x6.]
![Page 22: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/22.jpg)
2-D Placement Algorithm Comparison: Runtime
[Figure: runtime (logarithmic axis from 1 to 1,000,000) versus chip size (100, 200, 300, 500, 1000) for TSP+1Thr, Row-EPTX, EPTX, Tile-EPTX, and SWM.]
![Page 24: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/24.jpg)
Problem Formulation (Asynchronous Case)
• Asynchronous synthesis:
  – Periodic nucleotide deposition sequence, e.g., (ACTG)^p
  – Every probe grows asynchronously
  – Border length = Hamming distance between embedded probes
• Asynchronous Array Design (3-D Placement) Problem:
  – Minimize placement cost of embedded-probe Hamming graph H (vertices = probes, distance = Hamming between embedded probes)
  – On 2-dimensional grid graph G2 (N x N array, edges between neighbors)
[Figure: each probe (vertex of H) is mapped to a site of G2.]
![Page 25: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/25.jpg)
Lower Bound
• Sum of distances to 4 closest neighbors minus weight of 4N heaviest arcs
  – Distance between two probes of length p = 2(p - |Longest Common Subsequence|)
• Non-tight bound: example with LB = 8 and best placement cost = 10
[Figure: probes AC, CT, TG, GA on a 2x2 array with nucleotide deposition sequence S = ACTGA (masks 1M to 5M), showing the lower-bound arc weights and the optimum placement.]
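The probe-to-probe distance used in this bound can be checked numerically. Two embeddings of length-p probes can share at most |LCS| deposition steps, so they differ in at least 2(p - |LCS|) positions. The sketch below (hypothetical function names) computes that quantity with the textbook LCS dynamic program and applies it to the four probes of the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def embedding_distance_bound(a, b):
    """Lower bound on the Hamming distance between any embeddings of a and b."""
    if len(a) != len(b):
        raise ValueError("probes must have equal length")
    return 2 * (len(a) - lcs_length(a, b))


cycle = ["AC", "CT", "TG", "GA"]
around = sum(embedding_distance_bound(cycle[i], cycle[(i + 1) % 4]) for i in range(4))
print(around)
```

Summed around the cycle AC, CT, TG, GA the bound gives 8, which agrees with the LB value quoted for the example; the gap to the best placement cost of 10 arises because the pairwise bounds cannot all be met by a single embedding per probe.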
![Page 26: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/26.jpg)
Optimal Probe Alignment
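[Figure: DP grid with the probe ACT on one axis and the deposition sequence ACGTACGT on the other; feasible embeddings are paths from Source to Sink.]
• Find best alignment of probe with respect to embedded neighbors
• Dynamic programming:
  – Source-sink paths correspond to feasible embeddings
  – Runtime O[(probe length) x (deposition sequence length)]
• Can be extended to simultaneous alignment of two adjacent probes (2x1), with runtime increased by a factor of O(probe length)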
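One way to realize this dynamic program is sketched below. Embeddings are written as strings over the deposition sequence with '-' for skipped steps; that representation, the function name `optimal_embedding`, and the tie-breaking in the traceback are assumptions made for illustration.

```python
def optimal_embedding(probe, seq, neighbors):
    """Embed `probe` into `seq` with minimum total Hamming distance to the
    fixed embeddings in `neighbors`. Returns (cost, embedding)."""
    m, k, n = len(seq), len(probe), len(neighbors)
    # dep[j] = number of neighbors that deposit a nucleotide at step j.
    dep = [sum(e[j] != "-" for e in neighbors) for j in range(m)]
    inf = float("inf")
    # dp[i][j] = min cost of embedding probe[:i] within seq[:j].
    dp = [[inf] * (m + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for j in range(1, m + 1):
        for i in range(k + 1):
            best = dp[i][j - 1] + dep[j - 1]  # skip step j: conflicts with depositors
            if i > 0 and probe[i - 1] == seq[j - 1]:
                # use step j: conflicts with neighbors that skip it
                best = min(best, dp[i - 1][j - 1] + n - dep[j - 1])
            dp[i][j] = best
    if dp[k][m] == inf:
        raise ValueError("probe cannot be embedded into the deposition sequence")
    emb, i = ["-"] * m, k
    for j in range(m, 0, -1):  # trace one optimal source-sink path backwards
        if (i > 0 and probe[i - 1] == seq[j - 1]
                and dp[i][j] == dp[i - 1][j - 1] + n - dep[j - 1]):
            emb[j - 1] = seq[j - 1]
            i -= 1
    return dp[k][m], "".join(emb)


print(optimal_embedding("ACT", "ACGTACGT", ["----AC-T"]))
print(optimal_embedding("ACT", "ACGTACGT", ["AC-T----", "AC-T----", "----AC-T"]))
```

The table has (probe length + 1) x (sequence length + 1) cells with constant work per cell, matching the stated runtime once the per-step neighbor counts are precomputed.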
![Page 27: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/27.jpg)
3-D Placement Flows
• Simultaneous placement and alignment
  – asynchronous epitaxial (slow and low quality)
• Synchronous placement followed by in-place probe alignment (analogous to standard VLSI flow partitioning)
  – using the previous DP to do in-place probe alignment
• Synchronous placement followed by probe alignment with reshuffle (analogous to feedback loops in VLSI flows)
  – asynchronous sliding window matching
![Page 28: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/28.jpg)
Algorithms for In-Place Probe Alignment
• Asynchronous re-embedding after 2-dim placement
  – Greedy Algorithm
    • While there exist probes to re-embed with gain
      – Optimally re-embed the probe with the largest gain
  – Batched greedy: speed-up by avoiding recalculations
  – Chessboard Algorithm
    • While there is gain
      – Re-embed probes in green sites
      – Re-embed probes in red sites
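The chessboard loop can be sketched as follows. Sites of one color are never adjacent, so each can be re-embedded optimally while its neighbors stay fixed. The sketch assumes embeddings are strings with '-' for skipped steps, treats "green" and "red" as the two parities of row + column, and includes its own small alignment DP; all names are hypothetical.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def realign(probe, seq, neighbors):
    """Min-cost embedding of probe into seq against fixed neighbor embeddings."""
    m, k, n = len(seq), len(probe), len(neighbors)
    dep = [sum(e[j] != "-" for e in neighbors) for j in range(m)]
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for j in range(1, m + 1):
        for i in range(k + 1):
            best = dp[i][j - 1] + dep[j - 1]
            if i > 0 and probe[i - 1] == seq[j - 1]:
                best = min(best, dp[i - 1][j - 1] + n - dep[j - 1])
            dp[i][j] = best
    emb, i = ["-"] * m, k
    for j in range(m, 0, -1):
        if (i > 0 and probe[i - 1] == seq[j - 1]
                and dp[i][j] == dp[i - 1][j - 1] + n - dep[j - 1]):
            emb[j - 1] = seq[j - 1]
            i -= 1
    return dp[k][m], "".join(emb)


def total_border(emb):
    total = 0
    for r, row in enumerate(emb):
        for c, e in enumerate(row):
            if c + 1 < len(row):
                total += hamming(e, row[c + 1])
            if r + 1 < len(emb):
                total += hamming(e, emb[r + 1][c])
    return total


def chessboard(emb, seq):
    """Alternate optimal re-embedding of the two site colors until no gain."""
    n_rows, n_cols = len(emb), len(emb[0])
    improved = True
    while improved:
        improved = False
        for color in (0, 1):  # 0 = "green" sites, 1 = "red" sites (assumed naming)
            for r in range(n_rows):
                for c in range(n_cols):
                    if (r + c) % 2 != color:
                        continue
                    cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    nbrs = [emb[i][j] for i, j in cells
                            if 0 <= i < n_rows and 0 <= j < n_cols]
                    old = sum(hamming(emb[r][c], e) for e in nbrs)
                    new, candidate = realign(emb[r][c].replace("-", ""), seq, nbrs)
                    if new < old:  # strict gain guarantees termination
                        emb[r][c] = candidate
                        improved = True
    return emb


layout = [["AC------", "----AC--"], ["--GT----", "------GT"]]
before = total_border(layout)
chessboard(layout, "ACGTACGT")
print(before, total_border(layout))
```

Only embeddings change here; every probe stays on its site, which is why this step can follow any synchronous placement.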
![Page 29: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/29.jpg)
Comparison of In-Place Probe Alignments
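| Chip size | LB (%LB) | TSP+1Thr (%LB) | Greedy (%LB) | Greedy (CPU) | Chessboard (%LB) | Chessboard (CPU) | 2x1 Chessboard (%LB) | 2x1 Chessboard (CPU) |
|---|---|---|---|---|---|---|---|---|
| 100 | 100 | 152.0 | 125.7 | 40 | 120.5 | 54 | 119.4 | 480 |
| 200 | 100 | 150.2 | 126.3 | 154 | 120.9 | 221 | 119.7 | 1915 |
| 300 | 100 | 149.1 | 126.7 | 357 | 121.5 | 522 | 121.6 | 4349 |
| 500 | 100 | 147.9 | 127.1 | 943 | 121.4 | 1423 | 120.2 | 15990 |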
• Post-placement LB = sum of distances to adjacent probes
  – Distance between two probes of length p = 2(p - |LCS|)
  – Useful for assessing quality of algorithms that change probe embeddings but do not change probe placement
![Page 31: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/31.jpg)
3-D vs. 2-D Placement Results
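| Chip size | TSP+1Thr (Cost) | TSP+1Thr+Chessboard (Cost) | TSP+1Thr+Chessboard (CPU) | Epitaxial+Chessboard (Cost) | Epitaxial+Chessboard (CPU) | SyncSWM+Chessboard (Cost) | SyncSWM+Chessboard (CPU) | AsyncSWM (Cost) | AsyncSWM (CPU) |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 554849 | 439829 | 113 | 419069 | 274 | 433274 | 1 | 417890 | 875 |
| 200 | 2140903 | 1723352 | 1901 | 1624988 | 4441 | 1693658 | 46 | 1636658 | 3676 |
| 300 | 4667882 | 3801765 | 12028 | --- | --- | 3746722 | 112 | 3615282 | 8406 |
| 500 | 12702474 | 10426237 | 109648 | --- | --- | 10049442 | 302 | 9686918 | 22351 |
| 1000 | --- | --- | --- | --- | --- | 38898792 | 1307 | 38005039 | 54501 |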
![Page 32: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/32.jpg)
3-D Placement Algorithm Comparison: Border Conflict
[Figure: border conflict (axis from 0 to 40,000,000) versus chip size (100, 200, 300, 500, 1000) for TSP+1Thr, TSP+1Thr+Chess, RowEPTX+Chess, EPTX+Chess, TileEPTX+Chess, SyncSWM+Chess, and AsyncSWM+Chess.]
![Page 33: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/33.jpg)
3-D Placement Algorithm Comparison: Runtime
[Figure: runtime (logarithmic axis from 1 to 1,000,000) versus chip size (100, 200, 300, 500, 1000) for TSP+1Thr+Chess, RowEPTX+Chess, EPTX+Chess, TileEPTX+Chess, SyncSWM+Chess, and AsyncSWM+Chess.]
![Page 35: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/35.jpg)
Practical Extensions
• Distance-dependent border conflict weights
  – Take into account conflicts between 2- and 3-hop neighbors rather than only immediate neighbors
• Position-dependent border conflict weights
  – In the alignment DP for two sequences, take into account the importance of conflicts in the middle of probes: the alignment cost has weights on conflicts that depend on conflict position
• Polymorphic probes
  – Chip contains SNPs, e.g., pairs of probes that differ in a single position; they should be placed together, and the alignment DP should align them simultaneously
![Page 36: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/36.jpg)
Alignment DP for 2-SNPs
Optimal Embedding of A{C,T}T
![Page 37: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/37.jpg)
Simplified DNA Array Flow
[Figure: the simplified flow diagram shown earlier, with the stages Probe Selection; Mask Design: Placement & Embedding; Mask Manufacturing; Array Manufacturing; Hybridization Experiment; Analysis of Hybridization Intensities.]
![Page 43: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/43.jpg)
Enhanced DNA Array Design Flow
[Figure: enhanced flow diagram with the elements Probe Selection; Mask Design: Placement & Embedding; Deposition Mask Design; Test/Control Structure Design; Conflict Map; Probe Pools; Design Rules & Parameters.]
![Page 44: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/44.jpg)
Summary
• Contributions:
  – Epitaxial placement reduces total border by an extra 10% over the previously best known method
  – Asynchronous placement problem formulation
  – Post-placement improvement by an extra 15.5-21.8%
  – Lower bounds
  – Scalable placements (1000x1000 in 20 min)
• Ongoing work
  – Comparison on industrial benchmarks
  – Experiments with algorithms for extended formulations (SNPs, distance-dependent weights, etc.)
• Future directions
  – Design flow enhancements
  – Nucleotide deposition sequence design
  – Partitioning and integration for manufacturing cost reduction
![Page 45: Engineering a Scalable Placement Heuristic for DNA Probe Arrays](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ac6550346895da2db39/html5/thumbnails/45.jpg)
Thank you!