Packing and Placement
![Page 1: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/1.jpg)
Packing and Placement
Dr. Philip Brisk
Department of Computer Science and Engineering
University of California, Riverside
CS 223
![Page 2: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/2.jpg)
Packing Example (Homogeneous)
![Page 3: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/3.jpg)
Packing Example (Heterogeneous)
Netlist
Architecture
Packing Solution
![Page 4: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/4.jpg)
Architecture Description and Packing for Logic Blocks with Hierarchy, Modes, and Complex Interconnect
Jason Luu, Jason Anderson, and Jonathan Rose
International Symposium on FPGAs, 2011
![Page 5: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/5.jpg)
AA-Pack 6.0 Algorithm
• Pick the unpacked mapped LUT with the largest number of attached nets
• p: netlist block; B: partially filled logic cluster
– nets(p, B): number of nets shared between p and B
– ext(p, B): number of pins on p’s nets residing on netlist blocks NOT packed into B
– packed(p): number of pins on p’s nets residing on netlist blocks packed into logic clusters OTHER than B
– num_pins(p): number of used pins on p (normalizes affinities across netlist blocks with varying numbers of used pins)
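The formula that combines these terms into an attraction score did not survive in the transcript, so the sketch below only computes the four quantities as they are defined above, plus the seed selection rule. The toy netlist, the function names, and the one-pin-per-net simplification are all assumptions made for illustration; p itself is not counted in ext or packed.

```python
from typing import Dict, Set

# Hypothetical toy netlist: net name -> netlist blocks with a pin on that net.
# Assumption: each block has at most one pin per net.
NETS: Dict[str, Set[str]] = {
    "n1": {"a", "b", "c"},
    "n2": {"a", "d"},
    "n3": {"b", "d", "e"},
}


def pick_seed(unpacked: Set[str], nets: Dict[str, Set[str]] = NETS) -> str:
    """Return the unpacked block attached to the most nets (ties: alphabetical)."""
    return max(sorted(unpacked),
               key=lambda blk: sum(blk in pins for pins in nets.values()))


def affinity_terms(p: str, cluster_b: Set[str], other_clusters: Set[str],
                   nets: Dict[str, Set[str]] = NETS) -> Dict[str, int]:
    """Compute nets(p, B), ext(p, B), packed(p) and num_pins(p) for block p.

    cluster_b holds the blocks already packed into B; other_clusters holds
    the blocks packed into any cluster other than B.
    """
    p_nets = [pins for pins in nets.values() if p in pins]
    # Pins on p's nets that belong to blocks other than p itself.
    other_pins = [blk for pins in p_nets for blk in pins if blk != p]
    return {
        "nets": sum(1 for pins in p_nets if pins & cluster_b),
        "ext": sum(1 for blk in other_pins if blk not in cluster_b),
        "packed": sum(1 for blk in other_pins if blk in other_clusters),
        "num_pins": len(p_nets),
    }
```

For example, with B = {b} and block d already packed elsewhere, `affinity_terms("a", {"b"}, {"d"})` reports one shared net, two external pins, one pin in another cluster, and two used pins.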
![Page 6: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/6.jpg)
Legality Challenges
• Handle complex logic clusters with hierarchy
– Fracturable LUTs
– Carry chains
– Hard logic circuits
• Routability
– Sparse crossbar intra-cluster routing
![Page 7: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/7.jpg)
Hierarchical Cluster Example
• Strategy: Pack each netlist block into the smallest primitive that can accommodate it
• Algorithm: Search the tree bottom-up, from right to left
![Page 8: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/8.jpg)
Ensuring Routability
• Basic Check: Does packing the netlist block into the cluster exceed I/O pin availability?
• Routability: Build a routing graph and run a routing algorithm to determine legality
– Routing algorithm details will be discussed next week
![Page 9: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/9.jpg)
Limitations
• Focus is area optimization, not timing
• Architectural limitations
– (Fracturable) LUT-based logic blocks
– Fracturable arithmetic blocks (e.g., multipliers)
– Memories with reconfigurable aspect ratios
• (not discussed)
• Mapping assumptions
– Different block types cannot accommodate the same netlist block
• In reality, could pack a flip-flop into either a LUT- or multiplier-based block
![Page 10: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/10.jpg)
Toward Interconnect-Adaptive Packing for FPGAs
Jason Luu, Jason Anderson, and Jonathan Rose
International Symposium on FPGAs, 2014
![Page 11: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/11.jpg)
AA-Pack 7.0
• Calling the router repeatedly during packing is computationally expensive
– Speculative Packing: avoid unnecessary calls to the router
– Interconnect-Aware Pin Counting: quickly find unroutable instances based on pin demand
• Pre-packing: Support inflexible routing structures
– E.g., carry chains
• Other bells and whistles
– Accurate timing model
– Best-fit placement
– Better support for high-fanout nets
![Page 12: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/12.jpg)
Speculative Packing
• FPGA 2011 Implementation
– Call the router to check legality each time a new block is packed into the cluster
• FPGA 2014 Implementation
– Fill the logic block to capacity, then call the router
• If a legal route is found, we’re done
• Otherwise, re-pack the block using the FPGA 2011 approach
– Works because the common case is that a legal route is found
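A minimal sketch of the two control flows, not the AA-Pack code itself. `is_routable` is a hypothetical stand-in for the detailed intra-cluster router, and measuring capacity as a block count is a simplification.

```python
from typing import Callable, List, Sequence


def pack_cluster_speculative(
    candidates: Sequence[str],
    capacity: int,
    is_routable: Callable[[List[str]], bool],
) -> List[str]:
    """Fill one cluster, calling the (expensive) router as rarely as possible.

    candidates: netlist blocks in the order the packer would try them.
    capacity:   maximum number of blocks the cluster can hold (assumed).
    """
    # Speculative pass: fill to capacity, then check legality once.
    filled = list(candidates[:capacity])
    if is_routable(filled):
        return filled

    # Fallback (the 2011 behaviour): check legality after every addition
    # and skip any block that makes the cluster unroutable.
    cluster: List[str] = []
    for block in candidates:
        if len(cluster) == capacity:
            break
        if is_routable(cluster + [block]):
            cluster.append(block)
    return cluster
```

When the speculative pass succeeds, the router runs once per cluster instead of once per packed block, which is where the savings come from.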
![Page 13: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/13.jpg)
Interconnect-Aware Pin Counting
• Partition I/O pins into classes based on interconnect structure
• When each netlist block is packed, check the demand for each pin class
• Reject the block if demand exceeds supply for any pin class
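The check can be sketched as a per-class comparison of demand against supply. The pin-class names and the dictionary layout are assumptions for illustration; the sketch also ignores nets that become fully internal to the cluster, which would lower demand.

```python
from collections import Counter
from typing import Mapping


def pin_demand_ok(
    supply: Mapping[str, int],
    demand_in_cluster: Mapping[str, int],
    block_demand: Mapping[str, int],
) -> bool:
    """Return False if adding the block over-subscribes any pin class.

    supply:            pins available per class (hypothetical class names).
    demand_in_cluster: pins per class already needed by packed blocks.
    block_demand:      pins per class the candidate block would add.
    """
    total = Counter(demand_in_cluster) + Counter(block_demand)
    return all(total[cls] <= supply.get(cls, 0) for cls in total)
```

For instance, with `supply = {"lut_in": 6, "clk": 1}` and four LUT inputs already in use, a block needing three more LUT inputs is rejected, while one needing two is accepted.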
![Page 14: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/14.jpg)
Example
![Page 15: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/15.jpg)
Properties and Limitations
• An optimistic filter
– Cases that fail are not routable
– Cases that pass may or may not be routable
• Sparse interconnect is approximated as fully connected
• Does not account for situations where a net routes through a sub-cluster without connecting to any primitives in that sub-cluster
• Internal feedback/feedforward connections within a logic cluster are discovered before packing and accounted for during pin counting
• Gives a pass/fail answer
– Does not help to guide future candidate selection
![Page 16: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/16.jpg)
Pre-packing
• Inflexible routing structures
– Incorrect grouping or placement of netlist blocks may cause routing to fail
– The architect enumerates “pack patterns” to describe each structure
– Before packing, identify netlist sub-graphs that match “pack patterns”
• Group them together and match them to logic cluster primitives that match the “pack pattern”
Pack Patterns
• Multiply-add
• Registered multiply
• Registered add
• Registered multiply-add
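As a rough illustration of the matching step, the sketch below finds chains of netlist blocks whose types follow a pattern such as multiply, add, register. The netlist encoding and the restriction to simple chains are assumptions; a real pre-packer would also have to resolve overlapping matches, for example by trying the largest pattern first.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical netlist encoding: block name -> (block type, fanout block names).
Netlist = Dict[str, Tuple[str, List[str]]]


def match_chain(netlist: Netlist, pattern: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every chain of connected blocks whose types equal `pattern`."""
    matches: List[Tuple[str, ...]] = []

    def extend(chain: List[str]) -> None:
        if len(chain) == len(pattern):
            matches.append(tuple(chain))
            return
        _, fanout = netlist[chain[-1]]
        for nxt in fanout:
            # Follow only fanout blocks of the type the pattern expects next.
            if netlist[nxt][0] == pattern[len(chain)]:
                extend(chain + [nxt])

    for name, (kind, _) in netlist.items():
        if kind == pattern[0]:
            extend([name])
    return matches
```

Each returned tuple is a group of blocks that would be kept together and assigned to cluster primitives wired in the same pattern.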
![Page 17: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/17.jpg)
Experiments
![Page 18: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/18.jpg)
Results
![Page 19: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/19.jpg)
Timing-Driven Placement for FPGAs
Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose
International Symposium on FPGAs, 2000
![Page 20: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/20.jpg)
Placement
![Page 21: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/21.jpg)
Simulated Annealing
![Page 22: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/22.jpg)
VPlace (Pre-dates this paper)
• Strategy: Minimize interconnect overhead
![Page 23: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/23.jpg)
Timing Analysis
• For a placed and routed net
• How much delay can we add to a net before it becomes critical?
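The quantity the second bullet asks about is the slack of a connection: required time at the sink minus arrival time at the source minus the connection delay. The slide's own formulas are not in the transcript, so the sketch below is a generic static timing pass over a hypothetical delay-annotated DAG, with every sink required at the critical path delay.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Set, Tuple


def connection_slacks(
    edges: Dict[Tuple[str, str], float],
) -> Tuple[Dict[Tuple[str, str], float], float]:
    """edges maps (source, sink) -> delay. Returns (slack per edge, d_max)."""
    preds: Dict[str, Set[str]] = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: latest arrival time at each node.
    arrival = {n: 0.0 for n in order}
    for n in order:
        for u in preds[n]:
            arrival[n] = max(arrival[n], arrival[u] + edges[(u, n)])
    d_max = max(arrival.values())

    # Backward pass: earliest required time at each node.
    required = {n: d_max for n in order}
    for n in reversed(order):
        for u in preds[n]:
            required[u] = min(required[u], required[n] - edges[(u, n)])

    slack = {(u, v): required[v] - arrival[u] - d for (u, v), d in edges.items()}
    return slack, d_max
```

A connection with zero slack lies on the critical path; a connection with slack s can absorb up to s extra delay before it becomes critical.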
![Page 24: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/24.jpg)
T-VPlace (This Paper)
• Optimize Timing + Wiring Complexity
• Delay approximation
– FPGAs are uniform
– Store delays, indexed by (Δx, Δy), in a ROM
• Model a two-terminal net with source at (x_source, y_source) and target at (x_source + Δx, y_source + Δy)
• Reduce the allowable move distance over time
– α is the fraction of attempted moves that were accepted at the previous temperature
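The two ideas on this slide can be sketched as a delay lookup table and a range-limit update. The update formula itself is missing from the transcript; the rule below, R_new = R_old × (1 − 0.44 + α) clamped to [1, maximum array dimension], is the one published for VPR and is assumed to be what the slide showed. `delay_fn`, the table layout, and the use of |Δx| and |Δy| (which assumes delay is symmetric in direction) are illustrative choices.

```python
from typing import Callable, List, Tuple


def build_delay_table(width: int, height: int,
                      delay_fn: Callable[[int, int], float]) -> List[List[float]]:
    """Precompute delay for every (dx, dy) offset once; the array is uniform."""
    return [[delay_fn(dx, dy) for dy in range(height)] for dx in range(width)]


def net_delay(table: List[List[float]],
              src: Tuple[int, int], dst: Tuple[int, int]) -> float:
    """Look up the delay of a two-terminal connection from its offset alone."""
    dx, dy = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    return table[dx][dy]


def update_range_limit(r_old: float, alpha: float, max_dim: int) -> float:
    """Shrink or grow the move window so the acceptance rate drifts toward 0.44."""
    r_new = r_old * (1.0 - 0.44 + alpha)
    return min(max(r_new, 1.0), float(max_dim))
```

Early in the anneal α is high, so the window stays at the full array size; as α falls below 0.44 the window shrinks toward a single block.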
![Page 25: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/25.jpg)
Timing Cost and Objective
Sum the timing costs of all source-sink pairs
Heavily weight critical nets
Maximum delay of all nets in the circuit
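The equations these annotations point at are missing from the transcript. The sketch below uses the formulation published for T-VPlace, assumed to match the slide: criticality = 1 − slack / D_max, timing cost per connection = delay × criticality^exponent, and a move's cost change is a λ-weighted mix of normalized timing and wiring changes. The function and parameter names are illustrative.

```python
from typing import Iterable, Tuple


def criticality(slack: float, d_max: float) -> float:
    """1.0 for a connection on the critical path, near 0.0 for a relaxed one."""
    return 1.0 - slack / d_max


def timing_cost(connections: Iterable[Tuple[float, float]],
                d_max: float, crit_exp: float) -> float:
    """Sum delay * criticality**crit_exp over (delay, slack) source-sink pairs.

    A larger crit_exp weights critical connections more heavily.
    """
    return sum(delay * criticality(slack, d_max) ** crit_exp
               for delay, slack in connections)


def delta_cost(d_timing: float, prev_timing: float,
               d_wiring: float, prev_wiring: float, lam: float = 0.5) -> float:
    """Combined change in cost for a move; lam trades timing against wiring."""
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring
```

Normalizing each term by its previous value keeps the two components comparable even though delay and wirelength have different units.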
![Page 26: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/26.jpg)
Annealing Schedule
Default value is 10
• Number of moves to perform at each temperature
• Vary the temperature as the algorithm progresses
• Termination criteria
α is the fraction of attempted moves that were accepted at the old temperature T_old
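The schedule formulas are missing from the transcript. The sketch below follows the adaptive schedule published for VPR, assumed to be what the slide showed: InnerNum × N^(4/3) moves per temperature, a temperature multiplier chosen from α, and termination once the temperature falls below 0.005 × cost / number of nets. The "Default value is 10" annotation is assumed to refer to InnerNum.

```python
def moves_per_temperature(num_blocks: int, inner_num: float = 10.0) -> int:
    """Moves attempted at each temperature: inner_num * N^(4/3)."""
    return round(inner_num * num_blocks ** (4.0 / 3.0))


def next_temperature(t_old: float, alpha: float) -> float:
    """Cool quickly when nearly everything or nearly nothing is accepted,
    and slowly in the productive middle range of acceptance rates."""
    if alpha > 0.96:
        return 0.5 * t_old
    if alpha > 0.8:
        return 0.9 * t_old
    if alpha > 0.15:
        return 0.95 * t_old
    return 0.8 * t_old


def should_stop(temperature: float, cost: float, num_nets: int) -> bool:
    """Stop once the temperature is tiny relative to the average cost per net."""
    return temperature < 0.005 * cost / num_nets
```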
![Page 27: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/27.jpg)
VPlace vs. T-VPlace
![Page 28: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/28.jpg)
Improving Simulated Annealing-Based FPGA Placement with Directed Moves
Kristofer Vorwerk, Andrew Kennings, and Jonathan W. Greene
IEEE Transactions on CAD 28(2): 179-192 (2009)
![Page 29: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/29.jpg)
• Motivation: an annealer may spend significant time revisiting previously explored states before it finds the lowest-cost state
– Coax the annealer into exploring neighbor states that are more likely to yield an improvement
![Page 30: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/30.jpg)
Simple “Moves” (T-VPlace)
• Randomly select a cell
– Move a cell to an unoccupied target location
– Swap the location of two cells
• Location selection
– Random shrinking window
α is the fraction of attempted moves that were accepted at the previous temperature
![Page 31: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/31.jpg)
Heuristics to Determine Source Cells
• Random
– VPR
• Graph coloring
– Color the netlist before placement
– Choose up to 15 non-adjacent (same color) cells at a time
• Priority list
– Randomly choose among the 25% worst placed cells
• Position (details to follow)
• Timing cost of paths
![Page 32: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/32.jpg)
Heuristics to Determine Target Locations
• Random
– VPR
• Linear assignment
– Details omitted
• Median placement and variants
– Details on the next slide
• Priority list
![Page 33: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/33.jpg)
Median Placement
• Compute bounding boxes for all nets omitting source pins
– Take x and y minimums and maximums
• Put points into vectors and sort
• Define a rectangle by the median and median+1 entries in each vector
• Randomly select a new target location within the rectangle
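A sketch of the steps above, under the reading that "source pins" means the pins of the cell being moved, so each net's bounding box is computed over its other cells only. The data layout (nets as lists of cell names, positions as integer grid coordinates) and the function names are assumptions.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Range = Tuple[int, int]


def median_region(cell: str, nets: Sequence[Sequence[str]],
                  positions: Dict[str, Tuple[int, int]]) -> Optional[Tuple[Range, Range]]:
    """Return ((x_lo, x_hi), (y_lo, y_hi)) for the cell, or None if unconnected."""
    xs: List[int] = []
    ys: List[int] = []
    for net in nets:
        if cell not in net:
            continue
        others = [positions[b] for b in net if b != cell]
        if not others:
            continue
        # Bounding box of the net without the moving cell: min and max per axis.
        xs += [min(x for x, _ in others), max(x for x, _ in others)]
        ys += [min(y for _, y in others), max(y for _, y in others)]
    if not xs:
        return None
    xs.sort()
    ys.sort()
    mid = len(xs) // 2  # each vector has 2k entries for k nets
    return (xs[mid - 1], xs[mid]), (ys[mid - 1], ys[mid])


def pick_target(region: Tuple[Range, Range], rng: random.Random) -> Tuple[int, int]:
    """Choose a random grid location inside the median rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = region
    return rng.randint(x_lo, x_hi), rng.randint(y_lo, y_hi)
```

Any point in the median rectangle minimizes the summed bounding-box growth contributed by the cell, which is why targets are drawn from it rather than from the whole window.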
![Page 34: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/34.jpg)
Cell Rippling
Nearest empty location to B
• Rippling directions are chosen randomly
![Page 35: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/35.jpg)
Quality Factor of a Move
• p_i is the probability that the move is accepted
• Use previous annealing iteration to determine the probabilities empirically
• P_prev(i) is P(i) from the previous iteration
![Page 36: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/36.jpg)
Results
4 BLEs per cluster
8 BLEs per cluster
![Page 37: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/37.jpg)
Improving FPGA Placement with Dynamically Adaptive Stochastic Tunneling
Mingjie Lin and John Wawrzynek
IEEE Transactions on CAD 29(12): 1858-1869 (2010)
![Page 38: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/38.jpg)
Simulated Annealing (Conceptual)
Stochastic Tunneling
![Page 39: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/39.jpg)
Simulated Annealing Weaknesses
• Sensitivity to parameters
– Quite a few
– Interactions between them not understood
• Freezing problem
– Unable to escape local minima
– Prevalent at low temperatures where bad moves are accepted with a very low probability
![Page 40: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/40.jpg)
Acceptance Criteria for Bad Moves
• Simulated Annealing
• Stochastic Tunneling
– “Energy” of the best solution found so far
• Continually adjusted as better solutions are found
– “Energy” of the current solution being evaluated
– Tunneling parameter
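The acceptance formulas are missing from the transcript. The sketch below uses the standard forms, which are assumed to be what the slide showed: simulated annealing accepts a bad move with probability exp(−ΔE / T), while stochastic tunneling first maps energy through f(E) = 1 − exp(−γ (E − E_best)) and applies the same test to the transformed values. The parameter name `beta` and the function names are illustrative, and the paper's dynamic adaptation of the parameters is not modeled.

```python
import math


def sa_accept_prob(delta_e: float, temperature: float) -> float:
    """Simulated annealing: always accept improvements, else exp(-dE / T)."""
    return 1.0 if delta_e <= 0 else math.exp(-delta_e / temperature)


def stun_energy(e_current: float, e_best: float, gamma: float) -> float:
    """Transformed energy: 0 at the best solution so far, saturating toward 1.

    gamma is the tunneling parameter; e_best is updated whenever a better
    solution is found.
    """
    return 1.0 - math.exp(-gamma * (e_current - e_best))


def stun_accept_prob(e_old: float, e_new: float, e_best: float,
                     gamma: float, beta: float) -> float:
    """Apply the annealing-style test to transformed energies."""
    delta = stun_energy(e_new, e_best, gamma) - stun_energy(e_old, e_best, gamma)
    return 1.0 if delta <= 0 else math.exp(-beta * delta)
```

Because the transformation flattens everything far above the best-known energy, an uphill step between two poor states changes the transformed energy very little, so the search can cross barriers that would freeze a low-temperature annealer.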
![Page 41: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/41.jpg)
Stochastic Tunneling (Conceptual)
![Page 42: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/42.jpg)
Stochastic Tunneling Pseudocode
![Page 43: Packing and Placement](https://reader036.fdocuments.in/reader036/viewer/2022062310/568163db550346895dd532d4/html5/thumbnails/43.jpg)
Results
Averages: 10.17 9.54 8.86 89.44 87.72 92.06 422.5 488.5 363.7