Engineering Change Orders



An Innovative Flow To Implement Large Scale Design Changes In The Final Stages Of Physical Implementation

Manoj Kumar Dadhich
Freescale Semiconductor India
[email protected]

Amit Bandlish
Univ. of Southern California, USA
[email protected]

Vajeed Nimran
IIT Mumbai, India
[email protected]

ABSTRACT

A technique to implement large scale RTL Engineering Change Orders (ECOs) in System-on-Chip (SoC) designs at their final stages of physical implementation is presented. The technique aims at minimizing the design cycle time for implementing critical-path ECOs that affect large parts of a design, so that designers can incorporate such changes without a major schedule impact. The proposed flow was successfully implemented and tested on a 90 nm wireless Application Specific Integrated Circuit (ASIC) design that was a first-pass success on silicon.

Keywords ECO, clock tree synthesis, Sea-of-Gates, optimization

1. INTRODUCTION

Baseband SoC design cycle times for 90 nanometer designs with millions of gates can typically take up to six to eight months. Of this time, about three to four months are spent freezing design specifications, producing functional Register-Transfer-Level (RTL) designs and generating gate-level netlists optimized for timing. The physical design activities, such as clock tree synthesis, routing and timing closure, thus have about two to four months to complete. Physical implementation is also the stage where logic bug frequency is at its peak [1]. In such a scenario, a physical design engineer is invariably hard pressed for time to close on any modification of initial conditions, such as an ECO, especially when the change involved is fairly large. In addition, as post-route verification generally progresses in tandem with the routing activities, there may be design changes up to the very last stage of physical design. Design requirements may necessitate changes in an embedded IP, for example, or a critical bug that must be fixed may be discovered at the verification stage. Given aggressive time-to-market schedules and customer demands, implementing large scale incremental changes or complete re-synthesis of affected modules in a design close to timing closure can become a physical designer’s nightmare. Any large change impacts not only the timing and placement of logic but also the design’s clock distribution structure and routing, and poses new noise and signal integrity problems.

In the case of a small scale change, an ECO can be accomplished by simply modifying the input netlist and following a traditional Verilog-ECO flow to incorporate the changes in a completely routed design. Most conventional ECO flows succeed in handling changes of no more than a few cells, made through Verilog editing, that do not significantly affect other parts of the design. If, however, the scale of change is extremely large, or the design is too complex for such fixes to be done manually, re-synthesis becomes necessary and this approach fails.

The aim of the paper is to propose a flow for implementing such a large scale change in designs that are at a very late stage of their design cycle, with minimal impact on other parts of the design. The paper is organized as follows: Section 2 describes the traditional ECO flow and its limitations; Section 3 describes in detail the proposed ECO flow that addresses them. We share some experimental results in Section 4 and present our conclusion in Section 5.

2. CONVENTIONAL ECO FLOW AND ITS LIMITATIONS

As already emphasized, traditional ECO flows work best when changes are small. Typically, a front-end designer makes manual changes to a netlist, which is then used for incremental placement of the new logic. Instances associated with this and any other affected logic are then incrementally routed. The resulting data is extracted and goes through static timing checks. The complete flow for implementing an ECO in the conventional manner is shown in Figure 1.


Figure 1: Conventional ECO Flow

There are some major limitations of this flow in implementing large scale ECOs. Firstly, it is impractical to manually modify netlists with a large number of changes; there is then no option but to re-synthesize the gate-level netlist from the module’s RTL. Secondly, since most ECOs involve a small logic change, most incremental placement algorithms supported by commercially available tools are congestion driven rather than timing driven. This becomes a major drawback when implementing a large-scale RTL design change, as the incremental logic may end up with new timing violations. Most importantly, if a considerable part of the design has been modified, changes in the clock trees and routing topology are inevitable. Current synthesis tools, such as Physical Compiler (PC) from Synopsys, do have ECO capabilities but in practice work well only on changes of the order of 2-4% of the design. Beyond this, incremental synthesis and placement fail to provide optimal results from both the timing and routing perspectives, for example by creating localized over-congestion.

Apart from the usual timing checks, designs at the nanometer scale face deep-submicron (DSM) issues such as signal integrity (SI), namely delay and functional noise, as well as electromigration and design-for-manufacturability (DFM), which depend largely on route topologies. Therefore, in a routed design that is clean for delay and glitch noise and has its DSM issues addressed, implementing such an ECO involves not only meeting timing but doing so with minimal routing changes, so that the noise analysis and repair cycle after the ECO does not become prohibitively long. Another important consideration is the power dissipation or leakage in the chip. This factor assumes particular importance in wireless devices, which have strict limits on power dissipation to increase battery life.

Since, intrinsically, incremental placement algorithms are not as good as full-blown algorithms, an incremental placement is bound to be left with some timing violations. It is possible that many of these violations may not be fixable, with the limited placement area available for the ECO. Such violations can then be fixed by using a multi-vt flow in which standard-vt cells, which are faster than the high-vt cells but have higher leakage, are used. Therefore, while implementing the ECO, the physical designer must also take into account the number of standard-vt cells that may need to be added to close on timing. Needless to say, it is imperative to keep the use of such cells to a minimum.

3. PROPOSED ECO FLOW

In order to address the aforementioned limitations, we propose an ECO flow for efficiently implementing large scale design changes. The design under consideration is the applications processor (AP) platform of a 90 nanometer wireless ASIC with a gate count of approximately 1.4 million equivalent placeable NAND gates in the Sea-of-Gates (SoG). The die size is approximately 24.144 mm^2 with a rectilinear floorplan (see Figure 2 and Table 1).

Figure 2: Applications Processor (AP) Platform

DESIGN PARAMETERS                 VALUES
No. of Placeable Instances        346293
No. of Nets                       356501
No. of hard blocks                15
No. of IO Pins                    4833
Chip Size                         2.6948e+07 um^2
Core Size                         1.4931e+07 um^2
Utilization                       72%
No. of Clock Domains              40
Maximum Frequency of Operation    399 MHz

Table 1: AP Design Specifications

[Figure 1 blocks: Manual Netlist Modification; ECO Place new instances; ECO Route new nets; Parasitic Extraction; Static Timing Analysis (STA); New Netlist + Old DEF (Physical information)]

The flow was successfully implemented on two modules of this platform, sdma (light gray) and scmfbc (dark gray) (refer to Figure 2). Sdma constitutes about 15% of the SoG area of the AP platform and is one of the most timing critical modules of the platform, working at 133 MHz. Scmfbc is comparatively smaller at about 2% of the SoG area and also works at 133 MHz. See Tables 2 and 3 below.
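As a rough cross-check of these proportions, the following Python sketch compares the cell counts of the two modules (Tables 2 and 3) with the platform's placeable instance count (Table 1), and tests the result against the upper end of the 2-4% range quoted in Section 2. Using instance counts as a proxy for SoG area is an assumption of this sketch, and the function names are illustrative only.

```python
# Figures taken from Tables 1-3 of this paper.
PLATFORM_INSTANCES = 346293
MODULE_CELLS = {"sdma": 52896, "scmfbc": 6204}

# Upper end of the 2-4% range quoted in Section 2 for tool-supported ECOs.
INCREMENTAL_ECO_LIMIT = 0.04


def changed_fraction(module_cells, platform_instances):
    """Fraction of placeable instances that belong to the changed modules."""
    return sum(module_cells.values()) / platform_instances


def needs_large_scale_flow(fraction, limit=INCREMENTAL_ECO_LIMIT):
    """True when the change exceeds what incremental ECO tools handle well."""
    return fraction > limit


fraction = changed_fraction(MODULE_CELLS, PLATFORM_INSTANCES)
print(f"changed fraction: {fraction:.1%}")
print("large-scale flow needed:", needs_large_scale_flow(fraction))
```

On these figures the two modules account for roughly 17% of the placeable instances, in line with the 15% and 2% area shares quoted above and well beyond the range in which incremental ECO tools are reported to work well.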

SCMFBC DESIGN PARAMETERS          VALUES
No. of Cells                      6204
No. of Nets                       7083
No. of hard blocks                0
Utilization                       65%
Maximum Frequency of Operation    133 MHz
No. of Clock Domains              2

Table 2: SCMFBC Design Specs

SDMA DESIGN PARAMETERS            VALUES
No. of Cells                      52896
No. of Nets                       56041
No. of hard blocks                2
Utilization                       70%
Maximum Frequency of Operation    133 MHz
No. of Clock Domains              6

Table 3: SDMA Design Specs

The basic steps of the proposed ECO flow are outlined below (see Figure 3):

1. Re-synthesis of the module in which the ECO has been implemented
2. ECO Placement of the module using First Encounter™ (FE) from Cadence Design Systems
3. Timing Optimization in FE
4. Routing with minimal changes to the rest of the design
5. Post-Route Timing Optimization

3.1. Re-synthesis of the module in which the ECO has been implemented

The first step is to synthesize a gate-level netlist from the RTL design. For this, Synopsys Design Compiler was used, with synthesis done in the prototyping mode. Physical attributes of the design, such as utilization and frame aspect ratios, were provided for synthesis. The timing constraints required for standalone synthesis of the affected module were characterized from the platform-level constraints. Using these inputs, the new gate-level netlist for the module was generated and then plugged into the original netlist after removing the original instantiation.

Figure 3: Proposed ECO Flow

[Figure 3 blocks: New RTL + Timing Constraints + Physical Information; Region Based Timing Driven Placement; Optimized Gate-level Netlist; Pre-Route Timing Optimization; Parasitic Extraction and Static Timing Analysis; Partial Clock Tree Synthesis; Restoration of Original Clock and Signal Routes; Incremental Routing; Post-Route Timing Optimization]

3.2. ECO Placement of the module using FE

The netlist from the synthesis tool has only the affected module changed, with the rest of the design untouched. The first requirement is to place all the logic of the updated module such that the existing placement for the other parts of the design can be reused while also meeting the intra-module timing. As incremental placement by the synthesis tool (PC) did not provide completely satisfactory results in terms of timing and routing, there was a need to optimize the placement of the module to close on timing quickly without introducing new routing violations. The following approach was adopted, using First Encounter from Cadence.

From the database prior to the ECO, the approximate area occupied by the logic of the changed module was calculated. A placement guide was defined within that region and the logic of the ECOed module was assigned to that guide. A placement guide acts as a kind of soft bound for placing the new module instances. All other instances belonging to the rest of the already placed design were fixed to prevent any movement during ECO placement. After assignment, a full-blown placement was run to place the module in the given bound.

For successfully doing a timing driven placement in FE, it was important to achieve a good timing correlation between FE and the sign-off timing analyzer (PrimeTime™). In order to accomplish that, a correlation flow for the two tools was developed. Parasitics were extracted under similar conditions and similar reduction parameters from both FE and the sign-off extractor (Star-RCXT™ from Synopsys). The extracted parasitics were then compared in the comparison tool Ostrich (Cadence) to obtain RC correlation factors to be applied to the synthesis tool’s internal extractor before initiating the timing driven placement. Using this method, we were able to achieve a correlation of up to 95% between the two.

After correlation, the next step was to read the design constraints for guiding the timing engine while performing logic optimization within the placement guide. NanoRoute™ (Cadence) being our sign-off router, the route estimates it made during timing driven placement were closer to the final routes and thus gave more accurate results than those estimated by PC. Therefore, by using this approach we were not only able to obtain a good starting placement timing-wise, but also one which was optimized for routing.

3.3. Clock tree synthesis

Arguably, the most important factor that ensures quick timing closure in a complex design is the quality of the clock distribution network. The quality of a clock tree can be gauged on the following factors: timing, immunity to noise, power-efficiency and its utilization impact. The earlier in the design cycle the clock tree can meet all these requirements, the quicker would be the timing closure on the rest of the design.

Knowing this, preserving the existing clock trees as much as possible was an important requirement to save time and effort that would otherwise have been spent if a complete re-synthesis of the affected clock was done. To work around this problem of complete clock tree re-synthesis, a partial clock tree synthesis approach was attempted for clocks being affected by the new module.

For this, the first step is to find common or convergent points in the clock trees which are feeding the ECOed module. Starting from the root clock pin, this is the lowest level of the clock tree feeding only the changed module. This point is illustrated in Figure 4.

Figure 4: Clock tree Convergence Point

With information of pre-ECO delays in these subtrees, only these sections of the clock tree can be re-synthesized such that they meet the pre-ECO delays. In this way, almost all clock nets in other sections of the design can be preserved, and iterations for timing as well as noise closure on these clock nets can be reduced to a minimum. Routing of these new clock nets is then done in the ECO Route mode, details of which are explained in the following sections.
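The search for the convergence point can be sketched as a traversal of the clock tree. The Python sketch below is a minimal illustration, not the FE implementation: the tree is assumed to be a dictionary from each clock node to its fanout, sinks are nodes without fanout, and all instance names are hypothetical.

```python
def sinks_under(tree, node):
    """Return the set of sinks (nodes without fanout) reached from node."""
    fanout = tree.get(node, [])
    if not fanout:
        return {node}
    sinks = set()
    for child in fanout:
        sinks |= sinks_under(tree, child)
    return sinks


def convergence_points(tree, root, changed_sinks):
    """Find the nodes closest to the root that feed only changed-module sinks."""
    points = []
    pending = [root]
    while pending:
        node = pending.pop()
        sinks = sinks_under(tree, node)
        if sinks <= changed_sinks:
            # Everything below this node belongs to the changed module.
            points.append(node)
        elif sinks & changed_sinks:
            # Mixed subtree: keep descending towards the changed module.
            pending.extend(tree.get(node, []))
    return sorted(points)


# Hypothetical clock tree: only buf_c drives flops of the changed module.
CLOCK_TREE = {
    "clk_root": ["buf_a", "buf_b"],
    "buf_a": ["core/ff1", "core/ff2"],
    "buf_b": ["buf_c", "uart/ff1"],
    "buf_c": ["sdma/ff1", "sdma/ff2"],
}
CHANGED_SINKS = {"sdma/ff1", "sdma/ff2"}

points = convergence_points(CLOCK_TREE, "clk_root", CHANGED_SINKS)
print(points)
```

In this example only the subtree below buf_c would be re-synthesized to match its pre-ECO delay; the branches through buf_a and the rest of buf_b are left untouched.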

3.4. Pre-route timing optimization

After timing driven placement of the module in FE, one round of timing optimization is required to fine-tune the placement. As a start, all transition violations in intra-module nets are selectively fixed. This helps to reduce the magnitude of setup violations caused by capacitive loading from extremely long routes. It also ensures that the tool does not spend unnecessarily long times trying to optimize logic in paths which have large interconnect delays. For fixing transition, all nets of the modified module that violated transition constraints were found and then isolated for fixing using the timing optimization commands built into FE [2].

After fixing transition violations in this module, the next step is to fix setup violations in the clock domains which affect the changed module. Timing constraints for the design were read in and all clock domains were defined. From these, the clock domains that either start from or end at the flops of the changed module were selected. These selected source and destination clock domains were then defined in FE for selective optimization. To ensure that fixing setup violations in these domains did not deteriorate timing on other domains, all paths common to other unaffected clock domains were excluded from optimization. Timing optimization was then performed on these domains.

As already mentioned, an important consideration during logic optimization is the constraint on leakage current. In order not to exceed the design requirements on this parameter, only high-vt cells were used for optimization. Though this restriction did have a slightly negative impact in terms of the number of components added for final optimization, we were able to stay within the leakage limits defined and fix almost all setup violations.
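The selection of clock domains for optimization can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the FE command sequence: each timing path is reduced to its launch clock, capture clock, startpoint and endpoint, the module is identified by a hierarchical name prefix, and "paths common to other unaffected clock domains" is read as paths with at least one clock outside the selected set. All names are hypothetical.

```python
from typing import NamedTuple


class TimingPath(NamedTuple):
    launch_clock: str
    capture_clock: str
    startpoint: str
    endpoint: str


def in_module(instance, module):
    """True if the hierarchical instance name lies inside the module."""
    return instance.startswith(module + "/")


def affected_domains(paths, module):
    """Clock domains of paths that start from or end at flops of the module."""
    domains = set()
    for path in paths:
        if in_module(path.startpoint, module) or in_module(path.endpoint, module):
            domains.update((path.launch_clock, path.capture_clock))
    return domains


def paths_to_optimize(paths, module):
    """Keep paths whose launch and capture clocks are both affected domains."""
    selected = affected_domains(paths, module)
    return [
        path for path in paths
        if path.launch_clock in selected and path.capture_clock in selected
    ]


PATHS = [
    TimingPath("clk_a", "clk_a", "sdma/ff1", "sdma/ff2"),
    TimingPath("clk_c", "clk_a", "core/ff2", "sdma/ff3"),
    TimingPath("clk_a", "clk_b", "core/ff1", "uart/ff1"),
    TimingPath("clk_b", "clk_b", "uart/ff1", "uart/ff2"),
]

domains = affected_domains(PATHS, "sdma")
selected_paths = paths_to_optimize(PATHS, "sdma")
print(sorted(domains))
print(len(selected_paths))
```

Here clk_a and clk_c are selected because they launch or capture paths touching sdma, while the path from clk_a to the unaffected clk_b is left out so that its timing is not disturbed.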

3.5. Routing

Clock nets, being the most timing critical, are generally routed before signal nets to ensure the shortest routes and minimum detouring. While routing the ECO clock nets too, every effort should be made to route the new clock nets with these attributes while minimally affecting clock routes of the rest of the design. To accomplish this, the following approach was used. After completing the partial clock tree synthesis on the module as well as pre-route timing optimization and transition fixing, all unchanged clock nets from the pre-ECO routed design were imported to the ECOed design, leaving out the signal nets. The new clock nets were then selectively routed on this design. Any routing violations caused by these nets with existing clock routes were then fixed in ECO mode to ensure minimal routing changes. See Figure 5.

Figure 5: Routing Flow

Once these violations were fixed, route geometries of all clock nets were fixed to avoid further changes when new signal nets were routed. The next step was to import the routing information of all the signal nets from the pre-ECO design to this database and route them incrementally to honor existing topologies as much as possible. Using this methodology, we were able to ensure that very few, if any, new functional noise violations were introduced (see Table 4), which again was instrumental in quick timing closure.
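The split between clock nets whose pre-ECO routes are imported and clock nets that must be routed afresh can be sketched as follows. This Python fragment is only an illustration under an assumed data model: each net is represented by its name and the set of pins it connects, and a net counts as unchanged when both match. The net and pin names are hypothetical, and the actual import and routing were done inside the routing tool.

```python
def partition_clock_nets(pre_eco, post_eco):
    """Split post-ECO clock nets into routes to import and nets to route anew.

    Both arguments map a net name to the frozenset of pins it connects.
    """
    unchanged = {
        name for name, pins in post_eco.items() if pre_eco.get(name) == pins
    }
    to_route = set(post_eco) - unchanged
    return sorted(unchanged), sorted(to_route)


PRE_ECO = {
    "clk_core_l1": frozenset({"buf_a/Z", "core/ff1/CK", "core/ff2/CK"}),
    "clk_sdma_l1": frozenset({"buf_c/Z", "sdma/ff1/CK"}),
}
POST_ECO = {
    "clk_core_l1": frozenset({"buf_a/Z", "core/ff1/CK", "core/ff2/CK"}),
    "clk_sdma_l1": frozenset({"buf_c/Z", "sdma/ff1/CK", "sdma/ff9/CK"}),
    "clk_sdma_l2": frozenset({"buf_d/Z", "sdma/ff7/CK"}),
}

imported, rerouted = partition_clock_nets(PRE_ECO, POST_ECO)
print("import pre-ECO routes:", imported)
print("route in ECO mode:", rerouted)
```

The same partition applies to the signal nets in the second routing step, once the clock route geometries have been fixed.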

3.6. Post-route timing optimization

Since there is always a difference between the pre-route estimates and actual post-route delays, one round of timing optimization is necessary for transition and setup fixing at the post-route stage. For fixing any remaining violations in the post-route stage, essentially the same steps are followed as in the pre-route stage. The salient differences are that, since routes are in place, actual post-route delays are now used by the optimization engine instead of delay estimates based on a trial route. In addition, optimization is now done based on actual clock tree delays, i.e. clocks are propagated, as opposed to the ideal clocks used in pre-route optimization.
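The effect of switching from ideal to propagated clocks can be seen in the standard single-cycle setup slack calculation. The Python sketch below is a simplified illustration: it ignores clock uncertainty and derating, and the data delay, setup time and clock latencies are hypothetical values chosen only to show the mechanics at the 133 MHz module frequency.

```python
def setup_slack(period, data_delay, setup_time,
                launch_latency=0.0, capture_latency=0.0):
    """Setup slack in ns for a single-cycle path.

    With ideal clocks both latencies are zero; with propagated clocks they
    are the actual clock tree delays to the launching and capturing flops.
    """
    required = period + capture_latency - setup_time
    arrival = launch_latency + data_delay
    return required - arrival


PERIOD_NS = 1000.0 / 133.0  # 133 MHz module clock

ideal = setup_slack(PERIOD_NS, data_delay=7.0, setup_time=0.2)
propagated = setup_slack(PERIOD_NS, data_delay=7.0, setup_time=0.2,
                         launch_latency=1.2, capture_latency=1.0)
print(f"ideal: {ideal:.3f} ns, propagated: {propagated:.3f} ns")
```

In this example the launching flop sees 0.2 ns more clock latency than the capturing flop, so the propagated-clock slack is 0.2 ns worse than the ideal-clock estimate; skew of this kind is what the post-route round has to absorb.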

4. EXPERIMENTAL RESULTS

Our proposed ECO flow was implemented on the applications processor (AP) platform of a 90 nanometer wireless ASIC. The AP platform has about 20 different peripherals. The ECO to be implemented involved a change in one of the processor cores of the platform as a result of a functional bug found at the post-layout verification stage. The change being major (17% of SoG) and occurring at a stage where the design was close to timing closure, the traditional flow would have necessitated a complete re-synthesis of the SoG, meaning a 5-6 week schedule impact. With our ECO flow, however, we were able to successfully implement the change, saving about 4.5 weeks in the process (see Table 4).
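The schedule figures quoted above can be turned into a relative saving with simple arithmetic. The Python sketch below takes the 5-6 week traditional impact and the 4.5 week saving from the text; treating the saving as a fraction of the traditional impact is only one reading of these figures.

```python
def saving_fraction(traditional_weeks, saved_weeks):
    """Fraction of the traditional schedule impact avoided by the ECO flow."""
    return saved_weeks / traditional_weeks


SAVED_WEEKS = 4.5
savings = {weeks: saving_fraction(weeks, SAVED_WEEKS) for weeks in (5.0, 6.0)}
for weeks, fraction in savings.items():
    print(f"{weeks:.0f}-week baseline: {fraction:.0%} saved")
```

Under this reading the saving is 90% against a 5-week baseline and 75% against a 6-week baseline, both consistent with the reduction of more than 65% stated in the conclusion.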

Metrics                                                  SCMFBC      SDMA
No. of Placeable Instances                               6204        52896
WNS at Initial Placement Stage                           ~4ns        ~5ns
WNS after fully automated Timing Optimization in FE      -1.5ns      -1.2ns
No. of New Violating paths in the rest of the design     0           0
No. of Violating paths (after optimization)              30          700
No. of Standard-Vt cells used to fix these violations    <200        <300
Schedule Impact with traditional flow                    3-4 weeks   3-4 weeks
Schedule Impact with Proposed flow                       1 week      1.5 weeks
No. of Routing violations before ECO                           ~200
No. of Routing violations after ECO                            ~230
No. of Functional Noise Violations before ECO                  ~6
No. of Functional Noise Violations after ECO                   ~10

Table 4: Experimental Results

[Figure 5 blocks: CTS Implemented; Import pre-ECO clock nets; Route new clock nets; Correct clock net DRCs in ECO mode and fix route topologies; Import pre-ECO signal nets; Incrementally route signal nets; Remove Existing Routing; Final Routed Design]

The results can be divided into three main heads:

4.1. Timing results

The scmfbc module, after initial placement by PC, started with setup violations of the order of 4ns. With our flow, we were able to reduce these violations to less than 1.5ns. The remaining negative slack was fixed through the use of fast standard-vt cells. The normal implementation, involving synthesis from scratch, would have taken approximately 2 weeks; with our flow it was accomplished in five days. Similar results were seen on the sdma module, which was much bigger and more timing critical. We were able to close it at 1.5ns negative slack for setup through this flow, and the time saving on this module was even greater, at about 3 weeks. Overall, this flow helped in closing timing on complex modules well ahead of the expected time.

4.2. Routing and Functional noise results

By ensuring minimal changes to both clock and signal routes through this flow, we were able to minimize the emergence of new routing-related violations. We ended the flow with almost the same number of DRC violations as we had started with on the fully routed design. Crosstalk noise was also kept to a minimum with the two-step clock routing and ECO signal routing. This ensured minimum iterations to close on noise-induced functional and timing violations.

4.3. Power saving

As mentioned in the flow, it was our endeavor to restrict the use of std-vt cells, which consume more idle power and hence add to overall power dissipation and reduce battery life. With our flow, we were able to reduce the number of std-vt cells used to less than a third in the case of scmfbc, and even fewer in the case of sdma.

5. CONCLUSION

The methodology developed in this paper, to implement large scale ECOs in a design in its last stage of physical implementation, was proven on a 90 nanometer wireless ASIC that was a first-pass success on silicon. Using this technique, we were able to incorporate a change affecting approximately 17% of the Sea-of-Gates area of the design while reducing the overall time for ECO implementation by more than 65% over traditional ECO flows.

6. REFERENCES

[1] Gilles-Eric Descamps, Satish Bagalkotkar, Subramanian Ganesan, Satish Iyengar, Alain Pirson, "Design of a 17-million Gate Network Processor using a Design Factory", DAC 2003, California, USA.
[2] Encounter(TM) User Guide, Product version 4.2.2, Cadence Design Systems, August 2005.