A 2.2GHz 32×4 bit 6T-SRAM Design in 45nm CMOS(Report)



ECE 546 Final Project Report (Group 35)

A 2.2GHz 32×4 bit 6T-SRAM Design in 45nm CMOS

Xi Chen, Ting Zhu, Harun Demircioglu

Introduction

SRAM is a critical component in digital applications. In this project, a 32×4 bit SRAM is designed in a 45nm CMOS process. With a practical array implementation, a robust 6T bit-cell, low-power decoders and optimized column circuitry, the design achieves 2.22GHz operation at 1.1V in post-layout simulation, with an average power of 1.1mW. It processes the required testing vectors in 15.2ns. The performance demonstrates a favorable energy-delay trade-off.

Design Description

Figure 1 shows the block diagram. The 32×4 bit-cells are implemented in an 8×16 array. The address input is divided into a 3-bit row (word-line) selection and a 2-bit column selection. The array shape was chosen to balance the propagation delay between the row and column selection signals, and to balance the design complexity of the row and column decoders.

A 6T structure is chosen for the bit-cell [1]. The circuit configuration is shown in Figure 2. In a 45nm design, the components need careful sizing for reliable read and write operation. To minimize the capacitance presented to the word-lines and bit-lines, we choose minimum-sized (90n/50n) pass-gates M4 and M5. Accordingly, the NMOS devices M2 and M3 are sized larger than the pass-gates to avoid read-upset, and the PMOS devices M0 and M1 are sized smaller to allow a successful write. To save area and decrease parasitic capacitance, the bit-cell is implemented in a very compact layout with an area of 0.52 µm². By sharing connections to the bit-line pairs and the power supply among the cells, the total area of the array is kept to 66.80 µm².

Figure 3 shows the column circuitry, including the conditioning circuit, muxing, latches and write/read buffers. The conditioning circuit pre-charges the bit-lines when PC is pulled low, and the equalization transistor PE ensures an identical voltage on the bit-line pair [2]. For a read operation, the pre-charge circuit is disabled and one of the word-lines is enabled. One bit-line is then pulled low by the selected bit-cell. Simultaneously, the column pass-gates are turned on. At the end of the read, the latch circuit holds the output signal steady for the rest of the cycle. For a write operation, after pre-charging, the word-line, column selection and write pass-gates are enabled. One of the bit-lines is pulled low depending on the input data.

Based on our analysis, the main constraint on the operating speed is the sensitivity of the latch to a logic "0" signal on the bit-lines. An unbalanced SR NAND latch is designed to improve the speed: the pull-up capability of the PMOS connected to the bit-line is stronger than the pull-down capability of the NMOS.
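The addressing scheme and read path described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit: the report does not say which address bits feed the row decoder, how the four data bits are interleaved across the 16 columns, or which bit-line drives which latch input. The choices below (upper three address bits select the row, data bit b occupies columns 4b to 4b+3, BLB low sets the latch) and the names `decode`, `sr_nand_latch` and `SramModel` are all hypothetical.

```python
ROWS, COLS, WORD_BITS = 8, 16, 4          # 8x16 array holding 32 words of 4 bits
WORDS_PER_ROW = COLS // WORD_BITS         # 4 words per row -> 2-bit column select


def decode(addr):
    """Split a 5-bit address into (row, column select).

    Assumption: the upper 3 bits pick the word-line, the lower 2 bits the column.
    """
    if not 0 <= addr < ROWS * WORDS_PER_ROW:
        raise ValueError("address must be in 0..31")
    return addr >> 2, addr & 0b11


def nand(a, b):
    return 0 if (a and b) else 1


def sr_nand_latch(bl, blb, q):
    """Cross-coupled NAND latch with active-low inputs taken from the bit-line pair.

    Both bit-lines high (pre-charged) holds the old value; a low bit-line flips it.
    Assumption: BLB low sets Q to 1 and BL low resets Q to 0.
    """
    qb = 1 - q
    for _ in range(3):                    # a few passes let the feedback loop settle
        q = nand(blb, qb)
        qb = nand(bl, q)
    return q


class SramModel:
    def __init__(self):
        self.cells = [[0] * COLS for _ in range(ROWS)]
        self.latch = [0] * WORD_BITS      # one output latch per data bit

    def _column(self, bit, sel):
        # Assumed interleaving: data bit b owns columns 4b..4b+3.
        return bit * WORDS_PER_ROW + sel

    def write(self, addr, data):
        if not 0 <= data < (1 << WORD_BITS):
            raise ValueError("data must fit in 4 bits")
        row, sel = decode(addr)
        for bit in range(WORD_BITS):
            self.cells[row][self._column(bit, sel)] = (data >> bit) & 1

    def read(self, addr):
        row, sel = decode(addr)
        word = 0
        for bit in range(WORD_BITS):
            stored = self.cells[row][self._column(bit, sel)]
            # After pre-charge both lines are high; the cell pulls one of them low.
            bl, blb = (1, 0) if stored else (0, 1)
            self.latch[bit] = sr_nand_latch(bl, blb, self.latch[bit])
            word |= self.latch[bit] << bit
        return word
```

Writing a word with `SramModel().write(addr, data)` and reading it back with `read(addr)` exercises one word-line and one of the four column selections per access. Calling `sr_nand_latch(1, 1, q)` models the pre-charged state, in which the latch simply holds its previous output.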

Figure 4 shows the decoder circuits. Static decoders are used as a power-delay trade-off. The critical path delay (from CLK to Q) is optimized by using two-stage logic. For high-speed operation, pass-transistor logic with two static logic stages is used as the column decoder. All decoder output stages are sized to drive the large capacitance on the word-lines and column pass-gates.

As Figure 5 shows, an edge-triggered, noise-tolerant dynamic TSPCL D flip-flop is chosen for this design [3]. This structure has the advantages of a small transistor count and immunity to problems caused by clock/clock_bar overlap. To improve the noise tolerance of the TSPC, one additional transistor is used (indicated in Figure 5). The outputs Q and Q_bar are adjusted to be synchronous, and they are buffered by large inverters (PMOS: 720n/50n, NMOS: 360n/50n) to drive the heavy load.

Figure 6 shows the clock tree circuits and their insertion delays to every stage. The clock chain is carefully sized to ensure the timing of the PC, WL and CL signals. The driving capability of each stage is adjusted according to its load.

Results

The division of work among the group members is given in Table I. At room temperature, the circuit operates at 3.37 GHz in pre-layout simulation and 2.2 GHz in post-layout simulation (C+CC). The power dissipation of the whole system is 1.1mW with a 1.1V supply in post-layout simulation. Table II summarizes the circuit features. Figure 7 shows the layout of the whole circuit. For the required testing vector, the correct functional waveform is shown in Figure 8.

Conclusions

A 45nm 32×4 bit 6T-SRAM design is presented in this report. Through comprehensive design consideration, the circuit operates at up to 2.2GHz at 1.1V in post-layout simulation, with a favorable energy-delay trade-off. For further optimization, separate read and write paths and a high-performance sense amplifier could be designed to speed up operation, and a low-swing technique could be applied to decrease the power consumption.

Through this project, we have gained a good understanding of the trade-off between power and speed in memory circuit design. For future classes, it would be very helpful if more high-speed digital design methodology and commercial product samples were introduced.

Reference

[1] Jan M. Rabaey, Digital Integrated Circuits (Second Edition), 2003

[2] Subhasish Mitra, Stanford EE 271, 2007

[3] Mohammed Elgamel et al., "Noise Tolerant Low Power Dynamic TSPCL D Flip Flops", ISVLSI'02


Figure 1 Block Diagram

Figure 2 6T-SRAM

Figure 3 Column Circuitry

Figure 4 Row Decoder & Column Decoder

Figure 5 D Flip-flop design

Figure 6 Clock Tree Design & Insertion Delay Analysis


Figure 7 Layout for the complete design, with major blocks, I/O & VDD/GND pin locations indicated.

Table I Group Work Partition

Name               Responsibility
Xi Chen            Bit-cell array design & layout; column circuitry design & layout, including bit-line conditioning, write/read; top-level simulation; final report
Ting Zhu           Row & column decoder design & layout; clock tree design & layout; top-level layout; final report
Harun Demircioglu  D flip-flop design & layout; final report

Figure 8 Output waveform indicating correct function at 2.2 GHz

Table II Design Statistics

Minimum clock period                           450 ps
Maximum read-access time (with R+C+CC)         495 ps
Supply voltage                                 1.1 V
Total energy                                   16.9 pJ
Area                                           241 µm²
No. of transistors / density                   1224 (5.08/µm²)
Estimate of design time (three members total)  120 working hours
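The figures quoted in the text and in Table II can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 16.9 pJ total energy refers to the 15.2 ns test-vector run, which the report does not state explicitly; the variable names are illustrative only.

```python
# Values copied from the report text and Table II.
min_period_ps = 450.0        # minimum clock period
total_energy_pj = 16.9       # total energy (assumed: over the test-vector run)
test_time_ns = 15.2          # time to process the required testing vectors
transistors = 1224
total_area_um2 = 241.0
cell_area_um2 = 0.52
cells = 32 * 4               # 32 words of 4 bits

# 1 / (450 ps) expressed in GHz.
max_freq_ghz = 1e3 / min_period_ps

# pJ divided by ns gives mW directly (1e-12 J / 1e-9 s = 1e-3 W).
avg_power_mw = total_energy_pj / test_time_ns

# Transistor density over the full layout area.
density_per_um2 = transistors / total_area_um2

# Area taken by the bit-cells alone, ignoring any array overhead.
cell_only_area_um2 = cells * cell_area_um2

print(f"max frequency  : {max_freq_ghz:.2f} GHz")
print(f"average power  : {avg_power_mw:.2f} mW")
print(f"density        : {density_per_um2:.2f} per um^2")
print(f"cell-only area : {cell_only_area_um2:.2f} um^2")
```

Under that assumption the numbers are mutually consistent: 450 ps corresponds to about 2.22 GHz, 16.9 pJ over 15.2 ns gives about 1.11 mW, and 1224 transistors in 241 µm² gives about 5.08 per µm². The 128 cells at 0.52 µm² each account for 66.56 µm², slightly below the reported 66.80 µm² array area.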