Chapter 5-1: Memory System (Next Lecture: Cache Memory)


Transcript of the lecture slides for Chapter 5-1: Memory System.

Page 1: Memory System

Chapter 5-1: Memory System
Next Lecture: Cache Memory

Page 2: Basic Concept

The maximum size of the memory that can be used is determined by the addressing scheme.
Example: a 32-bit computer that generates 32-bit addresses can address 2^32 = 4G memory locations (4 GB in a byte-addressable memory).

[Figure: Connection between the processor and main memory. The processor holds the address in the MAR and the data in the MDR; a k-bit address bus, an n-bit data bus, and control lines (R/W, MFC, etc.) connect it to a memory with up to 2^k addressable locations and a word length of n bits.]

Page 3: Basic Concept (cont.)

Main memory is used to store and retrieve data in word-length quantities.
Word length: the number of bits stored or retrieved in one memory access.
Many current CPUs can fetch 64 bits (e.g., Pentium 4) or even 128 bits in one clock to fill their internal caches.
Most modern computers are byte addressable.
Example: a possible address assignment for a byte-addressable 32-bit computer using the big-endian arrangement.

Page 4: Basic Concept (cont.)

When a 32-bit address is sent from the CPU to memory:
The high-order 30 bits determine which word is accessed.
The low-order 2 bits specify which byte within that word.
In a read operation, the other bytes of the word may be fetched as well.
If the operation is a byte write, the control circuitry of the memory (the memory controller) must ensure that the contents of the other bytes of the same word are not changed.
(A sketch of this address split is shown after the figure below.)

[Figure: Byte addresses within 32-bit words of a byte-addressable, big-endian memory (bytes 0-3 in the first word, 4-7 in the next, and so on, shown in hexadecimal). Address bits A1-A0 feed a 2-to-4 decoder that selects one of the four byte groups (four 8-bit lanes of the 32-bit data lines D0-D31); bits A31-A2 select the word.]
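The split described above can be illustrated in software. This is a minimal sketch of my own (the function and variable names are hypothetical, not from the slides):

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 32-bit byte address into a word address (high-order 30 bits)
   and a byte offset within that word (low-order 2 bits). */
static void split_address(uint32_t addr, uint32_t *word, uint32_t *byte_in_word)
{
    *word = addr >> 2;           /* which 32-bit word is accessed        */
    *byte_in_word = addr & 0x3u; /* which of the word's 4 bytes is meant */
}

int main(void)
{
    uint32_t word, byte_in_word;
    split_address(13u, &word, &byte_in_word);
    printf("word %u, byte %u\n", word, byte_in_word); /* prints: word 3, byte 1 */
    return 0;
}
```

On a byte write, the memory controller would use only the selected byte lane and leave the other bytes of that word untouched.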

Page 5: Basic Concept (cont.)

If the MAR is k bits long and the MDR is n bits long, the memory may contain up to 2^k addressable locations.
During a memory cycle, n bits of data are transferred between the memory and the CPU.
The transfer takes place over the processor bus, which has k address lines and n data lines.
The bus also includes control lines such as Read, Write, and MFC.
In a byte-addressable computer, one or two additional control lines are needed to indicate when only a byte, rather than a full word of n bits, is to be transferred.

[Figure: The processor-memory connection from page 2, repeated: MAR, MDR, k-bit address bus, n-bit data bus, control lines (R/W, MFC, etc.), up to 2^k addressable locations, word length n bits.]

Page 6: Memory Access Time

A useful measure of the speed of memory units is the time that elapses between the initiation of an operation and its completion.
Memory access time: the time between the Read signal and the MFC (memory function complete) signal from the memory.
Memory cycle time: the minimum time delay between the initiation of successive memory operations.
The cycle time is typically slightly longer than the access time.
Main memories are usually implemented using RAM (random-access memory) chips.
The CPU can usually process instructions and data faster than they can be fetched from a reasonably priced memory unit.
The memory cycle time is therefore a bottleneck in the system.

Page 7: Desirable Memory System

A desirable memory system offers both fast access time and large size.

Page 8: Possible Speed and Size Enhancements

Cache memory.
Memory interleaving:
Divides the memory into a number of memory modules.
Arranges addressing so that successive words in the address space are placed in different modules, which relaxes the cycle-time constraint.
If requests for memory access tend to involve consecutive addresses, the accesses are sent to different modules.
This increases the average rate of fetching words from main memory.
(A sketch of interleaved address mapping follows.)
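A minimal sketch of low-order interleaving, assuming a power-of-two number of modules and 4-byte words (the names and constants are my own, chosen to match the two-module figure on the next page):

```c
#include <stdint.h>

#define NUM_MODULES    2u   /* assumed power of two */
#define BYTES_PER_WORD 4u

/* Low-order interleaving: consecutive words fall in consecutive modules. */
static uint32_t module_of(uint32_t byte_addr)
{
    return (byte_addr / BYTES_PER_WORD) % NUM_MODULES;
}

/* Word index inside the module that holds this address. */
static uint32_t offset_in_module(uint32_t byte_addr)
{
    return (byte_addr / BYTES_PER_WORD) / NUM_MODULES;
}
```

With these definitions, byte addresses 0, 8, 10, 18 (hex) map to module 0 and 4, c, 14, 1c map to module 1, as in the figure that follows.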

Page 9: Interleaved Memory Modules

[Figure: Two-way interleaved memory. Address bit A2 feeds a 1-to-2 decoder that selects the module; bits A31-A3 select a word within the module. One module holds the words at byte addresses 0, 8, 10, 18 (hex); the other holds 4, c, 14, 1c.]
Exercise: assume the memory access time (MAT) is 10 ns and the memory cycle time (MCT) is 12 ns. If consecutive words are always fetched, what is the speed difference between a system without interleaving and a system with two interleaved modules? (A worked estimate follows.)
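One back-of-envelope answer, under assumptions of my own (accesses to different modules can overlap completely and bus transfer time is ignored):
Without interleaving, a new access can start only once per cycle time, so consecutive words are delivered one every MCT = 12 ns.
With two interleaved modules, consecutive words go to alternate modules, so each module sees a new request only every second fetch; requests can therefore be issued every 12 ns / 2 = 6 ns without violating either module's cycle time. After the first word (available after MAT = 10 ns), a word can be delivered roughly every 6 ns, about twice the non-interleaved rate.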

Page 10: Internal Organization of Memory Chips

[Figure: Internal organization of a 16 x 8 memory chip. A 4-bit address (A0-A3) feeds the address decoder, which drives word lines W0-W15; each row of cells (flip-flops) stores one 8-bit word (bits b7-b0). The cells in each column connect through bit lines to a Sense/Write circuit, which connects to the data input/output lines. Control inputs: R/W and CS (chip select).]

Page 11: Internal Organization of Memory Chips (cont.)

Memory cells are usually organized in the form of an array.
Each cell of the array is capable of storing one bit of information.
Each row of cells constitutes a memory word.
All cells of a row are connected to a common line called the word line.
A word line is driven by the address decoder on the chip.
The cells in each column are connected to a Sense/Write circuit by two bit lines.

Page 12: Internal Organization of Memory Chips (cont.)

The Sense/Write circuits are connected to the data input/output lines of the chip.
During a read operation, the Sense/Write circuits read the information stored in the cells selected by a word line and transmit it to the output data lines.
During a write operation, the Sense/Write circuits receive the input information and store it in the cells of the selected word.
The data input and data output of each Sense/Write circuit are connected to a single bidirectional data line to reduce the number of pins required.
Control line R/W specifies the direction of the transfer.
Control line CS (chip select) selects a given chip in a multichip memory system.
(A small software model of this interface follows.)
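As a software analogy only (this is my own sketch; a real chip is built from decoders and Sense/Write circuits, not C code), the behaviour of the 16 x 8 chip's interface can be modelled like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a 16 x 8 memory chip: 16 words of 8 bits. */
typedef struct {
    uint8_t cells[16];
} Chip16x8;

/* One access. 'cs' gates the chip; 'read' gives the direction of transfer.
   The return value models what appears on the bidirectional data line. */
static uint8_t chip_access(Chip16x8 *chip, uint8_t addr, bool cs,
                           bool read, uint8_t data_in)
{
    if (!cs)
        return 0xFF;              /* not selected: data line not driven (modelled as 0xFF) */

    uint8_t word = addr & 0x0Fu;  /* address decoder selects one of word lines W0..W15 */
    if (read)
        return chip->cells[word]; /* Sense circuits place the word on the data lines */

    chip->cells[word] = data_in;  /* Write circuits store the data into the selected word */
    return data_in;
}
```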

Page 13: Internal Organization of Memory Chips (cont.)

Consider a memory block that has 1K (1024) memory cells.
Its circuitry can be organized as 128 x 8, requiring a total of 19 external connections.
Alternatively, it can be organized as 1K x 1, which needs only 16 pins even if separate pins are provided for the data input and data output lines.
In the 1K x 1 organization, the required 10-bit address is divided into two groups:
5 bits specify a row of 32 cells, all of which are accessed in parallel.
The remaining 5 bits select one cell in that row.
Given the column address, only one of the 32 cells is connected to the external data lines, by the input and output multiplexers.
(A pin-count breakdown follows.)
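A pin-count breakdown consistent with the totals above (the exact assignment is my reading of the standard textbook example, so treat the details as an assumption):
128 x 8 organization: 7 address pins + 8 data pins + R/W + CS + power + ground = 19 external connections.
1K x 1 organization: 10 address pins + 1 data-in + 1 data-out + R/W + CS + power + ground = 16 pins.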

Page 14: Organization of a 1K x 1 Memory Chip

[Figure: Organization of a 1K x 1 memory chip. The 10-bit address splits into a 5-bit row address, decoded by a 5-bit decoder into word lines W0-W31 that select one 32-cell row of the memory cell array, and a 5-bit column address that controls a 32-to-1 output multiplexer and input demultiplexer connecting one cell to the data input/output line. Other signals: Sense/Write circuitry, R/W, CS.]

Page 15: Static Memories

Static memories (SRAM) consist of circuits that are capable of retaining their state as long as power is applied.
[Figure: A static RAM cell. Two cross-coupled inverters form a latch with internal nodes X and Y; transistors T1 and T2 connect the latch to the bit lines b and b' under control of the word line.]

Page 16: SRAM Operation

Two cross-coupled inverters form a latch, which is connected to the two bit lines b and b' by transistors T1 and T2.
These transistors act as switches that can be opened or closed under control of the word line.
When the word line is at ground level, the transistors are turned off and the latch retains its state.
Example: assume the cell is in state 1, meaning the logic value at point X is 1 and at point Y is 0. This state is maintained as long as the signal on the word line stays at ground level.

Page 17: SRAM Operation (cont.)

Read operation:
The word line is activated to close switches T1 and T2.
If the cell is in state 1, the signal on bit line b becomes high and the signal on bit line b' becomes low.
The opposite is true if the cell is in state 0.
Write operation:
The state of the cell is set by placing the appropriate value on bit line b and its complement on b', and then activating the word line.
This forces the cell into the corresponding state.

Page 18: Dynamic Memories

SRAMs are fast but expensive, because each cell requires six transistors.
Less expensive RAMs can be implemented if simpler cells are used.
However, such cells do not retain their state indefinitely.
In dynamic memory (DRAM), information is stored as a charge on a capacitor.
A DRAM cell is capable of storing information only for a few milliseconds.
Its contents must therefore be periodically refreshed by restoring the capacitor charge to its full value.

Page 19: DRAM Operation

To store information in a cell, the transistor T is turned on and the appropriate voltage is applied to the bit line.
If the bit line is high, a known amount of charge is stored on the capacitor.
After the transistor is turned off, the capacitor begins to discharge due to leakage.
The information stored in the cell can be retrieved correctly only if it is read before the charge on the capacitor drops below the threshold value at which it is still recognized as a 1.

Page 20: DRAM Operation (cont.)

During a read, the bit line is placed in a high-impedance state and the transistor is turned on.
A sense circuit connected to the bit line determines whether the charge on the capacitor is above or below the threshold value.
The read operation discharges the capacitor in the cell.
To retain the stored information, DRAMs include special circuitry, the Sense/Write circuits, that write back the value just read.
Thus, a cell is refreshed every time it is read.
In fact, all cells connected to a given word line are refreshed whenever that word line is activated.

Page 21: DRAM Example and Operation

[Figure: A 2M x 8 dynamic memory chip. The cell array is organized as 4096 x (512 x 8). The address lines (A20-9 / A8-0) are multiplexed: the row address is captured in the row address latch by RAS and decoded by the row decoder (4096 rows); the column address is captured in the column address latch by CAS and decoded by the column decoder. Sense/Write circuits connect the selected cells to data pins D0-D7 under control of R/W and CS.]

Page 22: DRAM Example and Operation (cont.)

This is a 16-megabit DRAM organized as 2M x 8.
The cells are organized as a 4K x 4K array, such that the high-order 12 bits and the low-order 9 bits of the 21-bit address form the row and column addresses of a cell.
To reduce the number of pins needed for external connections, the row and column addresses are multiplexed on the same pins.
During a read or write operation, the row address is applied first.
It is loaded into the row address latch in response to a pulse on the RAS (row address strobe) signal.
(A sketch of the address split follows.)
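A minimal sketch (illustrative only; the helper name is hypothetical) of how a controller could split the 21-bit address into the multiplexed row and column parts described above:

```c
#include <stdint.h>

/* For the 2M x 8 DRAM: high-order 12 bits form the row address (applied
   with RAS), low-order 9 bits form the column address (applied with CAS). */
static void split_dram_address(uint32_t addr21, uint32_t *row, uint32_t *col)
{
    *row = (addr21 >> 9) & 0xFFFu; /* 12-bit row address    */
    *col = addr21 & 0x1FFu;        /*  9-bit column address */
}
```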

Page 23: DRAM Read/Write Operation

The read operation is then initiated: all the cells on the selected row are read and refreshed.
Next, the column address is applied to the address pins and loaded into the column address latch under control of a pulse on the CAS (column address strobe) signal.
The information in the column latch is decoded and the appropriate Sense/Write circuit is selected.
If R/W indicates a read, the output of the selected circuit is transferred to the data pins D0-D7.
If R/W indicates a write, the data on the data pins are transferred to the selected circuits and used to overwrite the contents of the selected cells in the corresponding column.

Page 24: DRAM Refresh Operation

To ensure that the contents of a DRAM are maintained, each row of cells must be accessed periodically, typically once every 2 to 16 milliseconds.
A refresh circuit can perform this function automatically.
Some dynamic memory chips incorporate the refresh facility on the chip itself.
Because of their high density and low cost, dynamic memories are widely used in the main memory units of computers.

Page 25: Enhancement

Consider an application in which a number of memory locations at successive addresses are to be accessed, and assume the cells involved are all on the same row.
It is then only necessary to load the row address once.
Different column addresses can be loaded during successive memory cycles.
The rate at which such block transfers can be carried out is typically double that of transfers involving random addresses.
This faster rate can be exploited when transferring a block of data from main memory to the cache.

[Figure: The 2M x 8 dynamic memory chip from page 21, repeated for reference (row/column address latches and decoders, Sense/Write circuits, RAS, CAS, R/W, CS, D0-D7).]

Page 26: Memory System Design Considerations

The choice of a RAM depends on several factors: speed, power dissipation, and size.
SRAMs are generally used when very fast operation is the primary requirement.
DRAMs are the choice for implementing computer main memories; they make large memories economically feasible.

Page 27: Example Configuration Using 16K x 1 Chips

Consider a small static memory consisting of 64K words of 8 bits each, built from 16K x 1 chips.
The address bus is 16 bits wide.
The high-order 2 bits of the address are decoded to obtain the four chip-select control signals, one per row of chips.
The remaining 14 address bits access a specific location inside each chip of the selected row.
The R/W inputs of all chips are tied together to form a common R/W control.
[Figure: A 64K x 8 memory organized as four rows of 16K x 1 chips, one row per chip-select signal.]
(A sketch of the chip-select decoding follows.)
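A minimal sketch (my own illustration; the names are hypothetical) of the decoding implied above: the top 2 address bits select one of the four rows of chips and the remaining 14 bits go to every chip.

```c
#include <stdint.h>

/* Decode a 16-bit address for the 64K x 8 memory built from 16K x 1 chips. */
static void decode_64k(uint16_t addr, uint8_t *chip_row, uint16_t *chip_addr)
{
    *chip_row  = (uint8_t)(addr >> 14); /* high-order 2 bits -> 2-to-4 decoder -> CS0..CS3 */
    *chip_addr = addr & 0x3FFFu;        /* low-order 14 bits -> address pins of every chip */
}
```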

Page 28: Memory Configuration

Consider a large dynamic memory with an organization similar to the previous figure.
The control circuitry differs in three respects:
The row and column parts of the address for each chip have to be multiplexed.
A refresh circuit is needed.
The timing of the various steps of a memory cycle must be carefully controlled.

Page 29: Memory Configuration Example

The example shows the DRAM chips and the control circuitry required for a 16-Mbyte dynamic memory unit.
The DRAM chips are arranged in a 4 x 8 array.
Each chip has a 1M x 4 organization, so the array has a total storage capacity of 4M words of 32 bits each.
The memory unit is assumed to be connected to an asynchronous memory bus that has:
22 address lines (ADRS21-0),
32 data lines (DATA31-0),
two handshake signals (Memory Request and MFC), and
a Read/Write line to indicate the type of memory cycle requested.

Page 30: DRAM Read Cycle

[Figure: DRAM read cycle (figure not reproduced in this transcript).]

Page 31: DRAM Read Cycle (cont.)

The CPU activates the address, Read/Write, and Memory Request lines.
The access control block recognizes the request when the Memory Request signal becomes active, and sets the Start signal to 1.
The timing control block responds by activating the Busy signal, to prevent the access control block from accepting new requests before the current cycle ends.
The timing control block loads the row and column addresses into the memory chips by activating the RAS and CAS lines.

Page 32: DRAM Read Cycle (cont.)

During this time, it uses the Row/Column line to select first the row address, ADRS19-10, followed by the column address, ADRS9-0.
After receiving the row and column parts of the address, the selected memory chips place the contents of the requested bit cells on their data outputs.
The timing control block then activates the MFC signal.
At the end of the memory cycle, the Busy signal is deactivated and the access unit becomes free to accept new requests.

Page 33: DRAM Refresh Cycle

The refresh control block periodically generates refresh requests.
The access control block arbitrates between memory access requests and refresh requests.
If a memory access request and a refresh request arrive at the same time, the refresh request is given priority, to ensure that stored information is not lost.
The access control block indicates that the refresh may proceed by activating the Refresh Grant line.
When the refresh control block receives the Refresh Grant signal, it activates the Refresh line.
(A sketch of this arbitration follows.)
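A minimal sketch (my own; the interface is hypothetical) of the arbitration rule described above, with refresh given priority over CPU accesses:

```c
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_REFRESH, GRANT_ACCESS } Grant;

/* What the access control block grants next, given the pending requests. */
static Grant arbitrate(bool access_request, bool refresh_request, bool busy)
{
    if (busy)
        return GRANT_NONE;    /* a memory cycle is already in progress */
    if (refresh_request)
        return GRANT_REFRESH; /* refresh wins, so stored data is never lost */
    if (access_request)
        return GRANT_ACCESS;
    return GRANT_NONE;
}
```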

Page 34: DRAM Refresh Cycle (cont.)

Activating the Refresh line causes the address multiplexer to select the refresh counter as the source of the row address.
The contents of the counter are loaded into the row address latches of all memory chips when RAS is activated.
During this time the R/W line may indicate a write, so we must ensure that no new information is loaded into any cells during the refresh.
This is done by having the decoder block deactivate all CS lines, so that the memory chips do not respond to R/W.

Page 35: DRAM Refresh Cycle (cont.)

The rest of the refresh cycle is the same as a normal read cycle.
At the end of the refresh cycle, the refresh control block increments the refresh counter in preparation for the next refresh cycle.
The CPU and the refresh circuit thus compete for access to the memory, and the refresh circuit must be given priority.
The response of the memory to a request from the CPU or from a DMA device may therefore be delayed if a refresh operation is in progress.
In burst refresh mode, all memory rows are refreshed in succession before the memory is returned to normal use.
An alternative is to interleave refresh operations on successive rows with accesses from the memory bus.

Page 36: Refresh Overhead

Consider a memory array built from 1M x 1 chips, each containing a cell array organized as 1024 x 1024 x 1, i.e., 1024 rows of 1024 bits.
Assume it takes 130 ns to refresh one row, and that each row must be refreshed once every 16 ms.
The time needed to refresh all rows in the chip is about 0.133 ms.
If all rows are refreshed in one burst, less than 1% of the memory cycles are used for refresh operations.
(The arithmetic is shown below.)
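The arithmetic behind those figures, from the stated assumptions:
1024 rows x 130 ns per row = 133,120 ns ≈ 0.133 ms to refresh the whole chip.
Overhead = 0.133 ms / 16 ms ≈ 0.008, i.e., roughly 0.8% of the time, which is indeed less than 1% of the memory cycles.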

Page 37: Refresh Overhead (cont.)

There is an apparent increase in memory access time when a request arrives while a refresh operation is in progress.
This variability in access time is easily accommodated on an asynchronous bus.
On a synchronous bus, it may be possible to hide a refresh cycle within the early part of a bus cycle, if sufficient time remains after the refresh to carry out the read or write.
Alternatively, the refresh circuit may request bus cycles in the same manner as any device with DMA capability.

Page 38: Read-Only Memories

In a ROM cell, logic value 0 is stored if the transistor is connected to ground at point P; otherwise, a 1 is stored.
To read the state of the cell, the word line is activated.
The transistor switch closes, and the voltage on the bit line drops to near zero if there is a connection between the transistor and ground.
If there is no connection to ground, the bit line remains at its high voltage, indicating a 1.
A sense circuit at the end of the bit line generates the proper output value.
Data are written into a ROM when it is manufactured.
[Figure: A possible configuration of a ROM cell.]

Page 39: PROM (Programmable ROM)

Some ROMs allow data to be loaded by the user.
Programmability is achieved by inserting a fuse at point P.
Before it is programmed, the memory contains all 0s.
The user can insert 1s at specific locations by burning out the corresponding fuses with high-current pulses.
PROMs provide flexibility and convenience.
Mask-programmed ROMs are economically attractive for high volumes, but the cost of preparing the mask that stores the specific information makes them expensive when only a small number of chips is needed.

Page 40: EPROM, EEPROM, Flash ROM

EPROM (Erasable Programmable ROM):
Allows stored data to be erased and new data to be loaded.
Provides flexibility during the development phase of a digital system.
Erasure requires dissipating the charge trapped in the transistors; this is done by exposing the chip to ultraviolet light.
A disadvantage of EPROMs is that the chip must be physically removed from the circuit for reprogramming.
EEPROM (Electrically Erasable Programmable ROM):
An alternative to EPROMs; it can be programmed and erased electrically.
Cells in an EEPROM can be erased selectively.
Disadvantage: different voltages are needed for erasing, writing, and reading the stored data.
Flash ROM: higher density, but erasable only block by block.

Page 41: Speed, Size, and Cost

Very fast memory can be built with SRAM chips, but SRAMs are expensive because each basic cell uses six transistors.
It is therefore impractical, in terms of cost, to build a large memory using SRAM chips; SRAMs are instead used for cache memories.
The alternative is DRAM chips, which are less expensive but significantly slower.
A large, affordable main memory can be built with DRAMs.
Very large disks are available at reasonable prices for secondary storage; magnetic disks provide a huge amount of storage.

Page 42: DDR Memory

DDR SDRAM: Double-Data-Rate Synchronous Dynamic Random-Access Memory.
184-pin modules with a 64-bit data width.
Data are transferred on both the rising and the falling edge of the clock signal (double pumping).
This effectively nearly doubles the transfer rate without increasing the frequency of the front-side bus; a 100 MHz DDR system thus has an effective transfer rate of 200 MHz (200 million transfers per second).
With data transferred 64 bits at a time, DDR RAM gives a peak transfer rate of (memory bus clock rate) × 2 (for dual rate) × 64 (bits per transfer) / 8 (bits per byte).
Thus, with a bus frequency of 100 MHz, DDR SDRAM gives a maximum transfer rate of 1600 MB/s (1.6 GB/s).
(A short calculation of these rates follows.)

Page 43: DDR Memory (cont.)

PC-1600: DDR SDRAM module specified to operate at 100 MHz using DDR-200 chips; 1.600 GB/s bandwidth.
PC-2100: DDR SDRAM module specified to operate at 133 MHz using DDR-266 chips; 2.133 GB/s bandwidth.
PC-2700: DDR SDRAM module specified to operate at 166 MHz using DDR-333 chips; 2.667 GB/s bandwidth.
PC-3200: DDR SDRAM module specified to operate at 200 MHz using DDR-400 chips; 3.200 GB/s bandwidth.
PC-xxxx denotes the theoretical peak bandwidth, whereas DDR-xxx denotes the effective clock (transfer) rate.

Page 44: DDR2 Memory

DDR2 SDRAM: Double-Data-Rate Two Synchronous Dynamic Random-Access Memory.
240-pin modules with a 64-bit data width.
Data are transferred on both the rising and the falling edge of the clock.
Electrical interface improvements, on-die termination, prefetch buffers, and off-chip drivers have further boosted the clock frequency.
The key difference between DDR and DDR2 is that in DDR2 the bus is clocked at twice the speed of the memory cells, allowing transfers from two different cells to occur in the same memory cell cycle. Thus, without speeding up the memory cells themselves, DDR2 can effectively operate at twice the bus speed of DDR.
Note: the latency of each cell may be the same as in DDR.
(A worked example follows.)
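A worked example consistent with the table on the next page (the intermediate figures are my own reading of the scheme just described): for a PC2-6400 / DDR2-800 module, the memory cells run at 200 MHz, the I/O bus is clocked at twice that rate, 400 MHz, and data move on both clock edges, giving 800 million transfers per second; at 64 bits (8 bytes) per transfer, the peak rate is 800 × 8 = 6400 MB/s, i.e., 6.4 GB/s.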

Page 45: DDR2 Memory (cont.)

DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers, and off-chip drivers.
However, latency is greatly increased as a trade-off: the cells take twice as long (in terms of bus cycles) to produce a result, and the additional buffering adds yet more delay. While DDR SDRAM has typical read latencies of 2 to 3 bus cycles, DDR2 may have read latencies of 3 to 9 bus cycles.

Module name | Bus clock | Chip type | Peak transfer rate
PC2-3200    | 200 MHz   | DDR2-400  | 3.200 GB/s
PC2-4200    | 266 MHz   | DDR2-533  | 4.267 GB/s
PC2-5300    | 333 MHz   | DDR2-667  | 5.333 GB/s
PC2-6400    | 400 MHz   | DDR2-800  | 6.400 GB/s

Page 46: Dual Channel

Dual-channel architecture for DDR/DDR2 SDRAM is a motherboard technology that effectively doubles the data throughput between the RAM and the memory controller.
A dual-channel-enabled memory controller uses two 64-bit data channels, for a total width of 128 bits, to move data between RAM and the CPU.

Page 47: DDR3 Memory

DDR3 SDRAM: Double-Data-Rate 3 Synchronous Dynamic Random-Access Memory.
DDR3 comes with a promise of a power-consumption reduction of about 40% compared with commercial DDR2 modules, due to DDR3's 90 nm fabrication technology, which allows lower operating currents and voltages (1.5 V, compared with DDR2's 1.8 V and DDR's 2.5 V).
Dual-gate transistors are used to reduce leakage current.
PC3-6400: DDR3 SDRAM module specified to operate at 400 MHz using DDR3-800 chips; 6.40 GB/s bandwidth.
PC3-8500: DDR3 SDRAM module specified to operate at 533 MHz using DDR3-1066 chips; 8.53 GB/s bandwidth.
PC3-10600: DDR3 SDRAM module specified to operate at 667 MHz using DDR3-1333 chips; 10.67 GB/s bandwidth.

Page 48: Prefetch Buffer

The prefetch buffer is a small buffer located on modern RAM chips that fetches data from the cell array before it is actually placed on the bus.
The width (burst length) of the prefetch buffer has increased with each successive generation of DDR SDRAM:
DDR SDRAM has a 2-bit prefetch buffer.
DDR2 SDRAM has a 4-bit prefetch buffer.
DDR3 SDRAM has an 8-bit prefetch buffer.