Chapter 5-1 Memory System Memory System Next Lecture Cache Memory.
Chapter 5-1 Memory System
Memory System Next Lecture
Cache Memory
2
The maximum size of the memory that can be used is determined by the addressing scheme
Example: a 32-bit computer that generates 32-bit addresses can address up to 2^32 = 4G memory locations (bytes)
Basic Concept
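[Figure: connection of the memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W*, MFC, etc.) link the processor to a memory of up to 2^k addressable locations with word length = n bits.]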
3
Main memory stores and retrieves data in word-length units
Definition of word length: the number of bits stored or retrieved in one memory access
Many current CPUs can retrieve 64 bits (Pentium 4) or even 128 bits in one clock cycle to fill their internal cache
Most modern computers are byte addressable
Example: a possible address assignment for a byte-addressable 32-bit computer using the big-endian arrangement
Basic Concept (cont.)
4
When a 32-bit address is sent from the CPU to memory:
The high-order 30 bits determine which word will be accessed
The low-order 2 bits specify which byte
In a read operation, the other bytes of the word may be fetched as well
If the operation is a byte write, the control circuitry of the memory (called the memory controller) must ensure that the contents of the other bytes of the same word are not changed
Basic Concept (cont.)
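[Figure: a byte-addressable memory built from four byte-wide banks. Address bits A2-A31 go to every bank, a 2-to-4 decoder on A0-A1 generates the select (Sel) signals, and each bank supplies 8 of the 32 data lines D0-D31. Bank 0 holds bytes 0, 4, 8, c; bank 1 holds 1, 5, 9, d; bank 2 holds 2, 6, a, e; bank 3 holds 3, 7, b, f.]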
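The address split described on this slide can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (32-bit addresses, 4 bytes per word); the function name `split_byte_address` is hypothetical and not part of the lecture.

```python
def split_byte_address(address: int) -> tuple[int, int]:
    """Split a 32-bit byte address into (word index, byte within the word)."""
    if not 0 <= address < 2**32:
        raise ValueError("address must fit in 32 bits")
    word_index = address >> 2       # high-order 30 bits select the word
    byte_offset = address & 0b11    # low-order 2 bits select the byte
    return word_index, byte_offset


print(split_byte_address(0x0000000B))  # (2, 3)
```

Byte address b therefore lives in word 2 (bytes 8 to b) at byte position 3, which matches the byte layout in the figure.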
5
If the MAR is k bits long and the MDR is n bits long, the memory may contain up to 2^k addressable locations
During a memory cycle, n bits of data are transferred between the memory and the CPU
The transfer takes place over the processor bus, which has k address lines and n data lines
The bus also includes control lines: Read, Write, etc.
In a byte-addressable computer, one or two additional control lines are needed to indicate when only a byte, rather than a full word of n bits, is to be transferred
Basic Concept (cont.)
6
A useful measure of the speed of memory units is the time that elapses between the initiation of an operation and the completion of the operation
Memory Access Time: the time between the Read signal and the MFC (memory function complete) signal from the memory
Memory Cycle Time: minimum time delay between the initiation of successive memory operations
The cycle time is typically slightly longer than the access time
Main memories are usually implemented using RAMs (random access memory)
CPU can usually process instructions and data faster than they can be fetched from a reasonably priced memory unit
The memory cycle time is the bottleneck in the system
Memory Access Time
7
Fast access time
Larger size
Desirable Memory System
8
Cache memory
Memory interleaving:
Divides the memory into a number of memory modules
Arranges addressing so that successive words in the address space are placed in different modules, which lessens the cycle-time constraint
If requests for memory access tend to involve consecutive addresses, the accesses will be sent to different modules
Increases the average rate of fetching words from the main memory
Possible Speed and Size Enhancements
9
Interleaved Memory Modules
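[Figure: two interleaved memory modules. A 1-to-2 decoder on address bit A2 selects the module and A3-A31 address the word within it; one module holds the words at byte addresses 0, 8, 10, 18 (hex) and the other holds the words at 4, c, 14, 1c.]
Assume: the memory access time (MAT) is 10 ns and the memory cycle time (MCT) is 12 ns
If consecutive words are always fetched, what is the speed difference between a system without interleaving and a system with two interleaved modules?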
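The module assignment shown in the figure can be reproduced with a short Python sketch. It assumes 32-bit words in a byte-addressable memory, so that address bit A2 alternates between consecutive words; the names `module_for` and `word_in_module` are hypothetical. It only illustrates the address mapping and does not attempt to answer the timing question.

```python
WORD_BYTES = 4    # assumed: 32-bit words in a byte-addressable memory
NUM_MODULES = 2   # two interleaved modules, as in the figure


def module_for(address: int) -> int:
    """Module holding the word at this byte address (address bit A2)."""
    return (address // WORD_BYTES) % NUM_MODULES


def word_in_module(address: int) -> int:
    """Index of the word inside its module (address bits A3 and up)."""
    return address // (WORD_BYTES * NUM_MODULES)


layout = {m: [] for m in range(NUM_MODULES)}
for address in range(0x00, 0x20, WORD_BYTES):
    layout[module_for(address)].append(hex(address))

print(layout[0])  # ['0x0', '0x8', '0x10', '0x18']
print(layout[1])  # ['0x4', '0xc', '0x14', '0x1c']
```

Consecutive words land in alternating modules, so one module can begin a new access while the other is still completing its cycle.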
10
[Figure: organization of bit cells in a 16 x 8 memory chip. Address lines A0-A3 feed an address decoder that drives word lines W0-W15; each row of memory cells (FF) stores bits b7-b0, and each column connects to a Sense/Write circuit, controlled by R/W* and CS, that drives the data input/output lines.]
Internal Organization of Memory Chips
11
Memory cells are usually organized in the form of an array
Each cell of the array is capable of storing one bit of information
Each row of cells constitutes a word
All cells of a row are connected to a common line: the word line
A word line is driven by the address decoder on the chip
Cells in each column are connected to a Sense/Write circuit by two bit lines
Internal Organization of Memory Chips
12
The sense/write circuits are connected to the data input/output lines of the chip
During a read operation, the sense/write circuits read the information stored in the cells selected by a word line and transmit the information to the output data lines
During a write, the sense/write circuits receive input information and store it into the cells of the selected word
The data input and data output of each sense/write circuit are connected to a single bidirectional data line in order to reduce the number of pins required
Control line R/W* specifies the direction of the transfer
Control line CS selects a given chip in a multichip memory system
Internal Organization of Memory Chips
13
Consider a memory circuit that has 1K memory cells
It can be organized as 128 x 8, requiring a total of 19 external connections
Or it can be organized as 1K x 1, which uses 16 pins even if separate pins are provided for the data input and data output lines
For the 1K x 1 organization, the required 10-bit address is divided into two groups
5 bits specify a row of 32 cells; all the cells in the selected row are accessed in parallel
The remaining 5 bits select a cell within the row
Given the column address, only one of the cells is connected to the external data lines by the input demultiplexer and output multiplexer
Internal Organization of Memory Chips
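The pin counts quoted on this slide can be checked with a small Python sketch. The breakdown used here is an assumption that the slide does not spell out: address pins, data pins, two control pins (R/W* and CS) and two pins for power supply and ground. The helper names are hypothetical.

```python
def external_connections(words: int, bits_per_word: int,
                         separate_io: bool = False) -> int:
    """Count external connections for a words x bits_per_word memory chip."""
    address_pins = (words - 1).bit_length()   # bits needed to address all words
    data_pins = bits_per_word * (2 if separate_io else 1)
    control_pins = 2   # R/W* and CS
    power_pins = 2     # assumed: power supply and ground
    return address_pins + data_pins + control_pins + power_pins


def split_row_column(address: int) -> tuple[int, int]:
    """Split a 10-bit address of the 1K x 1 chip into (row, column), 5 bits each."""
    return address >> 5, address & 0b11111


print(external_connections(128, 8))                      # 19
print(external_connections(1024, 1, separate_io=True))   # 16
print(split_row_column(0b1010100011))                    # (21, 3)
```

Under these assumptions the counts agree with the 19 and 16 connections stated on the slide.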
14
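[Figure: a 1K x 1 memory chip. The 10-bit address is split into a 5-bit row address, decoded by a 5-bit decoder into word lines W0-W31 of a 32 x 32 memory cell array, and a 5-bit column address that drives a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry, R/W* and CS control the data input/output.]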
Organization of a 1K x 1 Memory Chip
15
Static Memories
consist of circuits that are capable of retaining their states as long as power is applied: SRAM
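[Figure: a static RAM cell. A latch with internal points X and Y is connected through transistors T1 and T2 to bit lines b and b*; the word line controls both transistors.]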
16
Two inverters are cross-connected to form a latch, which is connected to two bit lines, b and b*, by transistors T1 and T2
These transistors (T1 and T2) act as switches that can be opened or closed under the control of the word line
When the word line is at the ground level, the transistors are off and the latch retains its state
Example: assuming the cell is in state “1”, if the logic value at point X is 1 and at point Y is 0, this state is maintained as long as the signal on the word line is at the ground level
SRAM Operation
17
Read operation
The word line is activated to close switches T1 and T2
If the cell is in state “1”, the signal on bit line b becomes high and the signal on bit line b* becomes low
The opposite is true if the cell is in state “0”
Write operation
The state of the cell is set by placing the appropriate value on bit line b and its complement on b*, and then activating the word line
This forces the cell into the corresponding state
SRAM Operation
18
SRAMs are fast but expensive because each cell requires six transistors
Less expensive RAMs can be implemented if simpler cells are used
However, these cells do not retain their state indefinitely
In dynamic memory, information is stored as a charge on a capacitor
A DRAM cell can store information for only a few milliseconds
Its contents must be periodically refreshed by restoring the capacitor charge to its full value
Dynamic Memories
19
To store information in a cell, transistor T is turned on and the appropriate voltage is applied to the bit line
If the bit line is high, this causes a known amount of charge to be stored on the capacitor
After the transistor is turned off, the capacitor begins to discharge due to leakage
Information stored in the cell can be retrieved correctly only if it is read before the charge on the capacitor drops below some threshold value recognized as “1”
DRAM Operation
20
During a read, the bit line is placed in a high-impedance state and the transistor is turned on
A sense circuit connected to the bit line determines whether the charge on the capacitor is above or below the threshold value
The read operation discharges the capacitor in the cell
To retain the information stored in the cell, DRAMs include special circuitry, called the Sense/Write circuit, that writes back the value read
Thus, a cell is refreshed every time it is read
In fact, all cells connected to a given word line are refreshed whenever this word line is activated
DRAM Operation
21
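[Figure: internal organization of a 2M x 8 dynamic memory chip. Address pins A20-9 / A8-0 feed a row address latch (strobed by RAS) and a column address latch (strobed by CAS); the row decoder selects one of 4096 rows in a 4096 x (512 x 8) cell array, and the column decoder selects among the Sense/Write circuits, which connect to data pins D7-D0 under control of CS and R/W*.]
DRAM Example and Operation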
22
A 16-megabit DRAM organized as 2M x 8
The cells are organized as a 4K x 4K array: 4096 rows of 4096 cells, that is, 512 bytes per row
The high-order 12 bits and the low-order 9 bits of the 21-bit address constitute the row and column addresses of a byte
To reduce the number of pins needed for external connections, the row and column addresses are multiplexed
During a read or write operation, the row address is applied first
It is loaded into the row address latch in response to a pulse on the RAS signal
DRAM Example and Operation
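As a rough illustration of the row/column split for this 2M x 8 chip, the following Python sketch separates a 21-bit address into the 12-bit row part and the 9-bit column part. The function name `split_dram_address` is hypothetical; the bit widths come from the slide.

```python
ROW_BITS = 12      # high-order bits: selects one of 4096 rows
COLUMN_BITS = 9    # low-order bits: selects one of 512 bytes in the row


def split_dram_address(address: int) -> tuple[int, int]:
    """Split a 21-bit address into (row address, column address)."""
    if not 0 <= address < 2 ** (ROW_BITS + COLUMN_BITS):
        raise ValueError("address must fit in 21 bits")
    row = address >> COLUMN_BITS
    column = address & ((1 << COLUMN_BITS) - 1)
    return row, column


print(split_dram_address(0x1FFFFF))     # (4095, 511)
print(split_dram_address(0x000200))     # (1, 0)
print(max(ROW_BITS, COLUMN_BITS))       # 12 shared address pins instead of 21
```

Because the two parts are sent one after the other over the same pins, the chip needs only as many address pins as the wider of the two parts.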
23
The read operation is initiated, in which all the cells on the selected row are read and refreshed
The column address is applied to the address pins and loaded into the column latch controlled by a pulse on CAS signal
The information in the column latch is decoded and the appropriate sense/write circuit is selected
If the R/W* indicates a read, the outputs of the selected circuit are transferred to the data pins (D0-D7)
If the R/W* indicates a write, the data at the data inputs are transferred to the selected circuits; this information is used to overwrite the contents of the selected cells in the corresponding column
DRAM Read/Write Operation
24
To ensure that the contents of a DRAM are maintained, each row of cells must be accessed periodically
Typically once every 2 to 16 milliseconds
A refresh circuit can perform this function automatically
Some dynamic memory chips incorporate a refresh facility within the chips themselves
Because of their high density and low cost, dynamic memories are widely used in the main memory units of computers
DRAM Refresh Operation
25
Consider an application in which a number of memory locations at successive addresses are to be accessed
Assume that the cells are all on the same row
It is only necessary to load the row address once
Different column addresses can then be loaded during successive memory cycles
The rate at which such block transfers can be carried out is typically double that for transfers involving random addresses
This faster rate can be exploited in transferring a data block from memory to cache
Enhancement
26
The choice of a RAM depends on several factors: speed, power dissipation, size
SRAMs are generally used when very fast operation is the primary requirement
DRAMs are the choice for implementing computer main memories because they make large memories economically feasible
Memory System Design Consideration
27
Consider a small static memory consisting of 64K words of 8 bits each
16K x 1 chips are used
The address bus is 16 bits wide
The high-order 2 bits of the address are decoded to obtain the four chip-select control signals
The remaining 14 address bits are used to access specific locations inside each chip of the selected row
The R/W* inputs of all chips are tied together to provide a common R/W* control
Example Configuration Using 16K x 1
[Figure: 64K x 8-bit memory built from 16K x 1 chips]
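A minimal Python sketch of this configuration follows. It counts the chips needed and decodes a 16-bit address into the four chip-select signals and the 14-bit address inside the chips. The function name `decode` and the list representation of the chip-select lines are illustrative assumptions.

```python
CHIP_WORDS, CHIP_BITS = 16 * 1024, 1       # each chip is 16K x 1
MEMORY_WORDS, MEMORY_BITS = 64 * 1024, 8   # target memory is 64K x 8

chip_rows = MEMORY_WORDS // CHIP_WORDS      # rows of chips, one chip select each
chips_per_row = MEMORY_BITS // CHIP_BITS    # chips side by side form one 8-bit word
print(chip_rows, chips_per_row, chip_rows * chips_per_row)  # 4 8 32


def decode(address: int) -> tuple[list[int], int]:
    """Return (chip-select lines, 14-bit internal address) for a 16-bit address."""
    if not 0 <= address < MEMORY_WORDS:
        raise ValueError("address must fit in 16 bits")
    selected_row = address >> 14             # high-order 2 bits
    internal = address & 0x3FFF              # remaining 14 bits
    chip_select = [int(row == selected_row) for row in range(chip_rows)]
    return chip_select, internal


print(decode(0x8001))  # ([0, 0, 1, 0], 1)
```

Exactly one chip-select line is active for any address, so only the eight chips of the selected row respond.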
28
Consider a large dynamic memory with an organization similar to the previous figure
The control circuitry differs in three respects:
The row and column parts of the address for each chip have to be multiplexed
A refresh circuit is needed
The timing of the various steps of a memory cycle must be carefully controlled
Memory Configuration
29
DRAM chips and the required control circuitry for a 16M-byte dynamic memory unit
The DRAM chips are arranged in a 4 x 8 array
The individual chips have a 1M x 4 organization
The array has a total storage capacity of 4M words of 32 bits
The memory unit is assumed to be connected to an asynchronous memory bus that has:
22 address lines (ADRS21-0)
32 data lines (DATA31-0)
Two handshake signals: memory request and MFC
A Read/Write* line to indicate the type of memory cycle requested
Memory Configuration Example
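The figures on this slide are mutually consistent, which a few lines of Python can confirm. The sketch assumes that the eight chips in a row each supply 4 bits of a 32-bit word and that each of the four rows contributes 1M words; the variable names are illustrative.

```python
CHIP_ROWS, CHIPS_PER_ROW = 4, 8        # 4 x 8 array of chips
CHIP_WORDS, CHIP_WIDTH = 2**20, 4      # each chip is 1M x 4

word_width = CHIPS_PER_ROW * CHIP_WIDTH        # bits per memory word
words = CHIP_ROWS * CHIP_WORDS                 # number of words
total_bytes = words * word_width // 8
address_lines = (words - 1).bit_length()       # lines needed to address every word

print(word_width)             # 32
print(words // 2**20)         # 4  (4M words)
print(total_bytes // 2**20)   # 16 (16M bytes)
print(address_lines)          # 22
```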
30
DRAM Read Cycle
31
The CPU activates the address, the Read/Write* and the memory request lines
The access control block recognizes the request when the memory request signal becomes active
It sets the start signal to 1
The timing control block responds by activating the Busy signal to prevent the access control block from accepting new requests before the cycle ends
The timing control block loads the row and column addresses into the memory chips by activating the RAS and CAS lines
DRAM Read Cycle
32
During this time, it uses the Row/Column* line to select first the row address, ADRS19-10, followed by the column address, ADRS9-0
After obtaining the row and column parts of the address, the selected memory chips place the contents of the requested bit cells on their data outputs
The timing control block then activates MFC
At the end of the memory cycle, the Busy signal is deactivated and the access control block becomes free to accept new requests
33
The refresh control block periodically generates refresh requests
The access control block indicates to the refresh control block that it may proceed with a refresh operation by activating the refresh grant line
The access control block arbitrates between memory access requests and refresh requests
If memory access requests and refresh requests arrive simultaneously, refresh requests are given priority to ensure that stored information is not lost
When the refresh control block receives the refresh grant signal, it activates the refresh line
34
This causes the address multiplexer to select the refresh counter as the source of the row address
The contents of the counter are loaded into the row latches of all memory chips when the RAS is activated
During this time, the R/W* line may indicate a write
We must ensure that this does not cause new information to be loaded into any cells during the refresh
The decoder block can deactivate all CS lines to prevent memory chips from responding to R/W*
35
The rest of the refresh cycle is the same as a normal read cycle
At the end of the refresh, the refresh control block increments the refresh counter in preparation for the next refresh cycle
The CPU and the refresh circuit compete for access to memory
The refresh circuit must be given priority
The response of the memory to a request from the CPU or DMA may be delayed if a refresh is in progress
During a refresh operation, all memory rows may be refreshed in succession before the memory is returned to normal use (burst refresh mode)
An alternative interleaves refresh operations on successive rows with accesses from the memory bus
DRAM Refresh Cycle
36
Consider a memory array built with 1M x 1 chips
Each chip contains a cell array organized as 1024 x 1024 x 1
There are 1024 rows with 1024 bits per row
Assume it takes 130 ns to refresh one row
Each row must be refreshed once every 16 ms
Time needed to refresh all rows in the chip = 1024 x 130 ns = 0.133 ms
All cells in a row are refreshed simultaneously, and the rows are refreshed one after another in burst mode
Less than 1% of the memory cycles are used for refresh operations
Refresh Overhead
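The refresh overhead quoted on this slide follows directly from the stated numbers. A minimal Python check, using integer nanoseconds and illustrative variable names:

```python
ROWS = 1024                      # rows per chip
ROW_REFRESH_NS = 130             # time to refresh one row
REFRESH_PERIOD_NS = 16_000_000   # every row must be refreshed once per 16 ms

burst_ns = ROWS * ROW_REFRESH_NS          # time to refresh all rows once
overhead = burst_ns / REFRESH_PERIOD_NS   # fraction of time spent refreshing

print(burst_ns / 1e6)        # 0.13312 (milliseconds)
print(f"{overhead:.2%}")     # 0.83%
```

The result, about 0.83% of the available time, is consistent with the "less than 1%" figure.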
37
There is an apparent increase in the access time of the memory when a request arrives while a refresh operation is in progress
The variability in access time is easily accommodated with an asynchronous bus
In the case of a synchronous bus, it may be possible to hide a refresh cycle within the early part of the bus cycle (if sufficient time remains after the refresh to carry out a read or a write)
Alternatively, the refresh circuit may request bus cycles in the same manner as any device with DMA capability
Refresh Overhead
38
Logic value 0 is stored in the cell if the transistor is connected to ground at point P; otherwise, a 1 is stored
To read the state of the cell, the word line is activated
The transistor switch is closed and the voltage on the bit line drops to near zero if there is a connection between the transistor and ground
If there is no connection to ground, the bit line remains at the high voltage, indicating a 1
A sense circuit at the end of the bit line generates the proper output value
Data are written into a ROM when it is manufactured
Possible configuration of a ROM cell
Read Only Memories
39
Some ROMs allow data to be loaded by the user
Programmability is achieved by inserting a fuse at point P
Before it is programmed, the memory contains all 0s
The user can insert 1s at specific locations by burning out the fuses with high-current pulses
PROMs provide flexibility and convenience
ROMs are economically attractive for high volumes
The cost of preparing the mask for storing specific information into a ROM makes them expensive when only a small number are needed
PROM (Programmable ROM)
40
EPROMs (Erasable reProgrammable ROMs)
Allow stored data to be erased and new data to be loaded
Provide flexibility during the development phase of a digital system
Erasure requires dissipating the charges trapped in the transistors
This can be done by exposing the chip to ultraviolet light
A disadvantage of EPROMs is that a chip must be physically removed from the circuit for reprogramming
EEPROMs (Electrically Erasable and reProgrammable ROMs)
An alternative to EPROMs, they can be programmed and erased electrically
Cells in EEPROMs can be erased selectively
Disadvantage of EEPROMs: different voltages are needed for erasing, writing and reading stored data
Flash ROM: higher density, but erasable only block by block
EPROM, EEPROM, Flash ROM
41
Very fast memory can be achieved by using SRAM chips
SRAMs are more expensive because their basic cell uses 6 transistors
It is impractical (because of cost) to build a large memory using SRAM chips
SRAMs can be used for cache memories
Alternative: DRAM chips
DRAMs are less expensive but significantly slower
A large, affordable main memory can be built with DRAMs
Very large disks are available at a reasonable price for secondary storage
A huge amount of storage can be provided by magnetic disks
Speed, Size and Cost
42
DDR Memory
Double-Data-Rate Synchronous Dynamic Random Access Memory (SDRAM)
184 pins
64-bit data width
Transfers data on both the rising and falling edges of the clock signal (double pumped)
This nearly doubles the transfer rate without increasing the frequency of the front side bus
Thus a 100 MHz DDR system has an effective transfer rate of 200 MHz
With data being transferred 64 bits at a time, DDR RAM gives a transfer rate of (memory bus clock rate) × 2 (for dual rate) × 64 (number of bits transferred) / 8 (number of bits/byte)
Thus, with a bus frequency of 100 MHz, DDR-SDRAM gives a maximum transfer rate of 1600 MB/s (or 1.6 GB/s)
43
DDR Memory
PC-1600: DDR-SDRAM memory module specified to operate at 100 MHz using DDR-200 chips, 1.600 GByte/s bandwidth
PC-2100: DDR-SDRAM memory module specified to operate at 133 MHz using DDR-266 chips, 2.133 GByte/s bandwidth
PC-2700: DDR-SDRAM memory module specified to operate at 166 MHz using DDR-333 chips, 2.667 GByte/s bandwidth
PC-3200: DDR-SDRAM memory module specified to operate at 200 MHz using DDR-400 chips, 3.200 GByte/s bandwidth
PC-xxxx denotes theoretical bandwidth, whereas DDR-xxx denotes effective clock speed
44
DDR2 Memory
Double-Data-Rate Two Synchronous Dynamic Random Access Memory (DDR2 SDRAM)
240 pins, 64-bit data width
Transfers data on both the rising and falling edges of the clock
Electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers have further boosted the clock frequency
The key difference between DDR and DDR2 is that in DDR2 the bus is clocked at twice the speed of the memory cells, allowing transfers from two different cells to occur in the same memory cell cycle. Thus, without speeding up the memory cells themselves, DDR2 can effectively operate at twice the bus speed of DDR.
Note: the latency of each cell may be the same as DDR
45
DDR2 Memory
DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers
However, latency is greatly increased as a trade-off because the cells take twice as long (in terms of bus cycles) to produce a result, and additional buffering adds yet more delay. While DDR SDRAM has typical read latencies of between 2 and 3 bus cycles, DDR2 may have read latencies between 3 and 9 cycles.
Module name   Bus clock   Chip type   Peak transfer rate
PC2-3200      200 MHz     DDR2-400    3.200 GB/s
PC2-4200      266 MHz     DDR2-533    4.267 GB/s
PC2-5300      333 MHz     DDR2-667    5.333 GB/s
PC2-6400      400 MHz     DDR2-800    6.400 GB/s
46
Dual Channel
Dual-channel architecture DDR/DDR2 SDRAM describes a motherboard technology that effectively doubles data throughput from RAM to the memory controller.
Dual-channel-enabled memory controllers use two 64-bit data channels, giving a total data width of 128 bits, to move data from RAM to the CPU
47
DDR3 Memory
Double-Data-Rate 3 Synchronous Dynamic Random Access Memory (DDR3 SDRAM)
The memory comes with a promise of a power consumption reduction of 40% compared to current commercial DDR2 modules, due to DDR3's 90nm fabrication technology, allowing for lower operating currents and voltages (1.5 V, compared to DDR2's 1.8 V or DDR's 2.5 V).
"Dual-gate" transistors will be used to reduce leakage of current.
PC3-6400: DDR3-SDRAM specified to run at 400 MHz using DDR3-800 chips, 6.40 GB/s bandwidth
PC3-8500: DDR3-SDRAM run at 533 MHz using DDR3-1066 chips, 8.53 GB/s bandwidth
PC3-10600: DDR3-SDRAM run at 667 MHz using DDR3-1333 chips, 10.67 GB/s bandwidth
48
Prefetch Buffer
The prefetch buffer is a memory cache located on modern RAM modules which stores data before it is actually needed.
The width (or burst length) of the prefetch buffer is increased with each successive standard of modern DDR SDRAM modules
DDR SDRAM's prefetch buffer width is 2-bit
DDR2 SDRAM's prefetch buffer width is 4-bit
DDR3 SDRAM's prefetch buffer width is 8-bit