Computer Organization
Department of CSE, SSE Mukka
The Memory System
www.bookspar.com | Website for students | VTU NOTES
Chapter Objectives
- Basic memory circuits
- Organization of the main memory
- Cache memory concept – shortens the effective memory access time
- Virtual memory mechanism – increases the apparent size of the main memory
- Secondary storage: magnetic disks, optical disks, magnetic tapes
Basic Memory Concepts
The maximum size of the main memory (MM) that can be used in any computer is determined by its addressing scheme. For example, a 16-bit computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K memory locations; a 32-bit computer with 32-bit addresses can address 2^32 = 4G memory locations; a 40-bit computer can address 2^40 = 1T memory locations.
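The relationship between address width and addressable locations can be sketched as a one-line helper (illustrative, not from the slides):

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct locations a k-bit address can reach: 2^k."""
    return 2 ** address_bits

# 16-bit, 32-bit, and 40-bit address widths from the examples above
print(addressable_locations(16))  # 65536 (64K)
print(addressable_locations(32))  # 4294967296 (4G)
print(addressable_locations(40))  # 1099511627776 (1T)
```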
Word addressability and byte addressability
If the smallest addressable unit of information is a memory word, the machine is called word-addressable. If individual memory bytes are assigned distinct addresses, the computer is called byte-addressable. Most commercial machines are byte-addressable. For example, in a byte-addressable 32-bit computer, each memory word contains 4 bytes. A possible word-address assignment would be:
Word address | Byte addresses
0            | 0  1  2  3
4            | 4  5  6  7
8            | 8  9 10 11
Basic Memory Concepts
The word length of a computer is the number of bits actually stored or retrieved in one memory access. For example, in a byte-addressable 32-bit computer whose instructions generate 32-bit addresses, the high-order 30 bits determine which word in memory is accessed, and the low-order 2 bits determine which byte within that word. Suppose we want to access only one byte of a word: in a Read operation, the other bytes are discarded by the processor; in a Write operation, care must be taken not to overwrite the other bytes.
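The word/byte split described above can be sketched with two bitwise operations (function name is illustrative):

```python
def split_byte_address(addr: int) -> tuple[int, int]:
    """Split a 32-bit byte address into (word number, byte within word)."""
    word = addr >> 2     # high-order 30 bits select the word
    byte = addr & 0b11   # low-order 2 bits select the byte within the word
    return word, byte

# Byte address 11 lies in word 2 (bytes 8-11), at byte offset 3
print(split_byte_address(11))  # (2, 3)
```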
Basic Memory Concepts
Data transfer between the memory and the processor takes place through two processor registers:
- MAR – Memory Address Register
- MDR – Memory Data Register
If MAR is k bits long and MDR is n bits long, the memory unit may contain up to 2^k addressable locations, and during a memory cycle n bits of data are transferred between the memory and the processor. The processor therefore needs k address lines and n data lines. There are additional control lines: read/write (R/W), MFC (Memory Function Completed), the number of bytes to be transferred, etc.
[Figure 5.1. Connection of the memory to the processor: MAR drives the k-bit address bus and MDR the n-bit data bus; control lines (R/W, MFC, etc.) connect the processor to a memory of up to 2^k addressable locations with word length n bits.]
How does the processor read data from the memory?
- It loads the address of the required memory location into MAR
- It sets the R/W line to 1
- The memory responds by placing the requested data on the data lines
- The memory confirms this action by asserting the MFC signal
- Upon receipt of the MFC signal, the processor loads the data on the data lines into the MDR register
How does the processor write data into the memory?
- It loads the address of the location into MAR
- It loads the data into MDR
- It indicates a Write operation by setting the R/W line to 0
Some concepts
Memory access time: a useful measure of the speed of the memory unit. It is the time that elapses between the initiation of an operation and its completion (for example, the time between a READ request and the MFC signal).
Memory cycle time: an important measure of the memory system. It is the minimum time delay required between the initiations of two successive memory operations (for example, the time between two successive READ operations). The cycle time is usually slightly longer than the access time.
Random-Access Memory (RAM)
A memory unit is called a random-access memory if any location can be accessed for a READ or WRITE operation in some fixed amount of time that is independent of the location's address. Main memory units are of this type. This distinguishes them from serial or partly serial access storage devices such as magnetic tapes and disks, which are used as secondary storage.
Cache Memory
The CPU processes instructions and data faster than they can be fetched from a compatibly priced main memory unit, so the memory cycle time becomes the bottleneck in the system. One way to reduce the effective memory access time is to use a cache memory: a small, fast memory inserted between the larger, slower main memory and the CPU. It holds the currently active segments of a program and its data. Because of the locality of address references, the CPU finds the relevant information mostly in the cache memory itself (cache hit) and only infrequently needs to access the main memory (cache miss). With a suitable cache size, hit rates of over 90% are possible.
Memory Interleaving
This technique divides the memory system into a number of memory modules and arranges the addressing so that successive words in the address space are placed in different modules. When requests for memory access involve consecutive addresses, the accesses go to different modules. Since parallel access to these modules is possible, the average rate of fetching words from the main memory can be increased.
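As an illustrative sketch of the addressing arrangement just described (low-order interleaving, assuming the number of modules is a power of two; names are made up for illustration):

```python
def interleaved(addr: int, num_modules: int) -> tuple[int, int]:
    """Map a word address to (module number, address within module)
    under low-order interleaving."""
    return addr % num_modules, addr // num_modules

# With 4 modules, consecutive addresses 0..7 cycle through modules 0,1,2,3
print([interleaved(a, 4)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Because consecutive addresses land in different modules, a burst of sequential accesses can proceed in parallel across the modules.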
Virtual Memory
In a virtual memory system, the addresses generated by a program may be different from the actual physical addresses; the address generated by the CPU is referred to as a virtual or logical address. The required mapping between the physical memory and the logical address space is implemented by a special memory control unit called the memory management unit (MMU). The mapping function may be changed during program execution according to system requirements. The logical (virtual) address space can be as large as the addressing capability of the CPU, while the physical address space – the actual physical memory – can be much smaller.
Virtual memory
Only the active portion of the virtual address space is mapped onto the physical memory; the rest of the virtual address space is mapped onto a bulk storage device such as a magnetic disk (hard disk). If the addressed information is in the main memory (MM), it is accessed and execution proceeds. Otherwise, an exception is generated, in response to which the memory management unit transfers a contiguous block of words containing the desired word from the bulk storage unit to the MM, displacing some block that is currently inactive.
[Figure 5.2. Organization of bit cells in a memory chip: a 16 x 8 array of memory cells; an address decoder on inputs A0–A3 drives word lines W0–W15, Sense/Write circuits on bit lines b7 … b1, b0 connect to the data input/output lines, and R/W and CS are the control inputs.]
An example of memory organization
Consider a memory chip consisting of 16 words of 8 bits each, usually referred to as a 16 x 8 organization. The data input and data output of each Sense/Write circuit are connected to a single bidirectional data line in order to reduce the number of pins required. One control line, the R/W (Read/Write) input, is used to specify the required operation, and another control line, the CS (Chip Select) input, is used to select a given chip in a multichip memory system. This circuit requires 14 external connections (4 address, 8 data, and 2 control lines), and allowing 2 pins for the power supply and ground connections, it can be manufactured in the form of a 16-pin chip. It can store 16 x 8 = 128 bits.
[Figure 5.3. Organization of a 1K x 1 memory chip: the 10-bit address is split into a 5-bit row address, decoded to word lines W0–W31 of the memory cell array, and a 5-bit column address driving a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry and the R/W and CS inputs complete the chip, with a single data input/output line.]
1K x 1 memory chip
The 10-bit address is divided into two groups of 5 bits each to form the row and column addresses for the cell array. A row address selects a row of 32 cells, all of which are accessed in parallel. One of these, selected by the column address, is connected to the external data line by the input and output multiplexers. This structure can store 1024 bits and can be implemented in a 16-pin chip.
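The 5-bit/5-bit split of the 10-bit address can be sketched as follows (function name is illustrative):

```python
def row_col_1k(addr10: int) -> tuple[int, int]:
    """Split a 10-bit address of a 1K x 1 chip into (row, column)."""
    row = (addr10 >> 5) & 0b11111   # high-order 5 bits select one of 32 rows
    col = addr10 & 0b11111          # low-order 5 bits select one of 32 columns
    return row, col

print(row_col_1k(675))  # row 21, column 3
```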
Static memories
Memories that consist of circuits capable of retaining their state as long as power is applied are called static memories. Static RAMs can be accessed very quickly – within a few nanoseconds.
[Figure 5.4. A static RAM cell: a latch formed by two cross-coupled inverters, connected to bit lines b and b′ through transistors T1 and T2, which are controlled by the word line.]
The cell consists of two inverters and two transistors, T1 and T2. When the word line is at ground level, the transistors are turned off and the latch retains its state.
Read and Write operations in an SRAM
Read:
- The word line is activated to close switches T1 and T2
- If the cell is in state 1, the signal on bit line b is high and the signal on bit line b′ is low; the opposite holds if the cell is in state 0
- Sense/Write circuits at the ends of the bit lines monitor the states of b and b′ and set the output accordingly
Write:
- The state of the cell is set by placing the appropriate value on bit lines b and b′ and then activating the word line
- This forces the cell into the corresponding state
- The required signals on the bit lines are generated by the Sense/Write circuit
[Figure 5.5. An example of a CMOS memory cell: transistors T1–T6 form the cell, with bit lines b and b′, a word line, and Vsupply.]
Dynamic RAMs
Static RAMs are fast, but they come at a higher cost because their cells require several transistors. Less expensive RAMs can be implemented using fewer transistors, but their cells cannot retain their state indefinitely; these are called dynamic RAMs (DRAMs). Information is stored in the form of a charge on a capacitor. This charge can be maintained only for tens of milliseconds, so the contents must be refreshed periodically.
[Figure 5.6. A single-transistor dynamic memory cell: a transistor T and a capacitor C connected between the word line and the bit line.]
A dynamic RAM cell needs to be refreshed periodically to hold its data.
Consider a 16-Mbit DRAM chip configured as 2M x 8. The cells are organized as a 4K x 4K array. The 4096 cells in each row are divided into 512 groups of 8, so a row can store 512 bytes of data. Twelve address bits are required to select a row, and 9 bits are needed to specify a group of 8 bits within the selected row.
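The address-bit counts above follow directly from the array dimensions; a quick sketch:

```python
import math

rows = 4096               # 4K rows in the 4K x 4K cell array
groups_per_row = 512      # 4096 cells per row / 8 bits per group

row_bits = int(math.log2(rows))            # bits needed to select a row
col_bits = int(math.log2(groups_per_row))  # bits needed to select a byte group

print(row_bits, col_bits)              # 12 9
print(2 ** (row_bits + col_bits))      # 2097152 addressable bytes = 2M
```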
[Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip: a 4096 x (512 x 8) cell array with a row address latch and row decoder (controlled by RAS), a column address latch and column decoder (controlled by CAS), Sense/Write circuits, CS and R/W inputs, multiplexed address lines A20–9 / A8–0, and data lines D7–D0.]
Timing is controlled asynchronously: a specialized memory controller circuit provides the necessary control signals, RAS and CAS, that govern the timing. Hence this is an asynchronous DRAM.
Fast page mode
In the chip above, all bits of a row are sensed, but only 8 bits are placed on the data lines; this byte is selected by the column address bits. A simple modification makes it possible to access other bytes of the same row without having to reselect the row: add a latch to the output of the sense amplifier in each column. The application of a row address then loads the latches corresponding to all bits in the selected row, and only different column addresses are needed to place the different bytes on the data lines. The most useful arrangement is to transfer bytes in sequential order, by applying a consecutive sequence of column addresses under the control of successive CAS signals. This scheme allows a block of data to be transferred at a much faster rate than can be achieved for transfers involving random addresses. This block transfer capability is called fast page mode.
Synchronous DRAMs
DRAMs whose operation is directly synchronized with a clock signal are called SDRAMs. The cell array is the same as in asynchronous DRAMs. The address and data connections are buffered by means of registers. The output of each sense amplifier is connected to a latch. A read operation causes the contents of all cells in the selected row to be loaded into these latches; if an access is made for refreshing purposes only, it does not change the contents of these latches. Data held in the latches that correspond to the selected column(s) are transferred into the data output register.
[Figure 5.8. Synchronous DRAM: a cell array with a row address latch and row decoder, a column address counter and column decoder, Read/Write circuits and latches, data input and output registers, a refresh counter, and a mode register and timing control block; inputs include R/W, RAS, CAS, CS, the clock, and the multiplexed row/column address.]
Synchronous DRAMs
SDRAMs have several different modes of operation, selected by writing control information into a mode register; for example, burst operations of different lengths can be specified. In SDRAMs it is not necessary to provide externally generated pulses on the CAS line to select successive columns: the necessary signals are provided internally using a column counter and the clock signal, so new data can be placed on the data lines in each clock cycle. All actions are triggered by the rising edge of the clock.
[Figure 5.9. Burst read of length 4 in an SDRAM: the row and column addresses are latched under control of the RAS and CAS signals, and data words D0–D3 appear on the data lines in successive clock cycles.]
Burst read of length 4 in an SDRAM
The row address is latched under control of the RAS signal. The memory takes about 2–3 cycles to activate the selected row. The column address is then latched under control of the CAS signal. After a delay of one cycle, the first set of data bits is placed on the data lines. The SDRAM automatically increments the column address to access the next three sets of bits in the selected row, which are placed on the data lines in successive clock cycles. SDRAMs have built-in refresh circuitry, which provides the addresses of the rows that are selected for refreshing; each row must be refreshed at least every 64 ms.
Latency and Bandwidth
These are the parameters that indicate the performance of the memory. Memory latency is the amount of time it takes to transfer a word of data to or from the memory. For block transfers, latency denotes the time it takes to transfer the first word of data, which is longer than the time needed to transfer each subsequent word of the block. In the previous diagram, the access cycle begins with the assertion of RAS, and the first word is transferred five cycles later; hence the latency is 5 clock cycles.
Bandwidth
Bandwidth is usually the number of bits or bytes that can be transferred in one second. It depends on the speed of memory access, the transfer capability of the links (the speed of the bus), and the number of bits that can be accessed in parallel. Bandwidth is the product of the rate at which data are transferred (and accessed) and the width of the data bus.
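As a worked sketch of that product (the 100 MHz / 32-bit values are illustrative, not from the slides):

```python
def bandwidth_bytes_per_sec(transfer_rate_hz: float, bus_width_bits: int) -> float:
    """Bandwidth = transfer rate x data-bus width, expressed in bytes/second."""
    return transfer_rate_hz * bus_width_bits / 8

# e.g. a bus performing 100 million transfers per second, 32 bits wide
print(bandwidth_bytes_per_sec(100e6, 32))  # 400000000.0 bytes/s = 400 MB/s
```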
Double-Data-Rate SDRAMs (DDR SDRAMs)
A standard SDRAM performs all actions on the rising edge of the clock signal. DDR SDRAMs access the cell array in the same way but transfer data on both edges of the clock. The latency is the same as for standard SDRAMs, but since data are transferred on both clock edges, the bandwidth is essentially doubled for long burst transfers. To make this possible, the cell array is organized into two banks. Each bank can be accessed separately, and consecutive words of a given block are stored in different banks. DDR SDRAMs are used efficiently in applications where block transfers are prevalent, e.g., transfers between the main memory and the processor caches.
Questions for assignment
1. Explain how the processor reads and writes data from and to memory.
2. Explain the organization of a 1K x 1 memory chip.
3. Explain a single SRAM cell with a diagram. How are read and write operations carried out?
4. Explain a DRAM cell with a diagram. How are read and write operations carried out?
5. Explain the 2M x 8 DRAM chip. How can you modify it for fast page mode?
6. Explain SDRAMs with the help of a diagram.
7. Explain the terms latency and bandwidth.
8. Explain a burst read of length 4 in an SDRAM with a timing diagram.
9. Explain DDR SDRAMs.
Structure of larger memories
Memory chips are connected together to form larger memories. There are two types of memory systems: static memory systems and dynamic memory systems.
Static memory systems
The following diagram shows the implementation of a 2M x 32 memory using sixteen 512K x 8 static memory chips. There are 4 columns, each containing 4 chips, to implement one byte position each. Only the selected chips (determined by the chip-select input) place data on the output lines. 21 address bits are needed to select a 32-bit word in this memory: the high-order 2 bits determine which of the 4 chip-select signals should be activated, and the remaining 19 bits access specific byte locations inside each chip of the selected row. The R/W inputs of all chips are tied together to form a single R/W signal. Dynamic memory systems are organized in much the same manner as static ones, but their physical implementation is more conveniently done in the form of memory modules.
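The 2-bit/19-bit address decoding described above can be sketched as follows (function name is illustrative):

```python
def decode_module_address(addr21: int) -> tuple[int, int]:
    """Split a 21-bit word address into (chip-select row, 19-bit internal address)."""
    row_select = (addr21 >> 19) & 0b11    # high-order 2 bits pick one of 4 chip rows
    internal = addr21 & ((1 << 19) - 1)   # low-order 19 bits go to every chip
    return row_select, internal

# Word 5 of the third 512K region selects chip row 2, internal address 5
print(decode_module_address((2 << 19) | 5))  # (2, 5)
```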
[Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips: a 2-bit decoder on the high-order address bits generates the chip-select signals; the 19-bit internal chip address goes to every 512K x 8 chip; the four columns of chips drive the 8-bit data groups D31–24, D23–16, D15–8, and D7–0 of the 21-bit-addressed, 32-bit-wide memory.]
Memory System Considerations
The choice of a RAM for a given system depends on several factors: cost, speed, power dissipation, and chip size. Static RAMs are used when very fast operation is the primary requirement, mostly in cache memories. Dynamic RAMs are the predominant choice for computer main memories; the high densities achievable make large memories economically feasible.
Memory Controller
To reduce the number of pins, dynamic memory chips use multiplexed address inputs. The address is divided into two parts: the high-order address bits, which select a row in the cell array, are provided first and latched into the memory under control of the RAS signal; the low-order address bits, which select a column, are then provided on the same address pins and latched under control of the CAS signal. The processor, however, issues all bits of an address at the same time. The required multiplexing of the address bits is performed by a memory controller circuit.
[Figure 5.11. Use of a memory controller: the processor sends the full row/column address, R/W, Request, and clock signals to the memory controller, which forwards the multiplexed address together with RAS, CAS, R/W, CS, and clock to the memory; the data lines connect the processor and the memory directly.]
Memory controller functions
The controller is interposed between the processor and the memory. The processor sends a Request signal, and the controller accepts the complete address and the R/W signal from the processor. The controller forwards the row and column portions of the address to the memory and generates the RAS and CAS signals; it also sends the R/W and CS signals to the memory. The data lines are connected directly between the processor and the memory. When used with DRAM chips, the memory controller provides all the information needed to control the refreshing process; it contains a refresh counter to refresh all rows within the time limit specified for the device.
RAMBUS Memory
To increase the system bandwidth we need to increase the system bus width or the system bus speed. A wide bus is expensive and requires a lot of space on the motherboard. Rambus uses a narrow but much faster bus. Its key feature is the fast signaling method used to transfer information between chips: instead of signal levels of either 0 volts or Vsupply (5 volts), it uses the concept of differential signaling, with 0.3-volt swings above or below a reference voltage called Vref.
Read-Only Memories (ROMs)
Both SRAMs and DRAMs are volatile: they lose their data when power is turned off. Many applications need to retain data even when power is off. For example, a hard disk is used to store information, including the operating system (OS). When the system is turned on, the OS must be loaded from the hard disk into memory, which requires executing a program that boots the OS. Since the boot program is fairly large, it is stored on the disk; the processor must first execute some instructions that load the boot program into memory. So we need a small amount of nonvolatile memory that holds the instructions needed to load the boot program into RAM. Nonvolatile memories require a special type of writing process to place information in them; such a memory is called a ROM – Read-Only Memory.
[Figure 5.12. A ROM cell: a transistor T connects the bit line to ground at point P when the word line is activated; the transistor is connected to store a 0 and not connected to store a 1.]
ROM
If the transistor is connected to ground at point P, a 0 is stored; otherwise a 1 is stored. The bit line is connected to the power supply through a resistor. To read the cell, the word line is activated: if the voltage on the bit line drops, the cell contains a 0; if the voltage remains high, the cell contains a 1.
PROM
A PROM allows data to be loaded by the user. This is achieved by inserting a fuse at point P in the previous figure. Before it is programmed, the memory contains all 0s. The user can insert 1s at the required locations using high-current pulses. The process is irreversible.
EPROM
An EPROM allows the stored data to be erased and new data to be loaded; it is an erasable, reprogrammable ROM. It can be used while a system is being developed, so that changes can be accommodated. The cell structure is similar to that of a ROM, but the connection to ground is always made at point P, and a special transistor is used that can function either as a normal transistor or as a disabled transistor that is always turned off; it can be programmed to behave as a permanently open switch. The chip is erased by exposing it to ultraviolet light, which dissipates the charges trapped in the transistors of the memory cells.
EEPROM
The disadvantages of EPROMs are that the chip must be physically removed from the circuit for reprogramming and that its entire contents are erased by the UV light. An EEPROM is another version of erasable PROM that can be both programmed and erased electrically. It need not be removed for erasure, and cell contents can be erased selectively. Its disadvantage is that different voltages are needed for erasing, writing, and reading the stored data.
Flash Memory
Flash memory uses an approach similar to EEPROM: a flash cell is based on a single transistor controlled by trapped charge. In an EEPROM it is possible to read and write a single cell; in a flash memory it is possible to read the contents of a single cell, but writes are performed only on a block of cells. Flash devices have greater density, and hence higher capacity and lower cost per bit. They require a single power supply voltage and consume less power in operation, so they are used in battery-driven portable equipment such as handheld computers, cell phones, digital cameras, and MP3 players.
Flash Cards and Flash Drives
Single flash chips do not provide sufficient storage capacity, so larger memory modules are required: flash cards and flash drives.
Flash cards mount flash chips on a small card; the card is simply plugged into a conveniently accessible slot, and cards come in a variety of sizes.
Flash drives are larger modules intended to replace hard disk drives. They are designed to fully emulate hard disks, though this is not yet entirely possible: their storage capacity is significantly lower. On the other hand, they have shorter seek and access times (hence faster response), lower power consumption, and are insensitive to vibration.
Speed, Size and Cost
An ideal memory would be fast, large, and inexpensive. A very fast memory can be built with SRAM chips, but these chips are expensive, so it is impractical to build a large memory using SRAMs alone. DRAM chips are cheaper but slower. The solution for capacity is to provide large secondary storage devices: very large disks are available at reasonable prices. For the main memory, DRAMs are used, while SRAMs are used in smaller memories such as caches.
[Figure 5.13. Memory hierarchy: processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; moving down the hierarchy, size increases while speed and cost per bit decrease.]
Cache Memories
The speed of the main memory is slower than that of modern processors, and the processor cannot afford to waste time accessing instructions and data in the main memory. The solution is a cache memory, which is much faster and makes the main memory appear faster to the processor than it really is. The effectiveness of a cache is based on locality of reference: many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed relatively infrequently. Locality takes two forms:
- Temporal – a recently executed instruction is likely to be executed again very soon
- Spatial – instructions in close proximity to a recently executed instruction (with respect to the instruction's address) are likely to be executed soon
Operation of a cache
If the active segments of the program can be placed in a fast cache memory, the total execution time can be reduced significantly. The memory control circuitry is designed to take advantage of locality of reference:
- Temporal – whenever an item (instruction or data) is first needed, it is brought into the cache, where it remains until it is needed again
- Spatial – instead of fetching just one item from the main memory into the cache, several items that reside at adjacent addresses are fetched; such a set of contiguous locations is referred to as a block or cache line
A replacement algorithm decides which block of data is moved back from the cache to the main memory so that a new block can be accommodated.
[Figure 5.14. Use of a cache memory: the cache is placed between the processor and the main memory.]
Operation of a cache
On a read request from the processor, the contents of a block of memory words containing the specified location are transferred into the cache, one word at a time. When the program later references any of the locations in this block, the desired contents are read directly from the cache. The cache can store a reasonable number of words, but it is small compared to the main memory. The correspondence between main memory blocks and those in the cache is specified by a mapping function. When the cache is full and a memory word that is not in the cache is referenced, the cache control hardware must decide which block should be removed to create space for the newly arrived block; the collection of rules for making this decision is called the replacement algorithm.
Cache operation
The processor does not need to know explicitly about the existence of the cache; it simply issues read and write requests using memory addresses. The cache control circuitry determines whether the requested word is currently in the cache. If it is, the read or write operation is performed on the appropriate cache location; in this case, a read hit or write hit is said to have occurred. For a read operation, the main memory is not involved. For a write operation, there are two options:
- Write-through protocol – both the cache and the main memory are updated simultaneously
- Write-back (or copy-back) protocol – only the cache is updated during the write operation, and the block is marked with a dirty (or modified) bit; the main memory is updated later, when the block is moved back from the cache to the main memory
Limitations of the write-through and write-back protocols
The write-through protocol is simpler, but it results in unnecessary write operations in the main memory when a given cache word is updated several times during its cache residency. The write-back protocol may also result in unnecessary write operations, because when a cache block is written back to the memory, all words of the block are written back, even if only a single word was changed while the block was in the cache.
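A toy sketch contrasting the two write policies (purely illustrative — a one-block "cache" that only counts main-memory writes):

```python
class ToyCache:
    """One-block cache illustrating write-through vs write-back traffic."""
    def __init__(self, write_through: bool):
        self.write_through = write_through
        self.data = None
        self.dirty = False
        self.mem_writes = 0   # number of main-memory write operations

    def write(self, value):
        self.data = value
        if self.write_through:
            self.mem_writes += 1   # memory is updated on every write
        else:
            self.dirty = True      # memory is updated only on eviction

    def evict(self):
        if self.dirty:
            self.mem_writes += 1
            self.dirty = False

wt, wb = ToyCache(True), ToyCache(False)
for v in range(5):            # five writes to the same cached word
    wt.write(v)
    wb.write(v)
wb.evict()
print(wt.mem_writes, wb.mem_writes)  # 5 1
```

Five updates to one resident word cost five memory writes under write-through but only one (at eviction) under write-back, which is exactly the trade-off described above.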
Read miss
A read miss occurs when the addressed word is not present in the cache. The block of words that contains the requested word is copied from the main memory into the cache; after that, the requested word is sent to the processor. Alternatively, this word may be sent to the processor as soon as it is read from the main memory; this approach, called load-through or early restart, reduces the processor's waiting period but requires more complex circuitry.
Write miss
A write miss occurs when the addressed word is not in the cache. If the write-through protocol is used, the information is written directly into the main memory. If the write-back protocol is used, the block containing the addressed word is first brought into the cache, and then the desired word in the cache is overwritten with the new information.
Mapping functions
A mapping function specifies the correspondence between main memory blocks and those in the cache. There are three techniques: direct mapping, associative mapping, and set-associative mapping. As a running example, consider a cache of 128 blocks of 16 words each, for a total of 2K (2048) words, and a main memory with a 16-bit address, i.e., 64K words organized as 4K blocks of 16 words each. Consecutive addresses refer to consecutive memory locations.
Direct Mapping
The simplest way to determine the cache location in which to store a memory block is the direct-mapping technique: block j of the main memory maps onto block j modulo 128 of the cache (refer to the following figure). Whenever one of the main memory blocks 0, 128, 256, … is loaded into the cache, it is stored in cache block 0; blocks 1, 129, 257, … are stored in cache block 1; and so on. Since more than one memory block is mapped onto a given cache block position, contention may arise even when the cache is not full. For example, the instructions of a program may start in block 1 and continue in block 129 (possibly after a branch). The contention is resolved by allowing the new block to overwrite the currently resident block.
[Figure 5.15. Direct-mapped cache: main memory blocks 0, 128, 256, … map to cache block 0; blocks 1, 129, 257, … to cache block 1; and so on up to block 4095. Each cache block carries a tag. The 16-bit main memory address is divided into a 5-bit tag, a 7-bit block field, and a 4-bit word field.]
Direct mapping contd.
The placement of a block in the cache is determined from the memory address, which is divided into three fields:
- the low-order 4 bits select 1 of the 16 words in a block
- the 7-bit cache block field determines in which cache block the new block is stored
- the high-order 5 bits are the tag bits associated with the cache location; they identify which of the 32 memory blocks that map onto this cache position is currently resident in the cache
If the tag bits of the address match the stored tag, the desired word is in the cache. If there is no match, the block containing the required word must first be read from the main memory and loaded into the cache. Direct mapping is easy to implement but not flexible.
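For the 16-bit address and 128-block cache of the running example, the tag/block/word split can be sketched as:

```python
def direct_map_fields(addr16: int) -> tuple[int, int, int]:
    """Split a 16-bit address into (5-bit tag, 7-bit cache block, 4-bit word)."""
    word = addr16 & 0xF            # low-order 4 bits: word within the block
    block = (addr16 >> 4) & 0x7F   # next 7 bits: cache block = memory block mod 128
    tag = (addr16 >> 11) & 0x1F    # high-order 5 bits: tag
    return tag, block, word

# Word 5 of memory block 129: maps to cache block 129 % 128 = 1, tag 129 // 128 = 1
addr = 129 * 16 + 5
print(direct_map_fields(addr))  # (1, 1, 5)
```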
Associative mapping
In this technique, a main memory block can be placed into any cache block position, and 12 tag bits identify a memory block when it is resident in the cache. The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to see whether the desired block is present; this is called the associative-mapping technique. It gives complete freedom in choosing the cache location in which to place a memory block: a new block has to replace an existing block only if the cache is full, and a replacement algorithm is needed to choose which block to replace. The cost of this mapping technique is higher than that of direct mapping, because all 128 tag patterns must be searched – an associative search.
Figure 5.16. Associative-mapped cache. (Main memory address fields: 12-bit tag, 4-bit word. Any of the 4096 main memory blocks can be placed in any of the 128 cache blocks.)
Set-associative mapping A combination of direct mapping and associative mapping. The blocks of the cache are grouped into sets, and the mapping allows a block of the main memory to reside in any block of a specific set. So we have a few choices of where to place a block, which eases the contention problem of the direct method, while the hardware cost is reduced by decreasing the size of the associative search. The following figure is an example with 2 blocks per set: memory blocks 0, 64, 128, …, 4032 map into cache set 0, and each can occupy either of the two block positions within this set. There are 64 sets in total, so we need 6 bits to choose a set; the tag field of the address is then compared with the tags of the cache blocks in the selected set to check whether the desired block is present.
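The set selection above can be sketched as follows (a hypothetical illustration matching the 6/6/4-bit address split of Figure 5.17):

```python
# Hypothetical sketch of the 6/6/4-bit address split used by the
# 2-way set-associative cache of Figure 5.17.

def split_set_assoc(address):
    word = address & 0xF            # low-order 4 bits: word within block
    set_ = (address >> 4) & 0x3F    # 6 bits: one of 64 sets
    tag  = (address >> 10) & 0x3F   # high-order 6 bits: tag
    return tag, set_, word

# Memory blocks 0, 64, 128 all map into set 0, with different tags:
for blk in (0, 64, 128):
    tag, s, _ = split_set_assoc(blk * 16)   # 16 words per block
    print("block", blk, "-> set", s, "tag", tag)
```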
Figure 5.17. Set-associative-mapped cache with two blocks per set. (Main memory address fields: 6-bit tag, 6-bit set, 4-bit word. The cache has 64 sets, Set 0 to Set 63, of two blocks each.)
Set associative mapping contd… The number of blocks per set is a parameter that can be selected to suit the requirements of the computer. Four blocks per set can be accommodated by a 5-bit set field, and eight blocks per set by a 4-bit set field. 128 blocks per set requires no set bits and gives the fully associative technique, with 12 tag bits. The other extreme, one block per set, is the direct-mapping method. A cache with k blocks per set is called a k-way set-associative cache.
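The trade-off above can be checked with a few lines of arithmetic, assuming (as in the figures) a 128-block cache with 12 bits shared between the tag and set fields:

```python
# Small consistency check (assumes the 128-block cache above, with
# 12 bits shared between the tag and set fields).
import math

for k in (1, 2, 4, 8, 128):
    sets = 128 // k                  # k blocks per set -> 128/k sets
    set_bits = int(math.log2(sets))  # bits needed to select a set
    tag_bits = 12 - set_bits         # remaining bits form the tag
    print(f"{k}-way: {set_bits} set bits, {tag_bits} tag bits")
```

This reproduces the numbers in the text: 4-way needs a 5-bit set field, 8-way a 4-bit set field, 128-way no set bits at all.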
Valid bit and cache coherence problem A control bit called the valid bit is provided for each block. It indicates whether the block contains valid data. It is different from the dirty or modified bit; the dirty bit is required only in systems that do not use the write-through method. The valid bit is initially 0 when power is applied to the system or when main memory is loaded with new programs and data from the disk. Transfers from the disk to the main memory are carried out by a DMA mechanism. Normally DMA transfers bypass the cache (for both cost and performance reasons).
Valid bit and cache coherence problem The valid bit of a block is set to 1 the first time the block is loaded from the main memory. Whenever a main memory block is updated by a source that bypasses the cache, a check is made to determine whether the block being updated is currently in the cache. If so, its valid bit is cleared to 0; this ensures that stale data do not exist in the cache. Whenever a DMA transfer is made from the main memory to the disk and the cache uses the write-back protocol, the data in memory might not reflect the changes made in the cached copy. Solution: flush the cache by forcing the dirty data to be written back to memory before the DMA transfer takes place. The need to ensure that two different entities (the processor and DMA in this case) use the same copies of data is referred to as the cache coherence problem.
Replacement algorithms In the direct mapping method, the position of each block is predetermined, so no replacement strategy exists. In the associative and set-associative strategies there is some flexibility: if the cache is full when a new block arrives, the cache controller must decide which of the old blocks to overwrite. This decision is very important and affects system performance; the aim is to keep in the cache those blocks that are likely to be referenced in the near future. Some algorithms replace the least recently used (LRU) block, the oldest block, or a random block.
Least Recently Used (LRU) replacement algorithm Uses the property of locality of reference: there is a high probability that blocks referenced recently will be referenced again soon. So when a block needs to be overwritten, overwrite the one that has gone the longest time without being referenced. This block is called the least recently used (LRU) block. The cache controller must track references to all the blocks; it uses a 2-bit counter per block for a set of 4 blocks. When a hit occurs, the referenced block's counter is set to 0, counters with lower values are incremented by 1, and counters with higher values are unchanged. When a miss occurs and the set is not full, the new block is loaded and its counter is set to 0. When a miss occurs and the set is full, the block with counter value 3 is removed, the new block is put in its place with counter value 0, and the other 3 blocks' counters are incremented by 1.
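The counter rules above can be sketched as follows; this is a simple model of one 4-block set, not actual cache hardware:

```python
# Sketch of the 2-bit LRU counters described above, for one 4-block
# set (an assumption-level model, not cache hardware).

def lru_hit(counters, i):
    """Reference to resident block i: its counter goes to 0,
    counters with lower values are incremented."""
    old = counters[i]
    for j in range(len(counters)):
        if counters[j] < old:
            counters[j] += 1
    counters[i] = 0

def lru_miss_full(counters):
    """Set full: evict the block with counter 3, load the new block
    there with counter 0, increment the other three counters."""
    victim = counters.index(3)
    for j in range(len(counters)):
        counters[j] = 0 if j == victim else counters[j] + 1
    return victim

c = [0, 1, 2, 3]          # block 0 most recent, block 3 least recent
lru_hit(c, 3)             # block 3 becomes most recently used
print(c)                  # [1, 2, 3, 0]
print(lru_miss_full(c))   # evicts block 2 (its counter reached 3)
print(c)                  # [2, 3, 0, 1]
```

Note that the counters of a set always hold a permutation of 0–3, so the block with value 3 is unambiguously the LRU block.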
Reading assignment: Go through the examples of mapping techniques in the textbook.
Performance considerations Two key factors in the success of a computer are cost and performance. The objective is to achieve the best possible performance at the lowest possible cost; the challenge for the designer is to improve performance without increasing cost. The measure of success is the price/performance ratio. Performance depends on how fast instructions can be brought into the processor for execution and how fast they can be executed; in this unit, we focus on the first aspect. For memory we need shorter access time and larger capacity. If we have a slow unit and a faster unit, it is beneficial to transfer data at the rate of the faster unit; to achieve this we use parallel access, with a technique called interleaving.
Interleaving The main memory of a computer is structured as a collection of physically separate modules, each with its own address buffer register (ABR) and data buffer register (DBR). Memory access operations may proceed in more than one module at the same time. There are two ways of distributing addresses across the modules:
1. The high-order k bits name one of the n modules and the low-order m bits name a particular word in that module. When consecutive locations are accessed, only one module is involved, but devices with DMA capability can access information in other memory modules in parallel.
2. The low-order k bits select a module and the high-order m bits name a location within that module. Consecutive addresses are then located in successive modules, giving faster access and higher average utilization. This is called memory interleaving and is the more effective way.
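The two addressing schemes can be sketched as follows; the module count and address width are assumptions chosen for illustration:

```python
# Illustration with assumed parameters: 4 modules (k = 2 bits) and
# 16-bit addresses, contrasting the two address assignments.

K, M = 2, 14   # k module-select bits, m address-in-module bits

def high_order_select(addr):   # scheme (a): consecutive words in a module
    return addr >> M, addr & ((1 << M) - 1)    # (module, word)

def low_order_select(addr):    # scheme (b): interleaving
    return addr & ((1 << K) - 1), addr >> K    # (module, word)

# Four consecutive addresses hit one module vs. four different modules:
print([high_order_select(a)[0] for a in range(4)])  # [0, 0, 0, 0]
print([low_order_select(a)[0] for a in range(4)])   # [0, 1, 2, 3]
```

The second print shows why interleaving helps: a sequential stream of addresses keeps all four modules busy at once.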
Figure 5.25. Addressing multiple-module memory systems: (a) consecutive words in a module; (b) consecutive words in consecutive modules. (The address is split into k module-select bits and m address-in-module bits; each module has its own ABR and DBR.)
Go through the example in the text for better understanding.
Problem Consider a cache with 8-word blocks. On a read miss, the block that contains the desired word must be copied from the main memory into the cache. Assume it takes one clock cycle to send an address to the main memory. The memory is built using DRAM chips: the first word access takes 8 clock cycles, and subsequent words in the same block can be accessed in 4 clock cycles per word. One clock cycle is needed to send one word to the cache. Using a single memory module, the time needed to load the desired block into the cache is 1 + 8 + (7 × 4) + 1 = 38 clock cycles. Using memory interleaving, 4 words are accessed in 8 clock cycles and transferred to the cache in the next 4 clock cycles word by word, during which the remaining 4 words are read and stored in the DBRs; these 4 words are then transferred one word at a time. So the time required to transfer a block is 1 + 8 + 4 + 4 = 17 clock cycles.
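The two cycle counts can be re-derived directly:

```python
# Re-deriving the cycle counts above: 8-word block, 1 cycle to send
# the address, DRAM gives the first word in 8 cycles and each later
# word in 4 cycles, and 1 cycle moves a word into the cache.

single_module = 1 + 8 + 7 * 4 + 1
print(single_module)      # 38

# Four-way interleaved: the 4 modules deliver their first words after
# 8 cycles; while those are sent to the cache (4 cycles) the second
# word of each module is read, then sent in 4 more cycles.
interleaved = 1 + 8 + 4 + 4
print(interleaved)        # 17
```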
Hit rate and miss penalty The number of hits stated as a fraction of all attempted accesses is called the hit rate. The number of misses stated as a fraction of all attempted accesses is called the miss rate. Hit rates well over 0.9 are essential for high-performance computers. Performance is adversely affected by the actions that must be taken after a miss: the extra time needed to bring the desired information into the cache is called the miss penalty. More generally, the miss penalty is the time needed to bring a block of data from a slower unit in the memory hierarchy to a faster unit. Interleaving can reduce the miss penalty substantially.
Problem Let h be the hit rate, M the miss penalty (the time to access information in the main memory), and C the time to access information in the cache. The average access time is tave = hC + (1 − h)M. Consider the same parameters as in the previous problem. If the computer has no cache, then using a fast processor and a typical DRAM main memory, each memory read access takes 10 clock cycles. Suppose the computer has a cache that holds 8-word blocks and an interleaved main memory; then it requires 17 cycles (as discussed before) to load one block into the cache. Suppose 30 percent of the instructions in a program perform a read or write operation, giving 130 memory accesses for every 100 instructions. Assume the hit rates are 0.95 for instructions and 0.9 for data, and that the miss penalty is the same for both read and write accesses. An estimate of the improvement in performance is
time without cache / time with cache = (130 × 10) / (100(0.95 × 1 + 0.05 × 17) + 30(0.9 × 1 + 0.1 × 17)) = 5.04
So the computer with a cache performs 5 times better (assuming the processor clock and the system bus have the same speed).
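The arithmetic above can be checked directly:

```python
# Checking the estimate above with t_ave = h*C + (1 - h)*M.

C, M = 1, 17              # cache access time; miss penalty in cycles
h_inst, h_data = 0.95, 0.90

time_without_cache = 130 * 10    # 130 accesses per 100 instructions
time_with_cache = (100 * (h_inst * C + (1 - h_inst) * M)
                   + 30 * (h_data * C + (1 - h_data) * M))
print(round(time_without_cache / time_with_cache, 2))  # 5.04
```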
Caches on the processor chip From the speed point of view, the optimal place for a cache is on the processor chip. Since space on the processor chip is required for many other functions, this limits the size of the cache that can be accommodated. A designer can use either a combined cache for instructions and data (which offers greater flexibility in mapping) or separate caches for instructions and data (which allow parallel access to instructions and data, at the cost of more complex circuitry). Normally 2 levels of caches are used, L1 and L2. L1 is designed to allow very fast access by the processor; its access time has a large effect on the clock rate of the processor. L2 can be slower, but it should be much larger to ensure a high hit rate. A workstation computer may include an L1 cache with a capacity of tens of kilobytes and an L2 cache with a capacity of several megabytes. Including an L2 cache further reduces the impact of the main memory speed on the performance of the computer.
Cache on processor chip The average access time experienced by the processor with 2 levels of caches is
tave = h1C1 + (1 − h1)h2C2 + (1 − h1)(1 − h2)M
where h1 is the hit rate in L1, h2 the hit rate in L2, C1 the time to access information in the L1 cache, C2 the time to access information in the L2 cache, and M the time to access information in the main memory. The number of misses in the L2 cache must be very low.
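For illustration, the formula can be evaluated with assumed parameter values (these numbers are not from the text):

```python
# Worked example of the two-level formula; the parameter values here
# are illustrative assumptions, not figures from the text.

def tave(h1, h2, C1, C2, M):
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

# 95% L1 hits, 90% of L1 misses hit in L2, 1 / 10 / 100-cycle access times:
print(round(tave(0.95, 0.90, 1, 10, 100), 2))  # 1.9
```

Even though main memory costs 100 cycles here, the two caches keep the average access time under 2 cycles, which is why a low L2 miss rate matters so much.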
Write buffers A write buffer is a temporary storage area for write requests. It is used when the write-through protocol is in effect: each write operation then results in writing a new value to the memory. If the processor waits for each memory write to be completed, it is slowed down, yet the processor does not immediately require the result of a write operation. So instead of waiting for the write operation to complete, the processor places the write request into the buffer and continues executing the next instruction. The write requests are sent to memory whenever the memory is not servicing read requests, because read requests must be serviced immediately; the processor cannot proceed without the data to be read.
Write buffers The write buffer holds a number of write requests. A read request may refer to data that are still in the write buffer, so the addresses of data to be read from memory are compared with the addresses of the data in the write buffer; in case of a match, the data in the write buffer are used. A write buffer is also useful when the write-back protocol is used. There, write operations are simply performed on the corresponding word in the cache. When a new block of data comes into the cache as a result of a read miss, it may replace an existing block that contains dirty (modified) data, which must be written back into the main memory. If the write-back operation is performed first, the processor has to wait longer for the new block to be read into the cache. So, to perform the read first, a fast write buffer is provided for temporary storage of the dirty block that is ejected; afterwards the contents of the write buffer are written into memory.
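The behaviour described above can be sketched as follows; the class and its interface are assumptions for illustration, not a controller from the text:

```python
# Minimal sketch (assumed interface, not a real controller) of a
# write buffer: writes are queued instead of stalling the processor,
# and a read checks the buffer for a matching address first.
from collections import OrderedDict

class WriteBuffer:
    def __init__(self):
        self.pending = OrderedDict()    # address -> data awaiting memory

    def write(self, addr, data):
        self.pending[addr] = data       # processor continues at once

    def read(self, addr, memory):
        if addr in self.pending:        # address match: use buffered data
            return self.pending[addr]
        return memory[addr]             # otherwise read from memory

    def drain(self, memory):            # memory idle: send writes out
        while self.pending:
            addr, data = self.pending.popitem(last=False)
            memory[addr] = data

memory = {0x10: 5}
buf = WriteBuffer()
buf.write(0x10, 99)
print(buf.read(0x10, memory))   # 99: buffered value, not stale memory
buf.drain(memory)
print(memory[0x10])             # 99
```

The address comparison in `read` is the software analogue of the match check described above; without it a read could return stale data from memory.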
Prefetching New data are brought into the cache when they are first needed, and the processor has to pause until the new data arrive. To avoid this stalling of the processor, it is possible to prefetch the data into the cache before they are needed. Prefetching can be done either in software or in hardware. In software, a separate prefetch instruction is included, which loads the data into the cache by the time they are required in the program; this allows overlapping of accesses to the main memory with computation in the processor. Prefetch instructions can be inserted either by the compiler or by the programmer; compiler insertion is better. In hardware, circuitry is added that attempts to discover a pattern in memory references and prefetches data according to this pattern.
Lock-up free cache Software prefetching does not work well if it interferes with the normal execution of instructions, that is, if the act of prefetching stops other accesses to the cache until the prefetch is completed. A cache of this type is said to be locked while it services a miss. The solution is to allow the processor to access the cache while a miss is being serviced. A cache that allows multiple outstanding misses is called a lockup-free cache. Since only one miss can be serviced at a time, the cache must have circuitry to keep track of all outstanding misses, for example special registers that hold the pertinent information.
VIRTUAL Memories Refer to slides given separately
SECONDARY Storage Magnetic hard disks
Organization and accessing of data on a disk Access time Typical disks Data buffer/cache Disk controller Floppy disks RAID Disk arrays Commodity disk considerations
Optical disks CD technology CD-ROM CD-Recordable CD-Rewritable DVD Technology DVD-RAM
Magnetic tape systems