The Memory Hierarchy
Introduction
Terminology

Access time
Time to access a word in memory; specifies the read or write time
Note these may be different
The memory may be organized as bits, bytes, or words

Cycle time
Time from the start of one read until the next

Block size
Number of words in a block
Note this is a logical description

Bandwidth
Word transmission rate

Latency
Time to access the first of a sequence of words

Block access time
Time to access an entire block from the start of a read
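As a rough relation among these terms (not stated explicitly in the notes, and ignoring any overlap between transfers):

    block access time ≈ latency + (block size × word size) / bandwidth

For example, assuming a 10 ms latency, a 1 K-word block of 4-byte words, and a 1 MB per sec bandwidth, a block read takes roughly 10 ms + 4096 bytes / (1 MB/s) ≈ 14 ms.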
DRAM
Dynamic random access memory

SRAM
Static random access memory

Semi-static RAM
The periphery is clock activated (dynamic) and thus is inactive until clocked
Only one memory cycle is permitted per clock
Periphery circuitry must be allowed to reset after each active memory cycle for a minimum precharge time
No refresh is required

SDRAM
Synchronous DRAM. SDRAM synchronizes all addresses, data, and control signals to the system clock, allowing much higher data transfer rates than asynchronous transfers

ROM
Read only memory

PROM
Programmable read only memory

EPROM
Erasable programmable read only memory

EEPROM
Electrically erasable programmable read only memory

CAS
Column address strobe; clock input used in dynamic memories to control the input of column addresses

RAS
Row address strobe; clock input used in dynamic memories to control the input of row addresses

Refresh
Technique used in DRAM or SDRAM through which data is retained in memory

Refresh time interval
Time between two refresh operations, determined by the system in which the memory is operating
Memory block on von Neumann machines
Actually comprised of a number of memory components
Arranged in a hierarchical manner so that they cooperate with each other
Hierarchical metrics
Speed
Storage capacity

At the top: slowest, largest memories
Also known as secondary memory; also tend to be least expensive
Size: on the order of 10's to 100's of gigabits
Latency: on the order of 10's of ms
Bandwidth: 1 MB per sec
Cost: $0.02 per MB
Devices: tape for archival storage, high density disk drives

At the bottom: smallest, fastest memories
Call this cache memory; also tends to be most expensive
Size: on the order of 100's to 1000's of bits, up to several M in some machines today
Latency: 10-20 ns
Bandwidth: 8-10 MB per sec
Cost: $500.00 per MB
Devices: registers, high speed cache

In the middle: often called primary memory
Size: on the order of 100's of M bits to 1's of G bits
Latency: 50 ns
Bandwidth: 1 MB per sec
Cost: $30.00 per MB
Devices: RAM, ROM, some hard drives, lower speed cache
Motivation
We would prefer programs to execute as quickly as possible
As we've seen, accessing memory takes time
Each access contributes to the time required to execute an instruction
Static RAM Design
A typical SRAM cell appears as follows
Observe that we have 6 transistors per cell
Two access transistors enable the cell for read and write

Write operation
Value written into the cell by:
Applying the value to bi and !bi through the write/sense amplifiers
Asserting the word line
This causes the new value to be written into the latch

Read operation
Value read from the cell by:
Precharging bi and !bi to a voltage halfway between 0 and 1
Asserting the word line, which drives bi and !bi to high and low (or low and high)
The values are sensed and amplified by the write/sense amplifier

Typical timing is given as follows
DRAM Design
A typical DRAM cell appears as follows
Observe that we have only one transistor per cell
Read and write operations use a single bit line

Write operation
Value written into the cell by:
Applying 0 or 1 to bi through the write/sense amplifiers
Asserting the word line
Charges the capacitor if a 1 is stored, discharges it if a 0 is stored

Read operation
Value read from the cell by:
Precharging bi to a voltage halfway between 0 and 1
Asserting the word line, which gates the stored signal onto bi
The value is sensed and amplified by the write/sense amplifier
The read operation causes the capacitor to discharge
The sensed and amplified value is placed back on the bit line
Called a refresh operation

Typical timing is given as shown
Chip Organization
Independent of the type of internal storage, a typical RAM chip is configured as shown in the following drawing
Making Things Work: Locality of Reference

Goal
Reduce the number of accesses
Make each access as short as possible
Utilized to a much greater extent in today's memories

Ideally we would like to make all memory as fast as technology allows
Such action has an associated cost
Memories near the bottom are expensive
Support circuitry for such memories is also expensive
Additional circuitry is required, along with power supplies to support it
Almost all programs executed today are written using the procedural paradigm
If we analyze how such programs are designed and how they execute, we discover an interesting phenomenon
Execution generally occurs sequentially or in small loops of a few instructions
This means overall forward progress through the program occurs at a much lower rate than the access times of the fastest memory
Put another way, with respect to the entire program we are executing within a small window that moves forward through the program
This is shown as the following

Formally, such a phenomenon is called locality of reference
We recognize that the program is executing only a few instructions within a small window (illustrated by the sketch after this outline)

Benefits
If we can keep those few instructions in fast memory
The program will appear to be executing out of fast memory
We gain the benefits of such speed at reduced cost

Important point
The approach works provided the area within which we are executing is in the window
The method can easily be defeated with large loops or branches outside the window
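As an illustration (not from the original notes), a small loop in C keeps execution inside a handful of instructions, which is exactly the window described above; scattering the work across many distant branches or function calls spreads execution over the program and defeats that locality.

```c
/* Illustrative only: the loop body is a few instructions executed n times,
 * so instruction fetches stay inside a small window, and the data accesses
 * are sequential as well. */
long sum_array(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```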
Architecture
Let's now look at a portion of the memory hierarchy
We'll not consider archival storage, ROM or CDROM, or registers
We will focus on the hard drive, RAM, and cache

Secondary memory: 2^18 pages
Primary memory: 2^10 pages
Page = 4 blocks
Block = 1 K words
Word = 4 bytes
Caching and Virtual Memory
We can represent these as follows

Goal: operate out of cache memory
When we need an instruction or data not in the cache, bring it in from RAM; we refer to this as caching
When we need an instruction or data not in RAM, bring it in from the hard drive, using a mechanism similar to caching; we call it virtual memory

Performance
How well we do this establishes the effectiveness of our memory management scheme
Caching
Let's examine cache memory and caching techniques first
Once again the idea, in one sense, is to take advantage of locality of reference for instructions and data in order to minimize access time
In a larger sense, caching techniques can be used in many places to optimize performance
Internet: bit images cached locally to improve display speed
Network file systems: temporarily maintain a local copy to avoid having to retransfer it, based upon the assumption it will be used again in the near future

Implementation
Caching requires a certain amount of local memory
Its size determines how much information can be stored locally
High Level Description
The program begins executing and encounters needed data or an instruction
Check the cache
If in the cache
    Have a cache hit; use it
Else
    Have a cache miss; must go get it from somewhere else
    Bring in a new block of data or instructions
    If room is left in the cache
        Store the block
    Else
        Must make room: remove an existing block
        If the block has been modified, save the changes
        Else discard it
        Write the new block in its place

Important issues
How do we know something is not in the cache?
Where do we go to find something if it is not in the cache? What if it is not there?
How do we know if room is left in the cache?
How do we know if information in the cache has been modified?
How do we select a block to replace?
Detailed Implementation
We will address each question as we build a cache
The implementation scheme is called a direct mapped cache

First step: design the cache
Hardware
A collection of memory devices
A memory address register
A memory data register
Words will be 32 bits
We will have a 256 K word cache
Architecture
We will logically divide the cache into 256 blocks
Each block will be 1 K words long
Note this is a logical division
Further note that the address increments are rounded to make things simpler; this provides a reasonable size piece of memory to work with

The cache will now logically appear as follows
2^8 = 256 blocks, 2^10 = 1 K words

A 1 K block requires 10 address bits to uniquely identify each location
Recall our word is 4 bytes long
Bits A0 - A1 identify the byte within a word
Bits A2 - A11 identify a word within a block
Because our cache is logically divided into 256 blocks, we need 8 bits to identify each block
We can use the actual physical address to do this
Thus we will use address bits A12 - A19
These 8 bits give the required 256 combinations; we'll call these the index

We do this as follows: any block of addresses with
A12 - A19 = 0000 0000    store in Block 0
A12 - A19 = 0000 0001    store in Block 1
A12 - A19 = 0000 0010    store in Block 2
etc.

A20 - A31
Not directly used to address the cache
Used to distinguish between the different memory blocks that can occupy a cache block
Called a tag
Stored in the tag table
Tag Table
The tag table provides the last bit of information
Contains one entry for each block in the cache; ours will contain 256 entries, one for each block
Each entry contains
A bit to indicate whether a word within the block has been modified, called the dirty bit
Address bits A20 - A31 of the corresponding block
A bit to indicate whether the block is in the cache

Summarizing
A0 - A1    identify the byte within a word
A2 - A11   identify the word within a block
A12 - A19  identify the block within the cache
A20 - A31  identify which block occupies that cache location (the tag, stored in the tag table)
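A minimal C sketch of this address breakdown, assuming the field widths above (the struct and function names are illustrative, not part of the notes):

```c
#include <stdint.h>

/* Fields of a 32-bit address for the 256 K-word direct mapped cache:
 * A1-A0 byte, A11-A2 word within block, A19-A12 block index, A31-A20 tag. */
typedef struct {
    uint32_t byte;   /*  2 bits */
    uint32_t word;   /* 10 bits */
    uint32_t index;  /*  8 bits */
    uint32_t tag;    /* 12 bits */
} cache_addr_t;

static cache_addr_t split_address(uint32_t addr)
{
    cache_addr_t a;
    a.byte  =  addr        & 0x3;     /* A1  - A0  */
    a.word  = (addr >> 2)  & 0x3FF;   /* A11 - A2  */
    a.index = (addr >> 12) & 0xFF;    /* A19 - A12 */
    a.tag   = (addr >> 20) & 0xFFF;   /* A31 - A20 */
    return a;
}
```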
Finding a word
To find a word in the cache we execute a simple process:

Check the tag table entry (indexed by A12 - A19) for bits A20 - A31
if present
    use bits A12 - A19 to index into the cache
    use bits A2 - A11 to index into the block
    use bits A0 - A1 for byte access
    if WRITE operation
        set the dirty bit in the tag table
        modify the word
    else
        return the word
else
    get the block from primary memory
    if the cache block is occupied
        check the dirty bit
        if set
            write the old block back to primary memory
        write the new block to the cache; set the occupied bit
    else
        write the new block to the cache; set the occupied bit
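Building on the split_address sketch above, here is a hedged C sketch of the whole process; tag_table, cache_mem, and the primary_*_block helpers are illustrative names standing in for the hardware described in the notes:

```c
#define NUM_BLOCKS      256
#define WORDS_PER_BLOCK 1024

typedef struct {
    int      occupied;   /* a block is resident in this cache slot   */
    int      dirty;      /* the resident block has been modified     */
    uint32_t tag;        /* A31 - A20 of the resident block          */
} tag_entry_t;

static tag_entry_t tag_table[NUM_BLOCKS];
static uint32_t    cache_mem[NUM_BLOCKS][WORDS_PER_BLOCK];

/* Stand-ins for block transfers to and from primary memory. */
void primary_read_block(uint32_t tag, uint32_t index, uint32_t *dst);
void primary_write_block(uint32_t tag, uint32_t index, const uint32_t *src);

uint32_t cache_access(uint32_t addr, int is_write, uint32_t new_value)
{
    cache_addr_t a = split_address(addr);
    tag_entry_t *e = &tag_table[a.index];

    if (!(e->occupied && e->tag == a.tag)) {          /* miss                  */
        if (e->occupied && e->dirty)                  /* write old block back  */
            primary_write_block(e->tag, a.index, cache_mem[a.index]);
        primary_read_block(a.tag, a.index, cache_mem[a.index]);
        e->occupied = 1;
        e->dirty    = 0;
        e->tag      = a.tag;
    }
    if (is_write) {                                   /* hit or freshly loaded */
        e->dirty = 1;
        cache_mem[a.index][a.word] = new_value;
    }
    return cache_mem[a.index][a.word];
}
```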
Data or instruction caches
Most contemporary computers use two caches: one for data and one for instructions
The same principles work for both
The only extra work is deciding which cache to use

Performance
Factors to consider in each case:
With and without a cache
With a cache: with and without a miss
Optimizing the size
Effect of look ahead
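A standard way to combine these factors (not given in the notes, but consistent with them) is the average access time:

    t_avg = t_hit + (miss rate) × (miss penalty)

For example, assuming a 20 ns cache hit time, a 5% miss rate, and a 200 ns penalty to fill a block from primary memory, t_avg = 20 ns + 0.05 × 200 ns = 30 ns.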
Associative Cache
An alternate approach: let a block be placed anywhere in the cache and use an associative search to locate it
The organization now appears as follows

Let's specify the following:
Main memory: 8 K with 8-byte blocks
Cache: 2 K with 256 8-byte blocks
Tag table: 256 entries

A main memory block can go anywhere in the cache
The entry in the tag table is the main memory block number
A linear search of the tag table is not feasible

Let a main memory address be of the form shown
We find a word as follows

Problems with a fully associative cache:
Long search time
Complexity of the underlying logic

Let's now look at a scheme that combines features of direct mapping and associative mapping, called block set associative
Block Set Associative
An approach combining direct and associative mapping
Main memory is organized as a collection of groups; each group comprises a number of blocks
Cache memory is organized as a collection of sets, each containing a specified number of blocks
The set number corresponds to the main memory group number
Any block from group j can be placed into set j
The set is now searched associatively
This gives a far less complex search, since we are dealing with a smaller search space
The organization now appears as follows

Let's specify the following:
Main memory: 8 K with 8-byte blocks
Cache: 2 K with 256 8-byte blocks
Tag table: 256 entries

Our addresses have the following association
We can now see how a main memory address is mapped to a cache address
Computation of the address follows in the same manner as for the direct and associative mappings; a sketch follows
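A brief C sketch of that computation using the example sizes above and an assumed 2-way associativity (so 128 sets); none of the names come from the notes:

```c
#include <stdint.h>

#define WAYS        2            /* assumed associativity           */
#define SETS        128          /* 256 cache blocks / 2 ways       */
#define OFFSET_BITS 3            /* 8-byte blocks                   */
#define SET_BITS    7            /* 128 sets                        */

typedef struct { int valid; uint32_t tag; uint8_t data[8]; } line_t;
static line_t cache[SETS][WAYS];

/* Returns 1 on a hit and writes the byte to *out; 0 on a miss. */
int set_assoc_lookup(uint32_t addr, uint8_t *out)
{
    uint32_t set = (addr >> OFFSET_BITS) & (SETS - 1);
    uint32_t tag =  addr >> (OFFSET_BITS + SET_BITS);

    for (int w = 0; w < WAYS; w++)               /* associative search in one set */
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            *out = cache[set][w].data[addr & ((1u << OFFSET_BITS) - 1)];
            return 1;
        }
    return 0;                                    /* miss: fetch block from memory */
}
```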
Intel Pentium
Implements separate data and instruction caches
Each uses a 2-way block set associative scheme
Virtual Memory
Virtual memory is a scheme very much like caching
The difference: caching works between primary memory and the CPU, while virtual memory works between secondary and primary memories
It translates from a logical address (the program) to a physical address (primary memory)
When information is not found in the cache, primary memory is checked
When information is not found in primary memory, secondary memory is checked
In essence, primary memory acts as a cache for secondary memory
The purpose is two-fold:
Take advantage of the speed of primary memory
Create the appearance of unlimited primary memory
High Level Analysis
As we saw with the cache scheme, the size of primary memory is significantly smaller than secondary
Rather than blocks, primary memory is divided into pages
We would like each program to have memory space allocated when loaded
We will assume the memory space is contiguous
We would like to be able to place pages anywhere in primary memory
This makes addressing only slightly more complicated
We will store the location of program memory in a page table, similar to the tag table in the cache scheme

The general retrieval algorithm is similar to what we've seen (a sketch follows the outline):
The program begins executing and encounters needed data or an instruction
Check the cache
If in the cache
    Have a cache hit; use it
Else
    Have a cache miss; check primary memory
    If in primary memory
        Bring the associated block into the cache
    Else
        Have a page fault; get the page from secondary memory
        Must make room: remove an existing page
        If the page has been modified, save the changes
        Else discard it
        Write the new page in its place
        Bring the associated block into the cache
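A compressed C-style sketch of that flow; every helper name here is hypothetical, and the cache details follow the direct mapped sketch shown earlier:

```c
#include <stdint.h>

/* All of these helpers are hypothetical stand-ins for the mechanisms in the notes. */
int      in_cache(uint32_t vaddr);
uint32_t read_cache(uint32_t vaddr);
void     load_block_into_cache(uint32_t vaddr);
int      in_primary(uint32_t vaddr);
int      choose_victim_page(void);
int      page_dirty(int frame);
void     write_page_to_secondary(int frame);
void     load_page_from_secondary(uint32_t vaddr, int frame);

uint32_t fetch(uint32_t vaddr)
{
    if (in_cache(vaddr))
        return read_cache(vaddr);                 /* cache hit              */

    if (!in_primary(vaddr)) {                     /* page fault             */
        int victim = choose_victim_page();        /* make room              */
        if (page_dirty(victim))
            write_page_to_secondary(victim);      /* save modified page     */
        load_page_from_secondary(vaddr, victim);  /* write new page there   */
    }
    load_block_into_cache(vaddr);                 /* satisfy the cache miss */
    return read_cache(vaddr);
}
```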
Implementation
Design primary memory
Hardware
A collection of memory devices
A memory address register
A memory data register
Words will be 32 bits
We will have a 16 M byte primary memory

Architecture
We will logically divide primary memory into 2^10 4 K-word pages
Each page will be 4 K words long; each will hold 4 blocks, and each block will hold 1 K words
Note this is a logical division
Further note that, again, the address increments are rounded to make things simpler; this provides a reasonable size piece of memory to work with
Primary memory will now logically appear as follows
Note j is not necessarily equal to i + 1
We consider a virtual address, a primary memory address, and a secondary memory address

Assume virtual memory has 2^12 pages
Assume secondary memory has 2^18 pages (1 G words or 4 G bytes)
Assume main memory has 2^10 pages (about 1000 pages), identified by bits A23 - A14

Similar to the cache:
A13 - A12  identify the block within a page
A11 - A2   identify the word within a block
A1 - A0    identify the byte within a word
Page table
Contains one entry for each of your possible pages in secondary memory
Our design will have 2^18 entries
Your pages can be anywhere in secondary memory; to your program they appear at 0, 1, 2, …, m-1
Could also have page tables i, j, k, etc.
Alternately, use A31 - A24 to identify one of 256 page tables
This potentially allows for up to 256 jobs in memory
When a job enters the system, its page table is included
A job may have only a subset of its pages in memory at any one time
Then A23 - A14 identify a page within a page table
From the point of view of the virtual memory address, the page number represents an offset into the page table

Each entry contains:
A valid bit to indicate whether the page is in primary memory
A pointer to the location in main memory; if not in main memory, it points to the location in secondary memory
A dirty bit to indicate modified data
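As a C sketch, one page table entry might look like the following; the field names are illustrative, and only the valid bit, the dirty bit, and the main/secondary memory pointer come from the notes:

```c
#include <stdint.h>

typedef struct {
    unsigned valid : 1;   /* page currently resides in primary memory        */
    unsigned dirty : 1;   /* page has been modified since it was brought in  */
    uint32_t where;       /* frame in primary memory if valid, otherwise the */
                          /* page's location in secondary memory             */
} pte_t;

static pte_t page_table[1 << 18];   /* one entry per possible page (2^18)    */
```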
Example page table entries (VM page number, page address range, status, location in MM or SM):
Page 0: addresses 0-4095
Page 1: addresses 4096-8191
…
Address calculation proceeds as with the cache once we find the pages in primary memory
When a program is loaded, primary memory space is allocated; the amount depends upon the program
The address of the allocated space is stored in the page table register, which gives the starting address of the allocated space

To find pages, go to the page table:
Add the contents of the page table register to A31 - A14 of the virtual address
This gives an index into the page table
The entry holds the address in physical memory if the page is there, or a pointer to secondary memory otherwise
Use A13 - A12 to identify the block (a sketch of the translation follows)
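A minimal C sketch of that calculation, reusing the pte_t entry above; ptbr (the page table register) and handle_page_fault are illustrative names:

```c
static uint32_t ptbr;                                  /* page table register (base index) */
void handle_page_fault(uint32_t vaddr, pte_t *pte);    /* hypothetical helper              */

uint32_t translate(uint32_t vaddr)
{
    uint32_t vpage = vaddr >> 14;                   /* A31 - A14: page number        */
    pte_t   *pte   = &page_table[ptbr + vpage];     /* register + page number        */

    if (!pte->valid)
        handle_page_fault(vaddr, pte);              /* bring page in from secondary  */

    uint32_t offset = vaddr & 0x3FFF;               /* A13 - A0: block, word, byte   */
    return (pte->where << 14) | offset;             /* physical address              */
}
```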
Page Replacement
Clearly primary memory is of limited size (16 M bytes), yet we have the ability to address 4 G bytes (1 G words)
Although 16 M seems like a lot, we do not want to restrict a program to that size
To satisfy requirements we will need to be able to load additional pages into memory
As long as space is left there is no problem
If there is no space, we must remove something
Several schemes are available; each has advantages and disadvantages
All require checking the dirty bit prior to removal:
If set, a write operation is necessary
Otherwise, simply overwrite the page

The two most common schemes require a time stamp on each page (a sketch of LRU victim selection follows this list):
LRU: remove the least recently used page; assumes the page unused for the longest time is least likely to be used in the future (the related FIFO policy removes the oldest page)
MRU: remove the most recently used page; assumes the newest page is least likely to be used in the future (related to LIFO)
Random: select and remove a page at random; easy to implement
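A minimal sketch of LRU victim selection, assuming each page frame carries a timestamp updated on every access (as the time stamp requirement above implies); the names are illustrative:

```c
#include <stdint.h>

/* Return the frame whose last_used timestamp is oldest. */
int choose_victim_lru(const uint32_t *last_used, int num_frames)
{
    int victim = 0;
    for (int f = 1; f < num_frames; f++)
        if (last_used[f] < last_used[victim])   /* older timestamp = less recently used */
            victim = f;
    return victim;
}
```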
Performance
Factors to consider:
With and without VM
With VM: with and without a page fault
Optimizing the size

Speeding things up - the TLB
Our virtual memory page table contains information for 2^18 pages
The search process and address calculation can be very time consuming
We would like to improve the response time
The easiest way is to use the caching techniques learned earlier
Keep in memory a cache containing the most recently used page addresses
Called the Translation Lookaside Buffer (TLB)
The search for a page begins in the TLB; if not found, then check the page table as before
Architecture
Choose 256 entries; the 256 most recently calculated addresses are stored
Implement an associative search keyed on the virtual memory page number
Each entry must contain:
Valid bit - indicates whether the entry is valid
Dirty bit - indicates whether the entry has been changed
Tag - A31 - A14 of the virtual address
Physical page - the computed address of the page in primary memory
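A C sketch of the TLB probe that would run before the page table walk above; the 256-entry size and the fields follow the notes, while the names and the linear loop (standing in for the hardware's parallel associative search) are illustrative:

```c
#include <stdint.h>

typedef struct {
    unsigned valid : 1;
    unsigned dirty : 1;
    uint32_t vpage;      /* tag: A31 - A14 of the virtual address    */
    uint32_t frame;      /* computed page address in primary memory  */
} tlb_entry_t;

static tlb_entry_t tlb[256];

/* Returns 1 and fills *paddr on a TLB hit; 0 means fall back to the page table. */
int tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpage = vaddr >> 14;
    for (int i = 0; i < 256; i++)                 /* associative search */
        if (tlb[i].valid && tlb[i].vpage == vpage) {
            *paddr = (tlb[i].frame << 14) | (vaddr & 0x3FFF);
            return 1;
        }
    return 0;
}
```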