CS521 CSE IITG 11/23/2012
A Sahu 1
• Memory Hierarchy
• Hit/Miss, IPC
• Cache: Set, Line size, Associativity
No Class on Friday (07‐SEP‐2012)
Typical specs of a computer today (http://www.flipkart.com)
Dell XPS 14
14.0” WLED (1366 x 768), Intel Core i7‐740QM
4 GB DDR3 1333 MHz, 500 GB, Windows 7 Home Premium, 8x CD/DVD burner (dual layer DVD+/‐R drive), NVIDIA GeForce GT 425M
2 USB 2.0, HDMI, eSATA, 6‐cell lithium‐ion
Built‐in 2.0‐megapixel HD
Price: 48,300
HP TouchSmart TM2 Series TM2‐2102TU (Modern Argento)
* Intel Core i3, * 3 GB DDR3 RAM, * 12.1 Inch Screen, * Windows 7 Home Premium
Price: Rs. 47,199.00
Memory technologies
• Semiconductor: random access (like an array, x=A[i])
– Registers
– SRAM
– DRAM
– FLASH
• Magnetic
– FDD
– HDD
• Optical: random (seq. seek to sector) + sequential access (like an array + linked list)
– CD
– DVD
Memory Hierarchy
• Smaller is Faster ‐ Bigger is Slower
• Places of Cash/Money
[Figure: places to keep money (Bank, Home Locker, Purse) shown beside the memory levels (HDD, DDR RAM, Cache, Regs), with axes labelled Speed and Storage]
CPU Design with Memory Hierarchy
[Figure: three CPU organizations built from PC, Register File, and ALU: (1) a datapath connected directly to Instruction Memory and Data Memory (address, instruction, data, and Reg# signals); (2) the same as a block diagram; (3) split IL1 and DL1 caches backed by a Unified L2 cache and main Memory]
• Programmers want unlimited amounts of memory with low latency
• Fast memory technology is more expensive per bit than slower memory
• Solution: organize the memory system into a hierarchy
– The entire addressable memory space is available in the largest, slowest memory
– Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor
• Temporal and spatial locality ensure that nearly all references can be found in the smaller memories
– This gives the illusion of a large, fast memory being presented to the processor
Hierarchical structure
CPU
Memory (level nearest the CPU): fastest speed, smallest size, highest cost/bit
Memory
Memory
Memory (level farthest from the CPU): slowest speed, biggest size, lowest cost/bit
• Memory hierarchy design becomes more crucial with recent multi‐core processors:
– Aggregate peak bandwidth grows with # cores:
• Intel Core i7 can generate two data references per core per clock
• Four cores and 3.2 GHz clock
– 25.6 billion 64‐bit data references/second +
– 12.8 billion 128‐bit instruction references/second
– = 409.6 GB/s!
• DRAM bandwidth is only 6% of this (25 GB/s)
• Requires:
– Multi‐port, pipelined caches
– Two levels of cache per core
– Shared third‐level cache on chip
Performance and Power
• High‐end microprocessors have >10 MB on‐chip cache
– Consumes a large share of the area and power budget
[Figure: Processor‐Memory Performance Gap: performance (log scale, 1 to 100,000) vs. year (1980 to 2010) for Processor and Memory]
Data transfer between levels
[Figure: the Processor accesses the Cache; on a hit the data come from the Cache, on a miss data are transferred from Memory to the Cache; unit of transfer = block]
Principle of locality
• Temporal Locality
– references repeated in time
• Spatial Locality
– references repeated in space
– Special case: Sequential Locality
  for (i = 0; i < 100; i++) { A[i] += sqrt(i); }
  // 1D spatial locality: after accessing A[i], the loop soon accesses A[i+1], A[i+2], ...
  for (T = 0; T < 80; T++) { for (i = 0; i < 10; i++) A[i] += M[T] * i; }
  // Temporal locality: each A[i] is accessed again on every iteration of T
• An address is divided into three parts: TAG, Index, Offset
– Offset = Address % LineSize
– Index = (Address / LineSize) % NumSet
– TAG = Address / (LineSize * NumSet)
• If TAG matches the existing TAG stored at that Index, it is a HIT; otherwise it is a MISS
• Assume LineSize = 10, NumSet = 100, Address = 2067432
– Offset = 2, Index = 43, TAG = 2067
  if (CACHE[Index].valid && TAG == CACHE[Index].TAG)  Cache HIT
  else                                                 Cache MISS
Cache Example : Algorithmic
Decimal Example, Direct mapped, Line size 10, 10 lines (index 0–9)
• 212011: index 1, TAG 2120 → MISS, line loaded at index 1
• 212012: index 1, TAG 2120 → HIT
• 212335: index 3, TAG 2123 → MISS, line loaded at index 3
• 414368: index 6, TAG 4143 → MISS, line loaded at index 6
• 414365: index 6, TAG 4143 → HIT
• 414318: index 1, TAG 4143 → MISS, replaces the line with TAG 2120 at index 1
Final contents: index 1 = 4143, index 3 = 2123, index 6 = 4143
Cache Example : Algorithmic
Decimal Example, Two way Asso, Line size 10, 10 sets (index 0–9) with two lines each
• 212011: set 1, TAG 2120 → MISS, line loaded into set 1
• 212012: set 1, TAG 2120 → HIT
• 212335: set 3, TAG 2123 → MISS, line loaded into set 3
• 414368: set 6, TAG 4143 → MISS, line loaded into set 6
• 414365: set 6, TAG 4143 → HIT
• 414318: set 1, TAG 4143 → MISS, line loaded into the second way of set 1; TAG 2120 is not evicted
Final contents: set 1 = {2120, 4143}, set 3 = 2123, set 6 = 4143
Cache Size
• No. of sets (depends on the index field)
• Associativity (how many tags per set)
• Line size (no. of addressable units/bytes in a line)
• Cache Size = NSet × Associativity × LineSize = 10 × 4 × 10 = 400 bytes
[Figure: a 4‐way set‐associative cache with 10 sets (index 0–9); each way holds a Tag and a Line, filled with tags from the decimal example (2120, 2123, 4143)]
• Simple Hashing: Direct Map Cache (direct/random access to an element)
– Example: array int A[10], each entry can store one element
– Data stored in location Address % 10
• Array of Lists: Set Associative Cache (mixed access)
– int LA[10], each entry can store a list of elements
– Data stored in the list at location (Address % 10)
– List size is limited in a Set Associative Cache
• List of Elements: Full Associative Cache (serial/associative access to an element)
– All data stored in one list
Cache: Placement
• Direct Mapped
– Only index lookup; only one tag to match
• Set Associative
– Both index and tag matching: the index selects a set, and the tag is matched within it
• Full Associative
– Only tag matching, no index (CAM: Content‐Addressable Memory)
[Figure: the decimal example placed in a direct‐mapped cache, a set‐associative cache, and a fully associative cache with eight entries (tag0 to tag7)]
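Addressing Cache
Address fields: | Tag | Set Index | Displacement |
• Set Index: selects the set
• Tag: compared to the tags stored in that set
• Displacement: selects the addressable unit (AU) within the line
• Early select: access data after tag matching
• Late select: access data while tag matching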
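Cache access mechanism
[Figure: direct‐mapped cache with 1024 entries (index 0 to 1023), each holding a valid bit (v), a 20‐bit tag, and 32‐bit data. The 32‐bit address (bits 31 to 0) is split into a 20‐bit Tag, a 10‐bit index, and a 2‐bit byte offset; the index selects an entry, the stored tag is compared (=) with the address tag, together with v, to produce Hit, and the entry supplies the 32‐bit Data]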
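Cache with 4 word blocks
[Figure: direct‐mapped cache whose entries hold v, a 16‐bit tag, and four 32‐bit words. The 32‐bit address is split into a 16‐bit Tag, a 12‐bit index, a 2‐bit block offset, and a 2‐bit byte offset; the tag comparison (=) produces Hit, and a Mux driven by the block offset selects one 32‐bit word as Data]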
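[Figure: 4‐way set‐associative cache with 256 sets (index 0 to 255); each way holds v, a 20‐bit tag, and 128 bits (four 32‐bit words) of data. The 32‐bit address is split into a 20‐bit tag, an 8‐bit index, a 2‐bit block offset, and a 2‐bit byte offset. Four comparators (=) check the four tags in parallel, each way's Mux selects a 32‐bit word, and a final Data Mux passes on the word from the way that hits]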
Access time vs. size and associativity
[Figure: access time in μs (0 to 1000) vs. cache size (16KB, 32KB, 64KB, 128KB, 256KB) for 1‐Way, 2‐Way, 4‐Way, and 8‐Way caches]
Energy per read vs. size and associativity
[Figure: energy per read in nanojoules (0 to 0.5) vs. cache size (16KB, 32KB, 64KB, 128KB, 256KB) for 1‐Way, 2‐Way, 4‐Way, and 8‐Way caches]