Memory technologiesHP TouchSmart TM2 Series TM2‐2102TU (Modern Argento), * Intel Core i3, * 3 GB...


CS521 CSE IITG 11/23/2012

A Sahu 1


• Memory Hierarchy

• Hit/Miss, IPC

• Cache: Set, Line size, Associativity


No Class on Friday (07‐SEP‐2012)

Typical specs of a computer today (http://www.flipkart.com)

Dell XPS 14
14.0” WLED (1366 x 768), Intel Core i7-740QM,
4 GB DDR3 1333 MHz, 500 GB, Windows 7 Home Premium, 8x CD/DVD burner (dual layer DVD+/-R drive), NVIDIA GeForce GT 425M,
2 USB 2.0, HDMI, eSATA, 6-cell lithium-ion,
Built-in 2.0-megapixel HD
Price: 48,300

HP TouchSmart TM2 Series TM2-2102TU (Modern Argento)
* Intel Core i3
* 3 GB DDR3 RAM
* 12.1 Inch Screen
* Windows 7 Home Premium
Price: Rs. 47,199.00

Memory technologies

• Semiconductor: random access, like an array (x = A[i])
  – Registers
  – SRAM
  – DRAM
  – FLASH
• Magnetic
  – FDD
  – HDD
• Optical
  – CD
  – DVD
• Magnetic and optical media: random access (seek to a sector) + sequential access, like an array + linked list

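Memory Hierarchy

• Smaller is Faster; Bigger is Slower
• Places of Cash/Money and the matching memory levels:
  – BANK : HDD
  – Home Locker : DDR RAM
  – Purse : Cache
  – Pocket : Regs
(Figure: speed increases toward Pocket/Regs; storage capacity increases toward BANK/HDD.)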

CPU Design with Memory Hierarchy

(Figure: a CPU (PC, Register File, ALU) with a separate Instruction Memory and Data Memory; then the same CPU with split first-level caches, IL1 for instructions and DL1 for data, backed by a Unified L2 cache and main Memory.)


• Programmers want unlimited amounts of memory with low latency
• Fast memory technology is more expensive per bit than slower memory
• Solution: organize the memory system into a hierarchy
  – The entire addressable memory space is available in the largest, slowest memory
  – Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor
• Temporal and spatial locality ensure that nearly all references can be found in the smaller memories
  – This gives the illusion of a large, fast memory being presented to the processor

Hierarchical structure

(Figure: the CPU on top of successive levels of Memory. Speed: fastest at the top, slowest at the bottom. Size: smallest at the top, biggest at the bottom. Cost/bit: highest at the top, lowest at the bottom.)

• Memory hierarchy design becomes more crucial with recent multi-core processors:
  – Aggregate peak bandwidth grows with the number of cores:
    • Intel Core i7 can generate two references per core per clock
    • Four cores and a 3.2 GHz clock:
      – 25.6 billion 64-bit data references/second +
      – 12.8 billion 128-bit instruction references/second
      – = 409.6 GB/s!
  – DRAM bandwidth is only 6% of this (25 GB/s)
  – Requires:
    • Multi-port, pipelined caches
    • Two levels of cache per core
    • Shared third-level cache on chip
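The 409.6 GB/s figure can be rechecked with a few lines of arithmetic. This sketch assumes, as the slide's numbers imply, that each core issues two 64-bit (8-byte) data references and one 128-bit (16-byte) instruction fetch per clock; the function names are hypothetical.

```c
#include <stdint.h>

/* Peak memory demand in bytes/second, assuming each core issues two 8-byte
   data references and one 16-byte instruction fetch per clock cycle. */
uint64_t peak_bytes_per_sec(uint64_t cores, uint64_t clock_hz) {
    uint64_t data_refs  = cores * 2 * clock_hz;  /* 4 x 2 x 3.2e9 = 25.6e9 refs/s */
    uint64_t instr_refs = cores * 1 * clock_hz;  /* 4 x 1 x 3.2e9 = 12.8e9 refs/s */
    return data_refs * 8 + instr_refs * 16;      /* 204.8e9 + 204.8e9 bytes/s */
}

/* Fraction of that peak demand a DRAM system of the given bandwidth can supply. */
double dram_fraction(double dram_bytes_per_sec, uint64_t peak) {
    return dram_bytes_per_sec / (double)peak;
}
```

With four cores at 3.2 GHz, `peak_bytes_per_sec(4, 3200000000)` gives 409,600,000,000 bytes/s, and `dram_fraction(25e9, ...)` gives about 0.061, the "only 6%" on the slide.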

Performance and Power

• High-end microprocessors have >10 MB of on-chip cache
  – This consumes a large share of the chip's area and power budget

Processor-Memory Performance Gap

(Figure: relative performance of Processor and Memory from 1980 to 2010 on a log scale from 1 to 100,000; processor performance grows much faster than memory performance, so the gap widens over time.)


Data transfer between levels

(Figure: the Processor accesses the Cache. On a hit the Cache supplies the data; on a miss, data are transferred from Memory into the Cache. Unit of transfer = block.)

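Principle of locality

• Temporal Locality
  – the same references are repeated in time
• Spatial Locality
  – references cluster in space (nearby addresses are accessed soon)
  – Special case: Sequential Locality

    #include <math.h>
    void locality(double A[100], const double M[80]) {
        /* 1D spatial (sequential) locality: after A[i], the loop soon accesses A[i+1], A[i+2], ... */
        for (int i = 0; i < 100; i++) A[i] += sqrt(i);
        /* Temporal locality: A[0..9] is accessed again after some time, in each of the 80 outer iterations */
        for (int T = 0; T < 80; T++)
            for (int i = 0; i < 10; i++) A[i] += M[T] * i;
    }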
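To see why these loops are cache friendly, the sketch below counts misses for a cold cache that is assumed large enough never to evict a line, so an access misses only the first time its line (address / line size) is touched. It assumes A starts at address 0, each element occupies one addressable unit, and the line size is 10 units as in the decimal examples; `cold_misses` and `MAX_LINES` are hypothetical names, and only the accesses to A are traced.

```c
#include <stddef.h>

#define MAX_LINES 256  /* sketch limit on the number of distinct lines tracked */

/* Count misses in a cold cache assumed never to evict: an access misses
   only on the first touch of its line (addr / line_size). */
size_t cold_misses(const size_t *addr, size_t n, size_t line_size) {
    size_t seen[MAX_LINES];
    size_t nseen = 0, misses = 0;
    for (size_t k = 0; k < n; k++) {
        size_t line = addr[k] / line_size;
        size_t j = 0;
        while (j < nseen && seen[j] != line)
            j++;
        if (j == nseen) {              /* line never touched before: miss */
            misses++;
            if (nseen < MAX_LINES)     /* past MAX_LINES the count becomes an upper bound */
                seen[nseen++] = line;
        }
    }
    return misses;
}
```

For the first loop (addresses 0 to 99) this gives 10 misses in 100 accesses, because each line brought in serves the next nine elements. For the second loop (800 accesses to addresses 0 to 9) it gives a single miss, because the same line is reused on every outer iteration.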

• The address is divided into three parts: TAG, Index, Offset
  – Offset = Address % LineSize
  – Index = (Address / LineSize) % NumSet
  – TAG = Address / (LineSize * NumSet)
• If TAG matches the existing TAG stored at that Index, it is a HIT; otherwise it is a MISS:

    if (TAG == CACHE[Index].TAG)  Cache HIT
    else                          Cache MISS

• Assume LS = 10, NumSet = 100, Address = 2067432
  – Offset = 2, Index = 43, TAG = 2067
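The three formulas translate directly into C. This is a minimal sketch; the struct `AddrFields` and the function `split_address` are hypothetical names.

```c
/* Split an address using the slide's formulas:
   Offset = Address % LineSize
   Index  = (Address / LineSize) % NumSet
   TAG    = Address / (LineSize * NumSet) */
typedef struct {
    unsigned long tag, index, offset;
} AddrFields;

AddrFields split_address(unsigned long addr, unsigned long line_size,
                         unsigned long num_set) {
    AddrFields f;
    f.offset = addr % line_size;              /* position within the line */
    f.index  = (addr / line_size) % num_set;  /* which set/line to look in */
    f.tag    = addr / (line_size * num_set);  /* identifies the block in that set */
    return f;
}
```

`split_address(2067432, 10, 100)` reproduces the slide's Offset = 2, Index = 43, TAG = 2067. With power-of-two sizes the same divisions and remainders become the bit-field extraction used by real hardware.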

Cache Example : Algorithmic

Decimal Example, Direct mapped, Line size 10 (10 lines, index 0 to 9; each line stores a Tag)
With LineSize = 10 and NumSet = 10: Offset = Address % 10, Index = (Address / 10) % 10, Tag = Address / 100.
  – 212011: Tag 2120, Index 1, Offset 1 → MISS; line 1 gets tag 2120
  – 212012: Tag 2120, Index 1, Offset 2 → HIT (same line as 212011)
  – 212335: Tag 2123, Index 3, Offset 5 → MISS; line 3 gets tag 2123
  – 414368 (414365 on later slides; same Tag and Index): Tag 4143, Index 6 → MISS; line 6 gets tag 4143
  – 414318: Tag 4143, Index 1, Offset 8 → MISS; tag 2120 in line 1 is replaced by 4143
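The direct-mapped walkthrough above can be replayed with a small simulator. This sketch follows the slide's TAG/Index formulas with LineSize = 10 and 10 lines; it adds a valid bit (not shown on the slides) so that an empty line never matches, and `DmLine` and `dm_access` are hypothetical names.

```c
#include <stdbool.h>

#define NUM_SET   10   /* index 0..9, as in the example */
#define LINE_SIZE 10   /* decimal line size */

typedef struct {
    bool valid;         /* assumption: valid bit so an empty line never hits */
    unsigned long tag;
} DmLine;

/* Direct-mapped access: the index selects exactly one line. A hit needs a
   valid line with a matching tag; on a miss that line is refilled, evicting
   whatever block it held before. Returns true on a hit. */
bool dm_access(DmLine cache[NUM_SET], unsigned long addr) {
    unsigned long index = (addr / LINE_SIZE) % NUM_SET;
    unsigned long tag   = addr / (LINE_SIZE * NUM_SET);
    if (cache[index].valid && cache[index].tag == tag)
        return true;                 /* HIT */
    cache[index].valid = true;       /* MISS: replace the line's contents */
    cache[index].tag   = tag;
    return false;
}
```

Feeding 212011, 212012, 212335, 414368, 414318 into an empty cache gives miss, hit, miss, miss, miss, and leaves tag 4143 in line 1. A later access to 212011 misses again: a conflict miss, since both blocks compete for the single line at index 1.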

Cache Example : Algorithmic

Decimal Example, Two way Asso, Line size 10 (two ways, each with index 0 to 9)
Same split as before: Index = (Address / 10) % 10, Tag = Address / 100, but each index (set) now holds two lines.
  – 212011: Tag 2120, Index 1 → MISS; set 1 gets tag 2120 in one way
  – 212012: Tag 2120, Index 1 → HIT
  – 212335: Tag 2123, Index 3 → MISS; set 3 gets tag 2123
  – 414368 (414365 on the last slide; same Tag and Index): Tag 4143, Index 6 → MISS; set 6 gets tag 4143
  – 414318: Tag 4143, Index 1 → MISS; tag 4143 goes into the other way of set 1, so tag 2120 is not evicted
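A set-associative version of the simulator shows why the two-way cache keeps both 2120 and 4143 at index 1. The slides do not name a replacement policy, so this sketch assumes LRU (least recently used) via a per-way timestamp; `Way`, `SACache` and `sa_access` are hypothetical names.

```c
#include <stdbool.h>

#define NUM_SET   10   /* sets 0..9, as in the example */
#define LINE_SIZE 10
#define WAYS      2    /* two-way set associative */

typedef struct {
    bool valid;
    unsigned long tag;
    unsigned long last_use;  /* LRU timestamp (replacement policy is an assumption) */
} Way;

typedef struct {
    Way set[NUM_SET][WAYS];
    unsigned long clock;     /* access counter used as the LRU timestamp */
} SACache;

/* The index selects a set and the tag is compared with every way in it.
   On a miss, fill an invalid way if there is one, else evict the LRU way.
   Returns true on a hit. */
bool sa_access(SACache *c, unsigned long addr) {
    unsigned long index = (addr / LINE_SIZE) % NUM_SET;
    unsigned long tag   = addr / (LINE_SIZE * NUM_SET);
    Way *s = c->set[index];
    c->clock++;
    for (int w = 0; w < WAYS; w++) {
        if (s[w].valid && s[w].tag == tag) {
            s[w].last_use = c->clock;    /* HIT: refresh recency */
            return true;
        }
    }
    int victim = 0;                      /* MISS: choose a way to fill */
    for (int w = 0; w < WAYS; w++) {
        if (!s[w].valid) { victim = w; break; }
        if (s[w].last_use < s[victim].last_use) victim = w;
    }
    s[victim].valid    = true;
    s[victim].tag      = tag;
    s[victim].last_use = c->clock;
    return false;
}
```

On the same address sequence, 414318 lands in the second way of set 1, so a later access to 212011 still hits. With `WAYS` set to 1 the code behaves as a direct-mapped cache. With `NUM_SET` set to 1 (and the index formula then always 0), every tag is compared in one set, which is the fully associative case.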


Cache Size

• No of Sets (depends on the index field)
• Associativity (how many tags per set)
• Line size (no. of addressable units/bytes in a line)
• Cache Size = Nset x Associativity x LineSize = 10 x 4 x 10 = 400 Byte

(Figure: a 4-way set associative cache; each way has index 0 to 9 with Tag and Line columns, holding tags from the decimal example such as 2120, 2123 and 4143.)
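The size formula also works in reverse: a cache is often specified by total size, associativity and line size, from which the number of sets (and hence the index field) follows. A minimal sketch with hypothetical function names; it counts data storage only, not tag or valid bits.

```c
/* Cache Size = Nset x Associativity x LineSize (data bytes only). */
unsigned long cache_size_bytes(unsigned long nset, unsigned long assoc,
                               unsigned long line_size) {
    return nset * assoc * line_size;
}

/* Inverse: number of sets for a given total size, associativity and line size. */
unsigned long num_sets(unsigned long size_bytes, unsigned long assoc,
                       unsigned long line_size) {
    return size_bytes / (assoc * line_size);
}
```

`cache_size_bytes(10, 4, 10)` gives the slide's 400 bytes. A 16 KB, 4-way cache with 16-byte lines has `num_sets(16384, 4, 16)` = 256 sets, and a 4 KB direct-mapped cache with 4-byte lines has 1024 sets.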

• Simple Hashing: Direct Mapped Cache
  – Example: array int A[10], each element can store one item
  – Data stored in location Address % 10
  – Direct/Random Access to Element
• Array of Lists: Set Associative Cache
  – int LA[10], each element can store a list of items
  – Data stored in the list at location (Address % 10)
  – List size is limited in a Set Associative Cache
  – Mixed access: direct to the list, then serial within it
• List of Elements: Fully Associative Cache
  – All data stored in one list
  – Serial/Associative Access to Element

Cache: Placement

• Direct Mapped
  – Only one tag matched, at the line selected by the index
• Set Associative
  – Both index (selects the set) and tag matching
• Full Associative
  – Only tag matching, no index (CAM: Content Addressable Memory)

(Figure: example contents for each organization; the fully associative cache compares the address tag against all stored tags, tag0 to tag7 in entries t0 to t7.)

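Addressing Cache

Address fields: Tag | Set Index | Displacement
  – Set Index: selects the set
  – Tag: compared to the tags in the set
  – Displacement: selects the AU (addressable unit) within the line
• Early select: access data after tag matching
• Late select: access data while tag matching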

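Cache access mechanism

(Figure: direct-mapped cache with entries 0 to 1023, each holding a valid bit v, a 20-bit tag and 32 bits of data. The 32-bit address (bits 31 to 0) splits into Tag (20 bits), Index (10 bits) and Byte offset (2 bits). The index selects an entry, its stored tag is compared (=) with the address tag to produce Hit, and the entry's 32-bit word is output as Data.)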

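Cache with 4 word blocks

(Figure: the 32-bit address (bits 31 to 0) splits into Tag (16 bits), Index (12 bits), Block offset (2 bits) and Byte offset (2 bits). Each entry holds v, a 16-bit tag and four 32-bit words; the tag comparison (=) produces Hit, and a Mux driven by the block offset selects one 32-bit word as Data.)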
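In binary, the divisions and remainders of the decimal examples become shifts and masks. This sketch extracts the four fields for the 4-word-block layout above (16 + 12 + 2 + 2 = 32 bits); `AddrParts` and `split32` are hypothetical names.

```c
#include <stdint.h>

/* Fields of a 32-bit byte address for a cache with 16-byte (4-word) blocks:
   Tag = bits 31..16, Index = bits 15..4, Block offset = bits 3..2,
   Byte offset = bits 1..0. */
typedef struct {
    uint32_t tag, index, block_off, byte_off;
} AddrParts;

AddrParts split32(uint32_t addr) {
    AddrParts p;
    p.byte_off  =  addr        & 0x3;    /* byte within the 32-bit word */
    p.block_off = (addr >> 2)  & 0x3;    /* word within the block: drives the Mux */
    p.index     = (addr >> 4)  & 0xFFF;  /* selects one of 2^12 entries */
    p.tag       =  addr >> 16;           /* compared with the stored 16-bit tag */
    return p;
}
```

For address 0x12345678 this gives tag 0x1234, index 0x567, block offset 2 and byte offset 0.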


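(Figure: four-way set associative cache. The 32-bit address (bits 31 to 0) splits into Tag (20 bits), Index (8 bits, sets 0 to 255), Block offset (2 bits) and Byte offset (2 bits). Each set has four ways, each holding v, a 20-bit tag and 128 bits of data. Four comparators (=) check the tag in all ways in parallel, a Mux in each way selects a 32-bit word, and a final Mux forwards the word from the matching way as Data, with Hit signalled on a match.)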
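The field widths in these figures follow from the cache geometry. This sketch assumes 32-bit byte addresses and power-of-two sizes; `log2u`, `Widths` and `field_widths` are hypothetical names. The offset here covers both block and byte offset.

```c
#include <stdint.h>

/* log2 of an exact power of two (assumption: all sizes are powers of two). */
static unsigned log2u(uint32_t x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

typedef struct { unsigned tag, index, offset; } Widths;

/* Tag/index/offset widths of a 32-bit byte address for a cache with num_sets
   sets and block_bytes bytes per block. For a fixed number of sets,
   associativity does not change the split; it only sets how many tags are
   compared in parallel. */
Widths field_widths(uint32_t num_sets, uint32_t block_bytes) {
    Widths w;
    w.offset = log2u(block_bytes);   /* block offset + byte offset */
    w.index  = log2u(num_sets);
    w.tag    = 32 - w.index - w.offset;
    return w;
}
```

This reproduces the three figures: 1024 sets of 4-byte blocks give 20/10/2; 4096 entries of 16-byte blocks give 16/12/4; and 256 four-way sets of 16-byte blocks give 20/8/4.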

Access time vs. size and associativity

(Figure: access time in μs (axis 0 to 1000) vs. cache size, 16KB to 256KB, for 1-Way, 2-Way, 4-Way and 8-Way caches.)

Energy per read vs. size and associativity

(Figure: energy per read in nanojoules (axis 0 to 0.5) vs. cache size, 16KB to 256KB, for 1-Way, 2-Way, 4-Way and 8-Way caches.)