Memory Hierarchy - ECS...

Computer Systems and Networks
ECPE 170 – Jeff Shafer – University of the Pacific

Memory Hierarchy (Performance Optimization)

Lab Schedule

Activities
• This Week
    • Lab 6 – Perf Optimization
    • Lab 7 – Memory Hierarchy
• Next Tuesday
    • Intro to Python
• Next Thursday
    • **Midterm Exam**

Assignments Due
• Lab 6
    • Due by Mar 6th 5:00am
• Lab 7
    • Due by Mar 20th 5:00am

Computer Systems and Networks, Spring 2017

Your Personal Repository

2017_spring_ecpe170\
    lab02
    lab03
    lab04
    lab05
    lab06
    lab07
    lab08
    lab09
    lab10
    lab11
    lab12
    .hg

.hg is a hidden folder! (name starts with a period)

Used by Mercurial to track all repository history (files, changelogs, …)

Mercurial .hg Folder

• The existence of a .hg hidden folder is what turns a regular directory (and its subfolders) into a special Mercurial repository
• When you add/commit files, Mercurial looks for this .hg folder in the current directory or its parents
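The parent-directory search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mercurial's actual code; the function name `find_repo_root` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains .hg.

    Hypothetical helper: it mimics the search described on the slide,
    it is not taken from Mercurial itself.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in [start, *start.parents]:
        if (directory / ".hg").is_dir():
            return directory
    return None  # not inside a repository
```

Calling `find_repo_root(Path.cwd())` from inside `lab06` would, under this sketch, return the `2017_spring_ecpe170` directory, because that is the first ancestor holding a `.hg` folder.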

Memory Hierarchy

Memory Hierarchy

Goal as system designers: Fast Performance and Low Cost

Tradeoff: Faster memory is more expensive than slower memory

Memory Hierarchy

• To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion
    • Small, fast storage elements are kept in the CPU
    • Larger, slower main memory is outside the CPU (and accessed by a data bus)
    • Largest, slowest, permanent storage (disks, etc…) is even further from the CPU

To date, you’ve only cared about two levels: main memory and disks

Memory Hierarchy – Registers and Cache

Let’s examine the fastest memory available

Memory Hierarchy – Registers

• Storage locations available on the processor itself
• Manually managed by the assembly programmer or compiler
• You’ll become intimately familiar with registers when we do assembly programming

Memory Hierarchy – Caches

• What is a cache?
    • Speed up memory accesses by storing recently used data closer to the CPU
    • Closer than main memory – on the CPU itself!
    • Although cache is much smaller than main memory, its access time is much faster!
    • Cache is automatically managed by the hardware memory system
    • Clever programmers can help the hardware use the cache more effectively

Memory Hierarchy – Caches

• How does the cache work?
    • Not going to discuss how caches work internally
        • If you want to learn that, take ECPE 173!
    • This class is focused on what the programmer needs to know about the underlying system

Memory Hierarchy – Access

• CPU wishes to read data (needed for an instruction)
    1. Does the instruction say it is in a register or memory?
        • If register, go get it!
    2. If in memory, send request to nearest memory (the cache)
    3. If not in cache, send request to main memory
    4. If not in main memory, send request to the disk
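Steps 2 to 4 above can be sketched as a chain of lookups. This is a toy model: the three dictionaries are hypothetical stand-ins for the cache, main memory, and disk, and the register check in step 1 is left out because the instruction itself decides that. Copying the data into the faster levels on the way back is an assumption of the sketch.

```python
def read_data(address, cache, main_memory, disk):
    """Return (value, level), where level names where the data was found.

    `cache`, `main_memory` and `disk` are plain dicts used as stand-ins
    for the real hardware levels (an assumption of this sketch).
    """
    if address in cache:                 # step 2: nearest memory first
        return cache[address], "cache"
    if address in main_memory:           # step 3: cache miss
        value = main_memory[address]
        level = "main memory"
    else:                                # step 4: not in main memory either
        value = disk[address]
        main_memory[address] = value     # bring the data into main memory
        level = "disk"
    cache[address] = value               # keep a copy for future accesses
    return value, level


cache, ram, disk = {}, {}, {0x10: 7}
print(read_data(0x10, cache, ram, disk))  # first read goes all the way to disk
print(read_data(0x10, cache, ram, disk))  # second read is served by the cache
```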

(Cache) Hits versus Misses

Hit
• When data is found at a given memory level (e.g. a cache)

Miss
• When data is not found at a given memory level (e.g. a cache)

You want to write programs that produce a lot of hits, not misses!

Memory Hierarchy – Cache

• Once the data is located and delivered to the CPU, it will also be saved into cache memory for future access
    • We often save more than just the specific byte(s) requested
    • Typical: Neighboring 64 bytes (called the cache line size)
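To see which neighboring bytes travel together, the following sketch computes the cache line that contains a given address. It assumes lines are aligned to multiples of the line size; the helper name `line_range` is hypothetical.

```python
LINE_SIZE = 64  # bytes, the typical line size quoted above


def line_range(address: int, line_size: int = LINE_SIZE) -> range:
    """Return the byte addresses of the aligned cache line holding `address`."""
    start = (address // line_size) * line_size  # round down to a line boundary
    return range(start, start + line_size)


line = line_range(130)
print(line.start, line.stop - 1)  # 128 191
```

Reading the single byte at address 130 would therefore bring bytes 128 through 191 into the cache under these assumptions.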

Cache Locality

Principle of Locality: Once a data element is accessed, it is likely that a nearby data element (or even the same element) will be needed soon

Cache Locality

• Temporal locality – Recently-accessed data elements tend to be accessed again
    • Imagine a loop counter…
• Spatial locality – Accesses tend to cluster in memory
    • Imagine scanning through all elements in an array, or running several sequential instructions in a program

Programs with good locality run faster than programs with poor locality

A program that randomly accesses memory addresses (but never repeats) will gain no benefit from a cache
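A toy simulation makes the effect of locality visible. The sketch below assumes a small fully-associative cache with least-recently-used replacement; that organization, the 512-line capacity, and the name `count_misses` are assumptions chosen for illustration, not a description of any real processor.

```python
from collections import OrderedDict


def count_misses(addresses, line_size=64, num_lines=512):
    """Count misses in a toy fully-associative LRU cache."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for address in addresses:
        line = address // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


loop_counter = [0] * 1000                     # temporal locality: same address
array_scan = range(0, 8 * 1000, 8)            # spatial locality: 8-byte stride
scattered = [i * 4096 for i in range(1000)]   # no locality, never repeats

print(count_misses(loop_counter))  # 1
print(count_misses(array_scan))    # 125
print(count_misses(scattered))     # 1000
```

All three patterns make 1000 accesses, yet only the scattered one misses every time.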

Recap – Cache

• Which is bigger – a cache or main memory?
    • Main memory
• Which is faster to access – the cache or main memory?
    • Cache – It is smaller (which is faster to search) and closer to the processor (signals take less time to propagate to/from the cache)
• Why do we add a cache between the processor and main memory?
    • Performance – hopefully frequently-accessed data will be in the faster cache (so we don’t have to access slower main memory)

Recap – Cache

• Which is manually controlled – a cache or a register?
    • Registers are manually controlled by the assembly language program (or the compiler)
    • Cache is automatically controlled by hardware
• Suppose a program wishes to read from a particular memory address. Which is searched first – the cache or main memory?
    • Search the cache first – otherwise, there’s no performance gain

Recap – Cache

• Suppose there is a cache miss (data not found) during a 1 byte memory read operation. How much data is loaded into the cache?
    • Trick question – we always load data into the cache 1 “line” at a time.
    • Cache line size varies – 64 bytes on a Core i7 processor

Cache Q&A

• Imagine a computer system that has only main memory (no cache is present). Is temporal or spatial locality important for performance when repeatedly accessing an array with 8-byte elements?
    • No. Locality is not important in a system without caching, because every memory access will take the same length of time.

Cache Q&A

• Imagine a memory system has main memory and a 1-level cache, but each cache line is only 8 bytes in size. Assume the cache is much smaller than main memory. Is temporal or spatial locality important for performance here when repeatedly accessing an array with 8-byte elements?
    • Only 1 array element is loaded at a time in this cache
    • Temporal locality is important (access will be faster if the same element is accessed again)
    • Spatial locality is not important (neighboring elements are not loaded into the cache when an earlier element is accessed)

Cache Q&A

• Imagine a memory system has main memory and a 1-level cache, and the cache line size is 64 bytes. Assume the cache is much smaller than main memory. Is temporal or spatial locality important for performance here when repeatedly accessing an array with 8-byte elements?
    • 8 elements (64B) are loaded into the cache at a time
    • Both forms of locality are useful here!

Cache Q&A

• Imagine your program accesses a 100,000 element array (of 8 byte elements) once from beginning to end with stride 1. The memory system has a 1-level cache with a line size of 64 bytes. No pre-fetching is implemented. How many cache misses would be expected in this system?
    • 12,500 cache misses. The array has 100,000 elements. Upon a cache miss, 8 adjacent and aligned elements (one of which is the miss) are moved into the cache. Future accesses to the remaining 7 elements should hit in the cache. Thus, only 1/8 of the 100,000 element accesses result in a miss
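The 12,500 figure can be checked with a short count. This sketch assumes the array starts on a cache line boundary and that a line is never evicted before the scan has finished with it; the function name is hypothetical.

```python
def misses_for_sequential_scan(num_elements, element_size, line_size):
    """Count the misses of one front-to-back scan with no prefetching."""
    loaded_lines = set()
    misses = 0
    for i in range(num_elements):
        line = (i * element_size) // line_size  # line holding element i
        if line not in loaded_lines:
            misses += 1                          # first touch of this line
            loaded_lines.add(line)
    return misses


print(misses_for_sequential_scan(100_000, 8, 64))  # 12500
print(misses_for_sequential_scan(100_000, 8, 8))   # 100000
```

The second call uses an 8-byte line, so each line holds one element and every access misses: spatial locality only pays off when a line holds more than one element.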

Cache Q&A

• Imagine your program accesses a 100,000 element array (of 8 byte elements) once from beginning to end with stride 1. The memory system has a 1-level cache with a line size of 64 bytes. A hardware prefetcher is implemented. In the best-possible case, how many cache misses would be expected in this system?
    • 1 cache miss – This program has a trivial access pattern with stride 1. In the perfect world, the hardware prefetcher would begin guessing future memory accesses after the initial cache miss and loading them into the cache. Assuming the prefetcher can stay ahead of the program, then all future memory accesses with the trivial +1 pattern should result in cache hits
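The best case can be reproduced with an idealized prefetcher. The sketch below assumes a "next-line" policy in which every access also fetches the following line and the fetch always completes in time; real hardware prefetchers are more elaborate, so treat this only as a model of the perfect world described above.

```python
def misses_with_next_line_prefetch(num_elements, element_size=8, line_size=64):
    """Count misses for a sequential scan with an idealized prefetcher."""
    cached = set()
    misses = 0
    for i in range(num_elements):
        line = (i * element_size) // line_size
        if line not in cached:
            misses += 1          # only lines the prefetcher never guessed
            cached.add(line)
        # Idealized prefetch: the next line arrives before it is needed.
        cached.add(line + 1)
    return misses


print(misses_with_next_line_prefetch(100_000))  # 1
```

Only the very first access misses, because from then on the prefetcher stays one line ahead of the program.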

Cache Example – Intel Core i7 980x

• 6 core processor with a sophisticated multi-level cache hierarchy
• 3.5GHz, 1.17 billion transistors
• Each processor core has its own L1 and L2 cache
    • 32kB Level 1 (L1) data cache
    • 32kB Level 1 (L1) instruction cache
    • 256kB Level 2 (L2) cache (both instruction and data)
• The entire chip (all 6 cores) shares a single 12MB Level 3 (L3) cache
• Access time? (Measured in 3.5GHz clock cycles)
    • 4 cycles to access L1 cache
    • 9-10 cycles to access L2 cache
    • 30-40 cycles to access L3 cache
• Smaller caches are faster to search
    • And can also fit closer to the processor core
• Larger caches are slower to search
    • Plus we have to place them further away
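To put the cycle counts into wall-clock terms, the following sketch converts cycles to nanoseconds using the 3.5GHz clock quoted above. The helper name `cycles_to_ns` is hypothetical.

```python
CLOCK_HZ = 3.5e9  # clock rate quoted on the slide


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a number of clock cycles into nanoseconds."""
    return cycles / clock_hz * 1e9


for name, cycles in [("L1", 4), ("L2", 10), ("L3", 40)]:
    print(f"{name}: {cycles_to_ns(cycles):.2f} ns")
# L1: 1.14 ns
# L2: 2.86 ns
# L3: 11.43 ns
```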

Caching is Ubiquitous!

Type           What Cached                                                Where Cached     Managed By
TLB            Address Translation (Virtual -> Physical Memory Address)   On-chip TLB      Hardware MMU (Memory Management Unit)
Buffer cache   Parts of files on disk                                     Main memory      Operating System
Disk cache     Disk sectors                                               Disk controller  Controller firmware
Browser cache  Web pages                                                  Local disk       Web browser

Many types of “cache” in computer science, with different meanings

Memory Hierarchy – Virtual Memory

Virtual Memory

Virtual Memory is a BIG LIE!
• We lie to your application and tell it that the system is simple:
    • Physical memory is infinite! (or at least huge)
    • You can access all of physical memory
    • Your program starts at memory address zero
    • Your memory addresses are contiguous and in-order
    • Your memory is only RAM (main memory)

[Figure: What the System Really Does]

Why use Virtual Memory?

• We want to run multiple programs on the computer concurrently (multitasking)
    • Each program needs its own separate memory region, so physical resources must be divided
    • The amount of memory each program takes could vary dynamically over time (and the user could run a different mix of apps at once)
• We want to use multiple types of storage (main memory, disk) to increase performance and capacity
• We don’t want the programmer to worry about this
    • Make the processor architect handle these details

Pages and Virtual Memory

• Main memory is divided into pages for virtual memory
    • Page size = 4kB
    • Data is moved between main memory and disk at a page granularity
        • i.e. like the cache, we don’t move single bytes around, but rather big groups of bytes
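With 4kB pages, any address splits into a page number and an offset within that page. The sketch below shows the arithmetic; the helper name `split_address` is hypothetical.

```python
PAGE_SIZE = 4096  # 4kB pages, as stated above


def split_address(address, page_size=PAGE_SIZE):
    """Return (page_number, offset_within_page) for a memory address."""
    return divmod(address, page_size)


page, offset = split_address(0x12345)
print(hex(page), hex(offset))  # 0x12 0x345
```

Because 4096 is 2 to the 12th power, the offset is simply the low 12 bits of the address and the page number is everything above them.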

Pages and Virtual Memory

• Main memory and virtual memory are divided into equal sized pages
• The entire address space required by a process need not be in memory at once
    • Some pages can be on disk
        • Push the unneeded parts out to slow disk
    • Other pages can be in main memory
        • Keep the frequently accessed pages in faster main memory
• The pages allocated to a process do not need to be stored contiguously – either on disk or in memory

Virtual Memory Terms

• Physical address – the actual memory address in the real main memory
• Virtual address – the memory address that is seen in your program
    • Special hardware/software translates virtual addresses into physical addresses!
• Page fault – a program accesses a virtual address that is not currently resident in main memory (at a physical address)
    • The data must be loaded from disk!
• Pagefile – The file on disk that holds memory pages
    • Usually twice the size of main memory
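These terms fit together in a small translation routine. In this sketch the page table is a plain dictionary from virtual page number to physical frame number, with `None` marking a page that currently lives on disk; that representation, the 4kB page size, and the names `translate` and `PageFault` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed 4kB pages


class PageFault(Exception):
    """Raised when the requested page is not resident in main memory."""


def translate(virtual_address, page_table):
    """Translate a virtual address into a physical address."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)   # None if on disk or never mapped
    if frame is None:
        raise PageFault(page)      # the OS would now load the page from disk
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: None}       # page 0 is in frame 5, page 1 is on disk
print(translate(0x10, page_table))  # 20496, i.e. 5 * 4096 + 16
```

Note that virtual page 0 ends up in physical frame 5: the program sees addresses starting at zero while the data sits somewhere else in real memory.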

Cache Memory vs Virtual Memory

• Goal of cache memory
    • Faster memory access speed (performance)
• Goal of virtual memory
    • Increase memory capacity without actually adding more main memory
        • Data is written to disk
        • If done carefully, this can improve performance
        • If overused, performance suffers greatly!
    • Increase system flexibility when running multiple user programs (as previously discussed)

Memory Hierarchy – Magnetic Disks

Magnetic Disk Technology

• Hard disk platters are mounted on spindles
• Read/write heads are mounted on a comb that swings radially to read the disk
    • All heads move together!

Magnetic Disk Technology

• There are a number of electromechanical properties of hard disk drives that determine how fast their data can be accessed
• Seek time – time that it takes for a disk arm to move into position over the desired cylinder
• Rotational delay – time that it takes for the desired sector to move into position beneath the read/write head
• Seek time + rotational delay = access time
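The access time formula is easy to evaluate once a spindle speed is chosen. The sketch below assumes that, on average, the desired sector is half a rotation away; the 7200 RPM speed and 9 ms seek time in the example are illustrative values, not figures from these slides.

```python
def average_rotational_delay_ms(rpm):
    """Average rotational delay, assuming half a rotation on average."""
    ms_per_rotation = 60_000 / rpm   # 60,000 ms in one minute
    return ms_per_rotation / 2


def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational delay."""
    return seek_ms + average_rotational_delay_ms(rpm)


# Illustrative numbers only: a 7200 RPM drive with a 9 ms seek.
print(f"{average_rotational_delay_ms(7200):.2f} ms")  # 4.17 ms
print(f"{access_time_ms(9.0, 7200):.2f} ms")          # 13.17 ms
```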

How Big Will Hard Drives Get?

• Advances in technology have defied all efforts to define the ultimate upper limit for magnetic disk storage
    • In the 1970s, the upper limit was thought to be around 2 Mb/in²
• As data densities increase, bit cells consist of proportionately fewer magnetic grains
    • There is a point at which there are too few grains to hold a value, and a 1 might spontaneously change to a 0, or vice versa
    • This point is called the superparamagnetic limit

How Big Will Hard Drives Get?

• When will the limit be reached?
• In 2006, the limit was thought to lie between 150 Gb/in² and 200 Gb/in² (with longitudinal recording technology)
• 2010: Commercial drives have densities up to 667 Gb/in²
• 2012: Seagate demos drive with 1 Tbit/in² density
    • With heat-assisted magnetic recording – they use a laser to heat bits before writing
    • Each bit is ~12.7nm in length (a dozen atoms)

Memory Hierarchy – SSDs

Emergence of Solid State Disks (SSD)

• Hard drive advantages?
    • Low cost per bit
• Hard drive disadvantages?
    • Very slow compared to main memory
    • Fragile (ever dropped one?)
    • Moving parts wear out
• Reductions in flash memory cost have created another possibility: solid state drives (SSDs)
    • SSDs appear like hard drives to the computer, but they store data in non-volatile flash memory circuits
    • Flash is quirky! Physical limitations pose engineering challenges…

Flash Memory

• Typical flash chips are built from dense arrays of NAND gates
• Different from hard drives – we can’t read/write a single bit (or byte)
    • Reading or writing? Data must be read from an entire flash page (2kB-8kB)
        • Reading is much faster than writing a page
        • It takes some time before the cell charge reaches a stable state
    • Erasing? An entire erasure block (32-128 pages) must be erased (set to all 1’s) first before individual bits can be written (set to 0)
        • Erasing takes two orders of magnitude more time than reading
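The erase-then-write rule can be modelled with bitwise AND: programming can only turn 1 bits into 0 bits, and only an erase of the whole block brings them back to 1. The class below is a toy model under that assumption; its name, sizes, and methods are hypothetical.

```python
class EraseBlock:
    """Toy model of one flash erasure block."""

    def __init__(self, pages=32, page_bytes=2048):
        self.page_bytes = page_bytes
        # A freshly erased block holds all 1 bits.
        self.pages = [b"\xff" * page_bytes for _ in range(pages)]

    def erase(self):
        """Erase the entire block: every bit in every page becomes 1."""
        self.pages = [b"\xff" * self.page_bytes for _ in self.pages]

    def program(self, page_index, data):
        """Write one whole page; bits can only change from 1 to 0."""
        if len(data) != self.page_bytes:
            raise ValueError("flash is programmed a whole page at a time")
        old = self.pages[page_index]
        # AND models the physics: a 0 bit stays 0 until the next erase.
        self.pages[page_index] = bytes(o & d for o, d in zip(old, data))


block = EraseBlock(pages=4, page_bytes=2)
block.program(0, b"\x0f\xf0")
block.program(0, b"\xf0\xff")   # overwrite without erasing first
print(block.pages[0])           # b'\x00\xf0', not the data just written
```

The second write does not store what was requested, which is why a page has to be erased before it is written again.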

Flash-based Solid State Drives (SSDs)

Advantages
• Same block-addressable I/O interface as hard drives
• No mechanical latency
    • Access latency is independent of the access pattern
    • Compare this to hard drives
• Energy efficient (no disk to spin)
• Resistant to extreme shock, vibration, temperature, altitude
• Near-instant start-up time

Challenges
• Limited endurance and the need for wear leveling
• Very slow to erase blocks (needed before reprogramming)
    • Erase-before-write
• Read/write asymmetry
    • Reads are faster than writes

Flash Translation Layer

• Flash Translation Layer (FTL)
    • Necessary for flash reliability and performance
    • “Virtual” addresses seen by the OS and computer
    • “Physical” addresses used by the flash memory
• Perform writes out-of-place
    • Amortize block erasures over many write operations
• Wear-leveling
    • Writing the same “virtual” address repeatedly won’t write to the same physical flash location repeatedly!

[Figure: The Flash Translation Layer maps “virtual” addresses (logical pages, device level) to “physical” addresses (flash pages, flash blocks, and spare capacity, flash chip level)]
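A toy version of the mapping shows how out-of-place writes work. This sketch assumes page-level mapping and hands out free pages in order; garbage collection and block erasure are left out, and the class name `ToyFTL` and its fields are hypothetical rather than taken from any real drive firmware.

```python
class ToyFTL:
    """Toy Flash Translation Layer with page-level mapping."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                              # logical -> physical page
        self.flash = [None] * num_physical_pages       # simulated flash contents
        self.free = list(range(num_physical_pages))    # erased, ready to program
        self.stale = set()                             # old copies awaiting erasure

    def write(self, logical_page, data):
        """Write out-of-place: always program a fresh physical page."""
        if not self.free:
            raise RuntimeError("no free pages; a real FTL would garbage-collect")
        physical = self.free.pop(0)
        self.flash[physical] = data
        old = self.mapping.get(logical_page)
        if old is not None:
            self.stale.add(old)                        # old copy is now invalid
        self.mapping[logical_page] = physical
        return physical

    def read(self, logical_page):
        """Read through the mapping to the current physical page."""
        return self.flash[self.mapping[logical_page]]


ftl = ToyFTL(num_physical_pages=4)
first = ftl.write(0, "version 1")
second = ftl.write(0, "version 2")   # same "virtual" address, new location
print(first, second, ftl.read(0))    # 0 1 version 2
```

Writing logical page 0 twice lands on two different physical pages, which is the behavior the wear-leveling bullet describes.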
