CS 61C: Great Ideas in Computer Architecture
Lecture 25: Dependability and RAID
Krste Asanović & Randy H. Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17
11/27/17 Fall 2017 – Lecture #25

Storage Attachment Evolution
[Figure: two attachment styles.
Direct Attachment: Host, OS, Disk Interface (DI), Allocation Table; storage addressed by Disk, Cylinder, Track, Sector.
Network Server Attachment: Hosts on a LAN, each with a Network Interface (NI), reach a File Server (with its own OS); storage addressed by File Name, Offset, Length.]

Storage Attachment Evolution
[Figure: two attachment styles.
Channel attachment: Mainframes and a Workstation (each with an OS) reach a Disk Storage Subsystem through a Channel Interface; storage addressed by LUN, Offset, Length, with a LUN-to-PHY mapping.
Network-attached Storage (NAS): Hosts on a LAN, each with a Network Interface (NI), reach a Network File Server; storage addressed by File Name, Offset, Length, which maps to Disk, Cylinder, Track, Sector.]

Storage Attachment Evolution: Storage Area Networks (SAN)
[Figure: network attached and channel attached storage combined.
Network Attached: Hosts on a LAN, each with a Network Interface (NI), reach File Servers using File Name, Offset, Length.
Channel Attached: File Servers and a Mainframe reach Disk, Tape, and Optical Disk Storage Subsystems over a SAN through Channel Interfaces (CI) using LUN, Offset, Length, which maps to PHY Device, Cyl, Trk, Sector.
Gateways connect the LAN and SAN across a WAN to a Remote SAN (Mainframe, FS, DSS).]

Storage Class Memory aka Rack-Scale Memory
• Cheaper than DRAM
• More expensive than disk
• Non-volatile and faster than disk

Remote Direct Memory Access
[Figure: Conventional Networking compared with Cut-through Memory Access Over Network]

Outline
• Dependability via Redundancy
• Error Correction/Detection
• RAID
• And, in Conclusion …

Six Great Ideas in Computer Architecture
1. Design for Moore’s Law (Multicore, Parallelism, OpenMP, Project #3)
2. Abstraction to Simplify Design (Everything a number, Machine/Assembler Language, C, Project #1; Logic Gates, Datapaths, Project #2)
3. Make the Common Case Fast (RISC Architecture, Instruction Pipelining, Project #2)
4. Memory Hierarchy (Locality, Consistency, False Sharing, Project #3)
5. Performance via Parallelism/Pipelining/Prediction (the five kinds of parallelism, Project #3, #4)
6. Dependability via Redundancy (ECC, RAID)

Great Idea #6: Dependability via Redundancy
• Redundancy so that a failing piece doesn’t make the whole system fail
[Figure: three units each compute 1+1. Two answer 2 and one answers 1 (FAIL!); 2 of 3 agree, so the result is 1+1=2.]
• Increasing transistor density reduces the cost of redundancy

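
The 2-of-3 agreement in the figure can be sketched as a majority vote over replica outputs. This is a minimal illustration rather than anything from the lecture; the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(results):
    """Return the value reported by a strict majority of replicas, or None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

# Three replicas compute 1+1; one of them is faulty and answers 1.
print(majority_vote([2, 2, 1]))  # 2
```

With three replicas, any single faulty unit is outvoted; if no value has a strict majority the sketch returns None instead of guessing.
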
Great Idea #6: Dependability via Redundancy
• Applies to everything from datacenters to memory
– Redundant datacenters so that can lose one datacenter but Internet service stays online
– Redundant routes so can lose nodes but Internet doesn’t fail
– Redundant disks so that can lose one disk but not lose data (Redundant Arrays of Independent Disks/RAID)
– Redundant memory bits so that can lose 1 bit but no data (Error Correcting Code/ECC Memory)

Dependability
• Fault: failure of a component
– May or may not lead to system failure
[Figure: two-state diagram. Service accomplishment (service delivered as specified) moves to Service interruption (deviation from specified service) on Failure, and back on Restoration.]

Dependability via Redundancy: Time vs. Space
• Spatial Redundancy – replicated data or check information or hardware to handle hard and soft (transient) failures
• Temporal Redundancy – redundancy in time (retry) to handle soft (transient) failures

Dependability Measures
• Reliability: Mean Time To Failure (MTTF)
• Service interruption: Mean Time To Repair (MTTR)
• Mean time between failures (MTBF)
– MTBF = MTTF + MTTR
• Availability = MTTF / (MTTF + MTTR)
• Improving Availability
– Increase MTTF: More reliable hardware/software + Fault Tolerance
– Reduce MTTR: improved tools and processes for diagnosis and repair

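
The two formulas above translate directly into code. The sketch below is only an illustration; the function names and the sample figures (999 hours between failures, 1 hour to repair) are assumptions, not values from the lecture.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours

def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical system: fails on average after 999 hours, takes 1 hour to repair.
print(mtbf(999.0, 1.0))
print(f"{availability(999.0, 1.0):.3%}")
```

Halving MTTR in this sketch improves availability about as much as doubling MTTF, which is why both appear under "Improving Availability".
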
Understanding MTTF
[Figure: Probability of Failure (vertical axis, up to 1) plotted against Time]

Availability Measures
• Availability = MTTF / (MTTF + MTTR) as %
– MTTF, MTBF usually measured in hours
• Since we hope the system is rarely down, shorthand is “number of 9s of availability per year”
• 1 nine: 90% => 36.5 days of repair/year
• 2 nines: 99% => 3.6 days of repair/year
• 3 nines: 99.9% => 526 minutes of repair/year
• 4 nines: 99.99% => 53 minutes of repair/year
• 5 nines: 99.999% => 5 minutes of repair/year

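
The repair-time figures for each number of nines follow from multiplying the unavailability by the length of a year. A small sketch, assuming a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year

def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return unavailability * MINUTES_PER_YEAR

for n in range(1, 6):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines: {minutes:9.1f} minutes/year = {minutes / (24 * 60):6.2f} days/year")
```
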
Reliability Measures
• Another is average number of failures per year: Annualized Failure Rate (AFR)
– E.g., 1000 disks with 100,000 hour MTTF
– 365 days * 24 hours = 8760 hours
– (1000 disks * 8760 hrs/year) / 100,000 = 87.6 failed disks per year on average
– 87.6/1000 = 8.76% annual failure rate
• Google’s 2007 study* found that actual AFRs for individual drives ranged from 1.7% for first year drives to over 8.6% for three-year old drives
* research.google.com/archive/disk_failures.pdf

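
The AFR arithmetic on this slide can be checked with a short sketch. The function names are hypothetical, and the calculation assumes, as the slide does, a constant failure rate over the year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def expected_failures_per_year(num_disks, mttf_hours):
    """Average number of failed disks per year in a population of disks."""
    return num_disks * HOURS_PER_YEAR / mttf_hours

def annualized_failure_rate(mttf_hours):
    """Fraction of disks expected to fail in one year."""
    return HOURS_PER_YEAR / mttf_hours

print(expected_failures_per_year(1000, 100_000))   # about 87.6 disks
print(f"{annualized_failure_rate(100_000):.2%}")   # 8.76%
```
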
Breaking News, 1Q17, BackBlaze
https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/

Dependability Design Principle
• Design Principle: No single points of failure
– “Chain is only as strong as its weakest link”
• Dependability Corollary of Amdahl’s Law
– Doesn’t matter how dependable you make one portion of system
– Dependability limited by part you do not improve

Outline
• Dependability via Redundancy
• Error Correction/Detection
• RAID
• And, in Conclusion …

Error Detection/Correction Codes
• Memory systems generate errors (accidentally flipped bits)
– DRAMs store very little charge per bit
– “Soft” errors occur occasionally when cells are struck by alpha particles or other environmental upsets
– “Hard” errors can occur when chips permanently fail
– Problem gets worse as memories get denser and larger
• Memories protected against failures with EDC/ECC
• Extra bits are added to each data-word
– Used to detect and/or correct faults in the memory system
– Each data word value mapped to unique code word
– A fault changes valid code word to invalid one, which can be detected

Block Code Principles
• Hamming distance = difference in # of bits
• p = 011011, q = 001111, Ham. distance (p,q) = 2
• p = 011011, q = 110001, distance (p,q) = ?
• Can think of extra bits as creating a code with the data
• What if minimum distance between members of code is 2 and get a 1-bit error?
[Photo: Richard Hamming, 1915-98, Turing Award Winner]

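
Hamming distance, and the minimum distance of a whole code, can be computed as follows. This is a minimal sketch over bit strings; the function names and the 3-bit even-parity code used as the second example are assumptions for illustration.

```python
from itertools import combinations

def hamming_distance(p, q):
    """Number of bit positions in which equal-length bit strings p and q differ."""
    if len(p) != len(q):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(p, q))

def minimum_distance(code):
    """Smallest pairwise Hamming distance among the code words."""
    return min(hamming_distance(p, q) for p, q in combinations(code, 2))

print(hamming_distance("011011", "001111"))            # 2, the slide's first example
print(minimum_distance(["000", "011", "101", "110"]))  # 2, a 3-bit even-parity code
```
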
Parity: Simple Error-Detection Coding
• Each data value, before it is written to memory, is “tagged” with an extra bit to force the stored word to have even parity:
  p = b7 XOR b6 XOR b5 XOR b4 XOR b3 XOR b2 XOR b1 XOR b0
• Each word, as it is read from memory, is “checked” by finding its parity (including the parity bit):
  c = b7 XOR b6 XOR b5 XOR b4 XOR b3 XOR b2 XOR b1 XOR b0 XOR p
• Minimum Hamming distance of parity code is 2
• A non-zero parity check indicates an error occurred:
– 2 errors (on different bits) are not detected
– Nor any even number of errors, just odd numbers of errors are detected

Parity Example
• Data 01010101
• 4 ones, even parity now
• Write to memory: 01010101 0 to keep parity even
• Data 01010111
• 5 ones, odd parity now
• Write to memory: 01010111 1 to make parity even
• Read from memory 01010101 0
• 4 ones => even parity, so no error
• Read from memory 11010101 0
• 5 ones => odd parity, so error
• What if error in parity bit?

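
The parity example can be reproduced with a few lines of code. This is a sketch over bit strings with the parity bit appended at the end, as in the example; the function names are hypothetical.

```python
def add_even_parity(data_bits):
    """Append one bit so that the stored word has an even number of ones."""
    parity = data_bits.count("1") % 2
    return data_bits + str(parity)

def parity_check(stored_word):
    """Return 0 if the word (data plus parity bit) has even parity, else 1."""
    return stored_word.count("1") % 2

print(add_even_parity("01010101"))  # 010101010
print(add_even_parity("01010111"))  # 010101111
print(parity_check("010101010"))    # 0: no error detected
print(parity_check("110101010"))    # 1: error detected
```

Because the check counts the parity bit along with the data bits, a flip of the parity bit itself is also reported as an error, although the check cannot tell which bit flipped.
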
Suppose Want to Correct One Error?
• Hamming came up with simple to understand mapping to allow Error Correction at minimum distance of three
– Single error correction, or (if used only for detection) double error detection
• Called “Hamming ECC”
– Worked weekends on relay computer with unreliable card reader, frustrated with manual restarting
– Got interested in error correction; published 1950
– R. W. Hamming, “Error Detecting and Correcting Codes,” The Bell System Technical Journal, Vol. XXIX, No. 2 (April 1950), pp. 147-160.

Detecting/Correcting Code Concept
[Figure: space of possible bit patterns (2^N) containing a sparse population of code words (2^M << 2^N) with identifiable signature; an error changes a bit pattern to a non-code word]
• Detection: bit pattern fails codeword check
• Correction: map to nearest valid code word

Hamming Distance: Eight Code Words
[Figure]

Hamming Distance 2: Detection
Detect Single Bit Errors
• No 1 bit error goes to another valid code word
• ½ code words are valid
[Figure label: Invalid Code Words]

Hamming Distance 3: Correction
Correct Single Bit Errors, Detect Double Bit Errors
• No 2 bit error goes to another valid code word; 1 bit error near
• 1/4 code words are valid
[Figure labels: Nearest 000 (one 1); Nearest 111 (one 0)]

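
Correction by mapping to the nearest valid code word can be sketched for the two-word code {000, 111} shown in the figure. The function name `nearest_codeword` is hypothetical, and the brute-force search is only practical for tiny codes like this one.

```python
def nearest_codeword(word, codewords=("000", "111")):
    """Return the valid code word at the smallest Hamming distance from `word`."""
    def distance(c):
        return sum(a != b for a, b in zip(word, c))
    return min(codewords, key=distance)

print(nearest_codeword("010"))  # 000: a single 1 is closest to 000
print(nearest_codeword("110"))  # 111: a single 0 is closest to 111
```

Note that a double-bit error on 000 (for example 110) is decoded to 111, which is why a distance-3 code cannot both correct single errors and detect double errors at the same time.
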
Administrivia (1/2)
• Final exam: the last Thursday examination slot!
– 14 December, 7-10 PM, Room TBD
– Contact us about conflicts
– Review Lectures and Book with eye on the important concepts of the course, e.g., the Great Ideas in Computer Architecture and the Different Kinds of Parallelism
• Review Session Fri Dec 8, 5-8 PM @ TBA
• Electronic Course Evaluations this week! See https://course-evaluations.berkeley.edu

Administrivia (2/2)
• Project 3 Contest results to be announced during Thursday’s lecture
• Lab 11 (Spark) is due any day this week
• Lab 13 (VM) is due any day next week
• VM Guerrilla Session tonight!
– 7-9pm @ Cory 293 (unless bigger room found)
– Last Guerrilla Session is next Tuesday, same time and place
• Will go over the most difficult topics this semester
• Project 4 Party tomorrow night 7-9pm @ Cory 293

Graphic of Hamming Code
• http://en.wikipedia.org/wiki/Hamming_code

Hamming ECC
Set parity bits to create even parity for each group
• A byte of data: 10011010
• Create the coded word, leaving spaces for the parity bits:
• _ _ 1 _ 0 0 1 _ 1 0 1 0
  1 2 3 4 5 6 7 8 9 a b c – bit position
• Calculate the parity bits

Hamming ECC
• Position 1 checks bits 1,3,5,7,9,11: ? _ 1 _ 0 0 1 _ 1 0 1 0. Set position 1 to a 0
• Position 2 checks bits 2,3,6,7,10,11: 0 ? 1 _ 0 0 1 _ 1 0 1 0. Set position 2 to a 1
• Position 4 checks bits 4,5,6,7,12: 0 1 1 ? 0 0 1 _ 1 0 1 0. Set position 4 to a 1
• Position 8 checks bits 8,9,10,11,12: 0 1 1 1 0 0 1 ? 1 0 1 0. Set position 8 to a 0

Hamming ECC
• Final code word: 011100101010
• Data word: 10011010

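
The encoding steps walked through on these slides can be sketched as follows: data bits go into the positions that are not powers of two, and each parity position p (1, 2, 4, 8) is set so that the group of positions whose index has bit p set has even parity. The function name `hamming_encode` is hypothetical, and this is an illustration rather than the lecture's own code.

```python
def hamming_encode(data_bits):
    """Encode a bit string using even-parity Hamming check bits at positions 1, 2, 4, 8, ..."""
    code = {}                      # 1-indexed position -> bit value
    remaining = [int(b) for b in data_bits]
    pos = 1
    while remaining:
        if (pos & (pos - 1)) != 0: # not a power of two, so this is a data position
            code[pos] = remaining.pop(0)
        pos += 1
    n = pos - 1                    # total code word length

    p = 1
    while p <= n:
        # Parity position p covers every position whose index has bit p set.
        covered = [code[i] for i in range(1, n + 1) if (i & p) and i != p]
        code[p] = sum(covered) % 2 # choose the bit that makes the group even
        p <<= 1

    return "".join(str(code[i]) for i in range(1, n + 1))

print(hamming_encode("10011010"))  # 011100101010
```
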
Hamming ECC Error Check
• Suppose receive 011100101110
  bit position: 1 2 3 4 5 6 7 8 9 a b c
  received:     0 1 1 1 0 0 1 0 1 1 1 0
• Parity group 1 (bits 1,3,5,7,9,11): 0 1 0 1 1 1 √
• Parity group 2 (bits 2,3,6,7,10,11): 1 1 0 1 1 1 X – Parity 2 in error
• Parity group 4 (bits 4,5,6,7,12): 1 0 0 1 0 √
• Parity group 8 (bits 8,9,10,11,12): 0 1 1 1 0 X – Parity 8 in error
• Implies position 8+2=10 is in error: 011100101110

Hamming ECC Error Correct
• Flip the incorrect bit … 011100101010

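Hamming ECC Error Correct
• Suppose receive 011100101010
• Parity group 1 (bits 1,3,5,7,9,11): 0 1 0 1 1 1 √
• Parity group 2 (bits 2,3,6,7,10,11): 1 1 0 1 0 1 √
• Parity group 4 (bits 4,5,6,7,12): 1 0 0 1 0 √
• Parity group 8 (bits 8,9,10,11,12): 0 1 0 1 0 √
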
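
The check-and-correct procedure can be sketched by summing the indices of the parity groups that fail; the sum (the syndrome) names the bit position in error. The function names are hypothetical, and the sketch assumes at most one bit was flipped.

```python
def hamming_syndrome(word):
    """Sum of the parity positions whose group has odd parity (0 means no error seen)."""
    n = len(word)
    syndrome = 0
    p = 1
    while p <= n:
        ones = sum(int(word[i - 1]) for i in range(1, n + 1) if i & p)
        if ones % 2:               # group p fails its even-parity check
            syndrome += p
        p <<= 1
    return syndrome

def hamming_correct(word):
    """Flip the bit named by the syndrome, assuming a single-bit error."""
    s = hamming_syndrome(word)
    if s == 0:
        return word
    if s > len(word):
        raise ValueError("syndrome points outside the word; more than one bit is in error")
    bits = list(word)
    bits[s - 1] = "1" if bits[s - 1] == "0" else "0"
    return "".join(bits)

print(hamming_syndrome("011100101110"))  # 10: groups 2 and 8 fail
print(hamming_correct("011100101110"))   # 011100101010
```
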
What if More Than 2-Bit Errors?
• Network transmissions, disks, distributed storage: common failure mode is bursts of bit errors, not just one or two bit errors
– Contiguous sequence of B bits in which first, last and any number of intermediate bits are in error
– Caused by impulse noise or by fading in wireless
– Effect is greater at higher data rates
• Solve with Cyclic Redundancy Check (CRC), interleaving or other more advanced codes

Peer Instruction Question
The following word is received, encoded with Hamming code: 0 1 1 0 0 0 1
What is the corrected data bit sequence?
A. 1111
B. 0001
C. 1101
D. 1011

Outline
• Dependability via Redundancy
• Error Correction/Detection
• RAID
• And, in Conclusion …

Evolution of the Disk Drive
[Photos: IBM RAMAC 305, 1956; IBM 3390K, 1986; Apple SCSI, 1986]

Arrays of Small Disks
Can smaller disks be used to close gap in performance between disks and CPUs?
[Figure: Conventional: 4 disk designs (14”, 10”, 5.25”, 3.5”) spanning Low End to High End. Disk Array: 1 disk design (3.5”).]

Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

            IBM 3390K     IBM 3.5" 0061   x70           x70 vs. 3390K
Capacity    20 GBytes     320 MBytes      23 GBytes
Volume      97 cu. ft.    0.1 cu. ft.     11 cu. ft.    9X
Power       3 KW          11 W            1 KW          3X
Data Rate   15 MB/s       1.5 MB/s        120 MB/s      8X
I/O Rate    600 I/Os/s    55 I/Os/s       3900 I/Os/s   6X
MTTF        250 KHrs      50 KHrs         ??? Hrs
Cost        $250K         $2K             $150K

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

RAID: Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
– Availability: service still provided to user, even if some components failed
• Disks will still fail
• Contents reconstructed from data redundantly stored in the array
– Capacity penalty to store redundant info
– Bandwidth penalty to update redundant info

Redundant Arrays of Inexpensive Disks
RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its “mirror”
– Very high availability can be achieved
• Writes limited by single-disk speed
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
[Figure label: recovery group]

Redundant Array of Inexpensive Disks
RAID 3: Parity Disk
[Figure: a logical record (10010011 11001101 10010011 ...) is striped as physical records (10100011, 11001101, 10100011, 11001101) across the data disks, plus a parity disk P]
• P contains sum of other disks per stripe mod 2 (“parity”)
• If disk fails, subtract P from sum of other disks to find missing information

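
Since addition and subtraction mod 2 are both XOR, the parity disk and the recovery of a failed disk can be sketched with XOR alone. The function names and the four sample byte values are assumptions chosen for illustration.

```python
from functools import reduce

def xor_parity(blocks):
    """Bitwise sum mod 2 (XOR) of all blocks in a stripe."""
    return reduce(lambda a, b: a ^ b, blocks, 0)

def reconstruct(surviving_blocks, parity):
    """Recover the one missing block from the surviving blocks and the parity block."""
    return xor_parity(surviving_blocks) ^ parity

# Hypothetical stripe of four data bytes, one per data disk.
stripe = [0b10010011, 0b11001101, 0b10100011, 0b01011100]
p = xor_parity(stripe)

# Disk 2 fails; rebuild its contents from the other three disks and P.
survivors = stripe[:2] + stripe[3:]
print(f"{reconstruct(survivors, p):08b}")  # 10100011
```

This works because the array knows which disk failed; parity alone could not locate an error, but it can fill in a known-missing block.
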
Redundant Arrays of Inexpensive Disks
RAID 4: High I/O Rate Parity
[Figure: insides of 5 disks; disk columns run left to right, each row is a stripe, and logical disk address increases downward]
D0   D1   D2   D3   P
D4   D5   D6   D7   P
D8   D9   D10  D11  P
D12  D13  D14  D15  P
D16  D17  D18  D19  P
D20  D21  D22  D23  P
...
Example: small read D0 & D5, large write D12-D15

Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (write to one disk):
– Option 1: read other data disks, create new sum and write to Parity Disk
– Option 2: since P has old sum, compare old data to new data, add the difference to P
• Small writes are limited by Parity Disk: Write to D0, D5 both also write to P disk
D0   D1   D2   D3   P
D4   D5   D6   D7   P

RAID 5: High I/O Rate Interleaved Parity
Independent writes possible because of interleaved parity
[Figure: disk columns run left to right; logical disk addresses increase downward]
D0   D1   D2   D3   P
D4   D5   D6   P    D7
D8   D9   P    D10  D11
D12  P    D13  D14  D15
P    D16  D17  D18  D19
D20  D21  D22  D23  P
...
Example: write to D0, D5 uses disks 0, 1, 3, 4

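
The interleaved layout in the figure can be expressed as a mapping from logical block number to a data disk and a parity disk. The rotation rule below is inferred from the figure for a 5-disk array and is an assumption; real RAID 5 implementations use several different layouts, and the function name `raid5_locate` is hypothetical.

```python
def raid5_locate(block, num_disks=5):
    """Map logical data block Dn to (stripe, data disk, parity disk).

    Assumes the layout in the figure: parity starts on the last disk and
    moves one disk to the left on each successive stripe.
    """
    data_per_stripe = num_disks - 1
    stripe, index = divmod(block, data_per_stripe)
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data blocks fill the non-parity disks left to right, skipping the parity disk.
    data_disk = index if index < parity_disk else index + 1
    return stripe, data_disk, parity_disk

for block in (0, 5):
    stripe, data_disk, parity_disk = raid5_locate(block)
    print(f"D{block}: stripe {stripe}, data on disk {data_disk}, parity on disk {parity_disk}")
```

In this sketch a write to D0 touches disks 0 and 4 while a write to D5 touches disks 1 and 3, so the two writes share no disk and can proceed independently.
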
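Problems of Disk Arrays: Small Writes
RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes
[Figure: stripe D0 D1 D2 D3 P is updated with new data D0' to become D0' D1 D2 D3 P']
(1. Read) old data D0
(2. Read) old parity P
XOR new data D0' with old data, then XOR the result with old parity to form P'
(3. Write) new data D0'
(4. Write) new parity P'
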
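
The small-write algorithm updates parity from the old data, the old parity, and the new data, without reading the other data disks. A minimal sketch, with a hypothetical function name and made-up byte values:

```python
def small_write(old_data, old_parity, new_data):
    """Return (new_data, new_parity) for a single-block RAID-5 write.

    old_data and old_parity come from the 2 physical reads; the returned
    values are what the 2 physical writes store.
    """
    new_parity = old_parity ^ old_data ^ new_data
    return new_data, new_parity

# Hypothetical stripe D0..D3 with its parity block.
d0, d1, d2, d3 = 0b10010011, 0b11001101, 0b10100011, 0b01011100
parity = d0 ^ d1 ^ d2 ^ d3

new_d0, new_parity = small_write(d0, parity, 0b11110000)
# The shortcut matches recomputing parity over the whole updated stripe.
print(new_parity == (new_d0 ^ d1 ^ d2 ^ d3))  # True
```
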
Tech Report Read ‘Round the World (December 1987)

RAID-I
• RAID-I (1989)
– Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software

RAID II
• 1990-1993
• Early Network Attached Storage (NAS) System running a Log Structured File System (LFS)
• Impact:
– $25 Billion/year in 2002
– Over $150 Billion in RAID devices sold 1990-2002
– 200+ RAID companies (at the peak)
– Software RAID a standard component of modern OSs

Outline
• Dependability via Redundancy
• Error Correction/Detection
• RAID
• And, in Conclusion …

And, in Conclusion, …
• Great Idea: Redundancy to Get Dependability
– Spatial (extra hardware) and Temporal (retry if error)
• Reliability: MTTF & Annualized Failure Rate (AFR)
• Availability: % uptime (MTTF / (MTTF + MTTR))
• Memory
– Hamming distance 2: Parity for Single Error Detect
– Hamming distance 3: Single Error Correction Code + encode bit position of error
• Treat disks like memory, except you know when a disk has failed: erasure makes parity an Error Correcting Code
• RAID-2, -3, -4, -5: Interleaved data and parity