Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two...

34
Case Study: Amazon AWS CSE 40822 – Cloud Compu0ng Prof. Douglas Thain University of Notre Dame

Transcript of Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two...

Page 1: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Case Study: Amazon AWS CSE40822–CloudCompu0ng

Prof.DouglasThainUniversityofNotreDame

Page 2: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Cau3on to the Reader: Herein are examples of prices consulted in spring 2016, to give a sense of the magnitude of costs. Do your own research before spending your own money!

Page 3: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Several Historical Trends •  SharedU0lityCompu0ng

•  1960s–MULTICS–ConceptofaSharedCompu0ngU0lity•  1970s–IBMMainframes–rentbytheCPU-hour.(Fast/slowswitch.)

•  DataCenterCo-loca0on•  1990s-2000s–Rentmachinesformonths/years,keepthemclosetothenetworkaccesspointandpayaflatrate.Avoidrunningyourownbuildingwithu0li0es!

•  PayasYouGo•  Early2000s-Submitjobstoaremoteserviceproviderwheretheyrunontherawhardware.SunCloud($1/CPU-hour,Solaris+SGE)IBMDeepCapacityCompu0ngonDemand(50cents/hour)

•  Virtualiza0on•  1960s–OS-VM,VM-360–Usedtosplitmainframesintologicalpar00ons.•  1998–VMWare–Firstprac0calimplementa0ononX86,butatsignificantperformancehit.

•  2003–Xenparavirtualiza0onprovidesmuchperf,butkernelmustassist.•  Late2000s–IntelandAMDaddhardwaresupportforvirtualiza0on.

Page 4: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Virtual-* Allows for the Scale of Abstrac3on to Increase Over Time • Runoneprocesswithincertainresourcelimits.

OpSyshasvirtualmemory,virtualCPU,andvirtualstorage(filesystem).• Runmul0pleprocesseswithincertainresourcelimits.

Resourcecontainers(Solaris),virtualservers(Linux),virtualimages(Docker)• Runanen0reopera0ngsystemwithincertainlimits.

Virtualmachinetechnology:VMWare,Xen,KVM,etc.• Runasetofvirtualmachinesconnectedviaaprivatenetwork.

Virtualnetworks(SDNs)provisionbandwidthbetweenvirtualmachines.• Runaprivatevirtualarchitectureforeverycustomer.

Automatedtoolsreplicatevirtualinfrastructureasneeded.

Page 5: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Amazon AWS

•  GrewoutofAmazon’sneedtorapidlyprovisionandconfiguremachinesofstandardconfigura0onsforitsownbusiness.

•  Early2000s–Bothprivateandshareddatacentersbeganusingvirtualiza0ontoperform“serverconsolida0on”

•  2003–InternalmemobyChrisPinkhamdescribingan“infrastructureservicefortheworld.”

•  2006–S3firstdeployedinthespring,EC2inthefall•  2008–Elas0cBlockStoreavailable.•  2009–Rela0onalDatabaseService•  2012–DynamoDB•  Doesitturnaprofit?

Page 6: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 7: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Terminology

•  Instance=Onerunningvirtualmachine.•  InstanceType=hardwareconfigura0on:cores,memory,disk.•  InstanceStoreVolume=Temporarydiskassociatedwithinstance.•  Image(AMI)=Storedbitswhichcanbeturnedintoinstances.• KeyPair=Creden0alsusedtoaccessVMfromcommandline.• Region=Geographicloca0on,price,laws,networklocality.• AvailabilityZone=Subdivisionofregiontheisfault-independent.

Page 8: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 9: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

EC2 Pricing Model • FreeUsageTier• On-DemandInstances

•  Startandstopinstanceswheneveryoulike,costsareroundeduptothenearesthour.(Worstprice)

• ReservedInstances• Payupfrontforone/threeyearsinadvance.(Bestprice)• Unusedinstancescanbesoldonasecondarymarket.

• SpotInstances•  Specifythepriceyouarewillingtopay,andinstancesgetstartedandstoppedwithoutanywarningasthemarkedchanges.(KindoflikeCondor!)

hnp://aws.amazon.com/ec2/pricing/

Page 10: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Free Usage Tier

•  750hoursofEC2runningLinux,RHEL,orSLESt2.microinstanceusage

•  750hoursofEC2runningMicrosopWindowsServert2.microinstanceusage

•  750hoursofElas0cLoadBalancingplus15GBdataprocessing•  30GBofAmazonElas0cBlockStorageinanycombina0onofGeneralPurpose(SSD)orMagne0c,plus2millionI/Os(withMagne0c)and1GBofsnapshotstorage

•  15GBofbandwidthoutaggregatedacrossallAWSservices•  1GBofRegionalDataTransfer

Page 11: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 12: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Reserved Instance Example

Page 13: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 14: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Surprisingly, you can’t scale up that large.

Page 15: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Simple Storage Service (S3)

•  Abucketisacontainerforobjectsanddescribesloca0on,logging,accoun0ng,andaccesscontrol.Abucketcanholdanynumberofobjects,whicharefilesofupto5TB.Abuckethasanamethatmustbegloballyunique.

•  Fundamentalopera0onscorrespondingtoHTTPac0ons:•  hnp://bucket.s3.amazonaws.com/object•  POSTanewobjectorupdateanexis0ngobject.•  GETanexis0ngobjectfromabucket.•  DELETEanobjectfromthebucket•  LISTkeyspresentinabucket,withafilter.

•  Abuckethasaflatdirectorystructure(despitetheappearancegivenbytheinterac0vewebinterface.)

Page 16: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Easily Integrated into Web Applica3ons <form action="http://examplebucket.s3.amazonaws.com/" method="post" enctype="multipart/form-data"> <input type="input" name="key" value="user/user1/" /> <input type="hidden" name="acl" value="public-read" /> <input type="hidden" name="success_action_redirect" value="http://examplebucket.s3.amazonaws.com/successful_upload.html" /> . . . <input type="text" name="X-Amz-Credential” value="AKIAIOSFODNN7EXAMPLE/20130806/us-east-1/s3/aws4_request" /> . . . <input type="submit" name="submit" value="Upload to Amazon S3" /> </form>

hnp://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html

Page 17: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Bucket Proper3es

• Versioning–Ifenabled,POST/DELETEresultinthecrea0onofnewversionswithoutdestroyingtheold.

•  Lifecycle–Deleteorarchiveobjectsinabucketacertain0meapercrea0onorlastaccessornumberofversions.

• AccessPolicy–Controlwhenandwhereobjectscanbeaccessed.• AccessControl–Controlwhomayaccessobjectsinthisbucket.•  Logging–Keeptrackofhowobjectsareaccessed.• No0fica0on–Beno0fiedwhenfailuresoccur.

Page 18: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

S3 Weak Consistency Model DirectquotefromtheAmazondeveloperAPI:“Updatestoasinglekeyareatomic….”“AmazonS3achieveshighavailabilitybyreplica0ngdataacrossmul0pleserverswithinAmazon'sdatacenters.IfaPUTrequestissuccessful,yourdataissafelystored.However,informa0onaboutthechangesmustreplicateacrossAmazonS3,whichcantakesome0me,andsoyoumightobservethefollowingbehaviors:

•  AprocesswritesanewobjecttoAmazonS3andimmediatelyanemptstoreadit.Un0lthechangeisfullypropagated,AmazonS3mightreport"keydoesnotexist."

•  AprocesswritesanewobjecttoAmazonS3andimmediatelylistskeyswithinitsbucket.Un0lthechangeisfullypropagated,theobjectmightnotappearinthelist.

•  Aprocessreplacesanexis0ngobjectandimmediatelyanemptstoreadit.Un0lthechangeisfullypropagated,AmazonS3mightreturnthepriordata.

•  Aprocessdeletesanexis0ngobjectandimmediatelyanemptstoreadit.Un0lthedele0onisfullypropagated,AmazonS3mightreturnthedeleteddata.”

Page 19: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 20: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 21: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Always read the fine print….

Page 22: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 23: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Elas3c Block Store

• AnEBSvolumeisavirtualdiskofafixedsizewithablockread/writeinterface.ItcanbemountedasafilesystemonarunningEC2instancewhereitcanbeupdatedincrementally.Unlikeaninstancestore,anEBSvolumeispersistent.

•  (ComparetoanS3object,whichisessen0allyafilethatmustbeaccessedinitsen0rety.)

•  Fundamentalopera0ons:•  CREATEanewvolume(1GB-1TB)•  COPYavolumefromanexis0ngEBSvolumeorS3object.•  MOUNTononeinstanceata0me.•  SNAPSHOTcurrentstatetoanS3object.

Page 24: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 25: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability
Page 26: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

EBS is approx. 3x more expensive by volume and 10x more expensive by IOPS than S3.

Page 27: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Use Glacier for Cold Data • GlacierisstructuredlikeS3:avaultisacontainerforanarbitrarynumberofarchives.Policies,accoun0ng,andaccesscontrolareassociatedwithvaults,whileanarchiveisasingleobject.

• However:•  Allopera0onsareasynchronousandno0fiedviaSNS.•  Vaultlis0ngsareupdatedonceperday.•  Archivedownloadsmaytakeuptofourhours.•  Only5%oftotaldatacanbeaccessedinagivenmonth.

• Pricing:•  Storage:$0.01perGB-month•  Opera0ons:$0.05per1000requests•  DataTransfer:LikeS3,freewithinAWS.

•  S3Policiescanbesetuptoautoma0callymovedataintoGlacier.

Page 28: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Durability •  AmazonclaimsaboutS3:

•  AmazonS3isdesignedtosustaintheconcurrentlossofdataintwofacili0es,e.g.3+copiesacrossmul0pleavailabledomains.

•  99.999999999%durabilityofobjectsoveragivenyear.•  AmazonclaimsaboutEBS:

•  AmazonEBSvolumedataisreplicatedacrossmul0pleserversinanAvailabilityZonetopreventthelossofdatafromthefailureofanysinglecomponent.

•  Volumes<20GBmodifieddatasincelastsnapshothaveanannualfailurerateof0.1%-0.5%,resul0ngincompletelossofthevolume.

•  CommodityharddiskshaveanAFRofabout4%.•  AmazonclaimsaboutGlacieristhesameasS3:

•  AmazonS3isdesignedtosustaintheconcurrentlossofdataintwofacili0es,e.g.3+copiesacrossmul0pleavailabledomainsPLUSperiodicinternalintegritychecks.

•  99.999999999%durabilityofobjectsoveragivenyear.

•  Bewareofoversimplifiedargumentsaboutlow-probabilityevents!

Page 29: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Architecture Center •  Ideasforconstruc0nglargescaleinfrastructuresusingAWS:hnp://aws.amazon.com/architecture/

Page 30: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Command Line Setup • Gotoyourprofilemenu(yourname)intheupperrighthandcorner,select“SecurityCreden0als”and“Con0nuetoSecurityCreden0als”

•  Select“AccessKeys”•  Select“NewAccessKey”andsavethegeneratedkeyssomewhere.•  Edit~/.aws/configandsetituplikethis:

• Nowtestit:awsec2-describe-instances

Notethesyntaxhereisdifferentfromhowitwasgiveninthewebconsole!AWSAccessKey=XXXXXXAWSSecretAccessKey=YYYYYYYYY

[default]output=jsonregion=us-west-2aws_access_key=XXXXXXaws_secret_access_key=YYYYYYYYYYYY

Page 31: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

S3 Command Line Examples

awss3 mb s3://bucket... cp localfiles3://bucket/key mv s3://bucket/keys3://bucket/newname

ls s3://bucket rm s3://bucket/key rb s3://bucket

aws s3 helpaws s3 lshelp

Page 32: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

EC2 Command Line Examples

awsec2 describe-instances run-instances--image-idami-xxxxx--count1

--instance-typet1.micro--key-namekeyfile stop-instances--instance-idi-xxxxxx

aws ec2 helpaws ec2 start-instanceshelp

Page 33: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Warmup: Get Started with Amazon

•  SkimthroughtheAWSdocumenta0on.•  SignupforAWSathnp://aws.amazon.com•  (SkiptheIAMmanagementfornow)• Applytheservicecredityoureceivedbyemail.• CreateanddownloadaKey-Pair,saveitinyourhomedirectory.• CreateaVMviatheAWSConsole• Connecttoyournewly-createdVMlikethis:

•  ssh-imy-aws-keypair.pemec2-user@ip-address-of-vm• CreateabucketinS3andupload/downloadsomefiles.

Page 34: Case Study: Amazon AWS · • Amazon S3 is designed to sustain the concurrent loss of data in two facili0es, e.g. 3+ copies across mul0ple available domains. • 99.999999999% durability

Demo Time h_p://aws.amazon.com