Case Study: Amazon AWS - University of Notre Damedthain/courses/cse40822/fall... · managing...
Transcript of Case Study: Amazon AWS - University of Notre Damedthain/courses/cse40822/fall... · managing...
Case Study: Amazon AWS CSE40822–CloudComputing
Prof.DouglasThainUniversityofNotreDame
Caution to the Reader: Herein are examples of prices consulted in fall 2018, to give a sense of the magnitude of costs. Do your own research before spending your own money!
Several Historical Trends • SharedUtilityComputing
• 1960s–MULTICS–ConceptofaSharedComputingUtility• 1970s–IBMMainframes–rentbytheCPU-hour.(Fast/slowswitch.)
• DataCenterCo-location• 1990s-2000s–Rentmachinesformonths/years,keepthemclosetothenetworkaccesspointandpayaflatrate.Avoidrunningyourownbuildingwithutilities!
• PayasYouGo• Early2000s-Submitjobstoaremoteserviceproviderwheretheyrunontherawhardware.SunCloud($1/CPU-hour,Solaris+SGE)IBMDeepCapacityComputingonDemand(50cents/hour)
• Virtualization• 1960s–OS-VM,VM-360–Usedtosplitmainframesintologicalpartitions.• 1998–VMWare–FirstpracticalimplementationonX86,butatsignificantperformancehit.
• 2003–Xenparavirtualizationprovidesmuchperf,butkernelmustassist.• Late2000s–IntelandAMDaddhardwaresupportforvirtualization.
Virtual-* Allows for the Scale of Abstraction to Increase Over Time • Runoneprocesswithincertainresourcelimits.
OpSyshasvirtualmemory,virtualCPU,andvirtualstorage(filesystem).• Runmultipleprocesseswithincertainresourcelimits.
Resourcecontainers(Solaris),virtualservers(Linux),virtualimages(Docker)• Runanentireoperatingsystemwithincertainlimits.
Virtualmachinetechnology:VMWare,Xen,KVM,etc.• Runasetofvirtualmachinesconnectedviaaprivatenetwork.
Virtualnetworks(SDNs)provisionbandwidthbetweenvirtualmachines.• Runaprivatevirtualarchitectureforeverycustomer.
Automatedtoolsreplicatevirtualinfrastructureasneeded.
Amazon AWS
• GrewoutofAmazon’sneedtorapidlyprovisionandconfiguremachinesofstandardconfigurationsforitsownbusiness.
• Early2000s–Bothprivateandshareddatacentersbeganusingvirtualizationtoperform“serverconsolidation”
• 2003–InternalmemobyChrisPinkhamdescribingan“infrastructureservicefortheworld.”
• 2006–S3firstdeployedinthespring,EC2inthefall• 2008–ElasticBlockStoreavailable.• 2009–RelationalDatabaseService• 2012–DynamoDB• 2014–Lambda("Serverless")• 2016(?)–ElasticContainerService• Doesitturnaprofit?
Terminology
• Instance=Onerunningvirtualmachine.• InstanceType=hardwareconfiguration:cores,memory,disk.• InstanceStoreVolume=Temporarydiskassociatedwithinstance.• Image(AMI)=Storedbitswhichcanbeturnedintoinstances.• KeyPair=CredentialsusedtoaccessVMfromcommandline.• Region=Geographiclocation,price,laws,networklocality.• AvailabilityZone=Subdivisionofregionthatisfault-independent.
EC2 Pricing Model • FreeUsageTier• On-DemandInstances
• Startandstopinstanceswheneveryoulike,costsareroundeduptothenearesthour.(Worstprice)
• ReservedInstances• Payupfrontforone/threeyearsinadvance.(Bestprice)• Unusedinstancescanbesoldonasecondarymarket.
• SpotInstances• Specifythepriceyouarewillingtopay,andinstancesgetstartedandstoppedwithoutanywarningasthemarkedchanges.(KindoflikeCondor!)
http://aws.amazon.com/ec2/pricing/
Free Usage Tier
• 750hoursofEC2runningLinux,RHEL,orSLESt2.microinstanceusage
• 750hoursofEC2runningMicrosoftWindowsServert2.microinstanceusage
• 750hoursofElasticLoadBalancingplus15GBdataprocessing• 30GBofAmazonElasticBlockStorageinanycombinationofGeneralPurpose(SSD)orMagnetic,plus2millionI/Os(withMagnetic)and1GBofsnapshotstorage
• 15GBofbandwidthoutaggregatedacrossallAWSservices• 1GBofRegionalDataTransfer
On-De
man
dInstan
ces
Reserved
Instan
ces
SpotIn
stan
ces
Surprisingly, you can’t scale up that large.
Simple Storage Service (S3)
• Abucketisacontainerforobjectsanddescribeslocation,logging,accounting,andaccesscontrol.Abucketcanholdanynumberofobjects,whicharefilesofupto5TB.Abuckethasanamethatmustbegloballyunique.
• FundamentaloperationscorrespondingtoHTTPactions:• http://bucket.s3.amazonaws.com/object• POSTanewobjectorupdateanexistingobject.• GETanexistingobjectfromabucket.• DELETEanobjectfromthebucket• LISTkeyspresentinabucket,withafilter.
• Abuckethasaflatdirectorystructure(despitetheappearancegivenbytheinteractivewebinterface.)
Easily Integrated into Web Applications <form action="http://examplebucket.s3.amazonaws.com/" method="post" enctype="multipart/form-data"> <input type="input" name="key" value="user/user1/" /> <input type="hidden" name="acl" value="public-read" /> <input type="hidden" name="success_action_redirect" value="http://examplebucket.s3.amazonaws.com/successful_upload.html" /> . . . <input type="text" name="X-Amz-Credential” value="AKIAIOSFODNN7EXAMPLE/20130806/us-east-1/s3/aws4_request" /> . . . <input type="submit" name="submit" value="Upload to Amazon S3" /> </form>
http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html
Bucket Properties
• Versioning–Ifenabled,POST/DELETEresultinthecreationofnewversionswithoutdestroyingtheold.
• Lifecycle–Deleteorarchiveobjectsinabucketacertaintimeaftercreationorlastaccessornumberofversions.
• AccessPolicy–Controlwhenandwhereobjectscanbeaccessed.• AccessControl–Controlwhomayaccessobjectsinthisbucket.• Logging–Keeptrackofhowobjectsareaccessed.• Notification–Benotifiedwhenfailuresoccur.
S3 Weak Consistency Model DirectquotefromtheAmazondeveloperAPI:“Updatestoasinglekeyareatomic….”“AmazonS3achieveshighavailabilitybyreplicatingdataacrossmultipleserverswithinAmazon'sdatacenters.IfaPUTrequestissuccessful,yourdataissafelystored.However,informationaboutthechangesmustreplicateacrossAmazonS3,whichcantakesometime,andsoyoumightobservethefollowingbehaviors:
• AprocesswritesanewobjecttoAmazonS3andimmediatelyattemptstoreadit.Untilthechangeisfullypropagated,AmazonS3mightreport"keydoesnotexist."
• AprocesswritesanewobjecttoAmazonS3andimmediatelylistskeyswithinitsbucket.Untilthechangeisfullypropagated,theobjectmightnotappearinthelist.
• Aprocessreplacesanexistingobjectandimmediatelyattemptstoreadit.Untilthechangeisfullypropagated,AmazonS3mightreturnthepriordata.
• Aprocessdeletesanexistingobjectandimmediatelyattemptstoreadit.Untilthedeletionisfullypropagated,AmazonS3mightreturnthedeleteddata.”
Elastic Block Store
• AnEBSvolumeisavirtualdiskofafixedsizewithablockread/writeinterface.ItcanbemountedasafilesystemonarunningEC2instancewhereitcanbeupdatedincrementally.Unlikeaninstancestore,anEBSvolumeispersistent.
• (ComparetoanS3object,whichisessentiallyafilethatmustbeaccessedinitsentirety.)
• Fundamentaloperations:• CREATEanewvolume(1GB-1TB)• COPYavolumefromanexistingEBSvolumeorS3object.• MOUNTononeinstanceatatime.• SNAPSHOTcurrentstatetoanS3object.
EBS is approx. 3x more expensive by volume and 10x more expensive by IOPS than S3.
Use Glacier for Cold Data • GlacierisstructuredlikeS3:avaultisacontainerforanarbitrarynumberofarchives.Policies,accounting,andaccesscontrolareassociatedwithvaults,whileanarchiveisasingleobject.
• However:• AlloperationsareasynchronousandnotifiedviaSNS.• Vaultlistingsareupdatedonceperday.• Archivedownloadsmaytakeuptofourhours.• Only5%oftotaldatacanbeaccessedinagivenmonth.
• Pricing:• Storage:$0.01perGB-month• Operations:$0.05per1000requests• DataTransfer:LikeS3,freewithinAWS.
• S3PoliciescanbesetuptoautomaticallymovedataintoGlacier.
Durability • AmazonclaimsaboutS3:
• AmazonS3isdesignedtosustaintheconcurrentlossofdataintwofacilities,e.g.3+copiesacrossmultipleavailabledomains.
• 99.999999999%durabilityofobjectsoveragivenyear.• AmazonclaimsaboutEBS:
• AmazonEBSvolumedataisreplicatedacrossmultipleserversinanAvailabilityZonetopreventthelossofdatafromthefailureofanysinglecomponent.
• Volumes<20GBmodifieddatasincelastsnapshothaveanannualfailurerateof0.1%-0.5%,resultingincompletelossofthevolume.
• CommodityharddiskshaveanAFRofabout4%.• AmazonclaimsaboutGlacieristhesameasS3:
• AmazonS3isdesignedtosustaintheconcurrentlossofdataintwofacilities,e.g.3+copiesacrossmultipleavailabledomainsPLUSperiodicinternalintegritychecks.
• 99.999999999%durabilityofobjectsoveragivenyear.
• Bewareofoversimplifiedargumentsaboutlow-probabilityevents!
Amazon Elastic File Services (EFS)
• EFSisastandalonefileservicedesignedtosharedamongVMs.• FileSystemInstance–Files,directories,storageallocation,multipleAZs• MountTarget–DNSname,IPaddress,NFStarget,singleAZ:file-system.id.efs.aws-region.amazonaws.com
• InsideofVM,usenormalNFSconnectiontothemounttarget:
mount –t nfs –o rsize=1M,wsize=1M,… file-system-dns-name /mnt/data
https://docs.aws.amazon.com/efs
Amazon ElasticFile Services (EFS)
• DataConsistencyModel• "closetoopen"consistencysemantics.• In(normal)asynchronous,sequentialI/Omode:
• Dataisdurableafteranfsyncoraclose.• Dataisvisibletootherprocessesthatopenafteryouclose.• Indeterminatecase:ProcessBopensbeforeProcessAcloses.
• In(explicit)synchronousI/Omode(O_DIRECT)• Non-appendingwritesareimmediatelyvisible.• (Thisimpliesthatappendingwritesarenot.Why?)• Hint:Sizeoffileisapropertyofmetadata.
Everything in AWS is Carefully Limited!
• 50KB/sperGBofdatainfreethroughput• 250MB/sMaxEFSthroughputperEC2Instance• 128userIDscanhaveopenfiles• 32Kopenfilesonasingleinstance• 4080Bmaxsymboliclinklength• 1000maximumdirectorydepth• 47.9TiBmaxbytesperfile• 87maxlocksperfile(?)• 7000fileoperationspersecond
https://docs.aws.amazon.com/efs/latest/ug/limits.html
Amazon EFS Pricing
EBS EFSLatency "lowest"
(localhardware)"low"(network)
MaxThroughput 2GB/s 10GB/s
Clients(VMs) Single O(1000)
StorageCostPerMonth
$0.05/GB(MagneticDisk)
$0.30/GB
AccessCostPerMonth
$0.05/1MIOPS(MagneticDisk
$6.00/MB/sXput
Serverless Computing • Aside:Worst.Name.Ever.(Exceptfordevicelesscomputing.)• "Serverlesstakesitastepfurther,whereyoudon'teventhinkabouttheinfrastructure.Youthinkoffunctionsthatyourcodeneedstoperform."–Prof.PaulBrenner,NotreDameCRC*
• KeyIdea:• Defineafixedfunction(andassociatedstate)once.• Invokethatfunctiononsmallamountsofdatamanytimes.• Teardownthefunction(andstate).
• Cloudprovidertakestheresponsibilityofscalingthesystemupanddowninordertomeettheofferedload.
*https://cacm.acm.org/magazines/2018/2/224625-going-serverless/fulltext
Amazon Lambda
• WithAWSLambda,youcanruncodewithoutprovisioningormanagingservers.Youpayonlyforthecomputetimethatyouconsume—there’snochargewhenyourcodeisn’trunning.Youcanruncodeforvirtuallyanytypeofapplicationorbackendservice—allwithzeroadministration.JustuploadyourcodeandLambdatakescareofeverythingrequiredtorunandscaleyourcodewithhighavailability.YoucansetupyourcodetoautomaticallytriggerfromotherAWSservicesorcallitdirectlyfromanywebormobileapp.
https://docs.aws.amazon.com/lambda/index.html#lang/en_us
Amazon Lambda
• SupportedEnvironments:• Node.js,Python,C#,Go,Java,PowerShell
• Encouragedmodeofuse:• Defineaself-containedworkfunctioninthenativelanguage.• Writeahandlerfunctionthatpacks/unpacksJSONandinvokestheworkfunc.
• Possible,butlessencouragedandcancausetrouble:• Handlerfunctionusessystem/exec/shelltoinvokeexternalprocessesinordertocomputeresult.
• Thelambdaruntimeisdefinedrelativetoalanguageruntime(e.g.Python2.7,Python3.0)andnotspecificabouttheOSenvironment.
Requirements
• Functionmustbestateless.• Noaffinitywiththeunderlyingcomputeinfrastructure.• Localfilesystemaccess,childprocesses,andsimilarartifactstobelimitedtothelifetimeoftherequest.
• PersistentstateshouldbestoredinAmazonS3,DynamoDB,etc..
• P.S.Easytosay,butrequiresdisciplinetosticktothis.• P.P.S.Doesthissoundsimilartoanothersystemwehavediscussed?
Defining a Function
• Uniquename.• Providethecode(obviously)• Useadeploymentpackageifaddldependenciesneeded.• Statememoryrequiredbyfunction.(CPUisproportional)• Statethetimeoutforthefunction.(Defaultis3seconds!)• GivepermissionstoaccessotherAWSservices.
How does it work?
• (Sketchontheboard)
Lambda Pricing in Oct 2018
• Atfirst:• First1Minvocationsarefree.• 400,000GB-sofmemoryxexecutiontimearefree.
• Then:• $0.20per1Minvocations.• $0.00001667perGB-sofmemoryxexecutiontime.
• Careful:Functionscanincuraddlcostsbyinvokingotherservices.
Perf
orm
ance
Lim
its
Principle: Hierarchy of Invocation Costs
• Stepstogettingremoteworkdone:• Provisionphysicalhardware.• Activatevirtualmachineonhardware.• Deploycontaineronvirtualmachine.• Startprocessand"warmup"theprogram.• Loadnecessarydatafortask.• Invokethefunctionthatconstitutesyourwork.
• Shouldwesetup/teardowneverythingforeveryinvocation?• Whatarethepros/consofstayingatthelastlevel?• (Hint:Thinkaboutwhatotherconcurrentusersmaydo.)
Architecture Center • IdeasforconstructinglargescaleinfrastructuresusingAWS:http://aws.amazon.com/architecture/
Command Line Setup • Gotoyourprofilemenu(yourname)intheupperrighthandcorner,select“SecurityCredentials”and“ContinuetoSecurityCredentials”
• Select“AccessKeys”• Select“NewAccessKey”andsavethegeneratedkeyssomewhere.• Edit~/.aws/configandsetituplikethis:
• Nowtestit:awsec2-describe-instances
Notethesyntaxhereisdifferentfromhowitwasgiveninthewebconsole!AWSAccessKey=XXXXXXAWSSecretAccessKey=YYYYYYYYY
[default]output=jsonregion=us-west-2aws_access_key=XXXXXXaws_secret_access_key=YYYYYYYYYYYY
S3 Command Line Examples
awss3 mb s3://bucket... cp localfiles3://bucket/key mv s3://bucket/keys3://bucket/newname
ls s3://bucket rm s3://bucket/key rb s3://bucket
aws s3 helpaws s3 lshelp
EC2 Command Line Examples
awsec2 describe-instances run-instances--image-idami-xxxxx--count1
--instance-typet1.micro--key-namekeyfile stop-instances--instance-idi-xxxxxx
aws ec2 helpaws ec2 start-instanceshelp
Warmup: Get Started with Amazon
• SkimthroughtheAWSdocumentation.• SignupforAWSathttp://aws.amazon.com• (SkiptheIAMmanagementfornow)• Applytheservicecredityoureceivedbyemail.• CreateanddownloadaKey-Pair,saveitinyourhomedirectory.• CreateaVMviatheAWSConsole• Connecttoyournewly-createdVMlikethis:
• ssh-imy-aws-keypair.pemec2-user@ip-address-of-vm• CreateabucketinS3andupload/downloadsomefiles.
Demo Time http://aws.amazon.com