Beating the tyranny of scale with a private cloud configured for Big Data
B.N. Lawrence, V. Bennett, J. Churchill, M. Juckes, P.J. Kershaw, S. Pepler, M. Pritchard, A. Stephens.
STFC, NCAS, NCEO, University of Reading
Infrastructure Context
Multiple remote data sources: we can't bring the compute to all the data, SO: bring all the data to one place, and bring the compute to that! (Avoid n x n data transfer!) Need to worry about:
• Storage layout
• Scheduling
• Curation policies
• Interfaces
• Storage and analysis interconnect
JASMIN Compute and Storage: Lotus + private cloud + tape store + DMZ for data transfer
Internal Helpdesk
CEDA Archive Services: data centres, curation, DB systems, user management, external helpdesk:
• CEDA AS (once BADC)
• CEDA EO (once NEODC)
• CEDA Solar (once UKSSDC)
• IPCC DDC
• etc
CEDA Compute Services. Compute cloud:
• PaaS (JAP + generic science VMs + user management), IaaS, external helpdesk
• NERC Managed Analysis Computing (CEMS + shared systems for NCAS, MetO, NOC etc)
• NERC Cloud Analysis Computing (EOS Cloud, EnvWB etc)
• etc
Organisational Viewpoint
JASMIN and the 'long tail' of Science
[Diagram: share of cloud resource between organisations and communities. Tenants shown: Organisation A, Community B, Internal Services, Organisation C, alongside batch compute.]
• Batch compute: raw infrastructure power (data available all the time, next to the compute) but a more constrained service model; traditional *nix user management; high performance compute + global file system; a high level of expertise required of users.
• Cloud: a rich and flexible service model allows the establishment of domain-specific collaborative environments; shared virtual application environments; long tail research.
• Virtualisation: the cloud service model abstracts the physical resources (compute, network, storage).
• Underpinned by the managed archive, community sharing, and supporting communities.
Engineering Viewpoint
Currently (April 2015):
• 268 cores managed compute
• 548 cores un-managed compute aka "traditional cloud"
• 3760 cores batch compute
• 120 cores specialised compute
• 17 PB of disk!
Note the balance of network interfaces in storage and compute! Yet to benchmark full I/O, probably in excess of 3 Tb/s?
JASMIN Operations
• 500 JASMIN users
• >80 projects
• 4.9 PB allocated as Group Workspace; 2.8 PB CEDA archives
• Over 1.5 million processing jobs
Academic CEMS usage (Nov '14):
• Group Workspaces (GWS): 25 (1900 TB)
• Managed VMs: 54
• Login users: 81
[Bar chart of group workspace volumes in Tbytes: JASMIN usage, October 2014. Blue: allocated but not yet used. Red: used. Green: as yet unallocated.]
Data Management at Scale (1): CEDA Archive Snapshot
• 3.0 PB of allocated archive, 2.3 PB used, in 2,176 filesets totalling 152 M files.
• 1 copy on disk, at least one on tape nearline, and one offsite.
• Long tail in both dataset size and number of files.
• Volume and number of files are not correlated, although the high volume datasets tend not to have the most files.
How do we test for data integrity? (Snapshot data 01/12/2014 via Sam Pepler.)
Data Management at Scale (2)

# Audit a batch of filesets (the CEDA audit loop, in pseudocode)
for i in range(number_to_do):
    fileset = CEDADB.next_audit()

    # EITHER METHOD A: one checksum job for the whole fileset
    # checkm_file = fileset.create_checkm()

    # OR METHOD B: split the fileset into bounded sub-jobs
    results = {}
    filelists = fileset.make_jobs()
    for fs in filelists:
        results[fs] = fs.create_checkm()
    checkm_file = combine(results)

    # EITHER WAY: store the result and notify
    CEDADB.store_anal_notify(checkm_file)

# Yes, this is "poor man's Map Reduce"
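For illustration only, here is a minimal sketch of what a create_checkm() step might do: walk a file list and record a checksum per file so a later audit can detect corruption. The function name checksum_manifest and the simplified manifest layout are assumptions for this sketch, not the actual CEDA code or the real Checkm manifest format.

import hashlib
import os

def checksum_manifest(paths, blocksize=2**20):
    # Record (md5, size) per file so a later audit can detect corruption.
    manifest = {}
    for path in paths:
        h = hashlib.md5()
        with open(path, "rb") as f:
            # Read in 1 MB blocks to avoid holding large files in RAM.
            for block in iter(lambda: f.read(blocksize), b""):
                h.update(block)
        manifest[path] = (h.hexdigest(), os.path.getsize(path))
    return manifest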
• Audits are done in batches, as it is often useful to only do some at a time.
• CEDADB is a RESTful service used to work out which audit to do next and to store the result.
• Method A: 1 LOTUS job per fileset. Some filesets are much bigger than others, so small ones finish fast and larger ones drag on for days.
• Method B makes multiple LOTUS jobs, each with no more than a certain volume and no more than a certain number of files (see the sketch below).
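A minimal sketch of the kind of splitting Method B implies, binning files into jobs bounded by both total volume and file count; the function name make_jobs and its parameters are assumptions for illustration, not the actual CEDA implementation.

def make_jobs(files, max_volume, max_files):
    # files: iterable of (path, size_in_bytes) pairs.
    # Returns lists of paths, each bounded in total volume and file count.
    jobs, current, volume = [], [], 0
    for path, size in files:
        # Start a new job when either bound would be exceeded.
        if current and (volume + size > max_volume or len(current) >= max_files):
            jobs.append(current)
            current, volume = [], 0
        current.append(path)
        volume += size
    if current:
        jobs.append(current)
    return jobs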
It turns out that not only is a lot of data curation embarrassingly parallel (and amenable to map-reduce), but so is a lot of science!
[JASMIN network Ganglia plots: green net input is a proxy for reads. Audit jobs complete more quickly and efficiently using Method B (right panel).]
www.GlobAlbedo.org www.QA4ECV.eu
ESA BIDS’14 Conference, 12-14 November 2014, ESRIN
Example uses of CEMS-JASMIN for global land surface products
Objective 1: Re-project BRDF files from SIN coordinates to lat/lon using an energy conservation method
• Challenge: projecting SIN tiles into lat/lon results in non-rectangular shapes, which differ between SIN tiles.
• Solution: SIN and lat/lon cells are represented by geometric polygons rather than simple points, and the process is then based on ratios of common area rather than on simple distance (see the sketch below).
• Challenge: a huge number of polygons must be spatially indexed and processed. This requires massive RAM and usually takes a very long time.
• Solution: use the cloud-computing system on CEMS-JASMIN (~100 times faster than the 224-core in-house Linux cluster!).
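A minimal sketch of area-ratio resampling of the kind described above, assuming the shapely library; the cell polygons, values, and the function name reproject_value are invented for illustration and are not the authors' code.

from shapely.geometry import Polygon

def reproject_value(target_cell, source_cells):
    # target_cell: shapely Polygon for one lat/lon cell.
    # source_cells: list of (Polygon, value) pairs for nearby SIN cells.
    # Returns the area-weighted mean of the overlapping source values.
    weighted_sum, total_area = 0.0, 0.0
    for poly, value in source_cells:
        overlap = target_cell.intersection(poly).area  # common area
        if overlap > 0.0:
            weighted_sum += value * overlap
            total_area += overlap
    return weighted_sum / total_area if total_area else None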
Objective 2: Create specific albedo products for computation of 8-daily LAI/fAPAR between 2002 and 2011 at 3 different resolutions: 1 km, 5 km and 25 km
• Challenge: upscale big BRDF data (50 TB) from 1 km to 5 km and 25 km using the energy conservation method, and then create separate Albedo-Snow_only and Albedo-Snow_Free products: this process is extremely time consuming!
• Solution: the cloud-computing system on CEMS-JASMIN (~100 times faster than the 224-core in-house Linux cluster).
Also used the Science DMZ for data transfers from NASA: achieved rates up to 28 TB/day.
Jan-Peter Muller, Said Kharbouche (NCEO, UCL)
Issues
The curated environment is one virtual organisation alongside O(100) other organisations. Key issues include:
(1) How to provide high performance data access and analytics in the managed and semi-managed environment for multiple users and multiple workflows, all intersecting in some of the data.
(2) How to support high performance data transfer and job migration between the different tiers of infrastructure.
(3) All in a context of extreme data growth.
Workflow and Scheduling Issues (1): the seven deadly sins of cloud computing research (Schwarzkopf, Murray and Hand, HotCloud, 2012)
Pick one issue: I/O and storage.
Pick five, all in play:
• Unnecessary distributed parallelism: we need to support (nicely) high-memory and other nodes inside our environment.
• Assuming performance homogeneity: this is a real problem for us in a mixed VM/batch environment... Help.
• Forcing the abstraction (Map-Reduce, Hadoop or bust): we avoid this by having a parallel file system, but how do we know we are getting value?
• Unrepresentative workloads: we really don't know how to optimise our jobs (yes, we can give people exclusive access to nodes, but it's harder to give them exclusive I/O bandwidth).
• Assuming perfect elasticity: we haven't worked out how to schedule to use our resources, or how to cloudburst properly.
We need work on understanding all these things.
We can break our file system up into pools ("bladesets") in Panasas and give communities access to resources on one bladeset: then their I/O does not interfere with VOs using other bladesets.
When we run out of physical space for disk, how are we going to use tape efficiently in our workflows?
(IOR) Do we understand the performance at the user/application level?
This isn't very flexible! We can still nail a PB bladeset with 80 nodes! How do we get more, and more flexible, I/O?
Workflow and Scheduling Issues (2)
This currently works because we have spare capacity, and relatively few users in the un-managed cloud and the IPython notebook environment.
We don't know how to do the scheduling here: the hypervisor/VM paradigm is banging up against the batch system job, and interactive use is banging up against resource limits. The sixth deadly sin: there is no perfect elasticity. We are offering cloudbursting (to Amazon and Azure, we hope), but then there needs to be more work on data pipelines.
Further info
JASMIN – http://www.jasmin.ac.uk
Centre for Environmental Data Archival – http://www.ceda.ac.uk
JASMIN context:
Lawrence, B.N., V.L. Bennett, J. Churchill, M. Juckes, P. Kershaw, S. Pascoe, S. Pepler, M. Pritchard, and A. Stephens. Storing and manipulating environmental big data with JASMIN. Proceedings of IEEE Big Data 2013, pp. 68-75, doi:10.1109/BigData.2013.6691556.