Apache ZooKeeper
CMSC 491 Hadoop-Based Distributed Computing
Spring 2016, Adam Shook
What is it?
• Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.
  – Simple
  – Replicated
  – Ordered
  – Fast
Provides
• Configuration Information
• Distributed Synchronization
• Group Services
• Each of these services is used in some form by distributed applications
Interface
• ZooKeeper provides a very simple interface to a highly reliable and distributed service
• Powerful abstractions can be built from this very simple interface
• Currently interfaces are in Java and C
  – Want to expand to Python, Perl, and REST.
The Core
• Shared hierarchical namespace of data registers, called znodes
• Unlike file systems, provides clients with high-throughput, low-latency, highly available, and ordered access to znodes
Quorum
Namespace
znodes
• Meta-information:
  – Configuration
  – Status Information
  – Location Information
  – Whatever you want (that’s small)
znodes
• Each node acts as a file and directory
• 1 MB maximum per znode
• Persistent vs. Ephemeral
• Sequential znodes
• Full paths
  – An optional “chroot” suffix can be appended to the connection string
  – “127.0.0.1:3000,127.0.0.1:3002/app/a”
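To make the chroot suffix concrete, here is a minimal sketch that splits a connection string like the one above into its server list and chroot, and shows how a client-visible path maps under that chroot. The function names `parse_connect_string` and `resolve` are made up for illustration and are not part of any ZooKeeper client; falling back to port 2181 (ZooKeeper's default client port) when no port is given is an assumption of this sketch.

```python
DEFAULT_PORT = 2181  # ZooKeeper's default client port (fallback is an assumption of this sketch)


def parse_connect_string(connect_string):
    """Split 'host:port,host:port/chroot' into (servers, chroot)."""
    # The chroot, if present, starts at the first '/' after the host list.
    hosts_part, slash, chroot_part = connect_string.partition("/")
    servers = []
    for entry in hosts_part.split(","):
        host, _, port = entry.strip().partition(":")
        servers.append((host, int(port) if port else DEFAULT_PORT))
    chroot = "/" + chroot_part if slash else None
    return servers, chroot


def resolve(chroot, path):
    """Map a path as the client sees it to the real path under the chroot."""
    if chroot is None:
        return path
    return chroot if path == "/" else chroot + path


servers, chroot = parse_connect_string("127.0.0.1:3000,127.0.0.1:3002/app/a")
print(servers, chroot)
print(resolve(chroot, "/foo"))
```

With the chroot in place, every path the client uses is interpreted relative to /app/a.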
Watchers
• Tied to each znode
• One-time trigger
• Sent to the client
• The data for why it was sent
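The one-time nature of a watch is easy to miss, so here is a toy in-memory model of it. `ToyZnode` is a hypothetical class written for this sketch, not a ZooKeeper client API: a watcher registered on a read fires once on the next change and is then discarded.

```python
class ToyZnode:
    """In-memory stand-in for a znode with one-time data watches."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        # Reading with a watcher registers it for the NEXT change only.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Fire every pending watcher once, then forget them all.
        pending, self._watchers = self._watchers, []
        for watcher in pending:
            watcher("data_changed")


events = []
node = ToyZnode(b"v1")
node.get(watcher=events.append)
node.set(b"v2")  # triggers the watcher
node.set(b"v3")  # no watcher left, nothing is sent
print(events)
```

A client that wants to keep hearing about changes has to set the watch again each time it handles a notification.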
That’s It
• In a nutshell
• Very basic service, from which powerful abstractions can be built
• Let’s talk about how good it is!
  – That is, if you don’t have any questions right now…
    • You can ask. I don’t bite
      – Really
        » Promise
Use Case: Location Data
• Servers store machine hostnames as ephemeral znodes
  – /app1/machine1
  – /app1/machine87
  – /app1/machine4
• When a server is added, create a new znode
• When a server is removed, the znode is deleted
• When a server fails, ZK will delete the ephemeral node
• Allows for dynamic throttling of resources
• Clients can choose a hostname from the children of /app1 to connect to
  – Set a child watch on /app1; if the server goes down, the client will receive a notification and can choose a new server
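The client-side choice can be sketched as a small pure function, assuming the client already has the current list of children of /app1 (how that list is fetched is left out). `choose_server` is a hypothetical helper, and picking at random is just one possible policy.

```python
import random


def choose_server(children, current=None, rng=random):
    """Keep the current server if it is still registered, else pick another."""
    if current in children:
        return current
    if not children:
        return None  # nothing registered under /app1 right now
    return rng.choice(sorted(children))


# Before the failure, machine87 is still listed, so the client stays put.
print(choose_server(["machine1", "machine87", "machine4"], current="machine87"))
# After the child watch fires, machine87 is gone and a new server is chosen.
print(choose_server(["machine1", "machine4"], current="machine87"))
```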
Use Case: Status
• Use ZooKeeper as a heartbeat mechanism
• “Master” service keeps data watches on znodes
• Servers set the data of their node every 15 seconds
• If the Master doesn’t receive a change notification within 20 seconds, it can assume that the server has failed and kill it before bad things happen.
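The Master's timeout check boils down to comparing timestamps. Below is a minimal sketch, assuming the Master records the time of the last change notification per server in a dictionary; `failed_servers` and the dictionary layout are made up for illustration.

```python
def failed_servers(last_update, now, timeout=20.0):
    """Return servers whose last heartbeat is older than `timeout` seconds."""
    return sorted(
        name for name, seen_at in last_update.items()
        if now - seen_at > timeout
    )


# Times in seconds; machine4 last reported 25 seconds ago.
last_update = {"machine1": 100.0, "machine4": 85.0}
print(failed_servers(last_update, now=110.0))
```

The 5-second gap between the 15-second heartbeat and the 20-second timeout leaves some slack for a slow update before a server is declared dead.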
Performance
Command Line Interface
• Interactive usage of the namespace in a shell
  – create [path] [data]
  – delete [path]
  – get [path]
  – set [path] [data]
  – ls [path]
  – rmr [path]
  – A number of other commands…
• Tab completion!
API
• Current and stable v3.4.6 (March 2014)
• Requires only a list of ZK servers to connect
• IMO, good but messy interface
• Recommend building a nice wrapper API for getting/setting POD types and handling exceptions
Recipes!
• We are going to talk about these:
  – Configuration
  – Distributed Locks
  – Distributed Queue
Configuration
• Configuration is often driven through key/value pairs stored in a file
  – Can get messy when configuration is dynamic
• Implementation is very straightforward, as it is what ZooKeeper was designed for
• Each full-pathed znode is the key and the data associated with the znode is the value
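To illustrate the path-as-key idea, here is a small sketch that flattens a nested configuration into full znode paths with byte values. The function `to_znodes` and the sample keys are hypothetical, and storing each value as UTF-8 text is an assumption of the sketch, not something ZooKeeper requires.

```python
def to_znodes(config, prefix=""):
    """Flatten a nested dict into {full znode path: data bytes}."""
    znodes = {}
    for key, value in config.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            # Nested sections become deeper levels of the hierarchy.
            znodes.update(to_znodes(value, path))
        else:
            znodes[path] = str(value).encode("utf-8")
    return znodes


config = {"app1": {"db": {"host": "10.0.0.5", "port": 5432}}}
for path, data in to_znodes(config).items():
    print(path, data)
```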
Variables
• Static Variables
  – Those ones that are probably never going to change (not as much fun)
• Dynamic Variables
  – Changed by hand via command line or by the application itself
    • Track status of processes
    • Update historical data
Use of Watchers
• Applications can change configuration on the fly for some variables
• Whenever a variable changes, those watching a node can receive the changed variable and make the correct changes
• Very useful for long-running applications that require the most up-to-date information
Distributed Locks
• A means to have distributed processes retrieve a lock for some operation
  – Throttled updating of database
  – Your use case here!
• Exists in ZooKeeper's recipes directory and is distributed with the release: src/recipes/lock
Algorithm
• Define a znode to hold the lock, say “/dlock”
1. mypath = create(“/dlock/lock-”), with the sequence and ephemeral flags set
2. children = getChildren(“/dlock”), no watch
3. If mypath has the lowest number suffix in children, exit (the client holds the lock)
4. Call exists() on the node from children with the next lowest sequence number, with the watch flag set
   – i.e., if mypath is “/dlock/lock-6” and children contains 3, 4, 6, 7, call exists on “/dlock/lock-4”
5. If exists is false, go to step 2
6. If true, wait for the watch trigger before going to step 2
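Steps 3 and 4 are the heart of the algorithm, so here is a sketch of just that decision as a pure function, with no ZooKeeper connection involved. `lock_step` and `sequence_number` are hypothetical names, and the child names are assumed to end in "-" followed by a number.

```python
def sequence_number(name):
    """Extract the numeric suffix, e.g. 'lock-0000000006' -> 6."""
    return int(name.rsplit("-", 1)[1])


def lock_step(my_name, children):
    """Return None if my_name holds the lock, else the child to watch.

    The node to watch is the one with the next lowest sequence number,
    so each waiter watches exactly one predecessor.
    """
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(my_name)
    if index == 0:
        return None  # lowest suffix: this client holds the lock
    return ordered[index - 1]


children = ["lock-7", "lock-3", "lock-6", "lock-4"]
print(lock_step("lock-6", children))  # the slide's example: watch lock-4
print(lock_step("lock-3", children))  # lowest suffix: lock is held
```

Watching only the predecessor, rather than the whole “/dlock” node, means a release wakes up a single waiter instead of every client in line.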
Distributed Queues
• A means to allow clients to asynchronously add elements to a queue and have a single processor application dequeue and process them.
  – I can’t remember the last time I needed a queue
  – Maybe you have a few
Algorithm
• Designate a znode to hold the queue, say “/dqueue”
• Enqueue: create(“/dqueue/queue-”), with sequence and ephemeral flags set.
  – Returns a real path node /dqueue/queue-X, where X is a monotonically increasing number
• Dequeue: getChildren(“/dqueue”), watch set to true
• Process these nodes with the lowest number first
  – No need to call getChildren() until the current received list is exhausted
• If no children are in the queue, wait for watch notification before checking again
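The enqueue and dequeue steps above can be mimicked with a toy in-memory queue. `ToyQueue` is a hypothetical stand-in, not a ZooKeeper API: a local counter plays the role of the sequence flag, and the ten-digit zero padding of the suffix is an assumption of this sketch.

```python
import itertools


class ToyQueue:
    """In-memory stand-in for the children of a /dqueue znode."""

    def __init__(self):
        self._counter = itertools.count()
        self.children = {}

    def enqueue(self, data):
        # Mimics create("/dqueue/queue-") with the sequence flag set.
        name = f"queue-{next(self._counter):010d}"
        self.children[name] = data
        return name

    def drain(self):
        # Take one snapshot of the children (like one getChildren() call)
        # and process it lowest number first before looking again.
        batch = sorted(self.children)  # zero padding makes text order numeric
        for name in batch:
            yield self.children.pop(name)


queue = ToyQueue()
for item in ["first", "second", "third"]:
    queue.enqueue(item)
print(list(queue.drain()))
```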
Priority Queue Extension
• Two simple modifications to this algorithm!
  – When enqueuing, pathnames end with queue-ZZ, where ZZ is the priority of the element
    • The lower the number, the higher the priority
  – When dequeuing, if the watch notification is triggered on the “/dqueue” node, the client needs to call getChildren() again and re-sort by priority.
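Re-sorting by priority might look like the sketch below. It assumes child names of the form queue-ZZ followed by a sequence number, with ZZ fixed at two digits so the two parts can be split apart; that width, along with the names `priority_key` and `dequeue_order`, is an assumption made for illustration.

```python
PREFIX = "queue-"
PRIORITY_WIDTH = 2  # assumption: ZZ is always two digits


def priority_key(name):
    """Split 'queue-ZZ<sequence>' into (priority, sequence)."""
    suffix = name[len(PREFIX):]
    priority = int(suffix[:PRIORITY_WIDTH])
    sequence = int(suffix[PRIORITY_WIDTH:])
    return (priority, sequence)


def dequeue_order(children):
    """Lower priority number first; ties broken by arrival order."""
    return sorted(children, key=priority_key)


children = [
    "queue-050000000003",
    "queue-010000000007",
    "queue-050000000001",
]
print(dequeue_order(children))
```

The priority-01 element arrives last but is processed first, which is why the client has to re-fetch and re-sort whenever the watch on “/dqueue” fires.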
Other Recipes
• Group membership
• Barriers
• Two-phase commit
• Leader Election
Apache Curator
• “Curator n ˈkyoor͝ˌātər: a keeper or custodian of a museum or other collection - A ZooKeeper Keeper.”
• Contains:
  – Recipes
  – Framework
  – Utilities
  – Client
  – Errors
  – Extensions
References
• http://zookeeper.apache.org
• http://curator.apache.org