Oozie Hug May 2011
![Page 1: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/1.jpg)
Oozie 3: Improved Scheduling and Control of Workflows
Mohammad K Islam
kamrul@yahoo-inc.com
![Page 2: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/2.jpg)
Introductions
• Oozie Team
• Architecture, Development, Management
  – Mayank Bansal
  – Angelo Huang
  – Mohammad Islam
  – Amol Kekre
  – Andreas Newman
  – Lei Zhang
• External contributors.
• QE
  – Marcy Chang
  – Michelle Chiang
• Who I am
• Technical Lead at Yahoo!
![Page 3: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/3.jpg)
Agenda
• Oozie Overview
• Oozie 3.0 features:
  – Bundle
  – Scalability
  – Usability
• Future Plan
• Q&A
![Page 4: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/4.jpg)
Overview: Workflow
• Oozie executes a workflow defined as a DAG of jobs.
• Job types include Map-Reduce, Pipes, Streaming, Pig, custom Java code, etc.
• Introduced in Oozie 1.x.
[Figure: example workflow DAG. Node labels: start, M/R job, M/R streaming job, decision (branches labeled MORE and ENOUGH), fork, Pig job, M/R job, join, Java, FS job, end.]
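The idea of a workflow as a DAG of jobs can be illustrated with a minimal Python sketch. This is not Oozie code or an Oozie workflow definition: the node names and the dependency map are hypothetical, and the standard-library `graphlib` module stands in for whatever scheduling Oozie does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each node maps to the set of nodes it depends on.
# The names only loosely mirror the slide's diagram and are illustrative.
workflow = {
    "mr_job": {"start"},
    "streaming_job": {"mr_job"},
    "pig_job": {"streaming_job"},        # first branch after the fork
    "second_mr_job": {"streaming_job"},  # second branch after the fork
    "join": {"pig_job", "second_mr_job"},
    "end": {"join"},
}


def execution_order(dag):
    """Return one valid order in which the jobs of the DAG can run."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Any order produced this way runs a job only after all of the jobs it depends on, which is the property the DAG definition is meant to guarantee.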
![Page 5: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/5.jpg)
Overview: Coordinator
• Oozie executes workflows based on:
  – Time Dependency (Frequency)
  – Data Dependency
• Introduced in Oozie 2.x.
[Figure: architecture diagram. Labels: Oozie Client, WS API, Oozie Server, Oozie Coordinator, Oozie Workflow, Check Data Availability, Hadoop.]
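The two coordinator triggers, time dependency and data dependency, can be sketched as follows. This is an illustrative model only: the function names, the hourly frequency, and the set of available data are assumptions, not part of Oozie's API.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield the times at which an action would be created (time dependency)."""
    current = start
    while current < end:
        yield current
        current += frequency


def ready_actions(times, available_data):
    """Keep only the actions whose input data has arrived (data dependency)."""
    return [t for t in times if t in available_data]


start = datetime(2011, 5, 1, 0, 0)
end = datetime(2011, 5, 1, 4, 0)
hourly = timedelta(hours=1)

# Hypothetical: input data exists only for the first two hours.
available = {datetime(2011, 5, 1, 0, 0), datetime(2011, 5, 1, 1, 0)}

times = list(nominal_times(start, end, hourly))
runnable = ready_actions(times, available)
print(len(times), len(runnable))
```

Here four actions are due by frequency, but only the two whose data is present would be allowed to start their workflows.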
![Page 6: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/6.jpg)
Bundle
• What is Bundle?
  – A new abstraction layer on top of Coordinator.
  – Users can define and execute a bunch of coordinator applications.
  – Introduced in Oozie 3.x.
• Why is it required?
  – Data pipeline: A set of inter-related coordinator applications is required for large data processing.
  – Operational nightmare: These pipelines are hard for the Service Engineering team to maintain and control.
![Page 7: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/7.jpg)
Bundle Cont.
• User defines the bundle through a new XML.
• User can start/stop/suspend/resume/rerun at the bundle level.
• Bundle is optional.
[Figure: architecture diagram. Labels: Oozie Client, WS API, Oozie Server, Bundle, Coordinator, Workflow, Check Data Availability, Hadoop.]
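Bundle-level control can be pictured as one command fanning out to every coordinator in the group. The sketch below is a toy model under stated assumptions: the `Coordinator` and `Bundle` classes, their method names, and the status strings are hypothetical and do not reproduce Oozie's implementation or its bundle XML.

```python
class Coordinator:
    """Hypothetical stand-in for a coordinator job."""

    def __init__(self, name):
        self.name = name
        self.status = "PREP"


class Bundle:
    """Groups coordinators so that one command controls all of them."""

    def __init__(self, coordinators):
        self.coordinators = list(coordinators)

    def _set_all(self, status):
        # Apply the same status change to every member coordinator.
        for coordinator in self.coordinators:
            coordinator.status = status

    def start(self):
        self._set_all("RUNNING")

    def suspend(self):
        self._set_all("SUSPENDED")

    def resume(self):
        self._set_all("RUNNING")

    def stop(self):
        self._set_all("KILLED")


bundle = Bundle([Coordinator("ingest"), Coordinator("aggregate")])
bundle.start()
bundle.suspend()
print([c.status for c in bundle.coordinators])
```

The point of the extra layer is operational: a pipeline of related coordinators is paused or resumed with a single call instead of one call per coordinator.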
![Page 8: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/8.jpg)
Oozie Abstraction Layers
[Figure: three abstraction layers. Layer 1: Bundle. Layer 2: Coord Job 1 and Coord Job 2, each with Coord Action 1 and Coord Action 2. Layer 3: WF jobs (WF Job 1, WF Job 2), containing M/R, PIG, and FS jobs.]
![Page 9: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/9.jpg)
Enhanced Stability and Scalability
• Issue: At very high load, Oozie becomes slow.
• Impact: 90% of the total Oozie support incidents.
• Reason:
  – Lots of active but non-progressing jobs.
  – Non-progressing jobs consume a lot of resources.
  – Oozie internal queue is full.
• Resolution:
  – Throttle the number of active jobs/coordinator.
  – Put the job into timeout state.
  – Enforce uniqueness of Oozie queue elements.
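Two of the resolutions listed above, throttling and queue-element uniqueness, can be sketched in a few lines. This is an assumption-laden illustration, not Oozie's internal queue: `UniqueQueue`, `can_start`, and the string keys are hypothetical names.

```python
from collections import deque


class UniqueQueue:
    """FIFO queue that refuses an element that is already waiting."""

    def __init__(self):
        self._items = deque()
        self._waiting = set()

    def offer(self, key):
        """Add key unless it is already queued; return True if it was added."""
        if key in self._waiting:
            return False
        self._waiting.add(key)
        self._items.append(key)
        return True

    def poll(self):
        """Remove and return the oldest element."""
        key = self._items.popleft()
        self._waiting.discard(key)
        return key

    def __len__(self):
        return len(self._items)


def can_start(active_jobs, throttle):
    """Throttle check: allow a new job only below the configured limit."""
    return active_jobs < throttle


queue = UniqueQueue()
first = queue.offer("coord-1:check-data")
duplicate = queue.offer("coord-1:check-data")
print(first, duplicate, len(queue))
```

Rejecting duplicates keeps repeated checks for the same job from filling the queue, and the throttle caps how many jobs compete for resources at once.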
![Page 10: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/10.jpg)
Improved Usability
• Issue: Coordinator job’s status is not intuitive and confuses Oozie users.
• Impact: User confusion and related Oozie support.
• Reason:
  – Status SUCCEEDED doesn’t mean job is successful!!
  – Status PREMATER is for Oozie internal use only, but it was exposed to the user.
• Resolution:
  – Redesign Coordinator status
![Page 11: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/11.jpg)
Coordinator Status Redesign
[Figure: coordinator state diagrams.
Current: PREP, PREMATER, RUNNING, SUSPENDED, SUCCEEDED, KILLED, FAILED.
New: PREP, RUNNING, PAUSED, SUSPENDED, SUCCEEDED, DONE_WITH_ERROR, KILLED, FAILED.]
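One way to read the redesigned status set is that a finished job reports SUCCEEDED only when every action succeeded, and DONE_WITH_ERROR otherwise. The slides do not spell out this rule, so the sketch below is an assumption about the intent; the `CoordStatus` enum and `final_status` function are hypothetical names.

```python
from enum import Enum


class CoordStatus(Enum):
    """Status names from the redesigned set shown on the slide."""
    PREP = "PREP"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    SUSPENDED = "SUSPENDED"
    SUCCEEDED = "SUCCEEDED"
    DONE_WITH_ERROR = "DONE_WITH_ERROR"
    KILLED = "KILLED"
    FAILED = "FAILED"


def final_status(action_succeeded):
    """Assumed rule: a finished job succeeded only if all its actions did.

    action_succeeded is a list of booleans, one per finished action.
    """
    if all(action_succeeded):
        return CoordStatus.SUCCEEDED
    return CoordStatus.DONE_WITH_ERROR


clean_run = final_status([True, True, True])
partial_run = final_status([True, False, True])
print(clean_run.name, partial_run.name)
```

Under this reading, a job with a failed action no longer ends up labeled SUCCEEDED, which addresses the confusion described on the usability slide.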
![Page 12: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/12.jpg)
Future Plan
• Higher Scalability: Change the polling-based data-dependency check to a push model through HCatalog and a notification system.
• Adaptability: Graceful handling of Hadoop downtime:
  – If Hadoop is down, block submission.
  – When Hadoop becomes available:
    • Submit the blocked job.
    • Auto-resubmit the untraced job.
• Monitoring: Rich WS API for application Monitoring/Alerting.
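The planned downtime handling (block submissions while Hadoop is down, then submit the blocked jobs once it returns) can be sketched as a small state holder. Since this is a future plan in the slides, everything here is hypothetical: the `Submitter` class, its method names, and the in-memory lists are illustrative assumptions only.

```python
class Submitter:
    """Toy model of blocking job submission during Hadoop downtime."""

    def __init__(self):
        self.hadoop_up = True
        self.blocked = []    # jobs held back while Hadoop is down
        self.submitted = []  # jobs handed to Hadoop

    def submit(self, job):
        """Submit immediately if Hadoop is up, otherwise hold the job."""
        if self.hadoop_up:
            self.submitted.append(job)
        else:
            self.blocked.append(job)

    def hadoop_down(self):
        self.hadoop_up = False

    def hadoop_available(self):
        """Mark Hadoop as up and submit held jobs in arrival order."""
        self.hadoop_up = True
        while self.blocked:
            self.submitted.append(self.blocked.pop(0))


submitter = Submitter()
submitter.submit("job-1")
submitter.hadoop_down()
submitter.submit("job-2")
held = list(submitter.blocked)
submitter.hadoop_available()
print(held, submitter.submitted)
```

The sketch covers only the blocked-job path; resubmitting untraced jobs would need a record of what was in flight when Hadoop went down, which is not modeled here.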
![Page 13: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/13.jpg)
Future Plan Cont.
• Automatic Failover: Using ZooKeeper.
• Load Balancing: Through server replication.
• Improved Usability:
  – Distcp action
  – Hive action
• Asynchronous data processing.
• Incremental data processing.
• Apache Migration: Work initiated.
![Page 14: Oozie Hug May 2011](https://reader034.fdocuments.in/reader034/viewer/2022051609/54620e8bb4af9f531c8b45c6/html5/thumbnails/14.jpg)
Q&A
Mohammad K Islam
kamrul@yahoo-inc.com
• Github link: http://yahoo.github.com/oozie
• Mailing list: [email protected]