Data Pipeline Management Framework on Oozie
Kun Lu
Overview
  Architecture of Campaign Analytics
  Issues in the old Campaign Analytics processes
  Building a Pipeline Management Framework for a robust computing environment
Architecture of Campaign Analytics
What issues does the framework need to solve?
  Provide a consistent and robust framework
  Make adding a new analytics job easier
  Coordinate complex workflows (serialized and parallel processing)
  Support the catch-up feature
  Make debugging and tracing easier
What does Oozie provide?
  Workflow engine
  Workflow definition: a DAG of control flow nodes and action nodes (connected by transition arrows)
  Workflow nodes
    Control flow nodes (start, end, decision, fork, join, kill)
    Action nodes (MapReduce, Pig, Java, script, etc.)
  Parameterization of workflows
    Job properties
    EL functions (Basic EL, WF EL, Hadoop EL, HDFS EL)
  Oozie console
  Oozie client and API
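To make the DAG idea concrete, here is a minimal Python sketch (not Oozie code) that models a workflow with a fork and a join and groups its nodes into waves that could run in parallel. The node names and the `execution_waves` helper are hypothetical and do not come from the slides.

```python
# Hypothetical workflow: node name -> nodes it transitions to.
# All names are illustrative; none come from the original slides.
WORKFLOW = {
    "start": ["fork_stats"],
    "fork_stats": ["click_stats", "impression_stats"],  # fork: two parallel branches
    "click_stats": ["join_stats"],
    "impression_stats": ["join_stats"],
    "join_stats": ["report"],  # join: waits for both branches
    "report": ["end"],
    "end": [],
}


def execution_waves(workflow):
    """Group nodes into waves; nodes in one wave have no pending predecessors.

    Uses Kahn's algorithm and raises ValueError on a cycle, because a
    workflow definition has to be a DAG.
    """
    indegree = {node: 0 for node in workflow}
    for targets in workflow.values():
        for target in targets:
            indegree[target] += 1

    wave = sorted(node for node, degree in indegree.items() if degree == 0)
    waves, seen = [], 0
    while wave:
        waves.append(wave)
        seen += len(wave)
        next_wave = []
        for node in wave:
            for target in workflow[node]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    next_wave.append(target)
        wave = sorted(next_wave)

    if seen != len(workflow):
        raise ValueError("workflow contains a cycle")
    return waves


if __name__ == "__main__":
    for number, nodes in enumerate(execution_waves(WORKFLOW), start=1):
        print(number, nodes)
```

In this sketch the two branches after the fork land in the same wave, which is the serialized-plus-parallel shape that the fork and join control nodes express.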
Workflow Design Pattern
Campaign Analytics Pipeline Management Framework
  The Campaign Analytics Pipeline Management Framework (PMF) is built on top of Oozie.
  PMF defines the campaign analytics processing pipelines. Each pipeline includes a set of workflows.
  PMF organizes, schedules, and coordinates the campaign analytics jobs. It also provides a built-in catch-up feature to make the pipelines robust.
  The Oozie workflow engine executes the workflows and sends job status to the Oozie server.
  Jobs are monitored and traced through the Oozie console.
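The slides do not say how PMF implements catch-up. One plausible reading is that, after an outage, the framework works out which scheduled runs were missed and submits them in order. The Python sketch below illustrates only that idea; `missed_runs` and its parameters are assumptions, not part of PMF or Oozie.

```python
from datetime import datetime, timedelta


def missed_runs(last_success, now, interval):
    """Return the nominal run times after last_success, up to and including now.

    Hypothetical helper: assumes runs are scheduled at a fixed interval and
    that the time of the last successful run is known.
    """
    runs = []
    nominal = last_success + interval
    while nominal <= now:
        runs.append(nominal)
        nominal += interval
    return runs


if __name__ == "__main__":
    # Example: an hourly pipeline last succeeded at 00:00 and it is now 03:30.
    pending = missed_runs(
        last_success=datetime(2013, 1, 1, 0, 0),
        now=datetime(2013, 1, 1, 3, 30),
        interval=timedelta(hours=1),
    )
    for run in pending:
        print("catch up:", run.isoformat())
```

Returning the missed runs oldest first keeps downstream jobs that depend on earlier results in a consistent order.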
PMF & Oozie Execution Environment
  PMF servers
    Own the pipeline definitions
    Pass workflow tasks to Oozie through the Oozie client
  Oozie server
    Executes workflow tasks
    Manages task status
  Hadoop cluster
    Workflow definitions are deployed in HDFS
    M/R processes run on the cluster
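As an illustration of the hand-off from a PMF server to Oozie, the Python sketch below assembles the job properties for one workflow run. `oozie.wf.application.path` is the standard Oozie property that points at a workflow application in HDFS. The host name, HDFS path, `runHour` parameter, and both helper functions are hypothetical.

```python
def build_job_properties(name_node, app_path, params):
    """Build the properties for one workflow submission.

    name_node and app_path locate the workflow definition in HDFS;
    params holds run-specific values referenced by the workflow.
    """
    props = {
        "nameNode": name_node,
        "oozie.wf.application.path": name_node + app_path,
    }
    props.update(params)
    return props


def to_properties_text(props):
    """Serialize the properties as key=value lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    props = build_job_properties(
        name_node="hdfs://namenode:8020",
        app_path="/apps/pmf/hourly",
        params={"runHour": "2013010103"},
    )
    print(to_properties_text(props))
```

The resulting text has the same key=value shape as a job properties file that an Oozie client submits along with the workflow.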
Oozie Console
Workflow Console
Current Workflows
  PMF manages three pipelines (hourly, daily, and weekly)
  Includes 12 workflows
  Map/Reduce jobs run per month: ~100,000