Continuuity Weave

WEAVE - SENSIBLE YARN Terence Yim Monday, September 16, 13

description

Terence Yim from Continuuity talks about Weave SDK at Big Data Gurus meetup (http://www.meetup.com/BigDataGurus/)

Transcript of Continuuity Weave

Page 1: Continuuity Weave

WEAVE - SENSIBLE YARN

Terence Yim

Page 2: Continuuity Weave

BIG FLOW

What is BigFlow?

• A core processor; a real-time stream processing engine (similar to Storm and S4)

• BigFlow provides

‣ Tight integration with your existing Hadoop/HBase cluster

‣ Exactly-once execution rather than at-least-once execution (your apps do not have to be idempotent)

‣ Supports transactions for consistency and HBase-backed queues for durability

‣ Direct integration with a data storage engine (Continuuity Data Fabric) and Datasets

‣ Beautiful User interface for application management

‣ Utilizes YARN for deployment and supports runtime elastic scalability of Flows


Page 3: Continuuity Weave

I HAVE HADOOP - WHAT NEXT?

• What if:

‣ Map/Reduce is not the only thing I want to run?

‣ I would like to do real-time analysis?

‣ Or run a message-passing algorithm over my data?

‣ My cluster has more compute capacity than my big data needs?

• Can I utilize its idle resources for other, compute-intensive tasks?


Page 4: Continuuity Weave

A MESSAGE PASSING (MPI) APP

[Diagram: six nodes, each labeled "data"]

Page 5: Continuuity Weave

A STREAMING APP

[Diagram: elements labeled "event" and "Database"]

Page 6: Continuuity Weave

A DISTRIBUTED LOAD TEST

[Diagram: twelve nodes labeled "test" and a WebService]

Page 7: Continuuity Weave

A MULTI-PURPOSE CLUSTER

[Diagram: one cluster hosting several workloads at once, with nodes labeled "log", "split", "par", "data" and "test", two Database boxes, a WebService, and an "events" label]

Page 8: Continuuity Weave

WHAT IS YARN?

• The Resource Manager of Hadoop 2.0

• Separates resource management from programming paradigm

• Allows (almost) all kinds of distributed applications to run in a Hadoop cluster

• Multi-tenant, secure

• Scales to many thousands of nodes

• Pluggable scheduling algorithms


Page 9: Continuuity Weave

YARN - HOW IT WORKS

Protocols

1) Client-RM: Submit the app master

2) RM-NM: Start the app master

3) AM-RM: Request + release containers

4) AM-NM: Start tasks in containers
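The four interactions above can be traced with a toy, in-memory model. This is only a sketch: every class and method name below (ToyYarn, NodeManager.launch, ResourceManager.submit, and so on) is hypothetical and none of it is the Hadoop API. It assumes round-robin placement, and it models the last step as the application master asking node managers to launch tasks, which is how container launch works in YARN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the YARN submission flow. All names are hypothetical.
final class ToyYarn {
    static final List<String> trace = new ArrayList<>();

    static final class NodeManager {
        final String name;
        NodeManager(String name) { this.name = name; }
        void launch(String what) { trace.add(name + " launches " + what); }
    }

    static final class ResourceManager {
        private final List<NodeManager> nodes;
        private int next = 0;
        ResourceManager(List<NodeManager> nodes) { this.nodes = nodes; }

        // 1) Client-RM: accept the submission; 2) RM-NM: start the app master.
        AppMaster submit(String app, int tasks) {
            trace.add("client submits " + app);
            pick().launch("app master for " + app);
            return new AppMaster(this, app, tasks);
        }

        // 3) AM-RM: grant containers, round-robin over the known nodes.
        List<NodeManager> allocate(int count) {
            trace.add("app master requests " + count + " containers");
            List<NodeManager> granted = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                granted.add(pick());
            }
            return granted;
        }

        private NodeManager pick() { return nodes.get(next++ % nodes.size()); }
    }

    static final class AppMaster {
        private final ResourceManager rm;
        private final String app;
        private final int tasks;
        AppMaster(ResourceManager rm, String app, int tasks) {
            this.rm = rm; this.app = app; this.tasks = tasks;
        }

        // 4) Start one task in each granted container.
        void run() {
            List<NodeManager> granted = rm.allocate(tasks);
            for (int i = 0; i < granted.size(); i++) {
                granted.get(i).launch("task " + i + " of " + app);
            }
        }
    }

    // Runs the whole flow on two nodes and returns a copy of the event trace.
    static List<String> demo() {
        trace.clear();
        ResourceManager rm = new ResourceManager(
            Arrays.asList(new NodeManager("nm1"), new NodeManager("nm2")));
        rm.submit("load-test", 3).run();
        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Even this toy version needs three cooperating roles before a single task runs, which gives a feel for why a real YARN application carries so much boilerplate.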

[Diagram: a YARN Client, the YARN ResourceManager, and four Node Managers hosting the AM and six Tasks, with arrows numbered 1) to 4) matching the protocols listed above]

Page 10: Continuuity Weave

YARN IS NOT SIMPLE!

Page 11: Continuuity Weave

YARN PAIN POINTS

• YARN is not for novice Big Data programmers

• High ramp-up time

• Lots of boilerplate code required to build simple applications

• 80% of applications are generally simple

• YARN is built around applications that terminate

‣ Logs are available only on completion of application

• No standard support for

‣ Application lifecycle management

‣ Communication between Container(s) & Application Master

‣ Handling application-level errors

• Hard to debug


Page 12: Continuuity Weave

TYPICAL YARN APP

• Many distributed applications are simple.

• For example: Distributed load test for a queuing system:

• “Run n producers and m consumers for k seconds”

• Neither producer nor consumer knows that it runs in YARN

• This would be easy if we ran them as threads in a single JVM

• Can we make YARN as simple as Java Threads?
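For comparison, here is roughly what "run n producers and m consumers for k seconds" looks like as plain threads in one JVM. This is a minimal sketch, not code from the talk: the class name ThreadedLoadTest, the in-memory queue standing in for the queuing system under test, and the queue capacity are all assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-JVM load test; an ArrayBlockingQueue stands in for
// the real queuing system.
final class ThreadedLoadTest {

    /** Runs the test and returns {produced, consumed} message counts. */
    static long[] run(int producers, int consumers, long millis) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong produced = new AtomicLong();
        AtomicLong consumed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(producers + consumers);

        for (int i = 0; i < producers; i++) {
            pool.execute(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        // Timed offer, so a full queue never blocks forever.
                        if (queue.offer(1, 10, TimeUnit.MILLISECONDS)) {
                            produced.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    // Interrupted by shutdownNow(): exit the worker.
                }
            });
        }
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        // Timed poll, so an empty queue never blocks forever.
                        if (queue.poll(10, TimeUnit.MILLISECONDS) != null) {
                            consumed.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    // Interrupted by shutdownNow(): exit the worker.
                }
            });
        }

        try {
            Thread.sleep(millis);          // let the test run for the requested time
            pool.shutdownNow();            // interrupt every worker
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new long[] {produced.get(), consumed.get()};
    }

    public static void main(String[] args) {
        long[] counts = run(2, 3, 1000);
        System.out.println("produced=" + counts[0] + " consumed=" + counts[1]);
    }
}
```

Neither worker knows anything about where it runs, which is the property the slide asks a YARN programming model to preserve.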


Page 13: Continuuity Weave

TYPICAL NEEDS ...

... of a distributed application:

• Monitor and restart containers

• Collect logs and metrics

• Elastic scale (dynamic adding/removing of instances)

• Long-running jobs
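The first of these needs, monitoring and restarting, can be illustrated inside a single JVM. The sketch below is hypothetical (the RestartingSupervisor name and the fixed restart budget are assumptions) and has nothing to do with how YARN or Weave restart containers; it only shows the shape of the policy.

```java
// Hypothetical in-JVM supervisor: reruns a task until it succeeds or the
// restart budget is used up. Illustrative only.
final class RestartingSupervisor {

    /** Returns the number of attempts it took for the task to succeed. */
    static int runWithRestarts(Runnable task, int maxRestarts) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                task.run();
                return attempts;
            } catch (RuntimeException e) {
                if (attempts > maxRestarts) {
                    throw e; // budget exhausted: surface the last failure
                }
                System.err.println("attempt " + attempts + " failed, restarting: " + e.getMessage());
            }
        }
    }
}
```

A task that fails twice and then succeeds returns 3 from `runWithRestarts(task, 5)`; with a budget of 1 the second failure is rethrown instead.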


Page 14: Continuuity Weave

WEAVE!


Page 15: Continuuity Weave

MOTIVATION

• Not designed to be a replacement for YARN

‣ Designed to simplify building, debugging & running of YARN applications

‣ Running in YARN should be as simple as running Threads in Java

- A simpler programming model for YARN

• Hide low-level details of YARN

‣ Simplified API for specifying, running & managing an application

‣ Simplified way to specify and manage stage(s) of the application lifecycle

‣ Generic Application Master to better support simple applications

‣ Simpler archive management

‣ Better control over application logs and errors


Page 16: Continuuity Weave

SIMPLICITY

Distributed Shell - Vanilla YARN (~ 500 lines, excerpt)

public class ApplicationMaster {
  // ...
  private AMRMClientAsync resourceManager;
  // ...
  public boolean run() throws YarnRemoteException {
    LOG.info("Starting ApplicationMaster");
    // ...
  }
  // ...
  public void finish() {
    for (Thread launchThread : launchThreads) {
      // ...
    }
    FinalApplicationStatus appStatus;
    String appMessage = null;
    // ...
  }
  // ...
  public ContainerRequest setupContainerAskForRM(int numContainers) {
    // ...
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(containerMemory);
    // ...
    ContainerRequest request = new ContainerRequest(capability, ..);
    // ...
  }
}
// ...
public class Client extends YarnClientImpl {
  // ...
  public boolean run() throws IOException {
    // ...
    GetNewApplicationResponse newApp = super.getNewApplication();
    ApplicationId appId = newApp.getApplicationId();
    // ...
    return monitorApplication(appId);
  }
  // ...
  private boolean monitorApplication(ApplicationId appId) throws ... {
    // ...
    while (true) {
      // ...
      ApplicationReport report = super.getApplicationReport(appId);
      // ...
    }
  }
  // ...
  private void forceKillApplication(ApplicationId appId) throws ... {
    // ...
    super.killApplication(appId);
    // ...
  }
}

Distributed Shell - WEAVE (~ 50 lines)

public class DistributedShell extends AbstractWeaveRunnable {
  static Logger LOG = LoggerFactory.getLogger(DistributedShell.class);
  private String command;

  public DistributedShell(String command) {
    this.command = command;
  }

  @Override
  public WeaveRunnableSpecification configure() {
    return WeaveRunnableSpecification.builder().with()
      .setName("Executor " + command)
      .withArguments(ImmutableMap.of("cmd", this.command))
      .build();
  }

  @Override
  public void initialize(WeaveContext context) {
    this.command = context.getSpecification().getArguments("cmd");
  }

  @Override
  public void run() {
    try {
      // Run the command and forward each line of its output to the logger.
      Process process = Runtime.getRuntime().exec(this.command);
      InputStream stdin = process.getInputStream();
      BufferedReader br = new BufferedReader(new InputStreamReader(stdin));

      String line;
      while ((line = br.readLine()) != null) {
        LOG.info(line);
      }
    } catch (Throwable t) {
      LOG.error(t.getMessage());
    }
  }

  public static void main(String[] args) {
    String zkStr = args[0];
    String command = args[1];
    yarnConfig = ...
    WeaveRunnerService weaveService = new YarnWeaveRunnerService(yarnConfig, zkStr);
    weaveService.startAndWait();
    WeaveController controller = weaveService.prepare(new DistributedShell(command))
      .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out)))
      .start();
  }
}


Page 17: Continuuity Weave

ARCHITECTURE

[Diagram. Components shown: a Client Host JVM running WeaveRunnerService, which submits an Application made up of WeaveRunnables; a ZooKeeper ensemble; three Cluster Hosts, each with a YARN Node Manager and a YARN container holding a WeaveContainer that runs a WeaveRunnable; and a YARN container holding the WeaveApplicationMaster and a KafkaServer. Arrow labels: "Submit", "Runtime states and messages", "Log entries".]

Page 18: Continuuity Weave

RUNNABLE

// Defines a Task
public class EchoServer implements WeaveRunnable {
  private static Logger LOG = ...
  private WeaveContext context;
  //....

  @Override
  public void run() {
    ServerSocket server = new ServerSocket(0);
    context.announce("echo", server.getLocalPort());
    while (isRunning()) {
      Socket socket = server.accept();
      //...
    }
  }
}

‣ A class that implements WeaveRunnable is a single task.

‣ An SLF4J logger can be used for logging

  ‣ Logs are collected and sent back to the client using Kafka

  ‣ The Kafka server is started inside the Application Master.

‣ A WeaveRunnable can be run in

  ‣ a Thread

  ‣ a YARN Container

‣ Service discovery integration

‣ Easy! - Piece of cake.
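To make the "can be run in a Thread" point concrete, here is a self-contained echo server written against plain java.lang.Runnable instead of WeaveRunnable. It is a sketch under assumptions: ThreadEchoServer, port() and roundTrip() are hypothetical names, and port() merely stands in for the slide's context.announce("echo", port) call, since no Weave service discovery is involved here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Plain-Java stand-in for the slide's EchoServer (Runnable, not WeaveRunnable).
final class ThreadEchoServer implements Runnable, AutoCloseable {
    private final ServerSocket server;

    ThreadEchoServer() throws IOException {
        // Port 0 asks the OS for any free port, as in the slide.
        this.server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    /** The port clients should connect to (the value the slide announces). */
    int port() {
        return server.getLocalPort();
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try (Socket socket = server.accept();
                 BufferedReader in = reader(socket);
                 PrintWriter out = writer(socket)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println(line); // echo each line back to the sender
                }
            } catch (IOException e) {
                // accept() fails once close() is called; the loop condition then ends the thread.
            }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true, so println() pushes each line out immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Starts the server in a thread, sends one line, and returns the echoed reply. */
    static String roundTrip(String message) {
        try (ThreadEchoServer echo = new ThreadEchoServer()) {
            Thread thread = new Thread(echo, "echo-server");
            thread.setDaemon(true);
            thread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), echo.port());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ThreadEchoServer.roundTrip("hello")` starts the server on an ephemeral loopback port, sends one line, and returns the reply. The run() body is the part that would carry over unchanged to a container.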


Page 19: Continuuity Weave

RUNNER

// Starts a task
WeaveRunner runner = ...;
WeaveController controller = runner.prepare(new EchoServer())
    .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out)))
    .start();

//...

controller.sendCommand(Commands.create("port", 44444));

//...

controller.stop();

‣ WeaveRunner

  ‣ Can run a Runnable or an Application

  ‣ Packages the class and its dependencies along with any additional resources specified

  ‣ Attaches a log collector that collects all the logs from the containers

‣ WeaveController is used to manage the lifecycle of a runnable or application

‣ The controller also provides the ability to send commands to running containers


Page 20: Continuuity Weave

APPLICATION

// Defines an Application
public class WebServer implements WeaveApplication {
  @Override
  public WeaveSpecification configure() {
    return WeaveSpecification.Builder.with()
      .setName("Jetty WebServer")
      .withRunnables()
        .add(new JettyWebServer())
          .withLocalFiles()
            .add("www-html", new File("/var/html"))
          .apply()
        .add(new LogsCollector())
      .anyOrder()
      .build();
  }
}

WeaveController controller = runner.prepare(new WebServer()).start();

‣ WeaveApplication defines a collection of runnables and their behavior

‣ Application is specified by a WeaveSpecification

‣ An application can specify additional local directories to be made available to its containers.

‣ Weave starts the containers in no particular order.

‣ Simply start the application with WeaveRunner.


Page 21: Continuuity Weave

APPLICATION

// Defines an Application with a specific task ordering
public class DataApplication implements WeaveApplication {
  @Override
  public WeaveSpecification configure() {
    return WeaveSpecification.Builder.with()
      .setName("Cool Data Application")
      .withRunnables()
        .add("reader", new InputReader())
        .add("processor", new Processor())
        .add("writer", new OutputWriter())
      .order()
        .first("reader")
        .next("processor")
        .next("writer")
      .build();
  }
}


Page 22: Continuuity Weave

RESOURCE MANAGEMENT

// Starts a task with resource specification.
WeaveRunner runner = ...;
ResourceSpecification resource = ResourceSpecification.Builder.with()
    .setCores(2)
    .setMemory(1, SizeUnit.GIGA)
    .build();
WeaveController controller = runner.prepare(new EchoServer(), resource).start();

// Query for resource usage.
ResourceReport report = controller.getResourceReport();
System.out.println(report.getRunnableResources("EchoServer").getCores());


Page 23: Continuuity Weave

ROADMAP

• Support for both hadoop-2.0 and hadoop-2.1 (in branch already, merging soon)

• Resource management through cgroups (in branch already).

• API support for managing lifecycle hooks of an application and its containers

• Advanced error handling capabilities

‣ E.g. Fail application if 50% of my containers fail

‣ E.g. Fail application if it takes more than ‘x’ seconds to allocate containers for the application

• Extendable Application Master

• Support for non-JVM languages

• Support for command dispatching

• Contribs

‣ Running HBase using Weave

‣ Running Jetty HTTP Server using Weave

• Support for custom application-level metrics

• Better UI

• Ability to remote debug a running container

• Improve APIs, Improve APIs, Improve APIs!


Page 24: Continuuity Weave

How to start?

• Check it out on GitHub: http://github.com/continuuity/weave

• Read the docs at http://weave.continuuity.com


Page 25: Continuuity Weave

We are hiring!

Excellent healthcare benefits

✓ casual vacation policy

✓ brand new workstations

✓ free lunches

✓ convenient location

✓ hard-working colleagues

If you're interested in changing the world of Big Data, please send an email and resume to [email protected].

• Software Engineers (Senior and Junior)

• VP Services and Support

• Senior Dev Ops Engineer

• Tools Engineer

• Solutions Engineers

• Pre-sales Engineer

• Junior Designer

• Inside Sales

• Marketing Leader


Page 26: Continuuity Weave

THANK YOU

http://github.com/continuuity/weave