Continuuity Weave
WEAVE - SENSIBLE YARN
Terence Yim
Monday, September 16, 13
BIG FLOW
What is BigFlow?
• A core Processor; a real-time stream processing engine (similar to Storm and S4)
• BigFlow provides
‣ Tight integration with your existing Hadoop/HBase cluster
‣ Exactly-once execution rather than at-least-once execution (your apps do not have to be idempotent)
‣ Transactions for consistency and HBase-backed queues for durability
‣ Direct integration with a data storage engine (Continuuity Data Fabric) and Datasets
‣ Beautiful user interface for application management
‣ YARN-based deployment and runtime elastic scalability of Flows
I HAVE HADOOP - WHAT NEXT?
• What if:
‣ Map/Reduce is not the only thing I want to run?
‣ I would like to do real-time analysis?
‣ Or run a message-passing algorithm over my data?
‣ My cluster has more compute capacity than my big data needs?
• Can I utilize its idle resources for other, compute-intensive tasks?
A MESSAGE PASSING (MPI) APP
[Diagram: six nodes labeled "data"]
A STREAMING APP
[Diagram labels: Database, event]
A DISTRIBUTED LOAD TEST
[Diagram: many "test" tasks and a WebService]
A MULTI-PURPOSE CLUSTER
[Diagram: one cluster running mixed workloads; labels: log, split, par, data, test, Database, WebService, events]
WHAT IS YARN?
• The Resource Manager of Hadoop 2.0
• Separates resource management from the programming paradigm
• Allows (almost) any kind of distributed application to run in a Hadoop cluster
• Multi-tenant, secure
• Scales to many thousands of nodes
• Pluggable scheduling algorithms
YARN - HOW IT WORKS
Protocols
1) Client-RM: Submit the app master
2) RM-NM: Start the app master
3) AM-RM: Request + release containers
4) AM-NM: Start tasks in containers
[Diagram: the YARN Client, the YARN ResourceManager, and four Node Managers hosting the AM and six Tasks, with arrows numbered 1) to 4) for the protocols above]
YARN IS NOT SIMPLE!
YARN PAIN POINTS
• YARN is not for novice Big Data programmers
• High ramp-up time
• Lots of boilerplate code required to build simple applications
• 80% of applications are generally simple
• YARN is built around applications that terminate
‣ Logs are available only on completion of the application
• No standard support for
‣ Application lifecycle management
‣ Communication between Container(s) & Application Master
‣ Handling application-level errors
• Hard to debug
TYPICAL YARN APP
• Many distributed applications are simple.
• For example: Distributed load test for a queuing system:
‣ “Run n producers and m consumers for k seconds”
• Neither producers nor consumers need to know that they run in YARN
• This would be easy if we ran them as threads in a single JVM
• Can we make YARN as simple as Java Threads?
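To make the "threads in a single JVM" baseline concrete, here is a minimal sketch of the load test described above, using only the JDK. The class name `ThreadedLoadTest`, the in-memory `LinkedBlockingQueue` standing in for the queuing system, and the method signature are assumptions made for illustration; none of them come from Weave or YARN.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-JVM load test: n producers, m consumers, fixed duration.
final class ThreadedLoadTest {

  /** Returns {produced, consumed} message counts after the run. */
  static long[] run(int producers, int consumers, long durationMillis)
      throws InterruptedException {
    // In-memory stand-in for the queuing system under test (an assumption).
    BlockingQueue<Long> queue = new LinkedBlockingQueue<>(1024);
    AtomicLong produced = new AtomicLong();
    AtomicLong consumed = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(producers + consumers);

    for (int i = 0; i < producers; i++) {
      pool.execute(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            // Timed offer, so a full queue never blocks a producer forever.
            if (queue.offer(System.nanoTime(), 10, TimeUnit.MILLISECONDS)) {
              produced.incrementAndGet();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    for (int i = 0; i < consumers; i++) {
      pool.execute(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            // Timed poll, so an empty queue never blocks a consumer forever.
            if (queue.poll(10, TimeUnit.MILLISECONDS) != null) {
              consumed.incrementAndGet();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    Thread.sleep(durationMillis);               // let the test run for the requested time
    pool.shutdownNow();                         // interrupt every worker
    pool.awaitTermination(5, TimeUnit.SECONDS); // wait for workers to exit
    return new long[] {produced.get(), consumed.get()};
  }

  public static void main(String[] args) throws InterruptedException {
    long[] counts = run(3, 2, 1000);
    System.out.println("produced=" + counts[0] + " consumed=" + counts[1]);
  }
}
```

Neither the producer nor the consumer loop knows where it runs, which is the property the slide asks to preserve when the same work is moved into YARN containers.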
TYPICAL NEEDS ...
... of a distributed application:
• Monitor and restart containers
• Collect logs and metrics
• Elastic scale (dynamic adding/removing of instances)
• Long-running jobs
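As a rough picture of the first need, "monitor and restart", here is a single-JVM sketch in which a supervisor reruns a task after each failure. The class `RestartSketch`, its method, and the attempt limit are hypothetical names chosen for this illustration; a real YARN application would have to do the equivalent per container in its Application Master.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical supervisor: restart a failing task up to a fixed number of attempts.
final class RestartSketch {

  /**
   * Runs the task, restarting it after each failure, up to maxAttempts tries.
   * Returns the number of attempts used; throws if every attempt failed.
   */
  static int runWithRestarts(String name, Runnable task, int maxAttempts) {
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        task.run();
        return attempt; // task finished normally
      } catch (RuntimeException e) {
        last = e;       // remember the failure and try again
        System.err.println(name + " failed on attempt " + attempt + ": " + e.getMessage());
      }
    }
    throw new IllegalStateException(name + " failed " + maxAttempts + " times", last);
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // A flaky task that crashes twice before succeeding.
    Runnable flaky = () -> {
      if (calls.incrementAndGet() < 3) {
        throw new IllegalStateException("simulated crash");
      }
    };
    // prints "attempts used: 3"
    System.out.println("attempts used: " + runWithRestarts("worker", flaky, 5));
  }
}
```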
WEAVE!
MOTIVATION
• Not designed to be a replacement for YARN
‣ Designed to simplify building, debugging & running of YARN Applications
‣ Running in YARN should be as simple as running Threads in Java
‣ A simpler programming model for YARN
• Hide low-level details of YARN
‣ Simplified API for specifying, running & managing an application
‣ Simplified way to specify and manage stage(s) of the application lifecycle
‣ Generic Application Master to better support simple applications
‣ Simpler archive management
‣ Better control over application logs and errors
SIMPLICITY
Distributed Shell - Vanilla YARN (~500 lines, excerpt)

public class ApplicationMaster {
  // ...
  private AMRMClientAsync resourceManager;
  // ...
  public boolean run() throws YarnRemoteException {
    LOG.info("Starting ApplicationMaster");
    // ...
  }
  // ...
  public void finish() {
    for (Thread launchThread : launchThreads) {
      // ...
    }
    FinalApplicationStatus appStatus;
    String appMessage = null;
    // ...
  }
  // ...
  public ContainerRequest setupContainerAskForRM(int numContainers) {
    // ...
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(containerMemory);
    // ...
    ContainerRequest request = new ContainerRequest(capability, ..);
    // ...
  }
}
// ...
public class Client extends YarnClientImpl {
  // ...
  public boolean run() throws IOException {
    // ...
    GetNewApplicationResponse newApp = super.getNewApplication();
    ApplicationId appId = newApp.getApplicationId();
    // ...
    return monitorApplication(appId);
  }
  // ...
  private boolean monitorApplication(ApplicationId appId) throws ... {
    // ...
    while (true) {
      // ...
      ApplicationReport report = super.getApplicationReport(appId);
      // ...
    }
  }
  // ...
  private void forceKillApplication(ApplicationId appId) throws ... {
    // ...
    super.killApplication(appId);
    // ...
  }
}

Distributed Shell - WEAVE (~50 lines)

public class DistributedShell extends AbstractWeaveRunnable {
  static Logger LOG = LoggerFactory.getLogger(DistributedShell.class);
  private String command;

  public DistributedShell(String command) {
    this.command = command;
  }

  @Override
  public WeaveRunnableSpecification configure() {
    // Pass the shell command to the container as a runnable argument.
    return WeaveRunnableSpecification.Builder.with()
      .setName("Executor " + command)
      .withArguments(ImmutableMap.of("cmd", this.command))
      .build();
  }

  @Override
  public void initialize(WeaveContext context) {
    // Inside the container, read the command back from the specification.
    this.command = context.getSpecification().getArguments().get("cmd");
  }

  @Override
  public void run() {
    try {
      Process process = Runtime.getRuntime().exec(this.command);
      InputStream stdin = process.getInputStream();
      BufferedReader br = new BufferedReader(new InputStreamReader(stdin));

      // Forward the command's output to the logger, line by line.
      String line;
      while ((line = br.readLine()) != null) {
        LOG.info(line);
      }
    } catch (Throwable t) {
      LOG.error(t.getMessage());
    }
  }

  // Client side: submit the runnable and print its logs locally.
  public static void main(String[] args) {
    String zkStr = args[0];
    String command = args[1];
    yarnConfig = ...
    WeaveRunnerService weaveService = new YarnWeaveRunnerService(yarnConfig, zkStr);
    weaveService.startAndWait();
    WeaveController controller = weaveService.prepare(new DistributedShell(command))
      .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out)))
      .start();
  }
}
ARCHITECTURE
[Diagram components: a Client Host JVM with WeaveRunnerService and an Application of three WeaveRunnables (Submit); three ZooKeeper nodes (Runtime states and messages); a YARN container with WeaveApplicationMaster and KafkaServer (Log entries); three Cluster Hosts, each with a YARN Node Manager and a YARN container holding a WeaveContainer and its WeaveRunnable]
RUNNABLE
// Defines a Task
public class EchoServer implements WeaveRunnable {
  private static Logger LOG = ...
  private WeaveContext context;
  //....
  @Override
  public void run() {
    // Port 0 lets the OS pick a free port, which is then announced for discovery.
    try (ServerSocket server = new ServerSocket(0)) {
      context.announce("echo", server.getLocalPort());
      while (isRunning()) {  // isRunning() is presumably defined in the elided code above
        Socket socket = server.accept();
        //...
      }
    } catch (IOException e) {
      LOG.error("Echo server failed", e);
    }
  }
}
‣ A class that implements WeaveRunnable is a single task.
‣ An SLF4J logger can be used for logging
‣ Logs are collected and sent back to the client using Kafka
‣ The Kafka server is started inside the Application Master.
‣ A WeaveRunnable can be run in a Thread or in a YARN Container
‣ Service discovery integration
‣ Easy! - Piece of cake.
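The announce step in the EchoServer can be pictured without any cluster. The sketch below binds a server socket to port 0 so the operating system picks a free port, then records that port under a service name. The `AnnounceSketch` class and its in-memory `REGISTRY` map are hypothetical stand-ins invented for this example; they are not the Weave discovery API, and how Weave stores announcements is not shown in the slides.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of "bind to an ephemeral port, then announce it".
final class AnnounceSketch {

  // In-memory stand-in for a discovery registry (an assumption, not Weave's API).
  static final Map<String, Integer> REGISTRY = new ConcurrentHashMap<>();

  /** Records the port under the given service name so clients can look it up. */
  static void announce(String serviceName, int port) {
    REGISTRY.put(serviceName, port);
  }

  /** Binds to a free port, announces it, and returns the chosen port. */
  static int startAndAnnounce(String serviceName) throws IOException {
    // Port 0 asks the OS for any free port; the actual port is read back below.
    try (ServerSocket server = new ServerSocket(0)) {
      int port = server.getLocalPort();
      announce(serviceName, port);
      // A real server would now loop on server.accept(); this sketch just returns.
      return port;
    }
  }

  public static void main(String[] args) throws IOException {
    int port = startAndAnnounce("echo");
    System.out.println("echo announced on port " + port
        + ", registry says " + REGISTRY.get("echo"));
  }
}
```

Binding to port 0 matters in a shared cluster: several instances of the same runnable may land on one host, so fixed ports would collide, and clients need a lookup by name instead.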
RUNNER
// Starts a task
WeaveRunner runner = ...;
WeaveController controller = runner.prepare(new EchoServer())
    .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out)))
    .start();
//...
controller.sendCommand(Commands.create("port", 44444));
//...
controller.stop();
• WeaveRunner
‣ Can run a Runnable or an Application
‣ Packages the class and its dependencies along with any additional resources specified
‣ Attaches a log collector for collecting all the logs from containers
• WeaveController is used to manage the lifecycle of a runnable or application
• The controller also provides the ability to send commands to running containers
APPLICATION
// Defines an Application
public class WebServer implements WeaveApplication {
  @Override
  public WeaveSpecification configure() {
    return WeaveSpecification.Builder.with()
      .setName("Jetty WebServer")
      .withRunnables()
        .add(new JettyWebServer())
          .withLocalFiles()
            .add("www-html", new File("/var/html"))
          .apply()
        .add(new LogsCollector())
      .anyOrder()
      .build();
  }
}

WeaveController controller = runner.prepare(new WebServer()).start();
‣ WeaveApplication defines a collection of runnables and their behavior
‣ An application is specified by a WeaveSpecification
‣ An application can specify additional local directories to be made available to its containers.
‣ With anyOrder(), Weave starts containers in no particular order.
‣ Simply start the application with WeaveRunner.
APPLICATION
// Defines Application with specific tasks ordering
public class DataApplication implements WeaveApplication {
  @Override
  public WeaveSpecification configure() {
    return WeaveSpecification.Builder.with()
      .setName("Cool Data Application")
      .withRunnables()
        .add("reader", new InputReader())
        .add("processor", new Processor())
        .add("writer", new OutputWriter())
      .order()
        .first("reader")
        .next("processor")
        .next("writer")
      .build();
  }
}
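One way to picture the ordering declared above is with plain Java threads: launch each stage only after the previous one has reported that it started. This sketch assumes that is what the ordering means (start order, not completion order), which the slides do not spell out. `OrderedStartSketch` and its method are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical thread-based analogy for ordered startup of runnables.
final class OrderedStartSketch {

  /**
   * Starts one thread per stage in the map's iteration order, waiting for each
   * stage to report "started" before launching the next. Returns the start order.
   */
  static List<String> startInOrder(Map<String, Runnable> stages)
      throws InterruptedException {
    List<String> startOrder = Collections.synchronizedList(new ArrayList<>());
    List<Thread> threads = new ArrayList<>();
    for (Map.Entry<String, Runnable> stage : stages.entrySet()) {
      CountDownLatch started = new CountDownLatch(1);
      Thread thread = new Thread(() -> {
        startOrder.add(stage.getKey());
        started.countDown();     // signal: this stage is up
        stage.getValue().run();  // the stage body then runs concurrently
      }, stage.getKey());
      thread.start();
      threads.add(thread);
      started.await();           // do not launch the next stage until this one is up
    }
    for (Thread thread : threads) {
      thread.join();
    }
    return new ArrayList<>(startOrder);
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, Runnable> stages = new LinkedHashMap<>();
    stages.put("reader", () -> { });
    stages.put("processor", () -> { });
    stages.put("writer", () -> { });
    // prints "[reader, processor, writer]"
    System.out.println(startInOrder(stages));
  }
}
```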
RESOURCE MANAGEMENT
// Starts a task with resource specification.
WeaveRunner runner = ...;
ResourceSpecification resource = ResourceSpecification.Builder.with()
    .setCores(2)
    .setMemory(1, SizeUnit.GIGA)
    .build();
WeaveController controller = runner.prepare(new EchoServer(), resource).start();

// Query for resource usage.
ResourceReport report = controller.getResourceReport();
System.out.println(report.getRunnableResources("EchoServer").getCores());
ROADMAP
• Support for both hadoop-2.0 and hadoop-2.1 versions (in branch already, merging soon)
• Resource management through cgroups (in branch already)
• API support for managing lifecycle hooks of an application and its containers
• Advanced error handling capabilities
‣ E.g. Fail application if 50% of my containers fail
‣ E.g. Fail application if it takes more than ‘x’ seconds to allocate containers for the application
• Extendable Application Master
• Support for non-JVM languages
• Support for command dispatching
• Contribs
‣ Running HBase using Weave
‣ Running Jetty HTTP Server using Weave
• Support for custom application-level metrics
• Better UI
• Ability to remote debug a running container
• Improve APIs, Improve APIs, Improve APIs!
How to start?
• Check out the code on GitHub: http://github.com/continuuity/weave
• Read the docs at http://weave.continuuity.com
We are hiring!
✓ Excellent healthcare benefits
✓ casual vacation policy
✓ brand new workstations
✓ free lunches
✓ convenient location
✓ hard-working colleagues
If you're interested in changing the world of Big Data, please send an email and resume to [email protected].
• Software Engineers (Senior and Junior)
• VP Services and Support
• Senior Dev Ops Engineer
• Tools Engineer
• Solutions Engineers
• Pre-sales Engineer
• Junior Designer
• Inside Sales
• Marketing Leader
THANK YOU
http://github.com/continuuity/weave