Confluence

CONFLuEnCE: Implementation and Application Design
CONtinuous workFLow ExeCution Engine
Panayiotis (Panickos) Neophytou, Panos K. Chrysanthis, Alexandros Labrinidis
CollaborateCom 2011
Advanced Data Management Technologies Lab, Computer Science Department, University of Pittsburgh

Transcript of Confluence

1. CONtinuous workFLow ExeCution Engine
  - Panayiotis (Panickos) Neophytou, Panos K. Chrysanthis, Alexandros Labrinidis
  - CollaborateCom 2011
  - Advanced Data Management Technologies Lab, Computer Science Department, University of Pittsburgh

2. Workflows are GREAT!
  - Ability to automate processes
  - Integrate and orchestrate resources (including humans) seamlessly and effectively
  - Service composition
  - Process large static data sets
  - Keep track of things (provenance)
  - Re-usable
  - Easy to program (Visual Languages)

3. High data rates
  - Push model
  - New type of data sources (proactive, unsupported): stock price ticker, Twitter stream, DSMS tuple streams
  - Polling: blocking, misses updates
  - Data items participate in multiple interleaving WF invocations

4. Our approach
  - Goal: Enable monitoring and collaborative applications that involve processing and integration of continuous streams of data.
  - CONFLuEnCE: Continuous Workflow Execution Engine
  - Define the model. [CollaborateCom 2008]
  - Develop the new constructs: window semantics, event waves, support for backwards workflow compatibility, enable push
  - Develop the new model of computation: continuously running workflow activities, deadline-driven scheduling
  - Implement prototype. [Demo SIGMOD 2011]

5. Overview
  - Motivation
  - Continuous Workflow Model
    - Waves
    - Window Operator
    - Push communication
  - CONFLuEnCE
  - CWf Application Scenarios
  - Conclusions

6. Continuous Workflow Model
  - Includes all existing workflow constructs
  - Waves of events to distinguish between event contexts
  - Window operators on queues
  - Continuously running activities
  - Ability to support push communications

7. Wave of events
  - Distinguish events between multiple invocations of an activity
  - Waves expose provenance during design/execution
  - Allows synchronization of events of the same lineage
  - E.g., customer order: multiple items, multiple handlers

8. Window Operator
  - Apply flexible bounds on an unbounded stream of events
  - Size: Token, Time, Wave, Semantics
  - Step (period of recalculation): Token, Time, Wave, Semantics
  - Delete_used_events flag (applied after the activity has finished executing)
  - Triggers activities in combination with preconditions
  - Example: if 2 events occur within 5 min of each other, then notify the manager
    - Window definition: Size=5min, Step=1min, Delete_used_events=true
    - Activity precondition: if (window.length >= 2) fire activity
  - [Figure: timeline of out-of-stock events flowing through the window into a Notify Manager activity; labels: Fired, Expired]

9. Group-By
  - Scalar: (int, float, String, decimal, etc.)
  - Complex: (array, record, matrix)
  - E.g.: group by /entities/hashtag
  - Window spec: Size=2 tokens, Step=2 tokens, Delete_used_events=false
  - [Figure: tokens grouped by hashtag value; values shown: ghost, vampire, zombies]

10. Push Communication
  - Push communication patterns: Broadcast, Publish/Subscribe
  - Connection styles: In->Out, Out->In, Hybrid
  - [Figure: one diagram per connection style; labels: WF, input port, Producer, Mediator]

11. Overview
  - Motivation
  - Continuous Workflow Model
  - CONFLuEnCE
    - Kepler's Actor Oriented Modeling
    - Continuous Workflow Director
    - Windowed Operator
    - Push Communication
  - CWf Application Scenarios
  - Conclusions

12. CONFLuEnCE: CONtinuous workFLow ExeCution Engine
  - Implements our Continuous Workflow model, in Java, as a module in Kepler
  - Kepler's benefits:
    - Open-source scientific workflow system
    - Actor-based workflow modeling
    - Built on top of Ptolemy II (modeling, simulating, designing concurrent, real-time systems)
    - Well defined models of computation
    - Extendible, pluggable
    - Large number of basic and specialized actors (task components)
    - High-level visual language

13. Kepler's Actor Oriented Modeling: Ports
  - Each actor has a set of input and output ports
  - Ports produce/consume data (a.k.a. tokens)

14. Kepler's Actor Oriented Modeling: Dataflow Connections
  - Unidirectional actor communication channels
  - Connect output ports with input ports

15. Kepler's Actor Oriented Modeling: Sub-workflows / Composite Actors
  - Composite actors wrap sub-workflows
  - Hierarchical workflows (arbitrary nesting levels)

16. Kepler's Actor Oriented Modeling: Directors
  - A director defines the execution and communication semantics of workflow graphs
  - Executes the workflow graph (some schedule)
  - Sub-workflows may have different directors
  - Promotes reusability
  - [Figure labels: PN Director, SDF Director]

17. Kepler Directors: Models of Computation
  - Directors separate the concerns of orchestration and scheduling from conceptual design
  - Synchronous Dataflow (SDF)
  - Process Networks (PN)
  - Dynamic Data Flow (DDF)
  - Continuous Time (CT)
  - Discrete Event (DE)

18. Continuous Workflow Director
  - CWfs require continuous execution of the actors
  - Stream data are events in time, so they require timestamps
  - CWf director:
    - Extends the PN director
    - Adds timestamps on events using a TimeKeeper on each actor
    - Adds Window Operators on buffer queues (receivers)

19. Windowed Receiver
  - Kepler extension to support window semantics
  - [Figure labels: CWF Director, I/O Ports, Producer, Consumer, windowed receiver]

20. Push Communication
  - Implemented JSON WebSocket Server Actor (Out->In)
    - Listens on a predefined port
    - Converts JSON objects to RecordToken(s)
    - Enables continuous connectivity with web browsers
  - Implemented HTTP Socket Stream Source Actor (In->Out)
    - Connects directly to an HTTP stream source (e.g., Twitter) and receives data continuously
  - Implemented the hybrid approach using PubSubHubbub [http://code.google.com/apis/pubsubhubbub/]

21. Overview
  - Motivation
  - Continuous Workflow Model
  - CONFLuEnCE
  - CWf Application Scenarios
    - Supply Chain Management
    - Astroshelf's collaboration backend
  - Conclusions

22. Supply Chain Management
  - Real-time monitoring of a supply chain
  - 4 user roles: Customer, Warehouse Mgr, Company Mgr, Admin

23. Supply Chain Management
  - [Figure: supply chain workflow; visible label: CWF Director]

24. Astroshelf
  - A collaboration platform for astrophysicists
  - Annotate sky objects and events
  - CONFLuEnCE: live annotations and integration
  - Astroshelf team: Liz Marai, Timothy Luciani, Rebecca Hachey, Roxana Gheorghiu, Boyu Sun
  - Astronomers: Arthur Kosowsky, Jeffrey Newman, Michael Wood-Vasley, Brian Cherinca, Anja Weyant

25. Astroshelf

26. Conclusions
  - The Continuous Workflow model: foundation for CONFLuEnCE
  - CONFLuEnCE: CONtinuous workFLow ExeCution Engine
    - Built on top of Kepler
    - Includes a new director, a windowed receiver, and source actors enabling push communication
  - Two monitoring and collaborative application implementations
  - Future: design a director that implements scheduling sensitive to QoS requirements

27. Acknowledgments
  - Supported by NSF grants IIS-0534531 and OIA-1028162
  - http://db.cs.pitt.edu/group/projects/confluence
  - http://db.cs.pitt.edu/group/projects/astroshelf
  - Special thanks to:
    - Astroshelf team: Liz Marai, Timothy Luciani, Rebecca Hachey, Roxana Gheorghiu
    - Astronomy collaborators: Arthur Kosowsky, Jeffrey Newman, Michael Wood-Vasley, Brian Cherinca, Anja Weyant

28. Conclusions
  - The Continuous Workflow model: foundation for CONFLuEnCE
  - CONFLuEnCE: CONtinuous workFLow ExeCution Engine
    - Built on top of Kepler
    - Includes a new director, a windowed receiver, and source actors enabling push communication
  - Two monitoring and collaborative application implementations
  - Future: design a director that implements scheduling sensitive to QoS requirements
  - http://db.cs.pitt.edu/group/projects/confluence
  - http://db.cs.pitt.edu/group/projects/astroshelf

29. Workflows vs. DSMS vs. CWfs
  - [Figure: comparison of WFs, DSMS, and CWfs; labels: static configuration, flexibility, QoS/QoD driven, general purpose, declarative & procedural, stream processing, human integration, declarative, feedback loops]
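
The window operator on slide 8 (Size=5min, Step=1min, Delete_used_events=true, fire when window.length >= 2) can be illustrated with a minimal Python sketch. This is not CONFLuEnCE code: the engine itself is written in Java on top of Kepler, and the names `TimeWindow`, `run`, and the `(timestamp, payload)` event shape are hypothetical, chosen only for illustration. The sketch assumes a purely time-based window that is re-evaluated once per step; the token, wave, and semantic window types mentioned on the slide are not covered.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, payload); hypothetical event shape


@dataclass
class TimeWindow:
    """Hypothetical time-based window over an event queue (not the CONFLuEnCE API)."""
    size: int                        # window length in seconds (Size)
    step: int                        # re-evaluation period in seconds (Step)
    delete_used_events: bool = True  # drop events once an activity has consumed them
    queue: List[Event] = field(default_factory=list)

    def push(self, event: Event) -> None:
        self.queue.append(event)

    def evaluate(self, now: int,
                 precondition: Callable[[List[Event]], bool],
                 activity: Callable[[int, List[Event]], None]) -> bool:
        # Expire events that have fallen out of the last `size` seconds.
        self.queue = [e for e in self.queue if e[0] > now - self.size]
        window = list(self.queue)
        if not precondition(window):
            return False
        activity(now, window)
        if self.delete_used_events:
            self.queue.clear()
        return True


def run(window: TimeWindow, events: List[Event], end_time: int,
        precondition: Callable[[List[Event]], bool],
        activity: Callable[[int, List[Event]], None]) -> None:
    """Feed timestamped events into the window, evaluating it once per step."""
    pending = sorted(events)
    i = 0
    now = window.step
    while now <= end_time:
        # Deliver every event that has arrived by the current evaluation time.
        while i < len(pending) and pending[i][0] <= now:
            window.push(pending[i])
            i += 1
        window.evaluate(now, precondition, activity)
        now += window.step


# Slide 8 settings: Size=5min, Step=1min, fire when at least 2 events are in the window.
fired = []
run(
    TimeWindow(size=300, step=60),
    [(30, "A"), (200, "B"), (700, "C")],
    end_time=900,
    precondition=lambda win: len(win) >= 2,
    activity=lambda now, win: fired.append((now, win)),
)
print(fired)  # [(240, [(30, 'A'), (200, 'B')])]
```

In this run, events A and B arrive within five minutes of each other, so the activity fires at the first evaluation that sees both (t=240 s). Event C arrives alone and expires without firing. With `delete_used_events=False`, the same pair would fire again at each later step until one of the events expires.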