Streaming Data, Continuous Queries, and Adaptive Dataflow Michael Franklin UC Berkeley NRC June...

Streaming Data, Continuous Queries, and Adaptive Dataflow

Michael Franklin, UC Berkeley

NRC June 2002

Data Stream Processing

• Networked data streams central to current and future computing.
• Existing data management and query processing infrastructure is lacking:
  – Adaptability
  – Continuous and Incremental Processing
  – Work Sharing for large scale
  – Resource scalability: from “smart dust” up to clusters to grids.
• XML provides additional opportunities.

Example 1: “Transactional Flows”

• E-Commerce, clickstream, swipestream, logs…
• Network Monitoring
• B2B and Enterprise apps
  – Supply-Chain, CRM, ERP
• (Quasi) real-time flow of events and data
• Must manage these flows to drive business processes.
• Mine flows to create and adjust business rules.
• Can also “tap into” flows for on-line analysis.

Example 2: Information Dissemination

[Diagram labels: User Profiles, Filtered Data, Data Sources]

• Doc creation or crawler initiates flow of data towards users.
• Profiles are aggregated back towards data.

Example 3: Sensor Nets

• Tiny (or not so tiny) devices measure the physical world.
  – Berkeley “motes”, Smart Dust, Smart Tags, …
• Many monitoring applications
  – Transportation, Seismic, Energy, Military…
• Form dynamic ad hoc networks.
• Aggregate and communicate streams of values.
• Not one-way: can actuate to affect or actively monitor the environment.

Common Features

• Centrality of Dataflow and Data Routing
  – Architecture is focused on data movement
  – Moving streams of data through code in a network
• Volatility of the environment
  – Dynamic resources & topology, partial failures
  – Long-running (never-ending?) tasks
  – Potential for user interaction during the flow
  – Large Scale: users, data, resources, …
• Resource Constraints
  – Bandwidth, memory, processing, battery, …
  – Time and human attention

In The Beginning

[Diagram; surviving label: Result]

Pub Sub/CQ/Filtering

[Diagram; surviving labels: Queries, Result]

• Effectively processes all queries simultaneously.
• Shares work for common sub-expressions.
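The two bullets above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each standing query is a conjunction of equality predicates; the class name `SharedFilter` and the field names in the demo are hypothetical and do not come from the talk. Each distinct predicate is indexed once, so a predicate shared by several queries is tested only once per arriving item.

```python
from collections import defaultdict


class SharedFilter:
    """Toy filter: every query is a set of (field, value) equality predicates."""

    def __init__(self):
        self.queries = {}                     # query id -> frozenset of predicates
        self.by_predicate = defaultdict(set)  # predicate -> ids of queries using it

    def register(self, qid, predicates):
        preds = frozenset(predicates)
        self.queries[qid] = preds
        for pred in preds:
            self.by_predicate[pred].add(qid)

    def process(self, item):
        """Return (ids of satisfied queries, number of predicate tests performed)."""
        hits = defaultdict(int)
        evaluations = 0
        for (field, value), qids in self.by_predicate.items():
            evaluations += 1  # one test per distinct predicate, however many queries share it
            if item.get(field) == value:
                for qid in qids:
                    hits[qid] += 1
        # A query is satisfied when all of its predicates were hit.
        matched = sorted(q for q, preds in self.queries.items() if hits[q] == len(preds))
        return matched, evaluations


f = SharedFilter()
f.register("q1", [("symbol", "MSFT"), ("type", "trade")])
f.register("q2", [("symbol", "MSFT"), ("type", "quote")])
f.register("q3", [("type", "trade")])
print(f.process({"symbol": "MSFT", "type": "trade"}))
```

In this demo the three queries contain five predicates in total but only three distinct ones, so a single pass over the predicate index answers all three queries with three tests instead of five.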

Telegraph/PSoup: Query & Data Duality

[Diagram; surviving labels: Queries, Data, Result]

PSoup – Query Invocation

• PSoup continuously maintains materialized views over streaming data and queries.
• Data is returned to user when query is invoked.
  – Invocation requires applying “windows” to precomputed results.
• Adaptive approach allows system to continuously absorb new data and new queries without recompilation.
• Lots of issues to study:
  – Query indexing, spilling to disk, bulk processing
  – Other semantics and interaction models (e.g., alerts)
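The idea on this slide can be sketched in a few lines. This is a heavily simplified illustration and not the Telegraph/PSoup implementation: the class `PSoupSketch`, its method names, and the use of plain Python predicates as queries are all assumptions made for the example. New data probes the stored queries, new queries probe the stored data, and invocation only applies a time window to matches that were already computed.

```python
class PSoupSketch:
    """Toy query/data duality: both data and queries arrive over time and are stored."""

    def __init__(self):
        self.data = []     # (timestamp, value) pairs in arrival order
        self.queries = {}  # query id -> predicate over a value
        self.results = {}  # query id -> precomputed (timestamp, value) matches

    def add_data(self, ts, value):
        # A new data item probes every stored query.
        self.data.append((ts, value))
        for qid, predicate in self.queries.items():
            if predicate(value):
                self.results[qid].append((ts, value))

    def add_query(self, qid, predicate):
        # A new query probes the stored data, so it also covers earlier arrivals.
        self.queries[qid] = predicate
        self.results[qid] = [(ts, v) for ts, v in self.data if predicate(v)]

    def invoke(self, qid, now, window):
        # Invocation applies the window (now - window, now] to precomputed matches.
        return [v for ts, v in self.results[qid] if now - window < ts <= now]


p = PSoupSketch()
p.add_data(1, 5)
p.add_data(2, 50)
p.add_query("big", lambda v: v > 10)  # registered after some data has arrived
p.add_data(3, 70)
p.add_data(4, 3)
print(p.invoke("big", now=4, window=2))
print(p.invoke("big", now=4, window=3))
```

Because matching happens on arrival, `invoke` does no predicate evaluation at all; a new query is absorbed by calling `add_query` while data keeps flowing, with nothing recompiled. The sketch keeps everything in memory and scans linearly, which is exactly where the query indexing and spilling issues listed above would come in.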

Stream Processing Research Agenda

• Need continuously-adaptive processing.
• Need appropriate data model & query language.
  – Window semantics: input and output
  – Notification semantics & thresholds
• Approximation, satisficing, and QoS
  – must be driven by user needs and context
  – adapt to available resources & time constraints
• Integration & interaction with “pooled” data.
  – time travel, archiving, “normal” databases
• Structured, semi-structured, and unstructured data; XML etc.
• Sensor-sensitive processing.
• Metrics and Benchmarks (challenge problems).
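To make the "window semantics" and "notification semantics & thresholds" items concrete, here is one possible reading as a small sketch. The choices in it are assumptions, not definitions from the talk: a time-based sliding window over the input, an average as the output, and a notification flag raised when that average exceeds a fixed threshold. The class name `SlidingAverage` is hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingAverage:
    """Average over the last `width` time units, with a threshold-based alert flag."""

    def __init__(self, width, threshold):
        self.width = width
        self.threshold = threshold
        self.items = deque()  # (timestamp, value) pairs currently inside the window
        self.total = 0.0

    def push(self, ts, value):
        """Add a reading; return (current window average, whether to notify)."""
        self.items.append((ts, value))
        self.total += value
        # Evict readings that fell out of the window (ts - width, ts].
        while self.items[0][0] <= ts - self.width:
            _, old = self.items.popleft()
            self.total -= old
        avg = self.total / len(self.items)
        return avg, avg > self.threshold


w = SlidingAverage(width=10, threshold=20)
for ts, value in [(0, 10), (5, 20), (12, 60)]:
    print(ts, w.push(ts, value))
```

Other readings are equally plausible, for example count-based or tumbling windows, or notifying only when the flag changes state; which of these a stream query language should offer is the open question the slide raises.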

Conclusions

• Dataflow and streaming are central to many emerging application areas.
  – Solutions require a mixture of database and networking approaches:
      • adaptivity and tolerance of partial failure
      • exploitation of user, app, and data semantics
• A new infrastructure is needed for solving these problems.
  – Duality of Data and Queries
• Currently a topic of major interest in the research community.
