Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3....

59
Requirements for Time Series Data Platforms: A Technology Overview with Benchmarks April 25 th , 2018 Albuquerque, NM Sean Patrick Murphy Michael Andersen Kevin Jones

Transcript of Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3....

Page 1: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Requirements for Time Series Data Platforms: A Technology Overview with Benchmarks

April 25th, 2018Albuquerque, NM

Sean Patrick Murphy Michael Andersen Kevin Jones

Page 2: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• Platform Requirements - 4 Key Drivers1. The Nature of Time Series2. The Real World is Messy (or Sensors)3. Analytics as First Class Citizens4. Size Matters

• Benchmarking Implications• What is a Platform?

2

Outline

Page 3: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

The Nature of Time Series

3

Page 4: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 44

(timestamp, value)

Page 5: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 5

(timestamp, value)

64-bit integer 64-bit float

[−(263), 263 − 1]Or

[-9,223,372,036,854,775,808, 9,223,372,036,854,775,807]

Page 6: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 6

(1515112200000000000, 50.6285209655761)

(1515112200008333333, 50.6273155212402)

(1515112200016666666, 50.6269416809082)

(1515112200024999999, 50.6258087158203)

(1515112200033333332, 50.6216735839843)

(1515112200041666665, 50.6205940246582)

(1515112200049999998, 50.6227645874023)

(1515112200058333331, 50.6207199096679)

(1515112200066666664, 50.6192970275878)

(1515112200074999997, 50.6227836608886)

(1515112200083333330, 50.6249427795410)

(timestamp, value)

Page 7: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 7

Intrinsic Redundancy

• Can compress timestamps and values– Lossy and Lossless– Intra-stream and Inter-stream

• Many approaches available

• Classic space/time tradeoff• We achieve ~3:1 lossless compression

P. Lindstrom and M. Isenburg. Fast and Efficient Compression of Floating-Point Data. Visualization and Computer Graphics, IEEE Transactions on, 12(5):1245–1250, 2006.P. Ratanaworabhan, J. Ke, and M. Burtscher. Fast Lossless Compression of Scientific Floating-Point Data. In DCC, pages 133–142. IEEE Computer Society, 2006.

Page 8: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 8Time

Write Patterns

Page 9: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 9Time

Write Patterns

Page 10: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

1. Human interaction and exploration of the data

2. Analytics

10

Read Patterns

Page 11: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

The Visual Information-Seeking Mantra [Shneiderman, 1996] summarizes many visual design guidelines and provides an excellent framework for designing information visualization applications.

11

“Overview first, zoom and filter, then details-on-demand.”

Read Patterns

Page 12: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

About 4 billion data points

Random, Multi-Resolution Read Patterns

Page 13: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Random, Multi-Resolution Read Patterns

Page 14: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Random, Multi-Resolution Read Patterns

Page 15: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Random, Multi-Resolution Read Patterns

Page 16: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Random, Multi-Resolution Read Patterns

Page 17: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Random, Multi-Resolution Read Patterns

Page 18: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

50 datapoints(8 orders of magnitude)

Random, Multi-Resolution Read Patterns

Page 19: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Copy on write K-ary Tree Partitioning static time (1933 to 2079)

Leaf nodes- Time, value pairs + length

Internal nodes- Pointers to children- Version annotations for children- Aggregates for children

- Min, Mean, Max, Count- Any associative operator

Berkeley tree Data Structure

Page 20: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

The Real World is Messy

20

Page 21: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:
Page 22: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:
Page 23: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

v2

Page 24: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

v2v3v3

Page 25: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

v2v3v3 v4

Page 26: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

v2v3v3 v4

Q: What changed between v2 and v3?

Page 27: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

v2v3v3 v4

A: These ranges:

Q: What changed between v2 and v3?

Page 28: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

v2v3v3 v4

Q: What changed between v2 and v3?

Change Set

Page 29: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

29

Who Cares?

Page 30: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• Efficiently update calculations on out of order data

30

Who Cares?

Page 31: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• Efficiently update calculations on out of order data

• Idempotent calculations????

31

Who Cares?

Page 32: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• Efficiently update calculations on out of order data

• Idempotent calculations????• Rewinding data arrival to understand and

diagnose problems

32

Who Cares?

Page 33: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Versioning Source Code

Page 34: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Analytics as First Class Citizens

34

Page 35: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 3535

Common Analytics Patterns

Page 36: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 3636

Clean

Common Analytics Patterns

Page 37: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 3737

Clean

Filter

Common Analytics Patterns

Page 38: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 3838

Clean

Filter

Compute Power

Common Analytics Patterns

Page 39: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

39

Directed Acyclic Graph of Calculations

Page 40: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• Windowing operations• Spatial/Frequency Transforms

– Wavelet– Fourier– Stockwell

• Indexing• Clustering• Classification• Categorization• Anomaly/Event/Novelty

Detection• Motif Discovery

40

But Wait, There’s More

Page 41: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 4141 41

Deep Blue beats Gary Kasparov 1997

AlphaGo beats Lee Sedol2016

Watson beats champions2011

Old Paradigm - Software Engineering• Humans write the code

• Limited by ability to describe exactly what must be

done without error

New Paradigm - Machine Learning• Data teaches algorithms to perform function or task

• Limited by the amount of data and algorithms

• Algorithms need ***ALL*** available data

• Capable of tackling high dimensional problems

But Wait, There’s Even More!

Page 42: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• Analytics must be first class citizens of the platform

• The platform must be built from the ground up to support relevant analytic use cases

42

Who Cares?

Page 43: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Size Matters

43

Page 44: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 44

# o

f St

ream

sMeasurements Per Second

Sampling RatesPMU DFRSCADA

Page 45: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 45

# o

f St

ream

sRequired Network Bandwidth (Lower Bound)

Sampling RatesPMU DFRSCADA

Page 46: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 46

# o

f St

ream

s

Sampling RatesPMU DFRSCADA

Annual Data Volumes (Lower Bound)

Page 47: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Big Data Conclusions

• Don’t move the data, move the calculations.– (Why analytics are first class citizens)

• We are going to need a bigger [machine(s)].

Page 48: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 4848

Two Options

Page 49: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 4949

2003

Map Reduce

Distributed file system over commodity hardware.

2004

Big Table

Spanner

Google File System

Google’s globally distributed database.

Pregel

A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing.

2006 20152010 2013

The Dataflow

Model

Distributed processing framework to simplify parallel programming tasks.

A distributed storage system for structured data.

A large scale graph processing system.

How this Played Out

Page 50: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

• No single point of failure/resilient• Much more cost effective• Buy more capacity when you need it• Flexible and adaptable

50

Who Cares?

Page 51: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Benchmark Considerations

51

Page 52: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 52

Core Benchmarks – Reading and Writing Data

Page 53: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 53

Analytics Benchmarks

Page 54: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

What is a Platform?

54

Page 55: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 5555

Time Series

Not Time Series

“Small” Data “Big” Data

What isn’t a Time Series Platform?

Page 56: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 5656

What Does a Time Series Data Platform Do?

Ingest/EgestClean/Condition/Fix

StoreVisualize

Access/Use/Build WithAnalyze

Learn From

Page 57: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 5757

The Goal of the Platform

Allow utility subject matter experts

to create value from (sensor) data.

Page 58: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Spring 2018 NASPI | 5858

“A platform is a business based on enabling value-creating interactions between external producers and consumers. The platform provides an open, participative infrastructure for these interactions and sets governance conditions for them. The platform’s overarching purpose: to consummate matches among users and facilitate the exchange of goods, services, or social currency, thereby enabling value creation for all participants.

Strategy has moved from controlling unique internal resources and erecting competitive barriers to orchestrating external resources and engaging vibrant communities. And innovation is no longer the province of in-house experts and research and development labs, but is produced through crowdsourcing and the contribution of ideas by independent participants in the platform. External resources don’t completely replace internal resources—more often they serve as a complement. But platform firms emphasize ecosystem governance more than product optimization, and persuasion of outside partners more than control of internal employees.”

Platform?

Page 59: Requirements for Time Series Data Platforms...v3. v2. v4. Q: What changed between v2 and v3? v3. v3. v2. v4. A: These ranges: Q: What changed between v2 and v3? v3. v3. v2. v4. Q:

Pre

dict

iveG

ridT

M Diverse Sensor Population

𝑓𝑓 𝑥𝑥,𝑦𝑦, 𝑧𝑧

𝑉𝑉𝑉𝑉 cosθ

Hi-speed Historical Data

Ingest

Deep Learning and Machine Learning at Scale

Spreadsheets

Interactive Data Exploration and Analysis

Rapid Deployment of Production Analytics

Ingest Engine

Modeling and Simulation

Connectivity

Real-Time & Historical Analytic Computing

Time Series Datastore

Bespoke Web and Mobile Applications

Fusion of Simulation and Sensor Data

C37.118

GEP/STTP

COMTRADE

CSV, .d, .d2

IngestorN

BTrDB

𝑃𝑃 + 𝑗𝑗𝑗𝑗

𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝐷𝑁𝑁

Customizable Real-Time Dashboards

Human-scale Data Engagement

API Layer

Analytic Pipeline

PMU, uPMU, DFR,

PQube

Hi-speed Data Export

Multi-format Data Export

[email protected]