Mongo db 2.4 time series data - Brignoli


Slides by Massimo Brignoli, presented at Codemotion Roma 2014.

Time Series Data in MongoDB

Senior Solutions Architect, MongoDB Inc.

Massimo Brignoli

#mongodb

Agenda

• What is time series data?

• Schema design considerations

• Broader use case: operational intelligence

• MMS Monitoring schema design

• Thinking ahead

• Questions

What is time series data?

Time Series Data is Everywhere

• Financial markets pricing (stock ticks)

• Sensors (temperature, pressure, proximity)

• Industrial fleets (location, velocity, operational)

• Social networks (status updates)

• Mobile devices (calls, texts)

• Systems (server logs, application logs)

Time Series Data at a Higher Level

• Widely applicable data model

• Applies to several different “data use cases”

• Various schema and modeling options

• Application requirements drive schema design

Time Series Data Considerations

• Resolution of raw events

• Resolution needed to support:
  – Applications
  – Analysis
  – Reporting

• Data retention policies:
  – Data ages out
  – Retention

Schema Design Considerations

Designing For Writing and Reading

• Document per event

• Document per minute (average)

• Document per minute (by second)

• Document per hour

Document Per Event

{
  server: "server1",
  load: 92,
  ts: ISODate("2013-10-16T22:07:38.000-0500")
}

• Relational-centric approach

• Insert-driven workload

• Aggregations computed at the application level
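With one document per event, a per-minute summary has to be computed by the application after it reads the raw events. The sketch below shows that work in plain JavaScript (the mongo shell's language) on in-memory objects rather than documents read from a collection. The helper names are hypothetical, and so are all sample loads except the slide's `92`.

```javascript
// Sketch: application-level aggregation over "document per event" records
// shaped like { server, load, ts }. Helper names are hypothetical.

// Truncate a Date to the start of its minute using epoch milliseconds.
function minuteOf(ts) {
  return new Date(Math.floor(ts.getTime() / 60000) * 60000);
}

// Group events by (server, minute) and return the average load of each group.
function averageLoadPerMinute(events) {
  const groups = new Map(); // "server|minuteMs" -> running sum and count
  for (const e of events) {
    const minute = minuteOf(e.ts);
    const key = e.server + "|" + minute.getTime();
    let g = groups.get(key);
    if (!g) {
      g = { server: e.server, minute: minute, sum: 0, count: 0 };
      groups.set(key, g);
    }
    g.sum += e.load;
    g.count += 1;
  }
  return Array.from(groups.values(), g => ({
    server: g.server,
    minute: g.minute,
    avgLoad: g.sum / g.count,
  }));
}

// Hypothetical sample events (only the first matches the slide).
const events = [
  { server: "server1", load: 92, ts: new Date("2013-10-16T22:07:38.000-05:00") },
  { server: "server1", load: 88, ts: new Date("2013-10-16T22:07:39.000-05:00") },
  { server: "server1", load: 40, ts: new Date("2013-10-16T22:08:01.000-05:00") },
];
console.log(averageLoadPerMinute(events));
```

Every average computed this way reads one document per sample. The per-minute schemas that follow are designed to avoid that cost.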

Document Per Minute (Average)

{
  server: "server1",
  load_num: 92,
  load_sum: 4500,
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}

• Pre-aggregate to compute average per minute more easily

• Update-driven workload

• Resolution at the minute level
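In the per-minute average schema, each new sample becomes an increment on a single document, and the average is `load_sum / load_num` at read time. This sketch only builds the query and update objects you would pass to an upsert (for example from the mongo shell). The function names are hypothetical, and no database call is made.

```javascript
// Sketch: query/update objects for the "document per minute (average)" schema.
// Function names are hypothetical; nothing here talks to a server.

function minuteOf(ts) {
  return new Date(Math.floor(ts.getTime() / 60000) * 60000);
}

// One sample -> one $inc on the minute's document. With upsert enabled, the
// first sample of a minute creates the document and its counters.
function perMinuteAverageUpdate(server, load, ts) {
  return {
    query: { server: server, ts: minuteOf(ts) },
    update: { $inc: { load_num: 1, load_sum: load } },
  };
}

// The average is derived when reading; it is never stored.
function averageOf(doc) {
  return doc.load_num === 0 ? null : doc.load_sum / doc.load_num;
}

const op = perMinuteAverageUpdate("server1", 92, new Date("2013-10-16T22:07:38.000-05:00"));
console.log(JSON.stringify(op));
// Using the slide's document: 4500 / 92
console.log(averageOf({ server: "server1", load_num: 92, load_sum: 4500 }));
```

Storing the running sum and count, rather than the average itself, keeps every write a simple increment.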

Document Per Minute (By Second)

{
  server: "server1",
  load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}

• Store per-second data at the minute level

• Update-driven workload

• Pre-allocate structure to avoid document moves
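Pre-allocating means inserting the full 60-slot document once per minute. Each later sample is then a `$set` on an existing field, and the document never grows. Below is a minimal sketch under these assumptions: seconds are taken in UTC, zero serves as the placeholder value, and the helper names are hypothetical.

```javascript
// Sketch: pre-allocated "document per minute (by second)" plus the $set that
// records one sample. Helper names and the zero placeholder are assumptions.

function minuteOf(ts) {
  return new Date(Math.floor(ts.getTime() / 60000) * 60000);
}

// Empty document inserted at the start of each minute, one slot per second.
function preallocatedMinuteDoc(server, ts) {
  const load = {};
  for (let s = 0; s < 60; s++) load[String(s)] = 0;
  return { server: server, load: load, ts: minuteOf(ts) };
}

// Query/update pair that overwrites the slot for the sample's second.
function setSecondUpdate(server, value, ts) {
  const second = ts.getUTCSeconds();
  return {
    query: { server: server, ts: minuteOf(ts) },
    update: { $set: { ["load." + second]: value } },
  };
}

const t = new Date("2013-10-16T22:07:38.000-05:00");
const doc = preallocatedMinuteDoc("server1", t);
const op = setSecondUpdate("server1", 45, t);
console.log(Object.keys(doc.load).length, JSON.stringify(op.update));
```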

Document Per Hour (By Second)

{
  server: "server1",
  load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}

• Store per-second data at the hourly level

• Update-driven workload

• Pre-allocate structure to avoid document moves

• Updating the last second requires 3599 steps, because the fields before it are scanned in order

Document Per Hour (By Second)

{
  server: "server1",
  load: {
    0: {0: 15, …, 59: 45},
    …
    59: {0: 25, …, 59: 75}
  },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}

• Store per-second data at the hourly level with nesting

• Update-driven workload

• Pre-allocate structure to avoid document moves

• Updating last second requires 59+59 steps
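The step counts on the last two slides come from how a field is located inside a stored document. The fields of a BSON document are walked roughly in order, so in the flat layout the slot for second 3599 sits behind 3599 other fields. The nested layout skips at most 59 minute sub-documents and then at most 59 seconds. The sketch models that count and also builds the dotted path a `$set` would use. It is a simplified cost model, not a measurement, and the function names are hypothetical.

```javascript
// Sketch: simplified cost model for locating one per-second slot in the two
// hourly layouts, counting how many fields a sequential scan passes over.
// Function names are hypothetical.

// Flat layout: load = { 0: ..., 1: ..., ..., 3599: ... }.
function flatSteps(secondOfHour) {
  return secondOfHour; // fields 0 .. secondOfHour-1 come first
}

// Nested layout: load = { 0: { 0..59 }, ..., 59: { 0..59 } }.
function nestedSteps(secondOfHour) {
  const minute = Math.floor(secondOfHour / 60);
  const second = secondOfHour % 60;
  return minute + second; // skip earlier minutes, then earlier seconds
}

// Dotted path for a $set in the nested layout (UTC minute and second).
function nestedPath(ts) {
  return "load." + ts.getUTCMinutes() + "." + ts.getUTCSeconds();
}

console.log(flatSteps(3599), nestedSteps(3599)); // last second of the hour
console.log(nestedPath(new Date("2013-10-16T22:59:59.000-05:00")));
```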

Characterizing Write Differences

• Example: data generated every second

• Capturing data per minute requires:
  – Document per event: 60 writes
  – Document per minute: 1 write, 59 updates

• Transition from insert-driven to update-driven workloads:
  – Individual writes are smaller
  – Performance and concurrency benefits

Characterizing Read Differences

• Example: data generated every second

• Reading data for a single hour requires:
  – Document per event: 3600 reads
  – Document per minute: 60 reads

• Read performance is greatly improved:
  – Optimal with tuned block sizes and read-ahead
  – Fewer disk seeks
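The write and read counts on the last two slides follow from simple arithmetic over one hour of once-per-second samples. The sketch reproduces them for a few bucket sizes. Like the write slide, it assumes the first sample in each bucket is an insert and the rest are updates. `opsPerHour` is a hypothetical helper.

```javascript
// Sketch: operation counts for one hour of data sampled once per second,
// for a given bucket size (seconds of data stored per document).
// opsPerHour is a hypothetical helper.

function opsPerHour(bucketSeconds) {
  const samples = 3600;
  const docs = Math.ceil(samples / bucketSeconds);
  return {
    bucketSeconds: bucketSeconds,
    inserts: docs,           // first sample of each bucket creates the document
    updates: samples - docs, // every other sample updates an existing document
    readsForHour: docs,      // documents fetched to read the whole hour back
  };
}

// 1 = document per event, 60 = per minute, 3600 = per hour
for (const b of [1, 60, 3600]) {
  console.log(opsPerHour(b));
}
```

With 60-second buckets the hour costs 60 inserts and 3540 updates, which works out to 1 insert and 59 updates per minute, as on the write slide.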

MMS Monitoring Schema Design

MMS Monitoring

• MongoDB Management Service (MMS) Monitoring

• Available in two flavors:
  – Free cloud-hosted monitoring
  – On-premises with MongoDB Enterprise

• Monitor single node, replica set, or sharded cluster deployments

• Metric dashboards and custom alert triggers

MMS Application Requirements

Resolution defines granularity of stored data

Range controls the retention policy, e.g. after 24 hours, only 5-minute resolution is kept

Display dictates the stored pre-aggregations, e.g. total and count
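Storing a total and a count, rather than only an average, lets a coarser retention level be derived exactly. Summing the totals and counts of five per-minute documents gives the true 5-minute average. This is a minimal sketch, assuming per-minute documents shaped like the monitoring document on the next slide (`timestamp_minute`, `type`, `num_samples`, `total_samples`). The `downsample` helper and the sample values are hypothetical.

```javascript
// Sketch: derive 5-minute resolution from per-minute pre-aggregates.
// Input docs: { timestamp_minute: Date, type, num_samples, total_samples }.
// downsample and the sample data are hypothetical.

function downsample(minuteDocs, bucketMinutes) {
  const bucketMs = bucketMinutes * 60000;
  const out = new Map();
  for (const d of minuteDocs) {
    const start = Math.floor(d.timestamp_minute.getTime() / bucketMs) * bucketMs;
    const key = d.type + "|" + start;
    let b = out.get(key);
    if (!b) {
      b = { timestamp: new Date(start), type: d.type, num_samples: 0, total_samples: 0 };
      out.set(key, b);
    }
    // Summing totals and counts keeps the coarse average exact.
    b.num_samples += d.num_samples;
    b.total_samples += d.total_samples;
  }
  return Array.from(out.values());
}

const minutes = [0, 1, 2, 3, 4, 5].map(i => ({
  timestamp_minute: new Date(Date.UTC(2013, 9, 10, 23, i)),
  type: "memory_used",
  num_samples: 60,
  total_samples: 60 * (1000 + i),
}));
const fiveMin = downsample(minutes, 5);
console.log(fiveMin.map(b => [b.timestamp.toISOString(), b.total_samples / b.num_samples]));
```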

Monitoring Schema Design

• Per-minute document model

• Documents store individual metrics and counts

• Supports “total” and “avg/sec” display

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
  values: { 0: 999999, …, 59: 1800000 }
}

Monitoring Data Updates

• Single update required to add new data and increment associated counts

db.metrics.update(
  { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used" },
  {
    $set: { "values.59": 2000000 },
    $inc: { num_samples: 1, total_samples: 2000000 }
  }
)
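Combining `$set` and `$inc` in one update document means the new value and both counters change together in a single operation. To show the effect without a server, here is a deliberately simplified in-memory model of those two operators on dotted paths. It is not MongoDB's implementation and ignores arrays, type checks and error cases. The sample document is a trimmed copy of the slide's, and `resolve` and `applyUpdate` are hypothetical helpers.

```javascript
// Simplified in-memory model of $set and $inc with dotted paths.
// Hypothetical helpers; only nested plain objects are handled, and real
// MongoDB semantics are richer.

function resolve(doc, path) {
  const parts = path.split(".");
  let target = doc;
  for (const p of parts.slice(0, -1)) {
    if (typeof target[p] !== "object" || target[p] === null) target[p] = {};
    target = target[p];
  }
  return { target: target, field: parts[parts.length - 1] };
}

function applyUpdate(doc, update) {
  for (const [path, value] of Object.entries(update.$set || {})) {
    const { target, field } = resolve(doc, path);
    target[field] = value;
  }
  for (const [path, delta] of Object.entries(update.$inc || {})) {
    const { target, field } = resolve(doc, path);
    target[field] = (target[field] || 0) + delta;
  }
  return doc;
}

// Trimmed copy of the slide's document (only two of the 60 values kept).
const doc = { num_samples: 58, total_samples: 108000000, type: "memory_used",
              values: { "0": 999999, "59": 1800000 } };
applyUpdate(doc, {
  $set: { "values.59": 2000000 },
  $inc: { num_samples: 1, total_samples: 2000000 },
});
console.log(doc.num_samples, doc.total_samples, doc.values["59"]);
```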

Monitoring Data Management

• Data stored at different granularity levels for read performance

• Collections are organized into specific intervals

• Retention is managed by simply dropping collections as they age out

• Document structure is pre-created to maximize write performance
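Organizing collections by interval makes retention cheap: instead of deleting documents one by one, the whole collection is dropped once its interval ages out. The sketch assumes one collection per UTC day with a hypothetical `metrics_YYYYMMDD` naming scheme. Each name it returns could then be dropped in the mongo shell with `db.getCollection(name).drop()`.

```javascript
// Sketch: one collection per UTC day (hypothetical "metrics_YYYYMMDD" names),
// with retention enforced by dropping whole collections.

function pad(n) {
  return String(n).padStart(2, "0");
}

// Collection that holds documents for the given timestamp.
function collectionFor(ts) {
  return "metrics_" + ts.getUTCFullYear() + pad(ts.getUTCMonth() + 1) + pad(ts.getUTCDate());
}

// Names whose day is older than `retentionDays` days before today (UTC).
function collectionsToDrop(names, now, retentionDays) {
  const cutoff = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()) -
    retentionDays * 86400000;
  return names.filter(name => {
    const m = /^metrics_(\d{4})(\d{2})(\d{2})$/.exec(name);
    if (!m) return false; // leave unrelated collections alone
    const day = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
    return day < cutoff;
  });
}

const now = new Date("2013-10-10T23:06:00.000Z");
const existing = ["metrics_20131007", "metrics_20131008", "metrics_20131009",
                  "metrics_20131010", "users"];
console.log(collectionFor(now));
console.log(collectionsToDrop(existing, now, 2));
```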

Use Case: Operational Intelligence

What is Operational Intelligence?

• Storing log data
  – Capturing application- and/or server-generated events

• Hierarchical aggregation
  – Rolling approach to generate rollups, e.g. hourly > daily > weekly > monthly

• Pre-aggregated reports
  – Processing raw events to generate reports

Storing Log Data

{
  _id: ObjectId("4f442120eb03305789000000"),
  host: "127.0.0.1",
  user: "frank",
  time: ISODate("2000-10-10T20:55:36Z"),
  path: "/apache_pb.gif",
  request: "GET /apache_pb.gif HTTP/1.0",
  status: 200,
  response_size: 2326,
  referrer: "http://www.example.com/start.html",
  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
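Going from the raw line to the stored document is mostly string parsing plus converting the local timestamp to UTC (13:55:36 at -0700 becomes 20:55:36Z). This is a sketch that assumes the Apache combined log format shown above. Its regular expression is simplified and will not cover every log variant. `parseLogLine` is a hypothetical helper, and `_id` is left for MongoDB to assign on insert.

```javascript
// Sketch: turn one Apache "combined" log line into the document shape above.
// parseLogLine and the simplified regexes are assumptions, not a full parser.

const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

const LINE_RE = /^(\S+) \S+ (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;
const TIME_RE = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/;

// "10/Oct/2000:13:55:36 -0700" -> Date in UTC.
function parseApacheTime(s) {
  const m = TIME_RE.exec(s);
  if (!m) throw new Error("bad timestamp: " + s);
  const local = Date.UTC(+m[3], MONTHS[m[2]], +m[1], +m[4], +m[5], +m[6]);
  const offsetMin = (m[7] === "-" ? -1 : 1) * (+m[8] * 60 + +m[9]);
  return new Date(local - offsetMin * 60000); // subtract the offset to get UTC
}

function parseLogLine(line) {
  const m = LINE_RE.exec(line);
  if (!m) return null;
  return {
    host: m[1],
    user: m[2],
    time: parseApacheTime(m[3]),
    path: m[4].split(" ")[1],
    request: m[4],
    status: Number(m[5]),
    response_size: m[6] === "-" ? null : Number(m[6]), // "-" means no body sent
    referrer: m[7],
    user_agent: m[8],
  };
}

const line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" ' +
  '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
console.log(parseLogLine(line));
```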

Pre-Aggregation

• Analytics across raw events can involve many reads

• Alternative schemas can improve read and write performance

• Data can be organized into coarser buckets

• Transition from insert-driven to update-driven workloads

Pre-Aggregated Log Data

{
  timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
  resource: "/index.html",
  page_views: { 0: 50, …, 59: 250 }
}

• Leverage time-series style bucketing

• Track individual metrics (e.g., page views)

• Improve performance for reads/writes

• Minimal processing overhead
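Bucketing raw events into these documents takes little work per event: find the (resource, minute) document and increment the counter for that second. The sketch does this as a batch over events shaped like the parsed log documents (`time`, `path`). The second resource and all timestamps are hypothetical sample data, and `preAggregate` is a hypothetical helper. An online variant would instead issue an upsert with `$inc` on `page_views.<second>` for each request.

```javascript
// Sketch: bucket raw events into pre-aggregated per-minute documents,
// counting page views per second for each resource.
// preAggregate and the sample events are hypothetical.

function preAggregate(events) {
  const docs = new Map();
  for (const e of events) {
    const minuteMs = Math.floor(e.time.getTime() / 60000) * 60000;
    const key = e.path + "|" + minuteMs;
    let doc = docs.get(key);
    if (!doc) {
      const pageViews = {};
      for (let s = 0; s < 60; s++) pageViews[String(s)] = 0; // pre-created slots
      doc = { timestamp_minute: new Date(minuteMs), resource: e.path, page_views: pageViews };
      docs.set(key, doc);
    }
    doc.page_views[String(e.time.getUTCSeconds())] += 1;
  }
  return Array.from(docs.values());
}

const events = [
  { path: "/index.html", time: new Date("2000-10-10T20:55:00Z") },
  { path: "/index.html", time: new Date("2000-10-10T20:55:00Z") },
  { path: "/index.html", time: new Date("2000-10-10T20:55:59Z") },
  { path: "/about.html", time: new Date("2000-10-10T20:55:10Z") },
];
const docs = preAggregate(events);
console.log(docs.length, docs[0].page_views["0"], docs[0].page_views["59"]);
```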

Hierarchical Aggregation

• Analytical approach as opposed to schema approach
  – Leverage the built-in Aggregation Framework or MapReduce

• Execute multiple tasks sequentially to aggregate at varying levels

• Raw events > Hourly > Weekly > Monthly

• Rolling approach distributes the aggregation workload
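The rolling approach can be illustrated without a database: each level is built only from the level below it, so raw events are scanned once. In MongoDB each step would be an Aggregation Framework or MapReduce job, as the slide says. This plain JavaScript version shows only the bucketing logic, using UTC buckets and weeks assumed to start on Monday. One design choice: the sketch rolls monthly totals up from daily buckets rather than weekly ones, because a week can straddle two months, as the sample data does.

```javascript
// Sketch: rolling hierarchical aggregation in memory. Records are
// { ts: Date, count: number }; bucket starts are computed in UTC.
// Helper names and sample data are hypothetical.

const DAY_MS = 86400000;
const hourStart = d => Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), d.getUTCHours());
const dayStart = d => Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
const monthStart = d => Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
// Weeks start on Monday (getUTCDay: 0 = Sunday ... 6 = Saturday).
const weekStart = d => dayStart(d) - ((d.getUTCDay() + 6) % 7) * DAY_MS;

// Sum counts into coarser buckets; output is ordered by first appearance.
function rollUp(records, bucketStart) {
  const totals = new Map();
  for (const r of records) {
    const key = bucketStart(r.ts);
    totals.set(key, (totals.get(key) || 0) + r.count);
  }
  return Array.from(totals, ([key, count]) => ({ ts: new Date(key), count: count }));
}

// Hypothetical raw events around a month boundary.
const raw = [
  { ts: new Date("2013-10-31T10:15:00Z"), count: 1 },
  { ts: new Date("2013-10-31T10:45:00Z"), count: 1 },
  { ts: new Date("2013-11-01T09:00:00Z"), count: 1 },
];
const hourly = rollUp(raw, hourStart);
const daily = rollUp(hourly, dayStart);
const weekly = rollUp(daily, weekStart);   // one week spans both months
const monthly = rollUp(daily, monthStart); // so months come from days, not weeks
console.log(hourly.length, daily.length, weekly.length, monthly.length);
```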

Thinking Ahead

Before You Start

• What are the application requirements?

• Is pre-aggregation useful for your application?

• What are your retention and age-out policies?

• What are the gotchas?
  – Pre-create the document structure to avoid fragmentation and performance problems
  – Organize your data for growth: time series data grows fast!

Down The Road

• Scale-out considerations:
  – Vertical vs. horizontal (with sharding)

• Understanding the data:
  – Aggregation
  – Analytics
  – Reporting

• Deeper data analysis:
  – Patterns
  – Predictions

Scaling Time Series Data in MongoDB

• Vertical growth:
  – Larger instances with more CPU and memory
  – Increased storage capacity

• Horizontal growth:
  – Partitioning data across many machines
  – Dividing and distributing the workload

Time Series Sharding Considerations

• What are the application requirements?
  – Primarily collecting data
  – Primarily reporting data
  – Both

• Map those back to:
  – Write performance needs
  – Read/write query distribution
  – Collection organization (see MMS Monitoring)

• Example shard key: {metric name, coarse timestamp}
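A key led by the timestamp alone would increase monotonically and send every new insert to the same chunk. Leading with the metric name spreads concurrent writes for different metrics across chunks, and the coarse timestamp keeps one metric's data for a given period together for reads. The sketch derives such a key for each document. The field names `metric` and `day`, the choice of a UTC day as the coarse unit, and the sample metric names are all assumptions. Sharding on these fields in the mongo shell would look like `sh.shardCollection("mms.metrics", { metric: 1, day: 1 })`, where the namespace is hypothetical and the documents would need to store both fields.

```javascript
// Sketch: compute a compound shard key value {metric, day} for each document.
// The field names, one-day granularity and sample metrics are assumptions.

function coarseDay(ts) {
  return new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
}

function shardKeyFor(doc) {
  return { metric: doc.type, day: coarseDay(doc.timestamp_minute) };
}

// Hypothetical documents shaped like the monitoring schema.
const docs = [
  { type: "memory_used", timestamp_minute: new Date("2013-10-10T23:06:00Z") },
  { type: "memory_used", timestamp_minute: new Date("2013-10-10T23:07:00Z") },
  { type: "opcounters", timestamp_minute: new Date("2013-10-10T23:06:00Z") },
];

// Same metric and day -> same key value; different metrics -> different values.
const keys = docs.map(shardKeyFor);
const distinct = new Set(keys.map(k => k.metric + "|" + k.day.toISOString()));
console.log(keys, distinct.size);
```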

Aggregates, Analytics, Reporting

• Aggregation Framework can be used for analysis:
  – Does it work with the chosen schema design?
  – What sorts of aggregations are needed?

• Reporting can be done on a predictable, rolling basis:
  – See “Hierarchical Aggregation”

• Consider reading from secondaries for analytical operations:
  – Minimize load on production primaries

Deeper Data Analysis

• Leverage the MongoDB-Hadoop connector:
  – Bi-directional support for reading/writing
  – Works with online and offline data (e.g. backup files)

• Compute using MapReduce:
  – Patterns
  – Recommendations
  – Etc.

• Explore data:
  – Pig
  – Hive

Questions?

Resources

• Schema Design for Time Series Data in MongoDB: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

• Operational Intelligence Use Case: http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence

• Data Modeling in MongoDB: http://docs.mongodb.org/manual/data-modeling/

• Schema Design (webinar): http://www.mongodb.com/events/webinar/schema-design-oct2013