Webinar: Time-Series Data in MongoDB

Time Series Data in MongoDB Partner Technical Services, MongoDB Inc. Sandeep Parikh #mongodb

description

Time series data can be found everywhere around you, from financial markets to social networks to sensors. There are a multitude of sources of time series data but they have some common attributes: large in volume, ordered by time, and primarily aggregated for access. Time series data is a great fit for MongoDB and in this webinar we will take a closer look at how to model time series data in MongoDB by exploring the schema of a tool that has become very popular in the community: MongoDB Management Service (MMS). We'll walk through different schema design considerations and how those impact the features and functionality of MMS and review workload differences across different designs.

Transcript of Webinar: Time-Series Data in MongoDB

Page 1: Webinar: Time-Series Data in MongoDB

Time Series Data in MongoDB

Partner Technical Services, MongoDB Inc.

Sandeep Parikh

#mongodb

Page 2: Webinar: Time-Series Data in MongoDB

Agenda

• What is time series data?

• Schema design considerations

• Broader use case: operational intelligence

• MMS Monitoring schema design

• Thinking ahead

• Questions

Page 3: Webinar: Time-Series Data in MongoDB

What is time series data?

Page 4: Webinar: Time-Series Data in MongoDB

Time Series Data is Everywhere

• Financial markets pricing (stock ticks)

• Sensors (temperature, pressure, proximity)

• Industrial fleets (location, velocity, operational)

• Social networks (status updates)

• Mobile devices (calls, texts)

• Systems (server logs, application logs)

Page 5: Webinar: Time-Series Data in MongoDB

Time Series Data at a Higher Level

• Widely applicable data model

• Applies to several different “data use cases”

• Various schema and modeling options

• Application requirements drive schema design

Page 6: Webinar: Time-Series Data in MongoDB

Time Series Data Considerations

• Resolution of raw events

• Resolution needed to support
  – Applications
  – Analysis
  – Reporting

• Data retention policies
  – Data ages out
  – Retention

Page 7: Webinar: Time-Series Data in MongoDB

Schema Design Considerations

Page 8: Webinar: Time-Series Data in MongoDB

Designing For Writing and Reading

• Document per event

• Document per minute (average)

• Document per minute (second)

• Document per hour

Page 9: Webinar: Time-Series Data in MongoDB

Document Per Event

{
  server: "server1",
  load: 92,
  ts: ISODate("2013-10-16T22:07:38.000-0500")
}

• Relational-centric approach

• Insert-driven workload

• Aggregations computed at application-level
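With one document per event, an average has to be computed by the application after reading every event in the range. The following is a minimal sketch in plain JavaScript, assuming the events have already been fetched into an array. The `averageLoad` helper and the sample values are illustrative and do not come from the slides.

```javascript
// Hypothetical helper: average the `load` field over per-event documents
// whose `ts` falls inside the half-open range [start, end).
function averageLoad(events, start, end) {
  let sum = 0;
  let count = 0;
  for (const e of events) {
    if (e.ts >= start && e.ts < end) {
      sum += e.load;
      count += 1;
    }
  }
  return count === 0 ? null : sum / count;
}

// Illustrative sample data (not from the webinar).
const events = [
  { server: "server1", load: 92, ts: new Date("2013-10-17T03:07:38Z") },
  { server: "server1", load: 88, ts: new Date("2013-10-17T03:07:39Z") },
  { server: "server1", load: 60, ts: new Date("2013-10-17T03:08:01Z") },
];

const minuteStart = new Date("2013-10-17T03:07:00Z");
const minuteEnd = new Date("2013-10-17T03:08:00Z");
const avg = averageLoad(events, minuteStart, minuteEnd);
console.log(avg);
```

Every event in the minute is touched to produce one number, which is the read cost the bucketed designs on the following slides try to reduce.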

Page 10: Webinar: Time-Series Data in MongoDB

Document Per Minute (Average)

{
  server: "server1",
  load_num: 92,
  load_sum: 4500,
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}

• Pre-aggregate to compute average per minute more easily

• Update-driven workload

• Resolution at the minute-level
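The update-driven workload for this design can be sketched as one `$inc` per sample against the minute's document. The helper names below (`minuteBucket`, `buildAverageUpdate`, `averageFrom`) and the `metrics` collection name are assumptions for illustration. The code only builds the filter and update documents, so it runs without a database.

```javascript
// Truncate a timestamp to the start of its minute (the bucket key).
function minuteBucket(ts) {
  const d = new Date(ts.getTime());
  d.setUTCSeconds(0, 0);
  return d;
}

// Build the filter and update documents for one incoming sample.
// With upsert enabled, the first sample of a minute would create the document.
function buildAverageUpdate(server, load, ts) {
  return {
    filter: { server: server, ts: minuteBucket(ts) },
    update: { $inc: { load_num: 1, load_sum: load } },
  };
}

// Reading the average back is a single division on one document.
function averageFrom(doc) {
  return doc.load_num === 0 ? null : doc.load_sum / doc.load_num;
}

const spec = buildAverageUpdate("server1", 92, new Date("2013-10-17T03:07:38Z"));
console.log(JSON.stringify(spec));

// In the mongo shell this could be applied as (collection name assumed):
//   db.metrics.update(spec.filter, spec.update, { upsert: true })
```

The trade-off is that individual samples are not kept, so nothing finer than the per-minute average can be recovered later.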

Page 11: Webinar: Time-Series Data in MongoDB

Document Per Minute (By Second)

{
  server: "server1",
  load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}

• Store per-second data at the minute level

• Update-driven workload

• Pre-allocate structure to avoid document moves
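Pre-allocation can be sketched as inserting the minute's document with all 60 slots already present, so that later updates overwrite values in place. The helper names and the use of `0` as the placeholder value are assumptions. A real application might prefer a placeholder that cannot be mistaken for a measurement.

```javascript
// Build a pre-allocated per-minute document with one slot per second.
// The placeholder value 0 is an assumption made for this sketch.
function preallocateMinute(server, minuteStart) {
  const load = {};
  for (let s = 0; s < 60; s++) {
    load[s] = 0;
  }
  return { server: server, load: load, ts: minuteStart };
}

// Build the in-place update for a single per-second sample.
function buildSecondUpdate(server, ts, value) {
  const bucket = new Date(ts.getTime());
  bucket.setUTCSeconds(0, 0);
  const field = "load." + ts.getUTCSeconds();
  return {
    filter: { server: server, ts: bucket },
    update: { $set: { [field]: value } },
  };
}

const emptyMinute = preallocateMinute("server1", new Date("2013-10-17T03:07:00Z"));
const secondUpdate = buildSecondUpdate("server1", new Date("2013-10-17T03:07:38Z"), 45);
console.log(Object.keys(emptyMinute.load).length, JSON.stringify(secondUpdate.update));
```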

Page 12: Webinar: Time-Series Data in MongoDB

Document Per Hour (By Second)

{
  server: "server1",
  load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}

• Store per-second data at the hourly level

• Update-driven workload

• Pre-allocate structure to avoid document moves

• Updating last second requires 3599 steps

Page 13: Webinar: Time-Series Data in MongoDB

Document Per Hour (By Second)

{
  server: "server1",
  load: {
    0: { 0: 15, …, 59: 45 },
    …,
    59: { 0: 25, …, 59: 75 }
  },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}

• Store per-second data at the hourly level with nesting

• Update-driven workload

• Pre-allocate structure to avoid document moves

• Updating last second requires 59+59 steps
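The difference between the flat and nested hourly layouts can be sketched by computing the field path for a given timestamp together with the number of preceding fields that have to be skipped to reach it. The `steps` values follow the counting used on the slides and are a simplification. The function names are hypothetical.

```javascript
// Flat layout: the key is the second within the hour (0..3599).
function flatPath(ts) {
  const second = ts.getUTCMinutes() * 60 + ts.getUTCSeconds();
  return { path: "load." + second, steps: second };
}

// Nested layout: first the minute (0..59), then the second (0..59).
function nestedPath(ts) {
  const m = ts.getUTCMinutes();
  const s = ts.getUTCSeconds();
  return { path: "load." + m + "." + s, steps: m + s };
}

// Worst case: the last second of the hour.
const lastSecond = new Date("2013-10-17T03:59:59Z");
const flat = flatPath(lastSecond);
const nested = nestedPath(lastSecond);
console.log(flat.path, flat.steps);
console.log(nested.path, nested.steps);
```

For the last second of the hour the flat layout skips 3599 fields, while the nested layout skips 59 minutes plus 59 seconds.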

Page 14: Webinar: Time-Series Data in MongoDB

Characterizing Write Differences

• Example: data generated every second

• Capturing data per minute requires:
  – Document per event: 60 writes
  – Document per minute: 1 write, 59 updates

• Transition from insert driven to update driven
  – Individual writes are smaller
  – Performance and concurrency benefits

Page 15: Webinar: Time-Series Data in MongoDB

Characterizing Read Differences

• Example: data generated every second

• Reading data for a single hour requires:
  – Document per event: 3600 reads
  – Document per minute: 60 reads

• Read performance is greatly improved
  – Optimal with tuned block sizes and read ahead
  – Fewer disk seeks
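The write and read counts on these two slides follow from simple arithmetic. The sketch below assumes exactly one sample per second and counts one insert per bucket document, with every other sample in the bucket arriving as an update. The `workload` function is illustrative only.

```javascript
// Count operations for a time range, given the bucket size in seconds.
// Assumes one sample per second (as in the slides' example).
function workload(rangeSeconds, bucketSeconds) {
  const documents = Math.ceil(rangeSeconds / bucketSeconds);
  return {
    inserts: documents,                 // one insert creates each bucket
    updates: rangeSeconds - documents,  // remaining samples update a bucket
    documentsRead: documents,           // documents fetched to read the range
  };
}

const minutePerEvent = workload(60, 1);    // document per event, one minute
const minutePerMinute = workload(60, 60);  // document per minute, one minute
const hourPerEvent = workload(3600, 1);    // document per event, one hour
const hourPerMinute = workload(3600, 60);  // document per minute, one hour
console.log(minutePerEvent, minutePerMinute, hourPerEvent, hourPerMinute);
```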

Page 16: Webinar: Time-Series Data in MongoDB

MMS Monitoring Schema Design

Page 17: Webinar: Time-Series Data in MongoDB

MMS Monitoring

• MongoDB Management Service (MMS) Monitoring

• Available in two flavors
  – Free cloud-hosted monitoring
  – On-premise with MongoDB Enterprise

• Monitor single node, replica set, or sharded cluster deployments

• Metric dashboards and custom alert triggers

Page 18: Webinar: Time-Series Data in MongoDB

MMS Monitoring

Page 19: Webinar: Time-Series Data in MongoDB

MMS Monitoring

Page 20: Webinar: Time-Series Data in MongoDB

MMS Application Requirements

Resolution defines granularity of stored data

Range controls the retention policy, e.g. after 24 hours only 5-minute resolution

Display dictates the stored pre-aggregations, e.g. total and count

Page 21: Webinar: Time-Series Data in MongoDB

Monitoring Schema Design

• Per-minute document model

• Documents store individual metrics and counts

• Supports “total” and “avg/sec” display

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
  values: { 0: 999999, …, 59: 1800000 }
}
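Because the document carries `num_samples` and `total_samples`, both display modes can be derived from a single document without scanning `values`. The sketch below assumes roughly one sample per second, so that the average per sample stands in for "avg/sec". The `displayValues` helper and the sample document are hypothetical.

```javascript
// Derive the two display values from the stored pre-aggregations.
// Assumption: samples arrive about once per second, so the average
// per sample approximates the "avg/sec" display.
function displayValues(doc) {
  return {
    total: doc.total_samples,
    avgPerSample: doc.num_samples === 0 ? null : doc.total_samples / doc.num_samples,
  };
}

// Illustrative document with round numbers (not the slide's values).
const sampleDoc = {
  timestamp_minute: new Date("2013-10-10T23:06:00.000Z"),
  type: "memory_used",
  num_samples: 4,
  total_samples: 200,
};

const display = displayValues(sampleDoc);
console.log(display.total, display.avgPerSample);
```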

Page 22: Webinar: Time-Series Data in MongoDB

Monitoring Data Updates

• Single update required to add new data and increment associated counts

db.metrics.update(
  { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used" },
  {
    $set: { "values.59": 2000000 },
    $inc: { num_samples: 1, total_samples: 2000000 }
  }
)

Page 23: Webinar: Time-Series Data in MongoDB

Monitoring Data Management

• Data stored at different granularity levels for read performance

• Collections are organized into specific intervals

• Retention is managed by simply dropping collections as they age out

• Document structure is pre-created to maximize write performance
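Retention by dropping collections can be sketched with one collection per day and a date suffix in the collection name. The naming scheme (`metrics_minute_YYYYMMDD`) and both helpers are assumptions for illustration. The slides do not describe how MMS actually names its collections.

```javascript
// Hypothetical naming scheme: <prefix>_<YYYYMMDD> in UTC.
function collectionFor(prefix, ts) {
  const y = ts.getUTCFullYear();
  const m = String(ts.getUTCMonth() + 1).padStart(2, "0");
  const d = String(ts.getUTCDate()).padStart(2, "0");
  return prefix + "_" + y + m + d;
}

// Return the collections that fall before the retention cutoff.
// Fixed-width date suffixes make plain string comparison sufficient.
function expiredCollections(names, prefix, now, retentionDays) {
  const cutoffDate = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  const cutoff = collectionFor(prefix, cutoffDate);
  return names.filter((n) => n.startsWith(prefix + "_") && n < cutoff);
}

const names = [
  "metrics_minute_20131008",
  "metrics_minute_20131009",
  "metrics_minute_20131010",
  "other",
];
const expired = expiredCollections(names, "metrics_minute", new Date("2013-10-10T12:00:00Z"), 1);
console.log(expired);

// In the mongo shell each expired collection could then be dropped with:
//   db.getCollection(name).drop()
```

Dropping a whole collection avoids deleting documents one by one, which is the point the slide makes about retention.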

Page 24: Webinar: Time-Series Data in MongoDB

Use Case: Operational Intelligence

Page 25: Webinar: Time-Series Data in MongoDB

What is Operational Intelligence

• Storing log data
  – Capturing application and/or server generated events

• Hierarchical aggregation
  – Rolling approach to generate rollups
  – e.g. hourly > daily > weekly > monthly

• Pre-aggregated reports
  – Processing data to generate reporting from raw events

Page 26: Webinar: Time-Series Data in MongoDB

Storing Log Data

{
  _id: ObjectId("4f442120eb03305789000000"),
  host: "127.0.0.1",
  user: "frank",
  time: ISODate("2000-10-10T20:55:36Z"),
  path: "/apache_pb.gif",
  request: "GET /apache_pb.gif HTTP/1.0",
  status: 200,
  response_size: 2326,
  referrer: "http://www.example.com/start.html",
  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
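The mapping from the raw log line to the document can be sketched with a regular expression for the Apache combined log format. The `parseLogLine` and `parseApacheTime` helpers are hypothetical, the `_id` field is left for the driver to assign, and error handling is reduced to returning `null` for lines that do not match.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Convert "10/Oct/2000:13:55:36 -0700" into a UTC Date.
function parseApacheTime(text) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(text);
  if (!m) return null;
  const local = Date.UTC(+m[3], MONTHS.indexOf(m[2]), +m[1], +m[4], +m[5], +m[6]);
  const offsetMinutes = (m[7] === "-" ? -1 : 1) * (+m[8] * 60 + +m[9]);
  return new Date(local - offsetMinutes * 60 * 1000);
}

// Parse one combined-format line into a log document.
function parseLogLine(line) {
  const m = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/.exec(line);
  if (!m) return null;
  return {
    host: m[1],
    user: m[3],
    time: parseApacheTime(m[4]),
    path: m[5].split(" ")[1],
    request: m[5],
    status: Number(m[6]),
    response_size: m[7] === "-" ? 0 : Number(m[7]),
    referrer: m[8],
    user_agent: m[9],
  };
}

const line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
const logDoc = parseLogLine(line);
console.log(logDoc);
```

Storing `time` as a UTC date rather than the original string is what makes range queries and time-based bucketing straightforward later on.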

Page 27: Webinar: Time-Series Data in MongoDB

Pre-Aggregation

• Analytics across raw events can involve many reads

• Alternative schemas can improve read and write performance

• Data can be organized into more coarse buckets

• Transition from insert-driven to update-driven workloads

Page 28: Webinar: Time-Series Data in MongoDB

Pre-Aggregated Log Data

{
  timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
  resource: "/index.html",
  page_views: { 0: 50, …, 59: 250 }
}

• Leverage time-series style bucketing

• Track individual metrics (e.g. page views)

• Improve performance for reads/writes

• Minimal processing overhead

Page 29: Webinar: Time-Series Data in MongoDB

Hierarchical Aggregation

• Analytical approach as opposed to schema approach
  – Leverage built-in Aggregation Framework or MapReduce

• Execute multiple tasks sequentially to aggregate at varying levels

• Raw events > Hourly > Weekly > Monthly

• Rolling approach distributes the aggregation workload
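The rolling structure can be sketched in plain JavaScript, where each level is computed from the output of the level below it rather than from raw events. This is a stand-in to show the shape of the computation. The slides name the Aggregation Framework or MapReduce for the real work, and the `rollUp`, `toHour`, and `toDay` helpers and the sample events are assumptions.

```javascript
// Generic rollup step: sum `count` per resource within a coarser bucket.
function rollUp(docs, truncate) {
  const totals = new Map();
  for (const doc of docs) {
    const bucket = truncate(doc.ts);
    const key = doc.resource + "|" + bucket.toISOString();
    const entry = totals.get(key) || { resource: doc.resource, ts: bucket, count: 0 };
    entry.count += doc.count;
    totals.set(key, entry);
  }
  return Array.from(totals.values());
}

// Truncation functions for two levels of the hierarchy (UTC).
const toHour = (ts) => new Date(Date.UTC(
  ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate(), ts.getUTCHours()));
const toDay = (ts) => new Date(Date.UTC(
  ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));

// Illustrative raw events, one page view each.
const raw = [
  { resource: "/index.html", ts: new Date("2000-10-10T20:55:36Z"), count: 1 },
  { resource: "/index.html", ts: new Date("2000-10-10T20:58:01Z"), count: 1 },
  { resource: "/index.html", ts: new Date("2000-10-10T21:02:10Z"), count: 1 },
];

const hourly = rollUp(raw, toHour);   // raw events > hourly
const daily = rollUp(hourly, toDay);  // hourly > daily, no raw events re-read
console.log(hourly, daily);
```

Each coarser level reads only the much smaller output of the previous level, which is how the rolling approach spreads the aggregation workload.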

Page 30: Webinar: Time-Series Data in MongoDB

Thinking Ahead

Page 31: Webinar: Time-Series Data in MongoDB

Before You Start

• What are the application requirements?

• Is pre-aggregation useful for your application?

• What are your retention and age-out policies?

• What are the gotchas?
  – Pre-create document structure to avoid fragmentation and performance problems
  – Organize your data for growth – time series data grows fast!

Page 32: Webinar: Time-Series Data in MongoDB

Down The Road

• Scale-out considerations
  – Vertical vs. horizontal (with sharding)

• Understanding the data
  – Aggregation
  – Analytics
  – Reporting

• Deeper data analysis
  – Patterns
  – Predictions

Page 33: Webinar: Time-Series Data in MongoDB

Scaling Time Series Data in MongoDB

• Vertical growth
  – Larger instances with more CPU and memory
  – Increased storage capacity

• Horizontal growth
  – Partitioning data across many machines
  – Dividing and distributing the workload

Page 34: Webinar: Time-Series Data in MongoDB

Time Series Sharding Considerations

• What are the application requirements?
  – Primarily collecting data
  – Primarily reporting data
  – Both

• Map those back to
  – Write performance needs
  – Read/write query distribution
  – Collection organization (see MMS Monitoring)

• Example: {metric name, coarse timestamp}
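A compound shard key of this shape can be sketched as follows. The field names `type` (metric name) and `timestamp_hour` (coarse timestamp), the `shardKeyValue` helper, and the `monitoring.metrics` namespace are all assumptions for illustration. Whether this key suits a given deployment depends on the requirements listed above.

```javascript
// Assumed field names: `type` is the metric name,
// `timestamp_hour` is the timestamp truncated to the hour.
const shardKey = { type: 1, timestamp_hour: 1 };

// Compute the shard key value for one sample.
function shardKeyValue(type, ts) {
  const hour = new Date(ts.getTime());
  hour.setUTCMinutes(0, 0, 0);
  return { type: type, timestamp_hour: hour };
}

const a = shardKeyValue("memory_used", new Date("2013-10-10T23:06:00Z"));
const b = shardKeyValue("memory_used", new Date("2013-10-10T23:48:00Z"));
const c = shardKeyValue("cpu_load", new Date("2013-10-10T23:06:00Z"));
console.log(a, b, c);

// In the mongo shell (namespace assumed):
//   sh.shardCollection("monitoring.metrics", shardKey)
```

Samples for the same metric in the same hour share a key value and stay together for reads, while different metrics produce different key values and can be spread across shards.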

Page 35: Webinar: Time-Series Data in MongoDB

Aggregates, Analytics, Reporting

• Aggregation Framework can be used for analysis
  – Does it work with the chosen schema design?
  – What sorts of aggregations are needed?

• Reporting can be done on predictable, rolling basis
  – See "Hierarchical Aggregation"

• Consider secondary reads for analytical operations
  – Minimize load on production primaries

Page 36: Webinar: Time-Series Data in MongoDB

Deeper Data Analysis

• Leverage MongoDB-Hadoop connector
  – Bi-directional support for reading/writing
  – Works with online and offline data (e.g. backup files)

• Compute using MapReduce
  – Patterns
  – Recommendations
  – Etc.

• Explore data
  – Pig
  – Hive

Page 37: Webinar: Time-Series Data in MongoDB

Questions?

Page 38: Webinar: Time-Series Data in MongoDB

Resources

• Schema Design for Time Series Data in MongoDB
  http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

• Operational Intelligence Use Case
  http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence

• Data Modeling in MongoDB
  http://docs.mongodb.org/manual/data-modeling/

• Schema Design (webinar)
  http://www.mongodb.com/events/webinar/schema-design-oct2013