MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and...
![Page 1: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/1.jpg)
Bryan Reinero, Consulting Engineer, MongoDB
#ConferenceHashTag
Time Series Data, Part 2: Aggregations in Action
![Page 2: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/2.jpg)
Real Time Traffic Data Project
Our network of 16,000 speed sensors reports data every minute.
![Page 3: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/3.jpg)
What we want from our data
Charting and Trending
![Page 4: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/4.jpg)
What we want from our data
Historical & Predictive Analysis
![Page 5: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/5.jpg)
What we want from our data
Real Time Traffic Dashboard
![Page 6: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/6.jpg)
Document Structure
    {
      _id: ObjectId("5382ccdd58db8b81730344e2"),
      linkId: 900006,
      date: ISODate("2014-03-12T17:00:00Z"),
      data: [
        { speed: NaN, time: NaN },
        { speed: NaN, time: NaN },
        { speed: NaN, time: NaN },
        ...
      ],
      conditions: {
        status: "Snow / Ice Conditions",
        pavement: "Icy Spots",
        weather: "Light Snow"
      }
    }
![Page 7: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/7.jpg)
Sample Document Structure
Compound, unique index on linkId and date identifies the individual document
![Page 8: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/8.jpg)
Sample Document Structure
Saves an extra index
    {
      _id: "900006:14031217",
      data: [
        { speed: NaN, time: NaN },
        { speed: NaN, time: NaN },
        { speed: NaN, time: NaN },
        ...
      ],
      conditions: {
        status: "Snow / Ice Conditions",
        pavement: "Icy Spots",
        weather: "Light Snow"
      }
    }
![Page 9: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/9.jpg)
Sample Document Structure
Range queries: /^900006:1403/
Regex must be left-anchored and case-sensitive
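The composite key and prefix query above can be tried without a database. The sketch below is plain JavaScript; `makeId` is a hypothetical helper (not from the talk) that assumes the key layout is `linkId:YYMMDDHH` in UTC, as the sample `_id` suggests. A left-anchored, case-sensitive pattern matters because it describes a contiguous range of keys, which is what lets an index on `_id` serve the query as a range scan.

```javascript
// Hypothetical helper: build the "linkId:YYMMDDHH" key used as _id on the slides.
// The UTC-based layout is an assumption inferred from the sample document.
function makeId(linkId, date) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp =
    pad(date.getUTCFullYear() % 100) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours());
  return `${linkId}:${stamp}`;
}

const ids = [
  makeId(900006, new Date("2014-03-12T17:00:00Z")),
  makeId(900006, new Date("2014-03-13T08:00:00Z")),
  makeId(900006, new Date("2014-04-01T00:00:00Z")),
  makeId(900007, new Date("2014-03-12T17:00:00Z")),
];

// Left-anchored prefix: every hour document for link 900006 in March 2014.
const march = ids.filter((id) => /^900006:1403/.test(id));
console.log(march);
```

Because the timestamp part sorts lexicographically in time order, shortening or lengthening the prefix widens or narrows the time window.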
![Page 10: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/10.jpg)
Sample Document Structure
Pre-allocated, 60-element array of per-minute data
    {
      _id: "900006:140312",
      data: [
        { speed: NaN, time: NaN },
        { speed: NaN, time: NaN },
        { speed: NaN, time: NaN },
        ...
      ],
      conditions: {
        status: "Snow / Ice Conditions",
        pavement: "Icy Spots",
        weather: "Light Snow"
      }
    }
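As a rough illustration of the pre-allocation idea, the following sketch builds an hour document with 60 placeholder slots and then overwrites one slot in place. The function names are hypothetical and the code works on plain objects rather than a live collection.

```javascript
// Hypothetical helper: one document per link per hour, with 60 placeholder slots.
function newHourDocument(id) {
  return {
    _id: id,
    data: Array.from({ length: 60 }, () => ({ speed: NaN, time: NaN })),
  };
}

// Record a sample by overwriting the slot for its minute; the array never grows.
function recordSample(doc, minute, speed, time) {
  doc.data[minute] = { speed, time };
}

const hourDoc = newHourDocument("900006:14031217");
recordSample(hourDoc, 5, 52, 237);
console.log(hourDoc.data.length, hourDoc.data[5]);
```

In MongoDB the corresponding in-place write would likely be a `$set` on a positional path such as `data.5`, so the document keeps its size as samples arrive.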
![Page 11: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/11.jpg)
Charts
[Chart: a single series with time-axis ticks at Mon Mar 10 2014 04:57, Tue Mar 11 2014 06:30 and Wed Mar 12 2014 07:04 (GMT-0700, PDT); vertical axis from 0 to 70]
db.linkData.find( { _id : /^20484097:2014031/ } )
![Page 12: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/12.jpg)
Rollups
    {
      _id: "20484097:20140204",
      hours: [
        {
          speed: { sum: 1889, count: 60 },
          time: { sum: 20562, count: 60 },
          conditions: { status: "Snow / Ice Conditions", pavement: "Icy Spots", weather: "Light Snow" }
        },
        {
          speed: { sum: 1892, count: 60 },
          time: { sum: 20442, count: 60 },
          conditions: { status: "Snow / Ice Conditions", pavement: "Slush", weather: "Light Snow" }
        }
      ]
    }
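One way an hour document could be folded into a rollup entry of this shape is sketched below. `rollupHour` is a hypothetical name, and skipping `NaN` placeholders is an assumption of the sketch rather than something the slides state.

```javascript
// Collapse one hour of per-minute samples into { sum, count } pairs.
// NaN slots (minutes with no reading) are skipped; that is an assumption of this sketch.
function rollupHour(hourDoc) {
  const acc = { speed: { sum: 0, count: 0 }, time: { sum: 0, count: 0 } };
  for (const sample of hourDoc.data) {
    for (const field of ["speed", "time"]) {
      if (!Number.isNaN(sample[field])) {
        acc[field].sum += sample[field];
        acc[field].count += 1;
      }
    }
  }
  return { ...acc, conditions: hourDoc.conditions };
}

const sampleHour = {
  _id: "20484097:2014020417",
  data: [
    { speed: 30, time: 340 },
    { speed: 32, time: 350 },
    { speed: NaN, time: NaN },
  ],
  conditions: { status: "Snow / Ice Conditions", pavement: "Icy Spots", weather: "Light Snow" },
};

const hourRollup = rollupHour(sampleHour);
console.log(hourRollup);
```

Storing the sum and count instead of the average keeps the rollups composable: a day-level average is the total of the sums divided by the total of the counts.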
![Page 13: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/13.jpg)
Document retention
Doc per hour: 2 days
Doc per day: 2 months
Doc per month: 1 year
![Page 14: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/14.jpg)
Analysis with The Aggregation Framework
![Page 15: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/15.jpg)
Pipelining operations
grep | sort | uniq
Piping command line operations
![Page 16: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/16.jpg)
Pipelining operations
$match | $group | $sort
Piping aggregation operations
Stream of documents in, result documents out
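The pipe analogy can be made concrete with ordinary array functions. This is only an in-memory imitation of the idea, with made-up sample readings and hypothetical stage helpers named after the operators; it is not how the server implements them.

```javascript
// Each stage takes an array of documents and returns a new array, like a shell pipe.
const match = (pred) => (docs) => docs.filter(pred);
const group = (keyFn, init, step) => (docs) => {
  const groups = new Map();
  for (const d of docs) {
    const k = keyFn(d);
    groups.set(k, step(groups.has(k) ? groups.get(k) : init(), d));
  }
  return [...groups].map(([k, v]) => ({ _id: k, ...v }));
};
const sort = (cmp) => (docs) => [...docs].sort(cmp);
const pipeline = (...stages) => (docs) => stages.reduce((acc, stage) => stage(acc), docs);

// Made-up readings for illustration only.
const readings = [
  { linkId: 1, weather: "Rain", speed: 44 },
  { linkId: 1, weather: "Rain", speed: 39 },
  { linkId: 2, weather: "Clear", speed: 60 },
  { linkId: 2, weather: "Rain", speed: 46 },
];

const rainCountByLink = pipeline(
  match((d) => d.weather === "Rain"),
  group((d) => d.linkId, () => ({ count: 0 }), (g) => ({ count: g.count + 1 })),
  sort((a, b) => b.count - a.count)
)(readings);
console.log(rainCountByLink);
```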
![Page 17: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/17.jpg)
What is the average speed for a given road segment?
    > db.linkData.aggregate( [
        { $match: { "_id": /^20484097:/ } },
        { $project: { "data.speed": 1, linkId: 1 } },
        { $unwind: "$data" },
        { $group: { _id: "$linkId", ave: { $avg: "$data.speed" } } }
      ] );
    { "_id" : 20484097, "ave" : 47.067650676506766 }
![Page 18: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/18.jpg)
What is the average speed for a given road segment?
$match: select documents on the target segment
![Page 19: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/19.jpg)
What is the average speed for a given road segment?
$project: keep only the fields we really need
![Page 20: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/20.jpg)
What is the average speed for a given road segment?
$unwind: loop over the array of data points
![Page 21: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/21.jpg)
What is the average speed for a given road segment?
$group: use the handy $avg operator
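To see what each of the four stages contributes, here is a step-by-step imitation in plain JavaScript over a few invented documents. The sample data and variable names are assumptions; only the stage order and field names follow the slides.

```javascript
// Invented sample documents shaped like the linkData collection.
const linkData = [
  { _id: "20484097:2014031217", linkId: 20484097, data: [{ speed: 40, time: 300 }, { speed: 50, time: 250 }] },
  { _id: "20484097:2014031218", linkId: 20484097, data: [{ speed: 60, time: 200 }] },
  { _id: "900006:2014031217", linkId: 900006, data: [{ speed: 10, time: 900 }] },
];

// $match: keep documents whose _id starts with the target segment.
const matched = linkData.filter((d) => /^20484097:/.test(d._id));

// $project: keep only linkId and data.speed.
const projected = matched.map((d) => ({
  linkId: d.linkId,
  data: d.data.map((s) => ({ speed: s.speed })),
}));

// $unwind: one output document per array element.
const unwound = projected.flatMap((d) => d.data.map((s) => ({ linkId: d.linkId, data: s })));

// $group with $avg: accumulate sum and count per linkId, then divide.
const sums = new Map();
for (const d of unwound) {
  const g = sums.get(d.linkId) || { sum: 0, count: 0 };
  sums.set(d.linkId, { sum: g.sum + d.data.speed, count: g.count + 1 });
}
const averages = [...sums].map(([linkId, g]) => ({ _id: linkId, ave: g.sum / g.count }));
console.log(averages);
```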
![Page 22: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/22.jpg)
More Sophisticated Pipelines: average speed with variance
    // Pipeline fragment: earlier stages are assumed to supply meanSpd and speeds.
    { $project: {
        mean: "$meanSpd",
        spdDiffSqrd: {
          $map: {
            input: {
              $map: {
                input: "$speeds",
                as: "samp",
                in: { $subtract: [ "$$samp", "$meanSpd" ] }
              }
            },
            as: "df",
            in: { $multiply: [ "$$df", "$$df" ] }
          }
        }
    } },
    { $unwind: "$spdDiffSqrd" },
    { $group: { _id: { mean: "$mean" }, variance: { $avg: "$spdDiffSqrd" } } }
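The arithmetic behind the nested `$map` stages is the population variance: subtract the mean from each sample, square the differences, and average them. A minimal sketch with invented speeds, where `meanAndVariance` is a hypothetical name:

```javascript
// Mirrors the pipeline: inner $map subtracts the mean, outer $map squares,
// and $unwind + $avg averages the squared differences.
function meanAndVariance(speeds) {
  const mean = speeds.reduce((a, b) => a + b, 0) / speeds.length;
  const diffs = speeds.map((s) => s - mean);
  const squared = diffs.map((d) => d * d);
  const variance = squared.reduce((a, b) => a + b, 0) / squared.length;
  return { mean, variance };
}

const stats = meanAndVariance([44, 39, 46, 51]);
console.log(stats);
```

Note that this divides by the number of samples, not by one less than that, which matches what `$avg` over the squared differences computes.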
![Page 23: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/23.jpg)
Historic Analysis
How do weather and road conditions affect traffic?
The Ask: what are the average speeds per weather, status and pavement condition?
![Page 24: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/24.jpg)
MapReduce
    function map() {
      for ( var i = 0; i < this.data.length; i++ ) {
        emit( this.conditions.weather,  { speed: this.data[i].speed } );
        emit( this.conditions.status,   { speed: this.data[i].speed } );
        emit( this.conditions.pavement, { speed: this.data[i].speed } );
      }
    }
![Page 25: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/25.jpg)
MapReduce
Emitted: "Snow", 34
![Page 26: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/26.jpg)
MapReduce
Emitted: "Icy spots", 34
![Page 27: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/27.jpg)
MapReduce
Emitted: "Delays", 34
![Page 28: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/28.jpg)
MapReduce
![Page 29: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/29.jpg)
MapReduce
Weather: “Rain”, speed: 44
![Page 30: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/30.jpg)
MapReduce
Weather: “Rain”, speed: 39
![Page 31: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/31.jpg)
MapReduce
Weather: “Rain”, speed: 46
![Page 32: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/32.jpg)
MapReduce
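    function reduce( key, values ) {
      var result = { count: 0, speedSum: 0 };   // start the count at zero
      values.forEach( function( v ) {
        // v is a raw { speed } or an earlier reduce result
        result.speedSum += ( "speedSum" in v ) ? v.speedSum : v.speed;
        result.count += ( "count" in v ) ? v.count : 1;
      } );
      return result;
    }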
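The map and reduce steps can be exercised outside the database with a small in-memory driver. The sketch below is a variation, not the code from the slides: it emits values already shaped as `{ count, speedSum }` so that reduce input and output match, which keeps the result correct if reduce is applied to partial results. All names and sample documents are invented.

```javascript
// map: emit one { count, speedSum } value per sample under each condition key.
function mapDoc(doc, emit) {
  for (const sample of doc.data) {
    for (const key of [doc.conditions.weather, doc.conditions.status, doc.conditions.pavement]) {
      emit(key, { count: 1, speedSum: sample.speed });
    }
  }
}

// reduce: input and output share one shape, so partial results can be reduced again.
function reduceValues(key, values) {
  const result = { count: 0, speedSum: 0 };
  for (const v of values) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  }
  return result;
}

// Tiny in-memory driver standing in for the database.
function mapReduce(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    mapDoc(doc, (k, v) => emitted.set(k, [...(emitted.get(k) || []), v]));
  }
  return [...emitted].map(([k, vs]) => ({ _id: k, value: reduceValues(k, vs) }));
}

const sampleDocs = [
  { data: [{ speed: 34 }, { speed: 36 }], conditions: { weather: "Snow", status: "Delays", pavement: "Icy Spots" } },
  { data: [{ speed: 44 }], conditions: { weather: "Rain", status: "Delays", pavement: "Wet Spots" } },
];

const output = mapReduce(sampleDocs);
console.log(output);
```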
![Page 33: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/33.jpg)
MapReduce
![Page 34: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/34.jpg)
Results
    results: [
      { "_id" : "Generally Clear and Dry Conditions", "value" : { "count" : 902, "speedSum" : 45100 } },
      { "_id" : "Icy Spots", "value" : { "count" : 242, "speedSum" : 9438 } },
      { "_id" : "Light Snow", "value" : { "count" : 122, "speedSum" : 7686 } },
      { "_id" : "No Report", "value" : { "count" : 782, "speedSum" : NaN } }
    ]
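The results carry a count and a sum, so the averages the question asked for still need one division per key. A sketch of that last step, using the figures shown above; `finalize` here is a hypothetical plain function, written in the style of a map-reduce finalize step:

```javascript
const results = [
  { _id: "Generally Clear and Dry Conditions", value: { count: 902, speedSum: 45100 } },
  { _id: "Icy Spots", value: { count: 242, speedSum: 9438 } },
  { _id: "Light Snow", value: { count: 122, speedSum: 7686 } },
  { _id: "No Report", value: { count: 782, speedSum: NaN } },
];

// Hypothetical finalize step: derive the average from the reduced totals.
function finalize(key, value) {
  return { ...value, avgSpeed: value.speedSum / value.count };
}

const averaged = results.map((r) => ({ _id: r._id, value: finalize(r._id, r.value) }));
console.log(averaged);
```

The "No Report" row stays `NaN` because a single `NaN` sample propagates through every addition, so placeholder readings would need to be filtered out in map or reduce to get a usable figure.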
![Page 35: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/35.jpg)
Processing Large Data Sets
• Need to break data into smaller pieces
• Process data across multiple nodes
[Diagram: a cluster of Hadoop nodes]
![Page 36: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/36.jpg)
Benefits of the Hadoop Connector
• Increased parallelism
• Access to analytics libraries
• Separation of concerns
• Integrates with existing tool chains
![Page 37: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/37.jpg)
Real-time Dashboard
• Drivers will be accessing the data via web, mobile devices, and navigation systems
• We need to provide current average speed, travel time and weather per road segment
![Page 38: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/38.jpg)
Current Real-Time Conditions
Last ten minutes of speeds and times
    {
      _id: "I-87:10656",
      description: "NYS Thruway Harriman Section Exits 14A - 16",
      update: ISODate("2013-10-10T23:06:37.000Z"),
      speeds: [ 52, 49, 45, 51, ... ],
      times: [ 237, 224, 246, 233, ... ],
      pavement: "Wet Spots",
      status: "Wet Conditions",
      weather: "Light Rain",
      averageSpeed: 50.23,
      averageTime: 234,
      maxSafeSpeed: 53.1,
      location: {
        type: "LineString",
        coordinates: [ [ -74.056, 41.098 ], [ -74.077, 41.104 ] ]
      }
    }
![Page 39: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/39.jpg)
Current Real-Time Conditions
Pre-aggregated metrics
![Page 40: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/40.jpg)
Current Real-Time Conditions
Geo-spatially indexed road segment
![Page 41: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/41.jpg)
Maintaining the current conditions
    db.linksAvg.update(
      { "_id": linkId },
      {
        "$set": { "update": date },
        "$push": {
          "times":  { "$each": [ time ],  "$slice": -10 },
          "speeds": { "$each": [ speed ], "$slice": -10 }
        }
      }
    )
Each update pushes the new value onto the end of the array, and $slice: -10 trims the array to its ten most recent elements, dropping the oldest
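The effect of combining `$each` with a negative `$slice` can be imitated on a plain array. `pushAndSlice` is a hypothetical stand-in for the server-side update, and the speeds are invented.

```javascript
// Emulates { $push: { field: { $each: values, $slice: -n } } } on a plain array:
// append the new values, then keep only the last n elements.
function pushAndSlice(arr, values, n) {
  return [...arr, ...values].slice(-n);
}

let speeds = [];
for (let minute = 1; minute <= 12; minute++) {
  speeds = pushAndSlice(speeds, [40 + minute], 10);
}
console.log(speeds);
```

After twelve updates the array holds only the ten latest readings, so the document stays a fixed size no matter how long the feed runs.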
![Page 42: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/42.jpg)
Putting it all together
![Page 43: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/43.jpg)
Patterns common to time series data:
• You need to store and manage an incoming stream of data samples
• You need to compute derivative data sets based on these samples
• You need low latency access to up-to-date data
![Page 44: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/44.jpg)
Introducing the High Volume Data Feed
![Page 45: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/45.jpg)
HVDF: Reference Implementation
Screech: High Volume Data Feed engine
• REST Service API
• Processor Plugins: Inline, Batch, Stream
• Channel Data Storage: Raw Channel Data, Aggregated Rollup T1, Aggregated Rollup T2
• Query Processor
• Streaming spout
• Custom Stream Processing Logic
• Incoming Sample Stream: POST /feed/channel/data
• Real-time Queries: GET /feed/channeldata?time=XXX&range=YYY
![Page 46: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/46.jpg)
HVDF: https://github.com/10gen-labs/hvdf
Hadoop Connector: https://github.com/mongodb/mongo-hadoop
![Page 47: MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop](https://reader033.fdocuments.in/reader033/viewer/2022061103/5400244a8d7f724c088b4b1e/html5/thumbnails/47.jpg)
Bryan Reinero, Consulting Engineer, MongoDB Inc.
#MongoDBWorld
Thank You