Realtime Analytics with MongoDB - MongoDB Meetup NYC
Transcript of Realtime Analytics with MongoDB - MongoDB Meetup NYC
![Page 1: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/1.jpg)
Yottaa Inc., 2 Canal Park, 5th Floor, Cambridge, MA 02141
http://www.yottaa.com
Realtime Analytics with MongoDB & Rails
Jared Rosoff (@forjared)
![Page 2: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/2.jpg)
Overview
• About Yottaa
• Engineering challenges
• Approaches we considered
• How we did it
• How it works
![Page 3: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/3.jpg)
©Yottaa Confidential. Do Not Distribute.
Who’s driving your website?
Is your website slow?
http://stop-the-damage.com/2010/08/276/
![Page 4: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/4.jpg)
We can help you make it faster
OMG!! 15 seconds?
WTF?
![Page 5: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/5.jpg)
Knowing is half the battle
San Francisco
Washington DC
London
RFC2616
![Page 6: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/6.jpg)
Data data everywhere
• We collect lots of data
  – 14,000+ URLs being tracked
  – Up to 300 samples per URL per day
  – Some samples are >1 MB (Firebug)
  – Missing a sample isn’t a big deal
• We try to make everything real time
  – No batch jobs, everything displayed as it happens
  – “Check Now” button runs tests on demand
![Page 8: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/8.jpg)
Engineering Challenges
• High write volume from day 1
  – Sample collection is like having millions of users on the first day
  – After 60 days, we have >150 GB of data
  – Adding about 5 GB/day today
• Small engineering team
  – 1 built the data warehouse & portal, 1 built the monitoring agents
  – Bigger team now, but this was how we started
• Must be agile
  – We didn’t know exactly what features we’d need
  – Requirements change daily
• Limited operations budget
  – No full-time operations staff
  – 100% in the cloud
![Page 9: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/9.jpg)
Rails default architecture
[Diagram labels: Data Source, Collection Server, MySQL, Reporting Server, User]
“Just” a Rails App
Performance Bottleneck: Too much load
![Page 10: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/10.jpg)
Let’s add replication!
[Diagram labels: Data Source, Collection Server, MySQL Master, Replication, MySQL Slave, Reporting Server, User]
Off the shelf! Scalable Reads!
Performance Bottleneck: Still can’t scale writes
![Page 11: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/11.jpg)
What about sharding?
[Diagram labels: Data Source, Collection Server, Sharding, MySQL Master (x3), Sharding, Reporting Server, User]
Scalable Writes!
Development Bottleneck: Need to write custom code
![Page 12: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/12.jpg)
Key Value stores to the rescue?
[Diagram labels: Data Source, Collection Server, Cassandra or Voldemort, Reporting Server, User]
Scalable Writes!
Development Bottleneck: Reporting is limited / hard
![Page 13: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/13.jpg)
Can I Hadoop my way out of this?
[Diagram labels: Data Source, Collection Server, Cassandra or Voldemort, Hadoop, MySQL Master, MySQL Slave, Reporting Server, User]
Scalable Writes!
Flexible Reports!
“Just” a Rails App
Development Bottleneck: Too many systems!
![Page 14: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/14.jpg)
MongoDB!
[Diagram labels: Data Source, Collection Server, MongoDB, Reporting Server, User]
Scalable Writes!
“Just” a Rails app
Flexible Reporting!
![Page 15: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/15.jpg)
[Architecture diagram labels: Data Source, User, Load Balancer, App Server (Nginx, Passenger, Mongos), Collection, Reporting, MongoD (x3)]
Sharding!
High Concurrency
Scale-Out
Easy as Rails!
![Page 16: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/16.jpg)
3 Steps to Real Time Analytics
1. Collect data 2. Store Data 3. Display Reports
![Page 17: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/17.jpg)
3 Steps to Real Time Analytics
1. Collect data 2. Store Data 3. Display Reports
![Page 18: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/18.jpg)
Collecting Data
[Diagram labels: Data Source (x3), Load Balancer, Collection Server (x4)]
POST http://collector.com/samples
Load balancer: we use Amazon ELB
Collection servers: we use Amazon EC2
![Page 19: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/19.jpg)
Collecting Data
- Sample data is passed in the body of a POST request
- Rails makes it really easy to parse JSON, XML, YAML (we use JSON)
- We have a bunch of other stuff that happens when data arrives, but all you really need to do is write the data
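As a rough illustration of the collection step, the sketch below parses a JSON request body and checks it before it would be written. It is plain JavaScript, not the Rails code used in the talk; `parseSample` and the list of required fields are assumptions based on the simple sample shown in this deck.

```javascript
// Hypothetical helper: turn a raw POST body into a sample object.
// The required fields are an assumption, taken from the simple sample in this deck.
const REQUIRED_FIELDS = ["url", "location", "timestamp"];

function parseSample(body) {
  let sample;
  try {
    sample = JSON.parse(body);
  } catch (err) {
    return { ok: false, error: "body is not valid JSON" };
  }
  if (sample === null || typeof sample !== "object" || Array.isArray(sample)) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const missing = REQUIRED_FIELDS.filter((field) => !(field in sample));
  if (missing.length > 0) {
    return { ok: false, error: "missing fields: " + missing.join(", ") };
  }
  return { ok: true, sample };
}

const result = parseSample(
  '{"url":"www.google.com","location":"SFO","connect":23,"first_byte":123,"last_byte":245,"timestamp":1234}'
);
console.log(result.ok, result.sample.connect);
```

Since missing a sample "isn't a big deal" here, a collector built this way could simply drop bodies that fail the check instead of retrying.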
![Page 20: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/20.jpg)
A Sample Sample!
{
  url: "www.google.com",
  location: "SFO",
  connect: 23,
  first_byte: 123,
  last_byte: 245,
  timestamp: 1234
}
![Page 21: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/21.jpg)
A more complicated example
![Page 22: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/22.jpg)
"{\"location\":\"aws-us-east\",\"timestamp\":\"08/05/2010 07:11:54\",\"http_archive\":{\"log\":{\"creator\":{\"name\":\"Firebug\",\"version\":\"1.4.3\"},\"version\":\"1.1\",\"pages\":[{\"title\":\"\\u4e2d\\u56fd\\u7f51\\u7edc\\u7535\\u89c6\\u53f0-CNTV\",\"id\":\"page_0\",\"startedDateTime\":\"2010-08-05T08:11:51.897 01:00\",\"pageTimings\":{\"onContentLoad\":1883,\"onLoad\":2828}}],\"entries\":[{\"timings\":{\"connect\":null,\"wait\":561,\"blocked\":null,\"receive\":19,\"send\":0,\"dns\":0},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":2067,\"content\":{\"size\":4467,\"mimeType\":\"text/html\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":580,\"startedDateTime\":\"2010-08-05T08:11:51.897 01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://www.cntv.cn/\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}},{\"timings\":{\"connect\":null,\"wait\":188,\"blocked\":null,\"receive\":1,\"send\":0,\"dns\":0},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":740,\"content\":{\"size\":740,\"mimeType\":\"image/jpeg\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":370,\"startedDateTime\":\"2010-08-05T08:11:52.481 01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_bg.jpg\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}},{\"timings\":{\"connect\":null,\"wait\":3,\"blocked\":null,\"receive\":1,\"send\":0,\"dns\":1280},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":2933,\"content\":{\"size\":7377,\"mimeType\":\"application/x-javascript\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":1285,\"startedDateTime\":\"2010-08-05T08:11:52.483 
01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://www.cctv.com/Library/a2.js\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}},{\"timings\":{\"connect\":null,\"wait\":171,\"blocked\":null,\"receive\":83,\"send\":0,\"dns\":363},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":76508,\"content\":{\"size\":76508,\"mimeType\":\"image/png\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":716,\"startedDateTime\":\"2010-08-05T08:11:52.489 01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_top.png\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}},{\"timings\":{\"connect\":null,\"wait\":156,\"blocked\":null,\"receive\":1,\"send\":0,\"dns\":472},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":5351,\"content\":{\"size\":5351,\"mimeType\":\"image/png\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":629,\"startedDateTime\":\"2010-08-05T08:11:52.490 01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_link.png\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}},{\"timings\":{\"connect\":null,\"wait\":147,\"blocked\":null,\"receive\":0,\"send\":0,\"dns\":470},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":2068,\"content\":{\"size\":2068,\"mimeType\":\"image/png\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":617,\"startedDateTime\":\"2010-08-05T08:11:52.492 
01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://www.cntv.cn/nettv/homepage2010/globalhomepage_image/r_bottom.png\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}},{\"timings\":{\"connect\":null,\"wait\":278,\"blocked\":null,\"receive\":1,\"send\":0,\"dns\":667},\"response\":{\"statusText\":\"OK\",\"headersSize\":-1,\"httpVersion\":\"HTTP/1.1\",\"bodySize\":43,\"content\":{\"size\":43,\"mimeType\":\"image/gif\"},\"status\":200,\"redirectURL\":\"\"},\"cache\":{},\"pageref\":\"page_0\",\"time\":947,\"startedDateTime\":\"2010-08-05T08:11:53.777 01:00\",\"request\":{\"headersSize\":-1,\"method\":\"GET\",\"url\":\"http://cntv.wrating.com/a.gif?a=12a411781af&t=&i=-8a7b8e17f.12a411781b0.0.1a46b8aed32bf8&b=http://www.cntv.cn/&c=860010-1101020100&s=1364x768x16&l=en-us&z=1&j=1&f=-&r=http://cntv.cn/&kw=&ut=30&n=&js=0,1.292&ck=1\",\"httpVersion\":\"HTTP/1.1\",\"bodySize\":-1}}],\"browser\":{\"name\":\"Firefox\",\"version\":\"3.5.8\"}}},\"url\":\"http://cntv.cn\"}
![Page 23: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/23.jpg)
3 Steps to Real Time Analytics
1. Collect data 2. Store Data 3. Display Reports
![Page 24: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/24.jpg)
Thinking in rows
Columns: URL | Location | Connect | First Byte | Last Byte | Timestamp
{ url: "www.google.com", location: "SFO", connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
{ url: "www.google.com", location: "NYC", connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }
![Page 25: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/25.jpg)
Thinking in rows
Columns: URL | Location | Connect | First Byte | Last Byte | Timestamp
What was the average connect time for Google on Friday?
From SFO? From NYC? Between 1 AM and 2 AM?
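To make the row-oriented approach concrete, here is a minimal sketch of answering such a question when every sample is its own row. It runs over an in-memory array rather than a database; `averageConnect`, the epoch-style timestamps, and the `www.example.com` row are assumptions for illustration.

```javascript
// One row per sample. Timestamps are assumed to be plain numbers (e.g. epoch ms).
function averageConnect(rows, { url, from, to, location }) {
  const matching = rows.filter(
    (row) =>
      row.url === url &&
      row.timestamp >= from &&
      row.timestamp < to &&
      (location === undefined || row.location === location)
  );
  if (matching.length === 0) return null;
  const total = matching.reduce((sum, row) => sum + row.connect, 0);
  // rowsScanned shows how many rows the answer had to touch.
  return { average: total / matching.length, rowsScanned: matching.length };
}

const rows = [
  { url: "www.google.com", location: "SFO", connect: 20, timestamp: 1000 },
  { url: "www.google.com", location: "NYC", connect: 30, timestamp: 2000 },
  { url: "www.google.com", location: "SFO", connect: 40, timestamp: 3000 },
  { url: "www.example.com", location: "SFO", connect: 99, timestamp: 1500 },
];

console.log(averageConnect(rows, { url: "www.google.com", from: 0, to: 5000 }));
console.log(averageConnect(rows, { url: "www.google.com", from: 0, to: 5000, location: "SFO" }));
```

Every question rescans all matching rows, so the cost grows with the number of samples in the range.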
![Page 26: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/26.jpg)
Thinking in rows
Columns: URL | Location | Connect | First Byte | Last Byte | Timestamp
[Diagram: the rows for Day 1, Day 2 and Day 3 are each averaged (AVG) into the chart Result]
Up to 100’s of samples per URL per day!!
30 days average query range
An “average” chart had to hit 600 rows
![Page 27: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/27.jpg)
Thinking in Documents
URL: www.google.com
Day: 9/20/2010
Last Byte:
  Sum: 2312, Count: 12
  SFO: Sum: 1200, Count: 5
  NYC: Sum: 1112, Count: 7
This document contains all data for www.google.com collected during 9/20/2010
The top-level Sum and Count tell us the average value for this metric for this URL / time period
The SFO entry gives the average value from SFO
The NYC entry gives the average value from NYC
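The averages fall out of the stored sums and counts by simple division. The sketch below writes the slide's document as a plain object and computes them; the field name `last_byte`, the exact nesting, and the `average` helper are assumptions.

```javascript
// The daily document from the slide, written as a plain object.
const daily = {
  url: "www.google.com",
  day: "9/20/2010",
  last_byte: {
    sum: 2312,
    count: 12,
    SFO: { sum: 1200, count: 5 },
    NYC: { sum: 1112, count: 7 },
  },
};

// Average for a metric, overall or for a single location.
function average(doc, metric, location) {
  const metricData = doc[metric];
  if (!metricData) return null;
  const bucket = location ? metricData[location] : metricData;
  if (!bucket || !bucket.count) return null;
  return bucket.sum / bucket.count;
}

console.log(average(daily, "last_byte"));        // overall: 2312 / 12
console.log(average(daily, "last_byte", "SFO")); // 1200 / 5 = 240
console.log(average(daily, "last_byte", "NYC")); // 1112 / 7
```

Note that the per-location sums and counts add up to the top-level totals (1200 + 1112 = 2312, 5 + 7 = 12).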
![Page 28: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/28.jpg)
Storing a sample
db.metrics.dailies.update(
  // Which document we're updating
  // (JavaScript months are 0-based, so 9 here means October)
  { url: 'www.google.com', day: new Date(2010, 9, 2) },
  // Atomically update the document
  { '$inc': {
      'connect.sum': 1234,       // update the aggregate value
      'connect.count': 1,
      'connect.sfo.sum': 1234,   // update the location-specific value
      'connect.sfo.count': 1
  } },
  true  // upsert: create the document if it doesn't already exist
);
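To show what an upsert with `$inc` does to the stored document, here is a small in-memory stand-in. It is not the MongoDB driver or shell: `upsertInc`, `incPath`, `storeSample`, and the `Map` used as a collection are all hypothetical, and the sketch only mimics the create-if-missing and add-to-field behaviour.

```javascript
// In-memory stand-in for an upsert with $inc; this is NOT the MongoDB driver.
const dailies = new Map(); // key: url + "|" + day

// Add `amount` at a dotted path such as "connect.sfo.sum", creating levels as needed.
function incPath(doc, path, amount) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  const last = keys[keys.length - 1];
  node[last] = (node[last] || 0) + amount;
}

function upsertInc(collection, query, increments) {
  const id = query.url + "|" + query.day;
  if (!collection.has(id)) collection.set(id, { ...query }); // the "upsert" part
  const doc = collection.get(id);
  for (const [path, amount] of Object.entries(increments)) incPath(doc, path, amount);
  return doc;
}

// Fold one sample into the daily document for its URL.
function storeSample(collection, sample, day) {
  const loc = sample.location.toLowerCase();
  return upsertInc(collection, { url: sample.url, day }, {
    "connect.sum": sample.connect,
    "connect.count": 1,
    [`connect.${loc}.sum`]: sample.connect,
    [`connect.${loc}.count`]: 1,
  });
}

storeSample(dailies, { url: "www.google.com", location: "SFO", connect: 23 }, "2010-09-20");
const doc = storeSample(dailies, { url: "www.google.com", location: "NYC", connect: 27 }, "2010-09-20");
console.log(JSON.stringify(doc));
```

In the real system the database applies the increments atomically, so concurrent collectors do not need to read the document before writing.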
![Page 29: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/29.jpg)
An example document
{
  "_id": ObjectId("4bb55c59c3666e02fc000001"),
  "url": "http://www.google.com/",
  "date": "Mon Jun 07 2010 00:00:00 GMT",
  "connect": {
    "sum": 999,                 # sum of all the locations
    "sum_of_squares": 99999,
    "count": 99,
    "san_francisco": {
      "sum": 555,               # sum of this location
      "sum_of_squares": 55555,
      "count": 55,
      "values": [
        ["Mon Jun 07 2010 20:00:00 GMT", 12],
        ["Mon Jun 07 2010 20:10:00 GMT", 13],
        .........
      ]
    },
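The slide does not say what `sum_of_squares` is for. One plausible use (an assumption, not something the deck states) is deriving variance and standard deviation from running totals without rereading the raw values. A minimal sketch, with `summarize` as a hypothetical helper:

```javascript
// Derive mean, variance and standard deviation from running totals.
// Uses the population variance formula: E[x^2] - (E[x])^2.
function summarize({ sum, sum_of_squares, count }) {
  if (!count) return null;
  const mean = sum / count;
  // Clamp at 0 to guard against tiny negative values from rounding.
  const variance = Math.max(0, sum_of_squares / count - mean * mean);
  return { mean, variance, stddev: Math.sqrt(variance) };
}

// Running totals for the two values listed in the example document (12 and 13).
const totals = { sum: 12 + 13, sum_of_squares: 12 * 12 + 13 * 13, count: 2 };
console.log(summarize(totals)); // { mean: 12.5, variance: 0.25, stddev: 0.5 }
```

Like `sum` and `count`, a sum of squares can be maintained with the same kind of increment on every incoming sample.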
![Page 30: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/30.jpg)
Putting it together
{ url: "www.google.com", location: "SFO", connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
1. Atomically update the daily data
2. Atomically update the weekly data
3. Atomically update the monthly data
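One way to picture the three updates is to compute which day, week, and month a sample belongs to and build one increment per bucket. This is a sketch under assumptions: timestamps are taken to be epoch milliseconds, weeks start on Sunday in UTC, and only `metrics.dailies` appears in the slides (the weekly and monthly collection names are made up).

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Start of the UTC day, week and month containing the timestamp (epoch ms, an assumption).
function bucketStarts(timestampMs) {
  const d = new Date(timestampMs);
  const day = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  const week = day - d.getUTCDay() * DAY_MS; // assumption: weeks start on Sunday
  const month = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
  return { day: new Date(day), week: new Date(week), month: new Date(month) };
}

// Build one upsert description per granularity. Only "metrics.dailies" appears in the
// slides; the weekly and monthly collection names are hypothetical.
function updatesFor(sample) {
  const starts = bucketStarts(sample.timestamp);
  const loc = sample.location.toLowerCase();
  const inc = {
    "connect.sum": sample.connect,
    "connect.count": 1,
    [`connect.${loc}.sum`]: sample.connect,
    [`connect.${loc}.count`]: 1,
  };
  return [
    { collection: "metrics.dailies", query: { url: sample.url, day: starts.day }, inc },
    { collection: "metrics.weeklies", query: { url: sample.url, week: starts.week }, inc },
    { collection: "metrics.monthlies", query: { url: sample.url, month: starts.month }, inc },
  ];
}

const updates = updatesFor({
  url: "www.google.com", location: "SFO", connect: 23,
  timestamp: Date.UTC(2010, 8, 20, 13, 30), // 20 Sep 2010, 13:30 UTC
});
for (const u of updates) console.log(u.collection, JSON.stringify(u.query));
```

Each of the three descriptions would then be applied as its own atomic upsert, so a chart at any granularity reads pre-aggregated documents.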
![Page 31: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/31.jpg)
Sharding our Data
[Diagram: Collection Server and Reporting Server in front of Shard 1 to Shard 4, with URL 1 to URL 8 spread across the shards]
• Shard by URL
• Write load evenly distributed
• Most reads hit a single shard
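The effect of sharding by URL can be illustrated with a toy router. Note the swap: this sketch uses a simple hash-modulo assignment, which is not how MongoDB places data (MongoDB splits the shard key space into chunks and routes through mongos). The functions `hashString` and `shardFor` and the URL list are hypothetical; the point is only that all documents for one URL land on one shard.

```javascript
// Simple deterministic string hash (djb2 variant), kept in unsigned 32-bit range.
function hashString(text) {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Toy routing rule: the same URL always maps to the same shard index.
function shardFor(url, shardCount) {
  return hashString(url) % shardCount;
}

const urls = ["www.google.com", "www.yottaa.com", "www.example.com", "www.mongodb.org"];
const placement = {};
for (const url of urls) placement[url] = "shard " + (shardFor(url, 4) + 1);
console.log(placement);
```

Because a chart query names a single URL, it only needs the shard that owns that URL, while writes for many different URLs spread across all shards.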
![Page 32: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/32.jpg)
3 Steps to Real Time Analytics
1. Collect data 2. Store Data 3. Display Reports
![Page 33: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/33.jpg)
Drawing connect time graph
Compound index to make this query fast:
db.metrics.dailies.ensureIndex({ url: 1, day: -1 })

db.metrics.dailies.find(
  // Data for google
  { url: 'www.google.com',
    // The range of dates for the chart
    day: { '$gte': new Date(2010, 9, 1), '$lte': new Date(2010, 9, 30) } },
  // We just want connect time data, but we can include as many metrics as we want
  { 'connect': true }
);
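Once the daily documents come back, turning them into chart points is a division per document. The sketch below assumes each result carries `connect.sum` and `connect.count` as in the update shown earlier in the deck; `toChartPoints` and the sample results are hypothetical, and `day` is an ISO date string here for brevity.

```javascript
// Convert daily documents (as returned by a query like the one above) into chart points.
function toChartPoints(docs, metric) {
  return docs
    .filter((doc) => doc[metric] && doc[metric].count > 0) // skip days without samples
    .map((doc) => ({ day: doc.day, average: doc[metric].sum / doc[metric].count }))
    .sort((a, b) => (a.day < b.day ? -1 : a.day > b.day ? 1 : 0));
}

// Hypothetical query results.
const docs = [
  { url: "www.google.com", day: "2010-09-02", connect: { sum: 300, count: 10 } },
  { url: "www.google.com", day: "2010-09-01", connect: { sum: 250, count: 10 } },
  { url: "www.google.com", day: "2010-09-03", connect: { sum: 0, count: 0 } },
];
console.log(toChartPoints(docs, "connect"));
```

A 30-day chart built this way touches at most 30 documents, regardless of how many samples were collected each day.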
![Page 34: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/34.jpg)
More efficient charts
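Columns: URL | Day | <data>
[Diagram: the documents for Day 1, Day 2 and Day 3 each yield an average (AVG) that feeds the chart Result]
1 document per URL per day
30 days == 30 documents
Average chart hits 30 documents: 20x fewer than the 600 rows needed before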
![Page 35: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/35.jpg)
Real Time Updates
Columns: URL | Most Recent Data
Single query to fetch all metric data for a URL
Fast enough that the browser can poll constantly for updated data without impacting the server
![Page 36: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/36.jpg)
Evaluation
• High write volume
  – Currently handling 1000’s of DB writes per second on a single MongoDB server
  – Adding ~5 GB per day
• Small engineering team
  – Core system built by 2 engineers in <1 month
• Agile
  – BDD using Rails
• Limited operations budget
  – Runs on a handful of EC2 instances
  – No major issues
![Page 37: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/37.jpg)
Final thoughts
• Love MongoDB. (It’s now my default when starting a new project.)
• Using MongoMapper as the ORM, but I think there must be a better way, more in tune with the document model rather than a port of ActiveRecord
• There’s magic in documents, but it requires thinking about your data in new ways.
![Page 38: Realtime Analytics with MongoDB - MongoDB Meetup NYC](https://reader038.fdocuments.in/reader038/viewer/2022102613/5554ac20b4c90502618b52ce/html5/thumbnails/38.jpg)
Q & A
Thank you for viewing