Post on 12-Jun-2015
© EnerNOC Inc.
Mongo for the Mission-Critical Enterprise
Thom Nichols
Principal Software Engineer, Advanced Technology
@thom_nic
What am I talking about?
Integrating Mongo into a mission-critical system
• EnerNOC provides a 24/7 system to help ensure electric grid stability
• How do we introduce Mongo to help meet scalability needs?
• How do we do so in a way that ensures stability?
Background
We provide stability to the electric grid.
How we do it:
1. Instability on the grid (peaking load, transmission line failure)
2. Grid operator sends a “demand response” signal to us
3. We send a signal to our customers (large electric consumers)
4. Customers reduce their energy usage
5. ???
6. Profit!
We monitor customer telemetry in near-realtime
Background
• We’ve experienced tremendous growth
• Software (esp. database) struggles to keep up
• Hundreds of tables
• Hibernate generates inefficient queries
• Our solution? Make everything a stored proc
• Better, except…
• A few tables take up 99% of our storage
• It’s still slow
• Looking at $$millions in hardware cost to scale
What’s worse than a database that won’t scale?
Two databases that won’t scale.
Planning for Mongo
We listened
• Replication from day 1
• Sharding shortly after
Planning for Mongo
• Bucketing
• Pre-allocated stubbed out documents
• Hashed IDs for data locality
require 'date'
pid = 1234 # natural key
hash_pid = pid.to_s.rjust(15, '0').reverse.to_i # => 432100000000000
_id = { :p => hash_pid, :s => Date.today } # :s is the bucket's date
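The bucketing and pre-allocation bullets can be illustrated with a minimal sketch. It assumes one document per point per day with one slot per 5-minute interval. The interval size, the `:v` field, and the helper names `stub_bucket` and `slot_for` are hypothetical and are not taken from EnerNOC's schema. Plain hashes stand in for Mongo documents so the example runs without a driver.

```ruby
require 'date'

SLOTS_PER_DAY = 288 # assumption: one reading every 5 minutes

# Reverse the zero-padded digits of the natural key, as in the slide's snippet.
def hash_pid(pid)
  pid.to_s.rjust(15, '0').reverse.to_i
end

# Build a pre-allocated ("stubbed out") bucket for one point and one day.
# Every slot exists up front, so a later write only sets one array element.
def stub_bucket(pid, day)
  {
    :_id => { :p => hash_pid(pid), :s => day },
    :v   => Array.new(SLOTS_PER_DAY) # all nil until a reading arrives
  }
end

# Map a timestamp to its slot index within the day's bucket.
def slot_for(time)
  (time.hour * 60 + time.min) / 5
end

bucket = stub_bucket(1234, Date.new(2015, 6, 12))
reading_time = Time.new(2015, 6, 12, 14, 35, 0)
bucket[:v][slot_for(reading_time)] = 42.5
```

With this layout, a day of telemetry for one point is a single document lookup by `_id`, and each incoming reading touches exactly one slot.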
The Software Architect’s Haiku
Avoid vendor lock-in
Mongo might solve scaling problems
But increase dependencies
What did we do?
Create a Data Management Service
• Common HTTP interface for data in, data out
• Hides ID hashing & bucketing
• Interface for on-the-fly aggregation
• Caching behind the service
• Native client libraries – Java, Ruby, R
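As a rough illustration of what such a service hides from its callers, here is an in-memory sketch. The `DataService` class, its method names, and the per-minute slots are all hypothetical. A real service would sit behind HTTP and talk to Mongo; here a hash stands in for the collection so the example is self-contained.

```ruby
require 'date'

# Hypothetical stand-in for the Data Management Service.
class DataService
  def initialize
    @buckets = {} # stand-in for a Mongo collection, keyed by _id
    @cache   = {} # cached aggregates, keyed by [pid, day]
  end

  # Data in: callers pass the natural key; hashing and bucketing stay internal.
  def write(pid, time, value)
    day = time.to_date
    bucket = (@buckets[bucket_id(pid, day)] ||= {})
    bucket[time.hour * 60 + time.min] = value # slot = minute of day
    @cache.delete([pid, day])                 # invalidate the cached aggregate
    nil
  end

  # Data out: readings for one point and day, in time order.
  def read(pid, day)
    (@buckets[bucket_id(pid, day)] || {}).sort.map { |_, v| v }
  end

  # On-the-fly aggregation, cached until the next write for that day.
  def daily_total(pid, day)
    @cache[[pid, day]] ||= read(pid, day).sum
  end

  private

  # Same digit-reversal scheme as the slide's _id example.
  def bucket_id(pid, day)
    { :p => pid.to_s.rjust(15, '0').reverse.to_i, :s => day }
  end
end

service = DataService.new
service.write(1234, Time.new(2015, 6, 12, 14, 35), 10.0)
service.write(1234, Time.new(2015, 6, 12, 14, 40), 12.5)
total = service.daily_total(1234, Date.new(2015, 6, 12))
```

Callers only ever see the natural key and a date, so the ID scheme or the storage engine could change without touching client code.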
The Basic Idea
Woo! Problem solved!
So What?
Ok, it’s not quite that simple…
What’s the point? Centralized Auth
API Documentation
Java API Method Example
What about the other services?
Independently scalable based on usage
Aggregation – Long running service operations
Scaling the Architecture
Scaling the Architecture
Questions?
Thom Nichols
@thom_nic