Data at Scale - Michael Peacock, Cloud Connect 2012

Data at Scale

Data problems and solutions with the connected world

Michael Peacock

Web Systems Developer

Telemetry Team

Smith Electric Vehicles

Lead Developer

Occasional conference speaker

Technical Author

• World’s largest manufacturer of all-electric commercial vehicles

• Founded in 1920
• US facility opened 2009
• US buyout in 2011

Commercial electric vehicles?

Electric Vehicles

• 16,500 – 26,000 lbs gross vehicle weight
• Commercial Electric Delivery Trucks
• 7,121 – 16,663 lbs payload
• 50 – 240km
• Top Speed 80km/h

Electric Vehicles

• New, continually evolving, technology
• Viability evidence required
• Government research

EV Data

• Performance analysis and metrics
• Proving the technology: Government research
• Evaluating driver training conversions
• Diagnostics, Service and Warranty Issues
• Continuous Improvement

Current Status

• ~500 telemetry enabled vehicles
• Telemetry is now fitted as standard in our vehicles
• Our MySQL solution processes:
– 1.5 billion inserts per day
– Constant minimum of 4000 inserts per second
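To put the two figures side by side, here is a small back-of-envelope sketch. It only does the arithmetic on the numbers quoted above; the function and variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(per_day: int) -> float:
    """Average events per second implied by a daily total."""
    return per_day / SECONDS_PER_DAY


inserts_per_day = 1_500_000_000  # figure quoted on the slide
minimum_rate = 4_000             # figure quoted on the slide

avg = average_rate(inserts_per_day)
print(f"average: {avg:,.0f} inserts/second")
print(f"average is {avg / minimum_rate:.1f}x the stated minimum")
```

The daily total works out to an average several times higher than the constant minimum, so the load is uneven across the day.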

CANBus: 101

CANBus and Telemetry

• Sample the buses: once per second
• Only sample buses with useful performance and diagnostic information on them

Vehicle Data

• Drive train information:
– Motor speed
– Pedal positions
– Temperatures
– Fault Codes
• Battery information:
– Current, Voltage & Power
– Capacity
– Temperatures

Connected World: The Problem

• Connected infrastructure
– EV Charging stations
– Utilities
• Home based telemetry
– Smart Meters
– Smart Homes

Our problem

• Hundreds of connected devices, each with numerous sensors giving us 2,500 pieces of data per second per vehicle

• Broadcast time we can’t plan for
• Vehicles rolling off the production line
• New requirements for more data

How it started

Issue 1: Availability

Issue 2: Capacity

Sometimes data is too much to cope with

www.flickr.com/photos/eveofdiscovery/3149008295

Issue 2: Capacity

Option: Cloud Infrastructure

• Cloud based infrastructure gives:
– More capacity
– More failover
– Higher availability

Cloud Infrastructure: Problem

• Huge volumes of data inserts into a MySQL solution: sub-optimal on virtualised environments
• Existing enterprise hardware investment
• Security and legal issues for us storing the data off-site

Cloud Infrastructure: Enabler

www.flickr.com/photos/gadl/89650415/inphotostream

AMQP

Advanced Message Queuing Protocol

Queuing

• Downtime
• Capacity
• Maintenance Windows

What if...

• Queuing allows us to cope with:
– Downtime of our own systems
– Capacity problems
• Queuing doesn’t allow us to cope with:
– An outage of the queuing infrastructure itself

Buffer

www.flickr.com/photos/brapps/403257780

Cloud based infrastructure

• Use a Message Queue to ensure data is only processed when you have the resources to process it
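A minimal sketch of the idea, using Python's in-process `queue.Queue` as a stand-in for a real AMQP broker (an assumption made to keep the example self-contained). The `drain` helper and the message fields are hypothetical.

```python
import queue


def drain(buffer: queue.Queue, capacity: int) -> list:
    """Take at most `capacity` messages; anything else stays queued."""
    taken = []
    while len(taken) < capacity:
        try:
            taken.append(buffer.get_nowait())
        except queue.Empty:
            break
    return taken


# Ten messages arrive while the processor can only handle four right now.
buffer = queue.Queue()
for n in range(10):
    buffer.put({"vehicle": "demo", "seq": n})

first = drain(buffer, capacity=4)
backlog = buffer.qsize()  # the remainder waits until there is capacity
```

Nothing is lost when the processor is busy or down: the backlog simply grows and is worked off later.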

SAN

• Backbone to most cloud-based systems
• Powers our MySQL solution
• Supports:
– Huge volumes of data
– Lots of processing
– Fast connection to your servers
– Backups and snapshots

SAN Tips

• When dealing with data on a huge scale, every aspect of your application and infrastructure needs to be optimised. This includes your SAN, which is commonly overlooked.

• http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html

New Architecture

Speed: Stream to Batch

• Streams of continuously flowing data can be difficult to process

• Turn the stream into small, quick batches

• MySQL: LOAD DATA INFILE
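One way the stream-to-batch step could look, as a sketch only. The table name `readings`, the three-column row layout and the batch size are assumptions, and the statement is built but not sent to a server.

```python
import csv
import itertools
import os
import tempfile


def batches(stream, size):
    """Yield lists of up to `size` items from a (possibly endless) stream."""
    iterator = iter(stream)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_batch(rows):
    """Write one batch to a temporary CSV file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", delete=False
    ) as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)
        return handle.name


def load_statement(path, table):
    """Build the bulk-load statement for one batch file."""
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ','"
    )


# Hypothetical readings: (unix timestamp, signal id, value).
readings = [(1329300000 + n, 17, 40 + n) for n in range(5)]
statements = []
for batch in batches(readings, size=2):
    path = write_batch(batch)
    statements.append(load_statement(path, "readings"))
    os.remove(path)  # a real loader would delete it after MySQL has read it
```

Each batch becomes one bulk load rather than one INSERT per data point.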

Shard 1: Hardware

• As the amount of data increased, we hit a huge performance problem. This was solved by sharding at a hardware level.

• Each data collection device was given its own database, which could be on any number of separate machines, with a single database acting as a registry
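The registry idea can be sketched as a simple lookup from device to host and database. The class names, host names and the `device_<id>` naming scheme are all hypothetical; in the real system the registry is itself a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    host: str
    database: str


class Registry:
    """Maps each data collection device to the database that holds its data."""

    def __init__(self):
        self._shards = {}

    def register(self, device_id: int, host: str) -> Shard:
        shard = Shard(host=host, database=f"device_{device_id}")
        self._shards[device_id] = shard
        return shard

    def locate(self, device_id: int) -> Shard:
        try:
            return self._shards[device_id]
        except KeyError:
            raise LookupError(f"device {device_id} is not registered") from None


registry = Registry()
registry.register(101, host="db1.example.internal")
registry.register(102, host="db2.example.internal")
shard = registry.locate(102)
```

Every read and write asks the registry first, so a device's database can live on any machine.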

Rationalisation & Extrapolation

• Remember the CANBus
– Always telling us information, which we sample every second?
– Do we always need that?

• Extrapolate and assume
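One possible reading of "extrapolate and assume", sketched under the assumption that a value is only stored when it changes and is assumed to hold steady in between. The slides do not say exactly which rule was used; the sample data is invented.

```python
import bisect


def compress(samples):
    """Keep a (second, value) sample only when the value changes."""
    kept = []
    for second, value in samples:
        if not kept or kept[-1][1] != value:
            kept.append((second, value))
    return kept


def value_at(kept, second):
    """Assume the value held steady since the last recorded change."""
    times = [t for t, _ in kept]
    index = bisect.bisect_right(times, second) - 1
    if index < 0:
        raise LookupError("no sample at or before this second")
    return kept[index][1]


# A temperature sampled once per second (made-up values).
samples = [(0, 21), (1, 21), (2, 21), (3, 22), (4, 22), (5, 21)]
kept = compress(samples)
```

Here six samples shrink to three stored rows, and every original reading can still be recovered with `value_at`.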

Getting information from data

• Vehicle performance information involves:
– Looking at 20 – 30 data points for each second of a vehicle’s operation in a day
– Analysing the data
– Performing calculations, which vary depending on certain data points
• Getting this data was slow
– How far did Customer A’s fleet travel last week?

Regular processing

• Instead of processing data on demand, process it regularly

• Nightly scheduled task to evaluate performance information

Regular Processing: Problems

You need to pull the data out faster than before!

Shard 2: Tables

• All our data has a timestamp associated with it

• Looking up data for a particular day was slow. Very slow.

• We sharded the data again, this time with a table per week within a vehicle’s specific database
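A sketch of how a timestamp could be mapped to its weekly table. The naming scheme (`data_<year>_w<week>`) and the use of ISO week numbers are assumptions; the slides do not give the actual convention.

```python
from datetime import datetime


def weekly_table(ts: datetime, prefix: str = "data") -> str:
    """Name of the weekly table that holds a given timestamp."""
    year, week, _ = ts.isocalendar()  # ISO year and week number
    return f"{prefix}_{year}_w{week:02d}"


table = weekly_table(datetime(2012, 2, 15, 9, 30))
```

Using the ISO year rather than the calendar year keeps the first days of January in the same table as the rest of their week.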

Sharding: Fallbacks and logic

• What about data before you implemented sharding?

• Which table do I need to look at?
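Both questions can be answered by one small piece of routing logic. This is a sketch: the cut-over date, the legacy table name and the weekly naming scheme are all hypothetical.

```python
from datetime import date, timedelta

SHARDING_START = date(2011, 6, 6)  # hypothetical cut-over date (a Monday)
LEGACY_TABLE = "data_legacy"       # hypothetical pre-sharding table


def tables_for_range(start: date, end: date) -> list:
    """List every table that may hold rows between start and end."""
    tables = []
    if start < SHARDING_START:
        # Anything before the cut-over still lives in the old table.
        tables.append(LEGACY_TABLE)
        start = SHARDING_START
    # Step through the range one week at a time, Monday to Monday.
    day = start - timedelta(days=start.weekday())
    while day <= end:
        year, week, _ = day.isocalendar()
        tables.append(f"data_{year}_w{week:02d}")
        day += timedelta(days=7)
    return tables


tables = tables_for_range(date(2011, 5, 30), date(2011, 6, 14))
```

Keeping this in one function means the rest of the code never has to know when sharding was introduced.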

Aggregation

• With data segregated on a per vehicle and per week basis, lookups were much faster

• Performance calculations could be scheduled nightly, with a single record recorded for each vehicle for each day in a central database

• Allows for easy aggregation:
– How far did my fleet travel last week?
– How much energy did they use last month?
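With one summary row per vehicle per day, the fleet questions become simple sums. The sketch below uses an in-memory SQLite database as a stand-in for the central MySQL database; the table layout, column names and all figures are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE daily_summary (
           vehicle_id  INTEGER,
           customer    TEXT,
           day         TEXT,
           distance_km REAL,
           energy_kwh  REAL,
           PRIMARY KEY (vehicle_id, day)
       )"""
)
# One row per vehicle per day, as written by the nightly task.
db.executemany(
    "INSERT INTO daily_summary VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Customer A", "2012-02-06", 80.0, 60.0),
        (1, "Customer A", "2012-02-07", 95.5, 71.0),
        (2, "Customer A", "2012-02-06", 40.0, 33.5),
        (3, "Customer B", "2012-02-06", 120.0, 90.0),
    ],
)


def fleet_distance(customer: str, first_day: str, last_day: str) -> float:
    """Total distance for a customer's fleet between two dates (inclusive)."""
    (total,) = db.execute(
        "SELECT COALESCE(SUM(distance_km), 0) FROM daily_summary "
        "WHERE customer = ? AND day BETWEEN ? AND ?",
        (customer, first_day, last_day),
    ).fetchone()
    return total


week_total = fleet_distance("Customer A", "2012-02-06", "2012-02-12")
```

The query touches a handful of summary rows instead of millions of raw data points.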

Backups and Archives

• SAN backups and snapshots
• With date based sharding:
– Dump a table
– Copy it elsewhere
– Drop it / Flush it (if archiving)
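The dump, copy and drop steps could be scripted along these lines. This is a sketch that only builds the command lines (using the standard `mysqldump` and `mysql` clients); the database name, table name and directories are hypothetical, and nothing is executed here.

```python
def archive_steps(database: str, table: str, dump_dir: str, archive_dir: str):
    """Commands to dump one weekly table, copy the dump away, then drop it."""
    dump_file = f"{dump_dir}/{table}.sql"
    return [
        ["mysqldump", f"--result-file={dump_file}", database, table],
        ["cp", dump_file, archive_dir],
        ["mysql", f"--execute=DROP TABLE `{table}`", database],
    ]


steps = archive_steps(
    database="device_101",
    table="data_2011_w23",
    dump_dir="/tmp/dumps",
    archive_dir="/mnt/archive",
)
for step in steps:
    print(" ".join(step))
```

Each step could be passed to `subprocess.run(step, check=True)` so that a failed dump or copy stops the script before the table is dropped.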

Outsource to the cloud

• Why waste resources doing things that cloud based services do better (where legal, security and privacy reasons allow)?
• Maps
• Email delivery
• Even phone integration

Data Type Optimization

• When prototyping a system and designing a database schema, it’s easy to be sloppy with your data types and fields
• DON’T BE
• Use as little storage space as you can
– Ensure the data type uses as little as you can
– Use only the fields you need
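To see why this matters at this volume, here is a rough sketch. The byte sizes are those of MySQL's integer types; the helper names are made up, and the saving counts raw column bytes only, ignoring indexes and row overhead.

```python
# Storage size of MySQL integer types, in bytes.
INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}
UNSIGNED_MAX = {name: 2 ** (8 * size) - 1 for name, size in INT_BYTES.items()}

ROWS_PER_DAY = 1_500_000_000  # daily insert figure quoted in this talk


def smallest_unsigned(max_value: int) -> str:
    """Smallest unsigned integer type that can hold values up to max_value."""
    for name, size in sorted(INT_BYTES.items(), key=lambda item: item[1]):
        if max_value <= UNSIGNED_MAX[name]:
            return name
    raise ValueError("value does not fit in a BIGINT UNSIGNED")


def daily_saving_gb(old: str, new: str) -> float:
    """Raw column bytes saved per day by switching one column's type."""
    return (INT_BYTES[old] - INT_BYTES[new]) * ROWS_PER_DAY / 1e9


# A hypothetical signal whose values never exceed 10,000.
best = smallest_unsigned(10_000)
saving = daily_saving_gb("INT", best)
```

Two bytes per row sounds trivial, but at this insert rate it adds up to gigabytes per day for a single column.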

Sharding: An excuse

• Sharding was a large project for us, and involved extensive re-architecting of the system.

• We had to make changes to every query we have in our code

• Gave us an excuse to:
– Optimise the queries
– Optimise the indexes

Query Optimization

• Run every query through EXPLAIN EXTENDED
• Check it hits the indexes
• Remove functions like CURDATE from queries, to ensure query cache is hit
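The CURDATE point can be sketched as follows: work out the date in the application and send it as a literal, so the same query text repeats all day. The table and column names are hypothetical. Note that this applies to MySQL versions that have a query cache; it was removed in MySQL 8.0.

```python
from datetime import date

# Not cacheable: the server has to evaluate CURDATE() on every call.
UNCACHEABLE = "SELECT * FROM daily_summary WHERE day = CURDATE()"


def todays_rows_query(today: date) -> str:
    """Same query with the date filled in by the application."""
    return f"SELECT * FROM daily_summary WHERE day = '{today.isoformat()}'"


query = todays_rows_query(date(2012, 2, 15))
```

Because the date comes from a `date` object rather than user input, formatting it straight into the string is safe here.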

Index Optimization

• Keep it small
• From our legacy days of one database on one server, we had a column that told us which vehicle the data related to
– This was still there...as part of an index...despite the fact the application hadn’t required it for months

Live data: dashboard

Live data: Maps

Live data

• Original database design dictated:
– Each type of data point required a separate query, sub-query or join to obtain
• Collection device and processing service dictated:
– GPS Co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction

Dashboards: Caching

• Don’t query if you don’t have to

• Cache what you can; access direct

• With message queuing it’s possible to route messages to two or more places: one to be processed and another to display the latest information directly
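The two-destination routing can be sketched in-process like this. AMQP brokers offer fanout exchanges for this purpose; the `Fanout` class below is only a hypothetical stand-in, and the message fields are invented.

```python
class Fanout:
    """Delivers every published message to all bound handlers."""

    def __init__(self):
        self._handlers = []

    def bind(self, handler):
        self._handlers.append(handler)

    def publish(self, message):
        for handler in self._handlers:
            handler(message)


to_process = []  # destination 1: backlog for the database loader
latest = {}      # destination 2: latest message per vehicle, for dashboards


def remember_latest(message):
    latest[message["vehicle"]] = message


exchange = Fanout()
exchange.bind(to_process.append)
exchange.bind(remember_latest)

exchange.publish({"vehicle": "truck-1", "speed_kmh": 42})
exchange.publish({"vehicle": "truck-1", "speed_kmh": 47})
```

The dashboard reads `latest` directly and never has to query the database for live values.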

Exporting data: Group

• Where possible group exports and reports together by the same shard/table/index
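A sketch of the grouping step: collect pending export requests by the shard and table they need, so each table is read once. The request fields (`vehicle`, `week`, `report`) are hypothetical.

```python
from collections import defaultdict


def group_by_table(requests):
    """Group report requests by the (vehicle, week) table they read from."""
    groups = defaultdict(list)
    for request in requests:
        key = (request["vehicle"], request["week"])
        groups[key].append(request["report"])
    return dict(groups)


pending = [
    {"vehicle": 101, "week": "2012_w07", "report": "distance"},
    {"vehicle": 102, "week": "2012_w07", "report": "distance"},
    {"vehicle": 101, "week": "2012_w07", "report": "energy"},
]
grouped = group_by_table(pending)
```

Both reports for vehicle 101 can now be produced from a single pass over its weekly table.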

Code considerations

• Race conditions
• Number of concurrent requests – group them

Application Quality

• When dealing with lots of data, quickly, you need to ensure:
– You process it correctly
– You can act fast if there is a bug
– You can act fast when refactoring

Deployment

• When dealing with a stream of data, rolling out new code can mean pausing the processing work that is done

• Put deployment measures in place to make a deployment switch over instantaneous
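One common way to make the switch-over instantaneous is to keep each release in its own directory and repoint a symlink atomically. The slides do not say which technique was used, so treat this as an illustrative assumption; the directory names are made up and the sketch targets POSIX systems.

```python
import os
import tempfile


def switch_release(current_link: str, new_release: str) -> None:
    """Repoint the `current` symlink at a new release in one atomic step."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_release, tmp_link)
    os.replace(tmp_link, current_link)  # rename is atomic on POSIX


# Demo in a throwaway directory.
root = tempfile.mkdtemp()
release_1 = os.path.join(root, "release-1")
release_2 = os.path.join(root, "release-2")
os.mkdir(release_1)
os.mkdir(release_2)
current = os.path.join(root, "current")

switch_release(current, release_1)
switch_release(current, release_2)
```

The processing workers always run from `current`, so they see either the old code or the new code, never a half-copied mix.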

Technical Tips

• Measure your application’s performance, data throughput and so on
– A data at scale problem itself
• Use as much RAM on your servers as is safe to do so
– We give MySQL 80% of each DB server’s 100 – 140GB

What do we have now?

• Now we have a fast, stable, reliable system
• Pulling in millions of messages from a queue per day
• Decoding those messages into 1.5 billion data points per day
• Inserting 1.5 billion data points into MySQL per day
• Performance data generated, and grant authority reports exported daily
• More sleep on a night than we used to

Questions
