Data at Scale - Michael Peacock, Cloud Connect 2012

Page 1

Data at Scale

Data problems and solutions with the connected world

Page 2

Michael Peacock

Web Systems Developer

Telemetry Team

Smith Electric Vehicles

Lead Developer

Occasional conference speaker

Technical Author

Page 3

• World's largest manufacturer of all-electric commercial vehicles
• Founded in 1920
• US facility opened 2009
• US buyout in 2011

Page 4

Commercial electric vehicles?

Page 5
Page 6
Page 7

Electric Vehicles

• 16,500 – 26,000 lbs gross vehicle weight
• Commercial Electric Delivery Trucks
• 7,121 – 16,663 lbs payload
• 50 – 240 km
• Top Speed 80 km/h

Page 8

Electric Vehicles

• New, continually evolving technology
• Viability evidence required
• Government research

Page 9

EV Data

• Performance analysis and metrics
• Proving the technology: Government research
• Evaluating driver training conversions
• Diagnostics, Service and Warranty Issues
• Continuous Improvement

Page 10
Page 11
Page 12

Current Status

• ~500 telemetry-enabled vehicles
• Telemetry is now fitted as standard in our vehicles
• Our MySQL solution processes:
– 1.5 billion inserts per day
– Constant minimum of 4000 inserts per second

Page 13

CANBus: 101

Page 14

CANBus and Telemetry

• Sample the buses: once per second
• Only sample buses with useful performance and diagnostic information on them

Page 15
Page 16

Vehicle Data

• Drive train information:
– Motor speed
– Pedal positions
– Temperatures
– Fault Codes
• Battery information:
– Current, Voltage & Power
– Capacity
– Temperatures

Page 17

Connected World: The Problem

• Connected infrastructure
– EV Charging stations
– Utilities
• Home based telemetry
– Smart Meters
– Smart Homes

Page 18

Our problem

• Hundreds of connected devices, each with numerous sensors giving us 2,500 pieces of data per second per vehicle
• Broadcast time we can’t plan for
• Vehicles rolling off the production line
• New requirements for more data

Page 19

How it started

Page 20

Issue 1: Availability

Page 21

Issue 2: Capacity

Sometimes data is too much to cope with

www.flickr.com/photos/eveofdiscovery/3149008295

Page 22

Issue 2: Capacity

Page 23

Option: Cloud Infrastructure

• Cloud based infrastructure gives:
– More capacity
– More failover
– Higher availability

Page 24

Cloud Infrastructure: Problem

• Huge volumes of data inserts into a MySQL solution: sub-optimal on virtualised environments
• Existing enterprise hardware investment
• Security and legal issues for us storing the data off-site

Page 25

Cloud Infrastructure: Enabler

Page 26

www.flickr.com/photos/gadl/89650415/inphotostream

Page 27

AMQP

Advanced Message Queuing Protocol

Page 28

Queuing

• Downtime
• Capacity
• Maintenance Windows

Page 29

What if...

• Queuing allows us to cope with:
– Downtime of our own systems
– Capacity problems
• Queuing doesn't allow us to cope with:
– An outage of the queuing infrastructure itself

Page 30

Buffer

www.flickr.com/photos/brapps/403257780

Page 31

Cloud based infrastructure

• Use a Message Queue to ensure data is only processed when you have the resources to process it
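
A minimal sketch of the buffering idea, using Python's standard-library queue as a stand-in for an AMQP broker. The talk shows no code, so the `drain` helper and the capacity figures are assumptions made for illustration.

```python
import queue


def drain(buffer: queue.Queue, capacity: int) -> list:
    """Take at most `capacity` messages; anything left stays queued for later."""
    taken = []
    while len(taken) < capacity:
        try:
            taken.append(buffer.get_nowait())
        except queue.Empty:
            break
    return taken


buffer = queue.Queue()
for reading in range(10):  # a burst of ten incoming messages
    buffer.put(reading)

first_pass = drain(buffer, capacity=4)     # the consumer can only handle four right now
second_pass = drain(buffer, capacity=100)  # later, with spare resources, it catches up
```

Nothing is lost while the consumer is busy or offline; the backlog simply waits in the queue until there is capacity to process it.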

Page 32

SAN

• Backbone to most cloud-based systems
• Powers our MySQL solution
• Supports:
– Huge volumes of data
– Lots of processing
– Fast connection to your servers
– Backups and snapshots

Page 33

SAN Tips

• When dealing with data on a huge scale, every aspect of your application and infrastructure needs to be optimised. This includes your SAN – something which is commonly overlooked.

• http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html

Page 34

New Architecture

Page 35

Speed: Stream to Batch

• Streams of continuously flowing data can be difficult to process

• Turn the stream into small, quick batches

• MySQL: LOAD DATA INFILE
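
A rough sketch of turning a stream into small batches, each written to a file that MySQL could bulk-load. The table name `datapoints`, the column names and the batch size are hypothetical, and the statement is only built as a string here rather than sent to a server.

```python
import csv
import itertools
import tempfile


def batches(stream, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(stream)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def write_batch(rows):
    """Write one batch to a temporary CSV file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
        return f.name


def load_statement(path, table="datapoints"):
    """Build the bulk-load statement for one batch file (table and columns are hypothetical)."""
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
        "(recorded_at, sensor_id, value)"
    )


# Five readings arriving as a stream, loaded two at a time.
stream = ((1330000000 + i, 7, i * 0.5) for i in range(5))
statements = [load_statement(write_batch(chunk)) for chunk in batches(stream, size=2)]
```

One bulk load per batch replaces many single-row inserts. The temporary files are left on disk in this sketch and would need cleaning up after each load.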

Page 36

Shard 1: Hardware

• As the amount of data increased, we hit a huge performance problem. This was solved by sharding at a hardware level.

• Each data collection device was given its own database, which could be on any number of separate machines, with a single database acting as a registry
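
The registry lookup could look something like the sketch below. The host names, database names and the in-memory dictionary are assumptions; in the talk the registry is itself a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    host: str
    database: str


# Hypothetical registry contents: one database per data collection device.
REGISTRY = {
    "vehicle-001": Shard("db1.example.internal", "telemetry_vehicle_001"),
    "vehicle-002": Shard("db2.example.internal", "telemetry_vehicle_002"),
}


def locate(device_id: str) -> Shard:
    """Return the host and database that hold this device's data."""
    try:
        return REGISTRY[device_id]
    except KeyError:
        raise LookupError(f"no shard registered for {device_id}") from None


shard = locate("vehicle-002")
```

Every query first asks the registry where a device lives, so a database can be moved to another machine by updating a single registry entry.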

Page 37

Rationalisation & Extrapolation

• Remember the CANBus
– Always telling us information, which we sample every second?
– Do we always need that?

• Extrapolate and assume
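
One possible reading of "extrapolate and assume" is to store a sample only when its value changes, and to assume the value held steady in between. The talk does not say how this was actually done, so the sketch below is an illustration under that assumption.

```python
def compress(samples):
    """Keep a (second, value) sample only when the value differs from the previous one."""
    kept = []
    last = object()  # sentinel that never equals a real reading
    for second, value in samples:
        if value != last:
            kept.append((second, value))
            last = value
    return kept


def value_at(kept, second):
    """Assume the value held steady since the most recent stored sample."""
    current = None
    for stored_second, value in kept:
        if stored_second > second:
            break
        current = value
    return current


# Six one-second samples of a slowly changing reading, e.g. a temperature.
raw = [(0, 21), (1, 21), (2, 21), (3, 22), (4, 22), (5, 21)]
stored = compress(raw)
```

Here six samples shrink to three stored rows, and `value_at(stored, 4)` still recovers the reading for a second that was never written.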

Page 38

Getting information from data

• Vehicle performance information involves:
– Looking at 20 – 30 data points for each second of a vehicle's operation in a day
– Analysing the data
– Performing calculations, which vary depending on certain data points
• Getting this data was slow
– How far did Customer A’s fleet travel last week?

Page 39

Regular processing

• Instead of processing data on demand, process it regularly

• Nightly scheduled task to evaluate performance information

Page 40

Regular Processing: Problems

You need to pull the data out faster than ever before!

Page 41

Shard 2: Tables

• All our data has a timestamp associated with it

• Looking up data for a particular day was slow. Very slow.

• We sharded the data again, this time with a table per week within a vehicle-specific database

Page 42
Page 43

Sharding: Fallbacks and logic

• What about data before you implemented sharding?

• Which table do I need to look at?
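
Both questions can be answered by one small function that maps a date to a table name. The naming scheme, the legacy table name and the cut-over date below are hypothetical; the sketch uses ISO week numbers, which is one of several reasonable choices.

```python
from datetime import date

# Hypothetical values: the first day covered by weekly tables,
# and the name of the single table used before sharding.
SHARDING_START = date(2011, 6, 6)
LEGACY_TABLE = "datapoints"


def table_for(day: date) -> str:
    """Return the name of the table that holds data for `day`."""
    if day < SHARDING_START:
        return LEGACY_TABLE  # fallback for data recorded before sharding
    year, week, _ = day.isocalendar()
    return f"datapoints_{year}_w{week:02d}"


recent = table_for(date(2012, 2, 15))
boundary = table_for(date(2012, 1, 1))
old = table_for(date(2010, 1, 1))
```

The ISO year is used rather than the calendar year, so 1 January 2012 (a Sunday) lands in the last weekly table of 2011 instead of creating a one-day table.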

Page 44

Aggregation

• With data segregated on a per vehicle and per week basis, lookups were much faster

• Performance calculations could be scheduled nightly, with a single record recorded for each vehicle for each day in a central database

• Allows for easy aggregation:
– How far did my fleet travel last week?
– How much energy did they use last month?
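
With one summary record per vehicle per day, fleet-level questions reduce to a simple sum. The rows and field layout below are made up for illustration.

```python
from datetime import date

# Hypothetical daily summary rows: (vehicle, day, distance in km, energy in kWh).
DAILY = [
    ("vehicle-001", date(2012, 2, 6), 82.0, 61.5),
    ("vehicle-001", date(2012, 2, 7), 75.5, 58.0),
    ("vehicle-002", date(2012, 2, 7), 40.0, 33.5),
    ("vehicle-002", date(2012, 2, 14), 55.0, 41.0),
]


def fleet_totals(rows, start, end):
    """Sum distance and energy over all vehicles for start <= day <= end."""
    in_range = [row for row in rows if start <= row[1] <= end]
    distance = sum(row[2] for row in in_range)
    energy = sum(row[3] for row in in_range)
    return distance, energy


# "How far did my fleet travel last week?"
week = fleet_totals(DAILY, date(2012, 2, 6), date(2012, 2, 12))
```

The query touches a handful of pre-computed rows rather than every per-second data point for the week.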

Page 45
Page 46

Backups and Archives

• SAN backups and snapshots
• With date based sharding:
– Dump a table
– Copy it elsewhere
– Drop it / Flush it (if archiving)

Page 47

Outsource to the cloud

• Why waste resources doing things that cloud based services do better (where legal, security and privacy reasons allow)?
• Maps
• Email delivery
• Even phone integration

Page 48

Data Type Optimization

• When prototyping a system and designing a database schema, it's easy to be sloppy with your data types and fields
• DON'T BE
• Use as little storage space as you can
– Ensure each data type uses as little space as it can
– Use only the fields you need
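
To put a number on this, the sketch below picks the smallest MySQL unsigned integer type for a given maximum value and estimates the saving against BIGINT. The type sizes are MySQL's documented storage sizes and the 1.5 billion rows per day comes from the talk; the example reading with a maximum of 10,000 is an assumption.

```python
# MySQL integer types: (name, storage in bytes, maximum unsigned value).
UNSIGNED_INT_TYPES = [
    ("TINYINT", 1, 2**8 - 1),
    ("SMALLINT", 2, 2**16 - 1),
    ("MEDIUMINT", 3, 2**24 - 1),
    ("INT", 4, 2**32 - 1),
    ("BIGINT", 8, 2**64 - 1),
]


def smallest_unsigned(max_value: int):
    """Return (type name, bytes) of the smallest unsigned type that fits max_value."""
    for name, size, limit in UNSIGNED_INT_TYPES:
        if 0 <= max_value <= limit:
            return name, size
    raise ValueError("value must be non-negative and fit in a MySQL integer type")


name, size = smallest_unsigned(10_000)      # hypothetical reading that never exceeds 10,000
saved_per_day = (8 - size) * 1_500_000_000  # bytes saved versus BIGINT at 1.5 billion rows a day
```

For a single column, choosing SMALLINT over BIGINT saves 6 bytes per row, which is 9 GB a day at this insert rate, before counting any indexes on that column.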

Page 49

Sharding: An excuse

• Sharding was a large project for us, and involved extensive re-architecting of the system.

• We had to make changes to every query we have in our code

• Gave us an excuse to:
– Optimise the queries
– Optimise the indexes

Page 50

Query Optimization

• Run every query through EXPLAIN EXTENDED
• Check it hits the indexes
• Remove functions like CURDATE from queries, to ensure query cache is hit
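
MySQL's query cache skips statements that contain non-deterministic functions such as CURDATE() or NOW(), so the date can be worked out in the application and embedded as a literal instead. A small sketch, with hypothetical table and column names:

```python
from datetime import date


def todays_rows_query(table: str, today: date) -> str:
    """Embed the date as a literal so the SQL text holds no non-deterministic function."""
    return (
        f"SELECT sensor_id, value FROM {table} "
        f"WHERE recorded_on = '{today.isoformat()}'"
    )


query = todays_rows_query("datapoints_2012_w07", date(2012, 2, 15))
```

Every request made on the same day now produces identical SQL text, which is what the query cache matches on.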

Page 51

Index Optimization

• Keep it small
• From our legacy days of one database on one server, we had a column that told us which vehicle the data related to
– This was still there... as part of an index... despite the fact the application hadn’t required it for months

Page 52

Live data: dashboard

Page 53

Live data: Maps

Page 54

Live data

• Original database design dictated:
– Each type of data point required a separate query, sub-query or join to obtain
• Collection device and processing service dictated:
– GPS co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction

Page 55

Dashboards: Caching

• Don’t query if you don’t have to

• Cache what you can; access direct

• With message queuing it's possible to route messages to two or more places: one to be processed and another to display the latest information directly
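
In AMQP terms this kind of routing is what a fanout exchange does. The sketch below imitates it in memory; the `Fanout` class and the queue names are illustrative only and are not a real broker client.

```python
from collections import deque


class Fanout:
    """Deliver every published message to all bound queues."""

    def __init__(self):
        self.queues = {}

    def bind(self, name):
        """Create a queue bound to this exchange and return it."""
        self.queues[name] = deque()
        return self.queues[name]

    def publish(self, message):
        for q in self.queues.values():
            q.append(message)


exchange = Fanout()
to_process = exchange.bind("processing")  # consumed by the database loader
latest = exchange.bind("dashboard")       # consumed by the live display
exchange.publish({"vehicle": "vehicle-001", "speed_kmh": 42})
```

The dashboard reads its own copy of each message and never has to query the database for the latest value.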

Page 56

Exporting data: Group

• Where possible, group exports and reports together by the same shard/table/index
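
A small sketch of such grouping, assuming each export request names a vehicle, a week and a report type (all hypothetical fields):

```python
from collections import defaultdict


def group_by_shard(requests):
    """Group (vehicle, week, report) requests so each shard/table is visited once."""
    grouped = defaultdict(list)
    for vehicle, week, report in requests:
        grouped[(vehicle, week)].append(report)
    return dict(grouped)


requests = [
    ("vehicle-001", "2012_w07", "distance"),
    ("vehicle-002", "2012_w07", "energy"),
    ("vehicle-001", "2012_w07", "energy"),
]
plan = group_by_shard(requests)
```

Three requests become two table visits, with both reports for vehicle-001 produced from a single pass over its weekly table.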

Page 57

Code considerations

• Race conditions
• Number of concurrent requests – group them

Page 58

Application Quality

• When dealing with lots of data, quickly, you need to ensure:
– You process it correctly
– You can act fast if there is a bug
– You can act fast when refactoring

Page 59

Deployment

• When dealing with a stream of data, rolling out new code can mean pausing the processing work that is done

• Put deployment measures in place to make a deployment switch over instantaneous
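
The talk does not say which measures were used. One common approach, shown here purely as an assumption, is to keep each release in its own directory and switch a `current` symlink with a single atomic rename (POSIX systems).

```python
import os
import tempfile


def switch_release(current_link: str, new_release_dir: str) -> None:
    """Point `current_link` at `new_release_dir` in one atomic rename."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_release_dir, tmp_link)
    # Renaming over the old link means there is no moment without a "current".
    os.replace(tmp_link, current_link)


# Demonstration in a throwaway directory with two hypothetical releases.
root = tempfile.mkdtemp()
releases = [os.path.join(root, name) for name in ("release-1", "release-2")]
for path in releases:
    os.mkdir(path)
current = os.path.join(root, "current")

switch_release(current, releases[0])  # first deployment
switch_release(current, releases[1])  # roll forward to the new code
```

Workers that resolve `current` when they start their next batch pick up the new code without the queue consumers being paused for the length of a file copy.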

Page 60

Technical Tips

• Measure your application's performance, data throughput and so on
– A data at scale problem itself
• Use as much RAM on your servers as is safe to do so
– We give MySQL 80% of the 100 – 140GB on each DB server

Page 61

What do we have now?

• Now we have a fast, stable, reliable system
• Pulling in millions of messages from a queue per day
• Decoding those messages into 1.5 billion data points per day
• Inserting 1.5 billion data points into MySQL per day
• Performance data generated, and grant authority reports exported daily
• More sleep on a night than we used to

Page 62

Questions