Fast Cars, Big Data - How Streaming Can Help Formula 1

Tugdual Grall, @tgrall

© 2016 MapR Technologies

{“about” : “me”}
Tugdual “Tug” Grall
• MapR
  – Technical Evangelist
• MongoDB
  – Technical Evangelist
• Couchbase
  – Technical Evangelist
• eXo
  – CTO
• Oracle
  – Developer/Product Manager
  – Mainly Java/SOA
• Developer in consulting firms
• Web
  – @tgrall
  – http://tgrall.github.io
  – tgrall
• NantesJUG co-founder
• Pet Project:
  – http://www.resultri.com

Agenda
• What’s the point of data in motorsports?
• Live demo
• Architecture
• What’s next?

How data plays in F1 motorsports

Data in Motorsports

http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html

Difference is due to later and sharper braking

Real Analytics as Well as Visualization
• Inputs
  – Predictive analysis of consumables and tires
  – Physical models of car + driver performance
  – Tire wear slows lap times, lower fuel weight speeds lap times
  – Competitors’ options
  – Weather conditions
  – Current GP points status
• Outputs
  – Tactical options, outcome distributions
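
The two physical effects named above (tire wear slows laps, burning fuel speeds them up) can be read as a simple linear lap-time model. The sketch below is a toy illustration only: the class name, the base lap time, and every coefficient are assumptions chosen for readability, not values from the talk or from any team.

```java
final class LapTimeModel {
    // Illustrative assumptions only, not measured F1 values.
    static final double BASE_LAP_SECONDS = 90.0;      // fresh tires, empty tank
    static final double SECONDS_PER_TIRE_LAP = 0.05;  // time lost per lap of tire age
    static final double SECONDS_PER_FUEL_KG = 0.03;   // time lost per kg of fuel carried
    static final double FUEL_BURN_KG_PER_LAP = 1.8;   // fuel consumed each lap

    /** Predicted time for one lap given tire age (in laps) and fuel on board (in kg). */
    public static double predictLapSeconds(int tireAgeLaps, double fuelKg) {
        return BASE_LAP_SECONDS
                + SECONDS_PER_TIRE_LAP * tireAgeLaps
                + SECONDS_PER_FUEL_KG * fuelKg;
    }

    /** Total time for a stint that starts on fresh tires with startFuelKg on board. */
    public static double stintSeconds(int laps, double startFuelKg) {
        double total = 0.0;
        for (int lap = 0; lap < laps; lap++) {
            // Fuel load drops every lap, tire age grows every lap.
            double fuelKg = Math.max(0.0, startFuelKg - FUEL_BURN_KG_PER_LAP * lap);
            total += predictLapSeconds(lap, fuelKg);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("First lap:    %.2f s%n", predictLapSeconds(0, 100.0));
        System.out.printf("30-lap stint: %.1f s%n", stintSeconds(30, 100.0));
    }
}
```

Comparing stintSeconds for different stint lengths is the kind of input a tactical model would turn into outcome distributions; a real model would be fitted to telemetry rather than hard-coded.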

Data for Marketing as well

http://formula1.ferrari.com/en/inforacing-hungarian-gp-2015/

Some Examples?
• Up to 300 sensors per car
• Up to 2,000 channels
• Sensor data is sent to the paddock in 2 ms
• 1.5 billion data points for a race
• 5 billion for a full race weekend
• 5 to 6 GB of compressed data per car for 90 minutes
• US Grand Prix 2014: 243 TB (race teams combined)

So how does that work? Especially for real-time data?

Production System Outline

Simplified Demo System Outline

[Diagram components: TORCS race simulator, MapR Streams, Archive (MapR-DB), Apache Drill (SQL access), Jetty / Bootstrap / d3]

TORCS for Cars, Physics and Drivers

TORCS is a pseudo-physics based racing simulator with full graphics output and pluggable control modules.

TORCS is commonly used for AI research, but the control model can just as well collect data

Let’s see it work!

IoT: Racing Cars

[Diagram labels: Producers, sensor data, Consumers, Real Time, Analytics]

https://github.com/mapr-demos/racing-time-series

IoT: Racing Cars

[Diagram labels: sensor data; Kafka Producer (Java); Kafka Consumer + OJAI (Java); Kafka Consumer + WebSocket (Java + JS); SQL]

https://github.com/mapr-demos/racing-time-series

Big Datastore
• Distributed File System: HDFS / MapR-FS
• NoSQL Database: HBase / MapR-DB
• ….

Store data as File or Row?

HDFS / MapR-FS
• Data stored as “files”
• Fast with large scans
• Slow random reads/writes

NoSQL (HBase / MapR-DB)
• Data stored as rows/documents
• Fast with random reads/writes

Kafka & MapR Streams

What is Kafka?

• http://kafka.apache.org/
• Created at LinkedIn, open sourced in 2011
• Implemented in Scala / Java
• Distributed messaging system built to scale

Key Concepts

• Feeds of messages are organised in topics
• Processes that publish messages are called producers
• Processes that subscribe to topics and process messages are called consumers
• A Kafka cluster is made of one or more brokers (== nodes)

Topics and Partitions

• Split topics into partitions for scalability

[Diagram: writes are appended to the end of each partition log. Partition 0 holds offsets 0 to 8, Partition 1 holds offsets 0 to 5, Partition 2 holds offsets 0 to 7]
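
To make the partition idea concrete, here is a minimal in-memory sketch. It is not Kafka code: PartitionedTopic and its methods are hypothetical names, and the key is mapped to a partition with String.hashCode for brevity, whereas Kafka's default partitioner uses a murmur2 hash of the key bytes. What it does show is that every message with the same key lands in the same partition, and that each partition keeps its own independent offset sequence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PartitionedTopic {
    // One append-only log per partition; the list index is the offset.
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Maps a key to a partition (simplified: String.hashCode, not murmur2). */
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Appends a value to the key's partition and returns its offset there. */
    public long append(String key, String value) {
        List<String> log = partitions.get(partitionFor(key, partitions.size()));
        log.add(value);
        return log.size() - 1;
    }

    /** Read-only view of one partition, in offset order. */
    public List<String> read(int partition) {
        return Collections.unmodifiableList(partitions.get(partition));
    }

    public static void main(String[] args) {
        PartitionedTopic topic = new PartitionedTopic(3);
        for (String car : new String[] {"car1", "car2", "car3", "car1"}) {
            long offset = topic.append(car, "speed sample from " + car);
            System.out.println(car + " -> partition " + partitionFor(car, 3) + ", offset " + offset);
        }
    }
}
```

Keying by car, as in this sketch, keeps each car's samples ordered within one partition while still spreading the cars across partitions.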

Consumer Groups

• Single consumer abstraction for scalability
• Max 1 consumer per partition (within a group)
• Any number of consumer groups
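
The rule that a partition is read by at most one consumer in a group can be sketched as a simple round-robin assignment. This is an illustration under stated assumptions, not Kafka's implementation: GroupAssignmentSketch and assign are hypothetical names, and the real client uses pluggable assignors and rebalances when members join or leave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {
    /**
     * Spreads partitions 0..numPartitions-1 over the consumers of ONE group.
     * Each partition goes to exactly one consumer; if there are more
     * consumers than partitions, the extra consumers get nothing.
     */
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two independent groups read the same 3 partitions.
        System.out.println("dashboard group: " + assign(Arrays.asList("d1", "d2"), 3));
        System.out.println("archive group:   " + assign(Arrays.asList("a1", "a2", "a3", "a4"), 3));
    }
}
```

Because each group gets its own full assignment, adding a new group (a new use case) does not take messages away from the existing ones.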

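Produce Messages

import org.apache.kafka.clients.producer.ProducerRecord;

// producer is a KafkaProducer<String, byte[]> created elsewhere
ProducerRecord<String, byte[]> rec = new ProducerRecord<>(
    "/stream/car_1_topic",
    eventName,
    value.toString().getBytes());

// The callback runs once the send has completed or failed
producer.send(rec, (recordMetadata, e) -> {
    if (e != null) { … }
});

producer.flush();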

Consume Messages

long pollTimeOut = 800;
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);
    if (!records.isEmpty()) {
        Iterable<ConsumerRecord<String, String>> iterable = records::iterator;
        StreamSupport.stream(iterable.spliterator(), false).forEach((record) -> {
            // work with record object
            … record.value(); …
        });
        consumer.commitAsync();
    }
}

Big Picture

[Diagram: three producers and three consumers]

More real life Kafka …

[Diagram: three producers and three consumers connected to Broker 1, Broker 2, and Broker 3, each hosting Topic A and Topic B, with Zookeeper]

MapR Streams
• Distributed messaging system built to scale
• Uses the Apache Kafka API 0.9.0
• No code change
• Does not use the same “broker” architecture
  – Log stored in MapR Storage (Scalable, Secured, Fast, Multi DC)
  – No Zookeeper

Kafka

[Diagram: three producers and three consumers connected to Broker 1, Broker 2, and Broker 3, each hosting Topic A and Topic B, with Zookeeper]

MapR Streams

[Diagram: three producers and three consumers connected to three streams, each holding Topic A and Topic B]

What’s next?

Sensor Data V1
• 3 main data points:
  – Speed (m/s)
  – RPM
  – Distance (m)
• Buffered

{
  "_id": "1.458141858E9/0.324",
  "car": "car1",
  "timestamp": 1458141858,
  "racetime": 0.324,
  "records": [
    {
      "sensors": { "Speed": 3.588583, "Distance": 2003.023071, "RPM": 1896.575806 },
      "racetime": 0.324,
      "timestamp": 1458141858
    },
    {
      "sensors": { "Speed": 6.755624, "Distance": 2004.084717, "RPM": 1673.264526 },
      "racetime": 0.556,
      "timestamp": 1458141858
    },
    …

Sensor Data V2
• 3 main data points:
  – Speed (m/s)
  – RPM
  – Distance (m)
  – Throttle
  – Gear
  – …
• Buffered

{
  "_id": "1.458141858E9/0.324",
  "car": "car1",
  "timestamp": 1458141858,
  "racetime": 0.324,
  "records": [
    {
      "sensors": { "Speed": 3.588583, "Distance": 2003.023071, "RPM": 1896.575806, "gear": 2 },
      "racetime": 0.324,
      "timestamp": 1458141858
    },
    {
      "sensors": { "Speed": 6.755624, "Distance": 2004.084717, "RPM": 1673.264526, "gear": 2 },
      "racetime": 0.556,
      "timestamp": 1458141858
    },
    …
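
The "Buffered" bullet means several readings travel together in one document. A minimal sketch of that batching is shown below, assuming a fixed batch size. SensorBuffer is a hypothetical class, not code from the demo repository; the JSON is assembled by hand without escaping, and the "_id" is simplified to "timestamp/racetime" of the first reading rather than the exact format shown on the slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class SensorBuffer {
    private final String car;
    private final int capacity;                       // assumed fixed batch size
    private final List<String> records = new ArrayList<>();
    private long firstTimestamp;
    private double firstRaceTime;

    SensorBuffer(String car, int capacity) {
        this.car = car;
        this.capacity = capacity;
    }

    /** Buffers one reading; returns a batch document when full, otherwise null. */
    public String add(long timestamp, double raceTime, Map<String, Double> sensors) {
        if (records.isEmpty()) {
            // The first reading of a batch provides the document-level fields.
            firstTimestamp = timestamp;
            firstRaceTime = raceTime;
        }
        StringJoiner fields = new StringJoiner(",");
        for (Map.Entry<String, Double> e : sensors.entrySet()) {
            fields.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        records.add("{\"sensors\":{" + fields + "},\"racetime\":" + raceTime
                + ",\"timestamp\":" + timestamp + "}");
        return records.size() >= capacity ? flush() : null;
    }

    /** Emits whatever is buffered as one document, or null if nothing is buffered. */
    public String flush() {
        if (records.isEmpty()) {
            return null;
        }
        String doc = "{\"_id\":\"" + firstTimestamp + "/" + firstRaceTime + "\""
                + ",\"car\":\"" + car + "\""
                + ",\"timestamp\":" + firstTimestamp
                + ",\"racetime\":" + firstRaceTime
                + ",\"records\":[" + String.join(",", records) + "]}";
        records.clear();
        return doc;
    }

    public static void main(String[] args) {
        SensorBuffer buffer = new SensorBuffer("car1", 2);
        buffer.add(1458141858L, 0.324, Collections.singletonMap("Speed", 3.588583));
        System.out.println(buffer.add(1458141858L, 0.556, Collections.singletonMap("Speed", 6.755624)));
    }
}
```

Because the sensors are passed as a map, moving from V1 to V2 (adding gear, throttle, and so on) only changes the map contents, not the buffering code.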

• It works, and is available on GitHub under the Apache Software License 2.0
• Data collected is unrealistically limited, lacks
  – Tire pressure, temperature x 4
  – Brake usage, temperature x 8
  – Engine monitoring is primitive (RPMs only, no KERS)
  – Data rate is fixed, real data comes in at highly variable rates
  – Real data has variable delays due to RF dropout + buffering
• Data collected is in pure JSON
  – Real data is columnar compressed blobs

Next Steps
• Near Real Time Data Processing
  – Aggregation
  – Machine Learning
  – Alerts
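
As a rough idea of what "Aggregation" plus "Alerts" could look like on a single sensor, the sketch below keeps a moving average over the last few RPM readings and flags when it crosses a threshold. RpmAlertSketch, the window size, and the threshold are all illustrative assumptions; a production version would more likely run inside a stream processing engine than in a hand-written class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RpmAlertSketch {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int windowSize;     // number of readings to average (assumed)
    private final double threshold;   // alert level in RPM (assumed)
    private double sum;

    RpmAlertSketch(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    /** Adds a reading; returns true when a full window averages above the threshold. */
    public boolean add(double rpm) {
        window.addLast(rpm);
        sum += rpm;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();   // slide the window forward
        }
        return window.size() == windowSize && average() > threshold;
    }

    /** Average of the readings currently in the window (0.0 when empty). */
    public double average() {
        return window.isEmpty() ? 0.0 : sum / window.size();
    }

    public static void main(String[] args) {
        RpmAlertSketch alert = new RpmAlertSketch(3, 8000.0);
        for (double rpm : new double[] {7000, 9000, 9000, 1000}) {
            System.out.println("rpm=" + rpm + " alert=" + alert.add(rpm));
        }
    }
}
```

Averaging over a window rather than alerting on single readings avoids firing on one noisy sample.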

Apache Spark
• Cluster Computing Platform
• Extends “MapReduce” with extensions
  – Streaming
  – Interactive Analytics
• Runs in Memory
• http://spark.apache.org/

Apache Flink
• Streaming Dataflow Engine
• DataStream/DataSet APIs
• CEP, Graph, ML
• Runs in Memory
• https://flink.apache.org/

IoT: Racing Cars V2.0

[Diagram labels: sensor data, Alerts]

https://github.com/mapr-demos/racing-time-series

Spark & Streams

val topics = "/app/racing/stream:all_cars"
val sparkConf = new SparkConf().setAppName("SensorStream")
val ssc = new StreamingContext(sparkConf, Seconds(2))

// Create direct kafka stream with brokers and topics
val topicsSet = topics.split(",").toSet
val kafkaParams = Map[String, String](
  ConsumerConfig.GROUP_ID_CONFIG -> "race1",
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG ->
    "org.apache.kafka.common.serialization.StringDeserializer",
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG ->
    "org.apache.kafka.common.serialization.StringDeserializer",
  ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "earliest",
  ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> "false",
  "spark.kafka.poll.time" -> "1000"
)

val messages = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet)
val sensorDStream = messages.map(_._2).map(parseSensor)

sensorDStream.foreachRDD { rdd =>
  // There exists at least one element in RDD
  if (!rdd.isEmpty) {
    …..
  }
}

Streaming Architecture & Formula 1
• Stream data in real time
• Big Data Store to deal with the scale
  – NoSQL Database, Distributed File System
• Decouple the source from the consumer(s)
  – Dashboard, Analytics, Machine Learning
  – Add new use case….

This is not only about Formula 1! (Telco, Finance, Retail, Content, IT)

MapR Converged Data Platform

[Diagram layers]
• Data Processing: Open Source Engines & Tools, Commercial Engines & Applications, Search & Others, Cloud & Managed Services, Custom Apps
• Utility-Grade Platform Services: Enterprise Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• Global Namespace, High Availability, Data Protection, Self-healing, Unified Security, Real-time, Multi-tenancy
• Unified Management and Monitoring

MapR Platform Services: Open API Architecture
Assures Interoperability, Avoids Lock-in
• MapR-FS (Enterprise Storage): HDFS API, POSIX, NFS
• MapR-DB (NoSQL Database): SQL, HBase API, JSON API
• MapR Streams (Global Event Streaming): Kafka API

Q & A

Engage with us!
• @mapr | @tgrall
• maprtech
• MapR
• mapr-technologies
