Fast Cars, Big Data - How Streaming Can Help Formula 1
-
Upload
tugdual-grall -
Category
Technology
-
view
163 -
download
3
Transcript of Fast Cars, Big Data - How Streaming Can Help Formula 1
© 2016 MapR Technologies© 2016 MapR TechnologiesMapR Confidential © 2016 MapR Technologies1
Fast Cars, Big Data How Streaming Can Help Formula 1
Tugdual Grall@tgrall
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall
{“about” : “me”}Tugdual “Tug” Grall • MapR
• Technical Evangelist • MongoDB
• Technical Evangelist • Couchbase
• Technical Evangelist • eXo
• CTO • Oracle
• Developer/Product Manager • Mainly Java/SOA
• Developer in consulting firms
• Web • @tgrall • http://tgrall.github.io • tgrall
• NantesJUG co-founder
• Pet Project : • http://www.resultri.com
© 2016 MapR Technologies 3
Agenda• What’s the point of data in motorsports? • Live demo • Architecture • What’s next?
© 2016 MapR Technologies 6
Data in Motorsports
http
://f1
fram
ewor
k.bl
ogsp
ot.d
e/20
13/0
8/sh
ort-g
uide
-to-f1
-tele
met
ry-s
pa-c
ircui
t.htm
l
© 2016 MapR Technologies 8
Real Analytics as Well as Visualization• Inputs
• Predictive analysis of consumables and tires • Physical models of car + driver performance
• Tire wear slows lap times, lower fuel weight speeds lap times • Competitors’ options • Weather conditions • Current GP points status
• Outputs • Tactical options, outcome distributions
© 2016 MapR Technologies 9
Data for Marketing as well
http
://fo
rmul
a1.fe
rrar
i.com
/en/
info
raci
ng-h
unga
rian-
gp-2
015/
© 2016 MapR Technologies 10
• Up to 300 sensors per car• Up to 2000 channels• Sensor data are sent to the paddock in 2ms
Some Examples?
© 2016 MapR Technologies 10
• Up to 300 sensors per car• Up to 2000 channels• Sensor data are sent to the paddock in 2ms• 1.5 billions of data points for a race
Some Examples?
© 2016 MapR Technologies 10
• Up to 300 sensors per car• Up to 2000 channels• Sensor data are sent to the paddock in 2ms• 1.5 billions of data points for a race• 5 billions for a full race weekend
Some Examples?
© 2016 MapR Technologies 10
• Up to 300 sensors per car• Up to 2000 channels• Sensor data are sent to the paddock in 2ms• 1.5 billions of data points for a race• 5 billions for a full race weekend• 5/6Gb of compressed data per car for 90mn
Some Examples?
© 2016 MapR Technologies 10
• Up to 300 sensors per car• Up to 2000 channels• Sensor data are sent to the paddock in 2ms• 1.5 billions of data points for a race• 5 billions for a full race weekend• 5/6Gb of compressed data per car for 90mn
US Grand Prix 2014 : 243 Tb (race teams combined)
Some Examples?
© 2016 MapR Technologies 13
Simplified Demo System Outline
Archive MapR DB
Jetty /Bootstrap /
d3
Apache Drill (SQL access)
TORCS race simulator
MapR Streams
© 2016 MapR Technologies 14
TORCS for Cars, Physics and Drivers
TORCS is a pseudo-physics based racing simulator with full graphics output and pluggable control modules.
TORCS is commonly used for AI research, but the control model can just as well collect data
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 16
IoT : Racing Cars
Producers Consumers
sensors data
Real Time
Analytics
https://github.com/mapr-demos/racing-time-series
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 17
IoT : Racing Cars
sensors data
https://github.com/mapr-demos/racing-time-series
Kafka Producer
(Java)
Kafka Consumer +
OJAI (Java)
Kafka Consumer +
WebSocket (Java + JS)
SQL
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 18
Big Datastore
Distributed File SystemHDFS/MapR-FS
NoSQL DatabaseHBase/MapR-DB
….
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 19
Store data as File or Row?
HDFS / MapR-FS • Data stores as “files” • Fast with Large Scans • Slow random read/writes
NoSQL (HBase/MapR-DB) • Data stores as row/documents • Fast with random read/writes
© 2016 MapR Technologies 21
What is Kafka?
• http://kafka.apache.org/ • Created at LinkedIn, open sourced in 2011 • Implemented in Scala / Java • Distributed messaging system built to scale
© 2016 MapR Technologies 22
Key Concepts
• Feeds of messages are organised in topics • Processes that publish messages are called producers • Processes that subscribed to topic and process messages are
consumers • A Kafka cluster is made of one or more brokers (== node)
© 2016 MapR Technologies 23
Topics and Partitions
• Split topics into partitions for scalability
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5
0 1 2 3 4 5 6 7
Partition 0
Partition 1
Partition 2
Writes
© 2016 MapR Technologies 24
Consumer Groups
• Single consumer abstraction for scalability • Max 1 consumer per partition • Any number of consumer groups
© 2016 MapR Technologies 25
Produce MessagesProducerRecord<String, byte[]> rec = new ProducerRecord<>( “/stream/car_1_topic“, eventName, value.toString().getBytes());
producer.send(rec, (recordMetadata, e) -> { if (e != null) { … });
producer.flush();
© 2016 MapR Technologies 26
Consume Messageslong pollTimeOut = 800; while(true) { ConsumerRecords<String, String> records = consumer.poll(pollTimeOut); if (!records.isEmpty()) { Iterable<ConsumerRecord<String, String>> iterable = records::iterator; StreamSupport.stream(iterable.spliterator(), false).forEach((record) -> { // work with record object
… record.value();…
}); consumer.commitAsync(); } }
© 2016 MapR Technologies 28
More real life Kafka …
Zookeeper
Broker 1
Topic A Topic B
Broker 2
Topic A Topic B
Broker 3
Topic A Topic B
Producer
Producer
Producer
Consumer
Consumer
Consumer
© 2016 MapR Technologies 29
• Distributed messaging system built to scale • Use Apache Kafka API 0.9.0 • No code change • Does not use the same “broker” architecture
• Log stored in MapR Storage(Scalable, Secured, Fast, Multi DC)
• No Zookeeper
© 2016 MapR Technologies 30
Kafka
Zookeeper
Broker 1
Topic A Topic B
Broker 2
Topic A Topic B
Broker 3
Topic A Topic B
Producer
Producer
Producer
Consumer
Consumer
Consumer
© 2016 MapR Technologies 31
MapR Streams
Stream
Topic A Topic B
Stream
Topic A Topic B
Stream
Topic A Topic B
Producer
Producer
Producer
Consumer
Consumer
Consumer
© 2016 MapR Technologies 3330
Sensor Data V1• 3 main data points:
• Speed (m/s) • RPM • Distance (m)
• Buffered
{ "_id":"1.458141858E9/0.324", "car" = "car1", "timestamp":1458141858, "racetime”:0.324, "records": [ { "sensors":{ "Speed":3.588583, "Distance":2003.023071, "RPM":1896.575806 }, "racetime":0.324, "timestamp":1458141858 }, { "sensors":{ "Speed":6.755624, "Distance":2004.084717, "RPM":1673.264526 }, "racetime":0.556, "timestamp":1458141858 },
© 2016 MapR Technologies 3431
Sensor Data V2• 3 main data points:
• Speed (m/s) • RPM • Distance (m) • Throttle • Gear • …
• Buffered
{ "_id":"1.458141858E9/0.324", "car" = "car1", "timestamp":1458141858, "racetime”:0.324, "records": [ { "sensors":{ "Speed":3.588583, "Distance":2003.023071, "RPM":1896.575806, "gear" : 2 }, "racetime":0.324, "timestamp":1458141858 }, { "sensors":{ "Speed":6.755624, "Distance":2004.084717, “RPM":1673.264526, "gear" : 2 }, "racetime":0.556, "timestamp":1458141858 },
© 2016 MapR Technologies 35
• It works, is available on github, ASL 2
• Data collected is unrealistically limited, lacks – Tire pressure, temperature x 4 – Brake usage, temperature x 8 – Engine monitoring is primitive (RPMs only, no KERS) – Data rate is fixed, real data comes in at highly variable rates – Real data has variable delays due to RF dropout + buffering
• Data collected is in pure JSON – Real data is columnar compressed blobs
© 2016 MapR Technologies 36
Next Steps• Near Real Time Data Processing
• Aggregation • Machine Learning • Alerts
© 2016 MapR Technologies 37
• Cluster Computing Platform • Extends “MapReduce” with
extensions – Streaming – Interactive Analytics
• Run in Memory • http://spark.apache.org/
© 2016 MapR Technologies 38
• Streaming Dataflow Engine • Datastream/Dataset APIs • CEP, Graph, ML
• Run in Memory • https://flink.apache.org/
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 39
IoT : Racing Cars V2.0
sensors data
https://github.com/mapr-demos/racing-time-series
Alerts
© 2016 MapR Technologies 40
Spark & Streams val topics = “/app/racing/stream:all_cars" val sparkConf = new SparkConf().setAppName(“SensorStream") val ssc = new StreamingContext(sparkConf, Seconds(2))
// Create direct kafka stream with brokers and topics val topicsSet = topics.split(",").toSet val kafkaParams = Map[String, String]( ConsumerConfig.GROUP_ID_CONFIG -> "race1", ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer", ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer", ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "earliest", ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> "false", "spark.kafka.poll.time" -> "1000" )
val messages = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet) val sensorDStream = messages.map(_._2).map(parseSensor)
sensorDStream.foreachRDD { rdd => // There exists at least one element in RDD if (!rdd.isEmpty) { ….. } }
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 41
Streaming Architecture & Formula 1• Stream data in real time
• Big Data Store to deal with the scale • NoSQL Database, Distributed File System
• Decouple the source from the consumer(s) • Dashboard, Analytics, Machine Learning • Add new use case….
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 42
Streaming Architecture & Formula 1• Stream data in real time
• Big Data Store to deal with the scale • NoSQL Database, Distributed File System
• Decouple the source from the consumer(s) • Dashboard, Analytics, Machine Learning • Add new use case….
This is not only about Formula 1! (Telco, Finance, Retail, Content, IT)
© 2016 MapR Technologies 43
MapR Converged Data Platform
Open Source Engines & Tools Commercial Engines & Applications
Utility-Grade Platform Services
Dat
aPr
oces
sing
Enterprise StorageMapR-FS MapR-DB MapR Streams
Database Event Streaming
Global Namespace High Availability Data Protection Self-healing Unified Security Real-time Multi-tenancy
Search & Others
Cloud & Managed Services
Custom Apps
Unified M
anagement and M
onitoring
© 2016 MapR Technologies 44
MapR Platform Services: Open API ArchitectureAssures Interoperability, Avoids Lock-in
MapR-FS Enterprise Storage
MapR-DB NoSQL Database
MapR Streams Global Event Streaming
HDFS API
POSIX NFS
SQL, Hbase
APIJSON API
Kafka API
© 2016 MapR Technologies 46
Q & A@mapr | @tgrall maprtech
Engage with us!
MapR
maprtech
mapr-technologies