Spark streaming for the internet of flying things 20160510.pptx

Post on 12-Apr-2017


M. ÁNGEL FERNÁNDEZ DÍAZ

Computer Science (URJC)

1 year RUG (Groningen - Netherlands)

Master in Innovation

Stratio Crossdata Scrum Master

WHO ARE WE?

BACKGROUND

@miguel_afd

https://github.com/miguel0afd

https://www.linkedin.com/in/miguel0afd

FIND ME AT:

PABLO FCO. PÉREZ HIDALGO

B.S. + M.S. Computer Science (UPM)

You may remember me from such activities as:

• Resource optimization

• Financial IT

• Distributed Systems

Functional & Reactive Programming advocate.

Stratio Crossdata Software Engineer.

WHO ARE WE?

BACKGROUND

@pfcoperez

https://github.com/pfcoperez

https://goo.gl/OOdTib

FIND ME AT:

http://stackoverflow.com/users/1893995

http://www.pablofranciscoperez.info

Internet of Things: Some of them in the sky

COMMON FEATURES

• Ubiquitous:

  • Wearables

  • Appliances

  • Multimedia

• Sensors

• Connected

• Real/Near-Real time information feeds

WHY ARE THEY ATTRACTIVE FOR THEIR USERS?

Cool perks

Automation

Make their life more comfortable

Provides them with information on their own lives

Know more about their habits

Synchronize all your gadgets

Optimize your time and your resources

WHAT’S THE REVENUE FOR IOT SERVICE PROVIDERS?

Service providers, benefits beyond sales:

• Huge amounts of incoming data from their users.

• Anonymous profiles are built (customer centric).

• That information is monetized:

  • Consumption trends.

  • Decisions taken upon data analysis results: Netflix, Amazon…

  • Logistics anticipation (Amazon patent: Anticipatory package shipping).

QUITE A HOT TOPIC

EXPONENTIAL GROWTH OF INTEREST

GARTNER HYPE CYCLE

INTERNET OF (FLYING?) THINGS

...he who is skilled in attack flashes forth from the topmost heights of heaven…

Sun Tzu - The Art of War

• Keep & deploy the sensors

• Make them fly

• Forget about users’ perks, or not?

INTERNET OF FLYING THINGS


...he who is skilled in acquiring data flashes forth from the topmost heights of heaven…

Sun Tzu - The Art of War, Thug life

I THOUGHT REACHING THE HEAVENS WAS EXTREMELY EXPENSIVE!

I THOUGHT REACHING THE SKIES WAS EXPENSIVE!

THE COMMON RECIPE TO LOWER YOUR ENGINEERING EXPENSES

1. Get humans out of the vehicles

2. Scale horizontally, not vertically

3. Use commodity hardware!

Big Data...And IoT with Apache Spark

Volume

Variety

Velocity

Let’s scale Horizontally!

Let’s use Event Streams

Be polyglot, but it is better if we share a common language

STREAMING: WATER IS A GOOD MACROSCOPIC ANALOGY

• A stream is the description of a data SOURCE, a SINK and the transformations connecting them.

• As with fluids, data from different SOURCEs can be merged into the same stream: join, union, ...

• And be swallowed by heterogeneous SINKs

Source A

Source B

Sink

Using higher order functions!

flatMap, map, filter, collect, ...
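The stream description above (sources, transformations, sink) can be sketched in a few lines of plain Python. This is only an illustration, not Spark code: the lists `source_a` and `source_b` stand in for live feeds, and `run_stream` is a hypothetical helper name.

```python
from itertools import chain

def run_stream(sources, transform, sink):
    """Merge several sources (a union), apply the transformations, feed the sink."""
    merged = chain(*sources)            # data from different sources joins one stream
    for item in transform(merged):
        sink.append(item)               # the sink swallows whatever comes out
    return sink

# Hypothetical sources: plain lists standing in for live feeds.
source_a = [1, 2, 3]
source_b = [10, 20]

def transform(stream):
    """Transformations expressed with higher-order functions: filter, then map."""
    evens = filter(lambda x: x % 2 == 0, stream)
    return map(lambda x: x * 100, evens)

sink = run_stream([source_a, source_b], transform, [])
```

Nothing is computed until the sink pulls items through, which is the same "describe first, run later" idea behind a stream definition.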

EVENTS, THE QUANTUM UNIT OF INFORMATION

• Streams aren’t continuous.

• Data is divided into events.

• Events are the units of information flowing through the stream:

  • Can be uniquely identified

  • Are usually marked with an event timestamp
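As a rough sketch of what such an event could look like in code, assuming a random UUID as the unique id and a millisecond timestamp (the names `Event`, `new_event` and `map_events` are made up for this example):

```python
from dataclasses import dataclass, replace
from typing import Any
import uuid

@dataclass(frozen=True)
class Event:
    event_id: str       # unique identifier
    timestamp_ms: int   # event timestamp, in ms
    content: Any        # the payload carried by the event

def new_event(timestamp_ms, content):
    """Create an event with a freshly generated unique id."""
    return Event(uuid.uuid4().hex, timestamp_ms, content)

def map_events(f, events):
    """Apply f to each event's content, leaving id and timestamp untouched."""
    return (replace(e, content=f(e.content)) for e in events)

events = [new_event(1000, 2.0), new_event(1010, 3.5)]
doubled = list(map_events(lambda x: x * 2, events))
```

Mapping over the stream changes what each event carries, not which event it is.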

[Figure labels: event content (juicy information), unique id, timestamp; Map(f) is actually f(x)]

FROM CONTROL FLOW TO DATA FLOW

FROM CONTROL FLOW TO DATA FLOW: AN OLD SCHOOL FLOWCHART

FROM CONTROL FLOW TO DATA FLOW: SWITCHING PLACES, NOW DATA RULES!

[Diagram: Event stream → filter, with f(x) = shouldConsider(x) → Output stream]

μBATCHING VS REACTIVE STREAMING: REACTIVE STREAMING

[Diagram: Map(f) applies f( ) to each event, one by one, along the time axis t]

μBATCHING VS REACTIVE STREAMING: μBATCHING, THE SPARK WAY

[Diagram: along the time axis t, events are grouped into RDDs and map f is applied to each one: RDD( ) map f, RDD( , ) map f, RDD( ) map f]


● Spark Streaming Pros:

○ Most streaming apps are for [Complex] Event Processing.

○ Using stateful operations on time windows. Real streaming is overkill in many cases.

○ Batch algorithms reuse.

● Cons:

○ Not real-real streaming: Delayed event response

○ Tuning is required to avoid performance issues & OOME (OutOfMemoryError) situations
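To make the μbatching idea concrete, here is a toy sketch in plain Python (not Spark): events are grouped by fixed time interval and the same function is then applied batch by batch. The helper names and the 1000 ms interval are assumptions for the example, and empty intervals are simply skipped.

```python
from collections import defaultdict

def micro_batches(events, batch_interval_ms):
    """Group (timestamp_ms, value) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp_ms, value in events:
        batches[timestamp_ms // batch_interval_ms].append(value)
    # Emit the batches in time order: one small collection per interval.
    return [batches[k] for k in sorted(batches)]

def map_batches(f, batches):
    """Apply f to every element of every batch (a batch-wise map)."""
    return [[f(v) for v in batch] for batch in batches]

events = [(0, 1), (400, 2), (1200, 3), (2100, 4), (2900, 5)]
batches = micro_batches(events, batch_interval_ms=1000)
result = map_batches(lambda v: v * v, batches)
```

The delayed response listed under Cons is visible here: an event is only processed once its whole interval has been collected.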

Leveraging Spark Streaming to analyze swarm behaviour

OUR COMMODITY HARDWARE

• Chinese carbon fiber hobbyists’ FPV racing quadcopter.

• Flight controller: Revolution + OPLink

• Real time telemetry data feed:

3D Acceleration

3D Attitude

Pilot orders

Barometric altitude

[GPS, SONAR, Battery state, …]

UAV CONTROL PLATFORM: LIBREPILOT & CC3D REVOLUTION

• LibrePilot: open platform providing HW design and SW:

  • Flight control & assistance

  • Ground Control Station desktop app:

    • Hardware settings, calibration, ...

    • Telemetry management

    • Autopilot

• http://www.librepilot.org/

• Revolution vehicle controller:

• Sensors: 3D Accelerometer, 3D Gyroscope, 3D Magnetometer, Barometric pressure, ...

• Actuator: PWM out feeding ESCs

• Connectivity: PWM/PPM Input (radio controllers), USB, I2C, …, 433 MHz Radio link!

SWARM?

swarm1 /swɔrm/ n. [countable] (www.wordreference.com)

1. (Insects) a body of honeybees that leave a hive and fly off together, accompanied by a queen, to start a new colony.

2. a great number of things or persons moving together: A swarm of reporters descended on her.


Our definition: A relatively large number of UAVs moving around the same area and managed by the same system.

• Initially, remotely driven by humans.

• Acquiring data: Events generated and propagated to the master system in real time.

• Collective sensor composed of the equipment on each airframe.

THIS POC GOAL: FIND AND FIRE BAD PILOTS

• SOURCE: Make telemetry events reach the Spark cluster.

• TRANSLATION: Translate them into easy-to-process stream events.

• EP: Develop bump detection algorithms: Event Processing.

• SINK: Persist and visualize complex events.

EVENT FLOW ARCHITECTURE

[Diagram labels: GCS, /dev/ttyACMx, TCP/IP Server, JSONs, socketTextStream, Internet]

EVENT FLOW ARCHITECTURE: SWARMING

[Diagram labels: GCS (×4), Internet (×2)]

EVENT FLOW ARCHITECTURE: SWARMING NOW OUR DRONES ARE BORGS!

MAKING GCS SING LIKE A CANARY: THANKS TO THE IOT PLUGIN

• Subscription service listening to TCP port 7891.

• Any event received from the drone gets sent to all clients connected to that port.

• Text, JSON, socket interface.


MAKING SPARK STREAMING LISTEN TO LIBREPILOT SECRETS

No man can be in all rooms at all times. I have many little birds in the North, my lord...


• Spark Input Streams: networkStream, actorStream, fileStream, …, socketTextStream

• socketTextStream: 1 received line <-> 1 Event

• A raw JSON string is as good for analysis as...

UNDERSTANDING THE JSON BULK...OR HOW TO COOK HAMBURGERS FROM RAW MEAT

THE NATURE OF OUR DATA

• Timestamp: ms from Jan 1st 1970

• Drone Id: String

• Sensors values:

  • Attitude:

    • Yaw: [-180, 180]º

    • Pitch: [-180, 180]º

    • Roll: [-180, 180]º

  • 3D Relative Acceleration:

    • X: m/s^2

    • Y: m/s^2

    • Z: m/s^2

  • Angular velocity:

    • X: º/s

    • Y: º/s

    • Z: º/s
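A possible way to turn one received text line into one typed event, sketched in plain Python. The JSON field names (`timestamp`, `droneId`, `kind`, `values`) are invented for this example; the real layout emitted by the GCS plugin is not shown in the slides.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class TelemetryEvent:
    timestamp_ms: int   # ms from Jan 1st 1970
    drone_id: str
    kind: str           # hypothetical labels: "attitude", "accel", "gyro"
    values: tuple       # three floats: (yaw, pitch, roll) or (x, y, z)

def parse_line(line):
    """Turn one received text line into one event; return None if malformed."""
    try:
        raw = json.loads(line)
        return TelemetryEvent(
            timestamp_ms=int(raw["timestamp"]),
            drone_id=str(raw["droneId"]),
            kind=str(raw["kind"]),
            values=tuple(float(v) for v in raw["values"]),
        )
    except (ValueError, KeyError, TypeError):
        return None

lines = [
    '{"timestamp": 1462870000000, "droneId": "drone-1", '
    '"kind": "accel", "values": [0.1, -0.2, -9.8]}',
    'not json at all',
]
events = [e for e in map(parse_line, lines) if e is not None]
```

Dropping malformed lines instead of raising keeps one bad reading from stopping the whole stream, which is a design choice of this sketch rather than something stated in the talk.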

BUMP DETECTION

• Our definition of bump: Any acceleration event indicating the physical action of hitting something with a drone.

• We’d like to detect vertical bumps: Bounces.

• So we are only interested in z-axis acceleration: 9.8 m/s²

NAÏVE BOUNCE DETECTION: ALGORITHM

1. Filter event stream keeping acceleration events.

2. Filter the stream from (1), keeping events with z-axis acceleration within a range.
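The two steps above could look roughly like this in plain Python. Events are modelled as dicts with hypothetical `kind` and `z` keys, and the range bounds are arbitrary values picked for the example, not figures from the talk.

```python
def is_acceleration(event):
    """Step 1 predicate: is this an acceleration event?"""
    return event["kind"] == "accel"

def naive_bounces(events, z_min, z_max):
    """Keep acceleration events whose z-axis value lies in [z_min, z_max]."""
    accel = filter(is_acceleration, events)                  # step 1
    return [e for e in accel if z_min <= e["z"] <= z_max]    # step 2

events = [
    {"kind": "accel", "z": -9.8},
    {"kind": "attitude", "yaw": 10.0},
    {"kind": "accel", "z": -24.0},
    {"kind": "accel", "z": -9.6},
]
bounces = naive_bounces(events, z_min=-40.0, z_max=-15.0)
```

Each event is judged in isolation, with no memory of what came before it.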

NAÏVE BOUNCE DETECTION: NUTS & BOLTS

Algorithm:

NAÏVE BUMP DETECTION

Problems:

• False negatives: What about hitting the roof from below? Those hits wouldn’t be detected as bumps.

• False positives: And prolonged ascents? They would generate countless bump events.

Solution:

• Keep track of your previous state, at least for a while

AVERAGE OUTLIERS BUMP DETECTION: ALGORITHM

For each time window of, let’s say, 5 seconds:

1. Calculate the mean & std. deviation of the z-axis acceleration.

2. Discard all z-axis acceleration samples in the window that lie within n std. deviations of the mean, keeping only the outliers.

3. From the result of (2), discard the entries whose z-axis acceleration is below a threshold.
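A small plain-Python sketch of these three steps, under a few assumptions of its own: windows are non-overlapping (tumbling), the population standard deviation is used, and `n=2.0` and `threshold=15.0` are arbitrary example values.

```python
from statistics import mean, pstdev

def outlier_bumps(samples, n=2.0, threshold=15.0, window_ms=5000):
    """samples: list of (timestamp_ms, z_accel). Returns the samples flagged as bumps."""
    windows = {}
    for ts, z in samples:
        windows.setdefault(ts // window_ms, []).append((ts, z))
    bumps = []
    for key in sorted(windows):
        window = windows[key]
        zs = [z for _, z in window]
        mu, sigma = mean(zs), pstdev(zs)                    # step 1
        outliers = [(ts, z) for ts, z in window
                    if abs(z - mu) > n * sigma]             # step 2: keep outliers
        bumps.extend((ts, z) for ts, z in outliers
                     if z >= threshold)                     # step 3: drop low values
    return bumps

# First window: steady 9.8 readings plus one spike.
# Second window: a sustained 20.0 reading, e.g. a prolonged ascent.
samples = ([(i * 500, 9.8) for i in range(9)] + [(4500, 30.0)]
           + [(5000 + i * 500, 20.0) for i in range(4)])
bumps = outlier_bumps(samples)
```

The sustained 20.0 readings in the second window have zero deviation from their own mean, so they produce no bump events even though they exceed the threshold.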

AVERAGE OUTLIERS BUMP DETECTION: NUTS & BOLTS

KRAPS-TREAMING? THE CONVENIENCE OF μBATCHING

DID YOU CONSIDER THE SENSORS SUBJECTIVITY?

Something is wrong with gravity today!


Z-axis acceleration is -9.8 m/s²

I’d rather say it is 9.8 m/s²!

DID YOU CONSIDER THE SENSORS SUBJECTIVITY? BUMP DETECTION DOESN’T CARE ABOUT ATTITUDE

Yes, indeed! Z-axis acceleration is -9.8 m/s²

DID YOU CONSIDER THE SENSORS SUBJECTIVITY? OR HOW TO MAKE A GREY BLUR FROM A DRONE

Well… Acceleration values are vectors… So what!

You can rotate vectors!!! HA HA HA

[Figure: drone with its x-axis, y-axis and z-axis]

Just rotate your Acceleration vector by your attitude Roll, Pitch and Yaw angles.

As easy as 3 matrix multiplications.
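A minimal sketch of that rotation in plain Python, assuming the common convention of applying roll about x, then pitch about y, then yaw about z. The slides do not state which convention or sign the flight controller uses, so treat the order and the function names as assumptions.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def to_world_frame(accel, roll_deg, pitch_deg, yaw_deg):
    """Rotate a body-frame acceleration vector by roll, pitch and yaw (degrees)."""
    roll, pitch, yaw = (math.radians(d) for d in (roll_deg, pitch_deg, yaw_deg))
    v = mat_vec(rot_x(roll), accel)     # 1st multiplication: roll
    v = mat_vec(rot_y(pitch), v)        # 2nd multiplication: pitch
    return mat_vec(rot_z(yaw), v)       # 3rd multiplication: yaw

# A drone flying upside down (roll = 180º) reads +9.8 on its own z-axis.
world = to_world_frame([0.0, 0.0, 9.8], roll_deg=180.0, pitch_deg=0.0, yaw_deg=0.0)
```

After the rotation the z component is (up to rounding) -9.8, so both drones agree on gravity again.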

OR HOW TO MAKE A GREY BLUR FROM A DRONE WITH SCALA

OR HOW TO MAKE A GREY BLUR FROM A DRONE

Problem: Which Attitude entry should I use to normalize?

Solution (1 of many): Keep track of close Attitude events and choose the closest.
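One way to pick the closest Attitude event for a given Acceleration timestamp, sketched in plain Python. It assumes the recent attitude events are kept in a list sorted by timestamp; `closest_attitude` is a hypothetical helper name.

```python
from bisect import bisect_left

def closest_attitude(attitudes, accel_ts):
    """attitudes: list of (timestamp_ms, attitude) sorted by timestamp.

    Returns the entry whose timestamp is nearest to accel_ts, or None if empty.
    """
    if not attitudes:
        return None
    timestamps = [ts for ts, _ in attitudes]
    i = bisect_left(timestamps, accel_ts)
    # Only the neighbours just before and just after accel_ts can be closest.
    candidates = attitudes[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda a: abs(a[0] - accel_ts))

attitudes = [
    (1000, (0.0, 0.0, 0.0)),
    (1100, (5.0, 0.0, 0.0)),
    (1200, (10.0, 0.0, 0.0)),
]
match = closest_attitude(attitudes, 1130)
```

On a tie the earlier attitude wins, since `min` returns the first of equally close candidates.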

[Figure labels: Attitude, Accel]

THE PIPE PATTERN: OR THE PATH OF THE PEACEFUL SOUL

• Nothing unexpected will happen with your data flow...

• … data flow which happens to be your reasoning flow.

Results persistence & visualization

THE EVENT RECORDER

• All events share the same key: (drone Id, timestamp).

• No searches over values are expected.

• Using the DataStax Java driver.

• Inferring schema on-the-fly.

• Creating tables on-the-fly.

• High insertion rates are required.

• From distributed agents.
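As an illustration of inferring a schema and creating a table on the fly, the sketch below builds a CQL `CREATE TABLE` statement from one sample event. It is plain Python string building, not the recorder from the talk: the table and column names are hypothetical, and using the drone id as partition key with the timestamp as clustering column is an assumption about how the (drone Id, timestamp) key is laid out.

```python
# Mapping from Python value types to CQL column types.
CQL_TYPES = {bool: "boolean", int: "bigint", float: "double", str: "text"}

def infer_create_table(table, event):
    """Build a CREATE TABLE statement from one sample event (a flat dict)."""
    columns = ["drone_id text", "ts bigint"]        # the shared key columns
    for name, value in sorted(event.items()):
        if name in ("drone_id", "ts"):
            continue
        columns.append(f"{name} {CQL_TYPES[type(value)]}")
    cols = ", ".join(columns)
    return (f"CREATE TABLE IF NOT EXISTS {table} ({cols}, "
            f"PRIMARY KEY ((drone_id), ts))")

event = {"drone_id": "drone-1", "ts": 1462870000000,
         "x": 0.1, "y": -0.2, "z": -9.8}
statement = infer_create_table("accel", event)
```

The resulting string would then be handed to whatever driver session executes CQL; that part is left out here.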

DATA FLOW FROM SOCKET TO CASSANDRA

[Diagram labels: Socket, ..., μBatch, μBatch, Grouped bumps, Desired attitude, Actual attitude, Acceleration]

SO WHAT?

VISUALIZATION: FROM CASSANDRA TO THE USER

A bit of Science Fiction, or not?

AMAZON PRIME AIR

Once established, Amazon will probably want to evaluate its pilots’ proficiency:

• Smooth flights to avoid cargo damage…

• … can be analyzed for the whole fleet using techniques like the ones developed for our PoC.

Computation & memory demand will increase as their fleet does:

• Horizontally scalable systems are a perfect match for this growth pattern.

Prime Air vehicles will take advantage of sophisticated “sense and avoid” technology, as well as a high degree of automation, to safely operate beyond the line of sight to distances of 10 miles or more. (http://www.amazon.com/b?node=8037720011)

WIRELESS SENSORS NETWORKS (WSN)

• Main caveat: Deployment, which is easily solved using an n-copter swarm.

• The equipment used today already includes temp and pressure sensors.

• One of Big Data’s Vs, Variability, plays a main role in this kind of application.

WSNs are spatially distributed autonomous sensors to monitor physical or environmental conditions, such as temperature, sound, pressure, etc (https://en.wikipedia.org/wiki/Wireless_sensor_network)

DRONE RACING ANALYTIC

• Detect loops, barrel rolls, acceleration peaks

• Keep track of each pilot’s performance

• Update Score Boards with achievement counters


Thanks for listening!