© 2015 IBM Corporation
IBM Analytics
Spark Analytics with Informix
Pradeep Natarajan, IBM
@pradeepnatara
2
Agenda
• Context: Informix / Spark high-level value propositions
• IoT use-cases
• Challenges
• Prototype and implementation
• What’s next?
3
Informix to Spark
Context
4
Informix for Internet of Things
• Optimized database for environments such as:
– Low or no database administration
– Embedded: gateways, routers
– Very high transaction rates and uptime characteristics
• Widely deployed in the retail sector, where the low administration overhead makes it essential for in-store deployments.
• Informix supports key Internet-of-Things solutions
– Native support for time-based data: TimeSeries
– Small footprint
– Low administration requirements
5
Apache Spark
• Speed
• Ease of use, unified engine
• Sophisticated analytics
6
Apache Spark
• Cluster computing framework
• Fast and general engine for large-scale data processing
• In-memory computing
7
Apache Spark Streaming
Extends Spark for big data stream processing
[Figure: raw data stream → distributed stream processing system → processed data]
• Scaling, low latency, recovery
• Integrates batch and interactive processing
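Spark Streaming handles a live stream as a sequence of small batches. The sketch below illustrates that idea in plain Python only; it does not use the Spark API, and the function name `micro_batches` and the sample events are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def micro_batches(events: Iterable[Tuple[float, float]],
                  interval: float) -> Iterator[List[float]]:
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Assumes events arrive in timestamp order. Intervals with no events
    produce no batch.
    """
    batch: List[float] = []
    batch_end = None
    for ts, value in events:
        if batch_end is None:
            batch_end = ts + interval
        # Close every interval that ended before this event arrived.
        while ts >= batch_end:
            if batch:
                yield batch
                batch = []
            batch_end += interval
        batch.append(value)
    if batch:
        yield batch


events = [(0.0, 1.0), (0.4, 2.0), (1.1, 3.0), (2.5, 4.0)]
batches = list(micro_batches(events, interval=1.0))
print(batches)
```

Each batch can then be processed with the same code that would run on stored data, which is one way batch and streaming logic can share an engine.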
8
Informix to Spark
Use cases
9
Real-Time Operational Database Streaming Analytics with Spark
Applications that drive business have positioned relational databases at the center of operations.
To continue their success, businesses need to use streaming analytics to gain real-time insights into their operations and take actions to optimize outcomes.
Infrequent batch analytics on “stale” data loses the competitive edge. Demand for real-time analytics is increasing among businesses that want to stay in the lead.
10
Sense → Analyze → Act
As data ages, business value diminishes. Sense → Analyze → Act in seconds or milliseconds, not days or weeks.
[Figure: Sense → Analyze → Act timelines, batch (days) versus real-time (seconds)]
11
Increasing demand for real-time analytics
Continuously streaming data from IBM Informix to an analytics platform
Streaming analytics service sample scenarios:
• Connected Vehicles, driving behavior matching: detect anomalous driving behavior that causes higher fuel consumption
• Energy & Utilities, power consumption: how is power consumption correlated between houses A, B, C and D?
• Health Care, heart attack prevention: detect abnormal patterns in ECG series
• Finance, market manipulation detection: detect anomalies by price change rate in a time window (steady price change versus vibration in a short period)
• Cloud Service Operation, server health diagnosis: detect system resource peaks and valleys and correlate them with workload information
• …
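The finance scenario (anomaly by price change rate in a time window) can be sketched as follows. This is an illustrative plain-Python version, not code from the prototype; the function name, the window length, the threshold and the sample ticks are all assumptions.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def change_rate_alerts(ticks: Iterable[Tuple[float, float]],
                       window: float,
                       threshold: float) -> List[Tuple[float, float]]:
    """Return (timestamp, rate) pairs where the relative price change over
    the trailing `window` seconds exceeds `threshold` in absolute value.

    Assumes ticks arrive in timestamp order and prices are positive.
    """
    recent: Deque[Tuple[float, float]] = deque()
    alerts: List[Tuple[float, float]] = []
    for ts, price in ticks:
        recent.append((ts, price))
        # Drop ticks that have fallen out of the trailing window.
        while recent[0][0] < ts - window:
            recent.popleft()
        oldest_price = recent[0][1]
        rate = (price - oldest_price) / oldest_price
        if abs(rate) > threshold:
            alerts.append((ts, rate))
    return alerts


# Steady change for three ticks, then a sudden jump at t=3.
ticks = [(0, 100.0), (1, 100.5), (2, 101.0), (3, 110.0), (10, 110.5)]
alerts = change_rate_alerts(ticks, window=5, threshold=0.05)
print(alerts)
```

Only the jump at t=3 is flagged; by t=10 the earlier ticks have left the window, so the new price level is treated as the reference.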
12
Real-time analytics - Industry
• Information technology – systems and network monitoring
• IoT – sensor data analytics and processing
• Financial transactions – authentication, fraud detection, validation
• Inventory control – consumer trends and demands
• Website analytics – ad targeting
• Many others
13
Real-time analytics - applications
• Data analyzed as it arrives – data in motion
• Simple: monitoring, alerts/reports, statistics
• Complex: predictive analytics (regressions, machine learning, etc.), k-means clustering (classification, anomaly detection)
• Many applications store events as well, and combine them with later batch processing.
• Immediate actions possible.
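A "simple" case (monitoring with statistics and alerts) might look like the sketch below: running mean and standard deviation are updated as each reading arrives, and a reading far from the mean raises an alert. It uses Welford's online algorithm; the class and function names, the 3-sigma rule and the warm-up count are assumptions chosen for illustration.

```python
import math
from typing import Iterable, List


class RunningStats:
    """Welford's online algorithm for mean and sample variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def monitor(readings: Iterable[float], k: float = 3.0,
            warmup: int = 5) -> List[float]:
    """Return readings more than k standard deviations from the running mean.

    Each reading is compared against statistics of the readings before it,
    then folded into the statistics (outliers included).
    """
    stats = RunningStats()
    alerts: List[float] = []
    for x in readings:
        if (stats.n >= warmup and stats.stddev > 0
                and abs(x - stats.mean) > k * stats.stddev):
            alerts.append(x)
        stats.update(x)
    return alerts


readings = [20.0, 21.0, 19.0, 20.0, 21.0, 19.0, 20.0, 45.0, 20.0]
alerts = monitor(readings)
print(alerts)
```

Because nothing but three numbers is kept per stream, this kind of check can run on data in motion without storing the events first.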
14
Informix to Spark
Challenges
15
Exploring data and discovering actionable business insights
• The problem: users often do not know exactly which analytics they want to run
• It is difficult to justify the cost and risk of a complex solution without a specific business value
• We need to reduce the cost and risk of adding a real-time data analytics pipeline to the application architecture
• Let data scientists explore data to find useful analytics without interfering with the existing business
16
We're running an Informix database. How do we incorporate real-time analytics into our application architecture?
[Figure: Application Server, Database]
17
Outdated approach: requires additional complexity, with increased risk and cost.
[Figure: Application Server with two additional components]
18
Informix to Spark
Prototype Implementation
19
Real-Time Operational Database Streaming Analytics with Spark
• Newly prototyped feature for the Informix database.
• Enables Informix customers to stream data added to their database in real time via MQTT, which can then be consumed by an analytics platform such as Apache Spark.
20
Informix MQTT Streamer: enables a real-time analytics pipeline that drastically reduces complexity, cost and risk
[Figure: Informix with the MQTT Streamer]
21
How is it implemented?
• Uses the Informix Virtual-Index Interface (VII)
• VII allows us to write UDRs that will be triggered whenever certain SQL statements are executed
• This is typically used to create indexes for custom data types. Instead, we use it to write data to a socket during INSERT/UPDATE statements
[Figure: VII UDR publishes to the MQTT broker]
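The idea of an "index" whose maintenance hook publishes rows instead of building an index structure can be modelled in a few lines. The real prototype is a set of UDRs inside Informix; the Python below is only a conceptual sketch with an in-memory stand-in for the broker. The class names, the `informix/<table>` topic layout and the JSON payload shape are all hypothetical.

```python
import json
from typing import Callable, Dict, List


class InMemoryBroker:
    """Stand-in for an MQTT broker: delivers each message to topic subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, payload: str) -> None:
        for callback in self._subscribers.get(topic, []):
            callback(payload)


class StreamingIndex:
    """Hypothetical model of an index whose only job is to publish rows."""

    def __init__(self, broker: InMemoryBroker, table: str,
                 columns: List[str]) -> None:
        self.broker = broker
        self.table = table
        self.columns = columns
        self.topic = f"informix/{table}"  # assumed topic naming

    def on_insert(self, row: dict) -> None:
        """Called where the database would maintain the index on INSERT."""
        payload = json.dumps({
            "op": "insert",
            "table": self.table,
            # Only the indexed columns are streamed.
            "values": {c: row[c] for c in self.columns},
        }, sort_keys=True)
        self.broker.publish(self.topic, payload)


received: List[str] = []
broker = InMemoryBroker()
broker.subscribe("informix/readings", received.append)
index = StreamingIndex(broker, "readings", ["sensor_id", "value"])
index.on_insert({"sensor_id": 7, "value": 72.5, "note": "not indexed"})
print(received[0])
```

The application issuing the INSERT is unchanged; publishing happens as a side effect of index maintenance, which is why no additional component sits between the application and the database.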
22
Installation and basic usage
• Open sourced! Available on GitHub: https://github.com/IBM-IoT/InformixSparkStreaming
• Run the install script
• Add the streaming index to the column whose values you want to stream
create index stream on table(col1, col2) USING streaming_index;
The Nitty gritty
• Installed into Informix is a set of custom UDRs that convert data into MQTT messages and send them to a specified address
• Virtual Table Indexes detect data inserts/updates/deletes as they happen and trigger the messages to be sent
• Once in an MQTT broker, almost anything can consume it
– MQTT clients are available for most programming languages (including Java for Apache Spark)
• Spark can analyze the data, compare it to historical data, and use streaming k-means algorithms to determine changes in the data
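To make "streaming k-means" concrete, here is a minimal sequential (one point at a time) k-means sketch in plain Python. It is a simplified stand-in, not Spark's implementation and not the prototype's code: each arriving point moves its nearest center toward it by a step of 1/count. The class name, the initial centers and the sample heart-rate-like values are assumptions.

```python
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means: update the nearest center as each point arrives."""

    def __init__(self, initial_centers: Sequence[Sequence[float]]) -> None:
        self.centers: List[List[float]] = [list(c) for c in initial_centers]
        self.counts: List[int] = [0] * len(self.centers)

    def _nearest(self, point: Sequence[float]) -> int:
        def sq_dist(center: Sequence[float]) -> float:
            return sum((p - c) ** 2 for p, c in zip(point, center))
        return min(range(len(self.centers)),
                   key=lambda i: sq_dist(self.centers[i]))

    def update(self, point: Sequence[float]) -> int:
        """Assign the point to its nearest center, move that center, return its index."""
        i = self._nearest(point)
        self.counts[i] += 1
        # Step size 1/count makes each center the running mean of its points;
        # the first assigned point therefore replaces the initial guess.
        rate = 1.0 / self.counts[i]
        self.centers[i] = [c + rate * (p - c)
                           for c, p in zip(self.centers[i], point)]
        return i


model = OnlineKMeans(initial_centers=[[60.0], [100.0]])
stream = [[62.0], [64.0], [98.0], [102.0], [63.0]]
labels = [model.update(point) for point in stream]
print(labels, model.centers)
```

A drift in a center over time, or a point far from every center, is the kind of "change in the data" such a model can surface while the stream is still arriving.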
24
The Nitty gritty continued
• Once installed, the custom “streaming_index” index type will be available for use.
• Running the “create index” command and specifying the “streaming_index” index type will run the code in the custom UDRs that push the data via MQTT.
• Then, whenever you run an INSERT statement that writes to the columns you created the streaming index on, the inserted data will automatically be published to an MQTT broker.
• See the “IBM Informix Virtual-Index Interface Programmer's Guide” for more details.
25
In-depth
• Does the prototype work for temp tables?
– No specific index-related restrictions apply to temp tables
• Do we lock the tables?
– The VII will lengthen the amount of time a lock is held
– Future item: multiple concurrent writers to a per-table queue, flushed asynchronously by a separate thread
• Would this work for multi-nodes (sharding)?
– The current prototype delegates this to Spark, where multiple input streams could be merged into one
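The future item (concurrent writers feeding a queue that a separate thread flushes) follows a common producer/consumer pattern. The sketch below shows that pattern in Python with the standard `queue` and `threading` modules; it is not how the prototype is written, and `AsyncPublisher` and the `row-N` messages are hypothetical.

```python
import queue
import threading
from typing import Callable, List


class AsyncPublisher:
    """Writers enqueue messages; one background thread sends them."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._queue = queue.Queue()
        self._send = send
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def enqueue(self, message: str) -> None:
        """Called by writers; returns without waiting for the send."""
        self._queue.put(message)

    def _drain(self) -> None:
        while True:
            message = self._queue.get()
            if message is None:  # sentinel: stop after everything queued so far
                break
            self._send(message)

    def close(self) -> None:
        """Flush remaining messages and stop the background thread."""
        self._queue.put(None)
        self._worker.join()


sent: List[str] = []
publisher = AsyncPublisher(sent.append)

# Four concurrent writers, standing in for concurrent INSERT statements.
writers = [threading.Thread(target=publisher.enqueue, args=(f"row-{i}",))
           for i in range(4)]
for w in writers:
    w.start()
for w in writers:
    w.join()
publisher.close()
print(sorted(sent))
```

With this shape a writer holds its lock only for the time it takes to enqueue, not for the network round trip; the trade-off is that delivery order across writers is not guaranteed.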
26
In-depth
• Installs in seconds
• No need to upgrade the database
• No need to restart the database server
• Can be installed and activated on a live production database!
• Minimal interference with the existing business application
27
Informix to Spark
Demo
Heart To Spark
• Demonstration of real-time streaming of data from the Informix engine into a message broker for digestion by one or more services
• Simulates IoT data from a heart rate monitor
• Watches for trends in heart rates
– Poor health or stress can cause a measurable rise in baseline heart rate
• Uses Spark analytics to determine baseline heart rates and plots the trend (heart rate rising, steady, or falling)
• Graphing tools in the browser show us a view of the data
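One way to label a series of baseline heart rates as rising, steady, or falling is to fit a straight line and look at its slope. The sketch below does that with an ordinary least-squares slope in plain Python; the demo's actual method is not described in the slides, so the function names and the 0.5 bpm-per-sample tolerance are assumptions.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...).

    Requires at least two values.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_trend(baselines: Sequence[float], tolerance: float = 0.5) -> str:
    """Label a series of baseline heart rates by the sign of its slope.

    Slopes within +/- tolerance (bpm per sample) count as steady.
    """
    if len(baselines) < 2:
        return "steady"
    s = slope(baselines)
    if s > tolerance:
        return "rising"
    if s < -tolerance:
        return "falling"
    return "steady"


print(classify_trend([60, 62, 64, 66]))
print(classify_trend([70, 70, 71, 70]))
print(classify_trend([80, 77, 75, 72]))
```

The tolerance keeps small fluctuations from being reported as a trend; a suitable value would depend on the sampling interval and the sensor.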
30
Demo - Installation
Overview
[Figure: Heart Monitor → Informix → Message Broker → Apache Spark Analytics → Display Results]
• IoT devices send data into the Informix server
• Data streams from Informix into an MQTT broker
• From MQTT, data is streamed into Spark for real-time analysis
• Results from both Informix and Spark are available to the end user
32
Not limited to Apache Spark
Can be used by any application/platform that can consume TCP socket data.
• IBM InfoSphere Streams
• Apache Storm
• Custom applications (most programming languages have MQTT libraries)
• Many, many others.
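As an illustration of a custom application consuming socket data, the sketch below reads newline-delimited messages from a TCP-style socket using only the Python standard library. The newline framing is an assumption made for the example; MQTT defines its own framing, and a real consumer would normally use an MQTT client library instead. `read_lines` and the sample payload are hypothetical, and `socket.socketpair()` stands in for a network connection.

```python
import socket
from typing import Iterator


def read_lines(conn: socket.socket) -> Iterator[str]:
    """Yield newline-delimited messages until the peer closes the connection."""
    buffer = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buffer += chunk
        # A chunk may hold several messages or end in the middle of one.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
    if buffer:
        yield buffer.decode("utf-8")


# Connected socket pair standing in for producer and consumer endpoints.
producer, consumer = socket.socketpair()
producer.sendall(b"72\n75\n7")  # last message is split across two sends
producer.sendall(b"1\n")
producer.close()

messages = list(read_lines(consumer))
consumer.close()
print(messages)
```

Buffering until a delimiter arrives matters because a stream socket delivers bytes, not messages, so one reading can arrive in pieces.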
33
Informix to Spark
What’s next?
34
Endless possibilities
• Check out Apache Spark for more information about analytics and machine learning: http://spark.apache.org/
• Learn more about machine learning and its potential: https://www.coursera.org/learn/machine-learning
• Contact IBM Informix
35
Questions?
Pradeep Natarajan
@pradeepnatara