Post on 13-Jun-2015
Capturing & Analyzing High-Velocity, High-Volume Machine Data
Jason Lobel, CEO (@jasonlobel)
December 3, 2013
Data & Machines
[Chart: connected endpoints ("things"), Internet of Things (IoT) and Internet of Everything (IoE); figures shown: 12.5B and 50B]
Data is
Machine readable (API)
Accessible on-demand
Possibly even open (Public)
Includes non-machine-generated or non-streaming data (catalogs, locations, historical data, etc.)
Primarily sensor-based
Collect > Unify > Transform > Report > Predict
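The five stages can be sketched as a chain of plain functions. This is a minimal illustration, not SwiftIQ's implementation: the reading fields (`sensor_id`, `temp_c`), the hypothetical sensor-to-store lookup used for the Unify step, and the naive "predict" step (assume the next period looks like this one) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor-to-store lookup: non-streaming data to unify with readings
SENSOR_TO_STORE = {"s1": "store-a", "s2": "store-b"}

def collect():
    """Collect: stand-in for pulling raw readings from a stream or API."""
    return [
        {"sensor_id": "s1", "temp_c": 0.0},
        {"sensor_id": "s1", "temp_c": 10.0},
        {"sensor_id": "s2", "temp_c": 20.0},
    ]

def unify(readings, sensor_to_store):
    """Unify: attach store context to each reading via a dictionary lookup."""
    return [
        {**r, "store_id": sensor_to_store.get(r["sensor_id"], "unknown")}
        for r in readings
    ]

def transform(readings):
    """Transform: derive a Fahrenheit field from the Celsius reading."""
    return [{**r, "temp_f": r["temp_c"] * 9 / 5 + 32} for r in readings]

def report(readings):
    """Report: average Fahrenheit temperature per store."""
    by_store = defaultdict(list)
    for r in readings:
        by_store[r["store_id"]].append(r["temp_f"])
    return {store: mean(temps) for store, temps in by_store.items()}

def predict(summary):
    """Predict: naive persistence forecast (next period equals this period)."""
    return dict(summary)

def run_pipeline():
    return predict(report(transform(unify(collect(), SENSOR_TO_STORE))))

print(run_pipeline())  # {'store-a': 41.0, 'store-b': 68.0}
```

Keeping each stage a separate function means any one of them (for example, swapping the naive forecast for a real model) can change without touching the others.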
Capturing Streaming Data – Considerations
Backend Architecture: Why Important
NoSQL datastore: long-term scale with data volume; no joins for queries in reporting
Auto-scaling cloud hosting (AppEngine, AWS): high availability; ideal for unpredictable demand; less time spent on server tuning
Enable REST APIs: writeable and retrievable; JSON over XML; APIs for history, real-time, query (SQL), and even predictive; enable JavaScript & mobile applications
Real-time data: power dashboards or visualizations; track how data is consumed; unify with other sources
OAuth 2.0: security; API management; multi-party (internet/external) access
Dedicated caching: faster data retrieval
Smart storage / backend setup is a key catalyst for downstream analysis
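The NoSQL and dedicated-caching considerations can be illustrated with a small in-memory sketch, assuming a Python dict stands in for a key-value/document store. Each transaction is stored denormalized (store and line-item details embedded in one document), so a reporting read needs no join, and a read-through cache with a time-to-live sits in front of the store. The class names, keys, and fields are hypothetical; a production system would use an actual NoSQL database and a cache such as Memcached or Redis.

```python
import time

class DocumentStore:
    """Stand-in for a NoSQL key-value/document store (a dict here)."""
    def __init__(self):
        self._docs = {}
        self.reads = 0  # count backend reads so cache hits are observable

    def put(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        self.reads += 1
        return self._docs.get(key)

class TTLCache:
    """Read-through cache: serve recent results without hitting the store."""
    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self.store = store
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = self.store.get(key)  # miss or expired: read the store
        self._entries[key] = (now + self.ttl, value)
        return value

store = DocumentStore()
# Denormalized document: store and item details are embedded, so no join is needed.
store.put("txn:1001", {
    "store": {"id": "store-a", "city": "Chicago"},
    "cashier_id": "c7",
    "items": [{"sku": "diapers", "qty": 1}, {"sku": "beer", "qty": 6}],
})

cache = TTLCache(store, ttl_seconds=30.0)
first = cache.get("txn:1001")   # miss: reads from the store
second = cache.get("txn:1001")  # hit: served from the cache
print(store.reads)  # 1
```

The trade-off of denormalizing is duplicated data (the store's city is repeated in every transaction) in exchange for single-key reads that scale with volume.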
APIs Fuel Any Channel & Big Data Analytics
Public vs. Private: an estimated 10x more private APIs than public ones
Open: Gartner predicts 75% of the Fortune 500 will have open APIs by 2014
Competition: by 2015, APIs will be the default, as websites were in 2000 (Kin Lane, former White House Fellow)
[Chart: Growth in Public APIs]
Unify IoT Data with Other Sources
APIs Fuel Interactive Visualizations
D3.js (d3js.org): a JavaScript library for manipulating documents using HTML, SVG and CSS
APIs => Programmable => Smart Controls
Make Apps Smarter with Machine Learning
Recommendation: analyzes users' preferences and finds items users might like
Frequent Pattern Mining: discovers items that frequently occur together in a transaction list
Classification: learns from existing categorized data and assigns a category to uncategorized data
Clustering: organizes items from a large volume of data into groups of items with similar features
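Frequent pattern mining, in its simplest form, counts how many transactions contain each pair of items and keeps the pairs whose support (the fraction of transactions containing the pair) meets a threshold. The sketch below handles item pairs only; algorithms such as Apriori or FP-Growth extend the idea to larger itemsets efficiently. The baskets and the 0.5 threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for pairs meeting min_support."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each unordered pair is counted once per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "milk"],
    ["beer", "chips"],
]
print(frequent_pairs(baskets, min_support=0.5))
# {('beer', 'chips'): 0.5, ('beer', 'diapers'): 0.5}
```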
Machine Learning Algorithm APIs?
Hard (without an ML API)
Human: finding a data scientist
Technical: database selection; algorithm selection; model training & iteration; embedding predictions into applications; security; query speed / caching; scaling; on-demand access
Easier (with an ML API)
Human: finding an engineer who can use an API; training (if needed)
Technical: (none listed)
Common ML Applications for Retail
Item Recommendation: observes what the user likes and finds similar items ("I like the Chicago Bulls, so I may like the Chicago Bears")
User Recommendation: finds similar users and recommends what they like (e.g., Kin and I are friends. He likes IPAs, so I may like IPAs)
Item/Action Affinity: if a user wants X, what else (Y) is that user likely to want, based on the relationship between X and Y (men who buy diapers also buy beer)
Predict Inventory: based on history, predict future sales (next 7 days, 30 days, etc.)
Discover Customer Segments: examine purchasing habits to identify clusters of shopper segments
Prevent Fraud: identify anomalies in cashier activity, such as voids (is this likely fraud? yes/no)
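For the "Prevent Fraud" case, one simple way to flag unusual cashier activity is a robust outlier score: compare each cashier's void rate with the median across cashiers, scaled by the median absolute deviation (MAD). This modified z-score is a swapped-in textbook technique, not necessarily what SwiftIQ used; the cashier IDs, the rates, and the 3.5 cutoff (a common rule of thumb for this score) are illustrative. A flag means "worth reviewing," not proof of fraud.

```python
from statistics import median

def flag_void_outliers(void_rates, cutoff=3.5):
    """Return cashier IDs whose modified z-score for voids is above cutoff."""
    values = list(void_rates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    flagged = []
    for cashier, rate in void_rates.items():
        score = 0.6745 * (rate - med) / mad
        if score > cutoff:  # one-sided: only unusually HIGH void rates
            flagged.append(cashier)
    return flagged

# Hypothetical voids per 100 transactions for each cashier
rates = {"c1": 1.0, "c2": 1.2, "c3": 0.9, "c4": 1.1, "c5": 6.0}
print(flag_void_outliers(rates))  # ['c5']
```

The median and MAD are used instead of the mean and standard deviation because a single extreme cashier would otherwise inflate the spread and partly hide itself.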
What We Do with Streaming Data
Focus: at least one massive data source can be transformed into many insights that were not possible before, at a fraction of the cost of legacy tools
Supermarkets: point-of-sale data, product catalog, sensors, etc.
eCommerce: web behavior, point-of-sale data, product catalog, etc.
Supermarket / C-Store
Before SwiftIQ: unable to store POS order and cashier history
After SwiftIQ:
Detailed transaction history available on demand
Able to pursue real-time supply chain initiatives
Now can analyze product affinity to plan merchandising strategies, promotions and optimize localization
Capable of visualizing data or generating interactive reports
Able to better predict inventory requirements
Able to better optimize hiring
Identify cashier fraud
Retail / eCommerce
Before SwiftIQ: unable to unify disparate data (POS, web, mobile, CRM); unlikely to store web behavior
After SwiftIQ:
Enable relevant, personalized digital experiences
Know specific customer segments instead of relying on intuition
Analyze product affinity to plan merchandising strategies, promotions and optimize localization
Capable of visualizing data or generating interactive reports
Able to better predict inventory requirements