Internet of Things Chicago - Meetup

Capturing & Analyzing High Velocity High Volume Machine Data Jason Lobel CEO @jasonlobel December 3, 2013


Capturing and Analyzing High Volume, High Velocity Machine Generated Data

Transcript of Internet of Things Chicago - Meetup

Page 1: Internet of Things Chicago - Meetup

Capturing & Analyzing

High Velocity High Volume

Machine Data

Jason Lobel CEO @jasonlobel

December 3, 2013

Page 2: Internet of Things Chicago - Meetup

Internet of Endpoints / “THINGS” (IOT) / Everything (IOE)

Data & Machines

50B

12.5B

  Data is

  Machine readable (API)

  Accessible on-demand

  Possibly even open (Public)

  Includes non-machine generated data or streaming data (catalogs, locations, historical data, etc.)

  Primarily sensor-based

Page 3: Internet of Things Chicago - Meetup

Collect > Unify > Transform > Report > Predict
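The five stages above can be read as a chain of functions. What follows is a minimal Python sketch of that idea, not the speaker's implementation: the readings, the location catalog, and every function name are hypothetical, and a naive "last observed value" forecast stands in for the Predict stage.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs: raw sensor readings and a static catalog to unify with.
READINGS = [
    {"sensor": "s1", "temp_f": 68.0},
    {"sensor": "s1", "temp_f": 86.0},
    {"sensor": "s2", "temp_f": 32.0},
    {"sensor": "s2", "temp_f": 50.0},
]
CATALOG = {"s1": "store floor", "s2": "cooler"}


def collect():
    """Collect: a real system would read from a stream or an API instead."""
    return list(READINGS)


def unify(readings, catalog):
    """Unify: attach non-machine data (a location) to each reading."""
    return [dict(r, location=catalog.get(r["sensor"], "unknown")) for r in readings]


def transform(rows):
    """Transform: convert Fahrenheit to Celsius."""
    return [dict(r, temp_c=(r["temp_f"] - 32) * 5 / 9) for r in rows]


def report(rows):
    """Report: average temperature per location."""
    by_location = defaultdict(list)
    for r in rows:
        by_location[r["location"]].append(r["temp_c"])
    return {loc: mean(vals) for loc, vals in by_location.items()}


def predict(rows):
    """Predict: naive forecast, next value = last observed value per location."""
    last = {}
    for r in rows:
        last[r["location"]] = r["temp_c"]
    return last


rows = transform(unify(collect(), CATALOG))
summary = report(rows)
forecast = predict(rows)
```

Each stage takes plain dictionaries and returns new ones, so any stage can be swapped out (for example, a trained model in place of `predict`) without touching the others.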

Page 4: Internet of Things Chicago - Meetup

Capturing Streaming Data – Considerations

Backend Architecture | Why Important

  NoSQL datastore | Long-term scale with data volume; no joins for queries in reporting

  Auto scaling cloud hosting (AppEngine, AWS) | High availability; ideal for unpredictable demand; spend less time on server tuning

  Enable REST APIs (writeable and retrievable; JSON over XML; APIs for history, real-time, query (SQL), and even predictive) | Enable JavaScript & mobile applications; real-time data; power dashboards or visualizations; tracking how data is consumed; unify with other sources

  OAuth2.0 Security | API management; multi-party (internet/external) access

  Dedicated caching | Faster data retrieval speed

Smart storage / backend setup is a key catalyst for downstream analysis
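To make a few of these considerations concrete, here is a small Python sketch of the pattern, under stated assumptions: an in-memory dictionary stands in for the NoSQL datastore, the `ReadingStore` class and the `pos-17` device ID are hypothetical, and OAuth 2.0 and the HTTP layer are left out entirely. It shows JSON documents written and retrieved by key (no joins) with a cache in front of reads.

```python
import json


class ReadingStore:
    """Toy in-memory stand-in for a NoSQL datastore fronted by a cache."""

    def __init__(self):
        self._docs = {}    # device_id -> list of JSON strings (denormalized, no joins)
        self._cache = {}   # device_id -> decoded history
        self.cache_hits = 0

    def write(self, device_id, reading):
        """Handle a write (e.g. the body of a POST): store the reading as JSON."""
        self._docs.setdefault(device_id, []).append(json.dumps(reading))
        self._cache.pop(device_id, None)  # invalidate any stale cached history

    def history(self, device_id):
        """Handle a read (e.g. a GET): return the device history as a JSON string."""
        if device_id in self._cache:
            self.cache_hits += 1
        else:
            stored = self._docs.get(device_id, [])
            self._cache[device_id] = [json.loads(doc) for doc in stored]
        return json.dumps(self._cache[device_id])


store = ReadingStore()
# Hypothetical point-of-sale readings.
store.write("pos-17", {"ts": 1386090000, "total": 12.5})
store.write("pos-17", {"ts": 1386090060, "total": 7.25})
first = store.history("pos-17")   # read from the datastore, then cached
second = store.history("pos-17")  # served from the cache
```

Invalidating the cache on every write is the simplest policy that keeps reads correct; a production system would more likely use a dedicated cache with expiry.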

Page 5: Internet of Things Chicago - Meetup

APIs Fuel Any Channel & Big Data Analytics

  Public vs. Private: Estimate 10x more private APIs

  Open: Gartner predicts 75% of the Fortune 500 will have open APIs by 2014

  Competition: By 2015, APIs will be default, like websites in 2000 (Kin Lane, ex White House Fellow)

Growth In Public APIs

Page 6: Internet of Things Chicago - Meetup

Unify IOT Data with Other Sources

Page 7: Internet of Things Chicago - Meetup

APIs Fuel Interactive Visualizations

D3.js (d3js.org)

  JavaScript library for manipulating documents using HTML, SVG and CSS

Page 8: Internet of Things Chicago - Meetup

APIs => Programmable => Smart Controls

Page 9: Internet of Things Chicago - Meetup

Make Apps Smarter with Machine Learning

  Recommendation: Analyzes users' preferences and finds items users might like

  Frequent Pattern Mining: Discovers unique frequently co-occurring items in a transaction list

  Classification: Learns from existing categorized data and assigns a category to uncategorized data

  Clustering: Organizes items from a large volume of data into groups of similar items and features
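As an illustration of one of these techniques, frequent pattern mining, the sketch below counts item pairs that co-occur across transactions. It is a brute-force pair count, not Apriori or FP-Growth, and the transaction data and the `frequent_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction list; each transaction is a set of purchased items.
TRANSACTIONS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"beer", "chips"},
    {"diapers", "wipes"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least `min_support` transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) match.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(TRANSACTIONS, min_support=2)
```

Counting every pair grows quadratically with basket size, which is why dedicated algorithms prune candidates on large transaction lists.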

Page 10: Internet of Things Chicago - Meetup

Machine Learning Algorithm APIs?

Hard

Human

  Finding a data scientist

Technical

  Database selection

  Algorithm(s) selection

  Model training & iteration

  Embedding predictions into applications

  Security

  Query speed / caching

  Scaling

  On-Demand Access

Eas{ier}

Human

  Finding an engineer that can use an API

  Training (if needed)

Technical

Page 11: Internet of Things Chicago - Meetup

Common ML Applications for Retail

  Item Recommendation: observes what the user likes and finds similar items (“I like the Chicago Bulls, I may like the Chicago Bears”)

  User Recommendation: finds similar users and recommends items they like (e.g., Kin and I are friends. He likes IPAs. I may like IPAs)

  Item/Action Affinity: if a user wants X, what else (Y) is that user likely to want, based on the relationship between X and Y (men who buy diapers also buy beer)

  Predict Inventory: based on history, predict future sales (next 7, 30 days, etc.)

  Discover Customer Segments: examine purchasing habits to identify clusters of shopper segments

  Prevent Fraud: identify anomalies in cashier activity, such as voids (is this likely fraud? yes/no)
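The fraud item above amounts to a yes/no anomaly check. One simple way to sketch it, assuming hypothetical weekly void counts per cashier and a z-score threshold chosen only for illustration, is to flag cashiers whose count sits well above the group mean. This is a stand-in for whatever model a production system would use.

```python
from statistics import mean, pstdev

# Hypothetical void counts per cashier for one week.
VOIDS = {"ann": 3, "bob": 4, "cara": 2, "dev": 3, "eli": 15}


def flag_anomalies(counts, threshold=1.5):
    """Return keys whose count is more than `threshold` standard deviations above the mean."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all counts identical, so nothing stands out
    return [key for key, value in counts.items() if (value - mu) / sigma > threshold]


suspects = flag_anomalies(VOIDS)
```

A flagged cashier is a candidate for review rather than proof of fraud; with only a handful of cashiers, a single outlier also inflates the standard deviation, so the threshold has to stay low.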

Page 12: Internet of Things Chicago - Meetup

What We Do with Streaming Data

Focus = at least one massive data source can be transformed into many insights that were not possible before at a fraction of the cost of legacy tools

  Supermarkets: point-of-sale data, product catalog, sensors, etc.

  eCommerce: web behavior, point-of-sale data, product catalog, etc.

Supermarket / C-Store

Before SwiftIQ

  Unable to store POS order and cashier history

After SwiftIQ

  Detailed transaction history available on-demand

  Able to pursue real-time supply chain initiatives

  Now can analyze product affinity to plan merchandising strategies, promotions and optimize localization

  Capable of visualizing data or generating interactive reports

  Able to better predict inventory requirements

  Better optimize hiring

  Identify cashier fraud

Retail/eCommerce

Before SwiftIQ

  Unable to unify disparate data (POS, web, mobile, CRM)

  Unlikely to store web behavior

After SwiftIQ

  Enable relevant, personalized digital experiences

  Know specific customer segments vs. using intuition

  Analyze product affinity to plan merchandising strategies, promotions and optimize localization

  Capable of visualizing data or generating interactive reports

  Able to better predict inventory requirements