Real time machine learning

Transcript of Real time machine learning

Page 1: Real time machine learning

Real-time Machine Learning

Vinoth Kannan

Intelligent software architecture using Modified Lambda architecture & Apache Mahout

SkillFactory 71

[email protected]

Page 2: Real time machine learning

Agenda

• What is Machine Learning?
• Need for Real Time Machine Learning
• What is Lambda architecture?
• What is Mahout?
• How does a basic recommender engine work?
• Some Use Cases

Page 3: Real time machine learning

What is machine learning?

Page 4: Real time machine learning

Introduction: Machine Learning from Streaming Data

Machine Learning

• Model that considers recent history: it has been sunny and 30 degrees for the last two days, so it is unlikely to be -10 degrees and snowing the next day
• Model that is updatable: a retail sales model that remains accurate as the business gets larger

Don't they both mean the same thing?

Page 5: Real time machine learning

Introduction: Machine Learning from Streaming Data

• Time-series prediction (e.g. weather): model that considers recent history
• Non-stationary data distributions (e.g. retail sales): model that is updatable

Page 6: Real time machine learning

Introduction: Machine Learning from non-stationary data distributions

Two approaches to non-stationary data distributions:

• Incremental algorithms: machine learning algorithms that learn incrementally over the data.
• Batch algorithms: machine learning algorithms that are re-trained periodically on a batch of data.
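The difference between the two approaches can be sketched with the simplest possible "model", a mean. This is a minimal illustration only: the class and function names and the sales numbers are made up for the example and are not from the talk.

```python
from typing import Iterable, List


class IncrementalMean:
    """Incremental learner: updates its estimate one observation at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Running-mean update: old observations never need to be rescanned.
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(history: Iterable[float]) -> float:
    """Batch learner: recomputes the estimate from the full stored history."""
    values: List[float] = list(history)
    return sum(values) / len(values)


daily_sales = [100.0, 120.0, 90.0, 150.0]  # made-up numbers

model = IncrementalMean()
for sale in daily_sales:
    model.update(sale)  # the model is usable after every single update

# Both approaches agree once the batch job has seen the same data.
assert abs(model.mean - batch_mean(daily_sales)) < 1e-9
```

The incremental version always has a current answer but must be expressible as an update rule; the batch version can use any algorithm but is only as fresh as its last run.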

Page 7: Real time machine learning

Introduction: The Challenge for the Best Big Data Technology

• Hadoop: batch processing system that can churn through huge volumes of data
• Storm: real-time complex event processing system that can process data streams

Page 8: Real time machine learning

Wrong Fight !!!

Page 9: Real time machine learning

Hadoop + Storm = Real-time Big Data

It's a chance, not a challenge

Lambda Architecture!

Page 10: Real time machine learning

Lambda Architecture: Overview

• Speed layer
• Serving layer
• Batch layer

Page 11: Real time machine learning

Lambda Architecture: Overview with description

Speed layer
• Only new data
• Compensates for high latency of serving layer updates
• Batch layer overrides speed layer

Serving layer
• Loads and exposes the batch views for querying
• Random access to batch views

Batch layer
• Immutable, constantly growing datasets
• Batch views are computed from this raw dataset
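How the three layers cooperate can be sketched in a few lines. This is a single-process toy, not Hadoop or Storm: the `LambdaCounter` class, the event shape and the page-view counting task are all assumptions made for the illustration.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, int]  # (page, views): hypothetical event shape


class LambdaCounter:
    def __init__(self) -> None:
        self.master: List[Event] = []  # batch layer: append-only raw dataset
        self.batch_view = Counter()    # serving layer: precomputed batch view
        self.speed_view = Counter()    # speed layer: only data newer than the batch view

    def ingest(self, event: Event) -> None:
        page, views = event
        self.master.append(event)       # raw data is only ever appended
        self.speed_view[page] += views  # new data is visible immediately

    def run_batch(self) -> None:
        """Recompute the batch view from all raw data, then discard the speed view."""
        view = Counter()
        for page, views in self.master:
            view[page] += views
        self.batch_view = view
        self.speed_view = Counter()     # batch layer overrides speed layer

    def query(self, page: str) -> int:
        # A query combines the batch view with the speed layer's recent data.
        return self.batch_view[page] + self.speed_view[page]


lam = LambdaCounter()
lam.ingest(("home", 1))
lam.ingest(("home", 1))
lam.run_batch()           # batch view now covers both events
lam.ingest(("home", 1))   # only in the speed layer until the next batch run
print(lam.query("home"))  # prints 3: batch (2) + speed (1)
```

In a real deployment the batch run takes time and events keep arriving while it runs, which is exactly where the synchronization questions of the architecture come from; this sketch sidesteps them by being single-threaded.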

Page 12: Real time machine learning

Basic Idea behind Lambda architecture

query = function(all data)

- Nathan Marz, "Big Data: Principles and best practices of scalable realtime data systems"

Page 13: Real time machine learning

Basic Idea behind Lambda

$f(a_0 \dots a_n)$: perform some function over all the data, from the real-time data "0" to the historical data "n"

Lambda Architecture: Real Time Big Data = Hadoop Process + Storm Process

Letting Hadoop process the historical data makes the overall process faster.

Page 14: Real time machine learning

The Problem

Real Time Big Data = Batch Process + Real-time

Questions to be answered (unanswered questions of Lambda architecture):

• How to define the boundary between the real-time and batch processes?
• How to synchronize the computation between the two systems?
• How to avoid gaps and overlaps?
• What algorithm to use?
• How to avoid failure and provide a fault-tolerance mechanism?
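One possible answer to the boundary question, offered here as an assumption rather than as the approach taken in the talk, is a single timestamp watermark: everything at or before it belongs to the batch process, everything after it to the real-time process. The event shape and function name below are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, item): hypothetical event shape


def split_at_watermark(
    events: List[Event], watermark: int
) -> Tuple[List[Event], List[Event]]:
    """Assign each event to exactly one side of the boundary."""
    batch = [e for e in events if e[0] <= watermark]    # history, batch process
    realtime = [e for e in events if e[0] > watermark]  # recent, real-time process
    return batch, realtime


events = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")]
batch, realtime = split_at_watermark(events, watermark=3)

# No gaps: every event lands on one side. No overlaps: never on both.
assert len(batch) + len(realtime) == len(events)
assert not set(batch) & set(realtime)
```

Because the two conditions are complementary, moving the watermark forward after each batch run shifts events from one side to the other without ever counting one twice or dropping it.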

Page 15: Real time machine learning

Modified Lambda Architecture: Presentation Layer

• The presentation layer must aggregate the outputs of Storm and Hadoop
• The user will see the result of their events in less than 2 seconds
• Seamless merge between short-term and long-term data

Page 16: Real time machine learning

Machine Learning with Mahout

Page 17: Real time machine learning

What is Mahout? Introduction

• Apache Software Foundation Java library
• Scalable "machine learning" library that runs mostly on Hadoop
• Currently Mahout supports mainly four use cases: recommendation, clustering, classification, and frequent itemset mining
• Core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm

Page 18: Real time machine learning

Basic Recommender Algorithm: How it works

Today's focus: suggesting items to a user based on their current search

Page 19: Real time machine learning

Basic Recommender Algorithm: Defining recommendation

Mahout implements a collaborative filtering framework

Two broad categories of recommender engine algorithms:

• User-based: recommends items by finding similar users. Harder to scale because of the dynamic nature of users.
• Item-based: calculates similarity between items and makes recommendations. Items usually don't change much, so the similarities can be calculated offline.
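The item-based idea can be sketched in plain Python. This is not Mahout's API: it uses simple co-occurrence counts as the item similarity, and the users, items and function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical preferences: user -> set of liked items.
likes: Dict[str, Set[str]] = {
    "alice": {"tent", "stove", "boots"},
    "bob": {"tent", "stove"},
    "carol": {"stove", "boots"},
    "dave": {"tent"},
}


def item_cooccurrence(prefs: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Offline step: count how often each pair of items is liked together."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(
    user: str,
    prefs: Dict[str, Set[str]],
    counts: Dict[str, Dict[str, int]],
    top_n: int = 3,
) -> List[str]:
    """Online step: score unseen items by co-occurrence with the user's items."""
    seen = prefs[user]
    scores: Dict[str, int] = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


similarities = item_cooccurrence(likes)           # can be precomputed offline
print(recommend("dave", likes, similarities))     # prints ['stove', 'boots']
```

The expensive pairwise counting depends only on the items and historical likes, so it fits a batch job; the per-user lookup is cheap enough to run at request time.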

Page 20: Real time machine learning

Basic Recommender Algorithm: Defining recommendation

User preference for an item:

• Likes something
• Doesn't like something
• Doesn't care

Safe to assume: 1 click = 1 like = uniform preference
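With uniform preferences, each user reduces to a set of liked items, and similarity can be measured on sets alone. A minimal sketch, assuming a click log of (user, item) pairs and using the Tanimoto (Jaccard) coefficient, a common measure for such boolean data; the function names and the sample log are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def preferences_from_clicks(clicks: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """1 click = 1 like: repeated clicks on the same item add no extra weight."""
    prefs: Dict[str, Set[str]] = {}
    for user, item in clicks:
        prefs.setdefault(user, set()).add(item)
    return prefs


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Shared likes divided by all items liked by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


click_log = [
    ("u1", "tent"), ("u1", "tent"), ("u1", "stove"),  # duplicate click collapses
    ("u2", "stove"), ("u2", "boots"),
]
prefs = preferences_from_clicks(click_log)
similarity = tanimoto(prefs["u1"], prefs["u2"])  # 1 shared item out of 3 distinct
```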

Page 21: Real time machine learning

Mahout Library of Algorithms: Lots of algorithms to choose from

Page 22: Real time machine learning

Use Cases: Real Time Machine Learning

eCommerce

Objective: Increase sales revenue

• Match potential customers to the right product
• Personalise user experience on web and email
• Customer lifecycle management

Page 23: Real time machine learning

Use Cases: Real Time Machine Learning

Financial Services

Objective: Real Time Fraud Detection

• Compute patterns/predictors for individual customers
• Classify and cluster customers and recalculate patterns and predictors
• Set threshold across all data
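One way these three steps could fit together is sketched below. It is an illustration under stated assumptions, not the method from the talk: the per-customer "pattern" is just a running mean and variance of transaction amounts (Welford's algorithm), the clustering step is omitted, and the threshold value of 3.0 is arbitrary.

```python
import math
from collections import defaultdict


class CustomerProfile:
    """Running mean/variance of one customer's amounts (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, amount: float) -> None:
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def zscore(self, amount: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(amount - self.mean) / std if std > 0 else 0.0


THRESHOLD = 3.0  # single threshold across all customers (assumed value)
profiles = defaultdict(CustomerProfile)


def score_transaction(customer: str, amount: float) -> bool:
    """Flag an amount far from this customer's history, then learn from it."""
    profile = profiles[customer]
    suspicious = profile.zscore(amount) > THRESHOLD
    profile.update(amount)  # the pattern is recalculated with every event
    return suspicious


for amount in [20, 22, 19, 21, 20]:
    score_transaction("c1", amount)      # builds up the customer's profile
flagged = score_transaction("c1", 500)   # far outside the usual range
```

A production system would probably not feed flagged transactions back into the profile unchecked, since a run of fraudulent amounts would then shift the customer's baseline.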

Page 24: Real time machine learning

Use Cases: Real Time Machine Learning

Media

Objective: Generating Meta Data

• Video/Audio/Text analysis
• Find patterns/clusters for people, places, products, things

Page 25: Real time machine learning

Use Cases: Real Time Machine Learning

Carbookplus

Objective: Generating Meta Data

• Match potential trips to the right destination
• Recommend the best gas station
• Recommend contacts whom the user might know
• Match the right advertisers to customers based on vehicle needs

Page 26: Real time machine learning

Summary

• Ability to create real time systems based on lambda architecture
• Usefulness of predictive algorithms
• Reason to concentrate on real time predictions

Further reading

• http://storm-project.net/
• http://mahout.apache.org/
• http://hadoop.apache.org/

Page 27: Real time machine learning

Thank You