A Public Cloud Based SOA Workflow for Machine Learning Based Recommendation Algorithms

Post on 12-Apr-2017


Presented By Srinivasan Thanukrishnan, Founder & CEO

Ram G Athreya, Research Intern

Glosys Technology Solutions Pvt. Ltd.

Chennai, India

5th IEEE International Conference on Cloud & Service Computing – SC2 2015

Outline

• Challenges

• Motivation

• Proposed System

• Design Components

• Experimental Results

• Conclusion

• Future Work

Challenges

• Existing workflows for web-based (SOA) applications cover only individual aspects, such as

– Cloud Computing

– Backend

– Frontend

– Development & Testing

– Machine Learning

• How do all these work together at a big-picture level?

Motivation

• Our aim is to combine these vast and disparate fields and provide a cohesive framework for building such applications

• We propose a multi-layered architecture that simulates the frontend and backend structure of a cloud environment, and we examine how it can leverage machine learning

• For completeness, we built a working retail application that integrates the above technologies

Proposed System

• It comprises three major modules:

– Product Information System (PIS)

– Analytics Based Inventory Management (ABIM)

– Transaction Based Analytics (TBA)

• To build a framework that simulates the architectural setup of an e-commerce site

• To examine how such a site can improve its sales by employing intelligence

• To derive a general workflow for building such systems end-to-end, from the user interface to the machine learning algorithm that powers the backend

Design Components

• Cloud Architecture

• Back End Application Stack

• User Interface Design

• Development Environment

• Load Testing

• Machine Learning Based Recommendation Algorithms

Cloud Architecture

• Core components of the cloud architecture are

– Content Delivery Network (CDN)

– Load Balancer

– Server Instances

– Storage Services

Content Delivery Network (CDN)

• It is a large distributed network of servers across geographies

• It serves static assets such as images, CSS and JS files

• The CDN caches responses, so repeated requests are answered without reaching the origin server

• This reduces the load on the origin server

Load Balancers

• They optimize resource use, maximize throughput and minimize response time

• They employ round-robin or least-connections algorithms to route incoming traffic across server instances

• They support auto-scaling: during a traffic spike, the number of application server instances is increased automatically
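The two routing strategies commonly used by load balancers, round-robin and least-connections (HAProxy calls them `roundrobin` and `leastconn`), can be sketched as follows. The class names and backend labels are hypothetical; this only models the selection decision, not HAProxy's implementation.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hands out backends in a fixed rotating order."""

    def __init__(self, backends: List[str]) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each request to the backend with the fewest active connections."""

    def __init__(self, backends: List[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def acquire(self) -> str:
        # min() returns the first backend among ties, i.e. the earliest listed
        backend = min(self.active, key=lambda b: self.active[b])
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


rr = RoundRobinBalancer(["app1", "app2", "app3"])
print([rr.pick() for _ in range(5)])  # ['app1', 'app2', 'app3', 'app1', 'app2']

lc = LeastConnectionsBalancer(["app1", "app2"])
first = lc.acquire()   # app1: both idle, first listed wins
second = lc.acquire()  # app2: app1 already has one connection
lc.release(first)
print(lc.acquire())    # app1
```

Round-robin ignores how busy each instance is, which suits uniform requests; least-connections adapts when some requests hold a connection much longer than others.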

Server Instances

• Each instance generates responses with the help of backing services

• It contains only application code, which is version controlled

• Creating a new instance is as simple as checking out the latest version of the codebase and deploying it within an instance

• Instances are commodity servers that can be scaled on demand

Storage Services

• Storage services comprise

– Database

– Middleware (Cache)

– Static Assets (AWS S3)

Storage Services

• To protect the database and ensure its availability, a master-slave setup is required

• All database write operations happen at the master and are replicated to the slaves, while read operations are served by the slave instances

• If the master fails, one of the slave nodes is promoted to become the new master
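The read/write split and failover described above can be modelled as a small routing decision. This is a minimal sketch under simplifying assumptions: `ReplicatedDatabase` and the node names are hypothetical, only statements starting with `SELECT` count as reads, and replication itself (handled by MySQL) is not modelled.

```python
import itertools
from typing import List


class ReplicatedDatabase:
    """Hypothetical router for a master-slave setup: writes to the master, reads to slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self.slaves = list(slaves)
        self._reads = itertools.cycle(self.slaves)

    def route(self, sql: str) -> str:
        """Return the node that should execute this statement."""
        # Simplification: treat only SELECT statements as reads
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.slaves:
            return next(self._reads)  # rotate reads across the slaves
        return self.master  # writes (and reads when no slave is left) go to the master

    def fail_over(self) -> None:
        """Promote the first slave to master after the master fails."""
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)
        self._reads = itertools.cycle(self.slaves)


db = ReplicatedDatabase("db-master", ["db-slave-1", "db-slave-2"])
print(db.route("INSERT INTO orders VALUES (1)"))  # db-master
print(db.route("SELECT * FROM products"))         # db-slave-1
db.fail_over()
print(db.route("UPDATE products SET stock = 0"))  # db-slave-1 (now the master)
print(db.route("SELECT * FROM products"))         # db-slave-2
```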

Storage Services

• The cache lies between the application servers and the database

• It stores data in memory (RAM)

• Requests are served faster because fewer queries reach the database

• AWS S3 was used to store static assets such as images, CSS and JS files
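A cache placed between the application servers and the database is often used in a cache-aside pattern: read from the cache first, query the database only on a miss, then populate the cache. The slides do not say which pattern the application used, so this is an assumed illustration; a plain dict stands in for Redis (where the equivalent calls would be `GET` and `SET` with an expiry), and `products_table` is hypothetical data.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Check the in-memory cache first; query the database only on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]]) -> None:
        self.load_from_db = load_from_db
        self.cache: Dict[str, dict] = {}  # stands in for Redis
        self.db_queries = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self.cache:
            return self.cache[key]  # hit: the database is not queried
        self.db_queries += 1
        row = self.load_from_db(key)
        if row is not None:
            self.cache[key] = row  # populate so later requests are hits
        return row

    def invalidate(self, key: str) -> None:
        """Drop a stale entry after the database row is updated."""
        self.cache.pop(key, None)


products_table = {"product:1": {"name": "Smartphone", "stock": 12}}  # hypothetical rows
store = CacheAside(products_table.get)
for _ in range(3):
    store.get("product:1")
print(store.db_queries)  # 1
```

Invalidating on write keeps the cache from serving outdated stock levels after an update.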

Cloud Architecture

Back End Application Stack

• Model-View-Controller (MVC)

– Promotes the principle of ‘separation of concerns’

– The Model is responsible for managing the data required by the application

– The View is responsible for presenting data in response to a Controller action

– Template systems are used to embed dynamic data within the HTML structure of the View

– The Controller is responsible for responding to user requests
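The division of responsibilities in MVC can be shown in a few lines. The application itself used Express.js with Jade templates; this is a Python analogue using `string.Template`, not Express's API, and `ProductModel`, `render_product_list` and `list_products` are hypothetical names.

```python
from html import escape
from string import Template
from typing import Dict, List


class ProductModel:
    """Model: manages the data the application needs (an in-memory list stands in for the ORM)."""

    def __init__(self) -> None:
        self._products: List[Dict[str, str]] = [
            {"name": "Smartphone", "price": "299"},
            {"name": "Earphones", "price": "19"},
        ]

    def all(self) -> List[Dict[str, str]]:
        return list(self._products)


# View: templates embed dynamic data within the HTML structure
ROW = Template("<li>$name ($price)</li>")
PAGE = Template("<html><body><ul>$rows</ul></body></html>")


def render_product_list(products: List[Dict[str, str]]) -> str:
    # Escape values before placing them in HTML
    rows = "".join(
        ROW.substitute(name=escape(p["name"]), price=escape(p["price"])) for p in products
    )
    return PAGE.substitute(rows=rows)


def list_products(model: ProductModel) -> str:
    """Controller: responds to a product-list request by querying the model and rendering the view."""
    return render_product_list(model.all())


print(list_products(ProductModel()))
# <html><body><ul><li>Smartphone (299)</li><li>Earphones (19)</li></ul></body></html>
```

Because the controller only wires the model to the view, either side can change (a new data source, a new template) without touching the other.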

Back End Application Stack

User Interface Design

• Three technologies come into play in this regard

– HTML

– CSS

– JS

HTML

• HTML is essentially a set of tags that enclose the page content

• A document starts with the <html> tag

• It has two major sections: <head> and <body>

• <head> contains metadata

• <body> contains all the content

CSS

• CSS defines presentation through rules applied to HTML selectors

• Additionally, LESS, a CSS pre-processor, is used

• LESS provides additional features such as variables, functions and mixins

• This makes CSS more maintainable, themable and extensible

JS

• For dependency management, Bower, a package manager for browser (front-end) development, is used

• Require.js is a library for asynchronously loading JavaScript dependencies within a web page

• jQuery is used for DOM navigation, event handling and AJAX calls to the server

Responsive Web Design

• Designing for a large variety of devices with varying screen sizes and resolutions is difficult

• To support multiple devices, a web design methodology called responsive web design (RWD) was used to provide an optimal viewing experience across a wide range of devices

• RWD achieves this with CSS3 Media Queries, a W3C Recommendation

User Interface Design

Development Environment

• Ideally, the development environment should closely match the production environment

• Vagrant, a Free and Open Source Software (FOSS) tool for creating and configuring virtual development environments, was used for this purpose

• This setup keeps environment-related bugs to a minimum

Development Environment

• Any software project typically involves multiple developers working together

• This creates the need for a version control system (VCS), which makes tracking changes easy

• The Git distributed version control system (DVCS) was used to commit and track code changes, and the repository was hosted on GitHub

Load Testing

• Developers typically measure a Web application’s quality of service in terms of response time, throughput, and availability

• Load testing measures an application’s QoS performance based on actual customer behavior

• When customers access the site, a script recorder uses their requests to create interaction scripts

• A load generator then replays the scripts, possibly modified by test parameters, against the website
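The record-and-replay idea can be sketched as a tiny load generator that replays a recorded script for several concurrent simulated users and reports the QoS metrics named above. The project used Apache JMeter for this; the sketch below is only an illustration, and `replay`, `fake_site` and the recorded paths are hypothetical. The number of users plays the role of a test parameter.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def replay(script: List[str], handler: Callable[[str], int], users: int) -> Dict[str, float]:
    """Replay a recorded interaction script once per simulated user, all users concurrently."""

    def run_user(_: int) -> List[Tuple[float, bool]]:
        samples = []
        for path in script:
            start = time.perf_counter()
            status = handler(path)
            # Record response time and whether the request succeeded (no 5xx server error)
            samples.append((time.perf_counter() - start, status < 500))
        return samples

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(run_user, range(users)))
    elapsed = time.perf_counter() - wall_start

    samples = [s for user_samples in per_user for s in user_samples]
    latencies = [latency for latency, _ in samples]
    successes = sum(1 for _, ok in samples if ok)
    return {
        "requests": len(samples),
        "mean_response_s": statistics.mean(latencies),
        "throughput_rps": len(samples) / elapsed,
        "availability": successes / len(samples),
    }


def fake_site(path: str) -> int:
    """Hypothetical stand-in for the deployed site: simulates 1 to 5 ms of work, returns HTTP 200."""
    time.sleep(random.uniform(0.001, 0.005))
    return 200


recorded_script = ["/", "/products", "/products/42", "/cart"]  # hypothetical recorded session
report = replay(recorded_script, fake_site, users=10)
print(report)
```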

Load Testing

Machine Learning Based Recommendation Algorithms

• To illustrate the intelligence portion of the system, Apriori and sequential pattern based machine learning algorithms were employed

• Both algorithms take users’ purchase transaction data as input, and each algorithm independently predicts what a user might buy next

• Although both algorithms try to find frequently occurring patterns in the dataset, they employ different methodologies and hence come up with slightly different results

• The algorithms were implemented using the R programming language

Apriori Based Algorithm

• The Apriori algorithm uses the users’ historical transaction data (stored in the database) to identify frequently occurring itemsets, which can then be formulated into association rules

• For example, a rule might state that a user who buys a smartphone is also likely to buy earphones, that is {smartphone} => {earphones}

• The algorithm finds such a rule only if enough transactions support it, i.e., if the itemset meets a minimum support threshold

• These rules ultimately become insights on what the user might do next and can be given as product recommendations within the application
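The level-wise search for frequent itemsets and the derivation of rules can be sketched compactly. The authors implemented their algorithms in R; this Python version is only an illustration of the classic Apriori steps (join, prune, count), run on the four-transaction database D from the Apriori example slide with a minimum support count of 2. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Level-wise Apriori: return every frequent itemset with its support count."""

    def count(candidates: Set[Itemset]) -> Dict[Itemset, int]:
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one pass over the database per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    level = count({frozenset([item]) for t in transactions for item in t})  # L1
    frequent = dict(level)
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


def association_rules(
    frequent: Dict[Itemset, int], min_confidence: float
) -> Iterator[Tuple[Set[str], Set[str], float]]:
    """Yield rules X => Y with confidence = support(X | Y) / support(X)."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                confidence = support / frequent[lhs]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    yield set(lhs), set(itemset - lhs), confidence


# The four-transaction database D from the Apriori example slide
D = [{"1", "3", "4"}, {"2", "3", "5"}, {"1", "2", "3", "5"}, {"2", "5"}]
frequent = apriori(D, min_support=2)
print(sorted(sorted(s) for s in frequent if len(s) == 3))  # [['2', '3', '5']]
for lhs, rhs, conf in association_rules(frequent, min_confidence=1.0):
    print(sorted(lhs), "=>", sorted(rhs), conf)
```

With product names instead of numbers, a rule such as {smartphone} => {earphones} would surface the same way once its itemset reaches the support threshold.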

Apriori Based Algorithm

Sequential Pattern Mining Based Algorithm

• The sequential pattern mining algorithm also attempts to mine relevant patterns from available data, but it additionally takes the order of the pattern into account

• The algorithm tries to find patterns based on the order in which transactions take place

• There are many variations of sequential pattern mining; the one used by the program is SPADE (Sequential PAttern Discovery using Equivalence classes)
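SPADE's core idea is a vertical data layout: each item maps to an id-list of (sequence id, event id) pairs, and the support of an ordered pattern "a then b" comes from a temporal join of the two id-lists. The sketch below is a simplified illustration limited to two-item sequences; full SPADE also joins items bought in the same event and extends patterns level by level through equivalence classes. In R, SPADE is available as `cspade()` in the arulesSequences package, though the slides do not say which implementation was used. The purchase histories and function names here are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

IdList = List[Tuple[int, int]]  # (sequence id, event id) pairs


def vertical_id_lists(sequences: Dict[int, List[Set[str]]]) -> Dict[str, IdList]:
    """SPADE's vertical format: map each item to the (sid, eid) pairs where it occurs."""
    id_lists: Dict[str, IdList] = defaultdict(list)
    for sid, events in sequences.items():
        for eid, itemset in enumerate(events):  # events are in purchase order
            for item in itemset:
                id_lists[item].append((sid, eid))
    return id_lists


def temporal_join(first: IdList, then: IdList) -> IdList:
    """Id-list of the pattern 'first -> then': occurrences of `then` strictly after a `first`."""
    earliest: Dict[int, int] = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in then if sid in earliest and eid > earliest[sid]]


def support(id_list: IdList) -> int:
    """Support counts distinct sequences (customers), not individual occurrences."""
    return len({sid for sid, _ in id_list})


def frequent_2_sequences(
    sequences: Dict[int, List[Set[str]]], min_support: int
) -> Dict[Tuple[str, str], int]:
    id_lists = vertical_id_lists(sequences)
    items = [i for i, ids in id_lists.items() if support(ids) >= min_support]
    result: Dict[Tuple[str, str], int] = {}
    for a, b in product(items, repeat=2):
        joined = temporal_join(id_lists[a], id_lists[b])
        if support(joined) >= min_support:
            result[(a, b)] = support(joined)
    return result


# Hypothetical purchase histories: customer id -> ordered list of baskets
purchases = {
    1: [{"smartphone"}, {"case"}, {"earphones"}],
    2: [{"smartphone"}, {"earphones"}],
    3: [{"earphones"}, {"smartphone"}],
}
print(frequent_2_sequences(purchases, min_support=2))  # {('smartphone', 'earphones'): 2}
```

Unlike Apriori, order matters here: customer 3 bought both items but in the reverse order, so that customer does not count toward smartphone followed by earphones.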

Sequential Pattern Mining Based Algorithm

Cloud Based Development Environment

Technology                      Software/Tool
CDN                             CloudFront
Load Balancer                   HAProxy
Server Instances                Ubuntu 14.04
Distributed Cache               Redis
RDBMS                           MySQL
Assets Storage                  AWS S3
Orchestration & Provisioning    Chef

Backend Development Environment

Technology                      Software/Tool
Server Language                 Node.js
Server Package Management       NPM
MVC Framework                   Express.js
Template Engine                 Jade
ORM                             Node-ORM
Redis Library                   Node_redis
Authentication                  Passport.js

Front End Development Environment

Technology                      Software/Tool
Content                         HTML
Presentation                    CSS
Interactivity                   JS
RWD Framework                   Bootstrap
CSS Pre-processor               LESS
Frontend Framework              jQuery
Frontend Package Management     Bower
Dynamic Script Injection        Require.js

Other Tools

Technology                      Software/Tool
Machine Learning Tool           R
Load Testing                    Apache JMeter
Development Virtualization      Vagrant
VCS                             Git
VCS Hosting                     GitHub

Experimental Results


Conclusion

• We presented a workflow for creating online applications deployed in a cloud environment

• We examined its cloud architecture in detail, including how cloud components such as virtual machines and load balancers interact with each other

• We also focused on how such an application is built from the ground up, including its backend architecture, user interface built using the responsive web design technique as well as its development workflow

Conclusion

• For completeness, we examined how such a cloud application should be tested to ensure its reliability at scale

• Finally, we explored how such a system could leverage the vast amounts of data it collects and employed Apriori and sequential pattern based machine learning algorithms to generate insights about its users

• Using these insights the application can better assist its customers by providing relevant and timely recommendations based on their behavior

Future Work

• In future work, we plan to explore the performance of the algorithms used in such a cloud application, interoperability between two or more algorithms, and the use of a more distributed architecture, such as Hadoop, for the machine learning setup

Thank You

Data Mining: Concepts and Techniques

The Apriori Algorithm — Example

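Database D

TID    Items
100    1 3 4
200    2 3 5
300    1 2 3 5
400    2 5

Scan D to count C1:

itemset    sup.
{1}        2
{2}        3
{3}        3
{4}        1
{5}        3

L1 (itemsets in C1 with support count of at least 2):

itemset    sup.
{1}        2
{2}        3
{3}        3
{5}        3

C2 (candidate pairs generated from L1):

{1 2}  {1 3}  {1 5}  {2 3}  {2 5}  {3 5}

Scan D to count C2:

itemset    sup.
{1 2}      1
{1 3}      2
{1 5}      1
{2 3}      2
{2 5}      3
{3 5}      2

L2:

itemset    sup.
{1 3}      2
{2 3}      2
{2 5}      3
{3 5}      2

C3 (generated from L2):

{2 3 5}

Scan D to count C3, giving L3:

itemset    sup.
{2 3 5}    2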