mapReduce for machine learning

MapReduce for Machine Learning by pranya prabhakar S4 MCA 05

Transcript of mapReduce for machine learning

Page 1: mapReduce for machine learning

MapReduce for Machine Learning

by

pranya prabhakar

S4 MCA 05

Page 2: mapReduce for machine learning

CONTENTS

• Introduction
• Machine Learning
• MapReduce
• ML on MapReduce
• Apache Mahout and its installation steps
• Conclusion

Page 3: mapReduce for machine learning

Introduction

• Data is increasing rapidly

• It is necessary to process and analyze this data

• Machines analyze the data as a human being would. …Different

Page 4: mapReduce for machine learning

Machine Learning

Supervised Learning: Generates a function from labeled examples that maps inputs to desired outputs.

Unsupervised Learning: Looks for patterns native to a dataset and models them, as in clustering (e.g. data mining and knowledge discovery).

Reinforcement Learning: Learns how to act given reward (or punishment) from the world.

Page 5: mapReduce for machine learning
Page 7: mapReduce for machine learning

Types of problems

Classification: Data is labeled, meaning each object is assigned a class.
- Learn a model from manually classified data
- Predict the class of a new object based on its features and the learned model
e.g. spam/non-spam, fraud/non-fraud

Clustering: Data is not labeled, but can be divided into groups based on similarity.
- Group similar-looking objects
- Notion of similarity: a distance measure
e.g. organizing pictures by faces without names

Regression: Data is labeled with a real value rather than a class.
e.g. time series data such as the price of a stock over time

Page 8: mapReduce for machine learning

Supervised Learning Algorithms

• Decision Trees
• k-Nearest Neighbours
• Naive Bayes
• Logistic Regression
• Perceptron and Multi-layer Perceptrons
• Neural Networks
• SVM and Kernel estimation

Page 9: mapReduce for machine learning

Unsupervised Learning Algorithms

• Clustering
  ◦ k-Means, MinHash, Hierarchical Clustering
• Hidden Markov Models
• Feature Extraction methods
• Self-organizing Maps (Neural Nets)

Page 10: mapReduce for machine learning

Uses

• Spam filtering
• Credit card fraud detection
• Face recognition (computer vision)
• Speech understanding
• Medical diagnosis and so on…

Page 11: mapReduce for machine learning

Current state of ML libraries

• Lack scalability
• Lack documentation and examples
• Lack Apache licensing
• Are not well tested
• Are research oriented
• Not built over existing production-quality libraries
• Lack “Deployability”

Page 12: mapReduce for machine learning

MapReduce

• It is a programming framework
• Used for parallel processing over large data sets
• An application is divided into small fragments of work that are distributed across the cluster
• Computation unit of Hadoop
• Two functions: Map() and Reduce()
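The two functions can be illustrated with a minimal, single-process sketch in Python. This is not Hadoop code: the names map_fn, shuffle, reduce_fn and run_job are hypothetical and chosen only for illustration, and the grouping step that a real cluster performs between the map and reduce phases is simulated here with a dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = chain.from_iterable(map_fn(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["spam spam ham", "ham eggs"]))
```

Because each call to map_fn looks at a single line and each call to reduce_fn looks at a single key, both steps could be spread over many machines without changing the result.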

Page 13: mapReduce for machine learning

Apache Mahout

The starting place for MapReduce-based machine learning

A disparate collection of algorithms for

• Recommendation
• Clustering
• Classification
• Frequent itemset mining

Page 14: mapReduce for machine learning

Mahout installation

Prerequisites
• Java
• Hadoop
• Maven

Java installation
1. sudo apt-get install sun java jdk
2. sudo gedit .bashrc
   Set JAVA_HOME in the .bashrc file

Installation of Maven
1. sudo apt-get install maven2
2. Open .bashrc and add the lines
   ############## Apache-Maven #########
   export M2_HOME=/usr/local/apache-maven-3.0.4
   export M2=$M2_HOME/bin
   export PATH=$M2:$PATH
   export JAVA_HOME=$HOME/programs/jdk

Page 15: mapReduce for machine learning

Contd..

Run mvn --version to verify that Maven is correctly installed.

Page 16: mapReduce for machine learning

Hadoop installation

A single-node Hadoop cluster has been set up, in the same way Java was installed.

Installation of Mahout
1. Download Mahout from http://www.apache.org/dyn/closer.cgi/lucene/mahout/
2. Create a folder and move the downloaded file to it, say, mkdir /usr/local/mahout
3. Run mvn install. It shows as

Page 17: mapReduce for machine learning
Page 18: mapReduce for machine learning
Page 19: mapReduce for machine learning

Example showing the 20 Newsgroups dataset

Page 20: mapReduce for machine learning

Application of Mahout

Collaborative Filtering
• Matrix factorization based recommenders
• A user based recommender

Clustering
• Canopy Clustering
• K-Means Clustering
• Fuzzy K-Means
• Affinity Propagation Clustering

Classification
• Naive Bayes
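To see how a clustering algorithm from this list can fit the Map()/Reduce() model, here is a rough Python sketch of a single K-Means iteration. It is an illustration under assumptions, not Mahout's implementation: the function names (map_point, reduce_cluster, kmeans_iteration) are hypothetical, squared Euclidean distance is assumed as the distance measure, and everything runs in one process.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [t + p for t, p in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One iteration: map every point, group by centroid, reduce each group."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_iteration(points, [(1.0, 1.0), (9.0, 9.0)]))
```

In a full job this iteration would be repeated, feeding the new centroids back in, until they stop moving or an iteration limit is reached.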

Page 21: mapReduce for machine learning

Conclusion

By using the MapReduce framework, we can parallelize a wide range of machine learning algorithms, and Apache Mahout provides a platform for machine learning in the MapReduce paradigm.