MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Abhijit Kumar Behera M.Tech (CSE) Roll No. 1350001 School of Computer Engineering Guided By : Dr. Laxman Sahoo

Transcript of MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Page 1: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Abhijit Kumar Behera

M.Tech (CSE)

Roll No. 1350001

School of Computer Engineering

Guided By : Dr. Laxman Sahoo

Page 2: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Contents

Introduction

Apache Hadoop related projects

Application of Mahout

Literature Survey

Plan of Action

Conclusion

References

Page 3: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Introduction

•The K-means algorithm is one of the best-known clustering algorithms and has been applied to a wide variety of problems.

•MapReduce, a widely used parallel framework for cloud computing, handles massive data effectively, so K-means clustering based on MapReduce has become a focus of research.

Page 4: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Components of Hadoop

HDFS •Name Node •Data Node •Secondary Name Node

Map Reduce •Map() •Combine() •Reduce()

YARN •Resource Manager •Node Manager (in Hadoop 1 the Job Tracker and TaskTracker daemons of MapReduce filled this role)

HBase

Page 5: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

MapReduce Word count process
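
The word count process can be sketched in plain Python, without a Hadoop cluster, to show how Map(), Combine() and Reduce() fit together. This is only an illustrative single-process sketch: the function names (map_phase, combine_phase, shuffle, reduce_phase, word_count) are hypothetical and not part of the Hadoop API, and each input line stands in for one input split.

```python
from collections import defaultdict


def map_phase(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine(): pre-aggregate the counts produced by a single mapper."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(all_pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in all_pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce(): sum all partial counts for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, combine, shuffle and reduce over the input lines."""
    combined = []
    for line in lines:  # assumption: one line plays the role of one split
        combined.extend(combine_phase(map_phase(line)))
    grouped = shuffle(combined)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["deer bear river", "car car river", "deer car bear"]))
```

The combiner is optional in this sketch: it only reduces the number of pairs passed to the shuffle step and does not change the final counts.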

Page 6: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Apache Hadoop Projects

•Hadoop (HDFS and MapReduce) •HBase •Mahout •Spark •Hive •ZooKeeper •Sqoop •Pig

Page 7: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Application of Mahout

Collaborative Filtering •Matrix factorization based recommenders •User-based recommender

Clustering •Canopy Clustering •K-Means Clustering •Fuzzy K-Means •Affinity Propagation Clustering

Classification •Naive Bayes •Random forest classifier

Page 8: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Literature Survey

An Improved Parallel K-means Clustering Algorithm with MapReduce

Authors: Qing Liao, Fan Yang, Jingming Zhao

Journal: Communication Technology (ICCT), IEEE

Year of Publication: 2014

Parallel K-means Algorithm: 1) Initialization 2) Mapper 3) Reducer
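
The three steps of parallel K-means can be sketched as follows. This is a generic, single-process illustration of the initial/mapper/reducer split, not the improved algorithm from the surveyed paper, whose details are not given here. The function names are hypothetical, the initial centroids are assumed to be supplied by the caller, and Euclidean distance is an assumption.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point, centroids):
    """Mapper: emit (cluster index, (point, 1)) for one data point."""
    return nearest(point, centroids), (point, 1)


def kmeans_reduce(values):
    """Reducer: average all points assigned to one cluster."""
    total, n = None, 0
    for point, count in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        n += count
    return tuple(x / n for x in total)


def kmeans(points, initial_centroids, max_iter=20):
    """Repeat map and reduce rounds until the centroids stop changing."""
    centroids = list(initial_centroids)  # step 1: initial centroids
    for _ in range(max_iter):
        grouped = defaultdict(list)
        for p in points:  # step 2: mapper assigns points to clusters
            key, value = kmeans_map(p, centroids)
            grouped[key].append(value)
        # step 3: reducer recomputes each centroid; an empty cluster
        # keeps its previous centroid (an assumption of this sketch)
        new_centroids = [
            kmeans_reduce(grouped[i]) if i in grouped else centroids[i]
            for i in range(len(centroids))
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In a real MapReduce job each iteration would be one job, with the current centroids distributed to every mapper; here the loop in kmeans plays that role.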

Page 9: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Literature Survey...

Page 10: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Literature Survey

Clouds for Scalable Big Data Analytics

Authors: Domenico Talia

Journal: IEEE Computer Society

Year of Publication: 2013

In this paper, the author describes how cloud computing enhances the development and functionality of Big Data analytics when it is deployed in the cloud.

Cloud Service Model: Data analytics software as a service
Features: A single and complete data mining application or task (including data sources) offered as a service
Users: End users, analytics managers, data analysts

Cloud Service Model: Data analytics platform as a service
Features: A data analysis suite or framework for programming or developing high-level applications, hiding the cloud infrastructure and data storage
Users: Data mining application developers, data scientists

Cloud Service Model: Data analytics infrastructure as a service
Features: A set of virtualized resources provided to a programmer or data mining researcher for developing, configuring, and running data analysis frameworks or applications
Users: Data mining programmers, data management developers, data mining researchers

Page 11: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Plan of Action

August-October 2014: Literature survey is done.

November 2014: Problem definition is formulated; the problem-solving outline is yet to be done.

December 2014-January 2015: Find an appropriate solution to the problem (yet to be formulated).

February-May 2015: Final implementation of the solution, with results (yet to be done).

Page 12: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

Conclusion

Large-scale data mining has become a new challenge in recent years, and big data analytics can be accomplished using the MapReduce framework. The K-means algorithm is one of the best-known clustering algorithms, but its processing performance usually hits a bottleneck when it is applied to massive data. A parallel K-means algorithm built on MapReduce shows a clear advantage in handling massive data.

Page 13: MACHINE LEARNING ON MAPREDUCE FRAMEWORK

References

[1] Walisa Romsaiyud, Wichian Premchaiswadi, "An Adaptive Machine Learning on Map-Reduce Framework for Improving Performance of Large-Scale Data Analysis on EC", Eleventh IEEE Int'l Conf. on ICT and Knowledge Engineering, 2014

[2] Domenico Talia, "Clouds for Scalable Big Data Analytics", IEEE Computer Society, 2013

[3] Feng Ye, Zhijan Wang, "Cloud-based Big Data Mining & Analyzing Services Platform integrating R", IEEE International Conference on Advance Cloud and Big Data, 2013

[4] Apache Hadoop, http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F
