8/10/2019 WP Machine Learn Hadoop
White Paper
Machine Learning in the Enterprise Hadoop
© 2014 Alethe Labs. All rights reserved. Alethe Labs and the Alethe Labs logo are trademarks or registered trademarks of Alethe Labs. All other trademarks are the property of their respective companies. Information is subject to change without notice.
1. Abstract
Over the past few years, organizations have been storing huge data sets and using this Big Data as a competitive advantage. The challenge lies in requiring a human to intervene in every scenario of processing and analyzing Big Data. The ideal case is when the machine can collaborate with and help humans in decision-making. This white paper examines the role of machine learning in the most popular Big Data platform, Hadoop.
This is one of many white papers we plan to publish to understand the role of machine learning with Hadoop in enhancing business processes.
2. Key Words
Hadoop, Big Data, Machine Learning
3. Introduction
Organizations today generate huge amounts of data. This data includes both unstructured and structured data stored in silos of databases and archive solutions. Managing and analyzing this data with legacy systems is challenging and sometimes impractical. Companies today focus on generating business benefits from their huge data sets. Hadoop is unparalleled as a high-efficiency Big Data platform for storing, processing, and analyzing data in a cost-effective model.
The challenge lies in acquiring a knowledge base from experts: understanding how data sets are processed, which data interpretation techniques apply, and how to create value that empowers business decisions. We can either build separate processes to capture that knowledge base, or code the learning technique into the machine itself, i.e., Machine Learning.
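To make the contrast concrete, here is a minimal sketch (not from the paper; the data and function names are illustrative) of "coding the learning into the machine": a nearest-neighbor classifier that derives its decisions from labeled examples rather than from hand-written expert rules.

```python
import math

def nearest_neighbor(train, query):
    """Classify `query` with the label of its closest training example
    (Euclidean distance). The 'knowledge' lives in the data, not in
    hand-coded rules supplied by an expert."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Pick the labeled example whose feature vector is closest to the query.
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy training data: (feature vector, label)
train = [((1.0, 1.0), "small"), ((8.0, 9.0), "large"), ((9.0, 8.0), "large")]
print(nearest_neighbor(train, (1.2, 0.8)))  # -> small
```

Adding a new labeled example changes future predictions without touching any code, which is the practical difference from a hand-maintained knowledge base.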
4. Apache Hadoop & Big Data
Apache Hadoop is an open-source distributed software platform for storing and processing data across multiple servers. Hadoop is written in Java. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Using Hadoop, you can store petabytes of data reliably on tens of thousands of servers while scaling performance cost-effectively by merely adding inexpensive nodes to the cluster.
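The map/shuffle/reduce flow described above can be sketched in a few lines of plain Python (a single-process illustration, not Hadoop's Java API): each document fragment is mapped to (word, 1) pairs, the pairs are grouped by key, and each group is reduced to a count, which is exactly what Hadoop does in parallel across cluster nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(fragment):
    # Map: emit a (word, 1) pair for every word in one input fragment.
    return [(word, 1) for word in fragment.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values; here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

# Each string stands in for one input split processed by a separate mapper.
fragments = ["big data big", "data platform"]
mapped = chain.from_iterable(map_phase(f) for f in fragments)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'platform': 1}
```

Because each map call sees only its own fragment and each reduce call sees only one key's values, both phases can be re-run independently on any node, which is what makes the paradigm fault-tolerant.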
The following figure shows the Hadoop physical architecture, with master and slave nodes, and the Hadoop logical architecture, with MapReduce.