8/10/2019 WP Machine Learn Hadoop
White Paper
Machine Learning in the Enterprise Hadoop
© 2014 Alethe Labs. All rights reserved. Alethe Labs and the Alethe Labs logo are trademarks or registered trademarks of Alethe Labs. All other trademarks are the property of their respective companies. Information is subject to change without notice.
1. Abstract
Over the past few years, organizations have been storing huge data sets and using this Big Data as a competitive advantage. The challenge lies in requiring a human to intervene in every scenario of processing and analyzing Big Data. The ideal case is when the machine can collaborate with and help humans in decision-making. This white paper examines the role of machine learning in the most popular Big Data platform, Hadoop.
This is one of many white papers we plan to publish to understand the role of machine learning with Hadoop in enhancing business processes.
2. Key Words
Hadoop, Big Data, Machine Learning
3. Introduction
Organizations today generate huge amounts of data. This data includes both unstructured and structured data stored in silos of databases and archive solutions. Managing and analyzing this data with legacy systems is challenging and sometimes impractical. Companies today focus on generating business benefits from their huge data sets. Hadoop is unparalleled as a high-efficiency Big Data platform for storing, processing, and analyzing data in a cost-effective model.
The challenge lies in acquiring a knowledge base from experts: understanding how data sets are processed, which data interpretation techniques apply, and how to create value that empowers business decisions. We can either build separate processes to capture that knowledge base, or code the learning technique into the machine itself, i.e., Machine Learning.
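To make the contrast concrete, here is a minimal sketch (not from the paper; the data and function names are illustrative) of "coding the learning into the machine": a nearest-neighbor classifier that derives its decisions from labeled examples rather than from hand-written expert rules.

```python
import math

def nearest_neighbor(train, query):
    """Classify `query` with the label of its closest training example
    (Euclidean distance). The 'knowledge' lives in the data, not in
    hand-coded rules supplied by an expert."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Pick the labeled example whose feature vector is closest to the query.
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy training data: (feature vector, label)
train = [((1.0, 1.0), "small"), ((8.0, 9.0), "large"), ((9.0, 8.0), "large")]
print(nearest_neighbor(train, (1.2, 0.8)))  # -> small
```

Adding a new labeled example changes future predictions without touching any code, which is the practical difference from a hand-maintained knowledge base.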
4. Apache Hadoop & Big Data
Apache Hadoop is an open-source distributed software platform for storing and processing data across multiple servers. Hadoop is written in Java. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Using Hadoop, you can store petabytes of data reliably on tens of thousands of servers while scaling performance cost-effectively by merely adding inexpensive nodes to the cluster.
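The map/shuffle/reduce flow described above can be sketched in a few lines of plain Python (a single-process illustration, not Hadoop's Java API): each document fragment is mapped to (word, 1) pairs, the pairs are grouped by key, and each group is reduced to a count, which is exactly what Hadoop does in parallel across cluster nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(fragment):
    # Map: emit a (word, 1) pair for every word in one input fragment.
    return [(word, 1) for word in fragment.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values; here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

# Each string stands in for one input split processed by a separate mapper.
fragments = ["big data big", "data platform"]
mapped = chain.from_iterable(map_phase(f) for f in fragments)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'platform': 1}
```

Because each map call sees only its own fragment and each reduce call sees only one key's values, both phases can be re-run independently on any node, which is what makes the paradigm fault-tolerant.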
The following figure shows the Hadoop physical architecture, with master and slave nodes, and the Hadoop logical architecture, with MapReduce.