Syllabus material

R is an open source software package for statistical analysis of data, and Hadoop is an open source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. For large datasets, the processing power of R can be greatly extended by combining it with the power of a Hadoop cluster.

This course provides students with an introduction to the use of R in the Hadoop environment through the RHadoop packages. After completing this course, students will be able to use Hadoop Streaming with R, perform data analytics with R and Hadoop, analyze Big Data with machine learning, and import and export data from various databases.

R is a programming language and open source software package for statistical analysis. It is used by data scientists, statisticians, and others who need to analyze data statistically and glean key insights from it using techniques such as regression, clustering, classification, and text analysis. R is licensed under the GNU General Public License (GPL).

Apache Hadoop is an open source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. Hadoop is a top-level Apache project, initiated and led by Yahoo! and Doug Cutting, and it relies on an active community of contributors from all over the world for its success.

Hence, to process large datasets, the processing power of R can be greatly extended by combining it with the power of a Hadoop cluster. Hadoop is a very popular framework that provides such parallel processing capabilities, so R algorithms and analyses can be run over Hadoop clusters to get the work done. With this agenda in mind, this course caters to a wide audience, including data scientists, statisticians, data architects, and engineers who are looking for solutions to process and analyze vast amounts of information using R and Hadoop.

This course provides the experienced R programmer with an introduction to the use of R in the Hadoop environment through the RHadoop packages. This includes an overview and practical examples of the rhdfs, rhbase, and rmr packages, which are used for accessing the Hadoop Distributed File System (HDFS), interacting with the HBase NoSQL database, and writing MapReduce jobs from R, respectively. The class uses a combination of lectures and labs.

Content

Getting Ready to Use R and Hadoop

Installing R
Installing RStudio
Understanding the features of R language
Installing Hadoop
Understanding Hadoop features
Learning the HDFS and MapReduce architecture
Understanding Hadoop subprojects

Writing Hadoop MapReduce Programs

Understanding the basics of MapReduce
Introducing Hadoop MapReduce
Understanding the Hadoop MapReduce fundamentals
Writing a Hadoop MapReduce example
Learning the different ways to write Hadoop MapReduce in R
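As a rough illustration of the MapReduce basics listed above, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases applied to a word count. It is written in Python purely for illustration and does not use Hadoop or any RHadoop package. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a small in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["R and Hadoop", "Hadoop and MapReduce"]
    print(word_count(sample))
```

On a real cluster the framework performs the shuffle and distributes the map and reduce calls across machines; here everything runs in a single process so that the data flow is easy to follow.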

Integrating R and Hadoop

Introducing RHIPE
Introducing RHadoop

Using Hadoop Streaming with R

Understanding the basics of Hadoop streaming
Understanding how to run Hadoop streaming with R
Exploring the HadoopStreaming R package
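Hadoop Streaming lets any executable act as a mapper or reducer by reading lines from standard input and writing tab-separated key/value lines to standard output, with the reducer receiving its input sorted by key. The sketch below imitates that contract locally. It is in Python rather than R and is only an assumption-laden illustration: mapper, reducer, and run_locally are hypothetical names, and run_locally merely stands in for the sort that the framework would perform between the two steps.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key.

    Relies on the input being sorted by key, which the framework
    guarantees for a streaming reducer.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(lines):
    """Local stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for output_line in run_locally(["R and Hadoop", "Hadoop and MapReduce"]):
        print(output_line)
```

In an actual streaming job the two functions would live in separate scripts that read from standard input, and the scripts would be handed to the Hadoop Streaming utility through its -mapper and -reducer options. The same pattern applies when the scripts are written in R.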

Learning Data Analytics with R and Hadoop

Understanding the data analytics project life cycle
Understanding data analytics problems

Understanding Big Data Analysis with Machine Learning

Introduction to machine learning
Supervised machine-learning algorithms
Unsupervised machine-learning algorithms
Recommendation algorithms

Importing and Exporting Data from Various DBs

Learning about data files as a database
Understanding MySQL
Understanding Excel
Understanding MongoDB
Understanding SQLite
Understanding PostgreSQL
Understanding Hive
Understanding HBase

Case Studies

Using RHadoop to predict website visitors
Analyzing Big Data with R and Hadoop
Data Science using R and Hadoop Analytics
Analyze text from social media sites
Identifying user feedback in social data
Identifying and analyzing errors in machine data