Big Data Hadoop and Spark Development

Course Details: BIG DATA HADOOP & SPARK DEVELOPMENT
enquiry@acadgild.com | www.acadgild.com | 8880025025

Brief About the Course

Hadoop is considered the most effective data platform for companies working with Big Data and is integral to storing, handling and retrieving enormous amounts of data in a variety of applications. In this course you will learn Hadoop architecture in depth, along with the key components of the Hadoop ecosystem: Hive, HBase, Sqoop, Flume and Pig.

Who should take this course

Any graduate aiming to build a successful career in Big Data can take this course. This course is beneficial for:

• Software Developers and Architects

• Professionals with an analytics and data management profile

• Business Intelligence Professionals

• Project Managers

• Data Scientists

• Professionals with a Business Intelligence, ETL and data warehousing background

• Professionals from a testing and mainframes background

Why Learn

1. The Big Data domain is growing at 10% every year, and companies like Yahoo, Apple, eBay, Hortonworks, Walmart and Facebook are hiring skilled professionals.

2. The average salary of a Big Data Developer is $82-100K.

3. A McKinsey research report on Big Data highlights that by the end of 2018 the demand for analytics professionals in the US is expected to be 60% higher than the anticipated supply.

4. A recent Dice survey reveals that 9 out of 10 high-paying IT jobs require Big Data skills.

5. The Economic Times states that a programmer who knows Hadoop is a hot commodity on the job circuit.

6. According to Glassdoor.com, Hadoop is among the top 10 IT job trends in the market.

SYLLABUS

Solving the Big Data Problem & Hadoop Framework

• Why is Data So Important?

• Pre-requisite – Data Scale

• What is Big Data?

• Big Bank: Big Challenge

• Common Problems

• 3 Vs Of Big Data

• Defining Big Data

• Sources Of Data Flood

• Exploding Data Problem

• Redefining The Challenges Of Big Data

• Possible Solutions: Scaling Up Vs. Scaling Out

• Challenges Of Scaling Out

• Solution For Data Explosion: Hadoop

• Hadoop: Introduction

• Hadoop In Layman's Terms

• Hadoop Ecosystem

• Evolutionary Features Of Hadoop

• Hadoop Timeline

• Why Learn Big Data Technologies?

• Who Is Using Big Data?

• HDFS: Introduction

• Design Of HDFS

• HDFS Blocks

• Components Of Hadoop 1.X

• NameNode And Hadoop Cluster

• Arrangement Of Racks

• Arrangement Of Machines And Racks

• Local FS And HDFS

Day 1 2 Hours

HDFS

• NameNode

• Checkpointing

• Replica Placement

• Benefits: Replica Placement And Rack Awareness

• URI

• URL And URN

• HDFS Commands

• Problems With HDFS In Hadoop 1.X

• HDFS Federation (Included In Hadoop 2.X)

• HDFS Federation

• High Availability

• Anatomy Of File Read From HDFS

• Data Read Steps

• Important Java Classes To Write Data To HDFS

• Anatomy Of File Write To HDFS

• Writing File To HDFS: Steps

Day 2 2 Hours

Exploring MapReduce

• Building Principles

• Introduction To MapReduce

• MR Demo

• Pseudo Code

• Mapper Class

• Reducer Class

• Driver Code

• InputSplit

• InputSplit And Data Blocks: Difference

• Why Is The Block Size 128 MB?

• RecordReader

• InputFormat

• Default InputFormat: TextInputFormat

• InputFormat

• OutputFormat

• Using A Different OutputFormat

• Important Points

• Partitioner

• Using Partitioner

• Map Only Job

• Flow Of Operations In MapReduce

Day 3 2 Hours

Schedulers in YARN & Introduction to Pig

• Serialization In MapReduce

• Custom Writable In MapReduce

• Custom WritableComparable In MapReduce

• Schedulers In YARN

• FIFO Scheduler

• Capacity Scheduler

• Fair Scheduler

• Differences Between Hadoop 1.X And Hadoop 2.X

• Introduction to Apache Pig

• Why Pig?

• Apache Pig Architecture

• Simple Data Types

• Complex Data Types

Day 4 2 Hours

Exploring Pig

• Sample Execution

• Pig Operators demo

• Parameter Substitution

• Macros

• Anatomy Of Reduce-Side-Join

• Job Optimizations In Pig

• UDFs In Pig

• Execution Of XML And CSV Files In Pig

Day 5 2 Hours

Hive Introduction

• Introduction

• Hive DDL

• Demo: Databases.Ddl

• Demo: Tables.Ddl

• Hive Views

• Demo: Views.Ddl

• Architecture

• Primary Data Types

• Data Load

• Demo: ImportExport.Dml

• Demo: HiveQueries.Dml

• Demo: Explain.Hql

• Table Types

• Demo: ExternalTable.Ddl

• Complex Data Types

• Demo: Working With Complex Datatypes

• Hive Variables

• Demo: Working With Hive Variables

• Hive Variables And Execution Customisation

Day 6 2 Hours

Hive Operations

Day 7 2 Hours

• Working With Arrays

• Sort By And Order By

• Distribute By And Cluster By

• Partitioning

• Static And Dynamic Partitioning

• Bucketing Vs Partitioning

• Joins And Types

• Bucket-Map Join

• Sort-Merge-Bucket-Map Join

• Left Semi Join

• Demo: Join Optimisations

• Input Formats In Hive

• Sequence Files In Hive

• RC File In Hive

• File Formats In Hive

• ORC Files In Hive

• Inline Index In ORC Files

• ORC File Configurations In Hive

Advanced Hive

Day 8 2 Hours

• SerDe In Hive

• Demo: CSVSerDe

• JSONSerDe

• RegexSerDe

• Analytic And Windowing In Hive

• Demo: Analytics.Hql

• HCatalog In Hive

• Demo: Using_HCatalog

• Accessing Hive With JDBC

• Demo: HiveQueries.Java

• HiveServer2 And Beeline

• Demo: Beeline

• UDF In Hive

• Demo: ToUpper.Java And Working_with_UDF

• Optimizations In Hive

• Demo: Optimizations

HBase

• Challenges With Traditional RDBMS

• Features Of NoSQL Databases

• NoSQL Database Types

• CAP Theorem

• What Is HBase?

• Regions

• HBase HMaster And ZooKeeper

• HBase First Read

• HBase Meta Table

• Region Split

• Apache HBase Architecture Benefits

• HBase Vs. RDBMS

• Shell Commands

• Hive Integration With HBase

• Pig Integration With HBase

Day 9 2 Hours

Oozie and Sqoop

• Introduction To Oozie

• Oozie Architecture

• Oozie Workflow Nodes

• Oozie Server

• Oozie Workflow

• Sqoop Architecture

• Sqoop Features

Day 10 2 Hours

Sqoop contd. & Apache Flume

• Sqoop Hands On

• Flume: Introduction

• Flume Architecture

• Example Description

• Transactions

• Batching

• Partitioning

• Exec Source

• Spooling Directory Source

• File Channel

• Memory Channel

• Logger Sink

• HDFS Sink

Day 11 2 Hours

Project - 1 & Introduction to Scala - Session I

• Project Discussion

• Introduction to Functional Programming Languages and Scala

• Functional vs OOP

• Variable

• Functions

• Using if and while to define logic

• Loops in Scala

• Collections in Scala

Day 12 2 Hours

Project 1

Scala - Session II

• Object Oriented Programming

• Classes and Objects

• Traits in Scala

• Constructors in Scala

• Method Overloading

• Implicit parameter usage

Day 13 2 Hours

Scala - Session III

• Inheritance - OOP

• Override modifier

• Polymorphism

• Invoking superclass methods

• Final members

• Traits in detail

Day 14 2 Hours

Scala - Session IV

• Control Structures in detail

• Exception Handling

• Coding without break and continue

• Coding the functional way

• Case classes in Scala

• Implicit conversions and Implicit Parameter in depth

Day 15 2 Hours

Introduction to Apache Spark

• Introduction to Apache Spark

• Map Reduce Limitations

• RDDs

• SparkContext, SQLContext and HiveContext

Day 16 2 Hours

RDDs in Spark

• Programming with RDDs

• Creating RDDs from text files

• Transformations and Actions

• How Spark execution works

• RDD APIs: filter

• flatMap

• fold

• foreach

• glom

• groupBy

• map

• reduceByKey

• zip

• persist

• unpersist

• Read/Write from storage

• RDD Examples

Day 17 2 Hours

RDDs contd. & Introduction to DataFrames

• RDD APIs: aggregate

• cartesian

• checkpoint

• coalesce

• repartition

• cogroup

• collectAsMap

• combineByKey

• count and countApprox functions

• More RDD Examples

• Schema - StructType

• StructFields

• DataType

• DataFrame APIs and examples

Day 18 2 Hours

Spark SQL

• Create temporary tables

• SparkSQL

• Parquet vs Avro

• Examples and problem solving on real data using RDDs and converting the same to DataFrames

Day 19 2 Hours

Spark configurations

• Understanding Spark configurations better

• Open-source REST interfaces on top of Spark (JobServer/Livy); we will work with JobServer

• Demo of JobServer and its use case

Day 20 2 Hours

Advanced Spark

• Accumulators, Broadcast Variables

• Query Execution Plan

• Internal workings of Spark

• Spark Tuning: what your production configuration should look like

Day 21 2 Hours

Spark 2.X & Deploying a Spark application

• Spark 2.1.0: what has changed

• Datasets

• Creating a Spark project with SBT / Maven

• How Maven repositories work

• Creating and submitting an application to JobServer/Livy

Day 22 2 Hours

Spark Streaming & Project - II

• Spark Streaming

• Project discussion

Day 23 2 Hours

Introduction to MLlib, GraphX & Project - II contd.

• Spark MLlib

• Spark GraphX

• Project discussion contd.

Day 24 2 Hours
