Bigdata Hadoop and Spark Development - Acadgild
Course Details
BIG DATA HADOOP & SPARK DEVELOPMENT
[email protected] | www.acadgild.com | 90360 10796
Brief About the Course
Hadoop is widely regarded as one of the most effective data platforms for companies working with Big Data, and it is integral to storing, handling, and retrieving enormous amounts of data in a variety of applications. In this course you will learn the Hadoop architecture in depth, along with the key components of the Hadoop ecosystem: Hive, HBase, Sqoop, Flume, and Pig.
Who should take this course
Any graduate aiming to build a career around Big Data can take this course. It is beneficial for:
• Software developers and architects
• Professionals with an analytics and data management profile
• Business intelligence professionals
• Project managers
• Data scientists
• Professionals with a business intelligence, ETL, and data warehousing background
• Professionals from a testing and mainframes background
SYLLABUS
Solving Big Data Problem & Hadoop Framework
• Why is Data So Important?
• Pre-requisite – Data Scale
• What is Big Data?
• Big Bank: Big Challenge
• Common Problems
• 3 Vs Of Big Data
• Defining Big Data
• Sources Of Data Flood
• Exploding Data Problem
• Redefining The Challenges Of Big Data
• Possible Solutions: Scaling Up Vs. Scaling Out
• Challenges Of Scaling Out
• Solution For Data Explosion: Hadoop
• Hadoop: Introduction
• Hadoop In Layman's Terms
• Hadoop Ecosystem
• Evolutionary Features Of Hadoop
• Hadoop Timeline
• Why Learn Big Data Technologies?
• Who Is Using Big Data?
• HDFS: Introduction
• Design Of HDFS
• HDFS Blocks
• Components Of Hadoop 1.X
• NameNode And Hadoop Cluster
• Arrangement Of Racks
• Arrangement Of Machines And Racks
• Local FS And HDFS
Day 1 2 Hours
HDFS
• NameNode
• Checkpointing
• Replica Placement
• Benefits Of Replica Placement And Rack Awareness
• URI
• URL And URN
• HDFS Commands
• Problems With HDFS In Hadoop 1.X
• HDFS Federation (Included In Hadoop 2.X)
• HDFS Federation
• High Availability
• Anatomy Of File Read From HDFS
• Data Read Steps
• Important Java Classes To Write Data To HDFS
• Anatomy Of File Write To HDFS
• Writing File To HDFS: Steps
Day 2 2 Hours
Exploring MapReduce
• Building Principles
• Introduction To MapReduce
• MR Demo
• Pseudo Code
• Mapper Class
• Reducer Class
• Driver Code
• InputSplit
• Difference Between InputSplit And Data Blocks
• Why Is The Block Size 128 MB?
• RecordReader
• InputFormat
• Default InputFormat: TextInputFormat
• InputFormat
• OutputFormat
• Using A Different OutputFormat
• Important Points
• Partitioner
• Using Partitioner
• Map Only Job
• Flow Of Operations In MapReduce
Day 3 2 Hours
Schedulers in YARN & Introduction to Pig
• Serialization In MapReduce
• Custom Writable In MapReduce
• Custom WritableComparable In MapReduce
• Schedulers In YARN
• FIFO Scheduler
• Capacity Scheduler
• Fair Scheduler
• Differences Between Hadoop 1.X And Hadoop 2.X
• Introduction to Apache Pig
• Why Pig?
• Apache Pig Architecture
• Simple Data Types
• Complex Data Types
Day 4 2 Hours
Exploring Pig
• Sample Execution
• Pig Operators Demo
• Parameter Substitution
• Macros
• Anatomy Of Reduce-Side-Join
• Job Optimizations In Pig
• UDFs In Pig
• Execution Of XML And CSV Files In Pig
Day 5 2 Hours
Hive Introduction
• Introduction
• Hive DDL
• Demo: Databases.Ddl
• Demo: Tables.Ddl
• Hive Views
• Demo: Views.Ddl
• Architecture
• Primary Data Types
• Data Load
• Demo: ImportExport.Dml
• Demo: HiveQueries.Dml
• Demo: Explain.Hql
• Table Types
• Demo: ExternalTable.Ddl
• Complex Data Types
• Demo: Working With Complex Data Types
• Hive Variables
• Demo: Working With Hive Variables
• Hive Variables And Execution Customisation
Day 6 2 Hours
Hive Operations
Day 7 2 Hours
• Working With Arrays
• Sort By And Order By
• Distribute By And Cluster By
• Partitioning
• Static And Dynamic Partitioning
• Bucketing Vs Partitioning
• Joins And Types
• Bucket-Map Join
• Sort-Merge-Bucket-Map Join
• Left Semi Join
• Demo: Join Optimisations
• Input Formats In Hive
• Sequence Files In Hive
• RC File In Hive
• File Formats In Hive
• ORC Files In Hive
• Inline Index In ORC Files
• ORC File Configurations In Hive
Advanced Hive
Day 8 2 Hours
• SerDe In Hive
• Demo: CSVSerDe
• JSONSerDe
• RegexSerDe
• Analytic And Windowing In Hive
• Demo: Analytics.Hql
• HCatalog In Hive
• Demo: Using_HCatalog
• Accessing Hive With JDBC
• Demo: HiveQueries.Java
• HiveServer2 And Beeline
• Demo: Beeline
• UDF In Hive
• Demo: ToUpper.Java And Working_with_UDF
• Optimizations In Hive
• Demo: Optimizations
HBase
• Challenges With Traditional RDBMS
• Features Of NoSQL Databases
• NoSQL Database Types
• CAP Theorem
• What Is HBase?
• Regions
• HBase HMaster
• ZooKeeper
• HBase First Read
• HBase Meta Table
• Region Split
• Apache HBase Architecture Benefits
• HBase Vs. RDBMS
• Shell Commands
Day 9 2 Hours
Oozie and Sqoop
• Introduction To Oozie
• Oozie Architecture
• Oozie Workflow Nodes
• Oozie Server
• Oozie Workflow
• Sqoop Architecture
• Sqoop Features
Day 10 2 Hours
Sqoop contd. & Apache Flume
• Sqoop Hands On
• Flume: Introduction
• Flume Architecture
• Example Description
• Transactions
• Batching
• Partitioning
• Exec Source
• Spooling Directory Source
• File Channel
• Memory Channel
• Logger Sink
• HDFS Sink
Day 11 2 Hours
Project 1 & Introduction to Scala - Session I
• Project Discussion
• Introduction to Functional Programming Languages and Scala
• Functional vs OOP
• Variable
• Functions
• Using if and while to define logic
• Loops in Scala
• Collections in Scala
Day 12 2 Hours
Scala - Session II
• Object Oriented Programming
• Classes and Objects
• Traits in Scala
• Constructors in Scala
• Method Overloading
• Implicit parameter usage
Day 13 2 Hours
Scala - Session III
• Inheritance - OOP
• Override modifier
• Polymorphism
• Invoking superclass methods
• Final members
• Traits in detail
Day 14 2 Hours
Scala - Session IV
• Control Structures in detail
• Exception Handling
• Coding without break and continue
• Coding the functional way
• Case classes in Scala
• Implicit conversions
• Parameter in depth
Day 15 2 Hours
Introduction to Apache Spark
• Introduction to Apache Spark
• MapReduce Limitations
• RDDs
• Spark Context - SQLContext and HiveContext
Day 16 2 Hours
RDDs in Spark
• Programming with RDDs
• Creating RDDs from text files
• Transformations and Actions
• How Spark execution works
• RDD APIs: filter, flatMap, fold, foreach, glom, groupBy, map, reduceByKey, zip, persist, unpersist
• Read/Write from storage
• RDD Examples
Day 17 2 Hours
RDDs contd. & Introduction to DataFrames
• RDD APIs: aggregate, cartesian, checkpoint, coalesce, repartition, cogroup, collectAsMap, combineByKey, count and countApprox functions
• More RDD Examples
• Schema: StructType, StructField, DataType
• DataFrame APIs and examples
Day 18 2 Hours
Spark SQL
• Create temporary tables
• SparkSQL
• Parquet vs Avro
• Examples and problem solving on real data using RDDs and converting the same to DataFrames
Day 19 2 Hours
Spark Streaming
• Demo: Spark Streaming Example
Day 20 2 Hours
MLlib and GraphX
• Spark MLlib
• GraphX
Day 21 2 Hours
Deploying a Spark application
• Create a Spark project
• SBT / Maven
• How Maven repositories work
Day 22 2 Hours
Project Demo From Hadoop
• Demo: Music data analysis using Hadoop
Day 23 2 Hours
Project II
• Final project discussion
Day 24 2 Hours