Spark Will Replace Hadoop! Know Why

Post on 07-Aug-2015


http://www.edureka.co/apache-spark-scala-training

Spark will replace Hadoop! Know Why?

Slide 2

Agenda

At the end of the session, you will be able to:

Understand why to learn Spark

Know the advantages of Spark and its 2015 survey results

Discover the Spark career path

Understand how companies are using Spark

Slide 3

Why Spark?

Slide 4

Rise of Big Data

IDC (International Data Corporation) predicts that by 2020 the world’s data volume will have reached 40,000 exabytes (EB), or 40 zettabytes (ZB).

The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth.
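The figures above can be cross-checked with a little arithmetic. The sketch below assumes decimal (SI) units and, for the growth factor, a 2015 to 2020 window; the window and the variable names are illustrative choices, not something the slide states.

```python
# Decimal (SI) unit sizes in bytes (assumption: the slide uses SI units).
GB = 10**9
EB = 10**18
ZB = 10**21

total_bytes = 40_000 * EB        # projected worldwide data volume for 2020
per_person_bytes = 5_200 * GB    # per-person figure quoted on the slide

# 40,000 EB expressed in zettabytes.
total_in_zb = total_bytes // ZB

# Population implied by dividing the total by the per-person figure.
implied_population = total_bytes / per_person_bytes

# "Doubling every two years" over an assumed 2015-2020 window.
growth_2015_to_2020 = 2 ** ((2020 - 2015) / 2)

print(f"{total_in_zb} ZB in total")
print(f"{implied_population / 1e9:.2f} billion people implied")
print(f"{growth_2015_to_2020:.2f}x growth from 2015 to 2020")
```

The two figures are consistent with each other: 40 ZB spread over 5,200 GB per person implies a population of a little under 8 billion.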

[Chart: growth of structured and unstructured data, 2005 to 2015; vertical axis from 0 to 7000, units not given]

Slide 5

Application of Big Data

Source: Twitter

Slide 6

Application of Big Data

Slide 7

Hadoop is not Enough!

Limitations:

Real-time Processing: Hadoop MapReduce is limited to batch processing. Real-time processing was a big “No” in Hadoop.

Not Fast Enough: Hadoop MapReduce is fast, but not fast enough.

Conclusion:

Real-time processing is essential and can be achieved using Spark!

Slide 8

Spark Survey and its Advantages

Slide 9

Spark Survey 2015!

Source: Typesafe

Slide 10

Advantages of Spark

Ease of Use

Generality

Runs Everywhere

Up to 100x faster than MapReduce (MR) for in-memory workloads

Slide 11

Feature Comparison

Hadoop MapReduce        Spark

Fast                    Up to 100x faster than MapReduce

Batch processing        Batch and real-time processing

Stores data on disk     Stores data in memory

Open source             Open source

Written in Java         Written in Scala

Source: Databricks
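The disk versus memory row of the comparison matters most for iterative jobs that pass over the same data many times. The toy model below is plain Python, not Hadoop or Spark code: `Storage`, `run_disk_style` and `run_memory_style` are hypothetical names, and it only counts how often the input is read. It does not measure real performance.

```python
class Storage:
    """Hypothetical stand-in for a distributed file system; counts full reads."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        # Every call models one full read of the input from disk.
        self.reads += 1
        return list(self._records)


def run_disk_style(storage, iterations):
    """Re-read the input on every pass, like a chain of separate batch jobs."""
    total = 0
    for _ in range(iterations):
        data = storage.load()
        total = sum(x * x for x in data)
    return total


def run_memory_style(storage, iterations):
    """Read the input once and keep it in memory for all passes."""
    data = storage.load()
    total = 0
    for _ in range(iterations):
        total = sum(x * x for x in data)
    return total


disk = Storage(range(1000))
mem = Storage(range(1000))
disk_result = run_disk_style(disk, iterations=10)
mem_result = run_memory_style(mem, iterations=10)

print("disk-style reads:", disk.reads)
print("memory-style reads:", mem.reads)
```

Both runs compute the same result; the difference is that the disk-style run reads the input once per iteration, while the memory-style run reads it once in total.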

Slide 12

Spark Features/Modules in Demand

Source: Typesafe

Slide 13

New Features in 2015

Data Frames

• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3

SparkR

• Released in Spark 1.4
• Exposes DataFrames, RDDs and the ML library in R

Machine Learning Pipelines

• High-level API
• Featurization
• Evaluation
• Model tuning

External Data Sources

• Platform API to plug data sources into Spark
• Pushes logic into sources

Source: Databricks

Slide 14

Spark Career Path

Slide 15

Job Roles & Industry Focus

Source: Typesafe

Slide 16

Salary Trends

Slide 17

Major Companies Using Hadoop

Slide 18

Industry Adoption

Source: Typesafe

Slide 19

How Companies are using Spark?

Slide 20

General Business Goals

Source: Typesafe

Slide 21

Demo

Slide 22

The Big Question!

Is Spark going to replace Hadoop?

Slide 23

The Big Question!

Is Spark going to replace Hadoop?

Answer: Yes. Spark will be used on top of Hadoop and will replace MapReduce.

Reasons:

1. Hadoop MapReduce cannot handle real-time processing
2. Hadoop MapReduce is slower than Spark
3. With the rise of IoT, Spark is a must

Questions

Slide 24

Slide 25

Survey

Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better!

Please spare a few minutes to take the survey after the webinar.