Big data

IN THE NAME OF GOD

BIG DATA ANALYTICS HADOOP AND CASSANDRA

Author: Samira Riki

An airline jet collects 10 terabytes of sensor data

for every 30 minutes of flying time.

NYSE generates about one terabyte of new trade

data per day to perform stock trading analytics to

determine trends for optimal trades.

Twitter has over 500 million registered users.

79% of US Twitter users are more likely to recommend brands

they follow.

67% of US Twitter users are more likely to buy from brands

they follow.

57% of all companies that use social media for business use

Twitter.

“Big Data is the frontier of a firm's ability to

store, process, and access (SPA) all the data

it needs to operate effectively, make

decisions, reduce risks, and serve

customers.”

... How big is BIG?

Let’s look at

Big Data

in a different way…

Byte : one grain of rice

Kilobyte : cup of rice

Megabyte : 8 bags of rice

Gigabyte : 3 semi trucks

Terabyte : 2 container ships

Petabyte : blankets Manhattan

Exabyte : blankets the West Coast states

Zettabyte : fills the Pacific Ocean

Yottabyte : an Earth-size rice ball!

Hobbyist

Desktop

Internet

Big Data

The Future?
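
To put numbers on the scale above, here is a minimal Python sketch. It assumes decimal (SI) prefixes, where each unit is 1,000 times the previous one; the function name `human_readable` is illustrative and does not come from the slides.

```python
# Decimal (SI) prefixes: each unit is 1,000 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at the last unit in the table.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}s"
        value /= 1000
    raise AssertionError("unreachable")


# The airline example from the opening slides: 10 terabytes per 30 minutes.
jet_bytes = 10 * 1000**4
print(human_readable(jet_bytes))               # 10 terabytes
print(human_readable(jet_bytes // (30 * 60)))  # average volume per second
```

The second call divides the 30-minute volume by 1,800 seconds, which gives a feel for the sustained rate a single aircraft would produce.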

Process data in parallel? Not simple.

An idea: parallelism

A problem: Parallelism is Hard

Synchronization

Deadlock

Limited bandwidth

Timing issues and co-ordination

Split and Aggregation

Computers are complicated

Driver failure

Data availability
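
The split-and-aggregate idea from the list above can be sketched with Python's standard library. This is an illustration only: `split`, `parallel_sum`, and the worker count are hypothetical names and choices, and the sketch runs on one machine rather than a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split a list into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=4):
    """Sum each chunk in its own task, then aggregate the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task touches only its own chunk, so no locking is needed here;
        # shared mutable state is where synchronization and deadlock arise.
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


print(parallel_sum(list(range(1, 101))))  # 5050
```

Keeping every task on its own slice of the data sidesteps most of the synchronization problems listed above. Threads in CPython will not speed up this CPU-bound sum; the point is the shape of the computation, not the performance.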

Hey! We have Distributed computing!!!

Yes, we have distributed computing, and it also comes with

some challenges:

Resource sharing

Concurrency

Fault tolerance

Heterogeneity

Transparency

To address most of these challenges (but not all), Hadoop

comes in.

Hadoop origin

• An elephant can’t jump, but it can carry a heavy load!

• Apache Hadoop is a framework that allows for the distributed

processing of large data sets across clusters of commodity

computers using a simple programming model. It is designed to scale

up from single servers to thousands of machines, each providing

computation and storage.

• Hadoop is an open-source implementation of Google's

MapReduce and GFS (the Google File System, a distributed file system).

• Hadoop was created by Doug Cutting, the creator of Apache

Lucene, the widely used text search library.

Hadoop Architecture

Hadoop is designed and built on two independent frameworks.

Hadoop = HDFS + MapReduce

HDFS (storage and file system): HDFS is a reliable distributed file system

that provides high-throughput access to data.

MapReduce (processing): MapReduce is a framework for high-performance

distributed data processing using the divide-and-aggregate

programming paradigm.

Hadoop has a master/slave architecture for both storage and

processing.
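
The divide-and-aggregate paradigm described for MapReduce above can be sketched as a word count in plain Python. This is a single-process illustration, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real job would run the map and reduce steps on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, group by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across machines without the tasks sharing state.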

Hadoop Master and Slave Architecture

The components of HDFS are:

NameNode

DataNode

Secondary NameNode

The components of MapReduce are:

JobTracker

TaskTrackers
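
As a rough picture of how the storage components divide the work: HDFS splits a file into fixed-size blocks, stores replicas of each block on several DataNodes, and the NameNode keeps the metadata that maps blocks to nodes. The toy sketch below assumes tiny illustrative sizes and a round-robin placement; `place_blocks` is a hypothetical helper, and real HDFS placement is rack-aware rather than round-robin.

```python
def place_blocks(file_size, block_size, data_nodes, replication=3):
    """Return a NameNode-style map: block index -> DataNodes holding a replica."""
    if replication > len(data_nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    num_blocks = -(-file_size // block_size)  # ceiling division
    block_map = {}
    for block in range(num_blocks):
        # Simplified round-robin placement; real HDFS placement is rack-aware.
        block_map[block] = [
            data_nodes[(block + r) % len(data_nodes)] for r in range(replication)
        ]
    return block_map


nodes = ["dn1", "dn2", "dn3", "dn4"]
for block, replicas in place_blocks(300, 128, nodes).items():
    print(block, replicas)
```

With three replicas per block, losing any single DataNode in this toy layout still leaves two copies of every block, which is the basis of the fault tolerance mentioned earlier.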

Who uses Hadoop?

Amazon/A9

Facebook

Google

Last.fm

New York Times

PowerSet

Yahoo!

Twitter

Cassandra

• Apache Cassandra is an open source distributed database

management system designed to handle large amounts of data

across many commodity servers, providing high availability with no

single point of failure. Cassandra offers robust support for clusters

spanning multiple datacenters.

Main features

Cassandra places a high value on performance.

In 2012, University of Toronto researchers studying NoSQL systems concluded that "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments."

Decentralized

Supports replication and multi data center replication

Scalability

Fault-tolerant

Query language

MapReduce support

The data model
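
The decentralized, replicated design listed above rests on placing each row by its partition key on a ring of peer nodes. The sketch below is a toy consistent-hashing ring, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 as the hash are assumptions made for illustration (Cassandra's partitioner is configurable, and each node there typically owns many tokens).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 here, purely for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node is a peer; there is no master."""

    def __init__(self, nodes, replication_factor=2):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.replication_factor = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and pick successive nodes."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
for key in ["hotel:42", "flight:LH400", "sensor:7"]:
    print(key, "->", ring.replicas(key))
```

Any node can compute the same placement from the key alone, so no central coordinator is needed, and adding a node only moves the keys that fall in its segment of the ring.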

New use cases

• Geographic data

• Weather data

• RFID

• Travel schedules

• Hotel reservation

Big Data isn’t big,

if you know how to

use it.

