
NAGARJUNA COLLEGE OF ENGINEERING AND TECHNOLOGY

COMPUTER SCIENCE AND ENGINEERING

Big Data (16CSI732)

By Ms. Priyanka K

Asst. Professor, CSE, NCET


MODULE I

OVERVIEW OF BIG DATA


LECTURE 1

INTRODUCTION: DEFINING BIG DATA


WHAT IS BIG DATA?

‘Big Data’ is similar to ‘small data’, but bigger in size.

Handling bigger data, however, requires different approaches: new techniques, tools and architectures that aim to solve new problems, or old problems in a better way.

Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

Walmart handles more than 1 million customer transactions every hour.

Facebook handles 40 billion photos from its user base.

Decoding the human genome originally took 10 years; now it can be achieved in one week.


WHY BIG DATA?

Growth of Big Data is driven by:

– Increase of storage capacities

– Increase of processing power

– Availability of data (different data types)

Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone.

Facebook generates 10 TB of data daily.

Twitter generates 7 TB of data daily.

IBM claims 90% of today’s stored data was generated in just the last two years.


HOW IS BIG DATA DIFFERENT?

1. Automatically generated by a machine (e.g. a sensor embedded in an engine)

2. Typically an entirely new source of data (e.g. use of the Internet)

3. Not designed to be friendly (e.g. text streams)

4. May not have much value, so you need to focus on the important part


LECTURE 2

BIG DATA TYPES


THREE CHARACTERISTICS OF BIG DATA: THE 3 VS

Volume: data quantity

Velocity: data speed

Variety: data types


1ST CHARACTERISTIC OF BIG DATA: VOLUME

A typical PC might have had 10 gigabytes of storage in 2000.

Today, Facebook ingests 500 terabytes of new data every day.

A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.

Smartphones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
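To put the volume figures above in perspective, a quick back-of-the-envelope calculation (a sketch in Python, using decimal units where 1 TB = 1000 GB) shows how many year-2000 PCs Facebook's quoted daily ingest would fill:

```python
# How many 10 GB PCs (circa 2000) would Facebook's quoted
# 500 TB/day of new data fill?
pc_storage_gb = 10
daily_ingest_tb = 500

daily_ingest_gb = daily_ingest_tb * 1000  # 1 TB = 1000 GB (decimal units)
pcs_filled_per_day = daily_ingest_gb // pc_storage_gb
print(pcs_filled_per_day)  # 50000
```

That is fifty thousand turn-of-the-century PCs' worth of storage consumed every single day by one company.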


2ND CHARACTERISTIC OF BIG DATA: VELOCITY

Clickstreams and ad impressions capture user behavior at millions of events per second.

High-frequency stock trading algorithms reflect market changes within microseconds.

Machine-to-machine processes exchange data between billions of devices.

Infrastructure and sensors generate massive log data in real time.

Online gaming systems support millions of concurrent users, each producing multiple inputs per second.


3RD CHARACTERISTIC OF BIG DATA: VARIETY

Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.

Traditional database systems were designed to address smaller volumes of structured data, with fewer updates and a predictable, consistent data structure.

Big Data analysis includes different types of data.


THE STRUCTURE OF BIG DATA

Structured: most traditional data sources

Semi-structured: many sources of big data

Unstructured: video data, audio data


LECTURE 3

BIG DATA ANALYTICS


BIG DATA ANALYTICS

Examining large amounts of data

Extracting appropriate information

Identifying hidden patterns and unknown correlations

Gaining competitive advantage

Making better business decisions: strategic and operational

Effective marketing, customer satisfaction, increased revenue


TYPES OF TOOLS USED IN BIG DATA

Where is processing hosted? – Distributed servers / cloud (e.g. Amazon EC2)

Where is data stored? – Distributed storage (e.g. Amazon S3)

What is the programming model? – Distributed processing (e.g. MapReduce)

How is data stored and indexed? – High-performance, schema-free databases (e.g. MongoDB)

What operations are performed on the data? – Analytic / semantic processing
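The MapReduce programming model listed above can be sketched in miniature. The following is a single-process Python toy, not the Hadoop API, showing the map, shuffle, and reduce phases on the classic word-count problem:

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in every document.
def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

# Shuffle phase: group all emitted values by key.
def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# Reduce phase: combine the values collected for each key.
def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["Big Data is big", "data is everywhere"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In a real cluster, the map and reduce phases run in parallel across many machines and the shuffle moves data over the network; the three-phase logic, however, is the same.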


APPLICATIONS OF BIG DATA ANALYTICS

Smarter healthcare

Homeland security

Traffic control

Manufacturing

Multi-channel sales

Telecom

Trading analytics

Search quality


RISKS OF BIG DATA

Organizations can be overwhelmed by the sheer volume of data.

You need the right people, solving the right problems.

Costs can escalate too fast.

It isn’t necessary to capture 100% of the data.

Many sources of big data raise privacy concerns:

– Self-regulation

– Legal regulation


LECTURE 4

INDUSTRY EXAMPLES OF BIG DATA


LEADING TECHNOLOGY VENDORS

Example vendors:

– IBM – Netezza

– EMC – Greenplum

– Oracle – Exadata

Commonalities:

– MPP architectures

– Commodity hardware

– RDBMS-based

– Full SQL compliance


LECTURE 5

INDUSTRY EXAMPLES OF BIG DATA


HOW BIG DATA IMPACTS IT

Big Data is a disruptive force, presenting opportunities along with challenges to IT organizations.

By 2015 there were projected to be 4.4 million IT jobs in Big Data, 1.9 million of them in the US alone.

India will require a minimum of 1 lakh data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.


LECTURE 6

BENEFITS OF BIG DATA


BENEFITS OF BIG DATA

Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time.

Fast forward to the present, and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.

Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the data structures underneath.

Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem.

Big Data is already an important part of the $64 billion database and data analytics market.

It offers commercial opportunities of a comparable scale to enterprise software in the late 1980s.
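The "store now, decide the structure later" flexibility attributed to Hadoop above is often called schema-on-read. As a loose single-machine analogy (plain Python, not Hive or Impala), raw semi-structured records can be stored as-is and a schema applied only at query time:

```python
import json

# Raw records were stored before any schema was decided (schema-on-read).
raw_lines = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5, "country": "IN"}',  # extra field: no problem
]

# The "schema" is applied only when we query, not when we store.
records = [json.loads(line) for line in raw_lines]
total_clicks = sum(r["clicks"] for r in records)
print(total_clicks)  # 8
```

A traditional schema-on-write database would have rejected the second record (or required a schema migration); here the query simply reads the fields it needs.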


LECTURE 7

CROWDSOURCING ANALYTICS


CROWDSOURCING ANALYTICS

Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.

The term was coined by Jeff Howe.

When did it actually start?

– Wikipedia

– Rent-A-Coder

– Amazon Mechanical Turk


APPLICATIONS OF CROWDSOURCING

Open-sourced software (Linux)

Product voting (Threadless)

Making a commercial (Chipotle and Rock Band)

Suggestion box


CROWDSOURCING: THE BENEFITS

Companies get:

– Improved quality and productivity

– Feedback

– Good exposure

– Minimal cost


CROWDSOURCING: THE BENEFITS

People get:

– Incentive: cash

– Recognition: a sense of accomplishment among peers

– A chance to make life better:

Linux

Obama campaign

Netflix (Netflix Prize)

Cisco (I-Prize)

Steve Fossett search


GOOD RULES FOR CROWDSOURCING

Be focused

Get your filters right

Tap the right crowds

Build community into social networks


CONCLUSION

Crowdsourcing, used properly:

– Generates new ideas

– Cuts development costs

– Creates a direct, emotional bond with customers

Used improperly:

– Can produce useless, wasteful results

– Beware of mob rule: “Crowds can be wise, but they can also be stupid.”


LECTURE 8

INDIAN BIG DATA COMPANIES


INDIA – BIG DATA

Big Data is gaining traction in India.

There are huge market opportunities for IT services (82.9% of revenues) and analytics firms (17.1%).

The current market size is $200 million, projected to reach $1 billion by 2015.

The opportunity for Indian service providers lies in offering services around Big Data implementation and analytics for global multinationals.


FUTURE OF BIG DATA

$15 billion has been spent on software firms specializing solely in data management and analytics.

This industry on its own is worth more than $100 billion and is growing at almost 10% a year, roughly twice as fast as the software business as a whole.

In February 2012, the open-source analyst firm Wikibon released the first market forecast for Big Data, listing $5.1B in revenue in 2012, with growth to $53.4B in 2017.

The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020.
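The two McKinsey figures quoted above are roughly consistent with each other: 40% annual growth compounded over the 11 years from 2009 to 2020 gives about 40x, in the same ballpark as the quoted 44x. A quick Python check:

```python
# Compound growth check: 40% per year from 2009 to 2020.
annual_growth = 0.40
years = 2020 - 2009  # 11 years

multiplier = (1 + annual_growth) ** years
print(round(multiplier, 1))  # 40.5 — close to the quoted 44x
```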