Big data road map

WDABT 2016 – BHARATHIAR UNIVERSITY

Transcript of Big data road map

Page 1: Big data road map

WDABT 2016 – BHARATHIAR UNIVERSITY

Page 2: Big data road map

Dr. V. Bhuvaneswari, Assistant Professor

Department of Computer Applications, Bharathiar University

[email protected], [email protected]

visit at www.budca.in/faculty.php

BIG DATA ROADMAP

Page 3: Big data road map

Big Data Roadmap
• Timeline – Big Data Predictions
• Data Growth in Units
• Data Landscape
• Data Explosion
• Big Data Myths
• Big Data
• 5Vs of Big Data
• Why Big Data
• Data as Data Science

Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Computer Applications, Bharathiar University

Page 4: Big data road map

Timeline – Big Data Predictions
1944 - Yale Library in 2040 will have “approximately 200,000,000 volumes”.
1961 - Scientific journals will grow exponentially rather than linearly, doubling every fifteen years and increasing by a factor of ten during every half-century.
1975 - The Ministry of Posts and Telecommunications in Japan introduced words as the unifying unit of measurement.
1997 - Michael Cox and David Ellsworth published the first article in the ACM digital library to use the term “Big data.”

The term Big Data appeared in 1997, grew rapidly around 2010 and became popular in 2012.
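The two figures in the 1961 prediction are consistent with each other: steady doubling every fifteen years works out to roughly a tenfold increase over fifty years. The short check below is only an illustrative sketch; the function name `growth_factor` is introduced here and is not from the slides.

```python
# Check that "doubling every 15 years" matches "a factor of ten every half-century".
DOUBLING_PERIOD_YEARS = 15


def growth_factor(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Factor by which a quantity grows after `years` of steady doubling."""
    return 2 ** (years / doubling_period)


half_century = growth_factor(50)
print(f"Growth over 50 years: {half_century:.2f}x")
```

The result is slightly above 10, so the two statements describe the same exponential curve.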

Page 5: Big data road map


Data Growth – in Units


Page 6: Big data road map


Data Landscape

Page 7: Big data road map

BIG DATA FACTS
• Every 2 days we create as much information as we did from the beginning of time until 2003.
• Over 90% of all the data in the world was created in the past 2 years.
• It is expected that by 2020 the amount of digital information in existence will have grown from 3.2 zettabytes today to 40 zettabytes.
• Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand tweets, and upload 200,000 photos to Facebook.
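To get a feel for the per-minute figures, they can be scaled to daily totals. The sketch below does only that arithmetic; the dictionary keys are labels chosen here for illustration, and the numbers are the ones quoted on the slide.

```python
# Scale the per-minute activity figures quoted on the slide to per-day totals.
MINUTES_PER_DAY = 24 * 60

per_minute = {
    "emails": 204_000_000,
    "facebook_likes": 1_800_000,
    "tweets": 278_000,
    "facebook_photos": 200_000,
}

per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

for name, count in per_day.items():
    print(f"{name}: {count:,} per day")
```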

Page 8: Big data road map

Big Data Explosion
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by end of 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014

Page 9: Big data road map

Data Deluge

Page 10: Big data road map

Big Data Market Size

Page 11: Big data road map

Potential Talent Pool – Big Data

India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.

Page 12: Big data road map

BIG DATA MYTHS
Big Data
• New
• Only About Massive Data Volume
• Means Hadoop
• Need A Data Warehouse
• Means Unstructured Data
• for Social Media & Sentiment Analysis

Page 13: Big data road map

Let Us Clarify

Page 14: Big data road map

Big Data
Big Data is
• A complete subject with tools, techniques and frameworks.
• Technology that deals with large and complex datasets which vary in data format and structure and do not fit into memory.
• Not just about huge volumes of data; it provides an opportunity to find new insights into existing data, and guidelines for capturing and analyzing future data.

Page 15: Big data road map

Big Data: A Definition
Big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies.

Source: Harness the Power of Big Data: The IBM Big Data Platform

Page 16: Big data road map

BIG DATA as Platform

Source: IBM

Page 17: Big data road map

4 V's of Big Data

Page 18: Big data road map

5Vs of Big Data
• Volume
• Velocity
• Variety
• Veracity
• Value

Page 19: Big data road map

Why Big Data?

Page 20: Big data road map

The 5 Key Big Data Use Cases
• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Operations Analysis: Analyze a variety of machine data for improved business results

Page 21: Big data road map


Page 22: Big data road map

Data Science
The term “Data Science” was used by statisticians and economists in the early 1970s and was defined by Peter Naur in 1974.

“Data Science” has gained popularity in the last couple of years because of massive data deposits.

The use of Big Data technology to explore data in large corporations, government and industry made the term data science catchy.

Page 23: Big data road map

Data Science as Discipline
Data Science has emerged as a new discipline to provide deep insight into large volumes of data.

Data Science is a fusion of major disciplines such as Computational Algorithms, Statistics and Visualization.

90% of the world’s data has been created in the last two years, which includes 10% structured data and 80% unstructured data.

The digital universe is in a data deluge and is estimated to be larger than the physical universe; the unit of data measurement is predicted to reach geopbytes.

Page 24: Big data road map


Page 25: Big data road map


Data Growth in Bytes


Page 26: Big data road map

Data Classification
◦ Open Data
◦ Closed Data
◦ Hot Data
◦ Warm Data
◦ Cold Data
◦ Thin Data
◦ Thick Data
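The slide does not define the hot, warm and cold categories. In common usage they describe how recently or frequently data is accessed, and the sketch below illustrates that idea only. The thresholds (7 and 90 days) and the function name `temperature_tier` are assumptions made for this example, not values from the talk.

```python
from datetime import date

# Assumed thresholds for illustration only; real systems tune these per workload.
HOT_MAX_AGE_DAYS = 7
WARM_MAX_AGE_DAYS = 90


def temperature_tier(last_accessed: date, today: date) -> str:
    """Classify a data item as hot, warm or cold by days since last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"


today = date(2016, 1, 31)
for last in (date(2016, 1, 30), date(2015, 12, 1), date(2015, 1, 1)):
    print(last.isoformat(), temperature_tier(last, today))
```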

Page 27: Big data road map

Data Analytics – Need for today
Data is considered a digital asset, similar to other property.

Organizations believe that the data they generate will provide deep insights into their business processes and help them arrive at strategic decisions.

The earlier limitations of computational storage and processing have been overcome by cloud computing and big data techniques.

Page 28: Big data road map

Data Science Components
• Pre-Processing: ETL
• Dashboards: Charts (Pie, Bar, Histogram)
• Data Models: Linear Regression, Decision Tree, Dimensionality Reduction
• Clustering, Outlier Analysis, Association Analysis
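Of the data models listed above, linear regression is the simplest to show end to end. The following is a minimal sketch of an ordinary least-squares line fit in plain Python; the function name `fit_line` and the toy data are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In practice a library routine would normally be used; the hand-written version is shown only to make the calculation visible.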

Page 29: Big data road map

Data Science - Big Data Technology
Collect, Load, Transform
◦ ETL: SCRIBE, FLUME
Store
◦ HADOOP, SPARK, STORM
Process, Analyze and Reasoning
◦ Computational Algorithms
◦ Statistical Methods and Models
◦ R, PIG, HIVE, PYTHON, JAVA, SCALA, CLOJURE, MAHOUT
Visualization
◦ DASHBOARD, APP
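Hadoop, listed above, is built around the MapReduce programming model. The sketch below imitates the three MapReduce stages (map, shuffle, reduce) for a word count in plain Python on a single machine. It does not use the Hadoop API, and the function names are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data road map", "big data analytics"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

On a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data across the network; the logic per record stays the same.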

Page 30: Big data road map

Data Science Vs Data Analytics
Data Science is a discipline that groups techniques and methods from various domains to study data; data analytics is one component of Data Science.

Data Analytics is the process of analyzing a dataset to find deep insights using computational algorithms and statistical methods. There is no common procedure for analyzing all datasets.

Page 31: Big data road map


Data Analytics Vs Big Data Analytics

Data Analytics is used to explore and analyze datasets using statistical methods and models.

Big Data Analytics is used to analyze data with the characteristics of Volume, Velocity and Variety by integrating statistics, mathematics and computational algorithms on a Big Data platform.

Page 32: Big data road map

Data Science – Emerging Roles
Data Scientist: responsible for scrubbing data to bring out deep insights from the data.
Skills: Expert in CS, Mathematics, Statistics; works on open-ended research problems

Data Engineer: responsible for managing and administering the infrastructure and storage of data.
Skills: Strong skills in programming and software engineering; deep knowledge of data warehousing; expertise in Hadoop, NoSQL and SQL technologies

Data Analyst: views the data from one source and has deep insight into the data based on the organization's guidance.
Skills: Competency in understanding statistics

Page 33: Big data road map


Data Analytics Use Case Scenario

Page 34: Big data road map

Data Science Applications
• Data Personalization - Logs, Tweets, Likes
• Smart Pricing – Air Transportation
• Financial Services – Fraud Detection
• Insurance
• Smart Grids – Energy Management

Page 35: Big data road map

Air Fare Management – Use case 1
Objectives: Hike airfare based on High Value Customers - CRM.
Strategic decisions require an understanding of data insights:
◦ How are customers divided?
◦ Which customers are high-value customers?
◦ Who is a frequent flyer?
◦ How to retain customers?
Data sources:
◦ Conventional enterprise information
◦ Data from weblogs, social media, competitors' pricing
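The questions in this use case (who is high value, who is a frequent flyer) amount to segmenting customers. A minimal rule-based sketch follows. The `Customer` fields, the thresholds and the label names are all hypothetical and chosen only to illustrate the idea; a real system would derive them from the enterprise and web data sources.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
FREQUENT_FLYER_MIN_FLIGHTS = 12
HIGH_VALUE_MIN_SPEND = 100_000.0


@dataclass
class Customer:
    name: str
    flights_per_year: int
    annual_spend: float


def segment(customer: Customer) -> set:
    """Return the set of segment labels that apply to a customer."""
    labels = set()
    if customer.flights_per_year >= FREQUENT_FLYER_MIN_FLIGHTS:
        labels.add("frequent_flyer")
    if customer.annual_spend >= HIGH_VALUE_MIN_SPEND:
        labels.add("high_value")
    return labels or {"standard"}


customers = [
    Customer("A", flights_per_year=20, annual_spend=250_000.0),
    Customer("B", flights_per_year=15, annual_spend=40_000.0),
    Customer("C", flights_per_year=2, annual_spend=8_000.0),
]
for c in customers:
    print(c.name, sorted(segment(c)))
```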

Page 36: Big data road map

Data Engineering
Airfare Classification (Economy, Business, First)
Analyse factors (Enterprise Data sources) – Data Exploration techniques
◦ Passenger booking information
◦ Forecasted data - Statistics
◦ Inventory
Customers' behavioral data - Predictive Analytics – Statistical models – Decision tree, classification
Information has to be gained from websites that provide route information, dining, preferable locations
Holistic Analytics
◦ Analyzing customer data from social profiles, sales, CRM etc.

Page 37: Big data road map

Complexities and Challenges
◦ Data is larger than terabytes
◦ Data integration
◦ Variety of data formats
Solution
◦ Big data accelerators
◦ Hadoop ecosystem
◦ Analytic components
◦ Integrated data warehouses

Source: Big data spectrum, Infosys

Page 38: Big data road map

Insurance Fraud Detection – Use case Scenario
Data Engineering
◦ Verifying customer data
◦ Customer profile analysis
◦ Verification of claims raised
◦ Fraud detection from disparate systems
◦ Exact claim reimbursement
Data Sources
◦ Data about customers and products sold, from ERP and CRM
◦ Credit history from other sources
◦ Data from social networking – customer profiles, product rating, credit rating from 3rd parties
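One simple way to start verifying claims is outlier analysis on claim amounts. The sketch below flags claims whose z-score exceeds a cut-off. It is an illustration, not the method used in the talk: the claim amounts are invented, the threshold of 2.0 is an assumption, and a flagged claim is only a candidate for review, not proof of fraud.

```python
from statistics import mean, pstdev

Z_THRESHOLD = 2.0  # assumed cut-off for this example


def flag_outliers(amounts, threshold=Z_THRESHOLD):
    """Return indices of amounts whose absolute z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


# Invented claim amounts; the last one is far larger than the rest.
claims = [1000, 1200, 1100, 900, 1050, 9000]
flagged = flag_outliers(claims)
print("Claims to review:", [claims[i] for i in flagged])
```

With very few data points a single extreme value inflates the standard deviation, so production systems usually prefer more robust measures (for example median-based ones) and combine them with the profile and credit data listed above.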

Page 39: Big data road map

Health Epidemics
Data Engineering
◦ Kind of epidemics and target users
◦ Causes and effects with respect to locations
◦ Environmental and other related issues of epidemics
◦ Data on awareness
Data Sources
◦ EHR records, medical insurance claims, social media – awareness, ERP systems
Data Analytics
◦ Descriptive Analytics
◦ Predictive Analytics (model-based analysis)

Page 40: Big data road map

Big Data Challenges
Privacy Protection
◦ All Big data stages: collect, store, process, knowledge
Integration with enterprise landscape
◦ All systems store data in RDBMS, DW
◦ Does not support bulk loading to Big data store
◦ Limited number of analytics from Mahout
◦ Big data technologies lack visualization support and deliverable methods
Leveraging cloud computing for big data applications
Addressing real-time needs with varied format and volume

Page 41: Big data road map


PART B : Big Data Use Cases – Scenario

Page 42: Big data road map


Big Data Applications


Page 43: Big data road map

Big Data Applications - India
• Big Data – Elections
• SBI uses big data mining to check defaults
• Karnataka Govt – Identify water leakage

Page 44: Big data road map

Big Data - Election
Data from every Internet user in the country was mined to accurately understand voter sentiments and local issues.

Data-based analysis was used to raise funds and create different models for different regions, targeting local issues.

Elections in India involve more than 800 million voters with different ideologies and expectations.

Innovative usage of Big Data marked a huge change in the way elections were traditionally fought.

Page 45: Big data road map

Data Analytics
Modak Analytics built the electoral data, processing huge volumes of unstructured data (around 10TB of PDF documents) as well as structured data.

Modak chose Hadoop and self-built a 64-node cluster with 128TB of storage. Apart from Hadoop, the team used PostgreSQL as the front-end database.

They developed Rapid ETL to overcome the difficulties of getting data into Hadoop.
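The cluster figures above allow a rough capacity check. The sketch below assumes HDFS's default replication factor of 3; whether this cluster actually ran with the default is not stated in the slides, so the usable-capacity figure is an estimate under that assumption.

```python
# Figures quoted on the slide.
NODES = 64
TOTAL_STORAGE_TB = 128
UNSTRUCTURED_DATA_TB = 10

# HDFS default replication factor; assumed here, not confirmed by the source.
REPLICATION_FACTOR = 3

per_node_tb = TOTAL_STORAGE_TB / NODES
usable_tb = TOTAL_STORAGE_TB / REPLICATION_FACTOR
raw_needed_tb = UNSTRUCTURED_DATA_TB * REPLICATION_FACTOR

print(f"Storage per node: {per_node_tb:.1f} TB")
print(f"Usable capacity at {REPLICATION_FACTOR}x replication: {usable_tb:.1f} TB")
print(f"Raw space for the {UNSTRUCTURED_DATA_TB} TB of PDFs: {raw_needed_tb} TB")
```

Under that assumption the 10TB of PDF documents would occupy about 30TB of raw disk, leaving headroom for structured data and intermediate results.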

Page 46: Big data road map

SBI
State Bank of India (SBI) recently ran its newly acquired data-mining software to check the purity of its data.

It made an interesting find: close to one crore (10 million) account holders have not provided any nomination for their savings accounts. What is worse, over half of them are senior citizens.

To analyse trends in banking, SBI has hired a whole team of statisticians and economists.

The aim is to identify default patterns and high-value customers.
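A data-purity check like the nomination finding can be expressed as a simple filter over account records. The sketch below is illustrative only: the record fields (`id`, `age`, `nominee`), the sample accounts and the senior-citizen age of 60 are assumptions made for this example, not details of SBI's software.

```python
SENIOR_CITIZEN_MIN_AGE = 60  # assumed cut-off for this example

# Invented sample records; a missing nominee is stored as None or an empty string.
accounts = [
    {"id": "A1", "age": 67, "nominee": None},
    {"id": "A2", "age": 34, "nominee": "R. Kumar"},
    {"id": "A3", "age": 72, "nominee": ""},
    {"id": "A4", "age": 45, "nominee": None},
]


def missing_nominee(records):
    """Return the records that have no nominee recorded."""
    return [r for r in records if not r.get("nominee")]


missing = missing_nominee(accounts)
seniors = [r for r in missing if r["age"] >= SENIOR_CITIZEN_MIN_AGE]
senior_share = len(seniors) / len(missing)

print("Accounts without nominee:", [r["id"] for r in missing])
print(f"Share held by senior citizens: {senior_share:.0%}")
```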

Page 47: Big data road map


QUERIES?


Page 48: Big data road map


THANK YOU
