Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai - Dec 2013

Manoj Kolhe, Project Lead, Testing Service Line, L&T Infotech. India Testing Week 2013: Big Data Testing. December 10, 2013, Mumbai. www.unicomlearning.com, www.nextgentesting.org

Transcript of Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai - Dec 2013

Page 1: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Manoj Kolhe

Project Lead, Testing Service Line, L&T Infotech

India Testing Week 2013

Big Data Testing

December 10, 2013, Mumbai

www.unicomlearning.com

www.nextgentesting.org

Page 2: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Introduction to Big Data

Web Logs

Social Network Data

Transactional Data

Statistical Data

Database Cluster

Database of Databases

Big Data

BI Reporting

Trend Analysis

Decision Making

Page 3: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Defining Big Data

Wikipedia: The term "big data" is applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time.

Gartner: Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.

Gartner: ...the real issue is making sense of big data and finding patterns in it that help organizations make better business decisions. “Look for patterns that support business decisions in what we call Pattern-Based Strategy.”

Forrester: Opportunities to improve the bottom line exist in a flood of information; however, gaining insight from data becomes challenging as it grows extremely large. Emerging technology applies the power of distributed, virtual computing to the problem of large data,

SAS: With Big Data Analytics, organizations can make better and faster decisions based not only on what has happened, but on what will happen next. They can also predict the best possible outcome while remaining agile in swiftly changing times.

Page 4: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

How Big is Big Data?

175 million tweets a day

465 million user accounts

30+ PB user data

100 TB daily uploads

50 billion user photos

20+ PB daily data

300 billion videos

Total runtime: 47 million years

48 hours of video each minute of the day

5 billion users are calling, texting, browsing data

2.9 million emails exchanged every second

1.3 exabytes (~10^18 bytes)

Popular 5 Vs of Big Data:

• Volume
• Variety
• Velocity
• Viability
• Value

90% of the data in the world today has been created in the last two years alone.

Page 5: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Market Statistics & Applications of Big Data

Page 6: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Big Data – Use Case

Case Study – Telematics Domain

Objectives

• Create a state-of-the-art analytic environment to support and fuel the fast-growing telematics industry
• Increase operational efficiency

Solution

• Built a comprehensive, scalable, and reliable service platform to support an end-to-end analytic environment, covering everything from operational data to robust predictive analytic applications

Technology Implemented

• MongoDB
• Enterprise Service Bus
• Big Data predictive analytics software

Outcome

• Reduced the end-to-end predictive analytics process from months to days
• Improved marketing campaign effectiveness with 65% model accuracy and efficacy

Page 7: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Traditional RDBMS vs NoSQL

Feature
RDBMS: Row-column, structured
NoSQL: Semi-structured and unstructured; parallelism, batch processing

Scalability
RDBMS: Vertical, by adding resources to a system
NoSQL: Horizontal, by adding (replicating) nodes

Data Handling / Extraction
RDBMS: Slower for large data volumes in analytical workloads; partitioning
NoSQL: Fast access in both operational and analytical workloads; sharding

Price
RDBMS: Mostly proprietary, e.g. Oracle, DB2, SQL Server
NoSQL: Open source, e.g. Hadoop, Cloudera, Amazon, Hortonworks
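
The sharding mentioned in the comparison above can be illustrated with a small sketch. This is not how any particular NoSQL product routes data; it only shows the general idea of hash-based sharding, where a stable hash of the record key decides which node stores the record. The names `shard_for` and `distribute` are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping
    is the same across processes and runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard they would be stored on."""
    shards = {index: [] for index in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    layout = distribute([f"user-{n}" for n in range(10)], num_shards=3)
    for index, members in layout.items():
        print(index, members)
```

A tester can use the same routing function to predict where a record should land and then confirm that the cluster stored it there. Note that plain modulo routing moves most keys when the shard count changes, which is one reason real systems often use other schemes.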

Page 8: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Big Data – Test Environment

Test Environment Setup

• Infrastructure
  • Cluster setup: evaluate data nodes
  • Software / platform
  • File system, NoSQL DB
• Test Data
  • Off-peak hours traffic testing
  • Staging environments / scaled-down models
  • Historical data / test data generation using utilities
• Test Infrastructure
  • Test strategy, testing release cycle
  • Volume of data consideration
  • 3rd-party tools
• Load Simulators
  • JMeter for multi-threaded users
  • Load simulation using cloud
• Monitoring
  • Hadoop performance monitoring
  • ECL Watch (HPCC)
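
As an illustration of the "test data generation using utilities" item above, here is a minimal sketch of a generator for synthetic web log lines. The log layout, the field values, and the function name `generate_web_logs` are assumptions made for this example; a real project would match the format of its own sources.

```python
import random
from datetime import datetime, timedelta


def generate_web_logs(count, seed=42):
    """Yield `count` synthetic log lines: timestamp, IP, method, path, status.

    A seeded random.Random instance keeps the data reproducible, so a failing
    test can be re-run against exactly the same input.
    """
    rng = random.Random(seed)
    start = datetime(2013, 12, 10)
    paths = ["/home", "/search", "/cart", "/checkout"]
    statuses = [200, 200, 200, 404, 500]  # skewed towards successful requests
    for offset in range(count):
        timestamp = start + timedelta(seconds=offset)
        ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        yield f"{timestamp.isoformat()} {ip} GET {rng.choice(paths)} {rng.choice(statuses)}"


if __name__ == "__main__":
    for line in generate_web_logs(5):
        print(line)
```

Because the generator is lazy, the same function can produce a handful of rows for a scaled-down model or many millions of rows written to a file for a volume test.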

Page 9: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Requirements in Big Data

Sources: Logging Data, Data Streams, Social Data, Traditional RDBMS

Stages: Store, Process, Analyze, Reporting

Components: Hadoop HDFS, Enterprise Data Warehouse (Processed Data), Big Data Analytics, BI Reporting

Validation points:

1. Pre-Hadoop Validation
2. HDFS Data Validation
3. ETL Data Validation
4. Reports Data Testing

Ref: Infosys Big Data Solutions

Page 10: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Approach for Data Validation

Data Loading in HDFS (Unstructured Data, Structured Data)

Managing the data

Processed Data

Data Validation (against Expected Results)
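
The last step above, validating processed data against expected results, can be sketched as a keyed comparison. This assumes both data sets are small enough to load into dictionaries keyed by a record identifier, which suits a scaled-down test run but not a full production volume. The function name `compare_results` is hypothetical.

```python
def compare_results(expected, actual):
    """Compare two {key: value} mappings and report every difference.

    Returns three sorted lists: keys missing from the actual output, keys
    that appear only in the actual output, and keys whose values differ.
    """
    missing = sorted(key for key in expected if key not in actual)
    unexpected = sorted(key for key in actual if key not in expected)
    mismatched = sorted(
        key for key in expected if key in actual and expected[key] != actual[key]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


if __name__ == "__main__":
    expected = {"a": 1, "b": 2, "c": 3}
    actual = {"a": 1, "b": 5, "d": 4}
    print(compare_results(expected, actual))
```

Reporting the three kinds of difference separately makes defects easier to triage: missing keys usually point at extraction or loading, unexpected keys at filtering, and mismatched values at processing logic.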

Page 11: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Landscape

• ACQUIRE: Acquire all available data
• ORGANIZE: Organize and clean data with parallel processing
• ANALYZE: Analyze all data, at once
• BUSINESS USE: Take business decisions based on active data

Tapping unused, new datasets

Build new relationships and understanding

Data-driven business decisions

Page 12: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Landscape – Acquire

Data Validation
• Data comparison
• Extraction of right data
• HDFS loading
• Data replication
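
One way to approach the data comparison and HDFS loading checks above is to reconcile a record count and a checksum between the source extract and the loaded copy. The sketch below is an illustration under assumptions: records are treated as plain strings, and `dataset_fingerprint` is a hypothetical helper, not part of Hadoop. The checksum is order-independent because a distributed load does not usually preserve record order.

```python
import hashlib

MODULUS = 2 ** 256


def dataset_fingerprint(records):
    """Return (record_count, checksum) for an iterable of string records.

    Each record is hashed with SHA-256 and the hashes are summed modulo
    2**256, so the result does not depend on the order of the records,
    while dropped, altered, or duplicated records change it.
    """
    count = 0
    total = 0
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, format(total, "064x")


if __name__ == "__main__":
    source = ["1,alice", "2,bob", "3,carol"]
    loaded = ["3,carol", "1,alice", "2,bob"]  # same records, different order
    print(dataset_fingerprint(source) == dataset_fingerprint(loaded))
```

The function reads records one at a time, so it can be applied to a streamed file on each side without holding the data set in memory.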

Page 13: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Landscape – Organize

MR Job Validation
• MR job logic
• Aggregation and consolidated data
• Job output against source files
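
Checking job output against source files can be done by recomputing the aggregation independently on a sample and comparing. The sketch below assumes a word-count style job, chosen only because it is the usual MapReduce example and not taken from the slides, and assumes the job writes tab-separated "key, value" text lines. All function names are hypothetical.

```python
from collections import Counter


def reference_counts(source_lines):
    """Recompute the expected aggregation directly from the source lines."""
    counts = Counter()
    for line in source_lines:
        counts.update(line.split())
    return dict(counts)


def parse_job_output(output_lines):
    """Parse tab-separated 'key<TAB>count' lines into a dictionary."""
    result = {}
    for line in output_lines:
        key, value = line.rstrip("\n").split("\t")
        result[key] = int(value)
    return result


def job_output_matches(source_lines, output_lines):
    """True when the job output equals the independently computed result."""
    return reference_counts(source_lines) == parse_job_output(output_lines)


if __name__ == "__main__":
    source = ["big data testing", "big data"]
    output = ["big\t2", "data\t2", "testing\t1"]
    print(job_output_matches(source, output))
```

The reference implementation is deliberately simple and single-threaded: its value as a test oracle comes from sharing no code with the job under test.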

Page 14: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Landscape – Analyze

Data Transformation
• Transformation logic
• Data cleansing
• Data transfer
• Data integrity
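
The cleansing and integrity checks above can be expressed as small, testable functions. The rules shown here (trim whitespace, treat empty strings as missing, lower-case an `email` field, require a unique `id`) are hypothetical examples and not rules from the talk; the point is that each transformation rule gets its own assertion.

```python
def cleanse(record):
    """Apply example cleansing rules to one record (a dict of field values)."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None  # empty strings are treated as missing
        cleaned[field] = value
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def integrity_violations(records, key_field="id"):
    """Return (row_index, problem) pairs for missing or duplicate keys."""
    problems = []
    seen = set()
    for index, record in enumerate(records):
        key = record.get(key_field)
        if key is None:
            problems.append((index, "missing key"))
        elif key in seen:
            problems.append((index, "duplicate key"))
        else:
            seen.add(key)
    return problems


if __name__ == "__main__":
    print(cleanse({"id": 1, "email": "  A@B.COM ", "name": ""}))
    print(integrity_violations([{"id": 1}, {"id": 1}, {"name": "x"}]))
```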

Page 15: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Testing Landscape – Business Use

Reports Definition
• Report data validation
• OLAP cube testing
• Dashboard testing
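
Report data validation and OLAP cube testing often come down to confirming that aggregated figures roll up correctly from the detail rows. The sketch below assumes detail rows are dictionaries with a numeric `amount` field and that the report exposes one total per dimension value. The field names, the tolerance, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def rollup(detail_rows, dimension):
    """Sum the 'amount' field of the detail rows per value of `dimension`."""
    totals = defaultdict(float)
    for row in detail_rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)


def report_mismatches(detail_rows, report_totals, dimension, tolerance=0.01):
    """Return {key: (expected, reported)} for every total that disagrees.

    A key present on only one side is reported with None for the other side.
    """
    expected = rollup(detail_rows, dimension)
    mismatches = {}
    for key in set(expected) | set(report_totals):
        wanted = expected.get(key)
        shown = report_totals.get(key)
        if wanted is None or shown is None or abs(wanted - shown) > tolerance:
            mismatches[key] = (wanted, shown)
    return mismatches


if __name__ == "__main__":
    rows = [
        {"region": "West", "amount": 10.0},
        {"region": "West", "amount": 5.0},
        {"region": "East", "amount": 7.5},
    ]
    print(report_mismatches(rows, {"West": 15.0, "East": 7.5}, "region"))
```

The tolerance allows for rounding applied by the reporting layer; it should be set from the report's stated precision rather than guessed.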

Page 16: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Big Data Testing Readiness

People
• Training
• Certifications
• Knowledge base

Process
• Data-centric testing process
• Risk-based testing approach
• Good practices knowledge

TDM (Test Data Management)
• Test data generation
• Data masking requirements
• Data profiling & extraction utilities

Technology
• Non-traditional automation
• End-to-end testing
• Adaptability to existing solutions
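
The data masking requirement listed under TDM can be met in several ways. One common option, shown here only as a sketch, is deterministic masking with a keyed hash: the same input always yields the same masked value, so joins between masked tables still work, while the original value cannot be read from the test data. The function names, the 12-character output length, and the secret key handling are assumptions for illustration.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes, length: int = 12) -> str:
    """Return a deterministic, non-reversible token for a sensitive value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record, sensitive_fields, secret):
    """Mask only the listed fields of a record and leave the rest unchanged."""
    return {
        field: mask_value(str(value), secret) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    secret = b"example-key-kept-outside-test-data"  # placeholder, not a real key
    row = {"id": 1, "email": "user@example.com", "city": "Mumbai"}
    print(mask_record(row, {"email"}, secret))
```

Truncating the digest keeps masked values short but raises the chance of two inputs sharing a token; for very large data sets a longer token is the safer choice.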

Page 17: Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai -  Dec 2013

Organized by

UNICOM Trainings & Seminars Pvt. Ltd.

[email protected]

Speaker name: Manoj Kolhe

Email ID: [email protected]

www.unicomlearning.com

Thank You

www.nextgentesting.org