Manoj Kolhe - Unicom - India Testing Week - Big Data Testing - Mumbai - Dec 2013
www.unicomlearning.com
Manoj Kolhe, Project Lead, Testing Service Line, L&T Infotech
India Testing Week 2013
Big Data Testing
December 10, 2013, Mumbai
www.nextgentesting.org
Introduction to Big Data
Inputs: Web Logs, Social Network Data, Transactional Data, Statistical Data, Database Cluster, Database of Databases
Big Data
Outputs: BI Reporting, Trend Analysis, Decision Making
Defining Big Data
Wikipedia:
The term Big Data is applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time.
Gartner:
Big Data are high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.
Gartner:
...the real issue is making sense of big data and finding patterns in it that help organizations make better business decisions: "look for patterns that support business decisions in what we call Pattern-Based Strategy."
Forrester:
Opportunities to improve the bottom line exist in a flood of information; however, gaining insight from data becomes challenging as it grows extremely large. Emerging technology applies the power of distributed, virtual computing to the problem of large data.
SAS:
With Big Data Analytics, organizations can make better and faster decisions based not only on what has happened, but what will happen next. They can also predict the best possible outcome while remaining agile in swiftly changing times.
How Big is Big Data?
• 175 million tweets a day
• 465 million user accounts
• 30+ PB of user data
• 100 TB of daily uploads
• 50 billion user photos
• 20+ PB of data daily
• 300 billion videos
• Total runtime of 47 million years
• 48 hours of video every minute of the day
• 5 billion users calling, texting and browsing
• 2.9 million emails exchanged every second
• 1.3 exabytes (1 EB ≈ 10^18 bytes)
Popular 5 Vs of Big Data: Volume, Variety, Velocity, Viability, Value
90% of the data in the world today has been created in the last two years alone.
Market Statistics & Applications of Big Data
Big Data – Use Case
Case Study – Telematics Domain
Objectives
Create a state-of-the-art analytic environment to support and fuel the fast-growing telematics industry
Increase operational efficiency
Solution
Built a comprehensive, scalable and reliable service platform supporting an end-to-end analytic environment, from operational data to robust predictive analytics applications
Technology Implemented
MongoDB
Enterprise Service Bus
Big Data predictive analytics software
Outcome
Reduced the end-to-end predictive analytics process from months to days
Improved marketing campaign effectiveness with 65% model accuracy and efficacy
Traditional RDBMS vs NoSQL
Feature | RDBMS | NoSQL
Data model | Row-column, structured | Semi-structured and unstructured; parallelism, batch processing
Scalability | Vertical (scale up a single server) | Horizontal (scale out by adding and replicating nodes)
Data handling / extraction | Slower for large analytical data volumes; relies on partitioning | Fast access for both operational and analytical workloads; relies on sharding
Price | Mostly proprietary, e.g. Oracle, DB2, SQL Server | Largely open source, e.g. Hadoop, with offerings from Cloudera, Hortonworks and Amazon
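To make the "horizontal scaling by sharding" row concrete, here is a minimal sketch of hash-based sharding, assuming records are identified by a string key and the cluster has a fixed number of nodes. The function names `shard_for` and `distribute` are hypothetical and not taken from any particular NoSQL product.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard (node) index with a stable hash."""
    # Python's built-in hash() is randomised per process for strings,
    # so a SHA-256 digest is used to get the same placement on every run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

# Usage: spread 1,000 hypothetical user keys over 4 nodes.
if __name__ == "__main__":
    for shard, members in distribute([f"user-{i}" for i in range(1000)], 4).items():
        print(shard, len(members))
```

Plain modulo placement remaps most keys when the node count changes; production stores typically use range/chunk-based schemes or consistent hashing to limit that reshuffling, which is worth covering in data-replication tests.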
Big Data – Test Environment
Test Environment Setup
• Infrastructure
  - Cluster setup: evaluate data nodes
  - Software / platform
  - File system, NoSQL DB
• Test Data
  - Off-peak-hours traffic testing
  - Staging environments / scaled-down models
  - Historical data / test data generation using utilities
• Test Infrastructure
  - Test strategy, testing release cycle
  - Volume of data considerations
  - Third-party tools
• Load Simulators
  - JMeter for multi-threaded users
  - Load simulation using the cloud
• Monitoring
  - Hadoop performance monitoring
  - ECL Watch (HPCC)
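"Test data generation using utilities" can be as simple as a seeded generator, so that every run of a scaled-down model sees the same data. The sketch below assumes a web-log feed; the field names, value ranges and the helpers `generate_web_logs` and `to_csv` are all hypothetical and should be replaced with the real source layout.

```python
import csv
import io
import random
from datetime import datetime, timedelta

FIELDS = ["event_id", "timestamp", "user_id", "path", "status", "bytes"]

def generate_web_logs(n, seed=42, start=datetime(2013, 12, 1)):
    """Yield n synthetic web-log records (illustrative fields only)."""
    rng = random.Random(seed)  # fixed seed: the same data on every run
    paths = ["/", "/search", "/product", "/cart", "/checkout"]
    statuses = [200, 200, 200, 304, 404, 500]  # weighted toward success
    for i in range(n):
        yield {
            "event_id": i,
            "timestamp": (start + timedelta(seconds=rng.randint(0, 86_399))).isoformat(),
            "user_id": f"u{rng.randint(1, 500):04d}",
            "path": rng.choice(paths),
            "status": rng.choice(statuses),
            "bytes": rng.randint(200, 50_000),
        }

def to_csv(rows):
    """Serialise records as CSV text, e.g. for staging into a test cluster."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    print(to_csv(generate_web_logs(5)))
```

Raising `n` scales the volume for load-oriented runs, while keeping the seed fixed keeps expected results reproducible.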
Testing Requirements in Big Data
Data flow: data sources (Logging Data, Data Streams, Social Data, Traditional RDBMS) → Store and Process (Hadoop HDFS) → Analyze (Enterprise Data Warehouse, Big Data Analytics) → Reporting (BI Reporting on processed data)
Validation checkpoints:
1. Pre-Hadoop Validation
2. HDFS Data Validation
3. ETL Data Validation
4. Reports Data Testing
Ref: Infosys Big Data Solutions
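Pre-Hadoop validation (checkpoint 1) catches malformed records before they are loaded. The following is a minimal sketch, assuming comma-delimited source lines with three fields; `SCHEMA`, `validate_record` and `pre_load_report` are hypothetical names, and the rules are placeholders for the real feed specification.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical schema for an incoming feed: field name -> validator.
SCHEMA: dict[str, Callable[[str], bool]] = {
    "event_id": str.isdigit,
    "user_id": lambda v: v.startswith("u") and v[1:].isdigit(),
    "status": lambda v: v.isdigit() and 100 <= int(v) <= 599,
}

def validate_record(line: str, sep: str = ",") -> list[str]:
    """Return the problems found in one delimited source record."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(SCHEMA):
        return [f"expected {len(SCHEMA)} fields, got {len(fields)}"]
    problems = []
    for (name, check), value in zip(SCHEMA.items(), fields):
        if not value:
            problems.append(f"{name}: empty")
        elif not check(value):
            problems.append(f"{name}: invalid value {value!r}")
    return problems

def pre_load_report(lines):
    """Split source lines into loadable lines and (line number, problems)."""
    good, rejected = [], []
    for n, line in enumerate(lines, start=1):
        problems = validate_record(line)
        if problems:
            rejected.append((n, problems))
        else:
            good.append(line)
    return good, rejected
```

Only the `good` lines would then move on to HDFS loading, and the rejected count becomes a reconciliation figure for checkpoint 2.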
Testing Approach for Data Validation
Data Loading in HDFS
1. Unstructured and structured data are loaded into HDFS
2. Managing the data
3. Processed data
4. Data validation against expected results
Testing Landscape
• ACQUIRE: acquire all available data
• ORGANIZE: organize and clean data with parallel processing
• ANALYZE: analyze all data at once
• BUSINESS USE: make business decisions based on active data
Tapping unused and new datasets
Building new relationships and understanding
Data-driven business decisions
Testing Landscape – Acquire
Data Validation
• Data comparison
• Extraction of the right data
• HDFS loading
• Data replication
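Data comparison usually means reconciling the source extract against what was actually landed (for example, rows read back from HDFS). A minimal sketch, assuming both sides fit in memory as lists of dicts sharing a key column; `compare_datasets` is a hypothetical helper, and at real volumes the same logic would run as a distributed job.

```python
from collections import Counter

def compare_datasets(source, target, key):
    """Reconcile two lists of dict records keyed by `key`.

    Reports keys missing from the target, keys present only in the
    target, keys whose records differ, and keys loaded more than once.
    """
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}  # later duplicates overwrite earlier ones
    counts = Counter(r[key] for r in target)
    return {
        "missing": sorted(src.keys() - tgt.keys()),
        "extra": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
        "duplicated": sorted(k for k, c in counts.items() if c > 1),
    }
```

The `duplicated` bucket is aimed at re-run loads and over-replicated writes, which plain row counts can hide when they coincide with missing rows.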
Testing Landscape – Organize
MapReduce (MR) Job Validation
• MR job logic
• Aggregation and consolidated data
• Job output against source files
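One way to check MR job logic and "job output against source files" is to run the mapper and reducer functions in-process on a small sample and compare the result with an independent computation straight from the source. This sketch assumes hypothetical `user,page,bytes` records and a job that totals bytes per page; it tests the logic only, not cluster behaviour such as partitioning or retries.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (page, bytes) for each 'user,page,bytes' record."""
    _user, page, size = line.split(",")
    yield page, int(size)

def reduce_phase(key, values):
    """Reducer: total bytes served per page."""
    return key, sum(values)

def run_job(lines):
    """Tiny in-memory stand-in for a MapReduce run: map, shuffle, reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_phase, lines)):
        groups[key].append(value)  # shuffle: group values by key
    return dict(reduce_phase(k, v) for k, v in groups.items())

def expected_from_source(lines):
    """Independent oracle computed directly from the source records."""
    totals = {}
    for line in lines:
        _, page, size = line.split(",")
        totals[page] = totals.get(page, 0) + int(size)
    return totals
```

Beyond per-key equality, comparing the grand total of the job output with the sum over the source is a cheap check on aggregation and consolidation.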
Testing Landscape – Analyze
Data Transformation
• Transformation logic
• Data cleansing
• Data transfer
• Data integrity
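Transformation and cleansing rules are easiest to test when each rule is an explicit function with known inputs and outputs, plus integrity checks that must hold for any run. The rules below (trim ids, reject records without an id, upper-case country codes, turn unparseable amounts into null) are purely hypothetical examples, as are the names `cleanse`, `transform` and `integrity_problems`.

```python
def cleanse(record):
    """Hypothetical cleansing rules; returns a cleaned record, or None to reject it."""
    rec_id = (record.get("id") or "").strip()
    if not rec_id:
        return None  # rule: records without an id are rejected
    country = (record.get("country") or "").strip().upper() or "UNKNOWN"
    try:
        amount = round(float(record.get("amount")), 2)
    except (TypeError, ValueError):
        amount = None  # rule: unparseable amounts become null, never 0
    return {"id": rec_id, "country": country, "amount": amount}

def transform(records):
    """Apply the cleansing rules and drop rejected records."""
    return [r for r in map(cleanse, records) if r is not None]

def integrity_problems(source, output):
    """Checks that should hold for any correct run of the rules above."""
    src_ids = {(r.get("id") or "").strip() for r in source}
    problems = []
    if len(output) > len(source):
        problems.append("more output records than source records")
    for r in output:
        if r["id"] not in src_ids:
            problems.append(f"{r['id']}: not present in source")
        if r["country"] != r["country"].strip().upper():
            problems.append(f"{r['id']}: country not normalised")
    return problems
```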
Testing Landscape – Business Use
Reports Definition
• Report data validation
• OLAP cube testing
• Dashboard testing
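Report data validation and OLAP cube testing both come down to recomputing the published figures from the underlying facts and checking that roll-ups are consistent. A minimal sketch, assuming a single dimension and a `sales` measure; the helpers `rollup` and `validate_report` and the report layout (member to value, plus a total) are hypothetical.

```python
import math
from collections import defaultdict

def rollup(facts, dim, measure="sales"):
    """Aggregate a measure by one dimension, like a one-level cube slice."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dim]] += row[measure]
    return dict(totals)

def validate_report(facts, report_rows, report_total, dim, rel_tol=1e-9):
    """Compare report figures (member -> value, plus a total) with the facts."""
    expected = rollup(facts, dim)
    issues = []
    for member in sorted(expected.keys() | report_rows.keys()):
        exp, got = expected.get(member), report_rows.get(member)
        if exp is None or got is None:
            issues.append(f"{member}: missing on one side")
        elif not math.isclose(exp, got, rel_tol=rel_tol):
            issues.append(f"{member}: expected {exp}, report shows {got}")
    # Roll-up consistency: the report's rows must add up to its own total...
    if not math.isclose(sum(report_rows.values()), report_total, rel_tol=rel_tol):
        issues.append("report rows do not add up to report total")
    # ...and that total must match the grand total of the underlying facts.
    if not math.isclose(sum(expected.values()), report_total, rel_tol=rel_tol):
        issues.append("report total does not match source facts")
    return issues
```

A tolerance is used because reports built on floating-point aggregates rarely match to the last bit.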
Big Data Testing Readiness
• People: training; certifications; knowledge base
• Process: data-centric testing process; risk-based testing approach; good-practices knowledge
• TDM (Test Data Management): test data generation; data masking requirements; data profiling & extraction utilities
• Technology: non-traditional automation; end-to-end testing; adaptability to existing solutions
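For the data masking requirement under TDM, one common approach is deterministic pseudonymisation with a keyed hash: the same input always yields the same token, so joins across masked tables still line up. This is a sketch under that assumption, not a full masking policy; `mask_value`, `mask_records`, the `tok_` prefix and the 16-hex-character truncation are hypothetical choices.

```python
import hashlib
import hmac

def mask_value(value: str, key: bytes, prefix: str = "tok_") -> str:
    """Deterministically pseudonymise a sensitive value with HMAC-SHA256.

    The same input and key always give the same token, preserving
    referential integrity; without the key the token cannot be recomputed.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:16]  # 64 bits: collisions unlikely at test-data scale

def mask_records(records, fields, key):
    """Return copies of the records with the listed fields masked."""
    return [
        {k: mask_value(str(v), key) if k in fields else v for k, v in r.items()}
        for r in records
    ]
```

The key should be kept out of the test environment; low-entropy values such as phone numbers can be brute-forced if it leaks.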
Organized by
UNICOM Trainings & Seminars Pvt. Ltd.
Speaker name: Manoj Kolhe
Thank You