Divide-and-conquer approach towards data analytics testing

Kokila Rudresh, Devangana Khokhar. Divide-and-Conquer Testing in Data Analytics Domain. VodQA 2015

Page 1

Kokila Rudresh

Devangana Khokhar

Divide-and-Conquer Testing in Data Analytics Domain

VodQA 2015

Page 2

Data Analytics: An Introduction

Collection

Processing

Modelling

Inference

Visualization

Page 3

Data Analytics: Use Cases

Business Intelligence

Social Networks

Astronomy and Astrophysics

Robotics and Artificial Intelligence

Life Sciences

Finance and Stock Market

Medical Imaging

Computer Graphics

Computer Vision

Energy Exploration

Page 4

Data Analytics: Why Testing is Important

Volume

Domain

Complexity

Variety

Computations

Testing

Thou shalt not leave the application untested!

Page 5

Data Analytics: Testing Challenges

Data Validation

Model Implementation

Business Perspective

Page 6

Data Analytics: Typical System Implementation

[Diagram: Raw Data, ETL (Source Data, Extract, Transform, Load), Simulation, Aggregation, Visualization]

Page 7

Divide-and-Conquer Testing

[Diagram: Source Data, Extract, Transform, Load]

Pre-ETL Validations

Format

Consistency

Completeness
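
The three pre-ETL checks named on this slide (format, consistency, completeness) could be sketched roughly as follows. This is only an illustration: the column names `id`, `amount` and `date`, the date format, and the function name `validate_rows` are assumptions, not details from the talk.

```python
from datetime import datetime

# Hypothetical template: columns every source record must carry (assumption).
REQUIRED_COLUMNS = ("id", "amount", "date")


def validate_rows(rows):
    """Return a list of (row_index, problem) tuples for the given source rows."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness: every required column is present and non-empty.
        missing = [c for c in REQUIRED_COLUMNS if not row.get(c)]
        if missing:
            problems.append((i, "missing: " + ", ".join(missing)))
            continue
        # Format: amount must parse as a number, date as YYYY-MM-DD.
        try:
            float(row["amount"])
            datetime.strptime(row["date"], "%Y-%m-%d")
        except ValueError:
            problems.append((i, "bad format"))
        # Consistency: ids must be unique across the whole input.
        if row["id"] in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(row["id"])
    return problems
```

Running such checks before the ETL step keeps bad source files from being mistaken for transformation defects later in the pipeline.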

Page 8

Divide-and-Conquer Testing

[Diagram: Source Data, Extract, Transform, Load]

Post-ETL Tests

Meta-data

Data transformation

Data quality checks

Business-specific validations
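
A minimal sketch of some of these post-ETL checks, assuming the load target is a SQL table (here SQLite from the Python standard library, purely for illustration). The function name `post_etl_checks` and its parameters are hypothetical; the row-count comparison is only a coarse stand-in for transformation testing, and business-specific validations are not covered.

```python
import sqlite3


def post_etl_checks(conn, table, expected_columns, source_row_count):
    """Run basic post-load checks and return {check_name: passed}.

    `table` and the column names are interpolated into SQL, so they must
    come from trusted test configuration, never from user input.
    """
    cur = conn.cursor()

    # Meta-data: loaded column names and declared types match the expectation.
    cur.execute(f"PRAGMA table_info({table})")
    actual_columns = {(row[1], row[2]) for row in cur.fetchall()}

    # Data transformation (coarse check): no rows lost or added by the load.
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    loaded_rows = cur.fetchone()[0]

    # Data quality: none of the expected columns contains NULL.
    null_filter = " OR ".join(f"{name} IS NULL" for name, _ in expected_columns)
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {null_filter}")
    null_rows = cur.fetchone()[0]

    return {
        "metadata": actual_columns == set(expected_columns),
        "row_count": loaded_rows == source_row_count,
        "no_nulls": null_rows == 0,
    }
```

Returning one flag per check, rather than a single pass/fail, makes it easier to see which stage of the load a defect belongs to.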

Page 9

Divide-and-Conquer Testing

[Diagram: Source Data, Extract, Transform, Load]

Simulation Validations

Model Validation

Implementation

Computation

Page 10

Divide-and-Conquer Testing

[Diagram: Source Data, Extract, Transform, Load]

Aggregation Validations

Data Hierarchy

Data Scope

Summarized Values
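
One way to check summarized values and data scope is to recompute each total from the detail rows and compare it with what the aggregation layer reports. The sketch below assumes a single-level grouping and simple sums; the function name `check_summaries` and the tolerance are illustrative choices, not taken from the talk.

```python
import math
from collections import defaultdict


def check_summaries(detail_rows, reported_totals, tol=1e-9):
    """Return {group: (recomputed, reported)} for every group that disagrees.

    detail_rows: iterable of (group, value) pairs from the detail level.
    reported_totals: dict of group -> total produced by the aggregation step.
    """
    recomputed = defaultdict(float)
    for group, value in detail_rows:
        recomputed[group] += value

    mismatches = {}
    for group in set(recomputed) | set(reported_totals):
        expected = recomputed.get(group)
        reported = reported_totals.get(group)
        # Data scope: a group present on only one side is always a mismatch.
        if expected is None or reported is None:
            mismatches[group] = (expected, reported)
        # Summarized values: totals must agree within the tolerance.
        elif not math.isclose(expected, reported, abs_tol=tol):
            mismatches[group] = (expected, reported)
    return mismatches
```

For a multi-level hierarchy, the same comparison could be repeated level by level, feeding each level's verified totals in as the detail rows of the next.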

Page 11

Divide-and-Conquer Testing

[Diagram: Source Data, Extract, Transform, Load]

UI Validations

Information Representation

Data Format

Result Intuitiveness

Page 12

Learnings

ANALYSE, CODE, TEST

Initial Data Flow
• Pre-defined data template
• Pre-ETL data validations

Domain Knowledge
• KT sessions involving SMEs
• Core computations

Business Involvement
• Test data closer to real-time data
• User flows prioritization

Page 13

Learnings

ANALYSE, CODE, TEST

Implementation
• Alternate implementation
• SME validation

Computation
• Addressing the right problem
• Computational factors
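
The "alternate implementation" idea can be illustrated by comparing the code under test with an independently written reference on the same inputs. In this sketch, `production_variance` is a hypothetical stand-in for a model computation, and the reference is `statistics.pvariance` from the Python standard library; neither comes from the talk.

```python
import math
import statistics


def production_variance(values):
    """Hypothetical implementation under test: one-pass population variance."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return total_sq / n - (total / n) ** 2


def agrees_with_reference(values, rel_tol=1e-9):
    """Compare the implementation under test with an independent reference."""
    return math.isclose(
        production_variance(values),
        statistics.pvariance(values),
        rel_tol=rel_tol,
        abs_tol=1e-12,
    )
```

On small inputs such as `[1.0, 2.0, 3.0, 4.0]` the two agree, while values with a large common offset, such as `[1e9 + 1, 1e9 + 2, 1e9 + 3]`, expose the floating-point cancellation in the one-pass formula. This is the kind of computational factor that a second implementation tends to surface.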

Page 14

Learnings

ANALYSE, CODE, TEST

Testing Process
• Step-wise data validation
• Defect investigation

Test Automation
• Data combinations
• XML test data

Test Execution
• CI test execution
• Execution frequency

Test Data
• Data distribution
• Edge case data

Testing Tools
• SpreadsheetGear
• Excel macros
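
Automated tests over data combinations could be driven by a small generator like the one below, which enumerates every combination of the listed input values, edge cases included. The dimension names and values in the usage note are invented for illustration, and `data_combinations` is a hypothetical helper.

```python
from itertools import product


def data_combinations(dimensions):
    """Yield one dict per combination of the given dimension values.

    dimensions: dict of dimension name -> list of candidate values.
    """
    names = list(dimensions)
    for values in product(*(dimensions[name] for name in names)):
        yield dict(zip(names, values))
```

For example, `data_combinations({"region": ["north", "south"], "amount": [0, -1, 10**9]})` yields six test inputs, which could then be serialized to whatever test-data format the suite uses.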

Page 15

Summary

Domain Context

Integrating Business Use-cases

Design and Testing Challenges

Testing Approach

Learnings

Page 16

[email protected]@thoughtworks.com

@DevanganaK