Applying Testing Techniques for Big Data and Hadoop
Page 2
Who Am I?
• Too many years of programming experience
• Created lots of bugs in my career… but hate for them to be found by others
Mark Johnson
Regional Director Services
Hortonworks
Page 3
Who are You?
• Technologist
• Programmer
• Want to produce low-defect or defect-free code in Hadoop
Page 4
Good Hadoop Development is Hard!
Page 5
Lots of Data!
Page 6
• Volume
• Velocity
• Variety
Page 7
Fast Release Pressure!
Page 8
BUGS!
Page 9
Will anyone really notice that little problem which affects only 1% of the rows?
Page 10
YES!
1% of the financial transactions on Wall Street is a lot of money!
Big Data can amplify small problems!
Page 11
Problem #1: Speed of Light
Page 12
Problem #2: Data Permutations
Page 13
Problem #3: Processing distributed across multiple nodes
Page 14
How are you currently preventing BUGS?
Page 15
Consequences
The longer you wait to find and fix a problem, the more time it will take to fix!
Page 16
Verify “DONE”
Page 17
What four things do we need to verify?
Page 18
1. Does it run?
Page 19
2. Functional Correctness to Requirements
• Data Filters
• Loops and branches
• Calculations
• Output
Page 20
3. Resources Correctly Referenced
• Missing files
• Bad file names
• Etc.
Page 21
4. Exceptions and errors properly handled
• System left in a safe state
• User / SysAdmin informed of the failure
• Idempotency
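One way to exercise the "safe state" and idempotency points is to write output to a temporary file and publish it only on success. The sketch below is a plain-Python stand-in for that pattern, not Hadoop code; `publish_output` and the `part-00000` file name are hypothetical.

```python
import os
import tempfile


def publish_output(rows, final_path):
    """Write rows to a temp file, then move it into place in one step.

    If writing fails, final_path is left untouched (safe state).
    Re-running with the same rows yields the same file (idempotent).
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            for row in rows:
                handle.write(row + "\n")
        # Rename within one directory replaces the target in a single step.
        os.replace(tmp_path, final_path)
    except Exception:
        # Remove the partial file and let the caller report the failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "part-00000")
publish_output(["big\t2", "data\t1"], target)
publish_output(["big\t2", "data\t1"], target)  # second run, same result
```

A test for this behaviour would run the job twice and compare the output, then force a failure and confirm the previous output is still intact.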
Page 22
Materiality Rule
Materiality Rule: Start with high-value tests (easy to test AND valued by the business)
Light & Fast: Don’t create an overly heavy test process
Positive Tests:
• Each test should cover one general data condition
• Write individual tests only for logic that depends on specific data conditions
Negative Tests:
• Data not present
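The positive/negative split might look like this in practice. A minimal sketch in plain Python; the `filter_large_trades` function and its 10,000 threshold are hypothetical stand-ins for a data filter under test.

```python
def filter_large_trades(rows, threshold=10_000):
    """Keep (trade_id, amount) rows whose amount is at or above threshold."""
    return [(trade_id, amount) for trade_id, amount in rows if amount >= threshold]


def test_positive_general_condition():
    # One general data condition: a mix of rows below, at and above the threshold.
    rows = [("t1", 500), ("t2", 10_000), ("t3", 25_000)]
    assert filter_large_trades(rows) == [("t2", 10_000), ("t3", 25_000)]


def test_negative_data_not_present():
    # Negative test: no input rows should give no output rather than an error.
    assert filter_large_trades([]) == []


test_positive_general_condition()
test_negative_data_not_present()
```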
Page 23
[Figure labels: Data; Sample; Uncollected data]
Page 24
Traditional Sampling Problems
• Sampling error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The “margin of error” reported in opinion polls reflects this risk: the larger the sample, the smaller the margin of error.
• Sampling bias arises when a sample is collected in such a way that some members of the intended population are less likely to be included than others. (Wikipedia)
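The link between sample size and margin of error can be made concrete with the standard formula for a sampled proportion, z * sqrt(p(1 - p) / n). The sketch assumes a simple random sample, 95% confidence (z of about 1.96) and the worst case p = 0.5; it describes sampling error only and says nothing about sampling bias.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (100, 1_000, 10_000):
    print(n, round(margin_of_error(n), 4))
```

Under these assumptions the margin is about 9.8% at 100 rows and about 0.98% at 10,000 rows: a hundredfold larger sample buys a tenfold smaller margin.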
Page 25
PIG SAMPLE
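Pig's SAMPLE operator takes a fraction between 0 and 1 and keeps each row with roughly that probability, so the number of rows returned is not fixed. The plain-Python sketch below mimics that behaviour for illustration only; it is not Pig code, and `sample_rows` and its `seed` parameter are hypothetical.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(range(100_000))
subset = sample_rows(rows, 0.01, seed=42)
print(len(subset))  # roughly 1% of the rows; the exact count depends on the seed
```

Because the selection is random, a sample taken this way can miss the rare rows a test needs, which is the motivation for building samples by hand.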
Page 26
Manual Sampling
• Build the sample dataset based on your program’s logic.
• Does not need to be more than a few rows for each test.
• Can be embedded within your test program to facilitate test management.
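A hand-built sample can live right next to the assertions that use it. The sketch below uses Python's `unittest`; `parse_log_line` and the `user,status,bytes` line format are hypothetical examples of program logic, with one embedded row per branch.

```python
import unittest


def parse_log_line(line):
    """Split a 'user,status,bytes' line; return None for malformed rows."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return parts[0], parts[1], int(parts[2])


class ParseLogLineTest(unittest.TestCase):
    # Each embedded row exists because a branch in parse_log_line needs it.
    SAMPLE = [
        ("alice,200,512", ("alice", "200", 512)),  # well-formed row
        ("bob,404", None),                         # missing field
        ("carol,200,abc", None),                   # non-numeric byte count
    ]

    def test_each_sample_row(self):
        for line, expected in self.SAMPLE:
            self.assertEqual(parse_log_line(line), expected)
```

Saved as a module, this can be run with `python -m unittest`; the sample travels with the test in source control.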
Page 27
MRUnit
• MapReduce testing framework
• Accepts a small record sample for each test
• Test one thing per test
• Does not reference HDFS directly
• Tests Map and Reduce methods separately
Page 28
Example: Word Count Program - Mapper
• Does the tokenizer work properly?
• Does the Mapper properly filter ‘stop’ words?
• Is the counter properly initialized?
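The three mapper checks can be sketched in plain Python rather than as an MRUnit test. The `map_words` function, its tokenizing rule and the `STOP_WORDS` set are hypothetical; a real mapper would be a Java class exercised through MRUnit.

```python
import re

STOP_WORDS = {"a", "an", "the", "of"}  # hypothetical stop-word list


def map_words(line):
    """Emit (word, 1) for every non-stop word in a line of text."""
    tokens = re.findall(r"[a-z0-9']+", line.lower())
    return [(token, 1) for token in tokens if token not in STOP_WORDS]


def test_tokenizer_handles_case_and_punctuation():
    assert map_words("Big Data, big bugs!") == [
        ("big", 1), ("data", 1), ("big", 1), ("bugs", 1)]


def test_stop_words_are_filtered():
    assert map_words("the speed of light") == [("speed", 1), ("light", 1)]


def test_counter_starts_at_one():
    # The mapper never pre-aggregates: every emitted count is exactly 1.
    assert all(count == 1 for _, count in map_words("hadoop hadoop hadoop"))


test_tokenizer_handles_case_and_punctuation()
test_stop_words_are_filtered()
test_counter_starts_at_one()
```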
Page 29
Example: Word Count - Reducer
• Counter values are grouped properly by key
• Counter values are properly aggregated
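Both reducer checks can be sketched the same way in plain Python. `group_by_key` stands in for the shuffle step and `reduce_counts` for the reducer; both names are hypothetical and neither is MRUnit or Hadoop code.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Stand-in for the shuffle: collect the values emitted for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key, values):
    """Sum the counter values for one key."""
    return key, sum(values)


def test_values_are_grouped_by_key():
    pairs = [("big", 1), ("data", 1), ("big", 1)]
    assert group_by_key(pairs) == {"big": [1, 1], "data": [1]}


def test_values_are_aggregated():
    assert reduce_counts("big", [1, 1, 1]) == ("big", 3)


test_values_are_grouped_by_key()
test_values_are_aggregated()
```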
Page 30
Set up MRUnit test suite
• MapDriver:
• ReduceDriver:
• MapReduceDriver:
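In MRUnit, MapDriver exercises a mapper alone, ReduceDriver a reducer alone, and MapReduceDriver the mapper and reducer together. The toy Python class below imitates only the last idea (give inputs, declare expected outputs, run) so the flow is visible without a Hadoop install. `PipelineDriver` and its method names are hypothetical and are not the MRUnit API.

```python
from itertools import groupby


class PipelineDriver:
    """Toy analogue of a map + shuffle + reduce test driver (not MRUnit)."""

    def __init__(self, mapper, reducer):
        self.mapper = mapper
        self.reducer = reducer
        self.inputs = []
        self.expected = []

    def with_input(self, record):
        self.inputs.append(record)
        return self

    def with_output(self, key, value):
        self.expected.append((key, value))
        return self

    def run(self):
        mapped = [pair for record in self.inputs for pair in self.mapper(record)]
        mapped.sort(key=lambda pair: pair[0])  # stand-in for shuffle and sort
        return [
            self.reducer(key, [value for _, value in group])
            for key, group in groupby(mapped, key=lambda pair: pair[0])
        ]

    def run_test(self):
        actual = self.run()
        assert actual == self.expected, f"expected {self.expected}, got {actual}"


def mapper(line):
    return [(word, 1) for word in line.split()]


def reducer(key, values):
    return key, sum(values)


(PipelineDriver(mapper, reducer)
    .with_input("big data")
    .with_input("big bugs")
    .with_output("big", 2)
    .with_output("bugs", 1)
    .with_output("data", 1)
    .run_test())
```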
Page 31
MRUnit Mapper test
Page 32
Test Results
Page 33
MRUnit Reducer Test
Page 34
Test Results
Page 35
Traditional Pig Functional Testing tools
DUMP
ILLUSTRATE
DESCRIBE
EXPLAIN
Page 36
PigUnit to test Pig scripts
• Only one LOAD operation per test script.
• PigUnit overrides STORE, LOAD and DUMP
Page 37
PigUnit – Setup
• PigTest references your unmodified Pig script for testing.
• Input:
• Include just the rows required to test logic
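The idea of leaving the script untouched while the test swaps its load and store steps for a few in-memory rows can be illustrated outside Pig. This is a plain-Python analogy, not PigUnit; `top_queries`, its `load` and `store` parameters and the sample rows are all hypothetical.

```python
def top_queries(load, store, limit=2):
    """Stand-in 'script': load rows, rank by count, store the top `limit`."""
    rows = load()
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    store(ranked[:limit])


def test_top_queries_with_in_memory_rows():
    captured = []
    # Just the rows needed to exercise ranking, a tie and the limit.
    sample = [("yahoo", 10), ("twitter", 7), ("facebook", 10), ("linkedin", 1)]
    top_queries(load=lambda: sample, store=captured.extend)
    assert captured == [("yahoo", 10), ("facebook", 10)]


test_top_queries_with_in_memory_rows()
```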
Page 38
PigUnit: Assert
Page 39
Page 40
PigUnit: Output Job Order
Page 41
Beetest
Developed by: Adam Kawa
GitHub Project: https://github.com/kawaa/Beetest.git
Beetest provides a simple ‘unit’ test capability for a Hive script: it runs the script on a small cluster to validate it
Page 42
Beetest: Inputs and Outputs
Directory containing the following files defining the test process:
• setup.hql – The HQL script to set up the environment
• select.hql – The HQL script to test
• Input.tsv – input data
• Expected.txt – the script’s expected output
• variables.properties – the Beetest properties
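A small pre-flight check can catch an incomplete test directory before anything runs, and the core assertion is a line-by-line comparison against the expected output file. The sketch below is plain Python, not part of Beetest; the file names are copied from the list above, while `missing_test_files` and `output_matches_expected` are hypothetical helpers.

```python
import os
import tempfile

# File names as given on the slide; adjust if your Beetest version differs.
REQUIRED_FILES = (
    "setup.hql",
    "select.hql",
    "Input.tsv",
    "Expected.txt",
    "variables.properties",
)


def missing_test_files(test_dir):
    """Return the required files that are absent from test_dir."""
    present = set(os.listdir(test_dir))
    return [name for name in REQUIRED_FILES if name not in present]


def output_matches_expected(actual_rows, expected_path):
    """Compare produced rows with the expected-output file, line by line."""
    with open(expected_path) as handle:
        expected_rows = handle.read().splitlines()
    return list(actual_rows) == expected_rows


demo_dir = tempfile.mkdtemp()
for name in ("setup.hql", "select.hql", "Input.tsv"):
    open(os.path.join(demo_dir, name), "w").close()
print(missing_test_files(demo_dir))  # the two files not created above
```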
Page 43
Setting up Beetest
Page 44
Beetest: Executing the test
• ${table} – value defined in variables.properties
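The substitution of `${table}` can be pictured with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. This only illustrates the idea and is not how Hive or Beetest performs substitution; the property value `words`, the query text and `load_properties` are hypothetical.

```python
from string import Template


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            properties[key.strip()] = value.strip()
    return properties


variables = load_properties("# test variables\ntable=words\n")
query = Template("SELECT COUNT(*) FROM ${table};").substitute(variables)
print(query)  # SELECT COUNT(*) FROM words;
```

Keeping the table name in a properties file lets the same script run against a throwaway test table and the production table.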
Page 45
Beetest – Test Results
Page 46
Beetest: Test failures
Page 47
Getting started with your test initiative
1. Start Simple
2. Materiality rule: Focus on the highest-value and easiest tests first
3. Data Sampling: Use the smallest datasets possible
4. Keep tests “light and fast”
5. Keep Hadoop code and tests in a Source Code Management system
6. Use an automated test environment (Jenkins, AnthillPro, etc.)
7. Maintain and publicly publish historical test results
Page 48
Hadoop Test Tool Wrap Up
• Tools and Techniques
– Data Sampling
– MRUnit
– PigUnit
– Beetest
Testing environment still immature but good enough to start using now
Page 49
Mark Johnson
Regional Director Services
Hortonworks
LinkedIn: markfjohnson
Twitter: markfjohnson
Source: https://github.com/mfjohnson/HadoopTesting.git
Page 50