Big Data - Hadoop and MapReduce - Aditya Garg
![Page 1: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/1.jpg)
Confidential | Copyright © QAAgility Technologies
Big Data - Hadoop and MapReduce - new age tools for aid to testing and QA
by Aditya Garg
![Page 2: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/2.jpg)
Big Data - Hadoop and MapReduce - new age tools
for aid to testing and QA
Topic for the presentation
![Page 3: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/3.jpg)
What is this
![Page 4: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/4.jpg)
What are we going to discuss?
1. How to test Big Data applications?
2. How can QA and testing teams use Big Data tools for their testing needs?
![Page 5: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/5.jpg)
![Page 6: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/6.jpg)
What is Big Data?
Is it just hype, or is it reality?
![Page 7: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/7.jpg)
Here is the latest one from yesterday on #Bigdata
![Page 8: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/8.jpg)
Let us start with what exactly Big Data is
![Page 9: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/9.jpg)
Which search engine do you use?
https://www.cirrusinsight.com/blog/how-much-data-does-google-store
http://searchstorage.techtarget.com/definition/Kilo-mega-giga-tera-peta-and-all-that
How much data does Google store?
![Page 10: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/10.jpg)
![Page 11: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/11.jpg)
Key Points in Big Data
1. Volume – Data Explosion
2. Velocity
3. Variety
4. Veracity
![Page 12: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/12.jpg)
Ref: IBM.com
Key Points in Big Data
![Page 13: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/13.jpg)
Definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#379879e621a9
Ref: goo.gl/iWZhjJ
![Page 14: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/14.jpg)
Big Data Application
1. Finance
2. Insurance
3. Health Care
4. Agriculture
5. Defense
6. Manufacturing
7. Aerospace
8. Oil and Gas
9. Advertisement and Marketing
10. Election Campaigns
11. The list goes on: applicability across industries
![Page 15: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/15.jpg)
http://snip.ly/UKNB#http://bit.ly/1OF5nhF
![Page 16: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/16.jpg)
Big Data Application
http://www.forbes.com/sites/bernardmarr/2016/02/03/how-the-super-bowl-uses-big-data-to-change-the-game/?
![Page 17: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/17.jpg)
Big Data Application
http://andrewshamlet.com/2015/12/03/who-will-win-the-2016-us-presidential-nominations/
![Page 18: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/18.jpg)
Let's go back to the definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
![Page 19: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/19.jpg)
Tools solving Big Data Challenge
![Page 20: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/20.jpg)
Tool solving the Big Data Challenge
![Page 21: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/21.jpg)
*Source Udacity
Hadoop – Key components HDFS and MR
![Page 22: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/22.jpg)
*Source Udacity
Hadoop Ecosystem
1. Sqoop takes data from a traditional RDBMS and puts it into HDFS
2. Flume ingests data into HDFS as it is generated by external systems
3. HBase is a real-time database built on top of HDFS
4. Hue is a graphical front end to the cluster
5. Oozie is a workflow management tool
6. Mahout is a machine learning library
![Page 23: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/23.jpg)
HDFS
• HDFS stands for Hadoop Distributed File System, which is the storage system used by Hadoop. The following is a high-level architecture that explains how HDFS works.
![Page 24: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/24.jpg)
Map Reduce
Ref: Emanuele Della Valle (@manudellavalle)
![Page 25: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/25.jpg)
Understanding MapReduce
Demo – Word Count
Given an input file, count the occurrences of each unique word
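The word count demo can be sketched in plain Python to show the three MapReduce stages. This is a minimal single-process illustration, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for this sketch, and a real Hadoop job would run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the values collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-line input file.
    sample = ["big data big tools", "data for testing"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

The number of keys in the result is the number of unique words, and each value is how often that word occurred.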
![Page 26: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/26.jpg)
WordCount – Map Reduce
Reference: http://wearecloud.cz/media/files/prezentace-biz/Big%20Data%20v%20Cloudu.ppt
![Page 27: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/27.jpg)
How can QA and testing teams use Big Data tools for their testing needs?
![Page 28: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/28.jpg)
Problem Statement and Solution using Hadoop and MapReduce
![Page 29: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/29.jpg)
MTBT – Multicast Tick by Tick Adapter
Input was the exchange feed; output was given to the HFT engine
Exchange TAP: co-location servers listen to it at high speed
Legacy adaptor (3rd party): connects to the TAP and converts the feed to a format that HFT platforms (algorithmic trading platforms) can use
New adaptor: being built in-house to increase the speed by 10 times
HFT Engine
MTBT - Adaptor
![Page 30: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/30.jpg)
MTBT – Multicast Tick by Tick Adapter
• The client was building a brand-new MTBT exchange adaptor
• The adaptor was being developed in C on Unix and was to run in co-location with NSE (National Stock Exchange)
• The new adaptor was supposed to increase the overall speed by more than 10 times over the existing adaptor
• The goal was to test the new adaptor
![Page 31: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/31.jpg)
MTBT – Testing Strategy - Sampling
[Diagram: Input → MTBT adaptor (LEGACY and IN-HOUSE (NEW)) → Output over time, with samples taken from the output]
Do a reverse comparison
![Page 32: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/32.jpg)
MTBT – Challenges
1. Testing manually was next to impossible
2. Even samples of a few seconds ran into files many megabytes (MB) in size
3. It was impossible to manually compare the legacy records with the records processed by the new code
4. Daily processed data ran into files of 150 gigabytes (GB) and more
![Page 33: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/33.jpg)
MTBT – It was a BIG DATA testing problem
1. Large 150 GB files (legacy and new applications) – VOLUME
2. Testing to compare the output and measure functional effectiveness in a real-time data environment – VELOCITY
3. Packet drops may happen – VERACITY
4. Variety was not there, except that the two output files were not in the same format; the content/information was there
![Page 34: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/34.jpg)
MTBT – SOLUTION
1. Reduce the LEGACY MTBT output file into a standard format
2. Reduce the NEW INHOUSE MTBT output file into a standard format
3. Compare the two files
4. Generate the report
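The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used on the project: the two record layouts (`seq|symbol|price|qty` for legacy, `symbol,seq,price,qty` for the new adaptor), the `Tick` type, the use of a sequence number as the join key, and all function names are hypothetical, since the slides do not describe the actual file formats. On a Hadoop cluster the normalization would typically sit in the map step and the per-key comparison in the reduce step.

```python
from decimal import Decimal
from typing import Dict, Iterable, List, NamedTuple


class Tick(NamedTuple):
    """Hypothetical standard format shared by both adaptors."""
    seq: int
    symbol: str
    price: Decimal
    qty: int


def normalize_legacy(line: str) -> Tick:
    # Assumed legacy layout: seq|symbol|price|qty
    seq, symbol, price, qty = line.strip().split("|")
    return Tick(int(seq), symbol, Decimal(price), int(qty))


def normalize_new(line: str) -> Tick:
    # Assumed in-house layout: symbol,seq,price,qty
    symbol, seq, price, qty = line.strip().split(",")
    return Tick(int(seq), symbol, Decimal(price), int(qty))


def compare(legacy_lines: Iterable[str],
            new_lines: Iterable[str]) -> Dict[str, List[int]]:
    """Steps 1-3: normalize both outputs, then compare them key by key."""
    # "Map": key every normalized record by sequence number and source.
    grouped: Dict[int, Dict[str, Tick]] = {}
    for line in legacy_lines:
        tick = normalize_legacy(line)
        grouped.setdefault(tick.seq, {})["legacy"] = tick
    for line in new_lines:
        tick = normalize_new(line)
        grouped.setdefault(tick.seq, {})["new"] = tick

    # "Reduce": classify each sequence number.
    report: Dict[str, List[int]] = {
        "matched": [], "mismatched": [],
        "missing_in_new": [], "missing_in_legacy": [],
    }
    for seq in sorted(grouped):
        sides = grouped[seq]
        if "new" not in sides:
            report["missing_in_new"].append(seq)  # e.g. a dropped packet
        elif "legacy" not in sides:
            report["missing_in_legacy"].append(seq)
        elif sides["legacy"] == sides["new"]:
            report["matched"].append(seq)
        else:
            report["mismatched"].append(seq)
    return report


def format_report(report: Dict[str, List[int]]) -> str:
    """Step 4: one summary line per category."""
    return "\n".join(f"{name}: {len(seqs)} {seqs}" for name, seqs in report.items())


if __name__ == "__main__":
    # Made-up sample records in the two assumed layouts.
    legacy = ["1|AAA|101.50|10", "2|BBB|200.00|5", "3|CCC|50.25|7"]
    new = ["AAA,1,101.5,10", "BBB,2,200.10,5", "DDD,4,30.00,1"]
    print(format_report(compare(legacy, new)))
```

Prices are parsed as `Decimal` so that `101.50` and `101.5` compare equal without floating-point rounding concerns.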
![Page 35: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/35.jpg)
Other scenarios – Big Data Tool implementation
QA teams can use the tools in multiple scenarios:
1. Beta testing
2. Repeated execution effectiveness – applying analytics (R)
3. Capturing customer feedback and channeling it into smarter test execution
4. Extracting relevant information from repeated regression cycles from QC
5. Adding intelligence to the data generated by the testing team
![Page 36: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/36.jpg)
Thank you and Jai Hind
Questions? @adigIndia @AgileTA #GTR2016
![Page 37: Big Data - Hadoop and MapReduce - Aditya Garg](https://reader036.fdocuments.in/reader036/viewer/2022062503/58714db21a28ab55588b7281/html5/thumbnails/37.jpg)
Contact
Please contact us at
MUMBAI: 711, Rupa Solitaire, MBP, Mahape, Navi Mumbai-400701
DENMARK: 1 Lindebo 7 Lej - 42, 2630 Tasstrup, [email protected]
USA: 200 E Campus View Blvd., Suite 200, Columbus, OH