Performance Comparison between Apache Hive and Oracle SQL for Big Data

Performance Comparison Between Apache Hive and Oracle SQL for Big Data Analysis

Presented by: Santosh Kumar Dash, M.Tech CSE, Utkal University; ASE, Tata Consultancy Services Limited

8th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016)


Contents
• Introduction
• Motivation and Objective
• Proposed Methodology
• Data Set Description
• Experimental Results
• Conclusion and Future Scope
• References


Introduction

What is Big Data?

Big Data Applications

What is Oracle SQL?


What is Apache Hive?

Motivation
• Many researchers have attempted Big Data analytics using traditional methods, which performed poorly because of memory constraints.
• Hence, we are motivated to explore the suitability of Apache Hive as a distributed database for faster retrieval, in comparison with the popular Oracle SQL approach.

Objective
• Performance analysis of Oracle SQL with respect to time.
• Performance analysis of Apache Hive with respect to time.
• Comparison of mean processing time between Apache Hive and Oracle SQL.

Proposed Methodology

The diagram represents the proposed model for the experiment.

Data Set Description
• Online Video Characteristics and Transcoding Time
• Record Linkage Comparison Patterns
• 3D Road Network (North Jutland, Denmark)
• Rate (Health Insurance Marketplace)

Data Set Name | Rows | Columns
Video | 168,286 | 11
Record | 5,749,132 | 12
Road | 434,874 | 4
Rate | 13,000,000 | 23

Column Names of Datasets

Video: YouTube video id, duration, bitrate (total, in Kbits), bitrate (video bitrate, in Kbits), height (in pixels), width (in pixels), frame rate, estimated frame rate, codec, category, and direct video link.

Record: id_1, id_2, cmp_fname_c1, cmp_fname_c2, cmp_lname_c1, cmp_lname_c2, cmp_sex, cmp_bd, cmp_bm, cmp_by, cmp_plz, and is_match.

Column Names of Datasets

Road: OSM_ID, LONGITUDE, LATITUDE, and ALTITUDE.

Rate: BusinessYear, StateCode, IssuerId, Source_Name, Version_Num, ImportDate, Issuer_Id2, FederalTIN, RateEffectiveDate, RateExpirationDate, PlanId, RatingAreaId, Tobacco, Age, Individual Rate, Individual Tobacco Rate, Couple, Primary Subscriber And One Dependent, Primary Subscriber And Two Dependents, Primary Subscriber And Three Or More Dependents, Couple And One Dependent, Couple And Two Dependents, Couple And Three Or More Dependents, and RowNumber.


Experimental Results

Queries | Description | Statements
Query 1 | Retrieving a unique column using DISTINCT | Retrieving unique output records
Query 2 | Retrieving records from a given dataset using ORDER BY for general sorting | Sorting
Query 3 | Retrieving records using ORDER BY and DESC for backward sorting | Sorting backward
Query 4 | Using COUNT and GROUP BY for retrieving records and their count | Grouping with counting
Query 5 | Using the MAX aggregate function for retrieving the maximum value from a record | Maximum value
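The slides name the SQL operations behind each query but not the statements themselves. The sketch below shows what the five query types might look like and how a per-query time and a mean processing time could be measured. Everything specific in it is an assumption for illustration: Python's built-in sqlite3 module stands in for the engines (the study ran on Apache Hive and Oracle SQL), the table name `road`, the sample rows, the SQL text, and the helper names `run_benchmark` and `mean_time` are hypothetical, and only the column names come from the Road dataset description.

```python
import sqlite3
import time

# Hypothetical miniature of the Road dataset (OSM_ID, LONGITUDE, LATITUDE, ALTITUDE).
# The values are made up; the real dataset has 434,874 rows.
ROWS = [
    (101, 9.35, 56.74, 17.05),
    (101, 9.36, 56.75, 18.20),
    (102, 9.40, 56.80, 21.50),
    (103, 9.41, 56.81, 12.30),
    (103, 9.42, 56.82, 30.10),
    (103, 9.43, 56.83, 25.00),
]

# Assumed SQL for the five query types; the slides give only the operations.
QUERIES = {
    "Query 1": "SELECT DISTINCT OSM_ID FROM road",                   # unique records
    "Query 2": "SELECT * FROM road ORDER BY ALTITUDE",               # sorting
    "Query 3": "SELECT * FROM road ORDER BY ALTITUDE DESC",          # sorting backward
    "Query 4": "SELECT OSM_ID, COUNT(*) FROM road GROUP BY OSM_ID",  # grouping with counting
    "Query 5": "SELECT MAX(ALTITUDE) FROM road",                     # maximum value
}


def run_benchmark(rows=ROWS):
    """Run each query once; return (results, elapsed seconds) keyed by query name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE road (OSM_ID INTEGER, LONGITUDE REAL, LATITUDE REAL, ALTITUDE REAL)"
    )
    conn.executemany("INSERT INTO road VALUES (?, ?, ?, ?)", rows)
    results, timings = {}, {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        results[name] = conn.execute(sql).fetchall()  # fetch all rows so retrieval is timed
        timings[name] = time.perf_counter() - start
    conn.close()
    return results, timings


def mean_time(timings):
    """Arithmetic mean of the per-query times for one dataset."""
    return sum(timings.values()) / len(timings)


if __name__ == "__main__":
    results, timings = run_benchmark()
    for name in QUERIES:
        print(f"{name}: {len(results[name])} row(s), {timings[name]:.6f} s")
    print(f"Mean processing time: {mean_time(timings):.6f} s")
```

Timings from an in-memory SQLite table say nothing about Hive or Oracle SQL performance; the sketch only illustrates the shape of the measurement, and a real comparison would run the same statements through each engine's own client.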

Performance Comparison of Video dataset

[Bar chart: Hive vs. Oracle SQL for Query 1 to Query 5; vertical axis from 0 to 80]

Performance Comparison of Record dataset

[Bar chart: Hive vs. Oracle SQL for Query 1 to Query 5; vertical axis from 0 to 70]

Performance Comparison of Road dataset

[Bar chart: Hive vs. Oracle SQL for Query 1 to Query 5; vertical axis from 0 to 30]

Performance Comparison of Rate dataset

[Bar chart: Hive vs. Oracle SQL for Query 1 to Query 5; vertical axis from 0 to 1600]

Mean Processing Time of All Datasets

[Bar chart: Hive vs. Oracle SQL for the Video, Road, Record, and Rate datasets; vertical axis from 0 to 900]

Conclusion

Apache Hive
• On large datasets, Apache Hive is very efficient.
• Queries with GROUP BY, ORDER BY, and aggregate functions take more time than retrieving the entire dataset.
• The average time is higher when the number of rows is small.

Oracle SQL
• On small-scale datasets, Oracle SQL performs better.
• Queries with GROUP BY, ORDER BY, and aggregate functions take less time than retrieving the entire dataset.
• The average time is lower when the number of rows is small.

Future work
• In future, we will take large-scale (terabyte-sized) datasets and analyze them on both Apache Hive and Oracle SQL to test performance.
