Anand Hegde Prerna Shraff Performance Analysis of Lucene Index on HBase Environment Group #13.

Page 1:

Anand Hegde

Prerna Shraff

Performance Analysis of Lucene Index on HBase Environment

Group #13

Page 2:

Overview

• HBase vs BigTable

• The Problem

• Implementation

• Performance Analysis

• Survey

• Conclusion

Page 3:

HBase vs BigTable

BigTable

• Compressed, high-performance database system

• Built on GFS, using the Chubby lock service, SSTable, etc.

HBase

• Hadoop Database

• Open source, distributed, versioned, column-oriented

• Modeled after BigTable

Page 4:

The Problem

• Data-intensive computing requires storage solutions for huge amounts of data.

• The requirement is to host very large tables on clusters of commodity hardware.

• HBase provides BigTable-like capabilities on top of Hadoop.

• Existing work in this area includes an experiment using a Lucene index on HBase in an HPC environment (Xiaoming Gao, Vaibhav Nachankar, Judy Qiu).
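To make the idea of a Lucene-style index stored in an HBase-like table more concrete, here is a minimal Python sketch of an inverted index laid out as a column-oriented table: one row per term, one column per document ID, and the term frequency as the cell value. The layout and the names `build_index_table` and `search` are assumptions for illustration only; they are not taken from the cited experiment.

```python
from collections import defaultdict


def build_index_table(documents):
    """Build a toy inverted index: row key = term, column = doc ID, cell = term frequency."""
    table = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            table[term][doc_id] = table[term].get(doc_id, 0) + 1
    return table


def search(table, term):
    """Return the IDs of documents containing the term, most frequent first."""
    row = table.get(term.lower(), {})
    return sorted(row, key=lambda doc_id: (-row[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical two-document corpus used only to exercise the sketch.
    docs = {
        "doc1": "HBase stores very large tables",
        "doc2": "Lucene index on HBase",
    }
    index = build_index_table(docs)
    print(search(index, "hbase"))
```

Keeping each term in its own row means a lookup touches a single row, which is the access pattern a row-keyed store is built for.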

Page 5:

Architecture

Page 6:

Implementation

• Configured Hadoop and HBase on the Alamo cluster.

• Added scripts to run the program sequentially on multiple nodes.

• Modified scripts to record the size of the table.

• Modified scripts to record the execution time for both sequential and parallel runs.
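The measurement steps above can be sketched in Python as follows. This is a minimal illustration, not the scripts used in the project: `time_command` and `total_size_from_du` are hypothetical helper names, the timed command is a stand-in for the real indexing job, and the sample text assumes that `hadoop fs -du` prints one line per path with a leading byte count.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, time.perf_counter() - start


def total_size_from_du(du_output):
    """Sum the leading byte counts in `hadoop fs -du`-style output lines."""
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        # Skip headers such as "Found 2 items" and blank lines.
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total


if __name__ == "__main__":
    # Stand-in for the real indexing job, which is not shown in the slides.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit={code} elapsed={seconds:.3f}s")

    # Hypothetical output; the paths and sizes are made up for illustration.
    sample = "Found 2 items\n1024  /hbase/table/region1\n2048  /hbase/table/region2\n"
    print(total_size_from_du(sample), "bytes")
```

In a real run, the text passed to `total_size_from_du` would come from capturing the output of the Hadoop shell command rather than from a hard-coded string.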

Page 7:

Performance Analysis

• Sequential execution across the same number of nodes for different data sizes.

• Sequential execution across different numbers of data nodes for the same data size.

• Parallel execution across the same number of nodes for different data sizes.

Page 8:

Analysis details

• Performed the analysis on the Alamo cluster on FutureGrid

• System type: Dell PowerEdge

• No. of CPUs: 192

• No. of cores: 768

• 3 ZooKeeper nodes + 1 HDFS master + 1 HBase master

Page 9:

Analysis details

00000004###md###Title###Geoffrey C. Fox Papers Collection 1990

00000004###md###Category###paper, proceedings collection

00000004###md###Authors###Geoffrey C. Fox, others

00000004###md###CreatedYear###1990

00000004###md###Publishers###California Institute of Technology CA

00000004###md###Location###California Institute of Technology CA

00000004###md###StartPage###1

00000004###md###CurrentPage###105

00000004###md###Additional###This is a paper collection of Geoffrey C. Fox

00000004###md###DirPath###Proceedings in a collection of papers from one conference/Fox

00000005###md###Title###C3P Related Papers - T.Barnes

00000005###md###Category###paper, proceedings collection

00000005###md###Authors###T.Barnes, others
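The sample records above appear to follow a `docId###md###field###value` layout. Assuming that reading is correct, a minimal Python sketch for grouping such lines into one dictionary per document could look like this. The function name `parse_records` is hypothetical, and the meaning of the `md` marker is inferred from the sample rather than documented in the slides.

```python
from collections import defaultdict

SEPARATOR = "###"


def parse_records(lines):
    """Group `docId###md###field###value` lines into {doc_id: {field: value}}."""
    records = defaultdict(dict)
    for line in lines:
        parts = line.strip().split(SEPARATOR, 3)
        # Ignore blank or malformed lines rather than guessing at their meaning.
        if len(parts) != 4 or parts[1] != "md":
            continue
        doc_id, _, field, value = parts
        records[doc_id][field] = value
    return dict(records)


if __name__ == "__main__":
    sample = [
        "00000004###md###Title###Geoffrey C. Fox Papers Collection 1990",
        "00000004###md###CreatedYear###1990",
        "00000005###md###Title###C3P Related Papers - T.Barnes",
    ]
    for doc_id, fields in parse_records(sample).items():
        print(doc_id, fields)
```

Splitting at most three times keeps any `###` that might occur inside a value intact.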

Page 10:

Number of nodes: 11

[Figure: Sequential execution. x-axis: Size of data (100 MB, 300 MB, 500 MB, 800 MB, 1 GB); y-axis: Time in seconds (0 to 70).]

Page 11:

Size of data: 50 MB

[Figure: Sequential execution. x-axis: Number of nodes (11, 13, 15, 17, 19); y-axis: Time in seconds (0 to 7).]

Page 12:

Number of nodes: 13

[Figure: Parallel execution. x-axis: Size of data (1 GB, 5 GB, 10 GB, 20 GB, 30 GB); y-axis: Time in minutes (0 to 16).]

Page 13:

Survey

• Many load testing frameworks are available for running distributed tests across many machines.

• Popular ones include Grinder, Apache JMeter, and LoadRunner.

• We compared these frameworks to choose the most suitable one.

Page 14:

Why Survey?

• Gives an absolute measure of the system response time.

• Targets regressions in the server and the application code.

• Examines the response.

• Helps evaluate and compare middleware solutions from different vendors.

Page 15:

LoadRunner

• Commercial automated performance testing product

• Supports JavaScript and C scripts

• Windows platform

• Aimed at automated test engineers

• Has a UI

Framework:

• Virtual User Scripts

• Controller

Page 16:

Apache JMeter

• Pure Java desktop application

• Designed to load test functional behavior and measure performance

• Designed for testing web applications

• Java based

• Highly extensible

Test Plan

• Thread Groups

• Controllers

• Samplers

• Listeners

Page 17:

Grinder

• Open source

• Uses Jython

• Scripts are run by configuring the test run in the grinder.properties file

Framework:

• Console

• Agent

• Workers
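As a rough illustration of how the settings in grinder.properties translate into load, the Python sketch below parses a small properties text and computes how many worker threads would run concurrently. It relies on the standard Grinder properties `grinder.script`, `grinder.processes`, `grinder.threads`, and `grinder.runs`; the helper names, the example values, and the script name are hypothetical, and the parser only handles plain `key=value` lines.

```python
def parse_properties(text):
    """Parse simple `key=value` lines, skipping blanks and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def planned_load(props, agents=1):
    """Return (concurrent worker threads, total test-script runs) for a plan.

    A value of 0 for grinder.runs means "run until stopped" in The Grinder,
    in which case the second number is not meaningful.
    """
    processes = int(props.get("grinder.processes", 1))
    threads = int(props.get("grinder.threads", 1))
    runs = int(props.get("grinder.runs", 1))
    concurrent = agents * processes * threads
    return concurrent, concurrent * runs


if __name__ == "__main__":
    example = """
    # Hypothetical values; the script name is a placeholder.
    grinder.script = index_test.py
    grinder.processes = 2
    grinder.threads = 5
    grinder.runs = 10
    """
    print(planned_load(parse_properties(example), agents=3))
```

Each agent starts the configured number of worker processes, and each worker process starts the configured number of threads, so the concurrent load scales with the product of the three.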

Page 18:

Grinder

Page 19:

Comparison

Parameter | LoadRunner | Grinder | JMeter
Server monitoring | Strong for MS Windows | Needs wrapper-based approach | No built-in monitoring
Amount of load | Number of users restricted | Number of agents restricted | Number of agents depends on H/W support available
Able to run in batch? | No | No | Yes
Ease of installation | Difficult | Moderate | Easy
Setting up tests | Icon based | Uses Jython | Java based

Page 20:

Comparison

Parameter | LoadRunner | Grinder | JMeter
Running tests | Complex | Moderate | Simple
Result generation | Integrated analysis tool | No integrated tool available | Can generate client side graphs
Agent management | Easy/Automatic | Manual | Real time/Dynamic
Cross platform | No. MS Windows only | Yes | Yes
Intended audience | Aimed at non-developers | Aimed at developers | Aimed at non-builders
Stability | Poor | Moderate | Poor
Cost | Expensive | Free (open source) | Free (open source)

Page 21:

Roadmap

Study HBase

Study Lucene Indexing

Modify Scripts

Add Scripts

Study Testing Frameworks

Implement Grinder

Page 22:

Conclusion

• Sequential execution takes more time than parallel execution on HBase.

• Research indicates that HBase is not yet as robust as BigTable.

• Regarding the testing framework, we recommend Grinder because it is an open source tool and has a lot of documentation.

• Grinder also provides good real-time feedback.

Page 23:

References

• http://grinder.sourceforge.net/

• http://jmeter.apache.org/

• http://www8.hp.com/us/en/software/software-product.html?compURI=tcm:245-935779

• http://hpcdb.org/sites/hpcdb.org/files/gao_lucene.pdf

• http://hadoop.apache.org/common/docs/stable/file_system_shell.html#du

Page 24:

Thank you