Insight Case Studies Tuning the Beloved DB-Engines Presented By Nithya Koka and Michael Arnold


Insight Case Studies

Tuning the Beloved DB-Engines

Presented By Nithya Koka and Michael Arnold


C L A I R V O Y A N T S O F T . C O M


Who is Nithya Koka?

● Senior Hadoop Administrator

○ Project Lead

○ Client Engagement

○ On-Call Engineer

○ Cluster Ninja

On numerous Insight projects

● 5+ years in IT - 4 years with Hadoop


Who is Michael Arnold?

● Principal Systems Engineer

● Automation geek

● 20+ years in IT - 9 years with Hadoop

● I help people deal with:

○ Servers (physical and virtual)

○ Networks

○ Server operating systems

○ Hadoop distributions

○ Making it all run smoothly


Agenda

● Impala Tuning Case Study

● HBase Tuning Case Study


Impala Tuning

Impala Tuning Case Study


Case Study: ClientA Impala Woes

1. Impala threads peak and crash the daemon, and all queries hang, causing a complete outage for end users. This has been happening:

○ For 2 years, on and off

○ Across multiple support tickets

○ Despite several tuning attempts

No trend in host or time frame for when these incidents occur.

2. Impala queries in Hue error out with expired-results messages.


Initial Insight Evaluation

Gotchas Captured:

● Role layout: overburdened “Master hosts”

● Using the buggy RHEL kernel (Linux 2.6.32-504.3.3.el6.x86_64)

● Multiple Java versions

● Default swappiness

● Transparent hugepages enabled
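Two of the gotchas listed here, swappiness and transparent hugepages (THP), can be checked per host from the raw kernel settings. The following is a minimal sketch, not the tooling used in the engagement: the function names, the `max_swappiness` limit of 10, and the choice to flag any THP mode other than `never` are assumptions for illustration. The inputs are the contents of /proc/sys/vm/swappiness and of the THP `enabled` file, which RHEL 6 kernels typically expose under /sys/kernel/mm/redhat_transparent_hugepage/ rather than the upstream /sys/kernel/mm/transparent_hugepage/.

```python
import re


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from text like '[always] madvise never'."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"unrecognized THP setting: {sysfs_text!r}")
    return match.group(1)


def os_findings(swappiness_text, thp_text, max_swappiness=10):
    """List the gotchas found in the two raw kernel settings.

    max_swappiness is a hypothetical limit chosen for this sketch.
    """
    findings = []
    swappiness = int(swappiness_text.strip())
    if swappiness > max_swappiness:
        findings.append(
            f"vm.swappiness is {swappiness} (hypothetical limit: {max_swappiness})"
        )
    mode = active_thp_mode(thp_text)
    if mode != "never":
        findings.append(f"transparent hugepages mode is '{mode}'")
    return findings


# Example with the stock kernel defaults (swappiness 60, THP always on).
for finding in os_findings("60\n", "[always] madvise never\n"):
    print(finding)
```

Keeping the parsing separate from the file reads makes the check easy to run against settings collected from many hosts at once.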


Impala Threads

Typical Incident Pattern



Impala Threads: Deep Dive

1. Potential disk errors in dmesg output for incident-prone hosts.

2. JVM crashes reported by Impala.

3. HDFS file count snowballing.


[Chart: HDFS file count, with 750K and 1.15 million files marked]


Impala Threads: Deep Dive

1. Disk Errors

● Without spill directories configured, scratch space defaulted to /tmp/impala-scratch, which was unsuitable for the scale and concurrency.

Resolution:

● Spread the disk spill across the data drives.
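One way to express "spread the spill across the data drives" is to derive the impalad `--scratch_dirs` startup flag, which takes a comma-separated list of directories, from the list of data drive mount points. This is a minimal sketch: the mount points (/data/1 and so on) and the impala/scratch subdirectory are hypothetical, since the slides do not give the client's actual paths.

```python
from pathlib import PurePosixPath


def scratch_dirs_flag(mount_points, subdir="impala/scratch"):
    """Build a --scratch_dirs flag with one scratch directory per data drive."""
    if not mount_points:
        raise ValueError("at least one data drive is required")
    dirs = [str(PurePosixPath(mount) / subdir) for mount in mount_points]
    return "--scratch_dirs=" + ",".join(dirs)


# Hypothetical mount points, for illustration only.
print(scratch_dirs_flag(["/data/1", "/data/2", "/data/3"]))
```

In Cloudera Manager the same list would be entered in the Impala Daemon scratch directories setting rather than passed as a raw flag.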


Impala Threads: Deep Dive

1. Disk Errors

● Identified a bad RAID controller: three problem disks on a master host, RAID10 virtual disk for the NameNode, RAID1 virtual disk for the JournalNode, and another RAID1 virtual disk for ZooKeeper.

Resolution:

● The host with bad disks was decommissioned, the disks were replaced, and the host was brought back in a good state.

● Regular scans are now scheduled with the RAID controller CLI to alert on any future incidents.


Impala Threads: Deep Dive

2. Impala-reported JVM crashes

● The running OS kernel version is known to cause CDH applications to pause, resulting in JVM hangs as seen in the Impala reports.

Resolution:

● Upgrading the kernel to version 2.6.32-504.16.2.el6 or later is recommended.
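Whether a host still runs an affected kernel can be decided by comparing its release string (the output of `uname -r`) with the recommended 2.6.32-504.16.2.el6. The sketch below is an illustration with hypothetical function names; it compares only the numeric fields before the `.el` suffix, so it is meaningful only within the same RHEL 6 kernel series.

```python
import re

FIXED_RELEASE = "2.6.32-504.16.2.el6"


def release_key(release):
    """Turn '2.6.32-504.3.3.el6.x86_64' into (2, 6, 32, 504, 3, 3)."""
    numeric_part = release.split(".el", 1)[0]
    return tuple(int(number) for number in re.findall(r"\d+", numeric_part))


def needs_upgrade(running, fixed=FIXED_RELEASE):
    """True when the running kernel is older than the recommended release."""
    return release_key(running) < release_key(fixed)


# The kernel named in the initial evaluation is older than the recommended one.
print(needs_upgrade("2.6.32-504.3.3.el6.x86_64"))
```

Comparing integer tuples avoids the string-comparison trap where "504.3.3" would sort after "504.16.2".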


Impala Threads: Deep Dive

3. The small files problem:

● Parquet files on the order of KB, which led to slow I/O throughput.

● Coordinator and executor connections fail due to high scan times from the NameNode.

● The failed executor connections kick off more threads, which add up very quickly and crash the daemon.

Resolution:

● By rewriting the Parquet compaction to use dynamic partitions, the client was able to produce 1 file in place of 29 files, significantly reducing the overall file count.
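The arithmetic behind the file-count reduction can be sketched as follows. This is not the client's compaction job, which the slides do not show; it only estimates how many files each partition needs once its small files are rewritten toward a target size. The 256 MiB target, the partition name, and the 200 KiB file size are assumptions for illustration.

```python
from math import ceil

MIB = 1024 * 1024


def compaction_plan(partition_file_sizes, target_bytes=256 * MIB):
    """Map each non-empty partition to (current file count, files after compaction)."""
    plan = {}
    for partition, sizes in partition_file_sizes.items():
        if not sizes:
            continue  # nothing to compact in an empty partition
        files_needed = max(1, ceil(sum(sizes) / target_bytes))
        plan[partition] = (len(sizes), files_needed)
    return plan


# Hypothetical partition holding 29 Parquet files of 200 KiB each.
sizes = {"day=2019-01-01": [200 * 1024] * 29}
print(compaction_plan(sizes))
```

With KB-sized inputs, every partition collapses to a single file, which is the 29-to-1 effect described in the resolution.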


Impala Threads: Deep Dive

Tuning for Scale

● Since Impala 2.9, we can assign Impala Daemons as query coordinators or query executors.

● These two components can now be tuned as per their responsibilities giving us more flexibility.


Impala Threads: Deep Dive

Tuning for Scale

Coordinators:

● Perform the network communication to keep metadata up to date and route query results to the appropriate clients.

● Experience significant network and CPU overhead with queries containing a large number of query fragments.

● Need a large JVM heap for caching metadata for all table partitions and data files.


Impala Threads: Deep Dive

Tuning for Scale

Executors:

● Need only the default JVM heap, leaving more memory available to process CPU-intensive joins, aggregations, and other operations.

● Perform I/O-intensive scans.
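The coordinator and executor roles described in these slides map onto two impalad startup flags, `is_coordinator` and `is_executor`. A minimal sketch of the flag pairs for each dedicated role is shown below; the helper name and role labels are hypothetical, and in Cloudera Manager the same choice is normally made through the Impala Daemon specialization setting rather than raw flags.

```python
def role_flags(role):
    """Return impalad startup flags for a dedicated 'coordinator' or 'executor'."""
    flags = {
        "coordinator": {"is_coordinator": True, "is_executor": False},
        "executor": {"is_coordinator": False, "is_executor": True},
    }
    if role not in flags:
        raise ValueError(f"unknown role: {role!r}")
    return [f"--{name}={str(value).lower()}" for name, value in flags[role].items()]


print(role_flags("coordinator"))
print(role_flags("executor"))
```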


Impala Threads: Deep Dive

Tuning for Scale

Coordinators: How many? [Our cluster: 3]

● Small is good (a minimum of 1 dedicated)

● Considerations: number of Impala daemons, DDL queries, average query resource usage at various stages.

Where do they go? [Our cluster: Utility hosts]

● Coordinators can go on non-worker hosts.

● Avoid losing out on resources, memory, or disk.


High Availability

Choosing the right load-balancing algorithm for high availability through a proxy.

LeastConn:

What? Connects sessions to the coordinator with the fewest connections, to balance the load evenly.

When? Many independent, short-running queries.

Where? Recommended for Impala with F5.
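The least-connections rule can be illustrated with a short simulation. This is a sketch of the idea only, not how any particular proxy implements it: the coordinator names are hypothetical, and ties go to the first coordinator listed.

```python
def pick_least_conn(active_connections):
    """Return the coordinator with the fewest active connections (ties: first listed)."""
    return min(active_connections, key=active_connections.get)


def simulate(new_sessions, active_connections):
    """Assign sessions one by one, updating the connection counts in place."""
    assignments = []
    for _ in range(new_sessions):
        target = pick_least_conn(active_connections)
        active_connections[target] += 1
        assignments.append(target)
    return assignments


# coord1 already holds two long-running sessions, so new sessions avoid it at first.
connections = {"coord1": 2, "coord2": 0, "coord3": 0}
print(simulate(4, connections))
print(connections)
```

Because the choice depends on current load rather than arrival order, a coordinator tied up with long sessions receives fewer new ones.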


High Availability

Choosing the right load-balancing algorithm for high availability through a proxy.

RoundRobin:

What? Distributes connections across all coordinator nodes; servers can be listed with a weight parameter to define the distribution.

When? Predictable and stable balancing; requires benchmarks and load testing.

Where? Not recommended by Cloudera for Impala.
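Weighted round robin can be sketched as a repeating schedule in which each server appears as many times as its weight. This is a deliberately naive illustration with hypothetical coordinator names; production proxies usually interleave the picks more smoothly than this.

```python
from itertools import cycle, islice


def weighted_round_robin(weights):
    """Return an endless iterator where each server appears `weight` times per cycle."""
    schedule = [server for server, weight in weights.items() for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one server needs a positive weight")
    return cycle(schedule)


# coord1 has twice the weight of coord2, so it gets two of every three connections.
picks = list(islice(weighted_round_robin({"coord1": 2, "coord2": 1}), 6))
print(picks)
```

The schedule ignores how long each connection lasts, which is one reason the weights need benchmarking before they can be trusted.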


High Availability

Choosing the right load-balancing algorithm for high availability through a proxy.

Source Persistence:

What? The source IP address is hashed and divided by the total weight of the running servers to determine which server will receive the request.

When? Impala workloads containing a mix of queries and DDL statements, such as CREATE TABLE and ALTER TABLE.

Where? It is required for setting up high availability with Hue.
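The source-hash rule described above (hash the client address, take the remainder against the total weight, map the result to a server) can be sketched as follows. The CRC32 hash and the coordinator names are assumptions made for this illustration; a real proxy uses its own hash function.

```python
import zlib


def pick_by_source(source_ip, weights):
    """Map a client IP to a server so the same IP always lands on the same server."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    # Remainder of the hash against the total weight selects a slot.
    slot = zlib.crc32(source_ip.encode("ascii")) % total_weight
    for server, weight in weights.items():
        if slot < weight:
            return server
        slot -= weight
    raise AssertionError("unreachable: slot is always below the total weight")


weights = {"coord1": 1, "coord2": 1, "coord3": 1}
first = pick_by_source("10.0.0.15", weights)
second = pick_by_source("10.0.0.15", weights)
print(first == second)
```

Stickiness is the point: a Hue session that issues a DDL statement and then a query reaches the same coordinator both times, as long as the set of running servers does not change.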


HBase Tuning

HBase Tuning Case Study


Case Study: ClientB OpenTSDB Platform Upgrade

● Client wanted to upgrade from a manually installed HBase environment to the Cloudera distribution's HBase.

● New hardware with a much larger RAM footprint.

● SSDs, because, why not? (And not important to this tuning.)


Initial Insight Evaluation

Gotchas Captured:

● None, really. It is not installed yet, but we will need to tune HBase to utilize a lot more memory.


Java

Use the Java Development Kit (JDK) version 8.


Java

Enable garbage collection (GC) logging.

-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC -XX:+PrintFlagsFinal -Xloggc:/var/log/hbase/regionserver-gc.log


Java

Enable garbage collection (GC) log rotation.

-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=200M


Java

Enable the G1 garbage collector (G1GC) for the RegionServer.

-XX:+UseG1GC -XX:MaxGCPauseMillis=100

https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html


Java

Tune G1GC.

-XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=<8 + (logical processors - 8) * 5/8> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=3

https://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
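The ParallelGCThreads value in the flags above is a formula rather than a literal, so it has to be worked out per host. A minimal sketch of that arithmetic follows. Two details are assumptions not stated on the slide: the result is rounded down to a whole number, and hosts with 8 or fewer logical processors simply use one thread per processor.

```python
def parallel_gc_threads(logical_processors):
    """Compute 8 + (logical processors - 8) * 5/8, rounded down.

    Hosts with 8 or fewer logical processors use one thread per processor
    (an assumption for this sketch; the slide gives only the large-host formula).
    """
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    if logical_processors <= 8:
        return logical_processors
    return 8 + (logical_processors - 8) * 5 // 8


for cpus in (16, 32, 40):
    print(f"{cpus} logical processors: -XX:ParallelGCThreads={parallel_gc_threads(cpus)}")
```

For example, a host with 32 logical processors gets 8 + 24 * 5/8 = 23 threads.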


Configuration

Where do the HBase GC settings go?

Cloudera Manager: HBase -> Configuration -> SCOPE:RegionServer / CATEGORY:Advanced / Java Configuration Options for HBase RegionServer

Ambari: Service/HBase/Configs -> CONFIGS / ADVANCED / Advanced hbase-env / hbase-env template


Java

Increase the Java Heap of the HBase RegionServer.

CM: Java Heap Size of HBase RegionServer in Bytes: 31 GiB

Ambari: HBase RegionServer Maximum Memory: 31 GiB


Never set the heap size to values between 32 and 48 GiB.

https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/


HBase

Enable the HBase BucketCache.

RegionServer Advanced Configuration Snippet (Safety Valve) for hbase-site.xml:

hbase.bucketcache.ioengine: offheap

hbase.bucketcache.size: 32 GiB (or 96 GiB)

hfile.block.cache.size: 0.2


HBase

Enable the HBase BucketCache.

HBase Client Environment Advanced Configuration Snippet for hbase-env.sh:

HBASE_OFFHEAPSIZE=36G (or 100G)

HBASE_OPTS=-XX:MaxDirectMemorySize=36G (or 100G)
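The BucketCache sizes and the off-heap limits in these slides move together: 32 GiB of cache pairs with 36G of direct memory, and 96 GiB with 100G. The sketch below derives one from the other. The 4 GiB headroom is inferred from those two pairs rather than stated in the slides, the function name is hypothetical, and the cache size is written in megabytes on the understanding that HBase reads integer values of hbase.bucketcache.size that way; confirm this for the version in use.

```python
def bucketcache_settings(cache_gib, headroom_gib=4):
    """Derive matching hbase-site.xml and hbase-env.sh values for an off-heap cache.

    headroom_gib is inferred from the slide values (32 -> 36, 96 -> 100).
    """
    if cache_gib < 1:
        raise ValueError("cache_gib must be at least 1")
    offheap_gib = cache_gib + headroom_gib
    return {
        "hbase.bucketcache.ioengine": "offheap",
        "hbase.bucketcache.size": str(cache_gib * 1024),  # megabytes
        "HBASE_OFFHEAPSIZE": f"{offheap_gib}G",
        "HBASE_OPTS": f"-XX:MaxDirectMemorySize={offheap_gib}G",
    }


for name, value in bucketcache_settings(32).items():
    print(f"{name}={value}")
```

Deriving both numbers from one input avoids the failure mode where the cache is enlarged but the direct memory limit is left behind.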


HBase

Enable HBase MultiWAL Support.

hbase.wal.provider: Multiple HDFS WAL

hbase.wal.regiongrouping.numgroups: (numDrives/3)
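The number of WAL groups above is a rule of thumb (drives divided by three), so it differs per hardware profile. A minimal sketch of the calculation is shown below. Rounding down and the floor of one group are assumptions, and `multiwal` is the hbase-site.xml value that corresponds to the "Multiple HDFS WAL" label in Cloudera Manager.

```python
def multiwal_settings(num_drives):
    """Return MultiWAL properties with numgroups = numDrives / 3 (rounded down, min 1)."""
    if num_drives < 1:
        raise ValueError("num_drives must be at least 1")
    groups = max(1, num_drives // 3)
    return {
        "hbase.wal.provider": "multiwal",
        "hbase.wal.regiongrouping.numgroups": str(groups),
    }


# A hypothetical RegionServer host with 12 data drives.
print(multiwal_settings(12))
```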


HDFS

Enable HDFS Hedged Reads.

dfs.client.hedged.read.threadpool.size: 20

dfs.client.hedged.read.threshold.millis: 500 milliseconds
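With hedged reads, a client that has waited longer than the threshold for a block starts a second read against another replica and uses whichever answer arrives first; the thread pool size caps how many such extra reads can run. Where the two settings above are entered as raw XML (for example in a safety valve), they can be rendered as Hadoop-style property elements. The sketch below uses only the Python standard library; the helper name is hypothetical.

```python
from xml.etree import ElementTree as ET

# Values taken from the slide: 20 threads, 500 ms threshold.
HEDGED_READ_SETTINGS = {
    "dfs.client.hedged.read.threadpool.size": "20",
    "dfs.client.hedged.read.threshold.millis": "500",
}


def to_property_xml(settings):
    """Render settings as Hadoop-style <property> elements, one per line."""
    chunks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        chunks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(chunks)


print(to_property_xml(HEDGED_READ_SETTINGS))
```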


References

● https://impala.apache.org/docs/build/html/topics/impala_scalability.html

● https://impala.apache.org/docs/build/html/topics/impala_partitioning.html

● https://impala.apache.org/docs/build/html/topics/impala_proxy.html

● https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase

● http://gceasy.io/


Thank You

• Thank you

• Questions

• Get in touch with us:

www.clairvoyantsoft.com


Contact Us

CHANDLER, AZ

SEATTLE, WA

DALLAS, TX

BOSTON, MA

PUNE, INDIA

+1 (623) 282 2385

Nithya Koka

@nithya_koka

https://www.linkedin.com/in/nithyakoka

6185 W Detroit St. Chandler, AZ

Michael Arnold

@hadoopgeek

https://www.linkedin.com/in/michaelarnold