Cassandra Day Atlanta 2016 - Monitoring Cassandra

Post on 15-Apr-2017



CASSANDRA DAY ATLANTA 2016

MONITORING CASSANDRA

Aaron Morton
@aaronmorton

CEO

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer and DataStax MVPs.

Based in New Zealand, Australia, France & USA.

Metrics

Monitoring & Alerting

Insights

Codahale / Yammer / Dropwizard

Metrics

<dependency groupId="io.dropwizard.metrics" artifactId="metrics-core" version="3.1.0" />

Metrics

Separate Collection from Reporting.

Metrics Collection

Metrics are always collected.

Metrics

Metrics have a dotted-notation name, a timestamp, and a value, e.g. com.thelastpickle.presenters.count=2
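As a minimal sketch of the name / timestamp / value triple, the following formats one data point in the Graphite plaintext style (name, value, Unix timestamp, newline). The function name and the timestamp are illustrative assumptions, not something from the talk.

```python
def graphite_line(name, value, timestamp):
    """Format one data point as '<name> <value> <timestamp>' plus a newline."""
    return f"{name} {value} {timestamp}\n"


# Hypothetical data point using the metric name from the slide.
line = graphite_line("com.thelastpickle.presenters.count", 2, 1460678400)
print(line, end="")
```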

Metric Types

Gauge.

A simple value.

Metric Types

Ratio Gauge.

A ratio between two values.
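A ratio gauge can be sketched as a function of two other values. The helper below is hypothetical; returning NaN for a zero denominator is a design choice assumed here so that an idle system does not raise an error.

```python
import math


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        return float("nan")
    return numerator / denominator


# Hypothetical example: 3 cache hits out of 4 requests.
hit_ratio = ratio(3, 4)
idle_ratio = ratio(0, 0)
```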

Metric Types

Histograms.

The distribution of values in a stream of data.

Histograms

Quantiles (e.g. 75th, 95th) are calculated using reservoir sampling. (Check docs.)
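To illustrate the idea of reservoir sampling, here is a minimal sketch of a uniform reservoir (Algorithm R) with a nearest-rank percentile. It is not the exponentially decaying reservoir that Metrics uses by default, and the class and method names are assumptions made for this example.

```python
import random


class UniformReservoir:
    """Keeps a fixed-size uniform random sample of a stream (Algorithm R)."""

    def __init__(self, size, rng=None):
        self.size = size
        self.count = 0      # total number of values seen so far
        self.values = []    # the retained sample
        self.rng = rng or random.Random()

    def update(self, value):
        self.count += 1
        if len(self.values) < self.size:
            self.values.append(value)
        else:
            # Keep the new value with probability size / count.
            slot = self.rng.randrange(self.count)
            if slot < self.size:
                self.values[slot] = value

    def percentile(self, pct):
        """Nearest-rank percentile of the retained sample; pct is an int in 1..100."""
        if not self.values:
            return 0.0
        ordered = sorted(self.values)
        rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
        return ordered[rank - 1]
```

The percentiles are only estimates once the stream is longer than the reservoir, which is why the slide says to check the docs for the sampling behaviour.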

Histograms

The default is an Exponentially Decaying Reservoir: (roughly) the last five minutes of data, with exponential weighting towards newer data. (Check docs.)

Metric Types

Meter

Measures the per-second rate at which a set of events occurs.

Meter

Three different exponentially-weighted moving average rates: 1, 5, and 15 minutes
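A minimal sketch of one exponentially-weighted moving average rate is shown below. The 5-second tick interval and the class name are assumptions for illustration; a meter would keep three of these, for 1, 5, and 15 minutes.

```python
import math

TICK_SECONDS = 5  # assumed interval between ticks


class Ewma:
    """Exponentially-weighted moving average of an event rate (events/second)."""

    def __init__(self, minutes):
        # Smoothing factor for the chosen time window.
        self.alpha = 1 - math.exp(-TICK_SECONDS / 60 / minutes)
        self.rate = None      # no rate until the first tick
        self.uncounted = 0    # events marked since the last tick

    def mark(self, n=1):
        self.uncounted += n

    def tick(self):
        """Fold the events of the last interval into the moving average."""
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant
        else:
            self.rate += self.alpha * (instant - self.rate)


one_minute = Ewma(1)
one_minute.mark(50)   # 50 events in the first 5 seconds
one_minute.tick()     # rate is now 10 events/second
one_minute.tick()     # no new events, so the rate decays towards zero
```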

Metric Types

Timer.

Histogram of duration and rate of events.

Reporting

Reporters run in the Cassandra process, pushing metrics to external services.

Reporters

ConsoleReporter, GraphiteReporter, InfluxDBReporter, RiemannReporter

Reporters In Cassandra

Configuration file:

metrics-reporter-config-sample.yaml

Reporters In Cassandra

graphite:
  -
    period: 10
    timeunit: 'SECONDS'
    prefix: 'cassandra.prod.ip_1_2_3_4.'
    hosts:
      - host: '1.2.3.4'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.+"
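The effect of the prefix and the predicate in this configuration can be sketched as follows. This is an illustration only: it assumes that color "white" means a whitelist and that patterns are matched from the start of the qualified name, and the function names are hypothetical.

```python
import re

PREFIX = "cassandra.prod.ip_1_2_3_4."
# Pattern copied from the sample configuration.
PATTERNS = [re.compile(r"^org.apache.cassandra.metrics.+")]


def is_whitelisted(name):
    """True when the qualified metric name matches any configured pattern."""
    return any(p.match(name) for p in PATTERNS)


def reported_name(name):
    """Return the prefixed name sent to Graphite, or None if filtered out."""
    return PREFIX + name if is_whitelisted(name) else None


print(reported_name("org.apache.cassandra.metrics.Storage.Load"))
print(reported_name("jvm.gc.time"))
```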

metrics-reporter-config

Configures Metrics reporters.

github.com/addthis/metrics-reporter-config

metrics-reporter-config

Supports:

Ganglia
Graphite
Riemann

JMX

Cassandra creates JMX MBeans for each Metric.

JMX

Reporters

Reporters may change the name of measures, e.g. 95thPercentile == p95
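A small sketch of this kind of renaming follows. The percentile rule generalises the 95thPercentile == p95 example above; the rate names in the table are an assumption about one reporter's output and should be checked against what your reporter actually emits.

```python
import re

# Assumed rate-name mapping; verify against your reporter's real output.
RATE_NAMES = {
    "OneMinuteRate": "m1_rate",
    "FiveMinuteRate": "m5_rate",
    "FifteenMinuteRate": "m15_rate",
}


def short_name(measure):
    """Map a JMX-style measure name to a shorter reporter-style name."""
    match = re.fullmatch(r"(\d+)thPercentile", measure)
    if match:
        return "p" + match.group(1)
    # Unknown names are passed through unchanged.
    return RATE_NAMES.get(measure, measure)
```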

Metrics

Monitoring & Alerting

Insights

Monitoring and Alerting

Use what you like and what works for you.

Monitoring Platforms

OpsCenter, Grafana & Graphite, DataDog, Riemann

Metrics

Monitoring & Alerting

Insights

Names?

All under org.apache.cassandra.metrics

Scale?

Latency? microseconds
Rates? per second
Data? bytes

Percentiles? 75thPercentile, 95thPercentile, 99thPercentile

Rates? OneMinuteRate

Request Throughput - All Requests

ClientRequest.$REQUEST.Latency.1MinuteRate

$REQUEST is one of: CASRead, CASWrite, RangeSlice, Read, ViewWrite, Write
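The $REQUEST placeholder can be expanded into one metric name per request type. The sketch below does this with the standard library; the ROOT prefix comes from the earlier "All under" slide, while the function and variable names are hypothetical.

```python
from string import Template

ROOT = "org.apache.cassandra.metrics."
REQUESTS = ["CASRead", "CASWrite", "RangeSlice", "Read", "ViewWrite", "Write"]
TEMPLATE = Template(ROOT + "ClientRequest.$REQUEST.Latency.1MinuteRate")


def throughput_metric_names():
    """Return the throughput metric name for every request type."""
    return [TEMPLATE.substitute(REQUEST=request) for request in REQUESTS]


for name in throughput_metric_names():
    print(name)
```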

A Note On Requests

We will focus on: Read, Write

But there are others: CAS*, RangeSlice, ViewWrite

Request Throughput - Per Table

Table.$KEYSPACE.$TABLE.
    ReadLatency.1MinuteRate
    WriteLatency.1MinuteRate

Request Latency - All Requests

ClientRequest.
    Write.Latency.95percentile
    Read.Latency.95percentile

Request Latency - Per Table

Table.$KEYSPACE.$TABLE.
    CoordinatorReadLatency.95percentile

Local Latency - Per Table

Table.$KEYSPACE.$TABLE.
    WriteLatency.95percentile
    ReadLatency.95percentile

Local Read Path

Table.$KEYSPACE.$TABLE.
    KeyCacheHitRate.value
    BloomFilterFalseRatio.value
    LiveScannedHistogram.95percentile
    TombstoneScannedHistogram.95percentile
    SSTablesPerReadHistogram.95percentile

Memory Usage

Table.$KEYSPACE.$TABLE.
    BloomFilterOffHeapMemoryUsed.value
    IndexSummaryOffHeapMemoryUsed.value
    MemtableOnHeapSize.value
    MemtableOffHeapSize.value

Clients

Client.connectedNativeClients.value

CQL.PreparedStatementsRatio.value

CQL.PreparedStatementsEvicted.value

Client Errors

ClientRequest.
    $REQUEST.Unavailables.1MinuteRate
    $REQUEST.Timeouts.1MinuteRate
    $REQUEST.Failures.1MinuteRate

Inconsistency

Storage.TotalHints.count

HintedHandOffManager.Hints_created-$IP_ADDRESS.count

Connection.TotalTimeouts.1MinuteRate
Connection.$IP_ADDRESS.Timeouts.1MinuteRate

Inconsistency

You will also want to monitor dropped messages (covered later).

Eventual Consistency

ReadRepair.Attempted.1MinuteRate

ReadRepair.RepairedBackground.1MinuteRate

ReadRepair.RepairedBlocking.1MinuteRate

Server Errors

Storage.Exceptions.count

Disk Usage

Storage.Load.count

Table.$KEYSPACE.$TABLE.TotalDiskSpaceUsed.count

Compactions

Compaction.PendingTasks.value

Compaction.TotalCompactionsCompleted.1MinuteRate

Table.$KEYSPACE.$TABLE.PendingCompactions.value

Thread Pool Performance

ThreadPools.request.
    MutationStage.PendingTasks.value
    ReadStage.PendingTasks.value
    CounterMutationStage.PendingTasks.value
    RequestResponseStage.PendingTasks.value
    ViewMutationStage.PendingTasks.value

Thread Pool Performance

DroppedMessage.
    MUTATION.Dropped.1MinuteRate
    READ.Dropped.1MinuteRate

Thread Pool Performance

DroppedMessage.
    $VERB.InternalDroppedLatency.95thPercentile
    $VERB.CrossNodeDroppedLatency.95thPercentile

Commit Log Performance

CommitLog.
    PendingTasks.Value
    WaitingOnSegmentAllocation.95thPercentile
    WaitingOnCommit.Value

Thanks.

Aaron Morton
@aaronmorton

Co-Founder & Principal Consultant
www.thelastpickle.com