Why @Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management

Infrastructure Engineering Team, June 2014


Page 1

| Log management as a service Simplify Log Management

Apache Storm

Why Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management

Infrastructure Engineering Team, June 2014

Page 2

What Loggly Does

• World’s most popular cloud-based log management service
• More than 5,000 customers
• Near real-time indexing of events
• Distributed architecture, built on AWS
• Initial production services in 2011
• Loggly Generation 2 released in Sept 2013

Page 3

Loggly: Addressing the first big data problem every company faces

• Centralized logging and archival
• Real-time processing, analysis and visualization
• Monitoring, alerting and troubleshooting

Page 4

Agenda for this Presentation

• The challenges of Log Management at scale
• Overview of Loggly’s processing pipeline
• Alternative technologies considered
• Why we love Apache Kafka
• How Kafka has added flexibility to our pipeline

Page 5

The Challenges of Log Management at Scale

• Big data
  – >750 billion events logged to date
  – Sustained bursts of 100,000+ events per second
  – Data space measured in petabytes
• Need for high fault tolerance
• Near real-time indexing requirements
• Time-series index management

Page 6

Log Management Processing Pipeline: Overview

[Pipeline diagram; recoverable labels: Load Balancing, Kafka Stage 2, Loggly Custom Module]

Page 7

Collectors Can Easily Outpace Downstream Processes

• Written in C++
• Designed to ingest massive data volumes
• Need to collect regardless of what’s happening downstream

[Pipeline diagram; recoverable labels: Load Balancing, Kafka Stage 2, Loggly Custom Module]

Page 8

Solution: Queue That’s External to Collector

• Based on Apache Kafka
• Highly performant and reliable

[Pipeline diagram; recoverable labels: Load Balancing, Kafka Stage 2, Loggly Custom Module]
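To make the idea of an external queue concrete, here is a minimal sketch in Python. It is not Loggly's code and it does not use a Kafka client; the `AppendOnlyLog` class is a hypothetical in-memory stand-in for a single Kafka partition, and the rates (5 events written and 2 consumed per tick) are made-up numbers. It only illustrates that a fast collector can keep appending while a slower downstream stage reads at its own offset, so the backlog grows but nothing is dropped.

```python
class AppendOnlyLog:
    """Toy stand-in for one Kafka partition: an ordered, append-only list."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records):
        """Return up to max_records starting at offset; reading never deletes."""
        return self._records[offset:offset + max_records]

    def end_offset(self):
        """Offset that the next appended record will receive."""
        return len(self._records)


log = AppendOnlyLog()
consumer_offset = 0  # position of the (slow) downstream stage

# Assumed rates: each tick the collector writes 5 events,
# while the downstream stage only manages to process 2.
for tick in range(4):
    for i in range(5):
        log.append(f"event-{tick}-{i}")
    batch = log.read(consumer_offset, max_records=2)
    consumer_offset += len(batch)

lag = log.end_offset() - consumer_offset
print(f"written={log.end_offset()} consumed={consumer_offset} lag={lag}")
```

The collector never waits for the consumer in this sketch; the only thing that changes when downstream is slow is the lag between the end of the log and the consumer's offset.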

Page 9

Alternate/Supplementary Approaches Considered

• Internal buffering in collectors
  – Added complexity
• Cassandra
  – Not as good a queue as Kafka
• Apache Storm
  – In initial Gen2 architecture, removed after launch

Page 10

The Secret to Log Management at Scale: Keep It Simple, Stupid

Results:
• Can process sustained rates of 100,000+ events per second per cluster
• Average message: 300 bytes
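As a rough back-of-the-envelope check, the two figures on this slide can be multiplied out. The sketch below assumes the rate is held constant for a full day, counts message payload only, and uses decimal units (1 MB = 10^6 bytes); replication, protocol overhead, and bursts above the sustained rate are ignored.

```python
EVENTS_PER_SECOND = 100_000   # sustained rate quoted on the slide
AVG_MESSAGE_BYTES = 300       # average message size quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Payload throughput per cluster under the assumptions stated above.
bytes_per_second = EVENTS_PER_SECOND * AVG_MESSAGE_BYTES
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

print(f"{bytes_per_second / 1e6:.0f} MB/s")    # 30 MB/s
print(f"{bytes_per_day / 1e12:.2f} TB/day")    # 2.59 TB/day
```

Under these assumptions a single cluster at the quoted rate moves roughly 30 MB/s, or about 2.6 TB per day of payload.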

Page 11

Why We Love Kafka

Page 12

What Attracted Us in the First Place

No single point of failure
• Terabytes of data move through our Kafka cluster every day without losing a single event
• We use age-based retention to purge old data on disks

Low latency
• 99.99999% of the time our data is coming from disk cache and RAM; only very rarely do we hit disk

Performance
• Crazy good!
• We currently have a bunch of Kafka brokers running on m2.xlarge instances backed by provisioned IOPS.
• One consumer group (eight threads), which maps a log to a customer, can process about 200,000 events per second draining from 192 partitions spread across three brokers

Scalability
• Ability to increase partition count per topic and downstream consumer threads provides flexibility to increase throughput when desired
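The consumer-group numbers above can be unpacked with a small sketch. The round-robin assignment below is only an illustration and is an assumption: the slide does not say how partitions are mapped to threads, and Kafka's own assignment strategies may distribute them differently. The function name `assign_partitions` is hypothetical.

```python
def assign_partitions(num_partitions, num_threads):
    """Spread partitions over threads round-robin; returns {thread: [partitions]}."""
    assignment = {thread: [] for thread in range(num_threads)}
    for partition in range(num_partitions):
        assignment[partition % num_threads].append(partition)
    return assignment


# Figures from the slide: 192 partitions, one consumer group of eight threads,
# about 200,000 events per second in total.
assignment = assign_partitions(num_partitions=192, num_threads=8)
partitions_per_thread = {t: len(ps) for t, ps in assignment.items()}
events_per_thread = 200_000 / 8

print(partitions_per_thread[0])   # 24
print(events_per_thread)          # 25000.0
```

With an even split, each thread drains 24 partitions and handles roughly 25,000 events per second. Adding partitions and threads raises the ceiling, which is the scaling lever the slide describes.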

Page 13

How Our Kafka Crush Has Deepened

Distributed log collection
• Local pods and collectors spread all over the Internet with local Kafka deployments to collect data from customers located all over the world
• Can collect logs even when we lose connectivity
• When network comes back, Kafka sends the logs downstream to the rest of the pipeline

More efficient, effective DevOps
• Deploying Kafka throughout the pipeline makes it easy to disable certain parts of the system (for troubleshooting or upgrades)
• No worrying that we will lose customer data
• Example: Add support for a new log type into our automatic parsing capabilities by turning off the existing parser, deploying the new one, and processing logs that Kafka has queued up

Controlling resource utilization
• Keep collectors as simple as possible for resilience and reliability reasons
• Add intelligence into our pipelines using Kafka
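The parser-swap example can be sketched as follows. This is a toy model, not Loggly's pipeline: `QueuedLogs`, `old_parser`, `new_parser`, and the `key=value` log format are all hypothetical names chosen for illustration. The point it shows is that the queue remembers where the parsing stage stopped, so events collected while the parser is turned off are processed by the replacement once it starts.

```python
class QueuedLogs:
    """Toy queue that keeps every record plus the parser stage's committed offset."""

    def __init__(self):
        self.records = []
        self.committed = 0  # next offset the parsing stage should read

    def append(self, line):
        self.records.append(line)

    def drain(self, parser):
        """Parse everything after the committed offset, then advance the offset."""
        parsed = [parser(line) for line in self.records[self.committed:]]
        self.committed = len(self.records)
        return parsed


def old_parser(line):
    # Existing parser: does not understand the new log type, keeps the raw text.
    return {"raw": line}


def new_parser(line):
    # Replacement parser: additionally understands a hypothetical key=value type.
    key, sep, value = line.partition("=")
    return {key: value} if sep else {"raw": line}


queue = QueuedLogs()
queue.append("startup ok")
before_swap = queue.drain(old_parser)

# Old parser is turned off here; collectors keep writing to the queue.
queue.append("status=500")
queue.append("user=alice")

# New parser is deployed and picks up from the committed offset.
after_swap = queue.drain(new_parser)
print(before_swap, after_swap)
```

Nothing written during the gap is lost or parsed twice in this sketch, because the new parser resumes exactly where the old one stopped.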

Page 14

Resource Utilization Example: “Noisy Neighbors”

Page 15

“Noisy Neighbors” are Inherent to SaaS

• Sending many times their “normal” level of logging volume, inadvertently or because their application is in big trouble
• Routing logs to separate queue minimizes impact on other customers
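A routing rule of this kind might look like the sketch below. Everything specific in it is an assumption made for illustration: the 10x threshold, the topic names `events.shared` and `events.noisy.<customer>`, the customer names, and the idea that a per-customer baseline rate is available. The slides do not describe how Loggly detects noisy customers or names its queues.

```python
NOISY_MULTIPLIER = 10  # assumed threshold: 10x a customer's normal rate


def choose_topic(customer_id, current_rate, normal_rate, multiplier=NOISY_MULTIPLIER):
    """Pick a queue for a customer's events based on how far above normal they are."""
    if normal_rate > 0 and current_rate > multiplier * normal_rate:
        # Isolate the burst so it cannot delay other customers' events.
        return f"events.noisy.{customer_id}"
    return "events.shared"


# Hypothetical events-per-second figures for two customers.
baselines = {"acme": 1_000, "globex": 500}
observed = {"acme": 1_200, "globex": 40_000}

routes = {
    customer: choose_topic(customer, observed[customer], baselines[customer])
    for customer in baselines
}
print(routes)
```

In this example only the customer running far above its baseline is moved to its own queue; the other stays on the shared one.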

Page 16

Kafka Queues Add Flexibility to Loggly Pipeline

• Because Kafka topics are very cheap from a performance and overhead standpoint, we can create as many queues as we want
  – Scaled to the performance we want
  – Optimizing resource utilization across the system
• Because they can be created dynamically, we can make business rules very flexible
• Makes us confident that pipeline will scale as customer data volumes do

Page 17

Conclusion: Kafka Frees Our Development Team to Build Differentiating Features

• Kafka deployment working without us thinking about it
• Plenty of other things to do to keep our position as the world’s most popular cloud-based log management service!

Page 18

Does Log Management Sound Hard? It Should!

About Us: Loggly is the world’s most popular cloud-based log management solution, used by more than 5,000 happy customers to effortlessly spot problems in real-time, easily pinpoint root causes and resolve issues faster to ensure application success.

Let us do the heavy lifting for you!

Visit us at loggly.com or follow @loggly on Twitter.

Try Loggly FREE for 30 days

Page 19

Did you like this presentation?

Head over to our blog for more great content!

Take me to the Loggly Blog