Scale Splunk

Scaling Splunk 101

Quick Overview of Scaling Splunk with Commodity Hardware

Erik Swan, Oct 09

** Slides intentionally ugly; no designers were harmed during construction.

Single Server Install – Commodity Architecture

[Diagram: Users search a single Splunk server (all in one), which receives data from Splunk Forwarders, Syslog, Files, etc.]

The simplest Splunk install is a single server that functions as both indexer and search head.

A single box can easily index 100-200G per day, BUT for fast searching it's best to use more than one box.

Improving Search and Indexing Performance

Splunk scales search and indexing performance horizontally by adding more indexers and in some cases scaling out a search tier.

By spreading the incoming load across more indexers, you index faster.

Perhaps more importantly, spreading the indexed data across more indexers improves search performance linearly as well.

Every doubling of hardware will double your indexing and search performance, so don't be shy about adding tens of servers.
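A toy model of why both indexing and search scale out: incoming events are spread round-robin across indexers, and a search fans out so each indexer scans only its own slice before the results are merged. This is a sketch, not Splunk's implementation; the function names are hypothetical and threads stand in for separate machines, so it shows the shape of the work (each indexer scans roughly 1/N of the data), not real speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(events, num_indexers):
    """Spread incoming events round-robin across indexers (toy load spreading)."""
    partitions = [[] for _ in range(num_indexers)]
    for i, event in enumerate(events):
        partitions[i % num_indexers].append(event)
    return partitions


def search_partition(partition, term):
    """One indexer scans only its own slice of the data."""
    return [event for event in partition if term in event]


def distributed_search(partitions, term):
    """Search head: fan the query out to every indexer, then merge the hits."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_indexer = list(pool.map(search_partition, partitions, [term] * len(partitions)))
    merged = [hit for hits in per_indexer for hit in hits]
    return sorted(merged)


if __name__ == "__main__":
    events = [f"event {i} {'error' if i % 4 == 0 else 'ok'}" for i in range(100)]
    for n in (1, 2, 4, 8):
        parts = distribute(events, n)
        hits = distributed_search(parts, "error")
        # Same answer at every size; the largest slice any one indexer scans shrinks with n.
        print(n, "indexers:", len(hits), "hits, max events per indexer:", max(len(p) for p in parts))
```

Doubling the indexer count halves the slice each one has to scan, which is the intuition behind RULE #1.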

RULE #1 – If your searches are slow, add another box!

Adding a Search Head

[Diagram: Users connect to a Splunk Search Head, which searches one Splunk Indexer; the indexer receives data from Splunk Forwarders, Syslog, Files, etc.]

By splitting out a Search Head, search performance is improved and load is taken off the indexer for faster indexing.

Best to add sooner rather than later.

Best for volumes of 5-100G p/day.

1 Indexer, 1 Search Head

Adding a second Indexer

[Diagram: Users connect to a Splunk Search Head, which searches two Splunk Indexers; the indexers receive data from Splunk Forwarders, Syslog, Files, etc.]

As volume goes beyond 100G p/day, OR when you want to improve search performance, it's best to add a second Indexer.

** Remember: adding indexers improves search performance linearly as well.

Best for volumes of 20-200G p/day.

2 Indexers, 1 Search Head

Adding additional Indexers

[Diagram: Users connect to a Splunk Search Head, which searches (n) Splunk Indexers; the indexers receive TBs/day from Splunk Forwarders and Syslog.]

For every new ~100G p/day, or again to improve search performance, add another indexer.

RULE #1: If searches are slow, add another indexer.

For volumes from 200G-1T p/day
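The "one indexer per ~100G p/day" rule can be turned into a quick starting estimate. This is a minimal sketch; the function and the `extra_for_search` knob are hypothetical, and the use-case tables in these slides show the right count also depends on how heavily the data is searched, so treat the result as a first guess rather than a requirement.

```python
import math

# Rule of thumb from the slides: roughly one indexer per ~100G/day of incoming data.
GB_PER_INDEXER_PER_DAY = 100


def indexers_needed(gb_per_day, gb_per_indexer=GB_PER_INDEXER_PER_DAY, extra_for_search=0):
    """Estimate indexer count from daily volume.

    extra_for_search is a hypothetical knob: RULE #1 says to add boxes when
    searches are slow, so callers can add indexers beyond the volume estimate.
    """
    if gb_per_day <= 0:
        raise ValueError("gb_per_day must be positive")
    return max(1, math.ceil(gb_per_day / gb_per_indexer)) + extra_for_search


if __name__ == "__main__":
    for volume in (50, 250, 1000):
        print(volume, "G/day ->", indexers_needed(volume), "indexers")
```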


Assume 100G p/day:

Use Case #1: Log archival and some periodic troubleshooting – 1 Commodity Server

Use Case #2: Archival, troubleshooting and summary reporting – 1 Index Server, 1 Search Server

Use Case #3: Archival, troubleshooting, and reporting – 2 Index Servers, 1 Search Server

Use Case #4: Many (>2) users, constant use – 3+ Index Servers, 1 Search Server

Adding additional Search Heads

[Diagram: Users connect through a load balancer (Load Bal.) to (n) Splunk Search Heads, 1~4T p/day each, which search the Splunk Indexers; the indexers receive TBs/day from Splunk Forwarders and Syslog.]

Adding more Search Heads is a convenient way to improve search performance.

Add an additional Search Head when:
1. It makes sense to partition users.
2. To offload summary or scheduled searches.
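The slides put a load balancer in front of the search heads but don't say how it routes. A minimal sketch, assuming sticky hash-based routing: interactive users are partitioned across heads by a stable hash of their name, and scheduled or summary searches go to a dedicated head when one exists. The function and host names (`sh1`, `sh2`, `sh-sched`) are hypothetical.

```python
import hashlib


def pick_search_head(user, interactive_heads, scheduled_head=None, scheduled=False):
    """Route a search to a search head (hypothetical routing policy).

    - Scheduled/summary searches go to a dedicated head when one is configured.
    - Interactive users are partitioned by a stable hash of their name, so each
      user always lands on the same head.
    """
    if scheduled and scheduled_head is not None:
        return scheduled_head
    if not interactive_heads:
        raise ValueError("need at least one interactive search head")
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(interactive_heads)
    return interactive_heads[index]


if __name__ == "__main__":
    heads = ["sh1", "sh2"]
    for user in ("alice", "bob", "carol"):
        print(user, "->", pick_search_head(user, heads))
    print("nightly report ->", pick_search_head("reports", heads, "sh-sched", scheduled=True))
```

A stable hash keeps each user's session on one head; a plain round-robin balancer would spread load more evenly but would not partition users.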


For every new ~TB p/day, add another search head.

For volumes > 2T p/day

(n) Indexers, each <100G p/day; (m) Search Heads, one for every ~1T p/day
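Both rules of thumb on this slide can be combined into one rough topology estimate. A sketch only: it assumes 1T = 1000G and treats the limits as "at most ~100G per indexer" and "one search head per ~1T"; the function name is hypothetical.

```python
import math


def plan_deployment(gb_per_day, gb_per_indexer=100, gb_per_search_head=1000):
    """Rough (indexers, search_heads) from the slide's rules of thumb.

    (n) indexers: each handling at most ~100G/day.
    (m) search heads: about one per ~1T/day (assumed to be 1000G).
    """
    if gb_per_day <= 0:
        raise ValueError("gb_per_day must be positive")
    indexers = math.ceil(gb_per_day / gb_per_indexer)
    search_heads = max(1, math.ceil(gb_per_day / gb_per_search_head))
    return indexers, search_heads


if __name__ == "__main__":
    for volume in (500, 2500, 3000):
        n, m = plan_deployment(volume)
        print(f"{volume}G/day -> {n} indexers, {m} search heads")
```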

Assuming a load of 1T p/day:

Use Case #1: Log archival and some periodic troubleshooting – 4 Index Servers, 1 Search Server

Use Case #2: Archival, troubleshooting and some summary reporting – 8+ Index Servers, 1 Search Server

Use Case #3: Archival, troubleshooting, and reporting – 16+ Index Servers, 1 Search Server

Use Case #4: Many (>2) users, constant use – 20+ Index Servers, 1 Search Server

Long-term storage: add a SAN

[Diagram: Users connect through a load balancer (Load Bal.) to Splunk Search Heads, which search the Splunk Indexers; the indexers are backed by a Tier 1 SAN.]


Long-term storage cannot be kept on local commodity IO.

If you want to keep more data than fits on local indexer disk, Splunk can be configured to use a SAN or other storage device.

Best for keeping data from >30 days up to multiple years.
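The idea is age-based tiering: recent data stays on fast local indexer disk and older data lives on the SAN. In Splunk itself this is set up through index configuration (for example, pointing an index's `coldPath` in indexes.conf at SAN storage), and bucket rolling has its own rules; the sketch below does not model that mechanism. It only illustrates the split, assuming a hypothetical 30-day local window taken from the slide's ">30 day" boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the slide's ">30 day" boundary used as the local-disk window.
LOCAL_RETENTION = timedelta(days=30)


@dataclass
class Bucket:
    """Hypothetical unit of indexed data, identified by its newest event time."""
    name: str
    newest_event: datetime


def storage_tier(bucket, now, local_retention=LOCAL_RETENTION):
    """Keep recent data on local indexer disk; send older data to the SAN."""
    age = now - bucket.newest_event
    return "local" if age <= local_retention else "san"


if __name__ == "__main__":
    now = datetime(2009, 10, 31)
    buckets = [Bucket("db_recent", datetime(2009, 10, 20)), Bucket("db_old", datetime(2009, 8, 1))]
    for b in buckets:
        print(b.name, "->", storage_tier(b, now))
```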

Multi-datacenter or multi-deployment

If you have multiple data centers, it is often best to leave the data local and use distributed search between two deployments.

If you have data that naturally partitions such that users would rarely search across the partitions, partitioning entire deployments can help.

Separate deployments obviously help with DR (disaster recovery) as well.

Additional Scaling Topics

Summary Indexing – If your searches are slow, consider using summary indexing:
1. video - http://www.splunk.com/view/SP-CAAACZW
2. docs - http://www.splunk.com/base/Documentation/4.0.5/User/UseSummaryIndexingForIncreasedReportingEfficiency
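Summary indexing trades a little scheduled work for much cheaper reports: a periodic search condenses raw events into small summary rows, and reports run over those rows instead of rescanning every raw event. A minimal sketch of that idea in plain Python, assuming hypothetical event dicts with a `status` field; it is not Splunk's summary-index machinery.

```python
from collections import Counter
from datetime import datetime


def summarize_hour(events, hour_start):
    """Scheduled job: count events per status for one hour into a compact row."""
    counts = Counter(event["status"] for event in events)
    return {"hour": hour_start, "counts": dict(counts)}


def report_from_summaries(summary_rows):
    """Report over the small summary rows instead of the raw events."""
    total = Counter()
    for row in summary_rows:
        total.update(row["counts"])
    return dict(total)


if __name__ == "__main__":
    hour1 = [{"status": 200}, {"status": 200}, {"status": 500}]
    hour2 = [{"status": 200}, {"status": 404}]
    summaries = [
        summarize_hour(hour1, datetime(2009, 10, 1, 0)),
        summarize_hour(hour2, datetime(2009, 10, 1, 1)),
    ]
    print(report_from_summaries(summaries))
```

The report touches one row per hour no matter how many raw events that hour held.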

Routing High Volume Data to a Separate Index – If you are searching or reporting on a source that is dwarfed by the volume of another source, you can partition the data so that the high-volume source is in its own index:
3. docs - http://www.splunk.com/base/Documentation/latest/Admin/Setupmultipleindexes#Why_have_multiple_indexes.3F
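Why a separate index helps: a search scoped to one index only scans the events routed there, so a small source is no longer buried under a noisy one. A toy sketch, assuming a hypothetical routing table that sends a `firewall` source to its own `firewall_idx` and everything else to `main`; Splunk's real routing is configured, not written like this.

```python
from collections import defaultdict

# Hypothetical routing table: the high-volume source gets its own index.
ROUTES = {"firewall": "firewall_idx"}
DEFAULT_INDEX = "main"


def route(events, routes=ROUTES, default=DEFAULT_INDEX):
    """Place each event in an index chosen by its source."""
    indexes = defaultdict(list)
    for event in events:
        indexes[routes.get(event["source"], default)].append(event)
    return indexes


def search_index(indexes, index_name, predicate):
    """Search one index; returns (hits, number of events scanned)."""
    scanned = indexes.get(index_name, [])
    return [e for e in scanned if predicate(e)], len(scanned)


if __name__ == "__main__":
    events = [{"source": "firewall", "msg": "deny"} for _ in range(1000)]
    events += [{"source": "app", "msg": "error" if i % 2 else "ok"} for i in range(10)]
    indexes = route(events)
    hits, scanned = search_index(indexes, "main", lambda e: e["msg"] == "error")
    print(len(hits), "app errors found after scanning", scanned, "events")
```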

