SF ElasticSearch Meetup 2013.04.06 - Monitoring
Monitoring tools for ElasticSearch
SF Meetup 2013.03.06
Sushant Shankar, Shyam Kuttikkad
• Why and how we use ElasticSearch
• Monitoring
  – Tools
  – Index Building
  – Query Performance
Who is 33Across?
• Social Sharing and Content Discovery platform
  – We help >600,000 publishers with content distribution, user engagement, and advertising monetization
  – 450 Fortune 1000 brand marketers leverage our unique social signals to deliver impactful advertising
• We develop Machine Learning algorithms operating on Big Data to:
  – Provide content sharing insights to publishers
  – Build customized audience segments for advertising campaigns
  – Extract actionable insights from social and interest data
www.33Across.com, www.tynt.com
Data firehose of 30B monthly events, 1.25B cookies
- Interaction with web content
- Shares (images, copies)
- Searches
Social Audiences: Behavior, Context, Knowledge
Real-time view
Build, understand, analyze
ElasticSearch!
Production ElasticSearch cluster
Build index using a MapReduce (MR) job and the Bulk API
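A minimal sketch of what each indexing task might send, assuming the 0.20-era bulk format: an action line naming `_index`, `_type` and `_id`, followed by the document source, one JSON object per line, with a trailing newline. The index name `users`, type `user`, and the helper names are hypothetical; the default batch of 10,000 documents mirrors the request size in the index-building learnings. Each yielded body would be POSTed to the cluster's `/_bulk` endpoint.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Doc = Tuple[str, Dict[str, Any]]  # (document id, document source)


def bulk_body(index_name: str, doc_type: str, docs: List[Doc]) -> str:
    """Render one Bulk API request body: an action line plus a source line per doc."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index_name, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


def bulk_bodies(index_name: str, doc_type: str, docs: Iterable[Doc],
                batch_size: int = 10_000) -> Iterator[str]:
    """Stream documents (e.g. the output of one MR task) into fixed-size bulk requests."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield bulk_body(index_name, doc_type, batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield bulk_body(index_name, doc_type, batch)


if __name__ == "__main__":
    sample = [("u1", {"segments": ["sports"]}), ("u2", {"segments": ["travel"]})]
    for body in bulk_bodies("users", "user", sample):
        print(body, end="")
```

Generating the bodies lazily keeps each task's memory bounded to a single batch, regardless of how many records it emits.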
Hardware: 6 nodes, each with
• 24GB RAM (16GB for the ES service)
• 4 cores
• 3x 1.5TB drives
Index
• >1TB per index (replicated)
• ~300M documents
• ~5KB / document
• ~3 hours to build
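As a back-of-the-envelope check, the index figures above imply a sustained indexing rate. This sketch assumes ~5KB is the source size per document (1KB = 1,000 bytes), that ~3 hours is the wall-clock build time, and that requests carry 10,000 documents as in the index-building learnings; the function name is hypothetical.

```python
from typing import Dict


def build_rate(num_docs: int, doc_bytes: int, build_hours: float,
               batch_size: int = 10_000) -> Dict[str, float]:
    """Derive average throughput figures for a bulk index build."""
    seconds = build_hours * 3600
    docs_per_sec = num_docs / seconds
    bulk_requests = -(-num_docs // batch_size)  # ceiling division
    return {
        "raw_tb": num_docs * doc_bytes / 1e12,           # total source data, decimal TB
        "docs_per_sec": docs_per_sec,                    # cluster-wide average
        "mb_per_sec": docs_per_sec * doc_bytes / 1e6,    # source JSON ingested per second
        "bulk_requests": bulk_requests,
        "requests_per_sec": bulk_requests / seconds,
    }


if __name__ == "__main__":
    for key, value in build_rate(300_000_000, 5_000, 3).items():
        print(f"{key}: {value:,.2f}")
```

With these inputs the build averages roughly 27,800 documents/s, about 139 MB/s of source JSON, and about 2.8 bulk requests/s of 10,000 documents across the cluster.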
System monitoring using Zabbix
Index Build
ElasticSearch-specific monitoring using SPM
Scalable Performance Monitoring (http://sematext.com/spm/index.html)
• Index stats – Total/Refreshed/Merged documents
• Shards – Total/Active/Relocating/Initializing
• Search – Request rate and latency
• Cache – Filter and field cache count, evictions, size
• Machine – CPU, Memory, JVM, GC, Network, Disk
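The shard counters in this list can also be spot-checked without SPM from the cluster health endpoint (`GET /_cluster/health`), whose response includes `status`, `active_shards`, `relocating_shards`, `initializing_shards` and `unassigned_shards`. A minimal sketch, assuming an unauthenticated node at `http://localhost:9200`; the helper names and the "settled" rule are our own, not part of SPM or ElasticSearch.

```python
import json
import urllib.request
from typing import Any, Dict

SHARD_KEYS = ("active_shards", "relocating_shards",
              "initializing_shards", "unassigned_shards")


def fetch_health(base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """Fetch the cluster health document over plain HTTP."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def shard_summary(health: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the shard counters and flag whether shard movement has settled."""
    summary: Dict[str, Any] = {key: health.get(key, 0) for key in SHARD_KEYS}
    summary["status"] = health.get("status")
    # Hypothetical rule: green, with nothing relocating, initializing or unassigned.
    summary["settled"] = summary["status"] == "green" and all(
        summary[key] == 0 for key in SHARD_KEYS[1:])
    return summary
```

Polling `shard_summary(fetch_health())` every few seconds during a build or after adding replicas shows when shard allocation has finished.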
Index Building Optimization using Zabbix and SPM
(Charts: amount bulk indexed, # shards, time taken, CPU util., mem util., disk I/O, network)
in practice…
Debugging and Validating using SPM
Index Building: Learnings
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Bulk API for our use case
• No replicas
• Refresh off (index.refresh_interval = -1)
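A sketch of the two settings bodies these learnings imply, assuming "2 shards / CPU" counts every core in the 6-node, 4-core cluster from the hardware slide (giving 48 primary shards) and reusing the replica and refresh values from the query-performance learnings. The first body would accompany `PUT /{index}` at creation time, the second `PUT /{index}/_settings` after the build; the function names are hypothetical.

```python
from typing import Any, Dict


def build_phase_index(nodes: int, cores_per_node: int,
                      shards_per_core: int = 2) -> Dict[str, Any]:
    """Create-index body for the bulk build: many shards, no replicas, no refresh."""
    return {
        "settings": {
            "index": {
                # Assumes "2 shards / CPU" means per core, summed over the cluster.
                "number_of_shards": nodes * cores_per_node * shards_per_core,
                "number_of_replicas": 0,   # replicas are added after the build
                "refresh_interval": "-1",  # disable periodic refresh while indexing
            }
        }
    }


def serve_phase_settings(replicas: int = 1, refresh: str = "5s") -> Dict[str, Any]:
    """Settings update applied once the build finishes, before serving queries."""
    return {"index": {"number_of_replicas": replicas, "refresh_interval": refresh}}


if __name__ == "__main__":
    print(build_phase_index(nodes=6, cores_per_node=4))
    print(serve_phase_settings())
```

The split follows from what can change later: `number_of_shards` is fixed when the index is created, while `number_of_replicas` and `refresh_interval` can be updated on a live index.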
Query Performance: Learnings
• 1-2 replicas (also for reliability)
• Turn refresh back on (e.g. 5s; the ElasticSearch default is 1s)
• Warm-up effect (Index Warmup API, 0.20+)
• Optimize API
• Simulate multiple users
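One way to simulate multiple users and observe the warm-up effect is to replay the same set of queries concurrently twice and compare the first (cold) pass with the second (warm) pass. A minimal sketch: `run_query` is a placeholder for whatever issues one search (for example, an HTTP POST to `/{index}/_search`), and the worker count and nearest-rank percentile are assumptions rather than how SPM measures latency.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple

Stats = Dict[str, float]


def measure(run_query: Callable[[str], object], queries: Sequence[str],
            users: int = 8) -> Stats:
    """Replay `queries` with `users` concurrent workers; summarize latency in seconds."""

    def timed(query: str) -> float:
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = sorted(pool.map(timed, queries))
    # Nearest-rank 95th percentile over the sorted latencies (assumes queries is non-empty).
    rank = max(1, math.ceil(0.95 * len(latencies)))
    return {
        "count": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p95_s": latencies[rank - 1],
        "max_s": latencies[-1],
    }


def cold_then_warm(run_query: Callable[[str], object], queries: Sequence[str],
                   users: int = 8) -> Tuple[Stats, Stats]:
    """Run the same workload twice; the first pass sees whatever caches are still cold."""
    cold = measure(run_query, queries, users)
    warm = measure(run_query, queries, users)
    return cold, warm
```

Comparing `cold` with `warm` after each change (replica count, refresh interval, optimize) gives a rough before/after picture; the first pass is only truly cold if nothing else warmed the caches beforehand.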
QUERIES?
Sushant
Shyam
Why we really need a search engine
… …
Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)
Warm Up: load into memory and cache
Other cool features
• Custom scoring functions
• Scripts – MVEL, Python
• Facets
• Exploring:
  – Real-time indexing
  – Indexing images, files, etc.
  – Parent-child relationships
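For custom scoring and facets, a hedged sketch of a 0.20-era search body that combines a `custom_score` query (with an MVEL script) and a `terms` facet. The field names `popularity` and `topic` and the helper name are hypothetical, and both constructs were later replaced in ElasticSearch (by `function_score` and by aggregations), so check the syntax against the version in use.

```python
import json
from typing import Any, Dict


def scored_faceted_search(text: str, score_field: str, facet_field: str,
                          facet_size: int = 10) -> Dict[str, Any]:
    """Search body: text match rescored by a numeric field, plus a terms facet."""
    return {
        "query": {
            "custom_score": {
                "query": {"match": {"_all": text}},
                # MVEL script: scale the relevance score by a per-document value.
                "script": f"_score * doc['{score_field}'].value",
            }
        },
        "facets": {
            f"top_{facet_field}": {"terms": {"field": facet_field, "size": facet_size}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(scored_faceted_search("cats", "popularity", "topic"), indent=2))
```

The body would be sent to `/{index}/_search`; the response then carries a `facets` section with the most frequent `topic` values alongside the rescored hits.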