Mail Search As A Service: Presented by Rishi Easwaran, Aol
October 13-16, 2016 • Austin, TX
Mail Search As A Service
Rishi Easwaran
Principal Software Engineer, Aol
01 Agenda
• Overview
• Multicore Architecture
• Multicore Pain Points
• Hybrid Cloud Architecture
• Hybrid Cloud Benefits
• Search As A Service
• Future Work
• Q & A
02 Overview
[Architecture diagram: Metadata Storage, Index Storage, Mail Core, Bulk Storage, Protocol Handlers (LMTP, POP3, IMAP, SOAP, WCAP, AOL API), Sharding Layer, Directory Service]
02 Solr Production Metrics
Index size range: 1 KB to 100 GB
Number of hosts: > 200
Complex size: > 400 TB
Number of Solr indexes: > 50 million
Avg. update requests/day: > 1.2 billion
Avg. search requests/day: > 70 million
Current availability: 99.99%
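The daily totals above can be turned into average per-second rates and an annual downtime budget. A small Python sketch of that arithmetic follows; it assumes load is spread evenly over 24 hours (real mail traffic peaks, so peak rates are higher), and because the slide gives lower bounds ("> 1.2 billion"), the computed rates are lower bounds too. The function names are illustrative.

```python
# Back-of-the-envelope rates derived from the daily totals on this slide.
# Assumption: traffic is spread evenly over 24 hours, which real mail traffic is not.
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600 (non-leap year)


def avg_per_second(requests_per_day: float) -> float:
    """Mean request rate, in requests/second, for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability fraction such as 0.9999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"updates/s:  {avg_per_second(1.2e9):,.0f}")   # about 13,889
    print(f"searches/s: {avg_per_second(70e6):,.0f}")    # about 810
    print(f"downtime:   {downtime_minutes_per_year(0.9999):.1f} min/year")  # about 52.6
```

At these volumes the cluster averages at least roughly 13,900 updates and 810 searches per second, and 99.99% availability allows about 53 minutes of downtime per year.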
02 Multicore Architecture
http://wiki.apache.org/solr/LotsOfCores
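The LotsOfCores wiki page linked above describes how Solr 4.x can host very large numbers of cores by loading them lazily and keeping only a bounded number of "transient" cores open, unloading the least recently used one when the limit set by transientCacheSize is exceeded. The Python below is a toy model of that policy only; it is not Solr code, and the class and loader names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class TransientCoreCache:
    """Toy model of lazily loaded, LRU-evicted cores (hypothetical; not Solr code)."""

    def __init__(self, capacity: int, load_core: Callable[[str], Dict]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity          # plays the role of transientCacheSize
        self.load_core = load_core        # hypothetical loader: core name -> core object
        self._cores: "OrderedDict[str, Dict]" = OrderedDict()  # least recently used first
        self.evicted: List[str] = []      # names of cores unloaded so far

    def get(self, name: str) -> Dict:
        """Return the named core, loading it on first use and evicting the LRU core if needed."""
        if name in self._cores:
            self._cores.move_to_end(name)  # mark as most recently used
            return self._cores[name]
        core = self.load_core(name)        # lazy load: nothing is opened up front
        self._cores[name] = core
        if len(self._cores) > self.capacity:
            lru_name, _ = self._cores.popitem(last=False)  # drop least recently used
            self.evicted.append(lru_name)
        return core

    def loaded(self) -> List[str]:
        """Names of currently loaded cores, least recently used first."""
        return list(self._cores)
```

With a capacity of 2, touching user-a, user-b, user-a, then user-c unloads user-b, since user-a was used more recently. This model ignores concurrency and the cost of closing and reopening cores.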
02 Multicore Architecture Pain Points
• Search unavailability:
  ◦ No backups of users' Solr indexes.
  ◦ Long re-indexing times.
  ◦ Disk commits only every 5 minutes.
• High variance in response times.
• Large hardware footprint (> 1,000 hosts).
• Load balancing and frequent hot spots required manual intervention.
02 Benefits of Upgrading Multicore to Solr 4.2
• 75% reduction in search response times
• 50% reduction in disk busy time
• 15% reduction in CPU usage
• 50% reduction in total GC stop time
  ◦ Application throughput into the 99.9% range
02 Solr Hybrid Cloud Architecture
02 Cloud Archiving Tool (a.k.a. CAT)
• Split/merge logic caused CPU spikes and slowed live instances.
• We can split and merge passively.
• We can run split/merge on a cloud shard once a day.
• The split/merge process can be restricted to off-peak/maintenance hours.
• Minimal impact on live production users and systems.
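The slide states that split/merge runs on a cloud shard once a day and can be confined to off-peak or maintenance hours, but not how CAT enforces this. A minimal gating sketch in Python follows, assuming a hypothetical 01:00 to 05:00 local maintenance window; the function names are also hypothetical.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical maintenance window; the slide gives no actual hours.
WINDOW_START = time(1, 0)   # 01:00 local time
WINDOW_END = time(5, 0)     # 05:00 local time


def in_window(now: datetime, start: time = WINDOW_START, end: time = WINDOW_END) -> bool:
    """True if now's time of day is in [start, end); supports windows that wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # e.g. 23:00 to 03:00


def should_run_split_merge(now: datetime, last_run: Optional[datetime]) -> bool:
    """Allow split/merge on a shard only inside the window and at most once per calendar day."""
    ran_today = last_run is not None and last_run.date() == now.date()
    return in_window(now) and not ran_today
```

Checking the calendar date of the last run keeps a shard from being split or merged twice in one window; a window that wraps past midnight would need a different "same day" rule.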
02 Hybrid Cloud Architecture Benefits
• Cost savings of ~30%
• SSDs handle the newest data (updates 10 ms, searches 50 ms)
• Near-real-time (NRT) availability of indexed messages (1 s commit)
• Solr is not a single point of failure in our system
Operation    Multicore    Hybrid Cloud
Inserts      20 ms        7 ms
Deletes      20 ms        7 ms
Searches     60 ms        55 ms
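Read as relative improvements, the multicore vs. hybrid cloud latencies show a much larger gain for writes than for searches. A short Python sketch that computes the percentage reduction per operation (the dictionary simply restates the table; the names are illustrative):

```python
# Latencies from the table, in milliseconds: (multicore, hybrid cloud).
LATENCIES_MS = {
    "inserts": (20, 7),
    "deletes": (20, 7),
    "searches": (60, 55),
}


def percent_reduction(before: float, after: float) -> float:
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for op, (multicore, hybrid) in LATENCIES_MS.items():
        print(f"{op:<9} {percent_reduction(multicore, hybrid):5.1f}% lower")
```

Inserts and deletes come out at 65% lower latency, searches at about 8.3% lower.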
02 Hybrid Cloud Architecture Pain Points
• Disk space issues:
  ◦ Cleaning up deleted documents and reclaiming SSD space. http://lucene.472066.n3.nabble.com/Solr-Cloud-reclaiming-disk-space-from-deleted-documents-td4200506.html
  ◦ Multiple index.timestamp directories filling up SSD space. http://lucene.472066.n3.nabble.com/Multiple-index-timestamp-directories-using-up-disk-space-td4201098.html
  ◦ 60% free space required for optimal operation.
• Solr Overseer node and Overseer overflow issues.
• CloudSolrJ: 10 s hardcoded ZooKeeper (ZK) timeout at initialization.
• Search infrastructure tightly coupled with the mail system:
  ◦ External clients should not have to care about the underlying indexing service.
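The 60% free-space guideline means only 40% of each SSD can hold index data. A hypothetical capacity check in Python, with the required free fraction as a parameter (function names and the example disk size are illustrative, not from the slides):

```python
def max_index_size_gb(disk_gb: float, required_free_fraction: float = 0.60) -> float:
    """Largest index that still leaves the required fraction of the disk free."""
    if not 0.0 <= required_free_fraction < 1.0:
        raise ValueError("required_free_fraction must be in [0, 1)")
    return disk_gb * (1.0 - required_free_fraction)


def has_headroom(disk_gb: float, index_gb: float, required_free_fraction: float = 0.60) -> bool:
    """True if an index of index_gb leaves at least the required free space on the disk."""
    return index_gb <= max_index_size_gb(disk_gb, required_free_fraction)
```

For example, an 800 GB SSD (an illustrative size) could hold at most about 320 GB of index under this rule.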
02 Search As A Service
02 Service REST APIs
<host>:<port>/addDocument?id=<id>&applicationId=<id>&responseFormat=xml/json&document=<doc>
XML response:
<response>
  <statusCode/>
  <statusText/>
  <statusDetailText/>
</response>
JSON response:
{"response":{"statusCode":"200","statusText":"OK","statusDetailText":"Details"}}
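The slide shows the addDocument URL pattern and the two response shapes, but not the transport details. A minimal client-side sketch in Python follows, assuming a plain HTTP URL with URL-encoded query parameters; the http:// scheme, the host and port in the example, and the function names are hypothetical. It only builds the request URL and reads the three status fields from either response format; it does not send anything.

```python
import json
import xml.etree.ElementTree as ET
from typing import Tuple
from urllib.parse import urlencode

STATUS_FIELDS = ("statusCode", "statusText", "statusDetailText")


def build_add_document_url(host: str, port: int, doc_id: str, application_id: str,
                           document: str, response_format: str = "json") -> str:
    """Build an addDocument URL with the query parameters shown on the slide.

    Values are URL-encoded; the http:// scheme is an assumption.
    """
    if response_format not in ("xml", "json"):
        raise ValueError("response_format must be 'xml' or 'json'")
    query = urlencode({
        "id": doc_id,
        "applicationId": application_id,
        "responseFormat": response_format,
        "document": document,
    })
    return f"http://{host}:{port}/addDocument?{query}"


def parse_status(body: str, response_format: str = "json") -> Tuple[str, str, str]:
    """Extract (statusCode, statusText, statusDetailText) from a response body."""
    if response_format == "json":
        response = json.loads(body)["response"]
        return tuple(response.get(f, "") for f in STATUS_FIELDS)
    root = ET.fromstring(body)  # root element is <response>
    # findtext returns "" for present-but-empty elements such as <statusDetailText/>
    return tuple(root.findtext(f, default="") for f in STATUS_FIELDS)
```

For example, building a URL for document id 42, application "mail", and the text "hello world" on search.example.com:8080 yields http://search.example.com:8080/addDocument?id=42&applicationId=mail&responseFormat=json&document=hello+world, since urlencode escapes spaces as "+".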
02 Issues
• Dispersion of problems across systems
• The impact of a slow node is felt at a broader scale
Solution
• Trace a user request across multiple systems (specific exception logging).
• Incorporate Hystrix into HttpSolrServer for latency and fault tolerance.
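Hystrix is a Java library, and the slide does not show how it was wired into HttpSolrServer. In Hystrix itself the equivalent would typically be a HystrixCommand whose run() makes the Solr call and whose getFallback() supplies a degraded response. To illustrate the circuit-breaker-with-fallback pattern it provides, here is a minimal, single-threaded Python sketch; it is not Hystrix's API, and the class name and default thresholds are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is short-circuited because the breaker is open."""


class CircuitBreaker:
    """Minimal circuit breaker illustrating the pattern (hypothetical; not Hystrix's API).

    After failure_threshold consecutive failures the breaker opens and calls fail
    fast until reset_timeout_s has elapsed; the next call is then let through as a
    trial. Success closes the breaker; a failed trial re-opens it. Not thread-safe.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock                      # injectable for testing
        self.failures = 0                       # consecutive failures
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: fail fast instead of waiting on a slow or broken node.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this call through as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            if fallback is not None:
                return fallback()
            raise
        self.failures = 0
        self.opened_at = None                   # success closes the breaker
        return result
```

Failing fast while a node misbehaves is the usual way this pattern keeps one slow Solr host from tying up callers across the wider system.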
02 Future work
• SolrCloud cross-data-centre deployment.
• Upgrade to Solr 5.3.1 and remove overlapping customizations.
• Fault-tolerance tuning for different subsystems.
• Attachment analysis and indexing with Tika.