Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Search | Discover | Analyze Confidential and Proprietary © Copyright 2013 Deploying and Managing SolrCloud in the Cloud ApacheCon, April 8, 2014 Timothy Potter

description

SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will demonstrate how to provision, configure, and manage a SolrCloud cluster in Amazon EC2, using a Fabric/boto-based solution for automating SolrCloud operations. Attendees will come away with a solid understanding of how to operate a large-scale Solr cluster, as well as tools to help them do it. Tim will also demonstrate these tools live during his presentation. Covered technologies include: Apache Solr, Apache ZooKeeper, Linux, Python, Fabric, boto, Apache Kafka, Apache JMeter.

Transcript of Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Page 1: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Search | Discover | Analyze

Deploying and Managing SolrCloud in the Cloud

ApacheCon, April 8, 2014

Timothy Potter

Page 2: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

My SolrCloud Experience

• Currently working on scaling up to a 200+ node deployment at LucidWorks

• Operated 36 node cluster in AWS for Dachis Group (1.5 years ago, 18 shards ~900M docs)

• Contributed several tests and patches to the code base

• Built a Fabric/boto framework for deploying and managing a cluster in EC2

• Co-author of Solr In Action; wrote Chapter 13, which covers SolrCloud

Page 3: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Solr Scaling Toolkit

• Requirements
• High-level overview
• Nuts and Bolts (live demo)
• Roadmap
• Q&A

Page 4: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Tasks to Automate

• Provisioning N machine instances in EC2
• Configuring / starting ZooKeeper (1 to n servers)
• Configuring / starting N Solr instances in cloud mode (M x N nodes)
• Integrating with Logstash4Solr and other supporting services, e.g. collectd
• Day-to-day operations on an existing cluster

Page 5: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Python-based Tools

• boto – Python API for AWS (EC2, S3, etc.)
• Fabric – Python-based tool for automating system admin tasks over SSH
• pysolr – Python library for Solr (sending commits, queries, ...)
• kazoo – Python client tools for ZooKeeper

Supporting Cast:

• JMeter – run tests, generate reports
• collectd – system monitoring
• Logstash4Solr – log aggregation
• JConsole/VisualVM – monitor JVM during indexing / queries

Page 6: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Fabric in 3 minutes or Less ...

Fabric helps you do common system administration tasks on multiple hosts over SSH ...

• Just Python
• Easy to install / learn; good documentation
• http://docs.fabfile.org/en/1.8/

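    from fabric.contrib.console import confirm

    # _connect_ec2 and _find_instances_in_cluster are helpers not shown on the slide.
    def kill(cluster):
        # Terminate every EC2 instance tagged with the given cluster ID.
        ec2 = _connect_ec2()
        taggedInstances = _find_instances_in_cluster(ec2, cluster)
        instance_ids = taggedInstances.keys()
        # Ask for confirmation before doing anything destructive.
        if confirm('Found %d instances to terminate, continue? ' % len(instance_ids)):
            ec2.terminate_instances(instance_ids)
        ec2.close()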

Page 7: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Fabric in 3 minutes or Less, cont. ...

• Define all commands in a file named: fabfile.py
• Get a list of supported commands with short description

    $ fab -l
    Available commands:
        backup_to_s3    Backup an existing collection to S3
        check_zk        Performs health check against all ...
        commit          Sends a hard commit to the ...
        ...

• Get extended documentation for a command

    $ fab -d new_solrcloud
    Displaying detailed information for task 'new_solrcloud':
        Provisions n EC2 instances and then deploys SolrCloud; uses the new_ec2_instances and setup_solrcloud commands ...
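The short descriptions in the fab -l listing come from each task's docstring. A stdlib-only sketch of that idea follows. It is not Fabric's implementation, and the task names (ping_hosts, show_uptime) and the list_commands helper are hypothetical, made up for illustration.

```python
import inspect


def ping_hosts():
    """Checks that every host answers.

    Extended help text would go here; a detailed view could print all of it.
    """


def show_uptime():
    """Prints uptime for each host."""


def _private_helper():
    """Names with a leading underscore are treated as helpers, not commands."""


def list_commands(namespace):
    """Return (name, first docstring line) for each public function."""
    commands = []
    for name, obj in sorted(namespace.items()):
        if name.startswith('_') or not inspect.isfunction(obj):
            continue
        doc = inspect.getdoc(obj) or ''
        summary = doc.splitlines()[0] if doc else ''
        commands.append((name, summary))
    return commands


# Hypothetical stand-in for the contents of a fabfile.py module.
TASKS = {f.__name__: f for f in (ping_hosts, show_uptime, _private_helper)}

for name, summary in list_commands(TASKS):
    print('%-14s %s' % (name, summary))
```

Keeping the first docstring line short and self-contained pays off here, because it is the only part shown in the one-line listing.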

Page 8: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Solr Scale Toolkit: Architecture

[Architecture diagram]

• Solr-Scale-Toolkit: deploy and manage SolrCloud cluster
• Meta Node: runs SiLK; system monitoring of M machines w/ collectd and JMX
• SolrCloud Nodes (NxM nodes): M machines built from a custom AMI, each running Solr Node 1 (port 8983) ... Solr Node N (port 898x), with multiple cores per Solr node
• ZooKeeper Ensemble: ZooKeeper-1 on ZK Host 1 ... ZooKeeper-N on ZK Host N

Page 9: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Solr Scale Toolkit: Demo

• Launch a meta node
  – Log aggregation / basic monitoring using SiLK
• Launch ZooKeeper ensemble
  – 3 nodes to establish quorum
  – Set up cron job to clean up snapshots
• Launch SolrCloud cluster
• Create new collection and index some docs
  – Attach JConsole while indexing
• Run a healthcheck on the collection
• Check out the Banana dashboard
• Backup / Restore
  – Requires patch for SOLR-5956
  – Use fab patch_jars to update jars and do a rolling restart

Page 10: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Provisioning machines

• Custom built AMI?
• Block device mapping
  – dedicated disk per Solr node
• Launch and then poll status until they are live
  – verify SSH connectivity
• Tag each instance with a cluster ID and username

fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
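The "launch and then poll until live" step can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's code: wait_until and ssh_port_open are hypothetical helper names, the timeouts are arbitrary, and a successful TCP connect to port 22 only shows the port is reachable, not that an SSH login will succeed.

```python
import socket
import time


def wait_until(check, timeout_secs=300, poll_secs=5,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or the timeout expires.

    Returns True if check() succeeded, False if the deadline passed first.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_secs
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_secs)


def ssh_port_open(host, port=22, connect_timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


# Typical use (hostname is a placeholder):
#   ok = wait_until(lambda: ssh_port_open('ec2-host.example.com'))
```

Polling with a deadline keeps a stuck instance from blocking the whole provisioning run; the caller decides whether to retry or terminate it.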

Page 11: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

ZooKeeper

• Two options:
  – provision 1 to N nodes when you launch Solr cluster
  – use existing named ensemble
• Fabric command simply creates the myid files and zoo.cfg file for the ensemble
  – and some cron scripts for managing snapshots
• Basic health checking of ZooKeeper status:
  – echo srvr | nc localhost 2181

fab new_zk_ensemble:zk1,n=3
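A rough sketch of what creating the myid and zoo.cfg files amounts to, plus parsing of the srvr response. The function names, the data directory, and the host names are assumptions for illustration; the timing values mirror ZooKeeper's sample configuration and are not taken from the toolkit.

```python
def build_zoo_cfg(hosts, data_dir='/var/lib/zookeeper', client_port=2181):
    """Return zoo.cfg text for an ensemble; server IDs start at 1."""
    lines = [
        'tickTime=2000',
        'initLimit=10',
        'syncLimit=5',
        'dataDir=%s' % data_dir,
        'clientPort=%d' % client_port,
    ]
    for myid, host in enumerate(hosts, start=1):
        # 2888 = follower-to-leader port, 3888 = leader election port.
        lines.append('server.%d=%s:2888:3888' % (myid, host))
    return '\n'.join(lines) + '\n'


def build_myid_files(hosts):
    """Map each host to the content of its <dataDir>/myid file."""
    return {host: '%d\n' % myid for myid, host in enumerate(hosts, start=1)}


def parse_srvr(response):
    """Parse the 'key: value' lines returned by the srvr command."""
    status = {}
    for line in response.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            status[key.strip()] = value.strip()
    return status


hosts = ['zk1.example.com', 'zk2.example.com', 'zk3.example.com']
print(build_zoo_cfg(hosts))
print(parse_srvr('Mode: follower\nNode count: 4').get('Mode'))
```

The number in each myid file must match the N in that host's server.N line, which is why both are derived from the same enumeration.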

Page 12: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

SolrCloud Cluster: NxM nodes

[Diagram: M EC2 instances (RedHat Enterprise Linux, 64-bit), each running Solr 4.7.1 Node 1 ... Node N. Each Solr node uses MMapDirectory on its own dedicated disk (dedicated disk 1 ... dedicated disk N) and hosts several Solr cores, e.g. collection1 shard1 / replica1, collection2 shard2 / replica1, collection3 shard1 / replica1, collection1 shard2 / replica1. Index data is read through memory-mapped I/O backed by the OS cache.]

• Limit to 50-100M docs across all cores per node
• Must design to give bulk of the memory to OS cache

Page 13: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

SolrCloud

• Upload a BASH script that starts/stops Solr
• Set system props: jetty.port, host, zkHost, JVM opts
• One or more Solr nodes per machine
• JVM mem opts dependent on instance type and # of Solr nodes per instance
• Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration

fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
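How the JVM memory options might depend on instance size and nodes per host can be sketched as below. This is an illustration under stated assumptions, not the toolkit's sizing logic: solr_node_opts is a hypothetical helper, the 25% heap fraction is an arbitrary example chosen to leave most memory to the OS cache, and ports are assumed to count up from 8983.

```python
def solr_node_opts(host, zk_host, nodes_per_host, ram_mb,
                   heap_fraction=0.25, base_port=8983):
    """Return one list of JVM options per Solr node on a machine.

    heap_fraction is the share of RAM given to ALL Solr JVMs combined;
    the remainder is left to the OS cache for memory-mapped index files.
    """
    heap_mb = int(ram_mb * heap_fraction / nodes_per_host)
    all_opts = []
    for i in range(nodes_per_host):
        all_opts.append([
            '-Xms%dm' % heap_mb,
            '-Xmx%dm' % heap_mb,
            '-Djetty.port=%d' % (base_port + i),
            '-Dhost=%s' % host,
            '-DzkHost=%s' % zk_host,
        ])
    return all_opts


# Illustrative numbers only: 15000 MB of RAM, two Solr nodes on the host.
for opts in solr_node_opts('10.0.0.5', 'zk1:2181,zk2:2181,zk3:2181',
                           nodes_per_host=2, ram_mb=15000):
    print(' '.join(opts))
```

Dividing a fixed heap budget across nodes keeps the total JVM footprint constant when nodesPerHost changes, so the OS cache share stays predictable.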

Page 14: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

solr-ctl.sh

• BASH script that implements:
  – start/stop Solr nodes on each EC2 instance
  – sets JVM memory options, system properties (jetty.port), enable remote JMX, etc.
  – backup log files before restarting nodes
  – ensure JVM is killed correctly before restarting
• Environment variables in: solr-ctl-env.sh

Page 15: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Miscellaneous Utility Tasks

• Deploy a configuration directory to ZooKeeper
• Create a new collection
• Attach a local JConsole/VisualVM to a remote JVM
• Rolling restart (with Overseer awareness)
• Build Solr locally and patch remote
  – Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network
• Put/get files
• Grep over all log files (across the cluster)
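The slides do not spell out what "Overseer awareness" involves. One plausible reading, assumed here, is that the node currently hosting the Overseer is restarted last so the Overseer role moves at most once during the rollout. A sketch under that assumption, with hypothetical function names and caller-supplied restart / is_healthy callbacks:

```python
def rolling_restart_order(nodes, overseer):
    """Return nodes in restart order, with the Overseer's node last."""
    others = [n for n in nodes if n != overseer]
    return others + ([overseer] if overseer in nodes else [])


def rolling_restart(nodes, overseer, restart, is_healthy):
    """Restart nodes one at a time; stop at the first node that fails.

    restart(node) and is_healthy(node) are supplied by the caller, e.g.
    an SSH command and an HTTP ping. Returns the nodes restarted so far.
    """
    restarted = []
    for node in rolling_restart_order(nodes, overseer):
        restart(node)
        if not is_healthy(node):
            raise RuntimeError('%s did not come back; stopping the rollout' % node)
        restarted.append(node)
    return restarted


nodes = ['host1:8983', 'host2:8983', 'host3:8983']
print(rolling_restart_order(nodes, overseer='host2:8983'))
```

Stopping at the first unhealthy node limits the damage to one replica per shard, assuming replicas of a shard live on different nodes.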

Page 16: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Other useful stuff ...

• fab mine: See clusters I’m running (or for other users too)
• fab kill_mine: Terminate all instances I’m running
  – Use termination protection in production
• fab ssh_to: Quick way to SSH to one of the nodes in a cluster
• fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster
• fab jmeter: Execute a JMeter test plan against your cluster
  – Example test plan and Java sampler are included with the source
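Commands like fab mine rest on the cluster ID and username tags applied at provisioning time. A minimal sketch of the lookup, using plain dicts as stand-ins for EC2 instance objects; the tag keys ('cluster', 'username') and the clusters_for_user helper are assumptions, not the toolkit's actual names.

```python
from collections import defaultdict


def clusters_for_user(instances, username):
    """Group instance IDs by cluster tag for one user's instances."""
    clusters = defaultdict(list)
    for inst in instances:
        tags = inst.get('tags', {})
        if tags.get('username') == username and 'cluster' in tags:
            clusters[tags['cluster']].append(inst['id'])
    return dict(clusters)


# Made-up inventory for illustration.
instances = [
    {'id': 'i-001', 'tags': {'cluster': 'test1', 'username': 'tim'}},
    {'id': 'i-002', 'tags': {'cluster': 'test1', 'username': 'tim'}},
    {'id': 'i-003', 'tags': {'cluster': 'zk1', 'username': 'tim'}},
    {'id': 'i-004', 'tags': {'cluster': 'perf', 'username': 'other'}},
    {'id': 'i-005', 'tags': {}},
]

for cluster, ids in sorted(clusters_for_user(instances, 'tim').items()):
    print(cluster, ids)
```

Filtering on the username tag is also what makes a kill_mine-style command reasonably safe on a shared account: untagged or foreign instances never match.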

Page 17: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

SolrCloud Tools (SolrJ client app)

• Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:
  – healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper
  – backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)
• Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper

./tools.sh -tool healthcheck
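The actual tool is Java and built on SolrJ. The sketch below only illustrates, in Python, the kind of check that cluster state from ZooKeeper makes possible. It assumes a Solr 4.x-style clusterstate.json layout (shards, then replicas, each with a state and an optional leader flag stored as the string "true"); shard_health is a hypothetical name, and the sample data is made up.

```python
def shard_health(collection_state):
    """Summarize each shard: replica counts and whether an active leader exists."""
    report = {}
    for shard, info in collection_state.get('shards', {}).items():
        replicas = list(info.get('replicas', {}).values())
        active = [r for r in replicas if r.get('state') == 'active']
        has_leader = any(r.get('leader') == 'true' for r in active)
        report[shard] = {
            'total': len(replicas),
            'active': len(active),
            'healthy': has_leader,
        }
    return report


# Made-up state for one collection with two shards.
state = {
    'shards': {
        'shard1': {'replicas': {
            'core_node1': {'state': 'active', 'leader': 'true'},
            'core_node2': {'state': 'recovering'},
        }},
        'shard2': {'replicas': {
            'core_node3': {'state': 'down', 'leader': 'true'},
            'core_node4': {'state': 'active'},
        }},
    }
}

for shard, summary in sorted(shard_health(state).items()):
    print(shard, summary)
```

A shard is flagged healthy here only if its leader is active, since a shard without an active leader cannot accept updates even when other replicas are up.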

Page 18: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

SiLK Integration

• SiLK: Solr integrated with Logstash and Kibana
  – Index time-series data, such as log data (collectd, Solr logs, ...)
  – Build cool dashboards with Banana (fork of Kibana)
• Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr
• Send collectd metrics to logstash4solr

Page 19: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

SiLK Integration

[Diagram: Solr Node 1 ... Solr Node N (many of these, each on port 8983 ... with multiple cores) send log records through a Log4J AMQP appender to an MQ. logstash4solr consumes from the MQ, does the parsing/indexing, and writes to the logstash4solr index, which backs the Banana dashboard for ad hoc log analysis. The MQ decouples log write performance from log indexing.]

Log records include:
- host:port
- collection
- shard
- test label
+ standard Log4J message fields
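The toolkit does this with a Log4J AMQP appender and a message queue. The same pattern, shown here with Python's standard logging module as a stand-in: only WARN and above is enqueued, each record carries host:port, collection and shard, and the writer never waits on indexing. The in-process queue.Queue stands in for the MQ; ContextFilter and make_logger are hypothetical names.

```python
import logging
import logging.handlers
import queue


class ContextFilter(logging.Filter):
    """Attach per-node fields (host:port, collection, shard) to every record."""

    def __init__(self, host_port, collection, shard):
        super().__init__()
        self.fields = {'host_port': host_port,
                       'collection': collection,
                       'shard': shard}

    def filter(self, record):
        for key, value in self.fields.items():
            setattr(record, key, value)
        return True


def make_logger(name, mq, **fields):
    """Build a logger that enqueues WARNING-and-above records onto mq."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.handlers.QueueHandler(mq)
    handler.setLevel(logging.WARNING)      # drop INFO/DEBUG before the queue
    handler.addFilter(ContextFilter(**fields))
    logger.addHandler(handler)
    return logger


mq = queue.Queue()                         # stand-in for the message queue
log = make_logger('solr.node1', mq, host_port='10.0.0.5:8983',
                  collection='collection1', shard='shard1')
log.info('not forwarded: below WARN')
log.warning('slow query')

record = mq.get_nowait()                   # what the indexing side would consume
print(record.levelname, record.host_port, record.collection,
      record.shard, record.getMessage())
```

Because the producer only appends to a queue, a slow or unavailable indexer delays log analysis but does not slow down the node writing the log.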

Page 20: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

What’s Next?

• Migrate to using Apache libcloud instead of using boto directly
• Use this framework to perform large-scale performance testing
  – Report results back to community
• Ability to request spot instances
  – Good for testing only
• Chaos monkey tests
  – integrate jepsen?
• Open source so please kick the tires!

Page 21: Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit

Wrap-up

• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk
• LucidWorks: http://www.lucidworks.com
• SiLK: http://www.lucidworks.com/lucidworks-silk/
• Solr In Action: http://www.manning.com/grainger/
• Connect: @thelabdude / [email protected]

Questions?