How to make a simple cheap high availability self-healing solr cluster

Lucene Revolution 2013: SIMPLE & “CHEAP” SOLR CLUSTER, Stéphane Gamard, Searchbox CTO

description

Presented by Stephane Gamard, Chief Technology Officer, Searchbox. In this presentation we aim to show how to build a high-availability Solr cloud with Solr 4.1 using only Solr and a few bash scripts. The goal is to present a self-healing infrastructure that uses only cheap instances based on ephemeral storage. We start with a comprehensive overview of the relation between collections, Solr cores, shards, and cluster nodes. We continue with an introduction to Solr 4.x clustering using ZooKeeper, with particular emphasis on cluster state monitoring and Solr collection configuration. The core of the presentation is demonstrated on a live cluster. We show how to use cron and bash to monitor the state of the cluster and of its nodes. We then show how to extend this monitoring to automatically generate new nodes, attach them to the cluster, and assign them shards (choosing between missing shards and replication for HA). We show that with a high replication factor it is possible to use ephemeral storage for shards without the risk of data loss, greatly reducing the cost and management of the architecture. Future work, which might be pursued as an open source effort, includes monitoring the activity of individual nodes so as to scale the cluster according to traffic and usage.

Transcript of How to make a simple cheap high availability self-healing solr cluster

Page 1: How to make a simple cheap high availability self-healing solr cluster

Lucene Revolution 2013

SIMPLE & “CHEAP” SOLR CLUSTER

Stéphane Gamard, Searchbox CTO

Page 2: How to make a simple cheap high availability self-healing solr cluster

BOOK GIVE-AWAY
Mail to: [email protected]: [book-away]

Page 3: How to make a simple cheap high availability self-healing solr cluster

Searchbox - Search as a Service

“We are in the business of providing search engines on demand”

Page 4: How to make a simple cheap high availability self-healing solr cluster

Solr Provisioning

High Availability
• Redundancy
• Sustained QPS
• Monitoring
• Recovery

Index Provisioning
• Collection creation
• Cluster resizing
• Node distribution

Page 5: How to make a simple cheap high availability self-healing solr cluster

Solr Clustering

Before 4.x:
Master/Slave
Custom Routing
Complex Provisioning

[Diagram: load balancers (LB), masters with their slaves, backups, and monitoring]

Page 6: How to make a simple cheap high availability self-healing solr cluster

Solr Clustering

After 4.x:
Nodes
Automatic Routing
Simple Provisioning

[Diagram: nodes, ZooKeeper (ZK) instances, load balancers (LB), and monitoring]

Thank you to the SolrCloud Team !!!

Page 7: How to make a simple cheap high availability self-healing solr cluster

What is SolrCloud?

Backward compatibility
• Plain old Solr (with Lucene 4.x)
• Same schema
• Same solrconfig
• Same plugins

Some plugins might need update (distrib)

Page 8: How to make a simple cheap high availability self-healing solr cluster

What is SolrCloud?

Centralized configuration
• /conf
• /conf/schema.xml
• /conf/solrconfig.xml
• numShards
• replicationFactor
• ...

[Diagram: nodes, ZooKeeper (ZK) instances, load balancers (LB), and monitoring]

Page 9: How to make a simple cheap high availability self-healing solr cluster

What is SolrCloud?

Configuration & Architecture Agnostic Nodes
• ZK driven configuration
• Shard (1 core)
• ZK driven role:
  • Leader
  • Replica
• Peer & Replication
• Disposable

[Diagram: nodes, ZooKeeper (ZK) instances, load balancers (LB), and monitoring]

Page 10: How to make a simple cheap high availability self-healing solr cluster

What is SolrCloud?

Automatic Routing
• Smart clients connect to ZK
• Any node can forward a request to a node that can process it

[Diagram: nodes, ZooKeeper (ZK) instances, load balancers (LB), and monitoring]

Page 11: How to make a simple cheap high availability self-healing solr cluster

What is SolrCloud?

Collection API
• Abstraction level
• An index is a collection
• A collection is a set of shards
• A shard is a set of cores
• CRUD API for collection

“Collections represent a set of cores with identical configuration. The set of cores of a collection covers the entire index”
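
The CRUD API for collections is exposed over HTTP by Solr's Collections API. As a minimal sketch, the helper below only builds request URLs and sends nothing. The host `localhost:8983`, the collection name `pubmed`, and the config name `pubmed_conf` are hypothetical placeholders, and the parameter names follow the Solr 4.x Collections API as commonly documented rather than anything shown on the slide.

```python
from urllib.parse import urlencode

# Hypothetical base URL of any node in the cluster (assumption, not from the slides).
SOLR_BASE = "http://localhost:8983/solr"


def collection_url(action, name, params=None):
    """Build a Collections API URL for an action such as CREATE, RELOAD or DELETE."""
    query = {"action": action, "name": name}
    query.update(params or {})
    return SOLR_BASE + "/admin/collections?" + urlencode(query)


# Create a collection with 3 shards and 2 copies of each shard.
create = collection_url("CREATE", "pubmed", {
    "numShards": 3,
    "replicationFactor": 2,
    "collection.configName": "pubmed_conf",  # hypothetical config set name in ZK
})
reload_ = collection_url("RELOAD", "pubmed")
delete = collection_url("DELETE", "pubmed")

print(create)
```

Any node can receive these requests, which is what makes the API convenient to drive from a small script.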

Page 12: How to make a simple cheap high availability self-healing solr cluster

What is SolrCloud?

Collection: Abstraction level of interaction & config
Shard: Scaling factor for collection size (numShards)
Core: Scaling factor for QPS (replicationFactor)
Node: Scaling factor for cluster size (liveNodes)

=> SolrCloud is highly geared toward horizontal scaling

Page 13: How to make a simple cheap high availability self-healing solr cluster

That’s SolrCloud

nodes => Single effort for scalability

High Availability
• Redundancy
• Sustained QPS
• Monitoring
• Recovery

# replicas
ZK (clusterstatus, livenodes)
peer & replication
# replicas & # shards

Page 14: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Design

[Diagram: Collection, Shards, Cores, Nodes]

Key metrics
• Collection size & complexity
• JVM requirement
• Node requirement

Page 15: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Collection Metrics

Pubmed Index
• ~12M documents
• 7 indexed fields
• 2 TF fields
• 3 sorted Fields
• 5 stored Fields

Page 16: How to make a simple cheap high availability self-healing solr cluster

A note on sharding
“The magic sauce of webscale”

Ram requirement effect

[Chart: RAM (y-axis) against # shards (x-axis)]

Page 17: How to make a simple cheap high availability self-healing solr cluster

A note on sharding
“The magic sauce of webscale”

Disk requirement effect

[Chart: disk space (y-axis) against # shards (x-axis)]

“hidden quote for the book”

Page 18: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Collection Configuration

Pubmed Index
• ~12M documents
• 7 indexed fields
• 2 TF fields
• 3 sorted Fields
• 5 stored Fields

Configuration
• numShards: 3
• replicationFactor: 2
• JVM ram: ~3G
• Disk: ~15G
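
The configuration values translate into a core count with simple arithmetic: one core per copy of each shard. The sketch below assumes that reading, and also assumes the ~3G JVM figure applies per core; the function names and the 4-node cluster are illustrative, not from the slides.

```python
import math


def cores_needed(num_shards, replication_factor):
    """Total cores in a collection: one core per copy of each shard."""
    return num_shards * replication_factor


def cores_per_node(num_shards, replication_factor, live_nodes):
    """Most cores any node hosts if cores are spread as evenly as possible."""
    return math.ceil(cores_needed(num_shards, replication_factor) / live_nodes)


total = cores_needed(3, 2)   # numShards and replicationFactor from the slide
jvm_ram_gb = total * 3       # assumption: ~3G of JVM RAM per core
print(total, jvm_ram_gb, cores_per_node(3, 2, live_nodes=4))
```

With 3 shards and a replication factor of 2 this gives 6 cores, so a 4-node cluster would have to place two cores on at least some nodes.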

Page 19: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Core Sizing

Heuristically inferred from “experience”
• Size on shard, not collection
• Do NOT starve resources on nodes
• Settle for JVM/Disk sizing
• Large amount of spare disk (optimize)

RAM: 3 G
Disk: 60 G

Page 20: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Cluster Availability

Depends on the nodes!!!

Instance      RAM (GB)  Disk (GB)  $/h   Nodes  Min  Size  $/core/m
m1.medium     3.75      410        0.12  1      6    6     87
m1.large      7.5       850        0.24  2      6    12    87
m1.xlarge     15        1690       0.48  5      6    30    70
m2.xlarge     17.1      420        0.41  5      6    30    60
m2.2xlarge    34.2      850        0.82  11     6    66    54
m1.medium     3.75      410        0.12  3      6    18    28
CCtrl (paas)  1.02      420        -     1      6    6     75
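
The slides do not state how the Nodes and $/core/m columns were derived. One reading that roughly reproduces most rows is: cores per instance are limited by the 3 G RAM and 60 G disk core sizing, and the monthly cost per core is the hourly price times the hours in a month divided by that core count. The sketch below works under those assumptions (including 730 hours per month). It does not explain the second m1.medium row, which lists 3 nodes.

```python
HOURS_PER_MONTH = 730   # assumption: average month length in hours
RAM_PER_CORE_GB = 3     # from the core sizing slide
DISK_PER_CORE_GB = 60   # from the core sizing slide


def cores_per_instance(ram_gb, disk_gb):
    """How many cores fit on one instance, limited by RAM or by disk."""
    return int(min(ram_gb // RAM_PER_CORE_GB, disk_gb // DISK_PER_CORE_GB))


def cost_per_core_month(price_per_hour, cores):
    """Approximate monthly cost of one core on a given instance type."""
    return price_per_hour * HOURS_PER_MONTH / cores


# (RAM in GB, disk in GB, $/h) copied from the instance table
instances = {
    "m1.medium": (3.75, 410, 0.12),
    "m1.large": (7.5, 850, 0.24),
    "m1.xlarge": (15, 1690, 0.48),
    "m2.xlarge": (17.1, 420, 0.41),
    "m2.2xlarge": (34.2, 850, 0.82),
}

for name, (ram, disk, price) in instances.items():
    cores = cores_per_instance(ram, disk)
    print(name, cores, round(cost_per_core_month(price, cores)))
```

Under this reading RAM is always the limiting resource, which is consistent with the advice to keep a large amount of spare disk.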

Page 21: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Monitoring

Solr Monitoring
• clusterstate.json
• /live_nodes

Node Monitoring *
• load average
• core-to-resource consumption (Core to CPU)
• collection-to-node consumption (LB logs)
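
A monitoring check over the cluster state and the live node list can be sketched as a pure function. The sample document below is an assumption modeled on the Solr 4.x `clusterstate.json` layout (collection, then shards, then replicas with `state` and `node_name`); the collection name and addresses are made up, and reading the data from ZooKeeper is left out.

```python
import json

# Assumed sample of clusterstate.json, modeled on the Solr 4.x layout.
SAMPLE_STATE = json.loads("""
{
  "pubmed": {
    "shards": {
      "shard1": {"replicas": {
        "core_node1": {"state": "active", "node_name": "10.0.0.1:8983_solr"},
        "core_node2": {"state": "down",   "node_name": "10.0.0.2:8983_solr"}
      }},
      "shard2": {"replicas": {
        "core_node3": {"state": "active", "node_name": "10.0.0.3:8983_solr"}
      }}
    }
  }
}
""")


def active_replicas(cluster_state, live_nodes):
    """Count, per (collection, shard), replicas that are active on a live node."""
    live = set(live_nodes)
    counts = {}
    for collection, info in cluster_state.items():
        for shard, shard_info in info.get("shards", {}).items():
            replicas = shard_info.get("replicas", {}).values()
            counts[(collection, shard)] = sum(
                1 for r in replicas
                if r.get("state") == "active" and r.get("node_name") in live
            )
    return counts


# Hypothetical live node list: 10.0.0.3 has dropped out of the cluster.
live = ["10.0.0.1:8983_solr", "10.0.0.2:8983_solr"]
print(active_replicas(SAMPLE_STATE, live))
```

A shard whose count reaches zero is unavailable, and a count below the replication factor signals reduced redundancy. Cross-checking against the live node list matters because a replica can still be recorded as active after its node has disappeared.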

Page 22: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Provisioning

Stand-by nodes
• Automatically assigned as replica
• Provides a metric of HA

Node addition * (self healing)
• Scheduled check on cluster congestion
• Automatically spawn new nodes per need
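
The decision a self-healing script has to make when a fresh node joins can be sketched as follows: serve a missing shard first, otherwise add a replica to the shard furthest below the target replication factor. This is an illustrative policy with hypothetical names, not the speaker's actual script; spawning the instance and creating the core on it are left out.

```python
def pick_shard(active_counts, replication_factor):
    """Choose the shard a fresh node should host, or None if all are healthy.

    Shards with no active replica (missing shards) come first, then the
    shard furthest below the target replication factor.
    """
    needy = {s: n for s, n in active_counts.items() if n < replication_factor}
    if not needy:
        return None
    # Fewest active replicas wins; ties are broken by shard name.
    return min(needy, key=lambda s: (needy[s], s))


# Hypothetical counts of active replicas per shard.
counts = {"shard1": 1, "shard2": 0, "shard3": 2}
print(pick_shard(counts, replication_factor=2))
```

Run from a scheduled job, a check like this returns `None` while the cluster is healthy, so stand-by nodes stay idle until a shard actually needs them.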

Page 23: How to make a simple cheap high availability self-healing solr cluster

SolrCloud - Conclusion

Using SolrCloud is like juggling
• Gets better with practice
• There is always some magic left
• Could become very overwhelming
• When it fails you lose your balls

Test -> Test -> Test -> some more Tests -> Test

Page 24: How to make a simple cheap high availability self-healing solr cluster

Next Steps

What would make our current SolrCloud cluster even more awesome:
• Balance/distribute cores based on machine load
• Standby cores (replicas not serving requests and auto-shutting down)

Page 25: How to make a simple cheap high availability self-healing solr cluster

Further Information

Requirement for SolrCloud:
• Solr Mailing list: solr-[email protected]

Further information
• blogs & feed: http://www.searchbox.com/blog/
• Searchbox email: [email protected]

Page 26: How to make a simple cheap high availability self-healing solr cluster

CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets you in the door

TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30

CONTACT
Stephane [email protected]