Conf2014_SearchHeadClustering

Copyright © 2014 Splunk Inc. Mustafa Ahamed, Director, Product Management; Eric Woo, Senior Software Engineer; Anirban Rahut, Senior Software Engineer. What’s New In Search Head Clustering

Transcript of Conf2014_SearchHeadClustering

Page 1: Conf2014_SearchHeadClustering

Copyright © 2014 Splunk Inc.

Mustafa Ahamed, Director, Product Management

Eric Woo, Senior Software Engineer

Anirban Rahut, Senior Software Engineer

What’s New In Search Head Clustering

Page 2: Conf2014_SearchHeadClustering

Disclaimer  

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Page 3: Conf2014_SearchHeadClustering

Agenda  

•  What is Search Head Clustering?
•  Business Benefits of Search Head Clustering
•  SHC Configuration / Replication
•  App Deployment
•  Tips and Tricks
–  Migration
•  Q&A

Page 4: Conf2014_SearchHeadClustering

Search Head Clustering

Ability to group search heads into a cluster in order to provide highly available and scalable search services

MISSION CRITICAL ENTERPRISE

Page 5: Conf2014_SearchHeadClustering

Business Benefits of SHC

•  Horizontal Scaling
•  Always-on Search Services
•  Consistent User Experience
•  Easy to add / manage premium content (apps)

Page 6: Conf2014_SearchHeadClustering

SHP vs. SHC

SHP (Search Head Pooling)
•  Available since v4.2
•  Sharing configurations through NFS
•  Single point of failure
•  Performance issues

SHC (Search Head Clustering)
•  No NFS
•  Replication using local storage
•  Commodity hardware

Page 7: Conf2014_SearchHeadClustering

   

Design Goals
1.  No Single Point of Failure
2.  “One Configuration” across SH
3.  Horizontal Scaling

Implementation
1.  Dynamic Captain
2.  Automatic Config replication across SH
3.  Ability to add/remove nodes on running cluster

Page 8: Conf2014_SearchHeadClustering

SHC – How Does it Work?

Search Head gets the peer list from Cluster Master

1.  Group search heads into a cluster
2.  A captain gets elected dynamically
3.  User-created reports/dashboards are automatically replicated to other search heads

Page 9: Conf2014_SearchHeadClustering

Anatomy & Cluster Bring-Up

Page 10: Conf2014_SearchHeadClustering

Search Head Cluster Bring-Up

•  Bootstrap captain
•  Bring up members
•  Captain establishes authority
•  Members join/register
•  CLI-based cluster scale/shrink

[Diagram labels: captain, config-log {s1, s2, ..., sn}, members]

Page 11: Conf2014_SearchHeadClustering

Job Scheduling

Page 12: Conf2014_SearchHeadClustering

Use Case

•  Scale search capacity
•  Enable more reports, dashboards, alerts
•  Load balance user sessions (onboarding)

Page 13: Conf2014_SearchHeadClustering

Job Scheduling Orchestration

•  Captain is job scheduler
•  Eliminates job-server need
•  Load-based heuristic

[Diagram labels: captain, scheduler, load balancer, search 1, search 2, search 3, LOAD, SUCC, FAIL]
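The load-based heuristic can be pictured as the captain handing each scheduled search to the least-loaded member. The slides do not say how load is measured, so this is only a minimal sketch: the `Member` record, the `dispatch` helper, and the "fewest running searches wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical view the captain keeps of one cluster member."""
    name: str
    running: list = field(default_factory=list)  # searches currently running

    @property
    def load(self) -> int:
        # Assumption: load is simply the number of running searches.
        return len(self.running)


def dispatch(search: str, members: list) -> Member:
    """Assign a scheduled search to the least-loaded member (ties go to the first listed)."""
    target = min(members, key=lambda m: m.load)
    target.running.append(search)
    return target


members = [Member("sh1"), Member("sh2"), Member("sh3")]
assignments = {
    s: dispatch(s, members).name
    for s in ["search 1", "search 2", "search 3", "search 4"]
}
```

With three idle members, the first three searches spread across the cluster and the fourth lands back on the first member.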

Page 14: Conf2014_SearchHeadClustering

Details  

•  Captain updates RA/DM summaries on indexers
•  Scheduler limits honored across the cluster
•  Real-time scheduled searches run one instance across cluster
•  Auto-failover: new captain becomes scheduler
•  captain_is_adhoc_searchhead knob to reduce captain load

Page 15: Conf2014_SearchHeadClustering

Alerts & Suppression

•  Alerts fired when results of search meet alerting criteria
–  Historical searches: within 10 seconds after job completes
–  Real-time searches: ongoing basis
•  Captain merges and maintains global view of alerts
•  Suppression information centralized by the captain
•  Merged alerts/suppression sent back to members
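Centralizing suppression on the captain means an alert that already fired via one member is not fired again by another during the suppression window. A rough sketch of that idea follows; the `CaptainAlertView` class, its fields, and the time-window rule are hypothetical and not taken from the product.

```python
class CaptainAlertView:
    """Hypothetical global alert/suppression state held by the captain."""

    def __init__(self):
        self.suppressed_until = {}  # alert name -> time when suppression ends
        self.fired = []             # global list of (time, member, alert)

    def report(self, member, alert, now, suppress_for):
        """A member reports that `alert` met its criteria at time `now`.

        Returns True if the alert should fire, False if it is suppressed.
        """
        if now < self.suppressed_until.get(alert, float("-inf")):
            return False
        self.fired.append((now, member, alert))
        self.suppressed_until[alert] = now + suppress_for
        return True

    def snapshot(self):
        """Merged state that would be sent back to members."""
        return {
            "fired": list(self.fired),
            "suppressed_until": dict(self.suppressed_until),
        }


captain = CaptainAlertView()
first = captain.report("sh1", "disk_full", now=100, suppress_for=60)
second = captain.report("sh2", "disk_full", now=130, suppress_for=60)  # inside window
third = captain.report("sh3", "disk_full", now=160, suppress_for=60)   # window over
```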

Page 16: Conf2014_SearchHeadClustering

High Availability of Search Results

Page 17: Conf2014_SearchHeadClustering

Search Results primer

[Diagram: indexers (map) stream results to the search head (reduce)]

Example search:
sourcetype = access_combined | stats count by clientip

Other names
1.  search results
2.  search artifact
3.  dispatch directory
4.  SID

Example dispatch directory:
$SPLUNK_HOME/var/run/splunk/dispatch/scheduler__admin__search__mysearch_at_1410708600_345

Dispatch dir needs to be replicated to multiple nodes to tolerate node failures

Page 18: Conf2014_SearchHeadClustering

Artifact Replication

•  Captain orchestrates replication
•  Default replication_factor=3
•  Success/failure ACK'd to captain
•  Asynchronous replication on proxy
•  Replication policy enforced by fixups

[Diagram labels: captain, orig, replica-1, replica-2, replicate, succ]
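One way to read "replication policy enforced by fixups": when a member fails, the captain compares each artifact's surviving copies with the replication factor and schedules extra copies. The sketch below assumes that reading; `plan_fixups` and its data shapes are hypothetical.

```python
def plan_fixups(artifacts, live_members, replication_factor=3):
    """Return {sid: [members to copy to]} so each artifact regains its target copy count.

    artifacts:    {sid: set of member names holding a copy}
    live_members: set of member names currently up
    """
    plan = {}
    for sid, holders in artifacts.items():
        alive = holders & live_members
        if not alive:
            continue  # no surviving copy to replicate from
        target = min(replication_factor, len(live_members))
        missing = target - len(alive)
        if missing > 0:
            candidates = sorted(live_members - alive)
            plan[sid] = candidates[:missing]
    return plan


artifacts = {
    "sid_a": {"sh1", "sh2", "sh3"},  # fully replicated
    "sid_b": {"sh1", "sh4"},         # loses a copy when sh4 fails
    "sid_c": {"sh4"},                # only copy was on sh4
}
live = {"sh1", "sh2", "sh3", "sh5"}  # sh4 has failed
fixups = plan_fixups(artifacts, live)
```

Here only `sid_b` needs work: it has one surviving copy, so two more are scheduled.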

Page 19: Conf2014_SearchHeadClustering

Artifact Proxying

•  Replication guarantees HA & DR, but...
•  SID not available on all nodes *locally*
•  Real-time searches are not replicated
•  We use proxying to fill these gaps
•  Proxying on REST request

Authentication is cluster aware!

[Diagram labels: captain, location log (over HB), orig, r1, r2, proxy, async replicate. HB = Heartbeat]
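Proxying can be sketched as a lookup: a member that does not hold the requested SID locally consults the location information it received from the captain and forwards the request to a member that does. The `Node` class, `fetch_results` function, and the shape of the location map are assumptions for illustration only.

```python
class Node:
    """Hypothetical cluster member holding some artifacts on local disk."""

    def __init__(self, name):
        self.name = name
        self.local = {}  # sid -> results


def fetch_results(sid, receiving, nodes, locations):
    """Serve a request for `sid` that arrived at `receiving`.

    locations: captain-maintained map {sid: [names of nodes holding a copy]}.
    Returns (results, name of the node that actually served them).
    """
    if sid in receiving.local:
        return receiving.local[sid], receiving.name
    for holder in locations.get(sid, []):
        if sid in nodes[holder].local:
            return nodes[holder].local[sid], holder  # proxied to the holder
    raise KeyError(sid)


nodes = {name: Node(name) for name in ("sh1", "sh2", "sh3")}
nodes["sh1"].local["sid_1"] = ["row"]
locations = {"sid_1": ["sh1"]}

local_hit = fetch_results("sid_1", nodes["sh1"], nodes, locations)
proxied = fetch_results("sid_1", nodes["sh3"], nodes, locations)
```

The user hitting `sh3` gets the same results as the user hitting `sh1`, which is what gives the consistent view.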

Page 20: Conf2014_SearchHeadClustering

Adhoc Search Management

•  Adhoc search: interactive search run from a user session
•  Adhoc searches are not replicated
•  Captain, however, will have global knowledge of all searches
•  GET services/search/jobs will return the global list of searches
•  You can proxy and access adhoc searches from any node

Page 21: Conf2014_SearchHeadClustering

Reaping of Search Artifacts

•  Reaping: deletion of search results when TTL (time to live) expires
•  Search artifacts reaped from the origin node
•  Captain orchestrates reaps of the replicas
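Reaping reduces to a TTL check on the origin node, followed by the captain telling replica holders to delete the same SIDs. A small sketch under those assumptions; `reap`, `replica_reap_orders`, and the tuple layout are hypothetical names chosen for this example.

```python
def reap(artifacts, now):
    """Split artifacts into kept and reaped based on TTL.

    artifacts: {sid: (created_at, ttl_seconds)}
    Returns (kept dict, sorted list of reaped SIDs).
    """
    kept, reaped = {}, []
    for sid, (created_at, ttl) in artifacts.items():
        if now >= created_at + ttl:
            reaped.append(sid)
        else:
            kept[sid] = (created_at, ttl)
    return kept, sorted(reaped)


def replica_reap_orders(reaped, locations, origin):
    """For each reaped SID, list the replica holders the captain would tell to delete it."""
    return {
        sid: [n for n in locations.get(sid, []) if n != origin]
        for sid in reaped
    }


artifacts = {"sid_a": (1000, 600), "sid_b": (1000, 86400)}
kept, reaped = reap(artifacts, now=1700)
orders = replica_reap_orders(reaped, {"sid_a": ["sh1", "sh2", "sh3"]}, origin="sh1")
```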

Page 22: Conf2014_SearchHeadClustering

Auto Failover

Page 23: Conf2014_SearchHeadClustering

HA & Auto-Failover

Design Goals
1.  No Single Point of Failure
2.  Continuous Uptime
3.  Consistent User Experience

Implementation
1.  Dynamic Captain election
2.  Auto Failover
3.  Proxying for consistent view

Page 24: Conf2014_SearchHeadClustering

Dynamic Captain

•  Raft Consensus Protocol from Stanford
–  Diego Ongaro & John Ousterhout
–  Acknowledge Diego Ongaro for help!
•  SHC uses Raft for LE and Auto Failover

RV = Request Vote
LE = Leader Election
SHC = Search Head Clustering

[Diagram: members S1 to S5; captain and new captain]
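The core of Raft leader election is that each server grants at most one vote per term and a candidate needs votes from a majority of the whole cluster. The sketch below shows only that rule; it omits Raft's log comparison, timeouts, and heartbeats, and the `Server` class and `run_election` helper are illustrative names, not Splunk code.

```python
class Server:
    """Simplified Raft participant: tracks only the current term and its vote."""

    def __init__(self, name):
        self.name = name
        self.term = 0
        self.voted_for = None

    def handle_request_vote(self, term, candidate):
        """Grant at most one vote per term (log up-to-date check omitted)."""
        if term < self.term:
            return False
        if term > self.term:
            self.term, self.voted_for = term, None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(candidate, reachable, cluster_size):
    """Candidate starts a new term and wins with votes from a majority of the cluster."""
    candidate.term += 1
    candidate.voted_for = candidate.name
    votes = 1  # the candidate votes for itself
    for peer in reachable:
        if peer.handle_request_vote(candidate.term, candidate.name):
            votes += 1
    return votes > cluster_size // 2


servers = {n: Server(n) for n in ("S1", "S2", "S3", "S4", "S5")}

# S1 (the old captain) is down; S3 can reach S2, S4 and S5.
won = run_election(servers["S3"], [servers["S2"], servers["S4"], servers["S5"]], 5)

# A candidate that can reach only one peer cannot form a majority of five.
minority = run_election(servers["S5"], [servers["S4"]], 5)
```

The majority requirement is why a partitioned minority of search heads cannot elect its own captain.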

Page 25: Conf2014_SearchHeadClustering

Auto-Failover

[Diagram labels: old captain, new captain, members, scheduler, fixups; state shown: artifacts, running jobs, alerts, search load, etc.]

Page 26: Conf2014_SearchHeadClustering

Configuration Management

Page 27: Conf2014_SearchHeadClustering

Configuration Files

•  Custom user content
–  Reports
–  Dashboards
•  Search-time knowledge
–  Field extractions
–  Event types
–  Macros
•  System configurations
–  Inputs, forwarding, authentication

Page 28: Conf2014_SearchHeadClustering

Goal  

•  Consistent user experience across all search heads
•  Changes made on one member are reflected on all members

Page 29: Conf2014_SearchHeadClustering

Configuration Changes

•  Users customize search and UI configurations via UI/CLI/REST
–  Save report
–  Add panel to dashboards
–  Create field extraction
•  Administrators modify system configurations
–  Configure forwarding
–  Deploy centralized authentication (e.g. LDAP)
–  Install entirely new app or hand-edited configuration

Page 30: Conf2014_SearchHeadClustering

Search and UI Configurations

•  Changes to search and UI configurations are replicated across the search head cluster automatically
•  Goal: eventual consistency
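Eventual consistency here means members may briefly disagree after a change, but converge once every member has applied the same changes. The slides do not describe the replication mechanism, so the following is purely a toy model: an ordered change log on the captain that members pull from. The `Captain` and `MemberConfig` classes are hypothetical.

```python
class Captain:
    """Toy captain holding an ordered log of configuration changes."""

    def __init__(self):
        self.log = []  # ordered list of (conf_key, value)

    def publish(self, key, value):
        self.log.append((key, value))

    def changes_since(self, index):
        return self.log[index:]


class MemberConfig:
    """Toy member that applies captain changes in log order."""

    def __init__(self):
        self.conf = {}
        self.applied = 0  # number of log entries applied so far

    def sync(self, captain):
        for key, value in captain.changes_since(self.applied):
            self.conf[key] = value
            self.applied += 1


captain = Captain()
a, b = MemberConfig(), MemberConfig()

captain.publish("my_dashboard.xml", "v1")
a.sync(captain)
captain.publish("my_dashboard.xml", "v2")

diverged = a.conf != b.conf  # a has v1, b has nothing yet

a.sync(captain)
b.sync(captain)
converged = a.conf == b.conf == {"my_dashboard.xml": "v2"}
```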

Page 31: Conf2014_SearchHeadClustering

Configuration Replication

[Diagram labels: my_dashboard.xml, C]

Page 32: Conf2014_SearchHeadClustering

Concurrent Changes

[Diagram label: C]

Page 33: Conf2014_SearchHeadClustering

Custom  App  Content  

•  App devs may "opt in" their custom configurations for replication under search head clustering
•  Example server.conf from an app would look like:

[shclustering]
conf_replication_include.my_custom_file = true
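To check which custom files an app has opted in, the stanza above can be read with a generic INI parser. This is a sketch only: Python's `configparser` is not Splunk's own .conf parser and may not handle every Splunk .conf construct, and `replicated_conf_files` is a hypothetical helper.

```python
import configparser

# The example stanza from the slide, as it would appear in an app's server.conf.
SERVER_CONF = """
[shclustering]
conf_replication_include.my_custom_file = true
"""

PREFIX = "conf_replication_include."


def replicated_conf_files(text):
    """Return the conf file names opted in for replication in the given text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("shclustering"):
        return []
    section = parser["shclustering"]
    return sorted(
        key[len(PREFIX):]
        for key in section
        if key.startswith(PREFIX) and section.getboolean(key)
    )


opted_in = replicated_conf_files(SERVER_CONF)
```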

Page 34: Conf2014_SearchHeadClustering

System Configurations

•  Recall: only changes to search and UI configurations are replicated across the search head cluster automatically
•  Changes to system configurations are not replicated automatically because of their high potential impact
•  How are system configurations kept consistent then?

Page 35: Conf2014_SearchHeadClustering

Configuration Deployment

•  Deployer: a single, well-controlled instance outside of the cluster
•  Configurations should be tested on dev/QA instances prior to deploy

[Diagram label: D]

Page 36: Conf2014_SearchHeadClustering

UI  


Page 37: Conf2014_SearchHeadClustering

Tips & Tricks

Page 38: Conf2014_SearchHeadClustering

Best Practices

•  Deployer instance
–  Can piggyback on Cluster Master or Deployment Server
–  Recommendation is to run the Deployer on a separate instance
•  Run CLI to get status about SHC
–  ./splunk show shcluster-status

Page 39: Conf2014_SearchHeadClustering

Summary  

Page 40: Conf2014_SearchHeadClustering

Key Benefits of SHC

•  Horizontal Scaling
•  Always-on Search Services
•  Consistent User Experience
•  Easy to add / manage premium content (apps)

Page 41: Conf2014_SearchHeadClustering

Q & A

Page 42: Conf2014_SearchHeadClustering

THANK YOU