SF Solr Meetup - Interactively Search and Visualize Your Big Data

Posted on 09-Jan-2017


INTERACTIVELY SEARCH AND VISUALIZE YOUR DATA WITH SOLR AND SPARK

Romain Rigaux

GOALS

Build a Web app

Quickly explore data

… with Solr

Make Solr / Hadoop easier to use


ARCHITECTURE

“Just a view” on top of the standard Solr API

REST

HISTORY V1 USER

HISTORY V1 ADMIN

ARCHITECTURE NEXT!

A lot of learning; a UX boost was needed

Keep it simple: users don't need to know it is Solr

HISTORY V2 USER

HISTORY V2 ADMIN

HISTORY V2 BETTER UX

ARCHITECTURE

REST (Solr): /select, /admin/collections, /get, /luke...

AJAX (app): /add_widget, /zoom_in, /select_facet, /select_range...

Browser: templates + JS model

ARCHITECTURE UI FOR FACETS

Query: search terms, selected facets (q, fqs)

Collection: dashboard, fields, template, widgets (ids)

Layout: all the 2D positioning (cell ids), visual, drag & drop
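The three parts of the dashboard model can be pictured as plain data objects. This is only a sketch under assumptions: the class names, attribute names and the `orphan_cells` helper are hypothetical and do not come from the application's code; they just mirror the Query / Collection / Layout split named on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    # Search terms and the selected facets, as sent to Solr (q, fqs).
    q: str = "*:*"
    fqs: list = field(default_factory=list)


@dataclass
class Collection:
    # Dashboard definition: fields, plus widgets keyed by widget id.
    name: str
    fields: list
    widgets: dict


@dataclass
class Layout:
    # 2D positioning: cell id -> widget id shown in that cell.
    cells: dict


def orphan_cells(layout, collection):
    """Return the cell ids that point at a widget id missing from the collection."""
    return sorted(cell for cell, wid in layout.cells.items()
                  if wid not in collection.widgets)


collection = Collection("logs", ["bytes", "os"], {"w1": {"field": "bytes"}})
layout = Layout({"cell-1": "w1", "cell-2": "w2"})
```

Keeping the layout separate from the collection means a drag & drop only rewrites cell ids, while the query and the widget definitions stay untouched.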

ADDING A WIDGET LIFECYCLE

Load the initial page

Edit mode and drag & drop

/solr/zookeeper/clusterstate.json, /solr/admin/luke…

/get_collection

ADDING A WIDGET LIFECYCLE

/solr/select?stats=true, /new_facet

Select the field

Guess ranges (numbers or dates)

Rounding (numbers or dates)
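One way to guess and round a numeric range from the min and max returned by a stats query is sketched below. The heuristic (about ten buckets, gap rounded up to one significant digit) and the name `guess_range` are assumptions for illustration, not the application's actual logic; dates would need the same idea applied to timestamps.

```python
import math


def guess_range(min_value, max_value, buckets=10):
    """Return (start, end, gap) with a rounded gap covering [min_value, max_value]."""
    span = max_value - min_value
    if span <= 0:
        # Degenerate stats (a single value): fall back to one unit-wide bucket.
        return min_value, min_value + 1, 1
    raw_gap = span / buckets
    # Round the gap up to one significant digit, e.g. 876542 -> 900000.
    magnitude = 10 ** math.floor(math.log10(raw_gap))
    gap = math.ceil(raw_gap / magnitude) * magnitude
    # Align both ends of the range on multiples of the gap.
    start = math.floor(min_value / gap) * gap
    end = math.ceil(max_value / gap) * gap
    return start, end, gap


start, end, gap = guess_range(12, 8765432)
```

With a hypothetical minimum of 12 and maximum of 8765432 bytes, this yields the start, end and gap values used in the range facet query of the next slide.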

ADDING A WIDGET LIFECYCLE

Query part 1

Query part 2

Augment Solr response

facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10

q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
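The two query parts rely on Solr's filter tagging and exclusion: the selected range is sent as a tagged `fq`, and the facet excludes that tag so the other buckets keep their counts. A minimal sketch of assembling those parameters follows; the helper name `range_facet_params` is hypothetical, only the Solr parameter names come from the slide.

```python
from urllib.parse import urlencode


def range_facet_params(q, field, start, end, gap, selected=None):
    """Build Solr params for a range facet whose own filter is excluded from its counts."""
    prefix = "f.%s.facet." % field
    params = [
        ("q", q),
        ("facet", "true"),
        # {!ex=...} ignores the filter tagged with the same name when counting.
        ("facet.range", "{!ex=%s}%s" % (field, field)),
        (prefix + "range.start", start),
        (prefix + "range.end", end),
        (prefix + "range.gap", gap),
        (prefix + "mincount", 0),
        (prefix + "limit", 10),
    ]
    if selected is not None:
        low, high = selected
        # {!tag=...} names the filter so the facet above can exclude it.
        params.append(("fq", "{!tag=%s}%s:[%s TO %s]" % (field, field, low, high)))
    return params


params = range_facet_params("Chrome", "bytes", 0, 9000000, 900000,
                            selected=(900000, 1800000))
query_string = urlencode(params)
```

`urlencode` percent-encodes the braces and brackets, so the resulting string is the URL-safe form of the parameters shown on the slide.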

{
  'facet_counts': {
    'facet_ranges': {
      'bytes': {
        'start': 10000,
        'counts': ['900000', 3423, '1800000', 339, ...]
      }
    }
  }
}

{
  ...,
  'normalized_facets': [
    {
      'extraSeries': [],
      'label': 'bytes',
      'field': 'bytes',
      'counts': [
        {'from': '900000', 'to': '1800000', 'selected': True, 'value': 3423, 'field': 'bytes', 'exclude': False}
      ],
      ...
    }
  ]
}
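The "augment" step can be sketched as a small function that turns Solr's flat `[start, count, start, count, ...]` list into the bucket dictionaries shown above. The function name, its arguments and the way `selected` is computed are assumptions made for illustration; only the output keys are taken from the slide.

```python
def normalize_range_facet(field, counts, gap, selected=()):
    """Turn Solr's flat [start, count, start, count, ...] list into bucket dicts."""
    selected = set(selected)
    buckets = []
    for i in range(0, len(counts) - 1, 2):
        start, value = counts[i], counts[i + 1]
        # Each bucket ends where the next one starts: start + gap.
        end = str(int(start) + gap)
        buckets.append({
            'from': start,
            'to': end,
            'selected': (start, end) in selected,
            'value': value,
            'field': field,
            'exclude': False,
        })
    return {'extraSeries': [], 'label': field, 'field': field, 'counts': buckets}


facet = normalize_range_facet(
    'bytes', ['900000', 3423, '1800000', 339], 900000,
    selected=[('900000', '1800000')],
)
```

Precomputing `from`, `to` and `selected` on the server keeps the widget code free of Solr-specific parsing.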

JSON TO WIDGET

{ "field":"rate_code", "counts":[ { "count":97797, "exclude":true, "selected":false, "value":"1", "cat":"rate_code" } ...

{ "field":"medallion", "counts":[ { "count":159, "exclude":true, "selected":false, "value":"6CA28FC49A4C49A9A96", "cat":"medallion" } ...

{ "extraSeries":[], "label":"trip_time_in_secs", "field":"trip_time_in_secs", "counts":[ { "from":"0", "to":"10", "selected":false, "value":527, "field":"trip_time_in_secs", "exclude":true } ...

{ "field":"passenger_count", "counts":[ { "count":74766, "exclude":true, "selected":false, "value":"1", "cat":"passenger_count" } ...

REPEAT UNTIL…

GAME CHANGER!

Possibilities

5.1 / 5.2

Analytic Facets

FACET FUNCTIONS

Count, Sum, Avg, Percentile, Max, ...

Count(id), Sum(bytes), Avg(mul(price, quantity)), Percentile(salary, 50, 90), Max(temperature), ...
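As a rough illustration, facet functions like these can be sent as a JSON request body to Solr's JSON Facet API. The sketch below only builds the body; the helper name `stats_request` and the output keys (`total_bytes`, `avg_order`, ...) are hypothetical, and the field names are the ones from the slide, which would not normally live in a single collection.

```python
import json


def stats_request(functions, query="*:*"):
    """Build a JSON request body that computes aggregations over the matching docs."""
    return json.dumps({
        "query": query,
        "limit": 0,  # no documents needed, only the aggregated values
        "facet": dict(functions),
    })


body = stats_request({
    "total_bytes": "sum(bytes)",
    "avg_order": "avg(mul(price,quantity))",
    "salary_pcts": "percentile(salary,50,90)",
    "max_temp": "max(temperature)",
})
```

Each key under `facet` names one value in the response, which is what lets a widget show a sum or an average instead of a plain count.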


SUB “NESTED” FACETS

top_os: {
  type: term,
  field: os,
  limit: 5
}

top_os: {
  type: term,
  field: os,
  limit: 5,
  facet: {
    by_country: {
      type: term,
      field: country
    }
  }
}

FUNCTION + NESTED = ANALYTICS

states: {
  type: term,
  field: state,
  facet: {
    by_month: {
      type: range,
      field: time,
      start: "TODAY-6MONTHS",
      end: "TODAY",
      gap: "MONTH",
      facet: {
        avg_sal: "avg(salary)"
      }
    }
  }
}

states: {
  type: term,
  field: state,
  facet: {
    avg_sal: "avg(salary)"
  }
}
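A nested facet with a function comes back as buckets inside buckets, which then have to be flattened before charting. The sketch below builds such a request and flattens a response of the documented `buckets` / `val` / `count` shape. Assumptions: the helper names are hypothetical, the sample response is made up, the type is written `terms` as in the current Solr reference (the slide writes `term`), and the date values use Solr date math (`NOW`, `+1MONTH`) in place of the slide's shorthand.

```python
def nested_facet(field, sub_name, sub_facet):
    """Terms facet on `field` with one sub-facet computed inside every bucket."""
    return {"type": "terms", "field": field, "facet": {sub_name: sub_facet}}


request_facet = {
    "states": nested_facet("state", "by_month", {
        "type": "range",
        "field": "time",
        "start": "NOW/MONTH-6MONTHS",
        "end": "NOW",
        "gap": "+1MONTH",
        "facet": {"avg_sal": "avg(salary)"},
    })
}


def flatten(facets, outer="states", inner="by_month", metric="avg_sal"):
    """Flatten nested buckets into (outer value, inner value, metric) rows."""
    rows = []
    for outer_bucket in facets.get(outer, {}).get("buckets", []):
        for inner_bucket in outer_bucket.get(inner, {}).get("buckets", []):
            # The metric can be absent for empty buckets, hence .get().
            rows.append((outer_bucket["val"], inner_bucket["val"],
                         inner_bucket.get(metric)))
    return rows


# Made-up response fragment, only to show the shape being flattened.
sample = {"count": 3, "states": {"buckets": [
    {"val": "CA", "count": 3, "by_month": {"buckets": [
        {"val": "2015-01-01T00:00:00Z", "count": 2, "avg_sal": 100.0},
        {"val": "2015-02-01T00:00:00Z", "count": 0},
    ]}},
]}}
rows = flatten(sample)
```

Each row is one point of a 2D chart: state, month, average salary.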

OPERATIONS ON BUCKETS OF DATA

Counts → Functions

OPERATIONS ON BUCKETS OF DATA

Nested → nD functions

SEARCH AS ONLY APP IN HUE

gethue.com/solr-search-ui-only/

• Spark in your browser

• Notebooks

• New REST Server

SPARK INDEXING WHAT

• Open source REST for Spark Shell

• Runs locally or inside YARN

• Spark Scala, PySpark and jar/py submission
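To make the REST idea concrete, the sketch below prepares (without sending) the three kinds of calls such a server exposes: open an interactive session, run a statement in it, and submit a jar/py batch. The endpoints and payload keys follow the public Apache Livy REST API; the host, port, session id and file path are placeholders, and the version shown in the talk may have differed.

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # assumption: a local server on Livy's default port


def livy_post(path, payload):
    """Prepare (but do not send) a JSON POST request against the REST server."""
    return Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 1. Open an interactive PySpark session.
create_session = livy_post("/sessions", {"kind": "pyspark"})

# 2. Run a snippet in session 0 (the id would come from the response to step 1).
run_statement = livy_post("/sessions/0/statements",
                          {"code": "sc.parallelize(range(100)).sum()"})

# 3. Submit a whole script as a batch job (placeholder path).
submit_batch = livy_post("/batches", {"file": "/path/to/job.py"})
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a running server.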

SPARK INDEXING WHAT

https://github.com/cloudera/hue/tree/master/apps/spark/java

LIVY ARCH LOCAL

[Diagram: Livy Server, Livy REPL, Spark Contexts, Spark Worker]

LIVY ARCH YARN

[Diagram: Livy Server, YARN Master, and YARN Nodes running the Livy REPL (Spark Context / PySpark) and the Spark Workers; steps numbered 1 to 4]

SPARK STREAMING

Real time!

Spark Solr

NOTEBOOKS / SHELL WHAT

• Python

• Scala

• Charts

DEMO TIME

• Analyze Bay Area bike share

• Visualize one year of data

• Know your users, predict behavior

MISSED SOMETHING?

demo.gethue.com

WHAT’S NEXT

NEW FEATURES

• Full Analytics

• Easier indexing

• Geo

• Export/Share results

• Solr Joins, Solr SQL

• Spark, SQL... integration, Hue 4

TWITTER

@gethue

USER GROUP

hue-user@

WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com

THANKS!