[2C6]Everyplay_Big_Data

Transcript of [2C6]Everyplay_Big_Data

Page 1
Page 2

Tuomas Rinta, Development Director, Everyplay / Unity Technologies

FROM BIG DATA TO ACTIONABLE ANALYTICS

Page 3

So what is Everyplay?

Page 4
Page 5

 

Everyplay and numbers

• Live in about 1,000 games across iOS and Android
• Nearly 100 million game sessions recorded daily
• About 2 billion usage-data events generated every week

 

 

Page 6

Why do we care about big data?

• Mobile games, especially free-to-play, live and die by their metrics

• A service for game developers must have proven value, and every optimization counts

 

 

Page 7

So let’s talk about how we use big data, and how we got started

Page 8

 

Our goal: “How do we create a metrics-driven product based on big data?”

 

 

Page 9

This needs to be as quick as possible

Collect data → Analyze → Create A/B tests → Improve product

Page 10

 

Challenges
• We ship an SDK
  – The normal update cycle on our clients’ side can be as long as 6-12 months, which is not very dynamic
  – This conflicts with the fast improvement cycle
  – The technology must adapt to supporting big data

• The product evolves constantly
  – Analytics requirements change constantly

 

 

Page 11

The SDK is instrumented to send everything the user does to the servers.

[Diagram components: Scribe · Amazon S3 · Real-time production system · Batch data processing (Apache Pig)]
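A minimal client-side sketch of this kind of instrumentation, assuming the SDK buffers user actions and sends them in batches. The `EventBuffer` class, the event fields, the batch size and the `send` callback are all hypothetical: the deck does not describe Everyplay’s wire format or how events reach Scribe.

```javascript
// Hypothetical client-side event buffer: collects instrumented user actions
// and hands them to a transport callback in fixed-size batches.
class EventBuffer {
  constructor(send, batchSize = 50) {
    this.send = send; // hypothetical transport, e.g. a POST to a collector endpoint
    this.batchSize = batchSize;
    this.pending = [];
  }

  // Record one user action with an ISO timestamp and optional properties.
  track(eventType, properties = {}) {
    this.pending.push({ eventType, created: new Date().toISOString(), properties });
    if (this.pending.length >= this.batchSize) this.flush();
  }

  // Send whatever is buffered, if anything, and start a new batch.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// Usage: collect batches in memory instead of sending them over the network.
const sentBatches = [];
const buffer = new EventBuffer((batch) => sentBatches.push(batch), 2);
buffer.track("openVideoEditor");
buffer.track("trimButtonPressed");
buffer.track("shareButtonPressed");
buffer.flush();
console.log(sentBatches.map((b) => b.length)); // [ 2, 1 ]
```

Batching keeps request overhead low on mobile clients; the trade-off is that events still in the buffer can be lost if the app is killed before `flush()` runs.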

Page 12

Tackling evolving analytics

Page 13

 

Issues with big data and analytics
• Analytics requirements change
• Redshift is based on PostgreSQL, so there needs to be a schema
  – Schemas are the most restrictive factor with Redshift

• How does that work with evolving analytics?
• Everything would be easy if there weren’t billions of rows of data…

 

Page 14

 

How should data be reported?
• Choosing how the end-user instrumentation sends events is crucial

• A bad event format can make analytics on big data nearly impossible

• You don’t always know beforehand what you need

Page 15

 

Two possible approaches

Separate events
Example of video sharing:
  openVideoEditor
  trimButtonPressed
  undoTrimPressed
  activateFacecamRecording
  finishFacecamRecording
  shareButtonPressed
• More flexible with a schema-based database
• Requires much more data processing
• Combining events can be a hassle

Conversions with properties
Example of video sharing:
  {"event": "videoShareComplete",
   "properties": [
     {"didTrimVideo": true},
     {"isVideoTrimmed": false},
     {"didUseFacecam": true},
     {"isFacecamEnabled": true},
     {"totalDuration": 1241}
   ]}
• Problematic with a schema-based database
• Easier and faster to process
• All relevant data is pre-aggregated
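To see why combining separate events is a hassle, here is a sketch that folds one session’s separate events into a single `videoShareComplete` conversion record. The rules mapping event names to properties, the per-event timestamp `t`, and the choice to use a plain object for `properties` are assumptions for illustration; the deck does not define them. The timestamps in the sample session are made up.

```javascript
// Hypothetical sketch: fold one session's separate events into a single
// "conversion with properties" record. Property semantics are assumptions.
function toShareConversion(events) {
  // Events are assumed to be sorted by their (hypothetical) timestamp `t`.
  let didTrimVideo = false;
  let isVideoTrimmed = false;
  let didUseFacecam = false;

  for (const { name } of events) {
    if (name === "trimButtonPressed") {
      didTrimVideo = true;
      isVideoTrimmed = true;
    } else if (name === "undoTrimPressed") {
      isVideoTrimmed = false; // the earlier trim was undone
    } else if (name === "activateFacecamRecording") {
      didUseFacecam = true;
    }
  }

  // Only sessions that end in a share count as a conversion.
  const shared = events.some((e) => e.name === "shareButtonPressed");
  if (!shared) return null;

  return {
    event: "videoShareComplete",
    properties: {
      didTrimVideo,
      isVideoTrimmed,
      didUseFacecam,
      totalDuration: events[events.length - 1].t - events[0].t,
    },
  };
}

// Made-up session for illustration.
const session = [
  { name: "openVideoEditor", t: 0 },
  { name: "trimButtonPressed", t: 200 },
  { name: "undoTrimPressed", t: 450 },
  { name: "activateFacecamRecording", t: 600 },
  { name: "finishFacecamRecording", t: 1100 },
  { name: "shareButtonPressed", t: 1241 },
];
console.log(JSON.stringify(toShareConversion(session)));
```

Even this toy version has to respect event order (a trim followed by an undo), and at billions of rows that reconstruction has to happen in batch processing for every session.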

Page 16

 

“What about Postgres and JSON?”
• Yes, Postgres can parse JSON documents, which allows an arbitrary event data format

• However, when your data gets big, this comes with a warning…

Page 17

Comparing queries on plain fields and on JSON

Normal query:
  select count(*) from events
  where created > '2014-09-01'
    and event_type = 'recordSessionClosed';

Vs. JSON-based:
  select count(*) from events
  where created > '2014-09-01'
    and json_extract_path_text(event_json, 'event_type') = 'recordSessionClosed';

Page 18

Results  

[Chart: execution time (in seconds) of the normal query vs. the JSON-based query; y-axis from 0 to 1,400]

Page 19

 

So what’s the best solution?
• Combining single-event sending with extra JSON properties

• Querying the JSON properties is slow, so we store there only information that is rarely needed (drill-down information)
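A sketch of this hybrid layout, assuming a table with the `created`, `event_type` and `event_json` columns used in the comparison queries: fields that are filtered on constantly become real columns, and everything else is serialized into the JSON column for drill-down. The `toRow` helper and the choice of which fields are promoted to columns are hypothetical.

```javascript
// Fields promoted to real columns because they are filtered on constantly.
// This set is an assumption for illustration.
const COLUMN_FIELDS = new Set(["created", "event_type"]);

// Split one event object into a row: typed columns plus a JSON text column
// holding the rarely queried drill-down properties (null if there are none).
function toRow(event) {
  const row = { event_json: null };
  const extra = {};
  for (const [key, value] of Object.entries(event)) {
    if (COLUMN_FIELDS.has(key)) row[key] = value;
    else extra[key] = value;
  }
  if (Object.keys(extra).length > 0) row.event_json = JSON.stringify(extra);
  return row;
}

const row = toRow({
  created: "2014-09-02T10:00:00Z",
  event_type: "recordSessionClosed",
  didUseFacecam: true,
  totalDuration: 1241,
});
console.log(row);
```

Counting events by type then hits only the plain `event_type` column, and `json_extract_path_text` is needed only when drilling into a specific property.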

Page 20

 

How do we then analyse the data?
• Most solutions on the market fell short due to
  – Pricing
  – Features
  – Availability

• It turned out to be easier to “roll our own”

Page 21

Solving an actual problem: “What are the worst drop-off points for uploading a replay?”

Page 22

Tools
• SQL
• JavaScript
• Google Charts visualisation library
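A minimal sketch of the JavaScript step for the drop-off question: given per-step user counts for an ordered funnel (the kind of rows a per-step `count(distinct ...)` SQL query might return), compute the drop-off between consecutive steps and rank the worst ones. The step names and counts are made up for illustration and are not Everyplay data; the resulting rows could then be handed to a Google Charts table or bar chart.

```javascript
// Compute step-to-step drop-off for an ordered funnel and rank the worst steps.
// Input rows mimic what a per-step count query might return (hypothetical).
function dropOffs(funnel) {
  const result = [];
  for (let i = 1; i < funnel.length; i++) {
    const prev = funnel[i - 1];
    const cur = funnel[i];
    const lost = prev.users - cur.users;
    result.push({
      from: prev.step,
      to: cur.step,
      lost,
      // Share of the previous step's users who did not reach this step.
      rate: prev.users === 0 ? 0 : lost / prev.users,
    });
  }
  // Worst drop-off points first.
  return result.sort((a, b) => b.rate - a.rate);
}

// Illustrative counts only, not real data.
const funnel = [
  { step: "recordingFinished", users: 1000 },
  { step: "uploadStarted", users: 400 },
  { step: "uploadCompleted", users: 300 },
];
console.log(dropOffs(funnel)[0]);
```

Ranking by rate rather than by absolute users lost keeps late funnel steps, which see fewer users, from being hidden behind early ones.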

Page 23
Page 24

Why JavaScript for processing?
• Dynamic, fast, relatively well known
• Excellent libraries for data visualisation
  – Highcharts, Google Charts, D3.js, dygraphs
• Good for visualising data, but that’s it

Page 25

Keys to a successful data-driven product
• Plan ahead for analytics and leave room for an evolving product

• If metrics and analytics are not easily accessible to decision makers, they are worthless
  – Self-updating dashboards are one of the main keys to success

• Build A/B testing and data-driven behaviour directly into your product; don’t hack it on later
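One common way to build A/B testing into a product is deterministic bucketing: hash a stable user ID together with the experiment name and map the hash onto the list of variants, so a user sees the same variant in every session without the assignment being stored anywhere. This is a generic sketch, not Everyplay’s implementation; 32-bit FNV-1a is just one simple hash choice, and the experiment and variant names are hypothetical.

```javascript
// 32-bit FNV-1a hash over the string's UTF-16 code units
// (adequate for bucketing, not for cryptographic use).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime mod 2^32
  }
  return hash >>> 0;
}

// Deterministically assign a user to one variant of an experiment. The same
// user always gets the same variant, and including the experiment name in the
// key makes different experiments split users independently.
function assignVariant(experiment, userId, variants) {
  return variants[fnv1a(`${experiment}:${userId}`) % variants.length];
}

// Hypothetical experiment and user.
const variant = assignVariant("shareButtonColor", "user-42", ["control", "green"]);
console.log(variant);
```

Because assignment is a pure function of its inputs, the same logic could also run on the server, which would let experiments change without waiting for clients to ship a new SDK version.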

Page 26

Thank you!
Questions, comments?
Email: [email protected]
Twitter: @trinta
developers.everyplay.com

Page 27

Q&A

Page 28

THANK YOU