AWS Game Analytics - GDC 2014


description

Use AWS to learn how much players love your game by analyzing in-game metrics to measure engagement and retention. Start simple by uploading data to S3 and analyzing it with Redshift. Then add more game data sources and dive deeper with cohort analysis. Finally, I cover real-time analytics with Kinesis and Spark.

Transcript of AWS Game Analytics - GDC 2014

Page 1: AWS Game Analytics - GDC 2014

AWS Gaming Solutions | GDC 2014

Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect

Page 2: AWS Game Analytics - GDC 2014

Mobile Game Landscape

• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue

Page 3: AWS Game Analytics - GDC 2014

Projected Mobile App Revenue

[Chart: projected mobile app revenue per year, 2011 to 2017, split into Ads, IAP, and Paid; y-axis 0 to 90000. Source: Gartner]

Page 4: AWS Game Analytics - GDC 2014

Winning at Free to Play

• Phase 1: Collect Data
• Phase 2: Analyze
• Phase 3: Profit

Page 5: AWS Game Analytics - GDC 2014

Analyze What?

Emotions
• Enjoying game
• Engaged
• Like/dislike new content
• Stuck on a level
• Bored
• Abandonment

Behaviors
• Hours played day/week
• Number of sessions/day
• Level progression
• Friend invites/referrals
• Response to mobile push
• Money spent/week

Page 6: AWS Game Analytics - GDC 2014

Example: Level Progression (One Metric)

[Chart: Tries / Level. # of tries (0 to 10) for each level, L1 through L10]

Page 7: AWS Game Analytics - GDC 2014

Example: Level Progression (Two Metrics)

[Chart: Tries / Level. # of tries (0 to 10) and % highest level (0 to 60) for each level, L1 through L10]

Page 8: AWS Game Analytics - GDC 2014

Key Takeaways

• Multiple data sources
• Correlate variables
• Deltas vs absolutes
• Settle on terminology (game vs level)
• Time matters

Page 9: AWS Game Analytics - GDC 2014


Page 10: AWS Game Analytics - GDC 2014

Events & Metrics

• Event = Moment in Time
  – Login/quit
  – Game start/end
  – Level up
  – In-app purchase

• Metrics = What to Measure
  – KISS
  – Numbers
  – Booleans
  – Strings (Enums)

• Always Include (ALWAYS)
  – User
  – Action
  – Session (context-dependent)
  – Timestamp in ISO8601, e.g. 2014-03-16T16:28:26
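A minimal sketch of how a client could emit one event with the always-include fields. The helper name `format_event` and the comma-separated layout are assumptions for illustration (the layout mirrors the sample event lines elsewhere in this deck); this is not code from the talk.

```python
from datetime import datetime, timezone

def format_event(user, session, action, when=None):
    """Return one CSV event line: timestamp,user,session,action."""
    # Default to the current time in UTC; callers may pass an explicit datetime.
    when = when or datetime.now(timezone.utc)
    # isoformat() on an aware datetime yields ISO 8601 with a UTC offset.
    stamp = when.replace(microsecond=0).isoformat()
    return ",".join([stamp, user, session, action])

line = format_event("nateware", "e4df", "login",
                    datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc))
print(line)  # 2014-03-16T16:28:26+00:00,nateware,e4df,login
```

Including the offset keeps timestamps from devices in different time zones comparable once they land in one table.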

Page 11: AWS Game Analytics - GDC 2014

Off The Shelf Analytics

• Easy To Integrate
• Pre-Baked Reports
• Rate Limits
• Retention Windows
• Data Lock-In

Page 12: AWS Game Analytics - GDC 2014


Ok, A Real Business Plan

Ingest Store Process Analyze

Page 13: AWS Game Analytics - GDC 2014


Ok, A Real Business Plan

Ingest • HTTP PUT • Kafka • Kinesis • Scribe

Store • S3 • DynamoDB • HDFS • Redshift

Process • EMR (Hadoop) • Spark • Storm

Analyze • Tableau • Pentaho •  Jaspersoft

Page 14: AWS Game Analytics - GDC 2014

Start Simple

• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Profit!
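The first two steps (write an events file on the device, periodically upload it) could look roughly like this. It is a sketch under assumptions: the `EventLog` class is hypothetical, the upload step is an injected callable rather than a real S3 client, and a line-count threshold stands in for a timer.

```python
import os
import tempfile

class EventLog:
    """Append events to a local file; hand the file off once it is big enough."""

    def __init__(self, path, upload, max_lines=100):
        self.path = path            # local events file on the device
        self.upload = upload        # callable(path) that ships the file, e.g. to S3
        self.max_lines = max_lines  # flush threshold (arbitrary for this sketch)
        self.count = 0

    def record(self, line):
        with open(self.path, "a") as f:
            f.write(line + "\n")
        self.count += 1
        if self.count >= self.max_lines:
            self.flush()

    def flush(self):
        if self.count == 0:
            return
        self.upload(self.path)  # upload first...
        os.remove(self.path)    # ...then start a fresh file
        self.count = 0

# Usage with a stand-in uploader that just collects file contents.
uploaded = []

def fake_upload(path):
    with open(path) as f:
        uploaded.append(f.read())

log_path = os.path.join(tempfile.mkdtemp(), "events.csv")
log = EventLog(log_path, fake_upload, max_lines=2)
log.record("2014-01-24,nateware,e4df,login")
log.record("2014-01-24,nateware,e4df,gamestart")  # reaches the threshold, uploads
log.record("2014-01-24,nateware,e4df,gameend")
log.flush()                                        # ship the remainder
```

Uploading before deleting means a failed upload raises and leaves the file in place for a later retry.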

Page 15: AWS Game Analytics - GDC 2014

Redshift at a Glance

[Diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC. The leader node sits in front of the compute nodes (each 16 cores, 128GB RAM, 16TB disk), linked by 10 GigE (HPC). Ingestion, backup, and restore run against Amazon S3/DynamoDB.]

• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution

• Compute Nodes
  – Columnar table storage
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB

• Single node version available

Page 16: AWS Game Analytics - GDC 2014


Tableau + Redshift

Page 17: AWS Game Analytics - GDC 2014

Plumbing

① Create S3 bucket ("mygame-analytics-events")
② Request a security token for your mobile app:
   http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③ Upload data from your users' devices
④ Run a scheduled copy to Redshift
⑤ Set up Tableau to access Redshift
⑥ Go to the Beach

Page 18: AWS Game Analytics - GDC 2014

Loading Redshift from S3

copy events from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Scheduled Redshift Load using Data Pipeline: http://aws.amazon.com/articles/1143507459230804

Page 19: AWS Game Analytics - GDC 2014

More Data Sources

• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too

[Diagram labels: EC2, External Analytics]

Page 20: AWS Game Analytics - GDC 2014

Logrotate to S3

/var/log/apache2/*.log {
    sharedscripts
    postrotate
        sudo /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/*.gz s3://mygame-logs/
    endscript
}

Blog Entry on Log Rotation: http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html

Page 21: AWS Game Analytics - GDC 2014

Dealing With Messy Data

• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift

[Diagram label: EC2]
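As an illustration of the cleanup step, here is a sketch of the per-line logic such a job might apply to Apache access logs. It is an assumption-heavy example: the `clean` function, the sample log line, and the output layout (timestamp,user,action) are invented for illustration, and the regex only covers the Apache common log format. On EMR this kind of function would typically run as the mapper of a streaming job.

```python
import re
from datetime import datetime

# Apache common log format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def clean(line):
    """Turn one Apache access-log line into 'timestamp,user,action', or None."""
    m = LOG_LINE.match(line)
    if not m:
        return None  # skip lines that do not parse instead of failing the job
    _host, user, stamp, method, path, _status = m.groups()
    # Convert Apache's timestamp into ISO 8601 so it matches the device events.
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return ",".join([when.isoformat(), user, method + " " + path])

# Made-up sample line, for illustration only.
sample = '10.0.0.1 - nateware [24/Jan/2014:10:15:32 -0800] "GET /friends HTTP/1.1" 200 512'
cleaned = clean(sample)
print(cleaned)  # 2014-01-24T10:15:32-08:00,nateware,GET /friends
```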

Page 22: AWS Game Analytics - GDC 2014

Redshift vs Elastic MapReduce

Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage

Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales > PB's
• Transient

Page 23: AWS Game Analytics - GDC 2014

Direct From DynamoDB

• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns

[Diagram label: EC2]

Page 24: AWS Game Analytics - GDC 2014

Direct From DynamoDB

• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR

[Diagram label: EC2]

Page 25: AWS Game Analytics - GDC 2014

Loading Redshift from DynamoDB

-- readratio is required when loading from DynamoDB; 50 (percent of the
-- table's provisioned read throughput) is an illustrative value
copy games from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

copy events from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Page 26: AWS Game Analytics - GDC 2014


Page 27: AWS Game Analytics - GDC 2014


Funnel Cake

Page 28: AWS Game Analytics - GDC 2014

Back To Basics

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Page 29: AWS Game Analytics - GDC 2014

Measure Retention: Repeated Plays

create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
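To sanity-check what the view returns, the same aggregation can be run by hand on the sample events. This is a sketch in Python rather than Redshift SQL; the function name and the string slice standing in for date_trunc are illustrative choices, not part of the talk.

```python
from collections import Counter

# Sample rows in the deck's CSV layout: date,user,session,action
rows = """2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart""".splitlines()

def events_by_user_by_month(lines):
    """Count events per (user, month), mirroring the view's GROUP BY."""
    counts = Counter()
    for line in lines:
        event_date, user_id, _session, _action = line.split(",")
        month_active = event_date[:7]  # 'YYYY-MM', the analogue of date_trunc('month', ...)
        counts[(user_id, month_active)] += 1
    return counts

totals = events_by_user_by_month(rows)
print(totals)  # one row: ('nateware', '2014-01') with 6 events
```

Note that this counts every event, not play sessions; counting distinct session IDs would be a closer match for "repeated plays".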

Page 30: AWS Game Analytics - GDC 2014

First-Pass Retention – Too Noisy

[Chart: # play sessions / month (0 to 40) for users nateware, Lazyd0g, AK187, and 3strikes]

Page 31: AWS Game Analytics - GDC 2014

Cohorts & Cambria

• Enables calculating relative metrics
• Group users by a common attribute
  – Month game installed
  – Demographics
• Run analysis by cohort
  – Join with metrics
• Use Redshift as it's SQL
  – Example of where SQL is a good fit

Page 32: AWS Game Analytics - GDC 2014

Creating Cohorts with Redshift

create view cohort_by_first_event_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
group by user_id;

http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html

Page 33: AWS Game Analytics - GDC 2014

Retention by Cohort – Join Events with Cohort

[Chart: # sessions / week by week (Week 1 to Week 7), one series per monthly cohort, 2013-11 through 2014-04]
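The slide shows only the resulting chart, so here is one way the join could work, sketched in Python. Everything in it is an assumption for illustration: the sample events are made up, cohorts are keyed by the month of a user's first event (as in the cohort view), and weeks are counted from each user's first event.

```python
from collections import defaultdict
from datetime import date

# Illustrative events: (event_date, user_id, session_id). Made-up data.
events = [
    (date(2013, 11, 3), "alice", "s1"),
    (date(2013, 11, 4), "alice", "s2"),
    (date(2013, 11, 12), "alice", "s3"),
    (date(2013, 12, 1), "bob", "s4"),
    (date(2013, 12, 2), "bob", "s5"),
    (date(2013, 12, 20), "bob", "s6"),
]

# Step 1: find each user's first event; its month is the user's cohort.
first_seen = {}
for day, user, _ in events:
    if user not in first_seen or day < first_seen[user]:
        first_seen[user] = day

# Step 2: join each event to its user's cohort and bucket by week since first event.
sessions = defaultdict(set)
for day, user, session in events:
    cohort = first_seen[user].strftime("%Y-%m")
    week = (day - first_seen[user]).days // 7 + 1
    sessions[(cohort, week)].add(session)

# Distinct sessions per (cohort, week): the quantity plotted per series.
retention = {key: len(ids) for key, ids in sorted(sessions.items())}
print(retention)
```

Measuring weeks relative to each user's start is what makes cohorts comparable: Week 2 means the same thing for a November installer and a March installer.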

Page 34: AWS Game Analytics - GDC 2014

Moar Cohorts

• Define multiple cohorts
  – By activity, time, demographics
  – As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
  – Retention by date
  – Retention by demographic
  – Retention by average plays/month quartile

Page 35: AWS Game Analytics - GDC 2014

Example Event Stream

2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
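One thing a timestamped stream like this makes possible is measuring how long each game lasted. The sketch below is illustrative, not from the talk: it pairs each gamestart with the same user's next gameend. Pairing by user rather than by the third field is an assumption, made because in the sample that field changes between one gamestart and its gameend.

```python
from datetime import datetime

stream = """2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend""".splitlines()

def game_durations(lines):
    """Pair each gamestart with the user's next gameend; return seconds per game."""
    started = {}    # user -> datetime of the currently open gamestart
    durations = []
    for line in lines:
        stamp, user, _third, action = line.split(",")
        when = datetime.fromisoformat(stamp)  # parses the ISO 8601 offset too
        if action == "gamestart":
            started[user] = when
        elif action == "gameend" and user in started:
            durations.append((when - started.pop(user)).total_seconds())
    return durations

print(game_durations(stream))  # [72.0, 51.0, 251.0]
```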

Page 36: AWS Game Analytics - GDC 2014


Page 37: AWS Game Analytics - GDC 2014

Cohorts by Type of Activity

create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;

Page 38: AWS Game Analytics - GDC 2014


Page 39: AWS Game Analytics - GDC 2014


Post-Match Heatmaps

Page 40: AWS Game Analytics - GDC 2014

Real-Time Analytics

Batch
• What game modes do people like best?
• How many people have downloaded DLC pack 2?
• Where do most people die on map 4?
• How many daily players are there on average?

Real-Time
• What game modes are people playing now?
• Are more or fewer people downloading DLC today?
• Are people dying in the same places? Different?
• How many people are playing today? Variance?

Page 41: AWS Game Analytics - GDC 2014

Why Real-Time Analytics?

30x in 24 hours
What if you ran a promo?

Page 42: AWS Game Analytics - GDC 2014

Real-Time Tools

Spark
• High-Performance Hadoop Alternative
• Berkeley.edu
• Compatible with HiveQL
• Up to 100x faster than Hadoop MapReduce (in-memory workloads)
• Runs on EMR

Kinesis
• Amazon fully-managed streaming data layer
• Similar to Kafka
• Streams contain Shards
• Each Shard ingests data up to 1MB/sec, 1000 TPS
• Data stored for 24 hours
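The per-shard limits translate directly into a sizing calculation. The sketch below uses the two limits quoted on the slide; the function name, the decimal reading of "1MB", and the example traffic numbers are assumptions, and current service limits may differ.

```python
import math

SHARD_BYTES_PER_SEC = 1_000_000  # 1MB/sec per shard, from the slide (decimal MB assumed)
SHARD_RECORDS_PER_SEC = 1000     # 1000 TPS per shard, from the slide

def shards_needed(records_per_sec, avg_record_bytes):
    """Shards required so neither the record-rate nor the byte-rate limit is exceeded."""
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    return max(1, by_count, by_bytes)

print(shards_needed(5000, 100))     # 5: many small events, bound by record count
print(shards_needed(500, 10_000))   # 5: fewer large events, bound by bytes
```

Small game events tend to hit the record-rate limit long before the byte limit, which is one argument for batching several events into a single record.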

Page 43: AWS Game Analytics - GDC 2014

Back To Basics [Dubstep Remix]

• Always Batch Due to S3

[Diagram label: EC2]

Page 44: AWS Game Analytics - GDC 2014

Need Data Faster!

• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift

[Diagram label: EC2]

Page 45: AWS Game Analytics - GDC 2014

Lots of Ins and Outs

• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers

[Diagram labels: EC2, EC2]

Page 46: AWS Game Analytics - GDC 2014

Introducing Amazon Kinesis
Service for Real-Time Big Data Ingestion

[Diagram: data sources send records to an AWS endpoint. The stream's shards (Shard 1, Shard 2, ... Shard N) span three Availability Zones. Four applications consume the stream: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], and App.4 [Machine Learning]. Outputs include S3, DynamoDB, and Redshift.]

Page 47: AWS Game Analytics - GDC 2014

Putting Data into Kinesis

• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html

[Diagram: multiple producers writing to Kinesis shards (Shard 1, Shard 2, Shard 3, Shard 4, ... Shard n)]
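To see how a partition key spreads PUTs across shards: Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below imitates this locally. The function `shard_for` is hypothetical, and it assumes the shards split the hash space evenly, which need not hold after shards are split or merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value

def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Scale the 128-bit hash down to an index in [0, shard_count).
    return hash_key * shard_count // HASH_SPACE

keys = ["nateware", "Lazyd0g", "AK187", "3strikes"]
placement = {k: shard_for(k, 4) for k in keys}
print(placement)
```

Because the same key always lands on the same shard, using the user ID as the partition key keeps one player's events in order, while a very active key can turn its shard into a hot spot.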

Page 48: AWS Game Analytics - GDC 2014

Writing to a Kinesis Stream

POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
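The "Data" field in the request body is the record payload, base64-encoded. A small sketch of building that body follows; the helper name `put_record_body` is made up, and request signing (the Authorization header) is deliberately left out, since in practice an AWS SDK would handle it.

```python
import base64
import json

def put_record_body(stream_name, data, partition_key):
    """Build the JSON body for a PutRecord call; Data must be base64-encoded."""
    return json.dumps({
        "StreamName": stream_name,
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })

body = put_record_body("exampleStreamName", b"_<data>_1", "partitionKey")
print(body)
```

The payload `_<data>_1` encodes to `XzxkYXRhPl8x`, the value shown in the request above.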

Page 49: AWS Game Analytics - GDC 2014


Kinesis + Spark

http://aws.amazon.com/articles/4926593393724923

Page 50: AWS Game Analytics - GDC 2014

Death in Real-Time

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
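A consumer could turn records like these into heatmap data by bucketing kill coordinates into grid cells. The sketch below is one possible approach, not the talk's implementation: it assumes "coord" is "x,y,z" in map units, and the 300-unit cell size is arbitrary.

```python
import json
from collections import Counter

# The four Boston records from the slide, as JSON strings.
records = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}',
    '{"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}',
]

CELL = 300  # grid cell size in map units; an arbitrary choice for this sketch

def heatmap(lines, cell=CELL):
    """Count kills per (map, cell_x, cell_y), ignoring the third coordinate."""
    counts = Counter()
    for line in lines:
        kill = json.loads(line)
        x, y, _z = (int(v) for v in kill["coord"].split(","))
        counts[(kill["map"], x // cell, y // cell)] += 1
    return counts

grid = heatmap(records)
print(grid)  # ('Boston', 0, 1) has 2 kills; the other two cells have 1 each
```

Run over a sliding window of recent records instead of a fixed list, the same counter yields a heatmap that updates as the match is played.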

Page 51: AWS Game Analytics - GDC 2014


Real-Time Heatmaps

Page 52: AWS Game Analytics - GDC 2014

Put A Bow On It

• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple – S3 + Redshift
• Add data sources – process with EMR
• Real-time – Kinesis + Spark
• Tons of untapped potential for gaming

Page 53: AWS Game Analytics - GDC 2014


Fallback Plan

Cheers – Nate Wiger @nateware