Game Analytics with AWS - GDC 2014
AWS Gaming Solutions | GDC 2014
Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect
Mobile Game Landscape
• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue
Projected Mobile App Revenue
[Chart: projected mobile app revenue by year, 2011 to 2017, broken down into Ads, IAP, and Paid. Source: Gartner]
Winning at Free to Play
• Phase 1: Collect Data
• Phase 2: Analyze
• Phase 3: Profit
Analyze What?
Emotions
• Enjoying game
• Engaged
• Like/dislike new content
• Stuck on a level
• Bored
• Abandonment
Behaviors
• Hours played day/week
• Number of sessions/day
• Level progression
• Friend invites/referrals
• Response to mobile push
• Money spent/week
Example: Level Progression (One Metric)
[Chart: Tries / Level, showing # of tries for levels L1 through L10]
Example: Level Progression (Two Metrics)
[Chart: Tries / Level, showing % highest level and # of tries for levels L1 through L10]
Key Takeaways
• Multiple data sources
• Correlate variables
• Deltas vs absolutes
• Settle on terminology (game vs level)
• Time matters
Events & Metrics
• Event = Moment in Time
– Login/quit
– Game start/end
– Level up
– In-app purchase
• Metrics = What to Measure
– KISS
– Numbers
– Booleans
– Strings (Enums)
• Always Include (ALWAYS)
– User
– Action
– Session (context-dependent)
– Timestamp in ISO 8601, e.g. 2014-03-16T16:28:26
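As a minimal sketch of the "always include" fields, the function below builds one CSV event line carrying timestamp, user, session, and action. The function name `make_event` and the field order are assumptions for illustration; the slides only specify which fields to include.

```python
import csv
import io
from datetime import datetime, timezone

def make_event(user, session, action, when=None):
    """Return one CSV line: timestamp,user,session,action (hypothetical layout)."""
    when = when or datetime.now(timezone.utc)
    # ISO 8601 with seconds precision and an explicit UTC offset
    stamp = when.replace(microsecond=0).isoformat()
    buf = io.StringIO()
    # csv.writer handles quoting if a field ever contains a comma
    csv.writer(buf, lineterminator="").writerow([stamp, user, session, action])
    return buf.getvalue()

line = make_event("nateware", "e4df", "login",
                  datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc))
print(line)
```

Carrying the offset in every timestamp avoids guessing the device's time zone later, when events from many regions land in the same table.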
Off The Shelf Analytics
• Easy To Integrate
• Pre-Baked Reports
• Rate Limits
• Retention Windows
• Data Lock-In
Ok, A Real Business Plan
Ingest Store Process Analyze
Ingest
• HTTP PUT
• Kafka
• Kinesis
• Scribe
Store
• S3
• DynamoDB
• HDFS
• Redshift
Process
• EMR (Hadoop)
• Spark
• Storm
Analyze
• Tableau
• Pentaho
• Jaspersoft
Start Simple
• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Profit!
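One way to picture the "write events file, periodically upload" steps is the sketch below. The `EventLog` class, the key layout, and the `upload` callable are all hypothetical; in a real client the callable would wrap an S3 PUT made with temporary credentials, which is not shown here.

```python
import os
import tempfile
import time

class EventLog:
    """Append events to a local file; hand the file to an uploader in batches."""

    def __init__(self, path, upload):
        self.path = path
        self.upload = upload  # hypothetical callable(key, data_bytes)

    def record(self, line):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    def flush(self, user):
        """Upload buffered events, then clear the local file. Returns the key used."""
        if not os.path.exists(self.path) or os.path.getsize(self.path) == 0:
            return None
        with open(self.path, "rb") as f:
            data = f.read()
        key = "events/%s/%d.csv" % (user, int(time.time()))
        self.upload(key, data)          # if this raises, the file is kept for retry
        open(self.path, "w").close()    # truncate only after a successful upload
        return key

# Stand-in uploader: a dict plays the role of the bucket.
uploaded = {}
log = EventLog(os.path.join(tempfile.mkdtemp(), "events.csv"), uploaded.__setitem__)
log.record("2014-01-24,nateware,e4df,login")
log.record("2014-01-24,nateware,e4df,gamestart")
key = log.flush("nateware")
```

Truncating only after the upload returns means a failed upload leaves the events on the device for the next attempt.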
Redshift at a Glance
[Diagram labels: 10 GigE (HPC), Ingestion, Backup, Restore, JDBC/ODBC]
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Columnar table storage
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
Tableau + Redshift
Plumbing
① Create S3 bucket ("mygame-analytics-events")
② Request a security token for your mobile app: http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③ Upload data from your users' devices
④ Run a scheduled copy to Redshift
⑤ Set up Tableau to access Redshift
⑥ Go to the Beach
Loading Redshift from S3
copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
Scheduled Redshift Load using Data Pipeline:
http://aws.amazon.com/articles/1143507459230804
More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
[Diagram labels: EC2, External Analytics]
Logrotate to S3
/var/log/apache2/*.log {
    sharedscripts
    postrotate
        sudo /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/*.gz s3://mygame-logs/
    endscript
}
Blog Entry on Log Rotation:
http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html
Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
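The cleanup step would normally run as an EMR job; the per-line logic can be sketched locally first. The function below maps an Apache access-log line or a device CSV line onto one common tuple. The output layout `(timestamp, user, source, action)`, the sample Apache line, and the regular expression are assumptions for illustration, not the format used in the talk.

```python
import re
from datetime import datetime

# Apache common log format: host ident user [time] "request" status size
APACHE = re.compile(r'^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def clean(line):
    """Map a raw line to (iso_timestamp, user, source, action), or None if unparseable."""
    line = line.strip()
    m = APACHE.match(line)
    if m:
        _host, user, stamp, method, path, _status = m.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        return (when.isoformat(), user, "apache", method + " " + path)
    parts = line.split(",")
    if len(parts) == 4:  # device events: date,user,session,action
        date, user, _session, action = parts
        return (date, user, "device", action)
    return None  # drop anything that matches neither format

apache_row = clean('10.0.0.1 - nateware [24/Jan/2014:10:15:32 -0800] '
                   '"GET /api/friends HTTP/1.1" 200 512')
device_row = clean("2014-01-24,nateware,e4df,login")
```

Returning `None` for unparseable lines keeps bad input out of the clean bucket; a real job would likely count or log those lines rather than drop them silently.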
Redshift vs Elastic MapReduce
Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage
Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales > PBs
• Transient
Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
Loading Redshift from DynamoDB
-- readratio is required when loading from DynamoDB; 50 is an illustrative value
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;
copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
Funnel Cake
Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Measure Retention: Repeated Plays
create view events_by_user_by_month as
select user_id,
date_trunc('month', event_date)
as month_active,
count(*) as total_events
from events
group by user_id, month_active;
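To try the view without a cluster, the sketch below runs the same aggregation against an in-memory SQLite database. This is a local stand-in, not Redshift: SQLite has no `date_trunc`, so `strftime('%Y-%m', ...)` is swapped in, and the `session_id` column name and the `lazyd0g` row are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (event_date text, user_id text, session_id text, action text)"
)
conn.executemany("insert into events values (?, ?, ?, ?)", [
    ("2014-01-24", "nateware", "e4df", "login"),
    ("2014-01-24", "nateware", "e4df", "gamestart"),
    ("2014-01-24", "nateware", "e4df", "gameend"),
    ("2014-01-25", "nateware", "a88c", "login"),
    ("2014-01-25", "nateware", "a88c", "friendlist"),
    ("2014-01-25", "nateware", "a88c", "gamestart"),
    ("2014-02-02", "lazyd0g", "77aa", "login"),   # hypothetical extra row
])

# Same shape as the Redshift view; strftime replaces date_trunc('month', ...)
conn.execute("""
    create view events_by_user_by_month as
    select user_id,
           strftime('%Y-%m', event_date) as month_active,
           count(*) as total_events
    from events
    group by user_id, month_active
""")

rows = conn.execute(
    "select * from events_by_user_by_month order by user_id, month_active"
).fetchall()
print(rows)
```

Note that the view counts every event, so a chatty session inflates the total; counting distinct sessions would be a closer proxy for repeated plays.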
First-Pass Retention – Too Noisy
[Chart: # Play Sessions / Month for users nateware, Lazyd0g, AK187, and 3strikes]
Cohorts & Cambria
• Enables calculating relative metrics
• Group users by a common attribute
– Month game installed
– Demographics
• Run analysis by cohort
– Join with metrics
• Use Redshift as it's SQL
– Example of where SQL is a good fit
Creating Cohorts with Redshift
create view cohort_by_first_event_date as
select user_id,
date_trunc('month', min(event_date))
as first_month
from events
group by user_id;
http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html
Retention by Cohort – Join Events with Cohort
[Chart: # Sessions / Week, x-axis in weeks (Week 1 to Week 7), one series per monthly cohort: 2013-11, 2013-12, 2014-01, 2014-02, 2014-03, 2014-04]
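The join itself is not shown on the slide. As a rough sketch of the idea, the function below assigns each user to the month of their first event, then counts events per cohort by week relative to that first event. The sample data is hypothetical, and the sketch counts raw events rather than sessions and does not average per user.

```python
from collections import defaultdict
from datetime import date

# (event_date, user_id) pairs; hypothetical sample data
events = [
    (date(2013, 11, 5), "nateware"),
    (date(2013, 11, 6), "nateware"),
    (date(2013, 11, 14), "nateware"),
    (date(2013, 12, 2), "lazyd0g"),
    (date(2013, 12, 3), "lazyd0g"),
    (date(2013, 12, 20), "lazyd0g"),
]

def retention_by_cohort(rows):
    """Return {(cohort_month, relative_week): event_count}."""
    # Cohort = month of each user's first event (cf. cohort_by_first_event_date)
    first_seen = {}
    for day, user in rows:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
    # Join each event back to its user's cohort and bucket by relative week
    table = defaultdict(int)
    for day, user in rows:
        cohort = first_seen[user].strftime("%Y-%m")
        week = (day - first_seen[user]).days // 7 + 1  # week 1 starts at first event
        table[(cohort, week)] += 1
    return dict(table)

result = retention_by_cohort(events)
```

Measuring weeks from each user's own first event is what makes cohorts comparable: week 2 means the same thing for a November installer as for a March one.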
Moar Cohorts
• Define multiple cohorts
– By activity, time, demographics
– As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
– Retention by date
– Retention by demographic
– Retention by average plays/month quartile
Example Event Stream
2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
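A stream like this can be turned into a metric with very little code. As a sketch, the function below pairs each `gamestart` with the same user's next `gameend` and reports the game length in seconds. Pairing by user rather than by the third field is an assumption, made because the last game in the sample starts under one ID and ends under another.

```python
from datetime import datetime

stream = """\
2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
""".splitlines()

def game_lengths(lines):
    """Pair each gamestart with the same user's next gameend; return (user, seconds)."""
    open_start = {}
    lengths = []
    for line in lines:
        stamp, user, _id, action = line.split(",")
        when = datetime.fromisoformat(stamp)  # parses the ISO 8601 offset too
        if action == "gamestart":
            open_start[user] = when
        elif action == "gameend" and user in open_start:
            lengths.append((user, (when - open_start.pop(user)).total_seconds()))
    return lengths

lengths = game_lengths(stream)
```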
Cohorts by Type of Activity
create view cohort_by_first_play_date as
select user_id,
date_trunc('month', min(event_date))
as first_month
from events
where action = 'gamestart'
group by user_id;
Post-Match Heatmaps
Real-Time Analytics
Batch
• What game modes do people like best?
• How many people have downloaded DLC pack 2?
• Where do most people die on map 4?
• How many daily players are there on average?
Real-Time
• What game modes are people playing now?
• Are more or fewer people downloading DLC today?
• Are people dying in the same places? Different?
• How many people are playing today? Variance?
Why Real-Time Analytics?
30x in 24 hours
What if you ran a promo?
Real-Time Tools
Spark
• High-Performance Hadoop Alternative
• Berkeley.edu
• Compatible with HiveQL
• Up to 100x faster than Hadoop MapReduce (in memory)
• Runs on EMR
Kinesis
• Amazon fully-managed streaming data layer
• Similar to Kafka
• Streams contain Shards
• Each Shard ingests data up to 1MB/sec, 1000 TPS
• Data stored for 24 hours
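The per-shard limits turn stream sizing into simple arithmetic: a stream needs enough shards to satisfy whichever limit is hit first. The sketch below uses the slide's figures; treating "1MB" as 10^6 bytes and the function name `shards_needed` are assumptions.

```python
import math

SHARD_BYTES_PER_SEC = 1_000_000   # "1MB/sec" from the slide, taken as 10^6 bytes
SHARD_RECORDS_PER_SEC = 1000      # "1000 TPS" from the slide

def shards_needed(records_per_sec, avg_record_bytes):
    """Smallest shard count that satisfies both the byte and the record limit."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_count)

# Many small events: the record-count limit dominates.
small = shards_needed(records_per_sec=5000, avg_record_bytes=100)
# Fewer, larger events: the byte limit dominates.
large = shards_needed(records_per_sec=500, avg_record_bytes=5000)
```

Game events tend to be small, so the record-count limit is usually the binding one unless the client batches several events into each record.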
Back To Basics [Dubstep Remix]
• Always Batch Due to S3
Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
Introducing Amazon Kinesis Service for Real-Time Big Data Ingestion
[Diagram: multiple data sources send records to an AWS endpoint; the stream's shards (Shard 1, Shard 2, ... Shard N) span three Availability Zones; consuming applications are App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], and App.4 [Machine Learning]; outputs go to S3, DynamoDB, and Redshift]
Putting Data into Kinesis
• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation:
http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
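To see why the choice of partition key matters, it helps to sketch how a key lands on a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains it. The sketch below assumes the shards split that range evenly, which holds for a freshly created stream but not necessarily after shards are split or merged; `shard_for` is a hypothetical helper, not an AWS API.

```python
import hashlib

def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming shards split the 128-bit hash space evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")      # 0 .. 2**128 - 1
    return hash_key * shard_count // 2**128

# Records with the same key always go to the same shard, in order.
first = shard_for("nateware", 4)
second = shard_for("nateware", 4)
```

Keying by user ID keeps each player's events ordered on one shard, while a constant key would send all traffic to a single shard and waste the rest.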
Writing to a Kinesis Stream
POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
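The `Data` field in the request is the record payload, base64-encoded. As a small sketch, the helper below builds the JSON body only; request signing and the HTTP call are left out, and `put_record_body` is a hypothetical name. An AWS SDK would normally handle all of this.

```python
import base64
import json

def put_record_body(stream_name, data, partition_key):
    """Build the JSON body of a PutRecord request from raw payload bytes."""
    return json.dumps({
        "StreamName": stream_name,
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })

# The example payload from the request decodes to the bytes b"_<data>_1".
body = put_record_body("exampleStreamName", b"_<data>_1", "partitionKey")
print(body)
```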
Kinesis + Spark
http://aws.amazon.com/articles/4926593393724923
Death in Real-Time
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
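Turning kill records like these into a heatmap is mostly a matter of binning coordinates. The sketch below counts kills per grid cell per map, using the first four records above. The cell size, the decision to ignore the third coordinate, and the function name `heatmap` are assumptions; a streaming consumer would apply the same binning to records as they arrive.

```python
import json
from collections import Counter

kills = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}',
    '{"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}',
]

def heatmap(records, cell=100):
    """Count kills per (map, x_cell, y_cell); the third coordinate is ignored."""
    grid = Counter()
    for raw in records:
        kill = json.loads(raw)
        x, y, _z = (int(v) for v in kill["coord"].split(","))
        grid[(kill["map"], x // cell, y // cell)] += 1
    return grid

fine = heatmap(kills)              # 100-unit cells: every kill in its own cell
coarse = heatmap(kills, cell=300)  # 300-unit cells: two kills share a cell
```

Cell size trades resolution for signal: small cells need many kills before hot spots stand out, while large cells show patterns sooner but blur where exactly players die.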
Real-Time Heatmaps
Put A Bow On It
• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple – S3 + Redshift
• Add data sources – process with EMR
• Real-time – Kinesis + Spark
• Tons of untapped potential for gaming
Fallback Plan
Cheers – Nate Wiger @nateware