Game Analytics with AWS - GDC 2014

AWS Gaming Solutions | GDC 2014 Game Analytics with AWS Or, How to learn what your players love so they will love your game Nate Wiger @nateware | Principal Gaming Solutions Architect

description

Free to play is now the standard for mobile and social games. But succeeding in free-to-play is not easy: You need in-depth data analytics to gain insight into your players so you can monetize your game. Learn how to leverage new features of AWS services such as Elastic MapReduce, Amazon S3, Kinesis, and Redshift to build an end-to-end analytics pipeline. Plus, we'll show you how to easily integrate analytics with other AWS services in your game.

Transcript of Game Analytics with AWS - GDC 2014

Page 1: Game Analytics with AWS - GDC 2014

AWS Gaming Solutions | GDC 2014

Game Analytics with AWS

Or, How to learn what your players love so they will love your game

Nate Wiger @nateware | Principal Gaming Solutions Architect

Page 2: Game Analytics with AWS - GDC 2014

Mobile Game Landscape

• Free To Play

• In-App Purchases

• Long-Tail

• Cross-Platform

• Go Global

• User Retention = Revenue

Page 3: Game Analytics with AWS - GDC 2014

Projected Mobile App Revenue

[Chart: projected mobile app revenue by year, 2011 to 2017, split into Ads, IAP, and Paid; y-axis 0 to 90,000. Source: Gartner]

Page 4: Game Analytics with AWS - GDC 2014

Winning at Free to Play

• Phase 1: Collect Data

• Phase 2: Analyze

• Phase 3: Profit

Page 5: Game Analytics with AWS - GDC 2014

Analyze What?

Emotions

• Enjoying game

• Engaged

• Like/dislike new content

• Stuck on a level

• Bored

• Abandonment

Behaviors

• Hours played day/week

• Number of sessions/day

• Level progression

• Friend invites/referrals

• Response to mobile push

• Money spent/week

Page 6: Game Analytics with AWS - GDC 2014

Example: Level Progression (One Metric)

[Chart: Tries / Level; x-axis L1 to L10; y-axis # of Tries, 0 to 10]

Page 7: Game Analytics with AWS - GDC 2014

Example: Level Progression (Two Metrics)

[Chart: Tries / Level; x-axis L1 to L10; two series, % Highest Level and # of Tries; two y-axes, 0 to 60 and 0 to 10]

Page 8: Game Analytics with AWS - GDC 2014

Key Takeaways

• Multiple data sources

• Correlate variables

• Deltas vs absolutes

• Settle on terminology (game vs level)

• Time matters

Page 9: Game Analytics with AWS - GDC 2014

Page 10: Game Analytics with AWS - GDC 2014

Events & Metrics

• Event = Moment in Time

– Login/quit

– Game start/end

– Level up

– In-app purchase

• Metrics = What to Measure

– KISS

– Numbers

– Booleans

– Strings (Enums)

• Always Include (ALWAYS)

– User

– Action

– Session (context-dependent)

– Timestamp in ISO 8601, e.g. 2014-03-16T16:28:26
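The "always include" fields can be sketched as a small record type. This is only an illustration in Python: the `GameEvent` class and `make_event` helper are hypothetical names, not part of the talk, and the timestamp is emitted in UTC with an explicit offset (an assumption; the slide's example carries no offset).

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class GameEvent:
    """Hypothetical event record carrying the four always-present fields."""
    user: str
    action: str
    session: str
    timestamp: str  # ISO 8601 string


def make_event(user, action, session, now=None):
    # Default to the current time in UTC; drop microseconds for compact logs.
    now = now or datetime.now(timezone.utc)
    return GameEvent(user, action, session, now.replace(microsecond=0).isoformat())


# Fixed time taken from the slide's example so the output is reproducible.
event = make_event("nateware", "login", "e4df",
                   datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc))
print(asdict(event))
```

Keeping the timestamp as a string in one agreed format means every downstream tool can sort and truncate it without guessing the layout.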

Page 11: Game Analytics with AWS - GDC 2014

Off The Shelf Analytics

• Easy To Integrate

• Pre-Baked Reports

• Rate Limits

• Retention Windows

• Data Lock-In

Page 12: Game Analytics with AWS - GDC 2014

Ok, A Real Business Plan

Ingest → Store → Process → Analyze

Page 13: Game Analytics with AWS - GDC 2014

Ok, A Real Business Plan

Ingest

• HTTP PUT

• Kafka

• Kinesis

• Scribe

Store

• S3

• DynamoDB

• HDFS

• Redshift

Process

• EMR (Hadoop)

• Spark

• Storm

Analyze

• Tableau

• Pentaho

• Jaspersoft

Page 14: Game Analytics with AWS - GDC 2014

Start Simple

• Write Events File on Device

• Periodically Upload to S3

• Process into Redshift

• Point GUI Tool to Redshift

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Profit!
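A minimal sketch of the "write events file on device" step, assuming the four-column layout shown on the slide (date, user, session, action). The helper names and the temporary file path are hypothetical. The upload to S3 is left out because it depends on the SDK and credentials the game uses.

```python
import csv
import os
import tempfile
from datetime import date


def append_event(path, day, user, session, action):
    # Append one CSV row per event; the file can later be uploaded as-is.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([day.isoformat(), user, session, action])


def read_events(path):
    # Read the file back as tuples, e.g. to verify it before an upload.
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]


# Hypothetical local path standing in for on-device storage.
events_path = os.path.join(tempfile.mkdtemp(), "events.csv")
append_event(events_path, date(2014, 1, 24), "nateware", "e4df", "login")
append_event(events_path, date(2014, 1, 24), "nateware", "e4df", "gamestart")
rows = read_events(events_path)
print(rows)
```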

Page 15: Game Analytics with AWS - GDC 2014

Redshift at a Glance

[Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]

• Leader Node

– SQL endpoint

– Stores metadata

– Coordinates query execution

• Compute Nodes

– Columnar table storage

– Load, backup, restore via Amazon S3

– Parallel load from Amazon DynamoDB

• Single node version available

Page 16: Game Analytics with AWS - GDC 2014

Tableau + Redshift

Page 17: Game Analytics with AWS - GDC 2014

Plumbing

① Create S3 bucket ("mygame-analytics-events")

② Request a security token for your mobile app: http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html

③ Upload data from your users' devices

④ Run a scheduled copy to Redshift

⑤ Set up Tableau to access Redshift

⑥ Go to the Beach

Page 18: Game Analytics with AWS - GDC 2014

Loading Redshift from S3

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Scheduled Redshift Load using Data Pipeline:

http://aws.amazon.com/articles/1143507459230804

Page 19: Game Analytics with AWS - GDC 2014

More Data Sources

• Also Collect Server Logs

• Periodically Upload to S3

• Stuff into Redshift

• External Analytics Data Too

[Diagram labels: EC2, External Analytics]

Page 20: Game Analytics with AWS - GDC 2014

Logrotate to S3

/var/log/apache2/*.log {
    sharedscripts
    postrotate
        sudo /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/*.gz s3://mygame-logs/
    endscript
}

Blog Entry on Log Rotation:

http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/

And/or, Use ELB Access Logs:

http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html

Page 21: Game Analytics with AWS - GDC 2014

Dealing With Messy Data

• Different File Formats

• Device vs Apache vs CDN

• Cleanup with EMR Job

• Output to Clean Bucket

• Load into Redshift

[Diagram label: EC2]

Page 22: Game Analytics with AWS - GDC 2014

Redshift vs Elastic MapReduce

Redshift

• Columnar DB

• Familiar SQL

• Structured Data

• Batch Load

• Faster to Query

• Long-term Storage

Elastic MapReduce

• Hadoop

• Hive/Pig are SQL-like

• Unstructured Data

• Streaming Loop

• Scales > PBs

• Transient

Page 23: Game Analytics with AWS - GDC 2014

Direct From DynamoDB

• Integrate Game DB

• Load Directly into Redshift

• Redshift does Intelligent Merge

• Tracks Hash Keys, Columns

[Diagram label: EC2]

Page 24: Game Analytics with AWS - GDC 2014

Direct From DynamoDB

• Integrate Game DB

• Load Directly into Redshift

• Redshift does Intelligent Merge

• Tracks Hash Keys, Columns

• Or Stream into EMR

[Diagram label: EC2]

Page 25: Game Analytics with AWS - GDC 2014

Loading Redshift from DynamoDB

-- readratio is required for DynamoDB loads; 50 (percent of the table's
-- provisioned read throughput) is an illustrative value, not from the slide
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Page 26: Game Analytics with AWS - GDC 2014

Page 27: Game Analytics with AWS - GDC 2014

Funnel Cake

Page 28: Game Analytics with AWS - GDC 2014

Back To Basics

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Page 29: Game Analytics with AWS - GDC 2014

Measure Retention: Repeated Plays

create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
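To try the monthly-events view without a Redshift cluster, the same idea can be sketched against an in-memory SQLite database. This is a stand-in, not the talk's setup: SQLite has no `date_trunc`, so `strftime('%Y-%m-01', ...)` is used instead. The table layout is assumed from the CSV sample, and the `Lazyd0g` row is made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (event_date text, user_id text, session_id text, action text)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        ("2014-01-24", "nateware", "e4df", "login"),
        ("2014-01-24", "nateware", "e4df", "gamestart"),
        ("2014-01-24", "nateware", "e4df", "gameend"),
        ("2014-01-25", "nateware", "a88c", "login"),
        ("2014-01-25", "nateware", "a88c", "friendlist"),
        ("2014-01-25", "nateware", "a88c", "gamestart"),
        ("2014-02-02", "Lazyd0g", "b7c1", "login"),  # hypothetical extra user
    ],
)

# SQLite equivalent of the Redshift view: truncate each date to its month.
conn.execute(
    """
    create view events_by_user_by_month as
    select user_id,
           strftime('%Y-%m-01', event_date) as month_active,
           count(*) as total_events
    from events
    group by user_id, strftime('%Y-%m-01', event_date)
    """
)

rows = conn.execute(
    "select user_id, month_active, total_events "
    "from events_by_user_by_month order by user_id, month_active"
).fetchall()
print(rows)
```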

Page 30: Game Analytics with AWS - GDC 2014

First-Pass Retention – Too Noisy

[Chart: # Play Sessions / Month, axis 0 to 40; series: nateware, Lazyd0g, AK187, 3strikes]

Page 31: Game Analytics with AWS - GDC 2014

Cohorts & Cambria

• Enables calculating relative metrics

• Group users by a common attribute

– Month game installed

– Demographics

• Run analysis by cohort

– Join with metrics

• Use Redshift as it's SQL

– Example of where SQL is a good fit

Page 33: Game Analytics with AWS - GDC 2014

Retention by Cohort – Join Events with Cohort

[Chart: # Sessions / Week, axis 0 to 25, for Week 1, Week 2, Week 3, Week 5, Week 6, Week 7; one series per cohort: 2013-11, 2013-12, 2014-01, 2014-02, 2014-03, 2014-04]

Page 34: Game Analytics with AWS - GDC 2014

Moar Cohorts

• Define multiple cohorts

– By activity, time, demographics

– As many as you like

• Change cohort depending on analysis

• Join same metrics with different cohorts

– Retention by date

– Retention by demographic

– Retention by average plays/month quartile

Page 35: Game Analytics with AWS - GDC 2014

Example Event Stream

2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
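One thing a stream like this supports is per-session timing. The sketch below groups the events by the third column, assumed here to be a session or game ID, and measures the span between the first and last event of each. The `session_durations` helper is hypothetical, and the input is copied verbatim from the slide.

```python
from datetime import datetime

STREAM = """\
2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
"""


def session_durations(lines):
    """Return seconds between the earliest and latest event of each session."""
    first, last = {}, {}
    for line in lines:
        ts, _user, session, _action = line.strip().split(",")
        when = datetime.fromisoformat(ts)  # parses the ISO 8601 offset too
        first[session] = min(first.get(session, when), when)
        last[session] = max(last.get(session, when), when)
    return {s: (last[s] - first[s]).total_seconds() for s in first}


durations = session_durations(STREAM.splitlines())
print(durations)
```

Using min and max rather than first and last seen keeps the result correct even if events arrive out of order.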

Page 37: Game Analytics with AWS - GDC 2014

Cohorts by Type of Activity

create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;
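Joining events with the cohort view can be sketched as follows, again using in-memory SQLite as a stand-in for Redshift (`strftime` replaces `date_trunc`). The sample rows, the session IDs, and the choice to count distinct sessions per month are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (event_date text, user_id text, session_id text, action text)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [  # hypothetical sample data
        ("2014-01-24", "nateware", "e4df", "gamestart"),
        ("2014-01-25", "nateware", "a88c", "gamestart"),
        ("2014-02-03", "nateware", "b111", "gamestart"),
        ("2014-02-10", "Lazyd0g", "c222", "login"),
        ("2014-02-10", "Lazyd0g", "c222", "gamestart"),
    ],
)

# Cohort = month of each user's first 'gamestart' event.
conn.execute(
    """
    create view cohort_by_first_play_date as
    select user_id,
           strftime('%Y-%m-01', min(event_date)) as first_month
    from events
    where action = 'gamestart'
    group by user_id
    """
)

# Join events with the cohort and count distinct sessions per cohort and month.
rows = conn.execute(
    """
    select c.first_month,
           strftime('%Y-%m-01', e.event_date) as month_active,
           count(distinct e.session_id) as sessions
    from events e
    join cohort_by_first_play_date c on c.user_id = e.user_id
    group by c.first_month, strftime('%Y-%m-01', e.event_date)
    order by 1, 2
    """
).fetchall()
print(rows)
```

Swapping in a different cohort view (for example by demographic) leaves the join query unchanged, which is the point of keeping cohorts and metrics separate.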

Page 38: Game Analytics with AWS - GDC 2014

Page 39: Game Analytics with AWS - GDC 2014

Post-Match Heatmaps

Page 40: Game Analytics with AWS - GDC 2014

Real-Time Analytics

Batch

• What game modes do people like best?

• How many people have downloaded DLC pack 2?

• Where do most people die on map 4?

• How many daily players are there on average?

Real-Time

• What game modes are people playing now?

• Are more or fewer people downloading DLC today?

• Are people dying in the same places? Different?

• How many people are playing today? Variance?

Page 41: Game Analytics with AWS - GDC 2014

Why Real-Time Analytics?

30x in 24 hours

What if you ran a promo?

Page 42: Game Analytics with AWS - GDC 2014

Real-Time Tools

Spark

• High-Performance Hadoop Alternative

• Berkeley.edu

• Compatible with HiveQL

• Up to 100x faster than Hadoop MapReduce for in-memory workloads

• Runs on EMR

Kinesis

• Amazon fully-managed streaming data layer

• Similar to Kafka

• Streams contain Shards

• Each Shard ingests data up to 1MB/sec, 1000 TPS

• Data stored for 24 hours
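The per-shard limits above give a quick way to estimate how many shards a stream needs. A rough sketch follows: the `shards_needed` helper is hypothetical, and treating "1MB" as 10**6 bytes is an assumption. Whichever limit binds first determines the count.

```python
import math

# Per-shard ingest limits quoted on the slide.
SHARD_BYTES_PER_SEC = 1_000_000   # "1MB/sec"; 10**6 bytes is an assumption
SHARD_RECORDS_PER_SEC = 1000      # "1000 TPS"


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate shard count from the tighter of the two ingest limits."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)


# Many small events: the record-rate limit dominates.
small_events = shards_needed(5000, 500)
# Few large events: the byte-rate limit dominates.
large_events = shards_needed(200, 20000)
print(small_events, large_events)
```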

Page 43: Game Analytics with AWS - GDC 2014

Back To Basics [Dubstep Remix]

• Always Batch Due to S3

[Diagram label: EC2]

Page 44: Game Analytics with AWS - GDC 2014

Need Data Faster!

• Stream Data With Kinesis

• Multiple Writers and Readers

• Still Output to Redshift

[Diagram label: EC2]

Page 45: Game Analytics with AWS - GDC 2014

Lots of Ins and Outs

• Stream Data With Kinesis

• Multiple Writers and Readers

• Still Output to Redshift

• Stream to Spark on EMR

• Storm via Kinesis Spout

• Custom EC2 Workers

[Diagram labels: EC2, EC2]

Page 46: Game Analytics with AWS - GDC 2014

Introducing Amazon Kinesis Service for Real-Time Big Data Ingestion

[Diagram labels: Data Sources (x5); AWS Endpoint; Shard 1, Shard 2, Shard N; Availability Zone (x3); App.1 [Aggregate & De-Duplicate]; App.2 [Metric Extraction]; App.3 [Sliding Window Analysis]; App.4 [Machine Learning]; S3, DynamoDB, Redshift]

Page 47: Game Analytics with AWS - GDC 2014

Putting Data into Kinesis

• Producers use PUT to send data to a Stream

• PutRecord {Data, PartitionKey, StreamName}

• Partition Key distributes PUTs across Shards

• Unique Sequence # returned on PUT call

• Documentation:

http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
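How a partition key spreads PUTs across shards can be illustrated with a short sketch. Kinesis maps each partition key to a 128-bit integer with MD5, and each shard owns a range of that hash space. The even split of the range across shards below is a simplifying assumption, and `hash_key` and `shard_for` are hypothetical helper names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    # Interpret the MD5 digest of the key as an unsigned 128-bit integer.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    # Assumption: shards split the hash space into equal contiguous ranges.
    return hash_key(partition_key) * shard_count // HASH_SPACE


for key in ("e4b5", "30a4", "6ebd"):
    print(key, "-> shard", shard_for(key, 4))
```

Because the mapping is deterministic, all records with the same partition key (for example one game ID) land on the same shard, so they stay in order for that key.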

Page 48: Game Analytics with AWS - GDC 2014

Writing to a Kinesis Stream

POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
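The "Data" field in the request body is base64-encoded; the sample value decodes to the bytes `_<data>_1`. A small sketch of building that JSON payload is shown here. The `put_record_body` helper is a hypothetical name, and request signing and the HTTP call itself are omitted.

```python
import base64
import json


def put_record_body(stream_name: str, data: bytes, partition_key: str) -> str:
    """Build the JSON body of a PutRecord request (signing not included)."""
    return json.dumps({
        "StreamName": stream_name,
        # Raw bytes must be base64-encoded before going into the JSON body.
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })


body = put_record_body("exampleStreamName", b"_<data>_1", "partitionKey")
print(body)
```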

Page 49: Game Analytics with AWS - GDC 2014

Kinesis + Spark

http://aws.amazon.com/articles/4926593393724923

Page 50: Game Analytics with AWS - GDC 2014

Death in Real-Time

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
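Records like these can be turned into heatmap input by bucketing coordinates into grid cells and counting kills per cell. The sketch below uses the four Boston records from the slide. The 300-unit cell size and the `heatmap` helper are hypothetical, and the third coordinate is ignored on the assumption that the heatmap is 2D.

```python
import json
from collections import Counter

CELL = 300  # hypothetical grid cell size, in map units

KILLS = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}',
    '{"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}',
]


def heatmap(records, cell=CELL):
    """Count kills per (map, cell_x, cell_y) grid cell."""
    counts = Counter()
    for raw in records:
        kill = json.loads(raw)
        x, y, _z = (int(v) for v in kill["coord"].split(","))
        counts[(kill["map"], x // cell, y // cell)] += 1
    return counts


grid = heatmap(KILLS)
for cell_key, kills in sorted(grid.items()):
    print(cell_key, kills)
```

The same counting can run over a sliding window of recent records to keep a heatmap current while a match is in progress.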

Page 51: Game Analytics with AWS - GDC 2014

Real-Time Heatmaps

Page 52: Game Analytics with AWS - GDC 2014

Put A Bow On It

• Collect data from the start

• Store it even if you can't process it (yet)

• Start simple – S3 + Redshift

• Add data sources – process with EMR

• Real-time – Kinesis + Spark

• Tons of untapped potential for gaming

Page 53: Game Analytics with AWS - GDC 2014

Fallback Plan

Cheers – Nate Wiger @nateware