Game Analytics with AWS - GDC 2014

Transcript
Page 1: Game Analytics with AWS - GDC 2014

AWS Gaming Solutions | GDC 2014

Game Analytics with AWS

Or, How to learn what your players love so they will love your game

Nate Wiger @nateware | Principal Gaming Solutions Architect

Page 2: Game Analytics with AWS - GDC 2014

Mobile Game Landscape

• Free To Play

• In-App Purchases

• Long-Tail

• Cross-Platform

• Go Global

• User Retention = Revenue

Page 3: Game Analytics with AWS - GDC 2014

Projected Mobile App Revenue

[Chart: projected mobile app revenue by year, 2011 to 2017, split into Ads, IAP, and Paid; vertical axis 0 to 90,000. Source: Gartner]

Page 4: Game Analytics with AWS - GDC 2014

Winning at Free to Play

• Phase 1: Collect Data

• Phase 2: Analyze

• Phase 3: Profit

Page 5: Game Analytics with AWS - GDC 2014

Analyze What?

Emotions

• Enjoying game

• Engaged

• Like/dislike new content

• Stuck on a level

• Bored

• Abandonment

Behaviors

• Hours played day/week

• Number of sessions/day

• Level progression

• Friend invites/referrals

• Response to mobile push

• Money spent/week

Page 6: Game Analytics with AWS - GDC 2014

Example: Level Progression (One Metric)

[Chart: Tries / Level. Horizontal axis: levels L1 to L10; vertical axis: # of Tries, 0 to 10]

Page 7: Game Analytics with AWS - GDC 2014

Example: Level Progression (Two Metrics)

[Chart: Tries / Level. Horizontal axis: levels L1 to L10; two series: # of Tries (axis 0 to 10) and % Highest Level (axis 0 to 60)]

Page 8: Game Analytics with AWS - GDC 2014

Key Takeaways

• Multiple data sources

• Correlate variables

• Deltas vs absolutes

• Settle on terminology (game vs level)

• Time matters

Page 9: Game Analytics with AWS - GDC 2014

Page 10: Game Analytics with AWS - GDC 2014

Events & Metrics

• Event = Moment in Time
  – Login/quit
  – Game start/end
  – Level up
  – In-app purchase

• Metrics = What to Measure
  – KISS
  – Numbers
  – Booleans
  – Strings (Enums)

• Always Include (ALWAYS)
  – User
  – Action
  – Session (context-dependent)
  – Timestamp in ISO 8601: 2014-03-16T16:28:26
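A minimal sketch of an event record that always carries the four fields above. The function name `make_event` and the field order (timestamp, user, session, action) are assumptions chosen to match the sample event lines shown later in the deck; they are not part of any AWS API.

```python
import csv
import io
from datetime import datetime, timezone


def make_event(user, session, action, when=None):
    """Return one CSV line: timestamp,user,session,action."""
    # Default to "now" in UTC; an explicit offset keeps the timestamp unambiguous.
    when = when or datetime.now(timezone.utc)
    buf = io.StringIO()
    # csv.writer handles quoting if a field ever contains a comma.
    csv.writer(buf, lineterminator="").writerow(
        [when.isoformat(timespec="seconds"), user, session, action]
    )
    return buf.getvalue()


fixed = datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc)
print(make_event("nateware", "e4df", "login", fixed))
# 2014-03-16T16:28:26+00:00,nateware,e4df,login
```

Keeping the action as a short enum-like string and the timestamp in ISO 8601 means the lines sort and parse the same way on every platform.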

Page 11: Game Analytics with AWS - GDC 2014

Off The Shelf Analytics

• Easy To Integrate

• Pre-Baked Reports

• Rate Limits

• Retention Windows

• Data Lock-In

Page 12: Game Analytics with AWS - GDC 2014

Ok, A Real Business Plan

Ingest → Store → Process → Analyze

Page 13: Game Analytics with AWS - GDC 2014

Ok, A Real Business Plan

Ingest

• HTTP PUT

• Kafka

• Kinesis

• Scribe

Store

• S3

• DynamoDB

• HDFS

• Redshift

Process

• EMR (Hadoop)

• Spark

• Storm

Analyze

• Tableau

• Pentaho

• Jaspersoft

Page 14: Game Analytics with AWS - GDC 2014

• Write Events File on Device

• Periodically Upload to S3

• Process into Redshift

• Point GUI Tool to Redshift

Start Simple

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Profit!
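The "write an events file, periodically upload" steps on this slide could be sketched roughly as follows. The `EventLog` class, its `batch_size` parameter, and the `upload` callable are all hypothetical names; in a real client `upload` would wrap whatever S3 upload call your SDK provides, which is deliberately left out here so the sketch runs on its own.

```python
import os
import tempfile


class EventLog:
    """Append events to a local file; hand the file to an uploader in batches."""

    def __init__(self, path, upload, batch_size=100):
        self.path = path
        self.upload = upload          # hypothetical callable: upload(path) -> None
        self.batch_size = batch_size
        self.pending = 0              # events written since the last upload

    def record(self, line):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending == 0:
            return
        self.upload(self.path)        # e.g. a wrapper around an S3 upload
        os.remove(self.path)          # start a fresh file once the upload returned
        self.pending = 0


uploaded = []


def fake_upload(path):
    # Stand-in for the real upload: just remember what would have been sent.
    with open(path, encoding="utf-8") as f:
        uploaded.append(f.read().splitlines())


log = EventLog(os.path.join(tempfile.mkdtemp(), "events.csv"), fake_upload, batch_size=3)
for action in ("login", "gamestart", "gameend", "login"):
    log.record(f"2014-01-24,nateware,e4df,{action}")
log.flush()  # push the partial last batch
print(len(uploaded), "uploads")
```

Batching by count is only one option; a timer or an "app going to background" hook would fit the same shape.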

Page 15: Game Analytics with AWS - GDC 2014

Redshift at a Glance

[Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]

• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution

• Compute Nodes
  – Columnar table storage
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB

• Single node version available

Page 16: Game Analytics with AWS - GDC 2014

Tableau + Redshift

Page 17: Game Analytics with AWS - GDC 2014

Plumbing

① Create S3 bucket ("mygame-analytics-events")

② Request a security token for your mobile app: http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html

③ Upload data from your users' devices

④ Run a scheduled copy to Redshift

⑤ Set up Tableau to access Redshift

⑥ Go to the Beach

Page 18: Game Analytics with AWS - GDC 2014

Loading Redshift from S3

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Scheduled Redshift Load using Data Pipeline:

http://aws.amazon.com/articles/1143507459230804

Page 19: Game Analytics with AWS - GDC 2014

• Also Collect Server Logs

• Periodically Upload to S3

• Stuff into Redshift

• External Analytics Data Too

More Data Sources

[Diagram labels: EC2, External Analytics]

Page 20: Game Analytics with AWS - GDC 2014

Logrotate to S3

/var/log/apache2/*.log {
    sharedscripts
    postrotate
        sudo /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/*.gz s3://mygame-logs/
    endscript
}

Blog Entry on Log Rotation:

http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/

And/or, Use ELB Access Logs:

http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html

Page 21: Game Analytics with AWS - GDC 2014

• Different File Formats

• Device vs Apache vs CDN

• Cleanup with EMR Job

• Output to Clean Bucket

• Load into Redshift

Dealing With Messy Data
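As an illustration of the cleanup step, here is a rough sketch that normalizes one Apache access-log line (Common Log Format) into a comma-separated record with an ISO 8601 timestamp. The output column order and the `clean_line` name are assumptions; on EMR a function like this could serve as the body of a streaming mapper, but that wiring is not shown.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def clean_line(line):
    """Return 'timestamp,user,method,path,status', or None if the line is not CLF."""
    m = CLF.match(line)
    if m is None:
        return None  # leave unparseable lines for a separate reject bucket
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return ",".join([when.isoformat(), m["user"], m["method"], m["path"], m["status"]])


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(clean_line(sample))
# 2000-10-10T13:55:36-07:00,frank,GET,/apache_pb.gif,200
```

The sketch does not escape commas inside paths; a production job would want proper CSV quoting before loading into Redshift.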

Page 22: Game Analytics with AWS - GDC 2014

Redshift vs Elastic MapReduce

Redshift

• Columnar DB

• Familiar SQL

• Structured Data

• Batch Load

• Faster to Query

• Long-term Storage

Elastic MapReduce

• Hadoop

• Hive/Pig are SQL-like

• Unstructured Data

• Streaming Loop

• Scales > PB's

• Transient

Page 23: Game Analytics with AWS - GDC 2014

• Integrate Game DB

• Load Directly into Redshift

• Redshift does Intelligent Merge

• Tracks Hash Keys, Columns

Direct From DynamoDB

Page 24: Game Analytics with AWS - GDC 2014

• Integrate Game DB

• Load Directly into Redshift

• Redshift does Intelligent Merge

• Tracks Hash Keys, Columns

• Or Stream into EMR

Direct From DynamoDB

Page 25: Game Analytics with AWS - GDC 2014

Loading Redshift from DynamoDB

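-- readratio is required when loading from DynamoDB; 50 (percent of the
-- table's provisioned read throughput) is an illustrative value.
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';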

Page 26: Game Analytics with AWS - GDC 2014

Page 27: Game Analytics with AWS - GDC 2014

Funnel Cake

Page 28: Game Analytics with AWS - GDC 2014

Back To Basics

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Page 29: Game Analytics with AWS - GDC 2014

Measure Retention: Repeated Plays

create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
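To make it concrete what this view computes, here is a small pure-Python equivalent run over the six sample event lines from the "Back To Basics" slide. It assumes the column order date, user, session, action seen in those samples; the function name mirrors the view but is otherwise illustrative.

```python
from collections import Counter

EVENTS = """\
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
""".splitlines()


def events_by_user_by_month(lines):
    """Count events per (user_id, month), like the SQL view's GROUP BY."""
    counts = Counter()
    for line in lines:
        event_date, user_id, _session, _action = line.split(",")
        month_active = event_date[:7]  # 'YYYY-MM', the same bucket as date_trunc('month', ...)
        counts[(user_id, month_active)] += 1
    return counts


for (user_id, month_active), total in sorted(events_by_user_by_month(EVENTS).items()):
    print(user_id, month_active, total)
# nateware 2014-01 6
```

Note that this counts every event, not play sessions; counting distinct session IDs instead would be a one-line change.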

Page 30: Game Analytics with AWS - GDC 2014

First-Pass Retention – Too Noisy

[Chart: # Play Sessions / Month, axis 0 to 40, one series per user: nateware, Lazyd0g, AK187, 3strikes]

Page 31: Game Analytics with AWS - GDC 2014

Cohorts & Cambria

• Enables calculating relative metrics

• Group users by a common attribute
  – Month game installed
  – Demographics

• Run analysis by cohort
  – Join with metrics

• Use Redshift as it's SQL
  – Example of where SQL is a good fit

Page 33: Game Analytics with AWS - GDC 2014

Retention by Cohort – Join Events with Cohort

[Chart: # Sessions / Week, axis 0 to 25, across Week 1, Week 2, Week 3, Week 5, Week 6, Week 7; one series per cohort: 2013-11, 2013-12, 2014-01, 2014-02, 2014-03, 2014-04]

Page 34: Game Analytics with AWS - GDC 2014

Moar Cohorts

• Define multiple cohorts
  – By activity, time, demographics
  – As many as you like

• Change cohort depending on analysis

• Join same metrics with different cohorts
  – Retention by date
  – Retention by demographic
  – Retention by average plays/month quartile

Page 35: Game Analytics with AWS - GDC 2014

Example Event Stream

2014-03-17T09:52:08-07:00,nateware,e4b5,login

2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart

2014-03-17T09:53:15-07:00,nateware,e4b5,levelup

2014-03-17T09:54:06-07:00,nateware,e4b5,gameend

2014-03-17T09:54:23-07:00,nateware,30a4,gamestart

2014-03-17T09:55:14-07:00,nateware,30a4,gameend

2014-03-17T09:55:41-07:00,nateware,6ebd,gamestart

2014-03-17T09:57:12-07:00,nateware,6ebd,levelup

2014-03-17T09:58:50-07:00,nateware,6ebd,levelup

2014-03-17T09:59:52-07:00,nateware,6ebd,gameend

Page 37: Game Analytics with AWS - GDC 2014

Cohorts by Type of Activity

create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;
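A hedged sketch of joining events with this cohort view, run locally in SQLite through Python's `sqlite3` module so it can be tried without a cluster. SQLite has no `date_trunc`, so `strftime('%Y-%m', ...)` stands in for it; the users, sessions, and dates below are made-up sample data, and the final query is one possible "plays per cohort per month" report rather than the one from the talk.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "create table events (event_date text, user_id text, session_id text, action text)"
)
db.executemany(
    "insert into events values (?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("2013-11-05", "alice", "s1", "gamestart"),
        ("2013-12-02", "alice", "s2", "gamestart"),
        ("2013-12-10", "bob", "s3", "login"),
        ("2013-12-11", "bob", "s3", "gamestart"),
        ("2014-01-03", "bob", "s4", "gamestart"),
        ("2014-01-04", "alice", "s5", "gamestart"),
        ("2014-01-09", "bob", "s6", "gamestart"),
    ],
)

# Same shape as the slide's view, with strftime standing in for date_trunc.
db.execute("""
    create view cohort_by_first_play_date as
    select user_id, strftime('%Y-%m', min(event_date)) as first_month
    from events
    where action = 'gamestart'
    group by user_id
""")

# Join the metric (game starts per month) with the cohort.
plays_by_cohort = db.execute("""
    select c.first_month,
           strftime('%Y-%m', e.event_date) as month_active,
           count(*) as plays
    from events e
    join cohort_by_first_play_date c on c.user_id = e.user_id
    where e.action = 'gamestart'
    group by c.first_month, month_active
    order by c.first_month, month_active
""").fetchall()

for row in plays_by_cohort:
    print(row)
```

Swapping in a different cohort view (by demographic, by plays/month quartile) leaves the second query unchanged apart from the join target.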

Page 38: Game Analytics with AWS - GDC 2014

Page 39: Game Analytics with AWS - GDC 2014

Post-Match Heatmaps

Page 40: Game Analytics with AWS - GDC 2014

Real-Time Analytics

Batch

• What game modes do people like best?

• How many people have downloaded DLC pack 2?

• Where do most people die on map 4?

• How many daily players are there on average?

Real-Time

• What game modes are people playing now?

• Are more or fewer people downloading DLC today?

• Are people dying in the same places? Different?

• How many people are playing today? Variance?
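One way to picture the real-time column is a sliding window over the event stream. The sketch below counts distinct players seen in the last few minutes; the `ActivePlayers` class and the five-minute window are assumptions for illustration, and it expects events to arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class ActivePlayers:
    """Distinct users seen within a trailing time window."""

    def __init__(self, window=timedelta(minutes=5)):
        self.window = window
        self.events = deque()   # (timestamp, user) in arrival order
        self.counts = {}        # user -> number of events still inside the window

    def add(self, timestamp, user):
        self.events.append((timestamp, user))
        self.counts[user] = self.counts.get(user, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window:
            _, old_user = self.events.popleft()
            self.counts[old_user] -= 1
            if self.counts[old_user] == 0:
                del self.counts[old_user]

    def active(self):
        return len(self.counts)


t0 = datetime.fromisoformat("2014-03-17T09:52:08-07:00")
tracker = ActivePlayers()
tracker.add(t0, "alice")
tracker.add(t0 + timedelta(minutes=1), "bob")
tracker.add(t0 + timedelta(minutes=7), "alice")
print(tracker.active())  # only alice's latest event is still inside the window
```

The batch questions on the left need the full history in a warehouse; a structure like this only ever holds the last window's worth of events.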

Page 41: Game Analytics with AWS - GDC 2014

Why Real-Time Analytics?

30x in 24 hours

What if you ran a promo?

Page 42: Game Analytics with AWS - GDC 2014

Real-Time Tools

Spark

• High-Performance Hadoop Alternative

• Berkeley.edu

• Compatible with HiveQL

• Up to 100x faster than Hadoop MapReduce for in-memory workloads

• Runs on EMR

Kinesis

• Amazon fully-managed streaming data layer

• Similar to Kafka

• Streams contain Shards

• Each Shard ingests data up to 1MB/sec, 1000 TPS

• Data stored for 24 hours
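The per-shard ingest limits above translate into a quick sizing calculation. This is a back-of-the-envelope sketch only: it treats "1MB/sec" as 1,000,000 bytes and "1000 TPS" as 1,000 records per second, both taken from the slide, and the function name and traffic figures are made up.

```python
MAX_BYTES_PER_SEC = 1_000_000    # "1MB/sec" per shard (assumed to mean 10^6 bytes)
MAX_RECORDS_PER_SEC = 1_000      # "1000 TPS" per shard


def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def shards_needed(events_per_sec, avg_event_bytes):
    """Smallest shard count that satisfies both the byte and record limits."""
    by_bytes = ceil_div(events_per_sec * avg_event_bytes, MAX_BYTES_PER_SEC)
    by_count = ceil_div(events_per_sec, MAX_RECORDS_PER_SEC)
    return max(1, by_bytes, by_count)


# Many small events: the record limit dominates.
print(shards_needed(events_per_sec=5_000, avg_event_bytes=100))    # 5
# Fewer, larger events: the byte limit dominates.
print(shards_needed(events_per_sec=2_000, avg_event_bytes=2_000))  # 4
```

Small game events (a kill, a level-up) usually hit the record-count limit long before the byte limit, which argues for batching several events into one record if throughput matters.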

Page 43: Game Analytics with AWS - GDC 2014

• Always Batch Due to S3

Back To Basics [Dubstep Remix]

Page 44: Game Analytics with AWS - GDC 2014

• Stream Data With Kinesis

• Multiple Writers and Readers

• Still Output to Redshift

Need Data Faster!

Page 45: Game Analytics with AWS - GDC 2014

• Stream Data With Kinesis

• Multiple Writers and Readers

• Still Output to Redshift

• Stream to Spark on EMR

• Storm via Kinesis Spout

• Custom EC2 Workers

Lots of Ins and Outs

Page 46: Game Analytics with AWS - GDC 2014

Introducing Amazon Kinesis Service for Real-Time Big Data Ingestion

[Diagram: Data Sources send records through the AWS Endpoint into a stream made of Shard 1, Shard 2, ... Shard N, spread across Availability Zones. Consumers: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning]. Outputs: S3, DynamoDB, Redshift]

Page 47: Game Analytics with AWS - GDC 2014

Putting Data into Kinesis

• Producers use PUT to send data to a Stream

• PutRecord {Data, PartitionKey, StreamName}

• Partition Key distributes PUTs across Shards

• Unique Sequence # returned on PUT call

• Documentation:

http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
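How a partition key spreads PUTs across shards can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it; the even split of the hash space across `shard_count` shards below is a simplifying assumption (real streams can have uneven ranges after splits and merges), and the function names are illustrative.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to its 128-bit integer hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Index of the shard owning this key, assuming equal-width hash ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


for key in ("e4b5", "30a4", "6ebd"):
    print(key, "-> shard", shard_for(key, shard_count=4))
```

Using something high-cardinality such as a game or user ID as the partition key keeps load spread out; a constant key would send every record to one shard.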

Page 48: Game Analytics with AWS - GDC 2014

Writing to a Kinesis Stream

POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
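The `Data` field in the request body is base64-encoded. A minimal sketch of building that JSON body is shown below; `put_record_body` is a hypothetical helper, and the request signing (the `Authorization` header) is not covered, since an AWS SDK would normally handle both steps.

```python
import base64
import json


def put_record_body(stream_name, data, partition_key):
    """Build the JSON body for a PutRecord call; `data` is raw bytes."""
    return json.dumps({
        "StreamName": stream_name,
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })


body = put_record_body("exampleStreamName", b"_<data>_1", "partitionKey")
print(body)
# The Data value comes out as "XzxkYXRhPl8x", matching the slide's example.
```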

Page 49: Game Analytics with AWS - GDC 2014

Kinesis + Spark

http://aws.amazon.com/articles/4926593393724923

Page 50: Game Analytics with AWS - GDC 2014

Death in Real-Time

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}

PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}

PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
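A stream of kill records like these could feed a heatmap by bucketing coordinates into grid cells. The sketch below is one rough way to do that; it assumes `coord` is "x,y,z" and ignores z, and the cell size of 100 map units and the `heatmap` name are arbitrary choices for illustration.

```python
import json
from collections import Counter

CELL = 100  # hypothetical grid cell size, in map units


def heatmap(records, cell=CELL):
    """Count kills per (map, cell_x, cell_y) from JSON kill records."""
    grid = Counter()
    for raw in records:
        kill = json.loads(raw)
        x, y, _z = (int(v) for v in kill["coord"].split(","))  # assumes "x,y,z"
        grid[(kill["map"], x // cell, y // cell)] += 1
    return grid


records = [
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
    '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
    '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}',
    '{"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}',
]

for cell_key, kills in sorted(heatmap(records).items()):
    print(cell_key, kills)
```

Because the counter only ever grows by one per record, the same function works whether the records come from a nightly batch or are consumed continuously from a stream.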

Page 51: Game Analytics with AWS - GDC 2014

Real-Time Heatmaps

Page 52: Game Analytics with AWS - GDC 2014

Put A Bow On It

• Collect data from the start

• Store it even if you can't process it (yet)

• Start simple – S3 + Redshift

• Add data sources – process with EMR

• Real-time – Kinesis + Spark

• Tons of untapped potential for gaming

Page 53: Game Analytics with AWS - GDC 2014

Fallback Plan

Cheers – Nate Wiger @nateware