Adience - Turning low level behavioural signals into user profiles

Post on 23-Jan-2018


Turning Low Level Behavioural Signals Into User Profiles

Pablo Rosenman, VP Development

Adience

Leading the user-centric mobile revolution

- Harness Deep Learning to profile mobile app users

- Distill user/app interaction to actionable segmentation data

Adience Insights

Adience SDK

- Runs on tens of millions of devices

- Runs in the background, without interfering with the device’s operations

- Collects raw data from the system and environment (according to available permissions)

- Reduces dimensionality and anonymizes the data

- Sends results to the SDK Server

SDK Server

- Receives tens of millions of data submissions per day from the Mobile SDK installations

- It should be able to scale by two orders of magnitude

- It should handle requests quickly, so as not to hang the client (i.e. Mobile SDK)

- It should avoid losing data

SDK Server (Architecture)

- Data is sent from the mobile SDK to an Apache server running on EC2

- SDK Server verifies validity of incoming data

- Incoming data gets written immediately (no processing) to S3

[Diagram: Mobile Client -> Amazon EC2 -> Amazon S3]
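
The validate-then-store step can be sketched as below. This is a minimal illustration, not the actual server code: the required fields (`device_id`, `payload`), the key layout, and the `store` callable are assumptions; in production `store` would wrap an S3 upload.

```python
import json
import time
import uuid
from typing import Callable

# Hypothetical required fields; the talk does not describe the report schema.
REQUIRED_FIELDS = ("device_id", "payload")


def is_valid_report(body: bytes) -> bool:
    """Cheap validity check: a JSON object that has the required fields."""
    try:
        report = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return False
    return isinstance(report, dict) and all(f in report for f in REQUIRED_FIELDS)


def handle_submission(body: bytes, store: Callable[[str, bytes], None]) -> int:
    """Validate, then persist the raw bytes untouched. Returns an HTTP status."""
    if not is_valid_report(body):
        return 400
    # Assumed key layout: UTC date prefix plus a random suffix to avoid collisions.
    day = time.strftime("%Y/%m/%d", time.gmtime())
    key = f"reports/{day}/{uuid.uuid4().hex}.json"
    store(key, body)  # no processing here; downstream workers read the file later
    return 200
```

Keeping the request path down to one check and one write is what lets the server answer quickly and avoid holding up the client.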

SDK Server (Scaling)

- The ELB balances the load across all the servers

- Auto Scaling will make sure there are enough servers to handle the load

[Diagram: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon S3]

Insights Workflow

- Create insights on the device’s owner when new data arrives from the device

- Doesn’t have to be real-time (as the data arrives), but shouldn’t be far behind

Insights Workflow (cont.)

- Data report sent by SDK consists of:

  - Simple data points requiring simple statistical and arithmetic operations, for example:

    - Device model

    - OS version

  - More complex data matrices requiring matrix operations, for example:

    - Machine Learning features on time series data

    - Machine Learning features on photos

Insights Workflow (Architecture)

- Simple pattern for a streamlined processing server application:

  - Read input S3 filename from input SQS

  - Read the file from the input S3 bucket, and process it

  - Write results to file in output S3 bucket

  - Send output S3 filename to output SQS

[Diagram: input Amazon SQS + input S3 Bucket -> EC2 Servers (Auto Scaling) -> output S3 Bucket + output Amazon SQS]
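
The four-step worker pattern can be sketched as follows. To keep the example dependency-free, `queue.Queue` stands in for SQS and a plain dict stands in for each S3 bucket; both are stand-ins chosen for illustration, and `count_fields` is a hypothetical processing step.

```python
import json
import queue
from typing import Callable, Dict

Bucket = Dict[str, bytes]  # in-memory stand-in for an S3 bucket


def run_worker_once(
    in_queue: "queue.Queue[str]",
    in_bucket: Bucket,
    out_bucket: Bucket,
    out_queue: "queue.Queue[str]",
    process: Callable[[bytes], bytes],
) -> bool:
    """Handle one message; return False when the input queue is empty."""
    try:
        key = in_queue.get_nowait()   # 1. read input filename from input queue
    except queue.Empty:
        return False
    data = in_bucket[key]             # 2. read the file from the input bucket
    result = process(data)            #    ... and process it
    out_bucket[key] = result          # 3. write results to the output bucket
    out_queue.put(key)                # 4. send output filename to output queue
    return True


def count_fields(raw: bytes) -> bytes:
    """Toy processing step (hypothetical): count the fields in a JSON report."""
    return json.dumps({"n_fields": len(json.loads(raw))}).encode()
```

Writing the output file before announcing it means the next stage never receives a filename that does not exist yet, and each stage only needs to know its own queue and bucket.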

Insights Workflow (Architecture)

- Aggregate the data from all reports to a single device object

- Create insights from all the device’s aggregated data

- Advantages of architecture:

  - Scalability

  - Decoupling

[Diagram components: Reports S3 Bucket, Devices Servers, Devices S3 Bucket, Deep Learning Servers (GPU), Insights Servers, Insights DynamoDB Table, and three Amazon SQS queues]

Adience Events

Events SDK

- Receives events based on user interaction with the app

- Some events are generated automatically (e.g. app was started)

- Custom events are the real driving force (e.g. user has made an in-app purchase for $3.99)

- Events should be sent to the Events Server

Events Server

- Receives hundreds of millions of data submissions per day from the Mobile SDK installations

- It should be able to scale by two orders of magnitude

- It should handle requests quickly, so as not to hang the client

- Analytics engine should work on all data from the last 30 days

- Data should be enriched with the user insights

Events Server (Architecture)

- All incoming events are written to a file on the local volume

- Once every hour, we close the file on each instance and ship it to S3

[Diagram: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling, Amazon EBS, logrotate) -> Amazon S3]
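
A minimal sketch of the append-then-ship idea, assuming one JSON document per line. The class name, the file naming scheme, and the `ship` callable are hypothetical; the diagram lists logrotate for the hourly rotation and S3 as the destination.

```python
import json
import os
import time
from typing import Callable


class HourlyEventLog:
    """Append events to a local file; on rotation, ship the closed file."""

    def __init__(self, directory: str, ship: Callable[[str], None]):
        self.directory = directory
        self.ship = ship  # e.g. an S3 upload in production (assumed)
        self._seq = 0
        self._open_new_file()

    def _open_new_file(self) -> None:
        # Hypothetical naming scheme: UTC hour plus a sequence number.
        hour = time.strftime("%Y%m%dT%H", time.gmtime())
        self.path = os.path.join(self.directory, f"events-{hour}-{self._seq}.log")
        self._seq += 1
        self._file = open(self.path, "a", encoding="utf-8")

    def write(self, event: dict) -> None:
        # The request path is a single local append, so the client is not held up.
        self._file.write(json.dumps(event) + "\n")

    def rotate(self) -> None:
        """Call once an hour: close the current file, reopen, then ship."""
        closed_path = self.path
        self._file.close()
        self._open_new_file()  # keep accepting events while shipping
        self.ship(closed_path)
```

Opening the new file before shipping the old one keeps event intake independent of how long the upload takes.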

Insights MapReduce

- At the end of each day, all events from that day are in the events S3 bucket

- We add to these a “mock event” per report sent to the SDK Server

- Eventually, we wish to compare all the app’s users in the last 30 days to a subset of those users

Insights MapReduce (cont.)

- Using Amazon EMR, we aggregate the data per app, device, day, and event type

  - Example: device 0123, on 2016-01-04, in app Blappy Fird, purchased in-app goods worth a total of $100

[Diagram: Events S3 Bucket + Mock Events S3 Bucket -> Raw2Daily (Amazon EMR) -> Daily S3 Bucket]
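
The Raw2Daily grouping can be illustrated in plain Python. This is only a single-process sketch of the aggregation logic, not an EMR job; the event fields (`app`, `device`, `timestamp`, `type`, `value`) are an assumed schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str, str]  # (app, device, day, event_type)


def raw_to_daily(events: Iterable[dict]) -> Dict[Key, dict]:
    """Sum count and value of raw events per (app, device, day, event type)."""
    daily: Dict[Key, dict] = defaultdict(lambda: {"count": 0, "value": 0.0})
    for e in events:
        # 'timestamp' is assumed to be ISO 8601, so its first 10 chars are the day.
        key = (e["app"], e["device"], e["timestamp"][:10], e["type"])
        daily[key]["count"] += 1
        daily[key]["value"] += e.get("value", 0.0)
    return dict(daily)
```

With two purchases of $60 and $40 by device 0123 on 2016-01-04, the row for that key carries a count of 2 and a value of 100.0, matching the example on the slide.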

Insights MapReduce (cont.)

- Using the Daily data for the last 30 days, we run an additional EMR job to aggregate per app, device, and event type

  - Example: device 0123, in app Blappy Fird, purchased in-app goods worth a total of $1000 (in the last 30 days)

- We enrich the data by adding the device’s insights to each record

[Diagram: Daily S3 Bucket + Insights DynamoDB Table -> Daily2Aggregate (Amazon EMR) -> Aggregate S3 Bucket]
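
A single-process sketch of the 30-day roll-up and enrichment. The row layout and the `insights` mapping (device id to insight dict, standing in for the DynamoDB table) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Tuple

DailyKey = Tuple[str, str, str, str]  # (app, device, day, event_type)
AggKey = Tuple[str, str, str]         # (app, device, event_type)


def daily_to_aggregate(
    daily: Dict[DailyKey, dict],
    insights: Dict[str, dict],
    today: date,
    window_days: int = 30,
) -> Dict[AggKey, dict]:
    """Roll daily rows up over the window and attach each device's insights."""
    first_day = today - timedelta(days=window_days - 1)
    totals: Dict[AggKey, dict] = defaultdict(lambda: {"count": 0, "value": 0.0})
    for (app, device, day, event_type), row in daily.items():
        if first_day <= date.fromisoformat(day) <= today:
            agg = totals[(app, device, event_type)]
            agg["count"] += row["count"]
            agg["value"] += row["value"]
    # Enrichment: copy the device's insights onto every aggregated record.
    return {
        key: {**agg, "insights": insights.get(key[1], {})}
        for key, agg in totals.items()
    }
```

Ten daily rows of $100 each inside the window collapse into one record worth 1000.0, with the device's insights attached; rows older than the window are ignored.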

Insights MapReduce (cont.)

- Accessing DynamoDB per event type is costly

- We know which users were active in the last day, so we save them to an in-memory cache

[Diagram: Daily S3 Bucket -> Daily2Aggregate (Amazon EMR) -> Aggregate S3 Bucket, together with Insights Servers, Insights DynamoDB Table and Insights ElastiCache]
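
The caching idea can be sketched as a small read-through cache that is warmed with the last day's devices. Here a dict stands in for ElastiCache and `fetch` stands in for a DynamoDB lookup; the class and method names are hypothetical.

```python
from typing import Callable, Dict, Iterable


class InsightsCache:
    """In-memory cache in front of a slower insights store."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch  # e.g. a DynamoDB lookup (assumed)
        self._cache: Dict[str, dict] = {}
        self.misses = 0

    def warm(self, device_ids: Iterable[str]) -> None:
        """Preload the devices seen in the last day, one store read each."""
        for device_id in set(device_ids):
            self._cache[device_id] = self._fetch(device_id)

    def get(self, device_id: str) -> dict:
        """Serve from memory; fall back to the store only on a miss."""
        if device_id not in self._cache:
            self.misses += 1
            self._cache[device_id] = self._fetch(device_id)
        return self._cache[device_id]
```

A device with many event types is then read from the store once, instead of once per event type.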

Insights MapReduce (cont.)

- Using the Aggregate data for the last 30 days, we run an additional EMR job to aggregate per app, country, age, gender, and subset type

  - Example: app Blappy Fird, in the US, for males aged 25-34 who purchased in-app goods worth a total of more than $500 (in the last 30 days), 70% are tech savvy, 40% are commuters, etc.

[Diagram: Aggregate S3 Bucket -> Aggregate2Subset (Amazon EMR) -> Subset S3 Bucket]
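
The subset statistics amount to counting, within a filtered group of devices, how many have each insight. A minimal sketch, assuming each aggregate record carries boolean insights under an `insights` key and that the subset is given as a predicate:

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def insight_shares(
    records: Iterable[dict],
    in_subset: Callable[[dict], bool],
) -> Dict[str, float]:
    """Fraction of subset devices for which each boolean insight is true."""
    counts: Counter = Counter()
    total = 0
    for rec in records:
        if not in_subset(rec):
            continue
        total += 1
        for name, value in rec["insights"].items():
            if value is True:
                counts[name] += 1
    # Insights that are true for no device in the subset are simply absent.
    return {name: n / total for name, n in counts.items()} if total else {}
```

Running the same function with a predicate that accepts every user of the app gives the baseline the subset is compared against.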

Insights MapReduce (cont.)

- How can we show data on apps that haven’t integrated our SDK?

- Create a mock event per app that we know is installed on the device!

[Diagram: Events S3 Bucket + Mock Events S3 Bucket -> Raw2Daily (Amazon EMR) -> Daily S3 Bucket -> Daily2Aggregate (Amazon EMR, with Insights DynamoDB Table) -> Aggregate S3 Bucket -> Aggregate2Subset (Amazon EMR) -> Subset S3 Bucket]
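
Generating mock events from a device report could look like the sketch below. The report fields (`device_id`, `installed_apps`), the event schema, and the `mock_installed` event type are all hypothetical names chosen for illustration.

```python
from typing import List


def mock_events_for_report(report: dict, day: str) -> List[dict]:
    """One synthetic event per app listed as installed in a device report."""
    return [
        {
            "app": app,
            "device": report["device_id"],
            "timestamp": f"{day}T00:00:00Z",
            "type": "mock_installed",  # hypothetical event type name
            "value": 0.0,
        }
        for app in sorted(set(report.get("installed_apps", [])))
    ]
```

Because the mock events share the shape of real events, they can flow through the same aggregation steps, which is how apps without the SDK still end up with per-device records.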

Next Generation

SDK Server (Next Generation)

[Diagram, current: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon S3]

[Diagram, next generation: Mobile Client -> Amazon API Gateway -> AWS Lambda -> Amazon S3]

Insights Workflow (Next Generation)

[Diagram components, current: Reports S3 Bucket, Devices Servers, Devices S3 Bucket, Deep Learning Servers (GPU), Insights Servers, Insights DynamoDB Table, and three Amazon SQS queues]

[Diagram components, next generation: Reports S3 Bucket, Devices Lambda, Devices S3 Bucket, Deep Learning Servers (GPU), Amazon SQS, Staging S3 Bucket, Insights Lambda, Insights DynamoDB Table]

Events Server (Next Generation)

[Diagram, current: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling, Amazon EBS, logrotate) -> Amazon S3]

[Diagram, next generation: Mobile Client -> Amazon API Gateway -> AWS Lambda -> Amazon Kinesis Firehose -> Amazon S3]

Bonus: ELK with Amazon

ELK with Amazon

- Server code sends logs to a local ZMQ process

- ZMQ process then asynchronously sends to Kinesis

- Logstash pulls the Kinesis stream, and writes in batches to Elasticsearch

[Diagram: Server Code -> ZMQ -> Amazon Kinesis -> Logstash -> Amazon Elasticsearch]
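
The point of the local hop is that server code hands off a log line and returns immediately, while a separate worker does the slow network send. The sketch below illustrates that hand-off with an in-process `queue.Queue` and a thread in place of the ZMQ process, which is a deliberate substitution to keep the example dependency-free. `send_batch` stands in for a Kinesis write, and batching on this side is an added assumption (on the slide, batching happens in Logstash).

```python
import queue
import threading
from typing import Callable, List


class AsyncLogForwarder:
    """Accept log lines without blocking; forward them in batches."""

    _STOP = object()  # sentinel that tells the worker to flush and exit

    def __init__(self, send_batch: Callable[[List[str]], None], batch_size: int = 100):
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, line: str) -> None:
        self._queue.put(line)  # unbounded queue, so this never blocks the caller

    def close(self) -> None:
        """Flush everything queued so far, then stop the worker thread."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self) -> None:
        batch: List[str] = []
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._send_batch(batch)
                batch = []
        if batch:
            self._send_batch(batch)  # final partial batch
```

An unbounded queue favours request latency over memory; a bounded queue that drops lines when full would be the opposite trade-off.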

We’re Hiring!

Server Developer

Full Stack Web Developer

Algorithm Developer

DevOps Engineer

THANK YOU

pablo@adience.com