Dealing with Frauds at Appsflyer

Post on 16-Apr-2017

Click Frauds

Dealing with frauds at Appsflyer

Business

Advertiser

Publisher

Click

Install

Who’s the bad guy?

Advertiser

Publisher

Click

Install

What does the advertiser pay for?

Cost per impression

Cost per click

Cost per install (CPI)

Fraud techniques of a different league

Fewer fraudulent installs than clicks/views, as CPI is usually much higher

Cost per action

Fraud methods

Programmatic (bots)

Humans

Fraud detection methods

Rule-based

Need expert knowledge of past fraud behaviour

Highly effective at detecting known fraud types

Ineffective at new types

Anomaly detection

Good for new kinds of deviations

Not good for known types of fraud

Supervised learning

Need examples of past fraud

Can be effective at detecting similar occurrences

Ineffective at new types of fraud

Rule-based

Unrecognized user agent string

Mozilla/4.0 (compatible; MSIE 4.5; Windows 98; )

Wrong IMEI

Too many applications installed from the same device

Frequent re-installs on a specific device

Same device installs from many different geographical locations

Implausibly short time between click and install

iOS app install receipt can’t be validated by iTunes
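
Two of the rules above can be sketched in a few lines of Python. This is only an illustration, not the checks used in the talk: the `Install` record, the rule names, and the 10-second threshold are hypothetical. The one outside fact relied on is that a 15-digit IMEI ends in a Luhn check digit.

```python
from dataclasses import dataclass
from typing import List


def luhn_valid(imei: str) -> bool:
    """Return True if `imei` is 15 digits and passes the Luhn checksum."""
    if len(imei) != 15 or not imei.isdigit():
        return False
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(imei)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Install:
    """Hypothetical install record; field names are assumptions."""
    imei: str
    click_ts: float    # seconds since epoch
    install_ts: float  # seconds since epoch


# Hypothetical threshold, not taken from the talk: tune it on real traffic.
MIN_CLICK_TO_INSTALL_SECONDS = 10.0


def triggered_rules(install: Install) -> List[str]:
    """Return the names of all rules this install violates."""
    reasons = []
    if not luhn_valid(install.imei):
        reasons.append("wrong_imei")
    if install.install_ts - install.click_ts < MIN_CLICK_TO_INSTALL_SECONDS:
        reasons.append("click_to_install_too_short")
    return reasons
```

Returning the list of triggered rules, rather than a single boolean, keeps the reason for each decision visible, which helps when a rule starts producing false positives.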

Anomaly detection

k-means clustering

Choosing features

Normally distributed values (or half-normally)

Normalizing data

Custom normalizer

StandardScaler (Spark >= 1.4)

Choose number of clusters

Iterate over different numbers of clusters

Evaluate “clustering score”

Build k-means model

Find vectors with P(x) < ε
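
The steps above can be sketched end to end in plain Python. This is a rough sketch under stated assumptions, not the Spark code from the talk: `standardize` stands in for StandardScaler, the "clustering score" is taken to be the mean distance to the nearest centroid, and the P(x) < ε test is approximated by flagging points whose distance to the nearest centroid exceeds a threshold. All function names and the threshold are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sigmas)] for r in rows]


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    dists = [math.dist(point, c) for c in centroids]
    i = min(range(len(centroids)), key=dists.__getitem__)
    return i, dists[i]


def kmeans(rows, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r, centroids)[0]].append(r)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            [mean(col) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def clustering_score(rows, centroids):
    """Mean distance to the nearest centroid (lower means tighter clusters)."""
    return mean(nearest(r, centroids)[1] for r in rows)


def scores_by_k(rows, ks):
    """Fit one model per candidate k so the scores can be compared."""
    return {k: clustering_score(rows, kmeans(rows, k)) for k in ks}


def find_anomalies(rows, centroids, threshold):
    """Indices of rows farther than `threshold` from every centroid."""
    return [i for i, r in enumerate(rows) if nearest(r, centroids)[1] > threshold]
```

A typical use would be to call `standardize` on the feature vectors, inspect `scores_by_k` for the point where adding clusters stops helping, fit `kmeans` with that k, and pass the result to `find_anomalies`. Note that an outlier picked as an initial centroid can end up with its own cluster, so the model is better fitted on data believed to be mostly clean.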

k-means clustering

k-means clustering - parsing

k-means clustering - feature selection

k-means clustering - finding K

k-means clustering - find anomalies

Supervised learning

Logistic regression

Decision tree

Random forests

Training set {x1, x2, …, xN} -> E

Train the model

Validate, then train again

Test

Apply!
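
The train, validate, test loop above might look like the following for logistic regression, the first of the three models listed. It is a minimal sketch in plain Python rather than the tooling used in the talk. The toy data is synthetic and exists only to make the example runnable, the 60/20/20 split is an assumption, and reading the slide's E as a fraud / not-fraud label is also an assumption.

```python
import math
import random


def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability of fraud for feature vector x; w[0] is the bias."""
    return sigmoid(w[0] + sum(wi * v for wi, v in zip(w[1:], x)))


def train(xs, ys, lr=0.1, epochs=200):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def accuracy(w, xs, ys, cutoff=0.5):
    hits = sum((predict_proba(w, x) >= cutoff) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


def split(xs, ys, seed=0):
    """Shuffle, then split 60/20/20 into train, validation and test sets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    a, b = int(0.6 * len(idx)), int(0.8 * len(idx))

    def pick(ids):
        return [xs[i] for i in ids], [ys[i] for i in ids]

    return pick(idx[:a]), pick(idx[a:b]), pick(idx[b:])


def make_toy_data(n=200, seed=0):
    """Synthetic, already-scaled features. NOT real traffic data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2  # alternate labels: 1 = fraud, 0 = legitimate
        centre = -1.0 if y else 1.0
        xs.append([rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)])
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    train_set, val_set, test_set = split(*make_toy_data())
    # "Validate, then train again": try several learning rates and
    # keep the model that scores best on the validation set.
    candidates = [train(*train_set, lr=lr) for lr in (0.01, 0.1)]
    best = max(candidates, key=lambda w: accuracy(w, *val_set))
    # Touch the test set once, only after model selection is finished.
    print("test accuracy:", accuracy(best, *test_set))
```

Keeping the test set out of the selection loop is the point of the three-way split: validation accuracy guides retraining, while test accuracy estimates how the chosen model behaves on unseen installs.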

Action items

Drop fraudulent requests

Pros:

Less traffic goes through the system

Cons:

False positives

Must catch every fraud as it comes in

Mark transactions that are fraudulent (in our opinion)

Pros:

Let customer decide what to do

Allows offline fraud detection

Mixed approach

Thank you!