Fraud Detection - SQL R Server - Fraud...Fraud Detection...

Post on 24-Feb-2021


Fraud Detection

• Help data scientists easily build and deploy an online transaction fraud detection solution.

• Fraud detection is typically handled as a binary classification problem.

• The class population is severely unbalanced, and the volume of transactions is high.

• When fraudulent transactions are discovered, the business typically blocks the affected accounts from transacting to prevent further losses.

Transaction Data

Source: Cortana Analytics Online-Fraud-Detection-Template-1

Data field                          Description
transactionID                       Unique transaction ID
accountID                           Unique account ID
transactionAmountUSD                Transaction amount in USD
transactionAmount                   Transaction amount in the currency given by transactionCurrencyCode
transactionCurrencyCode             Three-letter currency code of the transaction, e.g. USD
transactionCurrencyConversionRate   Conversion rate to US dollars, e.g. 1.00000 for USD to USD
transactionDate                     Date when the transaction occurred, typically in the time zone of the processor
transactionTime                     Time when the transaction occurred, typically in the time zone of the processing end
localHour                           Hour of the transaction in local time, from 0 to 23
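The field descriptions above imply a few checks that can be run on each record before it is loaded. The following is a minimal Python sketch, not part of the original template (which uses R and T-SQL). The class name, attribute names, and the `validate` and `amount_usd` helpers are hypothetical, and the sketch assumes the conversion rate multiplies the original amount to give USD.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transaction:
    """A subset of the transaction fields listed in the table (hypothetical class)."""
    transaction_id: str      # transactionID
    account_id: str          # accountID
    amount: float            # transactionAmount, in the original currency
    currency_code: str       # transactionCurrencyCode
    conversion_rate: float   # transactionCurrencyConversionRate
    local_hour: int          # localHour

    def validate(self) -> List[str]:
        """Return a list of problems found; an empty list means the record looks sane."""
        problems = []
        if not (len(self.currency_code) == 3 and self.currency_code.isalpha()):
            problems.append("currency code must be three letters")
        if not 0 <= self.local_hour <= 23:
            problems.append("localHour must be in 0-23")
        return problems

    def amount_usd(self) -> float:
        # Assumption: USD amount = original amount * conversion rate.
        return self.amount * self.conversion_rate
```

A record with `currency_code="USD"` and `conversion_rate=1.0` would then report the same value for the original and the USD amount.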

• Online Fraud- Untagged Transactions.csv

• Online Fraud- Fraud Transactions.csv

Steps in Demo

Prepare Data

Train/Predict/Evaluate


SQL Server 2016

Prepare Data: Create SQL Tables

0: Create Tables


•Online Fraud- Untagged Transactions.csv

•Online Fraud- Fraud Transactions.csv




1: Tagging






2: Preprocessing






3: Create Risk Tables




•sql_risk_var: stores the names of the variables to be converted and the names of their risk tables

•sql_risk_xxx: risk tables for variable xxx.
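The slides do not say how a risk value is computed. One common choice, assumed here, is a smoothed log-odds of fraud for each level of a categorical variable. The Python sketch below illustrates that idea only; the function name, the `is_fraud` label column, and the `smoothing` parameter are hypothetical and are not taken from the template.

```python
import math
from collections import Counter


def build_risk_table(rows, variable, label="is_fraud", smoothing=1.0):
    """Map each level of `variable` to a smoothed log-odds of fraud.

    rows: iterable of dicts, each with `variable` and a truthy/falsy `label`.
    Assumption: risk = log((fraud + s) / (non_fraud + s)); the source
    does not specify the formula.
    """
    fraud = Counter()
    total = Counter()
    for row in rows:
        level = row[variable]
        total[level] += 1
        if row[label]:
            fraud[level] += 1

    table = {}
    for level, n in total.items():
        n_fraud = fraud[level]
        n_ok = n - n_fraud
        # Smoothing keeps the ratio finite when a level has no fraud or only fraud.
        table[level] = math.log((n_fraud + smoothing) / (n_ok + smoothing))
    return table
```

Under this assumption, levels seen mostly in fraudulent transactions get positive values and levels seen mostly in legitimate ones get negative values, so the categorical variable can be replaced by a single numeric feature.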

4: Feature Engineering






•sql_tagged_training: newly created features are appended to the original sql_tagged_training table

Tags:

• Non fraud – accountID does not appear in the fraud data table

• Fraud – transaction was present in the fraud data table

• Pre fraud – accountID was in the fraud data table but the transaction was not
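The three tagging rules above can be sketched in a few lines. This is an illustrative Python version rather than the template's own R/T-SQL code; the function name and the `tag` key are hypothetical, and rows are assumed to be dicts keyed by the `transactionID` and `accountID` fields.

```python
def tag_transactions(untagged, fraud):
    """Attach a tag to every untagged transaction.

    untagged, fraud: lists of dicts with "transactionID" and "accountID".
    Returns new dicts with an extra "tag" key (hypothetical name).
    """
    fraud_txn_ids = {row["transactionID"] for row in fraud}
    fraud_accounts = {row["accountID"] for row in fraud}

    tagged = []
    for row in untagged:
        if row["transactionID"] in fraud_txn_ids:
            tag = "fraud"        # the transaction itself is in the fraud table
        elif row["accountID"] in fraud_accounts:
            tag = "pre-fraud"    # account is in the fraud table, transaction is not
        else:
            tag = "non-fraud"    # account never appears in the fraud table
        tagged.append({**row, "tag": tag})
    return tagged
```

Checking the transaction ID before the account ID matters: every fraudulent transaction also belongs to a fraud account, so the order of the two tests decides between "fraud" and "pre-fraud".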

Model in Database

5: Model Training

• Input:

• sql_tagged_training

• Output:

• sql_trained_model: stores a serialized model
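Storing a serialized model in a table means turning the trained object into bytes and writing those bytes to a binary column. The Python sketch below shows the idea with `pickle` and an in-memory SQLite database as stand-ins; the actual demo runs in SQL Server with R, and the column names and helper functions here are hypothetical.

```python
import pickle
import sqlite3


def save_model(conn, name, model):
    """Serialize `model` and store it under `name` (column names are hypothetical)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sql_trained_model "
        "(name TEXT PRIMARY KEY, model BLOB)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO sql_trained_model VALUES (?, ?)",
        (name, pickle.dumps(model)),
    )
    conn.commit()


def load_model(conn, name):
    """Read the stored bytes back and rebuild the model object."""
    row = conn.execute(
        "SELECT model FROM sql_trained_model WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    # Only unpickle data you wrote yourself; pickle is unsafe on untrusted input.
    return pickle.loads(row[0])
```

Keeping the model next to the data lets the prediction step read both from the same database, which is the pattern the later steps in this deck rely on.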

6: Prediction

• Input:

• sql_trained_model

• sql_tagged_testing

• Output:

• sql_predict_score: stores the predicted score

7: Evaluation

• Input:

• sql_predict_score

• Output:

• sql_performance: stores metrics at the account level.

• sql_performance_auc: stores metrics at the transaction level (AUC of the ROC curve).
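For the transaction-level metric, the area under the ROC curve equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as one half. The Python sketch below computes it directly from that definition. It is an illustration, not the template's evaluation code, and its pairwise loop is only suitable for small samples.

```python
def roc_auc(labels, scores):
    """AUC of the ROC curve via pairwise comparison.

    labels: sequence of 0/1 (1 = fraud); scores: predicted scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fraud scored above non-fraud
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on the ranking of scores, it is not distorted by the severe class imbalance typical of fraud data in the way plain accuracy is.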


Send transaction data to SQL Server Model

Receive predicted probability of fraud

Use predicted probability to interrupt a purchase
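Interrupting a purchase based on the predicted probability amounts to comparing it with a cutoff. The Python sketch below shows the simplest form; the function name and the default threshold of 0.5 are hypothetical, and in practice the cutoff would be chosen from the evaluation results and the business cost of false alarms.

```python
def should_interrupt(fraud_probability, threshold=0.5):
    """Return True when the purchase should be interrupted.

    threshold is a hypothetical default, not a value from the demo.
    """
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be in [0, 1]")
    return fraud_probability >= threshold
```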


Develop in Database from an R client

Operationalize in Database with T-SQL

Deploy Model

Visualize Data in Power BI

Develop in R Client IDE

Operationalize In Database

Visualize Data

Power BI

Visualize Predictions

Power BI



Online Fraud Detection Template with SQL Server R Services: R and SQL code for this demo