Fraud Detection Architecture

Real Time Fraud Detection: Patterns and reference architectures

Ted Malaska // PSA Gwen Shapira // Software Engineer

Overview

• Intro
• Review Problem
• Quick overview of key technology
• High level architecture
• Deep Dive into NRT Processing
• Completing the Puzzle – Micro-batch, Ingest and Batch

©2014 Cloudera, Inc. All rights reserved.

Gwen Shapira

• 15 years of moving data
• Formerly consultant
• Now Cloudera Engineer:
  – Sqoop Committer
  – Kafka
  – Flume
• @gwenshap

Hello

• Ted Malaska (PSA at Cloudera)
• Hadoop for ~5 years
• Contributed to:
  – HDFS, MapReduce, YARN, HBase, Spark, Avro,
  – Kite, Pig, Navigator, Cloudera Manager, Flume, Kafka, Sqoop, Accumulo
  – And working on a Sentry patch
• Co-author of O’Reilly Hadoop Application Architectures
• Worked with about 70 companies in 8 countries
• Marvel Fan Boy
• Runner

The Problem

Credit Card Transaction Fraud

Ikea Meat Balls

Coupon Fraud

Video Game Strategy

Health Insurance Fraud

Review of the Problem

• Typical Atomic Card Fraud Detection
• Ikea Meat Ball
• Multi Coupons Combinations
• OP or Negative Video Games Strategies
• Ad Serving
• Health Insurance Fraud
• Kid Coming Home From School

How do we React

• Human Brain at Tennis
  – Muscle Memory
  – Reaction Thought
  – Reflective Meditation

Overview of Key Technologies

Kafka

The Basics

• Messages are organized into topics
• Producers push messages
• Consumers pull messages
• Kafka runs in a cluster. Nodes are called brokers

Topics, Partitions and Logs

Each partition is a log

Each Broker has many partitions

[Figure: Partition 0, Partition 1 and Partition 2 spread across the brokers]

Producers load balance between partitions

[Figure: a client producing to Partition 0, Partition 1 and Partition 2 across the brokers]

Consumers

[Figure: a Kafka cluster topic with Partition A, Partition B and Partition C (each a file), read by the consumers of Consumer Group X and Consumer Group Y; every partition carries an Offset X and an Offset Y marker]

• Order is retained within a partition, but not across partitions
• Offsets are kept per consumer group
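The per-group offset idea can be sketched with a toy model. This is only an illustration in plain Scala and does not use the Kafka client API; `PartitionLog`, `poll` and `OffsetDemo` are hypothetical names invented for the example.

```scala
import scala.collection.mutable

// Toy model of one partition: an append-only log plus one read offset per consumer group.
final class PartitionLog {
  private val messages = mutable.ArrayBuffer.empty[String]
  private val offsets  = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Producers push: the position in the buffer is the message offset.
  def append(message: String): Int = {
    messages += message
    messages.size - 1
  }

  // Consumers pull: a group reads everything after its own offset, then advances it.
  def poll(group: String): Seq[String] = {
    val from  = offsets(group)
    val batch = messages.slice(from, messages.size).toList
    offsets(group) = messages.size
    batch
  }

  def offsetOf(group: String): Int = offsets(group)
}

object OffsetDemo {
  def main(args: Array[String]): Unit = {
    val log = new PartitionLog
    log.append("swipe-1")
    log.append("swipe-2")
    println(log.poll("X")) // group X reads both messages
    log.append("swipe-3")
    println(log.poll("X")) // group X continues from its own offset: only swipe-3
    println(log.poll("Y")) // group Y starts from offset 0 and sees all three
  }
}
```

Because each group advances only its own offset, two groups can read the same partition at different speeds without affecting each other.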

Flume

Short Intro to Flume

Flume Agent: Sources → Interceptors → Selectors → Channels → Sinks

• Sources: Twitter, logs, JMS, webserver, Kafka
• Interceptors: Mask, re-format, validate…
• Selectors: DR, critical
• Channels: Memory, file, Kafka
• Sinks: HDFS, HBase, Solr

Flume and/or Kafka

[Figure: Upstream → Flume Source → Interceptor → Selector → Flume Channel → Flume Sink → Downstream, with three "Can Be Kafka" labels along the pipeline]

Interceptors

• Mask fields
• Validate information against external source
• Extract fields
• Modify data format
• Filter or split events
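A minimal sketch of two of these interceptor duties, masking and filtering, in plain Scala. It deliberately does not implement the real Flume interceptor interface; the `Event` case class, the `card` header and `CardInterceptor` are assumptions made up for illustration.

```scala
// Simplified stand-in for a Flume event: headers plus a body.
// (A real Flume event carries its body as a byte array.)
final case class Event(headers: Map[String, String], body: String)

object CardInterceptor {
  // Mask all but the last four characters of the hypothetical "card" header.
  def mask(event: Event): Event =
    event.headers.get("card") match {
      case Some(card) if card.length > 4 =>
        val masked = "*" * (card.length - 4) + card.takeRight(4)
        event.copy(headers = event.headers.updated("card", masked))
      case _ => event
    }

  // Filter step: keep only events that carry a non-empty body.
  def isValid(event: Event): Boolean = event.body.trim.nonEmpty

  // Batch entry point: drop invalid events, then mask the rest.
  def intercept(events: Seq[Event]): Seq[Event] =
    events.filter(isValid).map(mask)
}
```

Keeping each step a pure function on one event makes it cheap to run inline on every event, which matters when the interceptor sits on the ingest path.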

Spark Streaming

Spark Streaming Example

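import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: one thread for the socket receiver, one for processing
val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount") // app name is illustrative
val ssc = new StreamingContext(conf, Seconds(1)) // one-second micro-batches
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination() // block until the stream is stopped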

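Spark Streaming Example: the same word count in batch Spark

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("BatchWordCount") // app name is illustrative
val sc = new SparkContext(conf)
val lines = sc.textFile(path, 2) // path: input file location, not defined on the slide
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.collect().foreach(println) // RDDs have no print(); collect to the driver and print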

Spark Streaming

[Figure: DStream timeline across three intervals (pre-first batch, first batch, second batch). In every interval a source feeds a receiver that builds an RDD; each completed RDD goes through Filter → Count → Print in a single pass.]

[Figure: the same timeline with state. The single pass over the first batch yields Stateful RDD 1; the pass over the second batch uses Stateful RDD 1 and yields Stateful RDD 2.]

Spark Streaming and HBase

[Figure: the driver holds the configs; on each worker node an executor keeps the configs and one HConnection in static space, used by the tasks running in that executor]
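The "static space" idea for Spark Streaming and HBase can be sketched as a JVM-wide singleton: the connection is created once and every task in the same JVM reuses it. `FakeConnection` and `ConnectionHolder` are hypothetical stand-ins, not the HBase client API.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive-to-create HBase connection.
final class FakeConnection(val quorum: String) {
  def put(rowKey: String, value: String): String = s"$quorum <- $rowKey=$value"
}

// A Scala object is instantiated once per JVM, so each executor would hold its own copy.
object ConnectionHolder {
  val created = new AtomicInteger(0) // counts how many connections were actually built
  private var conn: Option[FakeConnection] = None

  // Tasks call get(); only the first call in this JVM pays the creation cost.
  def get(quorum: String): FakeConnection = synchronized {
    conn.getOrElse {
      val c = new FakeConnection(quorum)
      conn = Some(c)
      created.incrementAndGet()
      c
    }
  }
}
```

In a real job the tasks would call something like `ConnectionHolder.get(...)` inside their per-partition code, so the connection is built on the executor rather than serialized from the driver.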

High Level Architecture

Real-Time Event Processing Approach

[Figure: reference architecture spanning Hadoop Cluster I and Hadoop Cluster II. Components: clients ("Swipe here!") and a web app; Flume agents with a local cache; Kafka; HBase/memory (fetching and updating profiles); Spark Streaming (adjusting NRT stats); HDFS event sink and Solr sink; storage (HDFS, Solr); processing (Hive/Impala, MapReduce, Spark, search). Annotations: batch-time adjustments; automated and manual analytical adjustments and pattern detection; automated and manual review of NRT changes and counters.]

NRT Processing

Focus on NRT First

[Figure: the reference architecture from the "Real-Time Event Processing Approach" slide, with the "NRT Event Processing with Context" portion highlighted]

Streaming Architecture – NRT Event Processing

[Figure: Flume sources publish to a Kafka "Initial Events" topic. A Flume source acting as a Kafka consumer passes each event to a Flume interceptor that holds the event processing logic, local memory and an HBase client backed by HBase. A Kafka producer writes the outcome to a Kafka "Answer" topic.]

• Able to respond within 10s of milliseconds
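One way to picture the interceptor's "local memory plus HBase client" combination is a read-through cache in front of a profile store. The sketch below is plain Scala under stated assumptions: `ProfileStore` stands in for the HBase client, and the threshold rule is invented purely for illustration.

```scala
import scala.collection.mutable

final case class Profile(cardId: String, avgAmount: Double)
final case class Swipe(cardId: String, amount: Double)

// Hypothetical stand-in for the HBase client; counts lookups so cache hits are visible.
final class ProfileStore(data: Map[String, Profile]) {
  var lookups = 0
  def fetch(cardId: String): Option[Profile] = {
    lookups += 1
    data.get(cardId)
  }
}

final class EventProcessor(store: ProfileStore, threshold: Double = 5.0) {
  private val localMemory = mutable.Map.empty[String, Profile]

  // Serve the profile from local memory; fall back to the store on a miss.
  private def profileFor(cardId: String): Option[Profile] =
    localMemory.get(cardId).orElse {
      val fetched = store.fetch(cardId)
      fetched.foreach(p => localMemory.update(cardId, p))
      fetched
    }

  // Illustrative rule: flag a swipe far above the card's average, or an unknown card.
  def isSuspicious(swipe: Swipe): Boolean =
    profileFor(swipe.cardId) match {
      case Some(p) => swipe.amount > p.avgAmount * threshold
      case None    => true
    }
}
```

Only the first swipe for a card pays for the remote lookup; later swipes are answered from memory, which is what keeps the response time low.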

Partitioned NRT Event Processing

[Figure: the NRT event processing pipeline again, with each producer using a partitioner to route events to Partition A, Partition B or Partition C of the topic]

• Custom Partitioner
• Better use of local memory
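A custom partitioner helps local memory because every event for a given key lands on the same partition, and therefore on the same processor and its cache. The sketch below shows the core hashing step only; it does not implement Kafka's partitioner interface, and the card-id key is an assumption.

```scala
object CardPartitioner {
  // Map a key to a partition in [0, numPartitions).
  // floorMod keeps the result non-negative even when hashCode is negative.
  def partitionFor(cardId: String, numPartitions: Int): Int =
    Math.floorMod(cardId.hashCode, numPartitions)

  // Group a batch of keys by the partition they would be sent to.
  def assign(cardIds: Seq[String], numPartitions: Int): Map[Int, Seq[String]] =
    cardIds.groupBy(partitionFor(_, numPartitions))
}
```

With this routing, the processor reading a partition only ever needs the profiles of the cards hashed to it, so its cache holds a slice of the profiles rather than all of them.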

Completing the Puzzle

Micro Batching

[Figure: the reference architecture from the "Real-Time Event Processing Approach" slide, with the "Micro Batching" portions highlighted]

Complex Topologies

Spark 1.3: Kafka direct connection

[Figure: Kafka "Initial Events" topic → Spark Streaming over a Kafka direct connection → DAG topologies]

• Manages offsets
• Stores offsets in the RDD
• No longer needs HDFS for initial RDD checkpointing

Spark 1.2: Kafka receivers

[Figure: Kafka "Initial Events" topic → Spark Streaming over several Kafka receivers → DAG topologies]

• Lets Kafka manage offsets
• Uses HDFS for initial RDD recovery

MicroBatch Bad-Input Handling

[Figure: DAG topologies connecting four Kafka topics, each drawn as a log with offsets 0 to 13: the incoming events topic, the bad events topic, the resolved events topic and the results topic]
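The routing step behind bad-input handling can be sketched as a function that splits one micro-batch into records that parse and records that do not. This is plain Scala, not Spark or Kafka code; the `cardId,amount` record format, `Txn` and `BadInputRouter` are assumptions for illustration.

```scala
import scala.util.Try

final case class Txn(cardId: String, amount: Double)

object BadInputRouter {
  // Parse "cardId,amount"; return Left(raw line) when the record is malformed.
  def parse(line: String): Either[String, Txn] =
    line.split(",", -1) match {
      case Array(id, amt) if id.nonEmpty =>
        Try(amt.trim.toDouble).toOption.map(Txn(id, _)).toRight(line)
      case _ => Left(line)
    }

  // Split one micro-batch into (bad records, parsed records).
  def route(batch: Seq[String]): (Seq[String], Seq[Txn]) = {
    val parsed = batch.map(parse)
    (parsed.collect { case Left(raw) => raw }, parsed.collect { case Right(t) => t })
  }
}
```

In the topology on the slide, the bad records would be written to the bad events topic, keeping the raw line so they can be fixed and replayed later, while the parsed records continue toward the results topic.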

Ingestion

[Figure: the reference architecture from the "Real-Time Event Processing Approach" slide, with the "Ingestion" portions highlighted]

[Figure: a Kafka cluster topic with Partition A, Partition B and Partition C is read by three sets of Flume sinks: Flume HDFS sinks writing to HDFS, Flume Solr sinks writing to Solr, and Flume HBase sinks writing to HBase]

Reflective Thoughts

[Figure: the reference architecture from the "Real-Time Event Processing Approach" slide, with the "Research and Searching" portion highlighted]