Fraud Detection for Israel BigThings Meetup

Transcript of Fraud Detection for Israel BigThings Meetup

Page 1: Fraud Detection  for Israel BigThings Meetup

Real Time Anomaly Detection: Patterns and reference architectures

Gwen Shapira, System Architect

Page 2: Fraud Detection  for Israel BigThings Meetup

©2014 Cloudera, Inc. All rights reserved.

Overview
• Intro
• Review Problem
• Quick overview of key technology
• High level architecture
• Deep Dive into NRT Processing
• Completing the Puzzle – Micro-batch, Ingest and Batch

Page 3: Fraud Detection  for Israel BigThings Meetup

Gwen Shapira
• 15 years of moving data
• Formerly consultant, engineer
• System Architect @ Confluent
• Kafka Committer
• @gwenshap

Page 4: Fraud Detection  for Israel BigThings Meetup

There’s a Book on That

Page 5: Fraud Detection  for Israel BigThings Meetup

Founded by creators of Kafka - @jaykreps, @nehanarkhede, @junrao

We help you gather, transport, organize, and analyze all of your stream data

What we offer
• Confluent Platform
• Kafka plus critical bug fixes not yet applied in Apache release
• Kafka ecosystem projects
• Enterprise support
• Training and Professional Services

Page 6: Fraud Detection  for Israel BigThings Meetup

The Problem

Page 7: Fraud Detection  for Israel BigThings Meetup

Credit Card Transaction Fraud

Page 8: Fraud Detection  for Israel BigThings Meetup

Coupon Fraud

Page 9: Fraud Detection  for Israel BigThings Meetup

Video Game Strategy

Page 10: Fraud Detection  for Israel BigThings Meetup

Health Insurance Fraud

Page 11: Fraud Detection  for Israel BigThings Meetup

How do we React
• Human Brain at Tennis
  • Muscle Memory
  • Reaction Thought
  • Reflective Meditation

Page 12: Fraud Detection  for Israel BigThings Meetup

Overview of Key Technologies

Page 13: Fraud Detection  for Israel BigThings Meetup

Kafka

Page 14: Fraud Detection  for Israel BigThings Meetup

The Basics

• Messages are organized into topics
• Producers push messages
• Consumers pull messages
• Kafka runs in a cluster. Nodes are called brokers

Page 15: Fraud Detection  for Israel BigThings Meetup

Topics, Partitions and Logs

Page 16: Fraud Detection  for Israel BigThings Meetup

Each partition is a log
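
To make "each partition is a log" concrete, here is a minimal sketch of an append-only log with offsets. It is an in-memory illustration only and assumes nothing about Kafka's actual storage format; `PartitionLog` and its methods are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory model of one partition; not Kafka's on-disk format.
final class PartitionLog {
  private val messages = ArrayBuffer.empty[String]

  // Appending returns the offset assigned to the message (0, 1, 2, ...).
  def append(message: String): Long = {
    messages += message
    (messages.size - 1).toLong
  }

  // Reading never removes anything; a reader asks for messages starting at an offset.
  def readFrom(offset: Long, maxMessages: Int): Seq[String] =
    messages.slice(offset.toInt, offset.toInt + maxMessages).toSeq
}

object PartitionLogDemo {
  // Appends three messages, then reads from offset 1.
  def run(): (Long, Seq[String]) = {
    val log = new PartitionLog
    log.append("swipe-1")
    log.append("swipe-2")
    val last = log.append("swipe-3")
    (last, log.readFrom(1L, 10))
  }
}
```

Because reads do not consume messages, any number of readers can replay the same partition from whatever offset they choose.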

Page 17: Fraud Detection  for Israel BigThings Meetup

Each Broker has many partitions

[Diagram: Partitions 0, 1, and 2, each shown on several brokers]

Page 18: Fraud Detection  for Israel BigThings Meetup

Producers load balance between partitions

[Diagram: a client producing to Partitions 0, 1, and 2 on the brokers]
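
One way a producer could spread messages over partitions is sketched below: keyed messages hash to a fixed partition, and keyless messages rotate through the partitions. This is an illustration under stated assumptions, not the Kafka client's own partitioner; `SimplePartitioner` is a hypothetical name, and `String.hashCode` is used only for brevity.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical partition chooser; String.hashCode is an assumption made for brevity.
final class SimplePartitioner(numPartitions: Int) {
  private val counter = new AtomicInteger(0)

  def partitionFor(key: Option[String]): Int = key match {
    // The same key always maps to the same partition, so per-key order is kept.
    case Some(k) => Math.floorMod(k.hashCode, numPartitions)
    // No key: rotate through the partitions to balance load.
    case None    => Math.floorMod(counter.getAndIncrement(), numPartitions)
  }
}

object SimplePartitionerDemo {
  // Partitions chosen for n keyless messages on a 3-partition topic.
  def keyless(n: Int): Seq[Int] = {
    val p = new SimplePartitioner(3)
    (1 to n).map(_ => p.partitionFor(None))
  }

  // Partitions chosen for three messages that share one key.
  def keyed(key: String): Seq[Int] = {
    val p = new SimplePartitioner(3)
    Seq.fill(3)(p.partitionFor(Some(key)))
  }
}
```

`Math.floorMod` is used instead of `%` so that negative hash codes still yield a valid partition index.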

Page 19: Fraud Detection  for Israel BigThings Meetup

Producers load balance between partitions

[Diagram: a client producing to Partitions 0, 1, and 2 on the brokers]

Page 20: Fraud Detection  for Israel BigThings Meetup

Consumers

[Diagram: a Kafka cluster with one topic split into Partition A (File), Partition B (File), and Partition C (File), read by the consumers of Consumer Group X and Consumer Group Y; each partition is marked with an Offset X and an Offset Y]

• Order is retained within a partition, but not across partitions
• Offsets are kept per consumer group
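
The per-group offsets can be sketched as follows: two consumer groups read the same partition, and each one advances only its own position. The model is deliberately simplified (one partition, offsets committed on every poll); `TrackedPartition` and the group names are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical model: one partition plus a committed offset per consumer group.
final class TrackedPartition(messages: Vector[String]) {
  private val committed = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Each group resumes from its own offset; polling advances only that group's offset.
  def poll(group: String, max: Int): Vector[String] = {
    val start = committed(group)
    val batch = messages.slice(start, start + max)
    committed(group) = start + batch.size
    batch
  }

  def offsetOf(group: String): Int = committed(group)
}

object ConsumerGroupDemo {
  def run(): (Vector[String], Vector[String], Int, Int) = {
    val partition = new TrackedPartition(Vector("a", "b", "c", "d"))
    val x1 = partition.poll("group-x", 3) // group X reads from its own offset 0
    val y1 = partition.poll("group-y", 1) // group Y is unaffected by X's progress
    (x1, y1, partition.offsetOf("group-x"), partition.offsetOf("group-y"))
  }
}
```

Since the log itself is never modified by a read, adding another consumer group costs only one more offset to track.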

Page 21: Fraud Detection  for Israel BigThings Meetup

Consumer-Producer Pattern

Page 22: Fraud Detection  for Israel BigThings Meetup

Keeping Things Simple
• Consume records from Kafka Topic
• Filter, transform, join, lookups, aggregate
• Write to another Kafka Topic
• https://github.com/confluentinc/examples/tree/master/specific-avro-consumer
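
The middle step of this pattern (filter, look up, transform) can be sketched as a plain function from consumed records to produced records. The Kafka consumer and producer calls are left out so the sketch runs on its own; the record types, the `homeCountry` lookup table, and the flagging rule are all hypothetical.

```scala
// Hypothetical record types; field names are illustrative only.
final case class Txn(cardId: String, amount: Double, country: String)
final case class Flagged(cardId: String, amount: Double, reason: String)

object ConsumeTransformProduce {
  // Lookup table standing in for reference data the processor would join against.
  val homeCountry: Map[String, String] = Map("card-1" -> "IL", "card-2" -> "US")

  // Input: a batch consumed from the source topic.
  // Output: the records that would be written to the destination topic.
  def process(batch: Seq[Txn]): Seq[Flagged] =
    batch
      .filter(_.amount > 0) // filter
      .flatMap { t =>
        homeCountry.get(t.cardId) match { // lookup
          case Some(home) if home != t.country => // transform
            Some(Flagged(t.cardId, t.amount, s"used in ${t.country}, home is $home"))
          case _ => None
        }
      }
}
```

Keeping the logic in a pure function like `process` means it can be tested without a running cluster and reused with whichever consumer loop or framework wraps it.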

Page 23: Fraud Detection  for Israel BigThings Meetup

Kafka Makes Streams Easy
• Producers partition the data
• Consumers load balance partitions
• Add / remove consumers any way you want
• Will work with any framework (or none!)

Page 24: Fraud Detection  for Israel BigThings Meetup

Coming Soon to Kafka Near You

• Kafka Connect - Export / Import for Kafka - 0.9.0 (It's here!)
• KStream
  • Consumer-Producer client - Processor (0.10.0 - April?)
  • DSLs:
    • KStream (a bit like Spark) - (0.10.0 - April?)
    • SQL - ???

Page 25: Fraud Detection  for Israel BigThings Meetup

KConnect - It's a thing
• Easy to add connectors to Kafka
• Existing connectors
  • JDBC
  • HDFS
  • MySQL * 2
  • ElasticSearch * 4
  • Cassandra
  • S3 * 2
  • MQTT
  • Twitter

Page 26: Fraud Detection  for Israel BigThings Meetup
Page 27: Fraud Detection  for Israel BigThings Meetup

• Kafka Connectors:
  • http://www.confluent.io/developers/connectors
  • http://docs.confluent.io/2.0.0/connect/index.html
• KStreams:
  • https://github.com/gwenshap/kafka-examples/blob/master/KafkaStreamsAvg

Page 28: Fraud Detection  for Israel BigThings Meetup

Spark Streaming

Page 29: Fraud Detection  for Israel BigThings Meetup

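Spark Example

import org.apache.spark.{SparkConf, SparkContext}

// Local mode with two threads; Spark also requires an application name.
val conf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
val sc = new SparkContext(conf)
val lines = sc.textFile(path, 2)  // path: input location supplied by the caller; 2 = minimum partitions
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.collect().foreach(println)  // an RDD has no print(); collect to the driver, then print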

Page 30: Fraud Detection  for Israel BigThings Meetup

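Spark Streaming Example

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")
val ssc = new StreamingContext(conf, Seconds(1))  // one-second micro-batches
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()  // prints the first few elements of every batch
ssc.start()  // nothing runs until the context is started
ssc.awaitTermination()  // keep the application alive until it is stopped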

Page 31: Fraud Detection  for Israel BigThings Meetup

Spark Streaming

[Diagram: three time steps (Pre-first Batch, First Batch, Second Batch). In each step a Receiver turns Source data into an RDD; the RDDs accumulate into a DStream, and each batch runs Filter, Count, and Print in a Single Pass]

Page 32: Fraud Detection  for Israel BigThings Meetup

[Diagram: the same three time steps (Pre-first Batch, First Batch, Second Batch) with Source, Receiver, RDD, Filter, Count, and Print, now with state carried between batches: Stateful RDD 1 appears in the First Batch and again alongside Stateful RDD 2 in the Second Batch]
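
The idea of state carried from one batch to the next can be sketched in plain Scala, without a Spark dependency. The update function below has the same shape as the one Spark Streaming's `updateStateByKey` accepts, `(Seq[V], Option[S]) => Option[S]`; the `run` method is only a hypothetical stand-in for the streaming engine.

```scala
object StatefulBatches {
  // New values for a key in this batch, plus the previous state, give the new state.
  def updateCount(newValues: Seq[Int], previous: Option[Int]): Option[Int] =
    Some(previous.getOrElse(0) + newValues.sum)

  // Plain-Scala stand-in for the engine: process the batches in order,
  // carrying the state map from one batch to the next.
  def run(batches: Seq[Seq[String]]): Seq[Map[String, Int]] =
    batches.scanLeft(Map.empty[String, Int]) { (state, batch) =>
      val grouped = batch.groupBy(identity).map { case (w, ws) => w -> ws.map(_ => 1) }
      val keys = state.keySet ++ grouped.keySet
      keys.flatMap { k =>
        updateCount(grouped.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
      }.toMap
    }.tail // drop the empty initial state
}
```

Each element of the result is the running word count after one batch, which corresponds to the stateful RDD produced for that batch.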

Page 33: Fraud Detection  for Israel BigThings Meetup

High Level Architecture

Page 34: Fraud Detection  for Israel BigThings Meetup

Real-Time Event Processing Approach

[Architecture diagram. Components: Clients (Swipe here!), Web App, Kafka, Flume Agents, Local Cache, HBase / Memory, Spark Streaming, Hadoop Cluster I, Hadoop Cluster II (Storage: HDFS, SolR; Processing: Hive/Impala, Map/Reduce, Spark, Search), HDFSEventSink, SolR Sink. Flow labels: Fetching & Updating Profiles; Adjust NRT Statistics; Automated & Manual Review of NRT Changes and Counters; Automated & Manual Analytical Adjustments and Pattern Detection; Batch Time Adjustments]

Page 35: Fraud Detection  for Israel BigThings Meetup

[Architecture diagram. Components: Clients (Swipe here!), Web App, Kafka, KStreams, Local Cache, Yarn / Mesos, Analytics Layer, SolR, and Connectors from Kafka to HDFS, NoSQL, and DWH. Flow labels: Fetching & Updating Profiles; Adjusting NRT Stats; Review of NRT Changes and Counters; Analytical Adjustments and Pattern Detection; Batch Time Adjustments]

Page 36: Fraud Detection  for Israel BigThings Meetup

[Diagram: three KStreamProcessor instances with a Local Store. Labels: Profile Updates, Model Updates, Transactions, Decisions, DWH, RedoLog]

Page 37: Fraud Detection  for Israel BigThings Meetup

NRT Processing

Page 38: Fraud Detection  for Israel BigThings Meetup

Focus on NRT First

[Architecture diagram, as on the Real-Time Event Processing Approach slide but with a Processor in place of the Flume Agents. Components: Clients (Swipe here!), Web App, Kafka, Processor, Local Cache, HBase / Memory, Spark Streaming, Hadoop Cluster I, Hadoop Cluster II (Storage: HDFS, SolR; Processing: Hive/Impala, Map/Reduce, Spark, Search), HDFSEventSink, SolR Sink. Flow labels: Fetching & Updating Profiles; Adjust NRT Statistics; Automated & Manual Review of NRT Changes and Counters; Automated & Manual Analytical Adjustments and Pattern Detection; Batch Time Adjustments]

Page 39: Fraud Detection  for Israel BigThings Meetup

Streaming Architecture – NRT Event Processing

[Diagram: Kafka (Initial Events Topic) → Kafka Consumer → Event Processing Logic, with Local Memory and an HBase Client talking to HBase → Kafka Producer → Kafka (Answer Topic)]

Able to respond within tens of milliseconds
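
The event processing logic in this picture can be sketched as follows: look the profile up in local memory first, fall back to the remote store only on a miss, decide, then write the updated profile back. The store is an in-memory stand-in for an HBase client so the sketch runs on its own; `Profile`, `ProfileStore`, `EventProcessor`, and the threshold rule are all hypothetical.

```scala
import scala.collection.mutable

// Hypothetical profile: running average and count of amounts per card.
final case class Profile(cardId: String, avgAmount: Double, count: Long)

// Stand-in for the remote store; a real deployment would wrap an HBase client.
trait ProfileStore {
  def get(cardId: String): Option[Profile]
  def put(profile: Profile): Unit
}

final class InMemoryStore extends ProfileStore {
  private val data = mutable.Map.empty[String, Profile]
  var reads = 0 // counts remote lookups, to show how often local memory is missed
  def get(cardId: String): Option[Profile] = { reads += 1; data.get(cardId) }
  def put(profile: Profile): Unit = data.update(profile.cardId, profile)
}

final class EventProcessor(store: ProfileStore, threshold: Double) {
  private val local = mutable.Map.empty[String, Profile]

  // Returns true when the amount looks anomalous against the card's running average.
  def handle(cardId: String, amount: Double): Boolean = {
    // Local memory first; the remote store is consulted only on a miss.
    val profile = local.get(cardId)
      .orElse(store.get(cardId))
      .getOrElse(Profile(cardId, amount, 0L))
    val suspicious = profile.count > 0 && amount > threshold * profile.avgAmount
    val n = profile.count + 1
    val newAvg = profile.avgAmount + (amount - profile.avgAmount) / n
    val updated = profile.copy(avgAmount = newAvg, count = n)
    local.update(cardId, updated)
    store.put(updated)
    suspicious
  }
}
```

In this sketch only the first event for a card pays for a remote read; later events for the same card are answered from local memory, which is what keeps the response time low.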

Page 40: Fraud Detection  for Israel BigThings Meetup

Partitioned NRT Event Processing

[Diagram: three Producers, each with a Partitioner, write to Partition A, Partition B, and Partition C of the Initial Events Topic in Kafka; a Kafka Consumer feeds Event Processing Logic with a Local Cache and an HBase Client talking to HBase, and a Kafka Producer writes to the Answer Topic]

• Custom Partitioner
• Better use of local memory
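
Why a custom partitioner helps local memory can be shown with a small simulation: when events are routed by card, each card's profile needs to be cached by only one processor, whereas round-robin routing spreads every card over all processors. `PartitionAffinity`, the card ids, and the hash-based routing rule are hypothetical.

```scala
object PartitionAffinity {
  // Hypothetical custom partitioner: route by card id so that
  // one processor sees all events for a given card.
  def byCard(cardId: String, numPartitions: Int): Int =
    Math.floorMod(cardId.hashCode, numPartitions)

  // For each partition, count the distinct cards its processor would have to cache.
  // route receives the card id and the event's position in the stream.
  def cachedCardsPerPartition(events: Seq[String],
                              route: (String, Int) => Int): Map[Int, Int] =
    events.zipWithIndex
      .groupBy { case (card, i) => route(card, i) }
      .map { case (p, es) => p -> es.map(_._1).distinct.size }

  // Total cache entries across all processors: (routed by card, routed round-robin).
  def compare(events: Seq[String], numPartitions: Int): (Int, Int) = {
    val keyed = cachedCardsPerPartition(events, (card, _) => byCard(card, numPartitions))
    val roundRobin = cachedCardsPerPartition(events, (_, i) => i % numPartitions)
    (keyed.values.sum, roundRobin.values.sum)
  }
}
```

With 12 events over 4 cards and 3 partitions, routing by card needs 4 cache entries in total, while round-robin needs 12 because every processor ends up caching every card.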

Page 41: Fraud Detection  for Israel BigThings Meetup

Questions?

http://confluent.io

@confluentInc
@gwenshap
