Business Intelligence and Data Mining in Banking · 2009-01-14
©2008 Gholamreza Nakhaeizadeh. All rights reserved
Professor Dr. Gholamreza Nakhaeizadeh
Business Intelligence and Data Mining in Banking
An Experience Report
Content
Part One
• An introduction to BI
• Relation between BI and Data Mining
• Why Data Mining?
• What is Data Mining?
• Data Mining Process
• Data Mining Algorithms
Part Two: Application of Data Mining in Banking
• General Aspects
• Application of Data Mining in:
- Fraud Detection
- Anti Money Laundering
- Financial Risk Management
- Customer Relationship Management
• Success Factors of Data Mining Projects
What is BI?
Some definitions (theoretical point of view)
- BI is a broad category of Management Information Systems, applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.
…
- An effective BI system provides corporations with “one version of the truth”.
…
Source: http://www.managementlogs.com/2004/09/what-is-bi-really-multiple-versions-of.html
What is BI?
Business Intelligence systems are data-driven decision support systems (DSS).
In practice, today's BI platforms are de facto just reporting systems, often based on OLAP tools.
What about the intelligence?
BI-History (1)
Business intelligence was defined in an October 1958 IBM Journal article by Hans Peter Luhn. Luhn wrote:
In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera.
The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system.
The notion of intelligence is also defined here, in a more general sense, as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
Source: http://en.wikipedia.org/wiki/Business_intelligence
BI-History (2)
Howard Dresner
The term "business intelligence" was popularized in 1989 by Howard Dresner, later an analyst at Gartner Group. He subsequently created the secondary term business performance management.
http://en.wikipedia.org/wiki/Business_intelligence
Main Components of a BI-System
Architecture:
• Sources: operational systems, flat files
• ETL (Extraction, Transformation, Loading): extraction tools, staging area, data transformation and data cleaning, loading tools
• Data warehouse
• Analysis: data mining tools, reporting and OLAP tools
Data Mining as an important process of a BI system
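The ETL chain in this architecture can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a description of any particular BI product: the flat-file contents, the cleaning rules and the table name `fact_transactions` are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Extraction source: a hypothetical flat file exported from an operational system.
FLAT_FILE = """customer_id,amount,currency
C1, 120.50 ,EUR
C2,,EUR
C3,75.00,eur
"""

def extract(text):
    """Extraction: read raw rows from the flat file into a staging area (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(staging):
    """Transformation and cleaning: trim whitespace, normalise currency codes,
    convert amounts to numbers and drop rows without an amount."""
    cleaned = []
    for row in staging:
        amount = row["amount"].strip()
        if not amount:
            continue  # simple rule for this sketch: discard incomplete records
        cleaned.append((row["customer_id"].strip(), float(amount), row["currency"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Loading: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_transactions "
        "(customer_id TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(extract(FLAT_FILE)), warehouse)
print(warehouse.execute("SELECT COUNT(*), SUM(amount) FROM fact_transactions").fetchone())  # (2, 195.5)
```

In a real BI system each stage would typically be handled by dedicated extraction, transformation and loading tools; the sketch only shows how the three stages hand data to each other.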
Why Data Mining? Own experience
In the Automotive Industry
e.g. Supplier data
The automotive industry is a "data-rich" industry
picocool.com/go/news/post/car-parts/
www.kautex-group.com/.../automotive.html
saberan.com/product.htm
e.g. Development and production data
The automotive industry is a "data-rich" industry
e.g. Customer data
The automotive industry is a "data-rich" industry
Finance data
The automotive industry is a "data-rich" industry
Focus on data-rich main business areas:
Customer Relationship Management
Quality Management
Financial Risk Management
Service Quality
Data Quality
What is Data Mining?
One of the most widely used definitions (Fayyad et al. 1996):
Knowledge Discovery in Databases (KDD) is a process that aims at finding valid, useful, novel and understandable patterns in data.
Understandable patterns: rules
Non-understandable patterns: trained artificial neural networks (ANN)
KDD and Data Mining:
- KDD originally comes from AI
- Data Mining is a part of KDD
- In practice, KDD and Data Mining are used as synonyms
Simple fictitious example: credit risk
             Income   Car   Gender   Credit risk
Customer 1   low      new   F        bad
Customer 2   middle   old   F        bad
Customer 3   middle   new   M        good
Customer 4   low      new   M        bad
Customer 5   high     new   M        good
Customer 6   high     new   F        good
Customer 7   middle   new   F        good
Customer 8   high     old   F        good
Customer 9   middle   old   M        bad
Customer 10  low      old   F        bad
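A decision-tree learner such as ID3 would choose the attribute that best separates good from bad risks by its information gain. The Python sketch below is an illustration of that criterion, not necessarily the procedure used to build the classifier in these slides; it computes the gain of each attribute on the ten customers. Income comes out clearly ahead (0.6 bits, versus roughly 0.12 for car and 0 for gender), which is consistent with income being tested first in the rules derived from this table.

```python
from collections import Counter
from math import log2

# Toy data from the slide: (income, car, gender, credit risk)
DATA = [
    ("low", "new", "F", "bad"),
    ("middle", "old", "F", "bad"),
    ("middle", "new", "M", "good"),
    ("low", "new", "M", "bad"),
    ("high", "new", "M", "good"),
    ("high", "new", "F", "good"),
    ("middle", "new", "F", "good"),
    ("high", "old", "F", "good"),
    ("middle", "old", "M", "bad"),
    ("low", "old", "F", "bad"),
]
ATTRIBUTES = {"income": 0, "car": 1, "gender": 2}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Class entropy minus the weighted class entropy after splitting on one attribute."""
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

gains = {name: information_gain(DATA, i) for name, i in ATTRIBUTES.items()}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:7s} {gain:.3f}")
```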
Classifier:
If income = high, then credit risk = good
If income = low, then credit risk = bad
If income = middle and car = new, then credit risk = good
If income = middle and car = old, then credit risk = bad
Credit risk of new customers:
• Credit risk of a new customer with high income = good
• Credit risk of a new customer who has an old car and middle income = bad
• …
This classifier can be regarded as an inductive expert system.
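The four rules can be written down directly as a small rule-based classifier. The Python sketch below (the function name `classify` is hypothetical) checks that the rules reproduce all ten labels of the training table and then applies them to new customers, as in the examples above.

```python
# Training data from the slide: (income, car, gender, credit risk)
CUSTOMERS = [
    ("low", "new", "F", "bad"),
    ("middle", "old", "F", "bad"),
    ("middle", "new", "M", "good"),
    ("low", "new", "M", "bad"),
    ("high", "new", "M", "good"),
    ("high", "new", "F", "good"),
    ("middle", "new", "F", "good"),
    ("high", "old", "F", "good"),
    ("middle", "old", "M", "bad"),
    ("low", "old", "F", "bad"),
]

def classify(income: str, car: str) -> str:
    """Apply the four induced rules; no rule needs the gender attribute."""
    if income == "high":
        return "good"
    if income == "low":
        return "bad"
    if income == "middle":
        return "good" if car == "new" else "bad"
    raise ValueError(f"unknown income level: {income!r}")

# The rules reproduce every label in the training table ...
training_accuracy = sum(
    classify(income, car) == risk for income, car, _gender, risk in CUSTOMERS
) / len(CUSTOMERS)
print(training_accuracy)  # 1.0

# ... and can be applied to new customers.
print(classify("high", "old"))    # good
print(classify("middle", "old"))  # bad
```

A perfect fit on ten training examples says little about accuracy on new customers; that is what the evaluation phase of a data mining project has to check.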
Demo: Construction of a “Credit risk Miner” (RapidMiner)
Open in workspace: CreditToy.xml and German_CreditPredict
Interdisciplinary aspects of Data Mining:
• Statistics
• Database Technology
• AI (Machine Learning)
• Visualization
• Privacy
Examples of Data Mining Tools (commercial)
• SAS Enterprise Miner
• Statistica Data Miner
• SPSS Clementine
• CART
Examples of Data Mining Tools (Public Domain, open source)
• Ian Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
• RapidMiner
Poll: Data Mining Software
What data mining tools have you used for a real project (not just for evaluation) in the past 6 months?
Commercial software:
• SPSS Clementine (74, 53 alone or with SPSS)
• Excel (61, 1 alone)
• SAS (55, 6 alone or with SAS EM)
• KXEN (32, 25 alone)
• SAS Enterprise Miner (24, 6 alone or with SAS)
• MATLAB (22, 1 alone)
• SQL Server (20, 2 alone)
• Other commercial tools (12)
• …
Free/open source software:
• RapidMiner (72, 49 alone)
• R (39, 4 alone)
• Weka (36, 4 alone)
• KNIME (30, 14 alone)
• C4.5/C5.0 (8)
• Orange (3)
• Other free tools (18)
History of Data Mining: rapid development
KDD-89: IJCAI-89 Workshop on Knowledge Discovery in Databases, August 20, 1989, Detroit, MI, USA
Dr. Gregory Piatetsky-Shapiro
Results 1 - 10 of about 15,300,000 for "data mining" [definition]. (0.21 seconds)
Rapid development of data mining
Some European-funded projects
• StatLog
• CRISP-DM
• INRECA
• MetaL
• READ
• Data Mining Grid
Conferences
• KDD
• PKDD-ECML
• SIAM Data Mining
• ICDM
• PAKDD
• ICML
• …
Journals
• ACM Transactions on KDD (new)
• IEEE Transactions on Knowledge and Data Engineering
• KDD Explorations
• Data Mining and Knowledge Discovery
• Machine Learning
• …
Data Mining Process
CRISP-DM:
- Provides an overview of the life cycle of a data mining project
- Consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
- Was partially funded by the European Commission
- The CRISP-DM process model is described in: http://www.crisp-dm.org/CRISPwP-0800.pdf
Demo: process support and data sources (RapidMiner)
Data Mining Process
CRISP-DM: Modeling
Business challenge → task identification:
• Classification, e.g. credit risk: good customer / bad customer
• Prediction
• Concept description, e.g. customer loyalty: age, income, education, …
• Dependency analysis: A and B → C
• Clustering
• Deviation detection
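A dependency of the form "A and B → C" is commonly quantified by support and confidence, as in association rule mining. The sketch below computes both on a hypothetical set of five transactions (for example, the banking products held by five customers); the data and the function names are assumptions for illustration.

```python
# Hypothetical transactions: each set lists the items (e.g. products) of one customer.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent):
    support(antecedent ∪ consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

print(support({"A", "B", "C"}, TRANSACTIONS))       # 0.4
print(confidence({"A", "B"}, {"C"}, TRANSACTIONS))  # about 0.67
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-defined thresholds instead of evaluating a single given rule.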
Supervised and unsupervised learning
Observations (tuples), attributes, target variable
Supervised Learning
Examples of supervised learning: classification, prediction
Nr.   A1    A2    A3    …   An    T
1     a11   a12   a13   …   a1n   t1
2     a21   a22   a23   …   a2n   t2
3     a31   a32   a33   …   a3n   t3
…     …     …     …     …   …     …
m     am1   am2   am3   …   amn   tm
Unsupervised Learning
Example of unsupervised learning: clustering
Nr.   A1    A2    A3    …   An    T
1     a11   a12   a13   …   a1n   t1
2     a21   a22   a23   …   a2n   t2
3     a31   a32   a33   …   a3n   t3
…     …     …     …     …   …     …
m     am1   am2   am3   …   amn   tm
(The target variable T is not used.)
Supervised and Unsupervised Learning
Supervised learning: credit risk is the target variable.
             Income   Car   Gender   Credit risk
Customer 1   low      new   F        bad
Customer 2   middle   old   F        bad
Customer 3   middle   new   M        good
Customer 4   low      new   M        bad
Customer 5   high     new   M        good
Customer 6   high     new   F        good
Customer 7   middle   new   F        good
Customer 8   high     old   F        good
Customer 9   middle   old   M        bad
Customer 10  low      old   F        bad
Unsupervised learning: the same customers, described only by their attributes (no target variable).
             Income   Car   Gender
Customer 1   low      new   F
Customer 2   middle   old   F
Customer 3   middle   new   M
Customer 4   low      new   M
Customer 5   high     new   M
Customer 6   high     new   F
Customer 7   middle   new   F
Customer 8   high     old   F
Customer 9   middle   old   M
Customer 10  low      old   F
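To make the contrast concrete, the sketch below clusters the ten customers without looking at the credit-risk column. It uses k-modes, a k-means variant for categorical attributes that measures dissimilarity as the number of mismatching attribute values; the choice of algorithm, k = 2 and the initial modes (customers 1 and 5) are assumptions for illustration, not something the slides prescribe.

```python
from collections import Counter

# Attributes only (income, car, gender); the credit-risk label is deliberately left out.
CUSTOMERS = [
    ("low", "new", "F"), ("middle", "old", "F"), ("middle", "new", "M"),
    ("low", "new", "M"), ("high", "new", "M"), ("high", "new", "F"),
    ("middle", "new", "F"), ("high", "old", "F"), ("middle", "old", "M"),
    ("low", "old", "F"),
]

def hamming(a, b):
    """Number of attributes on which two customers differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, initial_modes, max_iter=20):
    """Minimal k-modes: assign each row to the nearest mode, then recompute each
    mode as the most frequent value per attribute within its cluster."""
    modes = list(initial_modes)
    assignment = []
    for _ in range(max_iter):
        assignment = [min(range(len(modes)), key=lambda c: hamming(r, modes[c])) for r in rows]
        new_modes = []
        for c, mode in enumerate(modes):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if not members:  # keep the old mode of an empty cluster
                new_modes.append(mode)
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return assignment, modes

# Hypothetical initialisation: customers 1 and 5 serve as the two starting modes.
clusters, modes = k_modes(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[4]])
for i, (row, c) in enumerate(zip(CUSTOMERS, clusters), start=1):
    print(f"Customer {i}: {row} -> cluster {c}")
print("modes:", modes)
```

The resulting groups need not coincide with the good/bad labels: without a target variable, the algorithm can only group customers by the similarity of their attributes.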
Data Mining Algorithms
Machine Learning
• Rule-based induction
• Decision trees
• Neural networks
• Conceptual clustering
• …
Statistics
• Discriminant analysis
• Cluster analysis
• Regression analysis
• Logistic regression analysis
• …
Database Technology
• Association rules
• …
CRISP-DM: Data Preparation
Data Mining Process
Data selection
Observation reduction:
- Sampling
- Intelligent sampling
- Learn to forget
- …
Attribute reduction
(Figure: data matrix with observations 1 to 8 as rows and attributes 1 to 5 as columns)
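Observation reduction by sampling can be sketched as follows. The example contrasts simple random sampling with stratified sampling (the same fraction drawn from every class), one common way to keep a rare class represented in the reduced data. The data set of 1,000 hypothetical customers with 10 % bad risks and the function names are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(rows, n, seed=0):
    """Draw n observations uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, n)

def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw roughly the same fraction of observations from every class,
    so that a rare class is not under-represented by chance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for members in by_class.values():
        k = max(1, round(fraction * len(members)))  # keep at least one per class
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 1,000 customers, every tenth one a bad credit risk.
rows = [(f"customer_{i}", "bad" if i % 10 == 0 else "good") for i in range(1000)]

random_part = simple_random_sample(rows, 100)
stratified_part = stratified_sample(rows, label_of=lambda r: r[1], fraction=0.1)
print(Counter(r[1] for r in random_part))      # class mix depends on the random draw
print(Counter(r[1] for r in stratified_part))  # 90 good, 10 bad
```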
Demos: Data Preprocessing (RapidMiner)
Sampling: RapidMiner: German_CreditTr.aml
Use: sampling in preprocessing
CRISP-DM: Data Preparation
Data Mining Process
Data Cleaning
Dealing with:
Missing values
- Ignore the observation
- Use the attribute mean
- Predict the missing value (decision tree, regression, …)
Inaccurate data
- Use background knowledge (rules)
Duplicates
- Straße, Strasse, Str.
- Robert X, Bob X
- Professor, Prof. Dr.
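Two of these cleaning steps can be sketched in Python: replacing missing values by the attribute mean, and normalising the duplicate spellings from the slide before records are compared. The nickname table, the regular expressions and the function names are hypothetical; real de-duplication usually needs far larger lookup tables and fuzzy matching.

```python
import re
from statistics import mean

def impute_with_mean(values):
    """Missing values: replace None by the mean of the observed values of the attribute."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical nickname lookup table (Bob -> Robert).
NICKNAMES = {"bob": "robert"}

def normalize_address(text):
    """Duplicates: map "Straße", "Strasse" and "Str." to one spelling."""
    text = text.lower().replace("ß", "ss")
    return re.sub(r"str\.(?=\s|$)", "strasse", text)

def normalize_name(text):
    """Duplicates: unify academic titles and nicknames before comparing names."""
    text = re.sub(r"\bprof\.(\s*dr\.)?", "professor", text.lower())
    return " ".join(NICKNAMES.get(token, token) for token in text.split())

print(impute_with_mean([1.0, None, 3.0]))                                         # [1.0, 2.0, 3.0]
print(normalize_address("Hauptstraße 5") == normalize_address("Hauptstr. 5"))     # True
print(normalize_name("Prof. Dr. Bob X") == normalize_name("Professor Robert X"))  # True
```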
CRISP-DM: Data Preparation
Data Mining Process
Data Cleaning
Dealing with outliers
- Outlier as noise
- Outlier detection as an interesting finding
Outlier analysis methods
- Model-based outlier detection
- Using distance measures
- Density-based local outlier detection
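Of the methods listed, the distance-based approach is the easiest to sketch. The Python example below scores each observation by the Euclidean distance to its k-th nearest neighbour, so isolated observations receive high scores. The two-dimensional data, k = 2 and the function name are hypothetical; in practice the attributes would normally be scaled to comparable ranges before distances are computed.

```python
import math

def knn_outlier_scores(points, k=2):
    """Distance-based outlier score: distance from each point to its k-th
    nearest neighbour. Isolated points get large scores."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Hypothetical observations: (average amount, transactions per day).
points = [(100, 2), (110, 3), (95, 2), (105, 2), (98, 3), (900, 15)]
scores = knn_outlier_scores(points, k=2)
most_isolated = max(range(len(points)), key=scores.__getitem__)
print(most_isolated)  # 5, the observation (900, 15)
```

Whether the flagged observation is then removed as noise or investigated as an interesting finding is a business decision, not something the score itself decides.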
Demo: Outlier detection and elimination
1. Demo: Sample Preprocessing Outlier (RapidMiner)
Content
Part Two: Application of Data Mining in Banking
• General Aspects
• Application of Data Mining in:
- Fraud Detection
- Anti Money Laundering
- Financial Risk Management
- Customer Relationship Management
• Success Factors of Data Mining Projects
Why Data Mining in Banking?
Business issues
• Credit risk
• Market risk
• Controlling
• Trading
• Portfolio management
• Investment management
• CRM
• Regulations & compliance
• …
Data
• Customer data
• Portfolio data
• Interest rate data
• Regulation data
• Currency data
• …
DM applications
• Fraud detection
• Anti money laundering
• Cross-selling
• Up-selling
• Churn management
• Market forecasting
• …
Business Issues: Financial Risk Assessment
Financial risk assessment:
• Market risk assessment
• Credit risk assessment
• Country risk assessment
• Liquidity risk assessment
• …
Commercial risk
Fraud detection, anti money laundering
(Figure: country risk scale from low risk to high risk, with banking sector risk and ratings from B to CCC)
Financial Risk Management
Application of Data Mining in Fraud Detection
Fraud Detection, General Aspects
Corruption
• Bribery
• Embezzlement
• Fraud
• Extortion
• Favouritism
• Nepotism
Fraud is a criminal deception or the use of false representations to gain an unjust advantage. It covers both bribery and embezzlement.
Fight against fraud
• Prevention: can't be perfect, inconvenient, expensive
• Detection: identifying fraud as soon as it has occurred
Fraud Detection, General Aspects
Types of fraud
• Credit card fraud
• Money laundering
• Insurance fraud
• Telecommunication fraud
• Computer intrusion
• …
• Internal fraud
• External fraud
Data mining can help detection.
Important: fraud detection is a continually evolving process, because patterns of fraud are dynamic and change over time.
Fraud detection systems "are used to catch bad guys doing bad things".
Fraud Detection, Why Data Mining?
Why is data mining needed in fraud detection?
Huge volume of data, for example:
• Over 1.59 billion Visa cards in circulation
• 6,800 transactions per second (peaks)
• 20,000 member banks
• Millions of merchants
(Source: http://www.rgrossman.com/talks/grossman-iciq-07-v4.pdf)
• Performance challenge: fast and efficient algorithms
• Storage challenge: modern database technology
Data mining can help.
Fraud Detection, Importance
Extent of internal fraud
• A recent survey by KPMG Peat Marwick found that nearly 60 percent of all small business owners reported that their companies had experienced some type of internal financial fraud committed by their own employees.
• More than 75 percent of the companies surveyed had actually been the victim of employee fraud within the previous 12-month period.
Source: http://www.nfib.com/object/2991852.html
Fraud Detection, Credit Card Fraud
Extent
"Credit card fraud costs the industry about a billion dollars a year, or 7 cents out of every $100 spent on plastic. But that is down significantly from its peak about a decade ago, Sorrentino says, in large part because of powerful technology that can recognize unusual spending patterns."
July 21, 2002
Fraud Detection, IT Impacts
Impact of IT on fraud perpetration
• Internal fraud is as old as business
• Internal fraud coupled with IT savvy is a killer combination
• Since the introduction of the first commercial computer (UNIVAC, 1951), computers have been used to make the fraudster's job easier
Examples:
• Generating bogus invoices and paying them to bogus companies
• Most large organizations swap millions into overnight instruments to take advantage of the best interest rates, only to swap them back into their working accounts during the day. Skimming a piece of that transaction could be simple.
Source: http://blogs.zdnet.com/threatchaos/?p=341
Fraud Detection, Data Mining Methods
Many data mining methods can be used. Common characteristic of the data mining models used in fraud detection (FD)*:
They are based on comparing the observed data with their expected values.
Expected values can be:
• Numerical summaries of some aspect of behavior
• Simple graphical summaries showing …
• Multivariate behavior profiles based on past behavior, for example the way an account has been used in the past
* Based on: http://metalab.uniten.edu.my/~abdrahim/ntl/Statistical%20Fraud%20Detection%20A%20Review.pdf
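Comparing observed data with expected values can be illustrated for the simplest case, a numerical summary of one account's past behavior. The sketch below measures how many standard deviations a new value lies from the account's historical mean. The spending history, the alert threshold of 3 and the function name are hypothetical; real systems would typically use richer, multivariate behavior profiles.

```python
import math
from statistics import mean, stdev

def deviation_score(history, observed):
    """Compare an observed value with the value expected from the account's past
    behaviour: number of standard deviations away from the historical mean."""
    expected, spread = mean(history), stdev(history)
    if spread == 0:
        return 0.0 if observed == expected else math.copysign(math.inf, observed - expected)
    return (observed - expected) / spread

# Hypothetical daily spending of one account (the expected profile) and two new days.
history = [40.0, 55.0, 38.0, 60.0, 47.0, 52.0, 45.0]
THRESHOLD = 3.0  # hypothetical alert threshold
for amount in (58.0, 480.0):
    z = deviation_score(history, amount)
    print(amount, round(z, 2), "ALERT" if abs(z) > THRESHOLD else "ok")
```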
Fraud Detection, Data Mining Methods
Classification of the methods
• Based on supervised learning
• Based on unsupervised learning
Outlier detection:
• Modeling of a distribution of normal behavior
• Detection of the observations with the greatest deviation from this norm
Suspicion score: alerting to the fact that an observation is anomalous, i.e. more likely to be fraudulent
Fraud Detection, Suspicion Scores
Observations ordered by suspicion score:
Observation   Suspicion score
o1            s1
o2            s2
o3            s3
…             …
Given the cost of analysis, more attention should be paid to the observations with the highest scores.
• Compromise between the cost of detection and the savings reached
• Problem of fraud publicity
• Damage to the customer relationship in the case of false positives
It is difficult to find case studies together with the data used.
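The ranking idea, and the compromise between the cost of detection and the savings reached, can be sketched as below. This is one possible decision rule, stated as an assumption rather than a standard method: the suspicion score is read as an estimated fraud probability, and an observation is investigated only if score × amount at stake exceeds a fixed review cost. The `Observation` class, the amounts and the cost figure are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    obs_id: str
    score: float   # suspicion score, read here as an estimated fraud probability (assumption)
    amount: float  # money at stake if the observation is fraudulent

def review_queue(observations, review_cost):
    """Order observations by suspicion score (highest first) and keep only those
    whose expected saving (score * amount) exceeds the cost of investigating them."""
    ranked = sorted(observations, key=lambda o: o.score, reverse=True)
    return [o for o in ranked if o.score * o.amount > review_cost]

observations = [
    Observation("o1", 0.90, 1200.0),
    Observation("o2", 0.40, 150.0),
    Observation("o3", 0.75, 80.0),
    Observation("o4", 0.05, 5000.0),
]
queue = review_queue(observations, review_cost=100.0)
print([o.obs_id for o in queue])  # ['o1', 'o4']
```

Under this rule o4 is reviewed despite its low score because a lot of money is at stake, while o3 is skipped despite its high score; ranking by score alone would give the order o1, o3, o2, o4.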
Example
Source: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data miningPages: 432 - 437
Example: observing the expenditure and number of transactions over time (from t to t+n)
• Change of behavior in the group
• Change of behavior of a single observation
Case Study: REVI-MINER
REVI-MINER, a KDD Environment for Deviation Detection and Analysis of Warranty and Goodwill Cost Statements in the Automotive Industry
E. Hotz, W. Heuser, U. Grimmer, G. Nakhaeizadeh, M. Wieczorek
Source: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 432-437
Abstract
In order to get the refund of vehicle repair costs, workshops of DaimlerChrysler AG worldwide regularly submit the warranty and goodwill cost statements to the central warranty department in Germany. These statements should be examined for validity and correctness, which is a very complex task for the warranty cost controllers.
REVI-MINER is a KDD-environment which supports the detection and analysis of deviations in warranty and goodwill cost statements. The system is developed within a cooperation between DaimlerChrysler Research & Technology and the direction Global Service and Parts (GSP) and is based upon the CRISP-DM methodology as a widely accepted process model for the solution of Data Mining problems.
Furthermore we have implemented different approaches based on Machine Learning and Statistics that can be used for data cleaning in the preprocessing phase. The applied Data Mining models are developed by using a statistical deviation detection approach. The tool supports the controller within his task to audit the authorized workshops.
REVI-MINER: Own Experience
Business Understanding
Refunding of vehicle repair costs:
• Workshops worldwide regularly submit warranty and goodwill cost statements to the central warranty department in Germany
• These statements should be examined for validity and correctness
• This is a very complex task for the warranty cost controllers
Problem complexity
• Increasing complexity of the product structure:
- different vehicle business divisions (passenger cars, trucks, transporters, buses, …)
- about 150 vehicle series with several body versions and combustion types
- more than twenty production plants
• Different warranty and goodwill policies for different sales markets and repair areas
REVI-MINER
Old Audit System
The old audit system was a standard system with the following shortcomings:
• inflexible, not very purposeful and time-consuming
• the report it generated was a very complicated hard-copy table that was difficult to process manually
Business goal: developing an audit system that allows for:
• periodic auditing of workshops at shorter time intervals
• fast detection of possible abnormalities in the warranty cost statements, analysis of their trends, and identification of the workshops responsible for these trends
• avoidance of false alarms by flagging only those fraudulent activities that really justify auditing the workshops
• choice from a wide range of parameters when initiating an audit report
• visualization of the results
Case Study, REVI-MINER
Data Understanding
The available historical data on warranty and goodwill costs is part of the QUIS (QUality Information System) database, which can be considered a kind of data warehouse containing information on produced vehicles and their repairs.
Figure: data flows into QUIS. Sources: repair workshops, production plant, VEGA claim processing. Data types: vehicle data, warranty and goodwill claims, technical data, warranty claims data, commercial data, general vehicle data, parts testing data.
Case Study, REVI-MINER
Data Preparation
QUIS
Query 1: general vehicle data
• VIN (vehicle identification number)
• date of production
• motor type
• continent
• country
⇒ new vehicle series
⇒ new motor types for existing vehicle series
Query 2: data on repairs
• date of production
• date of first registration
• date of repair
• date of credit note
• VIN (vehicle identification number)
• dealer number (workshop) ⇒ repair area
• total cost
• material cost
• unit cost
• incidental cost
• …
VEGA
Query 1: workshop organization
• workshop address
• repair authorization for the different vehicle business divisions
• affiliation to special workshop subgroups
• branch offices
• workshop (dealer) number
• trade partners, representatives
Case Study, REVI-MINER
Data Preparation (continued)
Data Cleaning
To check data quality, the following approaches were developed:
• Descriptive statistics approach: the stored (historical) data was described by descriptive statistics, and these descriptions were compared with values known from the documentation or from other sources (accuracy check)
• A statistical prototype based on the normal distribution assumption (outlier detection)
• Application of GritBot, developed by Ross Quinlan (outlier detection)*
* Quinlan, R., GritBot – An informal tutorial, http://www.rulequest.com/gritbot-unix.html, 2000
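The outlier check based on the normal distribution assumption can be sketched as a simple z-score test. This is a minimal illustration, not the project's prototype: the function name `zscore_outliers`, the threshold values and the sample costs are hypothetical.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds `threshold`.

    Assumes the values are roughly normally distributed. The default
    3-sigma cut-off is a common convention, not a value from REVI-MINER.
    """
    if len(values) < 2:
        return []  # a standard deviation needs at least two observations
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical repair costs for one damage code; the last one looks suspicious.
costs = [120, 130, 125, 118, 122, 127, 119, 124, 121, 900]
print(zscore_outliers(costs, threshold=2.5))  # [9]
```

Note that with only ten values the largest z-score the sample standard deviation allows is (n - 1)/sqrt(n), about 2.85, so a 3-sigma rule could never flag anything here; small groups need a lower threshold or a robust estimate such as the median absolute deviation.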
Deviation Analysis
Case Study, REVI-MINER
Criteria chosen for deviation analysis
Discussions with the end users showed that the criteria needed to identify and analyze deviations in warranty and goodwill data should cover the main cost types (damage types):
• total cost (total number of repairs)
• labor cost (number of working hours)
• cost of repair material (number of repairs using repair material)
• cost of exchanging vehicle aggregates, e.g. gear unit, air conditioner unit, motor unit (number of repairs involving aggregates)
All criteria, by cost and damage type, must be calculated for each workshop and each damage code at the chosen level of damage-code aggregation (2-, 5- or 7-digit damage code).
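Computing the criteria per workshop and damage code at a chosen aggregation level amounts to a group-by on a prefix of the damage code. The sketch below assumes, hypothetically, that damage codes are 7-character strings whose 2- and 5-digit levels are their leading characters, and that each repair record carries the field names shown; none of these names come from QUIS.

```python
from collections import defaultdict


def aggregate_criteria(repairs, digits):
    """Sum the deviation-analysis criteria per (workshop, damage-code prefix).

    `repairs` is a list of dicts with the hypothetical keys 'workshop',
    'damage_code', 'total_cost', 'labor_cost', 'labor_hours',
    'material_cost' and 'aggregate_cost'. `digits` is 2, 5 or 7.
    """
    if digits not in (2, 5, 7):
        raise ValueError("aggregation level must be 2, 5 or 7 digits")
    result = defaultdict(lambda: defaultdict(float))
    for r in repairs:
        c = result[(r["workshop"], r["damage_code"][:digits])]
        c["total_cost"] += r["total_cost"]
        c["n_repairs"] += 1  # total number of repairs
        c["labor_cost"] += r["labor_cost"]
        c["labor_hours"] += r["labor_hours"]  # number of working hours
        if r["material_cost"] > 0:  # repair used repair material
            c["material_cost"] += r["material_cost"]
            c["n_material_repairs"] += 1
        if r["aggregate_cost"] > 0:  # repair exchanged an aggregate (gear unit, ...)
            c["aggregate_cost"] += r["aggregate_cost"]
            c["n_aggregate_repairs"] += 1
    return {key: dict(values) for key, values in result.items()}


repairs = [
    {"workshop": "W1", "damage_code": "1200001", "total_cost": 500.0,
     "labor_cost": 200.0, "labor_hours": 2.5, "material_cost": 300.0, "aggregate_cost": 0.0},
    {"workshop": "W1", "damage_code": "1299002", "total_cost": 1500.0,
     "labor_cost": 300.0, "labor_hours": 4.0, "material_cost": 0.0, "aggregate_cost": 1200.0},
]
print(aggregate_criteria(repairs, 2)[("W1", "12")]["n_repairs"])  # 2.0
```

At the 2-digit level both repairs fall under code "12"; at the 5-digit level they form two separate groups.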
Deviation Analysis, some results
Case Study, REVI-MINER
Figure: weighted absolute deviation of averages between workshop and workshop cluster for the top damage codes (y-axis from 0 to 7,000; x-axis: damage code).
Deviation Analysis, some results
Case Study, REVI-MINER
Figure: sum of weighted absolute deviations of average costs between workshop and workshop cluster for the top damage codes (y-axis from 0 to 1,200,000; x-axis: workshop number).
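The two charts can be read as the output of a calculation like the one below. The slides do not define the weighting, so this sketch assumes, hypothetically, that each damage code's absolute deviation from the cluster average is weighted by the workshop's number of repairs for that code; the per-code values correspond to the first chart and their sum per workshop to the second. Function names, damage codes and figures are illustrative only.

```python
def workshop_deviation(workshop_stats, cluster_avgs):
    """Weighted absolute deviations for one workshop.

    workshop_stats: {damage_code: (average_cost, n_repairs)} for the workshop
    cluster_avgs:   {damage_code: average_cost} for its workshop cluster
    Returns the per-code deviations and their sum.
    """
    per_code = {
        code: n * abs(avg - cluster_avgs[code])  # weight = number of repairs (assumption)
        for code, (avg, n) in workshop_stats.items()
        if code in cluster_avgs
    }
    return per_code, sum(per_code.values())


def rank_workshops(all_stats, cluster_avgs):
    """Sort workshops by their summed weighted deviation, largest first."""
    totals = {ws: workshop_deviation(stats, cluster_avgs)[1] for ws, stats in all_stats.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical averages; both workshops are assumed to belong to the same cluster.
cluster = {"33": 400.0, "54": 210.0}
stats = {
    "W1": {"33": (450.0, 10), "54": (200.0, 4)},
    "W2": {"33": (405.0, 8), "54": (212.0, 5)},
}
print(rank_workshops(stats, cluster))  # [('W1', 540.0), ('W2', 50.0)]
```

The absolute value treats unusually cheap repairs like unusually expensive ones; a one-sided variant that only counts costs above the cluster average would focus the ranking on possible overcharging.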
Deployment
Case Study, REVI-MINER
• The Data Mining tool REVI-MINER has supported the controlling efforts to detect and prevent fraudulent activities within the workshop organization.
• Its functionality covers the essential phases of a Data Mining process, and it provides a user interface with easily manageable menus based on Visual Basic forms.
• REVI-MINER provides methods for a fast, efficient and meaningful analysis of the workshops' warranty and goodwill data, giving the experts of the revision department crucial hints about possibly fraudulent activities.
Fraud Detection, Credit Card Fraud*
Extent
• Credit card fraud is an international criminal activity that is increasingly run by organized crime syndicates with industry insiders on their payrolls.
• Global losses amount to $3.8 billion.
Categories
• Counterfeit
• Card-not-present
• Lost/stolen card
• Intercepted in post
• Application fraud
• …
* Source: Fraud Magazine, Vol. 18, Issue 3, May/June 2004, pp. 26-29, 48
Fraud Detection, Credit Card Fraud
Case Study 5
Visa EU: fraud detection tool VISOR (Visa Intelligent Scoring of Risk)
The system analyses each card transaction and highlights any suspicious activity on an account, allowing the bank to take action.
• Business task: helping European banks to reduce credit card fraud
• Data Mining task: clustering and classification
• Data Mining algorithm: an (advanced) artificial neural network (ANN) scrutinises card transactions to deliver a highly accurate risk score by analysing the spending behaviour of each cardholder along with the profile of each merchant
• Data used: unknown, but related to customer spending behavior
• Technology partner: Fair Isaac
Source: http://www.out-law.com/page-4189
Fraud Detection, Credit Card Fraud
Functionality
Case Study 5
• The system works by analysing all transactions that pass through Visa's own payment processing system, VisaNet.
• For each transaction, VISOR checks:
   - the cardholder profile, including previous spending patterns
   - the merchant profile
   - up to 240 regional, country- or bank-specific fraud detection rules
• Each analysis results in a score: the higher the score, the greater the probability of fraud.
• If the score is above a threshold set by the bank, an alert is sent so that the bank can view the details. These include the risk score, amount, currency and the account's other transactions over the past week.
• The bank then decides whether the transaction is fraudulent and feeds the result back into VISOR.
Source: http://www.out-law.com/page-4189
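The alerting step described above (score, bank-defined threshold, alert details, feedback from the bank) can be sketched as follows. VISOR's own scoring model is not public and is not reproduced here: the score is simply passed in, a 0-to-1 score range is assumed, and the class, function and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    account: str
    amount: float
    currency: str
    timestamp: datetime


def build_alert(tx, score, threshold, history):
    """Return alert details if `score` is above the bank's `threshold`, else None.

    The alert carries the risk score, amount, currency and the account's
    other transactions from the week before `tx`.
    """
    if score <= threshold:
        return None
    week_before = tx.timestamp - timedelta(days=7)
    recent = [h for h in history
              if h.account == tx.account and week_before <= h.timestamp < tx.timestamp]
    return {"score": score, "amount": tx.amount, "currency": tx.currency,
            "recent_transactions": recent}


def record_decision(labels, tx, is_fraud):
    """Store the bank's verdict so it can later be fed back into model training."""
    labels.append((tx, is_fraud))
    return labels


history = [
    Transaction("acc1", 40.0, "EUR", datetime(2004, 5, 5, 12, 0)),
    Transaction("acc1", 25.0, "EUR", datetime(2004, 4, 1, 9, 0)),
]
tx = Transaction("acc1", 2500.0, "EUR", datetime(2004, 5, 10, 3, 0))
alert = build_alert(tx, score=0.93, threshold=0.8, history=history)
print(len(alert["recent_transactions"]))  # 1
```

Keeping the threshold as a parameter mirrors the slide's point that each bank sets its own cut-off, trading the number of alerts its analysts must review against missed fraud.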
Fraud Detection, Credit Card Fraud
Deployment
The system has already been piloted by:
• Barclays Bank International in Germany
• ICS in the Netherlands
• Nationwide in the UK
A full roll-out of the system is planned from January this year, with Visa's existing fraud detection system, CRIS Online 2, to be phased out by the end of March.
John Chaplin, Executive Vice President of Visa EU, said: "With growing fraud losses across Europe, fraud detection is an essential tool for any card issuer. Early pilots indicate that Members are seeing an increase of anywhere between 15 to 60 per cent in fraud detection rates, depending on the number of transactions scrutinised. These immediate short term results confirm that VISOR will be a powerful tool for our member banks to combat their exposure to fraud."
Case Study 5
Money Laundering
Application of Data Mining in Anti-Money Laundering
Money Laundering
Definition
• Money laundering generally involves a series of multiple transactions used to disguise the source of financial assets.
• Through money laundering, the criminal tries to transform the monetary proceeds of illicit activities into funds with an apparently legal source.
Source: http://www.dmreview.com/specialreports/20071002/1093412-1.html
Figure: dirty money passes through the banking system and emerges as clean money.
Money Laundering
Extent
The worldwide value of laundered funds ranges between $500 billion and $1,000 billion a year.
Main reasons for money laundering
• weak financial regulatory systems
• lax enforcement
• gaps in the information systems of financial institutions
• corruption
Source: http://www.dmreview.com/specialreports/20071002/1093412-1.html
Money Laundering
Process of Money Laundering
Phase 1, Placement: placement of currency into a financial services institution
Phase 2, Layering: movement of funds from institution to institution (Institution 1, Institution 2, …, Institution n) to hide the source and ownership of the funds
Phase 3, Integration: reinvestment of those funds in an ostensibly legitimate business
Advances in information technologies for banking and financial services help
Source: http://www.dmreview.com/specialreports/20071002/1093412-1.html
Example of the money laundering process
http://money.howstuffworks.com/money-laundering2.htm
Money Laundering
Application of Data Mining in Anti-Money Laundering (AML)
Examples of what has to be detected:
• transactions from/to uncooperative countries or exposed persons
• unusually high cash deposits
• a high level of activity on accounts that are generally little used
• withdrawal of assets shortly after they were credited to the account
• many payments from different persons to one account
• …
Source: www.aifb.uni-karlsruhe.de/AIK/veranstaltungen/aik13/presentations/kietz-dataMining.ppt
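Two of the patterns listed above can be expressed as simple rules over transaction records. This is a minimal rule-based sketch, not a Data Mining model: the tuple layouts, function names and default thresholds (10 distinct senders, a 2-day window, 90% of the credited amount) are hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def many_senders(payments, min_senders=10):
    """Flag accounts that receive payments from at least `min_senders` distinct persons.

    payments: iterable of (sender, receiver, amount, timestamp) tuples.
    """
    senders = defaultdict(set)
    for sender, receiver, _amount, _ts in payments:
        senders[receiver].add(sender)
    return {account for account, s in senders.items() if len(s) >= min_senders}


def quick_withdrawals(events, window=timedelta(days=2), min_share=0.9):
    """Flag accounts where most of a credit is withdrawn shortly after it arrives.

    events: iterable of (account, kind, amount, timestamp), kind 'credit' or 'debit'.
    A debit of at least `min_share` of an earlier credit within `window` triggers a flag.
    """
    flagged = set()
    credits = defaultdict(list)
    for account, kind, amount, ts in sorted(events, key=lambda e: e[3]):
        if kind == "credit":
            credits[account].append((amount, ts))
        elif kind == "debit":
            if any(ts - c_ts <= window and amount >= min_share * c_amount
                   for c_amount, c_ts in credits[account]):
                flagged.add(account)
    return flagged


t0 = datetime(2007, 10, 1)
events = [
    ("X", "credit", 10000.0, t0),
    ("X", "debit", 9500.0, t0 + timedelta(days=1)),   # withdrawn after one day
    ("Y", "credit", 10000.0, t0),
    ("Y", "debit", 9500.0, t0 + timedelta(days=10)),  # withdrawn much later
]
payments = [(f"p{i}", "A", 100.0, t0) for i in range(12)] + [("p1", "B", 50.0, t0)]
print(quick_withdrawals(events))  # {'X'}
print(many_senders(payments))     # {'A'}
```

Fixed rules like these are transparent but easy to evade; the Data Mining methods from the fraud detection part would instead learn such patterns, and their thresholds, from labelled cases.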
Money Laundering
Application of Data Mining in Anti-Money Laundering (AML)
Many of the DM methods discussed under “Fraud Detection” can be used in AML too.
• Data warehousing can help enforcement agencies consolidate financial transactions from multiple institutions across several countries.
• This makes the transactions easier to analyze.
Financial Risk Management
Application of Data Mining in Market Risk Management
Business Issues
Market Risk: changes of
• interest rates
• exchange rates
• stock indices
• …
Data Mining Application: forecasting and analysis of these changes
Market Risk Assessment
Market Risk, Examples
Steurer, Elmar; University of Karlsruhe: “Econometric methods and machine learning procedures for exchange rate forecasting: theoretical analysis and empirical comparison”.
Tae Horn Hann, Elmar Steurer: Much ado about nothing? Exchange rate forecasting: Neural networks vs. linear models using monthly and weekly data. Neurocomputing 10(4): 323-339 (1996)
Market Risk, Examples
Rauscher, Folke; University of Karlsruhe: “Hybrid forecasting methods for exchange rate analysis: combination possibilities of multivariate cointegration, neural networks and multi-task learning”
Market Risk, Examples
Short term prediction of the dollar exchange rate by using neural networks
Jurgen Graf, Gholamreza Nakhaeizadeh: Application of Learning Algorithms to Predicting Stock Prices. In: Plantamura, V.L. et al.: Frontier Decision Support Concepts, pp. 241ff, John Wiley, 1994
Application of Data Mining in Market Risk
Data Mining Algorithms: Supervised Learning
Continuous-valued target variable
Example: value of the interest rate: 3.5, 3.5, 4.1, 3.8, 4.2
• Regression
• Regression Trees
• ANN
• KNN
• …
Nominal-valued target variable
Example: direction of change
• Decision Trees
• Logistic Regression
• Random Forest
• ANN
• KNN
• …
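The same interest-rate series can serve either kind of target. The sketch below turns the slide's example values into nominal "direction of change" labels for a classifier and into lagged (inputs, target) pairs for a regression model; the function names and the two-value lag are illustrative choices.

```python
def change_direction(series):
    """Turn a continuous series into nominal labels describing each step."""
    labels = []
    for prev, curr in zip(series, series[1:]):
        if curr > prev:
            labels.append("up")
        elif curr < prev:
            labels.append("down")
        else:
            labels.append("unchanged")
    return labels


def lagged_pairs(series, lags=1):
    """Build supervised (inputs, target) pairs: the last `lags` values predict the next one."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]


rates = [3.5, 3.5, 4.1, 3.8, 4.2]  # example interest rates from the slide
print(change_direction(rates))  # ['unchanged', 'up', 'down', 'up']
print(lagged_pairs(rates, lags=2))
# [([3.5, 3.5], 4.1), ([3.5, 4.1], 3.8), ([4.1, 3.8], 4.2)]
```

Framing the task as direction prediction discards magnitude but is often what a trading or hedging decision actually needs.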
Market Risk, Demo: Forecasting of Interest Rate
RapidMiner: InterestRate.xml (comparison between regression and ANN)
Variables:
• Y = GDP
• CO = total personal consumption
• I = total gross private investment
• G = government purchases of goods and services
• R = interest rate
• YD = disposable income
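The RapidMiner demo itself is not reproduced here. As a much simpler, univariate sketch of how a forecasting model is evaluated, the code below fits a lag-1 linear regression R_t = a + b·R_{t-1} by least squares on a training window and compares its out-of-sample RMSE with the no-change forecast R_t = R_{t-1}, a common benchmark in rate forecasting. The function names and the toy series are hypothetical, and the demo's macroeconomic variables are not used.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b). Needs non-constant x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(actual, predicted):
    """Root mean squared error of a forecast."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def compare_with_naive(series, n_train):
    """Fit R_t = a + b*R_{t-1} on the first `n_train` values, then return the
    out-of-sample RMSE of the regression and of the no-change forecast."""
    a, b = fit_simple_ols(series[:n_train - 1], series[1:n_train])
    prev, actual = series[n_train - 1:-1], series[n_train:]
    model = [a + b * r for r in prev]
    return rmse(actual, model), rmse(actual, prev)


print(compare_with_naive([1, 2, 3, 4, 5, 6], n_train=4))  # (0.0, 1.0)
```

On a perfectly trending toy series the regression wins trivially; on real rate data the no-change benchmark is often hard to beat, which is why any model, ANN or regression, should be reported against it.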
Market Risk, Demo: Forecasting of the “Deutscher Aktienindex” (DAX)
RapidMiner: GermanStocks.xml (comparison between regression and ANN)
Description of variables:
• Interest rate
• bmw: BMW stock price
• mru: Münchner Rückv. stock price
• rwe: RWE stock price
• vow: VW stock price
• kar: Karstadt stock price
• sie: Siemens stock price
• bas: BASF stock price
• index: DAX index
• time: number of the day
Financial Risk Management
Application of Data Mining in Customer Relationship Management
Business Issue: Customer Relationship Management (CRM) in Banking
Definition: CRM consists of the processes a company uses to track and organize its contacts with its current and prospective customers.
Source:http://en.wikipedia.org/wiki/Customer_relationship_management
CRM Goals
• Customer retention and brand loyalty (it is more difficult to gain a new customer than to keep an existing one)
• Identifying potential customers
• Reducing operating costs
• Providing a 360-degree view of the customer
Business Issue: CRM in Banking
Figure: types of CRM: Collaborative CRM, Geographic CRM, Operational CRM and Analytical CRM, with Data Mining as part of Analytical CRM.
Figure: Principal Components of CRM Systems, grouped into components for operational, analytical and collaborative CRM. Components shown: Campaign Management, Sales Force Automation, ERP systems, Customer Service, Channel Management (Call Center, Mail/Fax, Web/Email, Personal Contact), Data Collection, Data Storage and Selection, Data Analysis, Data Mining.
Business Issue: Analytical CRM
Data Mining Applications
Analytical CRM
Figure: Customer Life Cycle (CLC) with the loyalty loop. The figure plots the level of CLC stages over time: awareness, consideration, formation, purchase and ownership, followed either by reconsideration and repurchase (the loyalty loop) or by outside ownership. The phases are labelled acquisition program, loyalty program and acquisition program.
Typical aCRM tasks:
• customer segmentation, cross-/up-selling analysis, customer value analysis, …
• data collection, predictive modeling, response analysis, …
• detection of defection, churn analysis, modeling for recovery, response analysis, …
Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf
Analytical CRM: Data Situation
Data Situation Along the Customer Life Cycle
Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf
Figure: ratio of external to internal data over time, as a person moves from suspect to prospect, active customer and former customer, across the acquisition, loyalty and recovery phases.
Analytical CRM in Banking: Case Study
Churn Analysis in Retail Banking: Business Understanding*
Business Problem
• Customer churn is an important issue in the highly competitive financial industry.
• Customer churn describes the number or percentage of customers who end their relationship with the bank.
* This case study is described in “Customer Churn Prediction - a case study in retail banking” by Mutanen et al. The paper can be found at: http://wortschatz.uni-leipzig.de/~macker/dmbiz06/PracticalDataMining.pdf
Analytical CRM in Banking: Case Study
Churn Analysis in Retail Banking: Business Understanding
Goals:
• Identifying the customers who are at risk of leaving the bank, together with the probability that they will leave
• Determining whether the effort to retain such customers is worthwhile
• How much is being lost because of customer churn?
• What scale of effort would be appropriate for a retention campaign?
Analytical CRM in Banking: Customer Value
Importance
Customer value analysis makes differentiation at the individual level possible; it is therefore a precondition for profitable growth.
The top 20% of customers generate 47.5% of profit (based on a database containing information on 1.6 million private customers).
Figure: cumulative share of profit by customer decile (decile 1 = the most profitable 10% of customers):
Decile: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Cumulative profit (%): 30.35 | 47.47 | 59.99 | 69.74 | 77.53 | 84.35 | 90.11 | 94.52 | 97.97 | 100.00
Source: Dirk Arndt, “value oriented customer management”, personal correspondence
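A decile curve like the one above can be computed by ranking customers by profit and accumulating each tenth's share of the total. This is a minimal sketch on hypothetical data (ten customers, so one customer per decile); the function name is illustrative.

```python
def cumulative_profit_by_decile(profits):
    """Cumulative share (%) of total profit after each customer decile.

    Customers are ranked from most to least profitable and split into ten
    groups of (nearly) equal size; decile 1 holds the most profitable 10%.
    """
    ranked = sorted(profits, reverse=True)
    n, total = len(ranked), sum(ranked)
    return [100.0 * sum(ranked[: d * n // 10]) / total for d in range(1, 11)]


# Hypothetical profits of ten customers (one customer per decile).
profits = [1, 5, 50, 3, 10, 1, 20, 4, 5, 1]
shares = cumulative_profit_by_decile(profits)
print(shares[1])  # share of profit generated by the top 20% of customers: 70.0
```

If some customers are loss-making, the curve can rise above 100% before falling back to 100% in the last deciles, which is itself a useful signal for customer value management.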
Analytical CRM in Banking: Case Study
Churn Analysis in Retail Banking: Business Understanding
The customer churn rate has a strong impact on customer lifetime value and affects:
• the length of the service relationship
• future revenues
• The probability that churn will occur is sufficient information to represent it.
• Given limited resources, the customers with the highest churn probability can be contacted first.
• In retail banking, customers stay with the bank for a very long time, but the potential loss of revenue due to churn can be quite high.
Analytical CRM in Banking: Case Study
Churn Analysis in Retail Banking: Data Understanding
• Customer database from a Finnish bank
• Data collected between December 2002 and September 2005
• Number of observations: 151,000
• 75 attributes collected, including churn as the target variable
• Data Mining task: prediction of the churn probability
Appropriate mining algorithms:
• Logistic regression (the method used)
• …
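Once a logistic regression model has been fitted, scoring and prioritization are straightforward: each customer's churn probability is computed, the highest-risk customers are contacted first within the available budget, and probability times customer value gives an estimate of the revenue at risk. The coefficients, customers, values and function names below are hypothetical; in practice the coefficients come from fitting the model to data such as the 75 attributes above.

```python
import math


def churn_probability(features, coef, intercept):
    """Logistic model: P(churn) = 1 / (1 + exp(-(intercept + coef . features)))."""
    z = intercept + sum(c * x for c, x in zip(coef, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def prioritize(customers, coef, intercept, budget):
    """Rank customers by churn probability and estimate the revenue at risk.

    customers: list of (customer_id, features, value) tuples.
    Returns the `budget` highest-risk customers and the expected loss,
    i.e. the sum of probability * value over all customers.
    """
    scored = [(cid, churn_probability(f, coef, intercept), value)
              for cid, f, value in customers]
    scored.sort(key=lambda item: item[1], reverse=True)
    expected_loss = sum(p * value for _, p, value in scored)
    return scored[:budget], expected_loss


# Hypothetical coefficients and customers.
coef, intercept = [0.8, -0.5], -1.0
customers = [("c1", [2.0, 1.0], 1000.0),
             ("c2", [0.0, 3.0], 500.0),
             ("c3", [3.0, 0.0], 2000.0)]
top, loss = prioritize(customers, coef, intercept, budget=2)
print([cid for cid, _, _ in top])  # ['c3', 'c1']
```

Ranking purely by probability follows the slide's "contact high-probability churners first"; sorting by probability times value instead would also address whether retaining a given customer is worth the effort.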
CRM: Churn Analysis, Demo
RapidMiner: Churn_xml
Comparison between DT and ANN
Analytical CRM in Banking: Case Study
Customer Satisfaction Metric
Characteristics of the Pilot
Goal: Compare the service quality of automotive financial service providers in terms of loan and lease with respect to
Content of mailing: cover letter (one page) and questionnaire (two pages)
Type of questionnaire: choice between a paper-and-pencil questionnaire and an identical online version
Time span: June - September 2002
Number of mailings: 34,198
Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf
Analytical CRM: Case Study
Source: http://www.prudsys.de/Service/Downloads/bin/DMC2003_arndt-daimlerchrysler.pdf
Attributes used:
• Human factor, e.g. knowledge of the person who set up the loan or lease
• Set-up factor, e.g. ease of getting information on lease or loan issues
• Fair-deal factor, e.g. accuracy of billing
• Convenience factor, e.g. clarity of the financing contract
Method used: regression analysis
Optimal structure of a Data Mining Team
Figure: competences of a Data Mining team: Statistics, Machine Learning, Database Technology, Visualization, Privacy. Examples of application areas: Customer Relationship Management, Quality Management, Credit Risk Management.
Success Factors of DM Applications
KDD-95 panel on Commercial KDD Applications: The "Secret" Ingredients for Success
Sunday, August 20, 1:30-2:30 pm, Palais des Congrès, Montreal, Canada
Position statements of:
- Tej Anand, AT&T GIS
- Dr. Gholamreza Nakhaeizadeh, Daimler-Benz
- Evangelos Simoudis, IBM, co-chair
- Gregory Piatetsky-Shapiro, GTE Laboratories, co-chair
- Ralphe Wiggins, statement Harvesting
- Kamran Parsaye, statement Discovery
- Mario Schkolnick, SGI
Source: http://www-aig.jpl.nasa.gov/public/kdd95/KDD95-Panels.html
Success Parameters of Data Mining Solutions
• Clearly defined goals
• Importance of the business problem
• Management attention and support
• Data availability and quality
• Competence of the Data Mining team
• Close cooperation between the Data Mining team and the end users
• Integration of the Data Mining solution into the users' daily business processes
• Other parameters (please describe briefly)
Lessons learned
• Clearly defined goals
• Importance of the business problem
• Management attention and support
• Willingness to invest for the long term
• Understanding of the power and importance of data
Think big! Slice smart! Start small!