Data Science by Chappuis Halder & Co.

CHAPPUIS HALDER & CO.

Data scienceOpportunities and challenges in the financial services industry

July 2016

CH&Co. | Data science offering

2CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.

Agenda

Data Science | What is it?1

Data science in the financial industry | Challenges and opportunities2

Data science with CH&Co. | Our convictions, what we do and what we offer3

Data science a closer look | The deep dip and some use cases4

To conclude | Our credentials…5

Appendices6


Data ScienceWhat data science is according to Chappuis Halder & Co.

Data Science in figures | Trends & OverviewFrom Google trend – 2016

0 20 40 60 80 100 120

India

Nigeria

Singapour

US

Pakistan

Philippines

Hong Kong

South Africa

Irland

UK

Australia

South Korea

Canada

New-Zeland

Malaisya

Iran

Switzerland

Netherland

Taiwan

Belgium

China

Sweden

Germany

France

Spain

Indonesia

Italy

Russia

Mexico

Poland

Brazil

Japan

Turkey

“Data science is the transformation

of data using mathematics and

statistics into valuable insights,

decision and products”John W. Foreman

Today

2006 2007 2008 2009 2010 20112012

20132014

2015

CH&Co. offices are

clearly following the

« Data science »

trend. Financial

cities play a key role

Geographical distribution of the

Data science interest

Capturing and analysing data, building predictive

models and running simulations of financial

events is complex and important but there is an

even bigger question.

The first priority and biggest challenge is to find

the question “What do our customers care

about?”

That is why the CH&Co.’s data science offer is

based on the following key streams:

Expertise in statistics is not enough.To make sense of data sets experience and

knowledge of the financial industry is key

Data does not need to be “Big”. Data

intelligence is not size dependent

1

Data science is not a magic formula.

Knowledge of how, when and in what

context to apply data science to what ends is

necessary to extract insight

2

3

Evolution of interest for

google search for the last 10

years

« Data science » is following

an exponential trend


Data ScienceLevels of maturity

2

3

1

0

Reporting

Descriptive

AnalyticsPredictive

Analytics

Integrated

AnalyticsComputer

power

Time / Sophistication of solution

Da

ta

Integrated AnalyticsWhat should I do?

De

cisi

on

Act

ion

Decision support

Decision Automation

Business Value

Descriptive

AnalyticsWhat happened?

Predictive AnalyticsWhat could happen?

Analytics Human Input

Uni / bivariate

data

Complex

multi-variate

data

Structure /

unstructured

data / Big Data

� Both predictive and prescriptive analytics support proactive

optimization of what is best in the future, based on a variety of

scenarios

� The difference between the two approaches is that predictive

analytics helps model future events, while prescriptive analysis

aims to show users how different actions will affect business

performance and point them toward the optimal choice

� Integrated Analytics extends beyond predictive analytics by

specifying both the actions necessary to achieve predicted

outcomes, and the interrelated effects of each decision

Various levels of analytics maturity can be

distinguished, depending on how much of the

decision process is automated, and how much is

carried out through human intervention


Agenda




Data science a closer look | The deep dip4


Appendices6


Data Science in the financial industry4 Drivers for a financial transformation

Customer experience Information & ReportingSupport functions

• Profitability

• Product innovation

• Branding

• Cost reduction

• Efficiency (Speed,

Op risk reduction)

• Process Automation

• Man day reduction

• Added value in analysis

• HR Talent & Governance

• Interactivity

• Real time analysis

• Visualisation

1 3 4

4 KEY DRIVERS | Where and why the financial industry is using Data science…Chappuis Halder & Co. – 2016

Raw data collectedData is

processed

Cleaned

dataset

Models &

Algorithm

Communicate &

visualize report

Exploratory

data

analysis

Data product

Make

Decision

Data Science | Process flowchart

Internal operations &

transaction processes2

Where does data science sit in the organisation and how does it bring value?


Data science is helping the financial services industry to become smarter in managing the myriad challenges it faces today.

1. Compliance: evolving and more stringent regulatory environment, increasing costs of compliance, significant risk of non-compliance

2. Profitability growth and solvency: greater volatility across asset classes, traditional retail banking product losing money, raising occurrence of fraud

incidents, integrated risk management at enterprise level

3. Competitive advantage: eroding product differentiation and customer loyalty, explosion in volume, velocity and variety of data, faster response time to

changing macroeconomic variables

Challenges requiring greater insight

Consumer behavior and

marketingRisk, fraud, and AML/KYC

Product and portfolio

optimization

Re

po

rtin

g a

nd

De

scri

pti

ve

An

aly

tics

Pre

dic

tiv

e a

nd

inte

gra

ted

An

aly

tics

•Customer Lifetime Value

•Customer profitability

dashboards

•Drill down reporting by

customer

•Campaign analytics

Data Science | Uses for Financial Services

•VaR calculations (historical /

non-parametric)

•Suspicious activity reporting

and customer risk scoring

•Account validation against

watch-lists

•Risk alerts at customer /

geography / product level

•Detailed asset level reporting

•Portfolio dashboards

•Static analysis of portfolio for

capital requirements

estimation

•Collateral analysis

•Collections delinquency

•Customer segmentation

•Channel mix modeling

•Next-best offer

•Trigger-based cross sell

•Bundled pricing

•Social media listening and

measurement

•VaR calculations (variance-

covariance and Monte Carlo)

•Behavioral PD, LGD, and EAD

modeling

•Stress-testing of economical

scenarios

•Pattern recognition and ML

•Simulations to predict default

or repayment risk

•Determining regulatory /

economical capital based on

credit portfolio

•Central limits management

� Advanced analytics offers FS the power

study customer behavior and

significantly improve marketing outcomes

without a proportionate increase in

budget

� It allows for a better understanding of

the risk dimensions faster, without

expanding the pool of human resources,

and help reduce the burden of

compliance with AML/KYC departments

� Effective use of analytics to fight fraud

helps improve profitability, reduce

payouts and legal hassles, and most

importantly, improve customer

satisfaction

� It not only helps determine asset pool

quality, but also prepayments,

delinquencies, defaults and cash flows

Illustration of Data Science in FS (Not exhaustive)

While basic reporting and descriptive analytics continues to be a must-have for banks, advanced predictive and integrated analytics

are now starting to generate powerful insights, resulting in significant business impact

Data Science in the financial industryWhat does it mean for financial services?


(Big?) Data

Segmentation

Dimensionality Reduction

Extraction of Groups

Missing Values & Outliers

Descriptive

Analysis

Predictive

Analysis

Integrated

Analysis

ID Causes and Effects

Real-time DataPattern Recognition

Data Warehouse� Region

� Subject of matter

� Product category

� Principal

Components

Analysis� K Nearest Neighbours

� Support Vector Machines

� Factor Analysis

Ad

just

Generalize

ANOVA, T test, etc.

billions

millions

thousands+

CompareDefine Training Set Define Fitness Function f(x)

� Markov Chain Monte

Carlo

� Random Forests

� Recommender Systems

� Neural Networks

� Logistics Regression

� Correlations

� Linear Regression

� Sqr. Error

� Accuracy

� Likelihood

� Probability

� Cost/Utility

� Neural Networks

� Gradient Descent

� Symbolic Artificial

Intelligence

� Agent-Based Models

� Cellular Automata

� Discrete Event Simulation

Ma

chin

e L

ea

rnin

g

thousands

Simulation

Predicting the likely future outcome of events often leveraging structured and unstructured data from a variety of sources

Examples: pattern recognition | machine learning

to predict fraud | risk alert generation at customer,

geographical, product level | trigger-based cross-

sells

Generating actionable insights on the current situation using complex and multi-variate data

Example: client segmentation | client profitability

| VaR calculation

1

Advises on possible outcomes and results in actions that are likely to maximize key business metrics. It is used in scenarios where there are too many options, variables, constraints and data points for the human mind to efficiently evaluate without assistance from technology

Examples: behavioural PD, LGD and EAD modelling, real-time offer models

1 2 32

3

| Descriptive Analysis |

| Predictive Analysis |

| Integrated Analysis |

Data Visualization1’

The ability to present and organize information intuitively which allows the detection of patterns, trends and correlation that might go undetected in text-based data

Example: heat map, 3D scatter plot, network

1’ | Data Visualization |

Data Science in the financial industryAnalytics leveraged in the banking sector


� IT infrastructure

� Storage and processing

(Hadoop, Spark)

� Data collection

Storage and

processing

Data Science

Data

governance1

Data

visualization

� Data quality

� Data Compliance

� Data management

� Strategic axis /

indicators

� Reporting format /

content

� Process / Tools / Data

� Data science is– Driven by the Big Data revolution, the emergence

of new technologies, the development of “new”

techniques and new business strategies

– An interdisciplinary field whose purpose is to

make the data speak. It can be used for prediction,

rating & discrimination, anticipation & simulation,

behavioural analysis, etc.

� The financial services industry needs to address

4 key issues:– How to extract the value of the data?

– What techniques and methodologies should be

used and for what?

– How can machine learning better support the

business strategy?

– How to organise / structure internally to address

this challenge?

1. Prediction / Anticipation

& simulation

2. Estimation

3. Ranking/Discrimination

4. Behavioural analysis

5. Self-learning models

Machine learning challenges

1CH&Co. has a dedicated offer on Data Governance. If you are interested we

would be happy to discuss this major challenge with you

“Statistics are ubiquitous in

life, and so should be statistical

reasoning.”

Alan Blinder, former Federal Reserve vice chairman,

NYTimes

Computer

scienceHacking

Maths &

Statistics

EngineeringFinancial

industry

expertise

Consulting

skills

� How CH&Co. sees a data scientist in the FS ?

A lot of different domains already exist

around data management. Not all have

the same meanings or objectives.

Chappuis Halder & Co. has recently

developed a R&D team to focus on

future needs of our clients: Data science

Data Science in the financial industryThe financial services industry is facing 4 major data science challenges


Data Science in the financial industryData science 3.0 | What is next ?

The Data science trends in brief | Future concepts & techniquesIllustration & examples (not exhaustive)

“Data science is an opportunity to

not only peak beyond the horizon,

but an opportunity to influence.”

Marcus, Chappuis Halder & Co.

The tendency, which seems to be converging

towards a somewhat standardised, cross-

industry risk framework, is forcing financial

industry in the direction of automated straight

through processes and intraday business

monitoring.

According to us, the emerging trend for

treatment in banks is based upon four drivers:

1. Regulation - Causing convergence across the

industry with a narrowing freedom of interpretation and

back stop models that are becoming the new standard

2. Transparency – Shareholders, Stakeholders and

customers all demand clear visibility. Clear and ordered

data across the institution that generates coherent

reports

3. Risk/Reward – Better understanding and

management of risks; capturing, modelling, monitoring

and optimizing all risk types

4. Cost reduction – New technology and pressure to

reduce costs drive automated solution and replacement

of manual interference wherever possible

A Algo Hedging – How AI and ML can influence or determine hedging strategy

New ways of capturing risks – Supply Chain Finance and Complexity Networks

The old oldest trick in the book… to get off the books

The full picture with the technology and techniques of tomorrow

Machine learning and Artificial Intelligence to increase effectiveness & efficiency

• Example : A complex portfolio can be hedged in many ways. Here an algorithm using sensitivities and a library

of hedging products would be able to construct alternative and improved ways to hedge a portfolio. Ways that

are not intuitive to a trader and may be more cost effective. By combining the hedging tool with Machine

Learning (ML) technics calibrated on past data, alerts for optimal hedge at optimal time could be generated, as

could recommendations for switching to new hedging strategies

• Example : Embracing new methodologies to capture and view risk. For example consider Supply Chain Finance

(SCF), this could be viewed as a closed system from a credit risk perspective. Which would open up new ways for

raising capital, engaging with clients and offer services. Similarly one could use complexity networks to model

market interactions and improve the understanding of various factors to enable impact analysis.

• Example : Transferring risk through new products such as Credit Suisse’s bond issue earlier this year. Could this

be taken one step further and be tranched against the proposed buckets of operational risk losses in proposed

in BIS new operational risk framework?

• Example : It is an overwhelming task to find the hedges or best collateral solution for a complex portfolio. Even

more so to understand the future margin requirements and align this with other liquidity strains. To bring the

full picture of outflows and inflows, risk and capital requirements an integrated system is required. What would

the dashboard, an insightful overview with alerts look like? How can this be achieved? What can be monitored in

real-time vs time-slicing?

• Example : Leveraging new technology and methods by developing machine learning for back and stress testing

(automation). This would reduce the cost of resources and free up quants to analyze and improve models.

B

C

D

E


Agenda






Appendices6


Data Science with CH&Co.Our convictions

• The most powerful thing is the ability to find the right question not the

right answer

• Data does not need to be big to express science

• Data science should be used to help see and to reveal things that is not

intuitive or easily realised

• Data science is not new. What is new is the perspective that this science

offers the financial industry

• Data will be increasingly unstructured (Figures, Pictures, sound, files,

internal, external …)

“You can have data without

information, but you cannot

have information without

data.”Daniel Keys Moran

Our convictions | Based on our experiences From CH&Co. – 2016

Convictions

Market

Discussions

Uncertainty

Capturing and analysing data, building predictive models and running simulations of

financial events is complex and important but there is an even bigger question.

The first priority and biggest challenge is to find the question “What do our customers care

about?”. This is why the CH&Co.’s data science offer is centred around the following pillars:

1

2

3

4

5


Data Science with CH&Co.Our Knowledge | The holy grail

� � � = � � � . �(�)�(�)

�� = ��(�|�)�(�) = ��

�

��

� � = �� + �. � ��

��

�(�) = �� + �

��

� �� , �� = (�� − ��)"+(#� − #�)"$

Naive Bayes Perceptron1 Linear Regression2

K Nearest Neighbor Neural Network PCA

Support Vector Machine Backpropagation Gradient descent

%��&�'&�(� = Engeinvalue �� … ��

�� = �� − �̅

�(�) = %��&�'&�(�4 �� … ��

3

4 5 6

∆��(�) = �67�� + 8∆��(� − 1) :� = :� − 8 � ℎ(��) − # . ��

��

�(�) = �� &��ℎ(. #. �(�� · ��)

�(�� · ��) =(��=��)"+(#� − #�)"$

��>(ℎ

$

# = 1 ? # = −1

@>>� ��(� = A� �(�|�)1 − �(�|�)

��(# = 1) = 11 + &=B(∑ DEFEGHIEJK

7 8 9

Logit Regression

TOP 10 FORMULAS | What the Financial industry is recurrently using …From Rubens Zimbres – 2016“There are two kinds of

statistics, the kind you look up

and the kind you make up.”

Rex Stout, Death of a Dox

10

Everyone at CH&Co. are believers in

science. This is why we invest in a

dedicated research team that drives

new initiatives and explores new

techniques and areas of application.

Practitioners of statistics are well aware

that

- A handful of techniques, approaches

and formulas provides solutions for

99% of your problems

- Choosing the right formula is crucial

and based on experience only

Just like other data experts, CH&Co.

keeps developing expertise and

knowledge by applying and testing

recurrent statistical equations


Data Science with CH&Co.What Chappuis Halder & Co. offers its clients

Data strategy

Data structuration

Data exploration

Data mining

Data visualization

Proof of Concept

� Formulation (find the question)

� Simulation (impact measurement)

� Benchmarking (who does what & how)

� Market study

� PMO on Data project

� Localisation (where is the data)

� Collection

� Cleaning & analysis (quality check)

� Data correction

� Data description (descriptive statistics)

� Data correlation & interest analysis (PCA …)

� Modelling

� Forecasting

� Stress testing

OUR OFFER| What CH&Co. is ready to do…Details of our capabilities – 2016

� Dashboard design

� Dynamic reporting 2.0 5dynamic, OCR …)

Our tools (not exhaustive)

Our team of expert does not only bring

techniques to our clients but also the

essential key ingredients:

Knowledge of the financial industry

Creativity (as we develop our own and

some of our clients real PoC)

1

Expertise in different areas around data

science (Digital, FinTech observatory,

data governance, regulatory ...)

2

3

“The goal is to turn data into

information, and information

into insight.”

Carly Fiorina, former CEO, Hewlett-Packard Co.

Speech given at Oracle OpenWorld


Agenda




Data science a closer look | The deep dip and some use cases4


Appendices6


What is it? Techniques used Concrete examples

Self-learning

models

Prediction,

anticipation &

simulation

Ranking /

Discrimination

� Creating various homogenous

classes making the ranking of

individuals possible

Behavioural

analysis

� Interpreting and predicting

behaviours using statistical data and

text/sound/image mining

� Implementing models that

automatically teach themselves how

to optimise their parameters from

available data

� Time series

� Artificial Neural Network

� Regressions

� Unsupervised / supervised

Classification

� PCA / MCA / FCA

� K-means / Neural Networks /

Random Forest / SVM

� Text mining using sophisticated

machine learning algorithms

� Descriptive statistics

� Computational statistics

� Mathematical optimisation

� Predict future value of a stock

� Estimate a variable of interest

for new people

� Detect risk periods

� Homogeneous risk class in

Credit risk (PD/LGD)

� Risk segmentation for

insurance products

� Behavioural analysis from the

emails database of a company

� Human resources digitalization

� Client targeting

� Classification (SVM, clustering,

logistic regression, k-means,

PCA, etc.)

Estimation

� Estimate the value of a variable of

interest based on explanatory

variables

� Regression models

� Classification models

� Optimization of products offers

� Estimation of credit interest

rate

� Modelling of a variable from existing

data, enabling its prediction &

anticipation according to several

scenarios

A

B

C

D

E

From data science to quantitative techniques Data science & machine learning can be mapped across 5 dimensions


Summary

Based on the historical values of a random process, machine learning

models are used in order to extract its trend and variations

Step 1: Model Calibration

� The variable of interest can depend on past values (time series) or on

external variables. In both cases, coefficients are calibrated on a learning

base and test out of sample.

� For a time series, the value of the process depends on itself and can

thus be extrapolated without external information

Step 2: Confidence intervals of predicted values

� Error of prediction observed on the sample can be used to define

confidence intervals of the predicted values

Step 3: Auto adjustments over time

� Error of prediction can be used to adjust further predictions

Step 4: Simulation

� To complete the predictions, adverse scenarios can be simulated based

on stress methods on macroeconomics indicators, extreme movements

of interest rate, of the stock market, etc.

Description

� Predict future values of a random process like default rates or recovery rates in

retail banking or even sales volumes in marketing

� Anticipate high risk periods such as increase of default, losses or a fall in retail

sales

� Simulate adverse scenarios and measure of those impacts on required capital

or on budget forecasts

� Predict future values of a random process

� Complete the prediction with simulations of adverse

scenarios

Objectives

� Risk capital management

� Budget forecastingScope

� Time series

� Artificial Neural NetworkTechniques

� LGD Backtesting (CH&Co)

� DP Stress testing (CH&Co)Examples

Outcome

90% Confidence interval of predictions

5% probability

Predicted values

5% probabilityHistorical values

Prediction period

Sample use case Prediction, anticipation & simulationA


Summary

� Step 1: What do we want to explain?

The first step is to define the variable that is not observable and that we

wish to determinate, the variable of interest, from other variables, the

explanatory ones

� Step 2: Model calibration

a second step, the data base is segmented in 2 subbases in order to

control the consistency of the model. A learning base that is used to

calibrate the coefficients of the model and a test subbase to check if the

model is efficient on data that were not used to calibrate the model. It

can be used to define other parameters of the model by choosing the

ones that minimize the error one the test subbase

� Step 3: Estimation in a new sample

The third step consists in using the model fitted on the entire database

in order to estimate the variable of interest of new individuals based on

their explanatory variables

Description

� Insurance policies underwriting choices made the historical clients permit to be

more accurate in targeting new clients

� Insurance can optimize the product offers for new clients according to their

characteristics

� Similarly, in retail banking, it allows to affect to a new credit a probability of

default or an estimation of the average loss when a default occurs on this type

of credit

Outcome

Objectives

� Credit Risk Management

� Insurance policies proposalsScope

� Regression models

� Classification modelsTechniques

� Optimization of products offers (CH&Co)

� Estimation of credit interest rate for a new client

� Estimation of PD/LGD of new credits (CH&Co)

Examples

Learning subbase Test subbase

Coefficients’ calibration

Error of

Estimation

Full data baseNew

individuals

Coefficients’ calibration

1

� Estimate a variable of interest of new individuals

according to their characteristics

2

General methodology of model calibration and new estimation

Sample use caseEstimation: Better estimateB


Summary

� Supervised classification:

– Input data have a know label, 0 and 1 for instance. The aim is to

assign each individual to the class he belongs

– The coefficients and the parameters of the model are defined on the

learning base and tested out of sample

– The model kept is the one that have the best accuracy rate

� Unsupervised classification:

– Input data are not labelled, there are no groups defined

– The classes are created so that the inter-class variance is minimised

and the intra-class variance is maximised

� Component Analysis:

– Data dimension reduction based on a linear combination of the

variables that explain most of the variance

– Discrimination of the individuals based on their representation with

the variables selected in the PCA process

3 main types of techniques used

� Fraud detection can be based on artificial neural networks that will assign

probabilities of fraud and thus a level of risk according to the clients

characteristics

� Ranking credits permits to affect them to a risk class with a corresponding

probability of default and a loss given default rate

Outcome

� Create homogeneous classes

� Rank individualsObjectives

� Credit Risk profiles

� Insurance policies proposals

� Marketing clients’ segmentation

Scope

� Supervised classification: Logit, Neural Networks

� Unsupervised classification: HAC / HDC / K-means

� Component analysis models: PCA / MCA / FCA

Techniques

� Homogeneous Risk Classes in Credit Risk (PD/LGD)

� Risk Segmentation for insurance products

� Fraud detection

Examples

4

3

2

1 High risk

Medium risk 2

Medium risk 1

Low risk

…

X1

X2

Xn

Classification with an Artificial Neural Network

Inputs

: C

lient/C

redit

chara

cte

ristics

Hidden layers: data transformation

Outputs

Sample use caseRanking / DiscriminationC


Summary Description

Outcome

� Making sense of many different kinds of data

� Getting knowledge from diversityObjectives

� Organization optimization

� Client behavioural analysis (retail/CIB)Scope

Techniques

� Detect the service or product preference of a client

� Detect a bottleneck in a work organization Examples

� Data mining

� Statistical processing

Input- Data base Information extractedOutput-Behaviours

Detection

Statistical processing

and data interpretation

Crossing of the computed

statistics and interpreted data

The behavioural analysis or behavioural detection is the crossing of various

types of data that enables to characterize and then detect a behaviour.

There is no unique way to perform behavioural analysis. It all depends on

the nature of the data we are dealing with.

It can however be summarized in 3 steps

1. Organizing and pre-processing the data composing the available

database

2. Extract the information from the data by making a statistical processing

and interpretation

3. Detect behaviours by crossing the statistics and interpreted data

� Making the data speak by extracting many kinds of information

� Interpret these data in a new way as experienced in the HighWayToMail

CH&Co project

� Taking advantage of this new approach and improve the client behavioural

analysis or optimise an organization

DB

Sample use case Behavioural analysis: make the data speakD


Summary

Step 1: Definition of the actions and the targeted result

� Modelling of the transition from an initial situation to a gain or loss

position based on the actions set up during the length of the algorithmic

process

Step 2: Initial probability of each action

� During the initialization, the probabilities of occurrence of each action

for a selected situation are equal

Step 3: Automatic learning

� Each time, the achievement of the final situation is defined as a gain or

loss result and a list of specific actions

� The results are automatically integrated in the probabilities of each

actions.

� The several recurrences of the algorithm allows a dynamic refining of

the actions’ probabilities based on the maximization of the chances to

obtain a gain in the final situation

Description

� Asset management models’ dynamic choice based on what is defined as a gain

(high return, low volatility etc.) and the levels of past return, volatility,

macroeconomics indicators that represent the different situations

� Automatic recalibration of model coefficients

Outcome

� Create a model that learns by itself each time it

experiments a situationObjectives

� Asset management allocation strategy

� Risk managementScope

� Algorithms

� Bayesian methodsTechniques

� Automatic adjustment in time of regression or

classification models

� Error correction model (ECM)

Examples

Action 1 Probability= 1/3






Initial

Situation

Situation 1

Gain

Loss

Situation 2

Situation 3

Situation 3

Example of an iteration

If action 1 in

the third

situation led

to a final gain

Sample use caseSelf-learning models E


Agenda






Appendices6


� Products and services real

time offering

� Churn detection

� HR Planning

What the banker / the insurer is looking for?

Business application CH&Co Credentials Other examples

Self-learning models

� Improve process, including decision process

(time, quality, information, etc)

� Automatize task / process

� Real time insurance pricing

� Automatic asset management

re-allocation

Automatic and dynamic

models backtesting

4

Behavioural analysis

� Understand your business and your

customers

� Develop new products or services

Proof of Concept “Highway to

mail”

3

Prediction /

Anticipation &

simulation

� Improve profitability

� Assess & monitor

� Predict and anticipate

� Expected return

� Natural catastrophes

prediction

� Terrorism anticipation

Improving and optimizing the

stress-testing exercise 5

Estimation

� Improve the quality of your services

� Improve profitability

� Optimal pricing of new

products

� Credit interest rate valuation

Marketing intelligence2

Scoring / Risk Estimation6

Client segmentationRanking /

Discrimination

� Optimize your costs

� Optimize your risk

� Develop customized existing products or

services

� Fraud detection

� Credit granting choice1

Scoring / Risk estimation6

Marketing intelligence2


Credentials 1. Client segmentation

Summary

� Establishment of segmentation and scoring

methodology to optimize the commercial

approach and reduce costs

Objectives

� Marketing clients’ segmentation

� Credit risk profilesScope

� Scoring

� ClassificationTechniques

� Context

– Growing tensions on scare resources allocations and costs

– Increase willingness to optimize the commercial techniques to

maintain profitability

– A clear client segmentation is a strategic topic for the management

and a key success factor in the transversal initiatives led by the CIB

� Approach

– Validation of the perimeter focus : client segmentation on risk, cross-

sell, growth, scarce resources and profitability

– Data collection, filtering and selection of strategic variables to build

ratios. Variables analysis and profiling : statistical description,

abnormalities detection, correlation studies and variables weighting

– Classification of clients

– Analysis of the clients’ clusters and identification of pockets of

opportunities

Case description

-8

-6

-4

-2

0

2

4

6

8

10

12

-8 -6 -4 -2 0 2 4 6 8 10

Groupe 3Groupe 1

Groupe 2

Outcomes

Revenue generation

Operating cost optimisation

Risk optimisation

Time Optimisation

Quality optimisation

Illustration


Credentials 2. Marketing intelligence

Summary

� Exploitation of customers databases in order to refine

the insurance offer of products and complementary

options according to clients’ profilesObjectives


� Fraud detection

� Credit risk

Scope

� Logistic regression

� Neural network

� Random forest

Techniques

Outcomes

� Context

– Massive clients databases are not fully exploited

– The use of a machine learning model enables to offer additional options and

products depending on the client’s characteristics

– A good modelling of the customers’ specification allows an efficient

targeting of the offer and a high probability of complementary options

underwriting

� Approach

– Segmentation of the data base in 2 subbases : one for the learning and the

second one to test the model

– Initialization of coefficients and calibration of parameters through a

machine learning model in the learning base to optimize the predictive

power on the test base

– Extension of the model on the entire base

– Exploitation of the data with different machine learning models

– Selection of the best model through cross-validation

– Focus on the three products or additional options the most likely to be

underwritten

Case description

Revenue generation


Risk optimisation

Time Optimisation


Machine learning

model

P1

P2

P3P3

P4P4

P5

85%85%

72%72%

68%68%

5%5%

92%92%

X1X1

X2X2

X3X3

X4X4

Client profileUnderwriting probability for each

products/additional options

Illustration


Credentials 3. Highway to mail: how to detect behaviours from emails exchanges.

Summary

� Optimizing companies organization

� Improving clients (retail/CIB) targetingObjectives

� All kinds of enterprisesScope

� Descriptive statistics

� Text mining using machine learning technicsTechniques

Illustration

Outcomes

� Context

– Everywhere, emails exchanges have become the main way to

communicate internally as well as externally

– The nature of these exchanges give a lot of information on how

people behave and lead to optimize the processes

� Approach

– We are working with databases of emails. Our approach is based on

both a statistical analysis of the emails and a text mining of their

subjects

– The computed statistics answer the questions: “how many emails are

exchanged on Mondays? Between 8 and 12 am”, Who are M. Y 10

main contacts?”, etc.

– Text mining (opinion/sentiment analysis) enables to interpret the

emails subjects

– Finally, crossing both kinds of data will enable to characterize and

then detect behaviours

Case description


Risk optimisation Time Optimisation


Urgent

MeetingContract

Private

-7

-6

-5

-4

-3

-2

-1

0

1

2

3

4

-3 -2 -1 0 1 2 3 4

Invitation

Presentation

Deal


Credentials 4. Automatic and dynamic models backtesting

Summary

� Automatization of backtesting models, calibration of

alerts’ thresholds, identification of poor performances

and actions to takeObjectives

� Credit Risk

� Asset managementScope

� Scoring

� Classification


Techniques

Outcomes

� Context

– Backtesting of banking models is time-consuming and extremely

resource-intensive

– Targeting the causes of poor performances allows to automatize

actions to take

� Approach

– A first layer of tests is implemented (Gini, Stability, Conservatism

etc.).

– According to the results, several layers of tests are built in order to

identify the causes of model’s failures

– Actions to take are automatically established

– Thresholds are dynamically calibrated

Case description

Revenue generation


Risk optimisation

Time Optimisation


Test 1

Test 2

Test 3

No alert

Minor alert

type 1

Major alert

type 1

Minor alert

type 2

Major alert

type 2

No action

Action 1

Action 2

Action 3

Action 4


Credentials 5. Improving and optimizing the stress-testing exercise

Summary

� Improving the modelling of the probability of

default (PD)

� Improving risk management

Objectives

� Credit Risk

� Risk managementScope

� Neuronal networks

� ClassificationTechniques

Outcomes

� Context

– Stress tests are required by financial institutions using internal

rating-based approaches for credit risk to assess the robustness of

their internal capital assessments

– Stress tests use very unstable models and many validation tests that

are time-consuming

� Approach

– Focusing on robust Default Rate modelling using machine learning

algorithms

– Important statistical properties that are to be verified by the model

are directly included into the model optimization

– Variables to take into account are automatically chosen

– Models thresholds are dynamically calibrated

Case description

Time optimization

Cost optimisation

Risk optimization

Quality Optimization

Projection du taux de défaut du protefeuille X

selon des scénarios central et adverse

Adverse Central


Credentials 6. Scoring Risk Factor Estimation

Summary

� Segment credits into homogeneous risk classes

in order to affect a probability of default or a loss

given default rate to each credit

Objectives

� Credit Risk

� Risk capital management


Scope

� Scoring

� Classification


Techniques

Outcomes

� Context

– To estimate the probability of default or the loss given default rate of

a portfolio, it has to be well segmented in order to get the best

estimations of the Basel parameters in each group

� Approach

– A logistic regression is calibrated on historical data

– It provides each individual with a score value

– The range of values is segmented in several layers

– According to the score value, each individuals is affected to a risk

class (a layer)

– The segmentation is made so that the individuals of each class have a

similar risk profile and the profiles of individuals of different classes

are significantly different

– In each homogeneous risk class, the Basel parameters are then

estimated for the entire group

Case description

Revenue generation


Risk optimisation

Time Optimisation


Definition of a new credit score and affectation to an HRC

X1X2

X3X4

X5 X6

Logistic

regression

Credit Score value

Credit characteristics

HRC 2

HRC 3

HRC 4

HRC 1

Sco

re v

alu

e

0

30CHAPPUIS HALDER & CO.GRA – Data Science Offer – March 2016 Strictly confidential - © Chappuis Halder & Co.

Agenda






Appendices6


Machine LearningOverview

Linear Regression

Support Vector Machines

K Nearest Neighbor

Principal Components

Analysis

PerceptronNaïve Bayes

Backpropagation Gradient Descent

Neural Networks

Logistic Regression

� � � = � � � . �(�)�(�)

�� = ��(�|�)

@>>� ��(� = A� �(�|�)1 − �(�|�)

��(# = 1) = 11 + &=B(∑ DEFEGHIEJK

�(�) = ��

��

� � = �� + �. � ��

��

�� = �� − �̅%��&�'&�(� = Engeinvalue �� … ��

�(�) = %��&�'&�(�4 �� … ��

∆��(�) = �67�� + 8∆��(� − 1) :� = :� − 8 � ℎ(��) − # . ��

��

�(�) = �� &��ℎ(. #. �(�� · ��)

�(�� · ��) =(��=��)"+(#� − #�)"$

��>(ℎ

$

�&��ℎ → MN = 0# = 1 ? # = −1

�(�) = �� + �

�� , �� = (�� − ��)"+(#� − #�)"$

assumes that the presence of a

particular feature in a class is

unrelated to the presence of any

other feature

A network of neurons in which

the output(s) of some neurons

are connected through weighted

connections to the input(s) of

other neurons

used to describe data and to

explain the relationship between

one dependent (e.g. age)

variable and one or more

independent variables (e.g.

income)

classify an unknown example

with the most common class

among k closest examples (e.g.

“tell me who your neighbors are,

and I’ll tell you who you are”)

composed of a large number of

highly interconnected processing

elements (neurones) working in

unison to solve specific problems

(like the human brain)

a technique used to emphasize

variation and bring out strong

patterns in a dataset. It's often

used to make data easy to

explore and visualize

a supervised machine learning

algorithm which can be used for

both classification or regression

challenges

A popular algorithm used to

optimize neural networks by

starting with an initial set of

parameter values and

iteratively moving toward a

set of parameter values that

minimize the function

a statistical method for analyzing

a dataset in which there are one

or more independent variables

that determine an outcome

A common method of training a

neural net in which the initial

system output is compared to

the desired output, and the

system is adjusted until the

difference between the two is

minimized

CHAPPUIS HALDER & CO.

MONTREAL

1501 McGill College

avenue – Suite 2920

Montreal H3A 3MB,

Quebec

PARIS

20, rue de la Michodière

75002, Paris, France

NIORT

19 avenue Bujault

79000 Niort, France

NEW YORK

1441, Broadway

Suite 3015, New York

NY 10018, USA

SINGAPORE

60 Tras Street,

#03-01

Singapore 078999

HONG KONG

1205-06, 12/F,

Kinwick Centre

32 Hollywood Road,

Central, Hong Kong

LONDON

50 Great Portland Street

London W1W 7ND, UK

GENEVA

Rue de Lausanne 80

CH 1202 Genève, Suisse

[email protected]

Benoit Genest| Partner | London

[email protected]

Ziad Fares | Head of R&D | Paris

[email protected]

Patrick Bucquet | Partner | New [email protected]

CONTACTS

Data Science by Chappuis Halder & Co.

Data & Analytics

Transcript of Data Science by Chappuis Halder & Co.