Data Science by Chappuis Halder & Co.
-
Author
genest-benoit -
Category
Data & Analytics
-
view
566 -
download
1
Embed Size (px)
Transcript of Data Science by Chappuis Halder & Co.

CHAPPUIS HALDER & CO.
Data scienceOpportunities and challenges in the financial services industry
July 2016
CH&Co. | Data science offering

2CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Agenda
Data Science | What is it?1
Data science in the financial industry | Challenges and opportunities2
Data science with CH&Co. | Our convictions, what we do and what we offer3
Data science a closer look | The deep dip and some use cases4
To conclude | Our credentials…5
Appendices6

3CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data ScienceWhat data science is according to Chappuis Halder & Co.
Data Science in figures | Trends & OverviewFrom Google trend – 2016
0 20 40 60 80 100 120
India
Nigeria
Singapour
US
Pakistan
Philippines
Hong Kong
South Africa
Irland
UK
Australia
South Korea
Canada
New-Zeland
Malaisya
Iran
Switzerland
Netherland
Taiwan
Belgium
China
Sweden
Germany
France
Spain
Indonesia
Italy
Russia
Mexico
Poland
Brazil
Japan
Turkey
“Data science is the transformation
of data using mathematics and
statistics into valuable insights,
decision and products”John W. Foreman
Today
2006 2007 2008 2009 2010 20112012
20132014
2015
CH&Co. offices are
clearly following the
« Data science »
trend. Financial
cities play a key role
Geographical distribution of the
Data science interest
Capturing and analysing data, building predictive
models and running simulations of financial
events is complex and important but there is an
even bigger question.
The first priority and biggest challenge is to find
the question “What do our customers care
about?”
That is why the CH&Co.’s data science offer is
based on the following key streams:
Expertise in statistics is not enough.To make sense of data sets experience and
knowledge of the financial industry is key
Data does not need to be “Big”. Data
intelligence is not size dependent
1
Data science is not a magic formula.
Knowledge of how, when and in what
context to apply data science to what ends is
necessary to extract insight
2
3
Evolution of interest for
google search for the last 10
years
« Data science » is following
an exponential trend

4CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data ScienceLevels of maturity
2
3
1
0
Reporting
Descriptive
AnalyticsPredictive
Analytics
Integrated
AnalyticsComputer
power
Time / Sophistication of solution
Da
ta
Integrated AnalyticsWhat should I do?
De
cisi
on
Act
ion
Decision support
Decision Automation
Business Value
Descriptive
AnalyticsWhat happened?
Predictive AnalyticsWhat could happen?
Analytics Human Input
Uni / bivariate
data
Complex
multi-variate
data
Structure /
unstructured
data / Big Data
� Both predictive and prescriptive analytics support proactive
optimization of what is best in the future, based on a variety of
scenarios
� The difference between the two approaches is that predictive
analytics helps model future events, while prescriptive analysis
aims to show users how different actions will affect business
performance and point them toward the optimal choice
� Integrated Analytics extends beyond predictive analytics by
specifying both the actions necessary to achieve predicted
outcomes, and the interrelated effects of each decision
Various levels of analytics maturity can be
distinguished, depending on how much of the
decision process is automated, and how much is
carried out through human intervention

5CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Agenda
Data Science | What is it?1
Data science in the financial industry | Challenges and opportunities2
Data science with CH&Co. | Our convictions, what we do and what we offer3
Data science a closer look | The deep dip4
To conclude | Our credentials…5
Appendices6

6CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data Science in the financial industry4 Drivers for a financial transformation
Customer experience Information & ReportingSupport functions
• Profitability
• Product innovation
• Branding
• Cost reduction
• Efficiency (Speed,
Op risk reduction)
• Process Automation
• Man day reduction
• Added value in analysis
• HR Talent & Governance
• Interactivity
• Real time analysis
• Visualisation
1 3 4
4 KEY DRIVERS | Where and why the financial industry is using Data science…Chappuis Halder & Co. – 2016
Raw data collectedData is
processed
Cleaned
dataset
Models &
Algorithm
Communicate &
visualize report
Exploratory
data
analysis
Data product
Make
Decision
Data Science | Process flowchart
Internal operations &
transaction processes2
Where does data science sit in the organisation and how does it bring value?

7CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data science is helping the financial services industry to become smarter in managing the myriad challenges it faces today.
1. Compliance: evolving and more stringent regulatory environment, increasing costs of compliance, significant risk of non-compliance
2. Profitability growth and solvency: greater volatility across asset classes, traditional retail banking product losing money, raising occurrence of fraud
incidents, integrated risk management at enterprise level
3. Competitive advantage: eroding product differentiation and customer loyalty, explosion in volume, velocity and variety of data, faster response time to
changing macroeconomic variables
Challenges requiring greater insight
Consumer behavior and
marketingRisk, fraud, and AML/KYC
Product and portfolio
optimization
Re
po
rtin
g a
nd
De
scri
pti
ve
An
aly
tics
Pre
dic
tiv
e a
nd
inte
gra
ted
An
aly
tics
•Customer Lifetime Value
•Customer profitability
dashboards
•Drill down reporting by
customer
•Campaign analytics
Data Science | Uses for Financial Services
•VaR calculations (historical /
non-parametric)
•Suspicious activity reporting
and customer risk scoring
•Account validation against
watch-lists
•Risk alerts at customer /
geography / product level
•Detailed asset level reporting
•Portfolio dashboards
•Static analysis of portfolio for
capital requirements
estimation
•Collateral analysis
•Collections delinquency
•Customer segmentation
•Channel mix modeling
•Next-best offer
•Trigger-based cross sell
•Bundled pricing
•Social media listening and
measurement
•VaR calculations (variance-
covariance and Monte Carlo)
•Behavioral PD, LGD, and EAD
modeling
•Stress-testing of economical
scenarios
•Pattern recognition and ML
•Simulations to predict default
or repayment risk
•Determining regulatory /
economical capital based on
credit portfolio
•Central limits management
� Advanced analytics offers FS the power
study customer behavior and
significantly improve marketing outcomes
without a proportionate increase in
budget
� It allows for a better understanding of
the risk dimensions faster, without
expanding the pool of human resources,
and help reduce the burden of
compliance with AML/KYC departments
� Effective use of analytics to fight fraud
helps improve profitability, reduce
payouts and legal hassles, and most
importantly, improve customer
satisfaction
� It not only helps determine asset pool
quality, but also prepayments,
delinquencies, defaults and cash flows
Illustration of Data Science in FS (Not exhaustive)
While basic reporting and descriptive analytics continues to be a must-have for banks, advanced predictive and integrated analytics
are now starting to generate powerful insights, resulting in significant business impact
Data Science in the financial industryWhat does it mean for financial services?

8CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
(Big?) Data
Segmentation
Dimensionality Reduction
Extraction of Groups
Missing Values & Outliers
Descriptive
Analysis
Predictive
Analysis
Integrated
Analysis
ID Causes and Effects
Real-time DataPattern Recognition
Data Warehouse� Region
� Subject of matter
� Product category
� Principal
Components
Analysis� K Nearest Neighbours
� Support Vector Machines
� Factor Analysis
Ad
just
Generalize
ANOVA, T test, etc.
billions
millions
thousands+
CompareDefine Training Set Define Fitness Function f(x)
� Markov Chain Monte
Carlo
� Random Forests
� Recommender Systems
� Neural Networks
� Logistics Regression
� Correlations
� Linear Regression
� Sqr. Error
� Accuracy
� Likelihood
� Probability
� Cost/Utility
� Neural Networks
� Gradient Descent
� Symbolic Artificial
Intelligence
� Agent-Based Models
� Cellular Automata
� Discrete Event Simulation
Ma
chin
e L
ea
rnin
g
thousands
Simulation
Predicting the likely future outcome of events often leveraging structured and unstructured data from a variety of sources
Examples: pattern recognition | machine learning
to predict fraud | risk alert generation at customer,
geographical, product level | trigger-based cross-
sells
Generating actionable insights on the current situation using complex and multi-variate data
Example: client segmentation | client profitability
| VaR calculation
1
Advises on possible outcomes and results in actions that are likely to maximize key business metrics. It is used in scenarios where there are too many options, variables, constraints and data points for the human mind to efficiently evaluate without assistance from technology
Examples: behavioural PD, LGD and EAD modelling, real-time offer models
1 2 32
3
| Descriptive Analysis |
| Predictive Analysis |
| Integrated Analysis |
Data Visualization1’
The ability to present and organize information intuitively which allows the detection of patterns, trends and correlation that might go undetected in text-based data
Example: heat map, 3D scatter plot, network
1’ | Data Visualization |
Data Science in the financial industryAnalytics leveraged in the banking sector

9CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
� IT infrastructure
� Storage and processing
(Hadoop, Spark)
� Data collection
Storage and
processing
Data Science
Data
governance1
Data
visualization
� Data quality
� Data Compliance
� Data management
� Strategic axis /
indicators
� Reporting format /
content
� Process / Tools / Data
� Data science is– Driven by the Big Data revolution, the emergence
of new technologies, the development of “new”
techniques and new business strategies
– An interdisciplinary field whose purpose is to
make the data speak. It can be used for prediction,
rating & discrimination, anticipation & simulation,
behavioural analysis, etc.
� The financial services industry needs to address
4 key issues:– How to extract the value of the data?
– What techniques and methodologies should be
used and for what?
– How can machine learning better support the
business strategy?
– How to organise / structure internally to address
this challenge?
1. Prediction / Anticipation
& simulation
2. Estimation
3. Ranking/Discrimination
4. Behavioural analysis
5. Self-learning models
Machine learning challenges
1CH&Co. has a dedicated offer on Data Governance. If you are interested we
would be happy to discuss this major challenge with you
“Statistics are ubiquitous in
life, and so should be statistical
reasoning.”
Alan Blinder, former Federal Reserve vice chairman,
NYTimes
Computer
scienceHacking
Maths &
Statistics
EngineeringFinancial
industry
expertise
Consulting
skills
� How CH&Co. sees a data scientist in the FS ?
A lot of different domains already exist
around data management. Not all have
the same meanings or objectives.
Chappuis Halder & Co. has recently
developed a R&D team to focus on
future needs of our clients: Data science
Data Science in the financial industryThe financial services industry is facing 4 major data science challenges

10CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data Science in the financial industryData science 3.0 | What is next ?
The Data science trends in brief | Future concepts & techniquesIllustration & examples (not exhaustive)
“Data science is an opportunity to
not only peak beyond the horizon,
but an opportunity to influence.”
Marcus, Chappuis Halder & Co.
The tendency, which seems to be converging
towards a somewhat standardised, cross-
industry risk framework, is forcing financial
industry in the direction of automated straight
through processes and intraday business
monitoring.
According to us, the emerging trend for
treatment in banks is based upon four drivers:
1. Regulation - Causing convergence across the
industry with a narrowing freedom of interpretation and
back stop models that are becoming the new standard
2. Transparency – Shareholders, Stakeholders and
customers all demand clear visibility. Clear and ordered
data across the institution that generates coherent
reports
3. Risk/Reward – Better understanding and
management of risks; capturing, modelling, monitoring
and optimizing all risk types
4. Cost reduction – New technology and pressure to
reduce costs drive automated solution and replacement
of manual interference wherever possible
A Algo Hedging – How AI and ML can influence or determine hedging strategy
New ways of capturing risks – Supply Chain Finance and Complexity Networks
The old oldest trick in the book… to get off the books
The full picture with the technology and techniques of tomorrow
Machine learning and Artificial Intelligence to increase effectiveness & efficiency
• Example : A complex portfolio can be hedged in many ways. Here an algorithm using sensitivities and a library
of hedging products would be able to construct alternative and improved ways to hedge a portfolio. Ways that
are not intuitive to a trader and may be more cost effective. By combining the hedging tool with Machine
Learning (ML) technics calibrated on past data, alerts for optimal hedge at optimal time could be generated, as
could recommendations for switching to new hedging strategies
• Example : Embracing new methodologies to capture and view risk. For example consider Supply Chain Finance
(SCF), this could be viewed as a closed system from a credit risk perspective. Which would open up new ways for
raising capital, engaging with clients and offer services. Similarly one could use complexity networks to model
market interactions and improve the understanding of various factors to enable impact analysis.
• Example : Transferring risk through new products such as Credit Suisse’s bond issue earlier this year. Could this
be taken one step further and be tranched against the proposed buckets of operational risk losses in proposed
in BIS new operational risk framework?
• Example : It is an overwhelming task to find the hedges or best collateral solution for a complex portfolio. Even
more so to understand the future margin requirements and align this with other liquidity strains. To bring the
full picture of outflows and inflows, risk and capital requirements an integrated system is required. What would
the dashboard, an insightful overview with alerts look like? How can this be achieved? What can be monitored in
real-time vs time-slicing?
• Example : Leveraging new technology and methods by developing machine learning for back and stress testing
(automation). This would reduce the cost of resources and free up quants to analyze and improve models.
B
C
D
E

11CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Agenda
Data Science | What is it?1
Data science in the financial industry | Challenges and opportunities2
Data science with CH&Co. | Our convictions, what we do and what we offer3
Data science a closer look | The deep dip4
To conclude | Our credentials…5
Appendices6

12CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data Science with CH&Co.Our convictions
• The most powerful thing is the ability to find the right question not the
right answer
• Data does not need to be big to express science
• Data science should be used to help see and to reveal things that is not
intuitive or easily realised
• Data science is not new. What is new is the perspective that this science
offers the financial industry
• Data will be increasingly unstructured (Figures, Pictures, sound, files,
internal, external …)
“You can have data without
information, but you cannot
have information without
data.”Daniel Keys Moran
Our convictions | Based on our experiences From CH&Co. – 2016
Convictions
Market
Discussions
Uncertainty
Capturing and analysing data, building predictive models and running simulations of
financial events is complex and important but there is an even bigger question.
The first priority and biggest challenge is to find the question “What do our customers care
about?”. This is why the CH&Co.’s data science offer is centred around the following pillars:
1
2
3
4
5

13CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data Science with CH&Co.Our Knowledge | The holy grail
� � � = � � � . �(�)�(�)
�� = ��(�|�)�(�) = ���� � ��
�
������
� � = �� + �. � �����
���
�(�) = ���� � ���� + �
���
� �� , �� = (�� − ��)"+(#� − #�)"$
Naive Bayes Perceptron1 Linear Regression2
K Nearest Neighbor Neural Network PCA
Support Vector Machine Backpropagation Gradient descent
%���&�'&�(� = Engeinvalue �� … ��
�� = �� − �̅
�(�) = %��&�'&�(�4 ��� … ���
3
4 5 6
∆���(�) = �67��� + 8∆���(� − 1) :� = :� − 8 � ℎ(��) − # . ���
���
�(�) = ���� �&��ℎ(. #. �(�� · ��)
�(�� · ��) =(��=��)"+(#� − #�)"$
��>(ℎ
$
# = 1 ? # = −1
@>>� ��(� = A� �(�|�)1 − �(�|�)
��(# = 1) = 11 + &=B(∑ DEFEGHIEJK
7 8 9
Logit Regression
TOP 10 FORMULAS | What the Financial industry is recurrently using …From Rubens Zimbres – 2016“There are two kinds of
statistics, the kind you look up
and the kind you make up.”
Rex Stout, Death of a Dox
10
Everyone at CH&Co. are believers in
science. This is why we invest in a
dedicated research team that drives
new initiatives and explores new
techniques and areas of application.
Practitioners of statistics are well aware
that
- A handful of techniques, approaches
and formulas provides solutions for
99% of your problems
- Choosing the right formula is crucial
and based on experience only
Just like other data experts, CH&Co.
keeps developing expertise and
knowledge by applying and testing
recurrent statistical equations

14CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Data Science with CH&Co.What Chappuis Halder & Co. offers its clients
Data strategy
Data structuration
Data exploration
Data mining
Data visualization
Proof of Concept
� Formulation (find the question)
� Simulation (impact measurement)
� Benchmarking (who does what & how)
� Market study
� PMO on Data project
� Localisation (where is the data)
� Collection
� Cleaning & analysis (quality check)
� Data correction
� Data description (descriptive statistics)
� Data correlation & interest analysis (PCA …)
� Modelling
� Forecasting
� Stress testing
OUR OFFER| What CH&Co. is ready to do…Details of our capabilities – 2016
� Dashboard design
� Dynamic reporting 2.0 5dynamic, OCR …)
Our tools (not exhaustive)
Our team of expert does not only bring
techniques to our clients but also the
essential key ingredients:
Knowledge of the financial industry
Creativity (as we develop our own and
some of our clients real PoC)
1
Expertise in different areas around data
science (Digital, FinTech observatory,
data governance, regulatory ...)
2
3
“The goal is to turn data into
information, and information
into insight.”
Carly Fiorina, former CEO, Hewlett-Packard Co.
Speech given at Oracle OpenWorld

15CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Agenda
Data Science | What is it?1
Data science in the financial industry | Challenges and opportunities2
Data science with CH&Co. | Our convictions, what we do and what we offer3
Data science a closer look | The deep dip and some use cases4
To conclude | Our credentials…5
Appendices6

16CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
What is it? Techniques used Concrete examples
Self-learning
models
Prediction,
anticipation &
simulation
Ranking /
Discrimination
� Creating various homogenous
classes making the ranking of
individuals possible
Behavioural
analysis
� Interpreting and predicting
behaviours using statistical data and
text/sound/image mining
� Implementing models that
automatically teach themselves how
to optimise their parameters from
available data
� Time series
� Artificial Neural Network
� Regressions
� Unsupervised / supervised
Classification
� PCA / MCA / FCA
� K-means / Neural Networks /
Random Forest / SVM
� Text mining using sophisticated
machine learning algorithms
� Descriptive statistics
� Computational statistics
� Mathematical optimisation
� Predict future value of a stock
� Estimate a variable of interest
for new people
� Detect risk periods
� Homogeneous risk class in
Credit risk (PD/LGD)
� Risk segmentation for
insurance products
� Behavioural analysis from the
emails database of a company
� Human resources digitalization
� Client targeting
� Classification (SVM, clustering,
logistic regression, k-means,
PCA, etc.)
Estimation
� Estimate the value of a variable of
interest based on explanatory
variables
� Regression models
� Classification models
� Optimization of products offers
� Estimation of credit interest
rate
� Modelling of a variable from existing
data, enabling its prediction &
anticipation according to several
scenarios
A
B
C
D
E
From data science to quantitative techniques Data science & machine learning can be mapped across 5 dimensions

17CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Summary
Based on the historical values of a random process, machine learning
models are used in order to extract its trend and variations
Step 1: Model Calibration
� The variable of interest can depend on past values (time series) or on
external variables. In both cases, coefficients are calibrated on a learning
base and test out of sample.
� For a time series, the value of the process depends on itself and can
thus be extrapolated without external information
Step 2: Confidence intervals of predicted values
� Error of prediction observed on the sample can be used to define
confidence intervals of the predicted values
Step 3: Auto adjustments over time
� Error of prediction can be used to adjust further predictions
Step 4: Simulation
� To complete the predictions, adverse scenarios can be simulated based
on stress methods on macroeconomics indicators, extreme movements
of interest rate, of the stock market, etc.
Description
� Predict future values of a random process like default rates or recovery rates in
retail banking or even sales volumes in marketing
� Anticipate high risk periods such as increase of default, losses or a fall in retail
sales
� Simulate adverse scenarios and measure of those impacts on required capital
or on budget forecasts
� Predict future values of a random process
� Complete the prediction with simulations of adverse
scenarios
Objectives
� Risk capital management
� Budget forecastingScope
� Time series
� Artificial Neural NetworkTechniques
� LGD Backtesting (CH&Co)
� DP Stress testing (CH&Co)Examples
Outcome
90% Confidence interval of predictions
5% probability
Predicted values
5% probabilityHistorical values
Prediction period
Sample use case Prediction, anticipation & simulationA

18CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Summary
� Step 1: What do we want to explain?
The first step is to define the variable that is not observable and that we
wish to determinate, the variable of interest, from other variables, the
explanatory ones
� Step 2: Model calibration
a second step, the data base is segmented in 2 subbases in order to
control the consistency of the model. A learning base that is used to
calibrate the coefficients of the model and a test subbase to check if the
model is efficient on data that were not used to calibrate the model. It
can be used to define other parameters of the model by choosing the
ones that minimize the error one the test subbase
� Step 3: Estimation in a new sample
The third step consists in using the model fitted on the entire database
in order to estimate the variable of interest of new individuals based on
their explanatory variables
Description
� Insurance policies underwriting choices made the historical clients permit to be
more accurate in targeting new clients
� Insurance can optimize the product offers for new clients according to their
characteristics
� Similarly, in retail banking, it allows to affect to a new credit a probability of
default or an estimation of the average loss when a default occurs on this type
of credit
Outcome
Objectives
� Credit Risk Management
� Insurance policies proposalsScope
� Regression models
� Classification modelsTechniques
� Optimization of products offers (CH&Co)
� Estimation of credit interest rate for a new client
� Estimation of PD/LGD of new credits (CH&Co)
Examples
Learning subbase Test subbase
Coefficients’ calibration
Error of
Estimation
Full data baseNew
individuals
Coefficients’ calibration
1
� Estimate a variable of interest of new individuals
according to their characteristics
2
General methodology of model calibration and new estimation
Sample use caseEstimation: Better estimateB

19CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Summary
� Supervised classification:
– Input data have a know label, 0 and 1 for instance. The aim is to
assign each individual to the class he belongs
– The coefficients and the parameters of the model are defined on the
learning base and tested out of sample
– The model kept is the one that have the best accuracy rate
� Unsupervised classification:
– Input data are not labelled, there are no groups defined
– The classes are created so that the inter-class variance is minimised
and the intra-class variance is maximised
� Component Analysis:
– Data dimension reduction based on a linear combination of the
variables that explain most of the variance
– Discrimination of the individuals based on their representation with
the variables selected in the PCA process
3 main types of techniques used
� Fraud detection can be based on artificial neural networks that will assign
probabilities of fraud and thus a level of risk according to the clients
characteristics
� Ranking credits permits to affect them to a risk class with a corresponding
probability of default and a loss given default rate
Outcome
� Create homogeneous classes
� Rank individualsObjectives
� Credit Risk profiles
� Insurance policies proposals
� Marketing clients’ segmentation
Scope
� Supervised classification: Logit, Neural Networks
� Unsupervised classification: HAC / HDC / K-means
� Component analysis models: PCA / MCA / FCA
Techniques
� Homogeneous Risk Classes in Credit Risk (PD/LGD)
� Risk Segmentation for insurance products
� Fraud detection
Examples
4
3
2
1 High risk
Medium risk 2
Medium risk 1
Low risk
…
X1
X2
Xn
Classification with an Artificial Neural Network
Inputs
: C
lient/C
redit
chara
cte
ristics
Hidden layers: data transformation
Outputs
Sample use caseRanking / DiscriminationC

20CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Summary Description
Outcome
� Making sense of many different kinds of data
� Getting knowledge from diversityObjectives
� Organization optimization
� Client behavioural analysis (retail/CIB)Scope
Techniques
� Detect the service or product preference of a client
� Detect a bottleneck in a work organization Examples
� Data mining
� Statistical processing
Input- Data base Information extractedOutput-Behaviours
Detection
Statistical processing
and data interpretation
Crossing of the computed
statistics and interpreted data
The behavioural analysis or behavioural detection is the crossing of various
types of data that enables to characterize and then detect a behaviour.
There is no unique way to perform behavioural analysis. It all depends on
the nature of the data we are dealing with.
It can however be summarized in 3 steps
1. Organizing and pre-processing the data composing the available
database
2. Extract the information from the data by making a statistical processing
and interpretation
3. Detect behaviours by crossing the statistics and interpreted data
� Making the data speak by extracting many kinds of information
� Interpret these data in a new way as experienced in the HighWayToMail
CH&Co project
� Taking advantage of this new approach and improve the client behavioural
analysis or optimise an organization
DB
Sample use case Behavioural analysis: make the data speakD

21CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Summary
Step 1: Definition of the actions and the targeted result
� Modelling of the transition from an initial situation to a gain or loss
position based on the actions set up during the length of the algorithmic
process
Step 2: Initial probability of each action
� During the initialization, the probabilities of occurrence of each action
for a selected situation are equal
Step 3: Automatic learning
� Each time, the achievement of the final situation is defined as a gain or
loss result and a list of specific actions
� The results are automatically integrated in the probabilities of each
actions.
� The several recurrences of the algorithm allows a dynamic refining of
the actions’ probabilities based on the maximization of the chances to
obtain a gain in the final situation
Description
� Asset management models’ dynamic choice based on what is defined as a gain
(high return, low volatility etc.) and the levels of past return, volatility,
macroeconomics indicators that represent the different situations
� Automatic recalibration of model coefficients
Outcome
� Create a model that learns by itself each time it
experiments a situationObjectives
� Asset management allocation strategy
� Risk managementScope
� Algorithms
� Bayesian methodsTechniques
� Automatic adjustment in time of regression or
classification models
� Error correction model (ECM)
Examples
Action 1 Probability= 1/3
Action 2 Probability= 1/3
Action 3 Probability= 1/3
Action 1 Probability= 2/3
Action 2 Probability= 1/6
Action 3 Probability= 1/6
Initial
Situation
Situation 1
Gain
Loss
Situation 2
Situation 3
Situation 3
Example of an iteration
If action 1 in
the third
situation led
to a final gain
Sample use caseSelf-learning models E

22CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Agenda
Data Science | What is it?1
Data science in the financial industry | Challenges and opportunities2
Data science with CH&Co. | Our convictions, what we do and what we offer3
Data science a closer look | The deep dip4
To conclude | Our credentials…5
Appendices6

23CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
� Products and services real
time offering
� Churn detection
� HR Planning
What the banker / the insurer is looking for?
Business application CH&Co Credentials Other examples
Self-learning models
� Improve process, including decision process
(time, quality, information, etc)
� Automatize task / process
� Real time insurance pricing
� Automatic asset management
re-allocation
Automatic and dynamic
models backtesting
4
Behavioural analysis
� Understand your business and your
customers
� Develop new products or services
Proof of Concept “Highway to
mail”
3
Prediction /
Anticipation &
simulation
� Improve profitability
� Assess & monitor
� Predict and anticipate
� Expected return
� Natural catastrophes
prediction
� Terrorism anticipation
Improving and optimizing the
stress-testing exercise 5
Estimation
� Improve the quality of your services
� Improve profitability
� Optimal pricing of new
products
� Credit interest rate valuation
Marketing intelligence2
Scoring / Risk Estimation6
Client segmentationRanking /
Discrimination
� Optimize your costs
� Optimize your risk
� Develop customized existing products or
services
� Fraud detection
� Credit granting choice1
Scoring / Risk estimation6
Marketing intelligence2

24CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Credentials 1. Client segmentation
Summary
� Establishment of segmentation and scoring
methodology to optimize the commercial
approach and reduce costs
Objectives
� Marketing clients’ segmentation
� Credit risk profilesScope
� Scoring
� ClassificationTechniques
� Context
– Growing tensions on scare resources allocations and costs
– Increase willingness to optimize the commercial techniques to
maintain profitability
– A clear client segmentation is a strategic topic for the management
and a key success factor in the transversal initiatives led by the CIB
� Approach
– Validation of the perimeter focus : client segmentation on risk, cross-
sell, growth, scarce resources and profitability
– Data collection, filtering and selection of strategic variables to build
ratios. Variables analysis and profiling : statistical description,
abnormalities detection, correlation studies and variables weighting
– Classification of clients
– Analysis of the clients’ clusters and identification of pockets of
opportunities
Case description
-8
-6
-4
-2
0
2
4
6
8
10
12
-8 -6 -4 -2 0 2 4 6 8 10
Groupe 3Groupe 1
Groupe 2
Outcomes
Revenue generation
Operating cost optimisation
Risk optimisation
Time Optimisation
Quality optimisation
Illustration

25CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Credentials 2. Marketing intelligence
Summary
� Exploitation of customers databases in order to refine
the insurance offer of products and complementary
options according to clients’ profilesObjectives
� Insurance policies proposals
� Fraud detection
� Credit risk
Scope
� Logistic regression
� Neural network
� Random forest
Techniques
Outcomes
� Context
– Massive clients databases are not fully exploited
– The use of a machine learning model enables to offer additional options and
products depending on the client’s characteristics
– A good modelling of the customers’ specification allows an efficient
targeting of the offer and a high probability of complementary options
underwriting
� Approach
– Segmentation of the data base in 2 subbases : one for the learning and the
second one to test the model
– Initialization of coefficients and calibration of parameters through a
machine learning model in the learning base to optimize the predictive
power on the test base
– Extension of the model on the entire base
– Exploitation of the data with different machine learning models
– Selection of the best model through cross-validation
– Focus on the three products or additional options the most likely to be
underwritten
Case description
Revenue generation
Operating cost optimisation
Risk optimisation
Time Optimisation
Quality optimisation
Machine learning
model
P1
P2
P3P3
P4P4
P5
85%85%
72%72%
68%68%
5%5%
92%92%
X1X1
X2X2
X3X3
X4X4
Client profileUnderwriting probability for each
products/additional options
Illustration

26CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Credentials 3. Highway to mail: how to detect behaviours from emails exchanges.
Summary
� Optimizing companies organization
� Improving clients (retail/CIB) targetingObjectives
� All kinds of enterprisesScope
� Descriptive statistics
� Text mining using machine learning technicsTechniques
Illustration
Outcomes
� Context
– Everywhere, emails exchanges have become the main way to
communicate internally as well as externally
– The nature of these exchanges give a lot of information on how
people behave and lead to optimize the processes
� Approach
– We are working with databases of emails. Our approach is based on
both a statistical analysis of the emails and a text mining of their
subjects
– The computed statistics answer the questions: “how many emails are
exchanged on Mondays? Between 8 and 12 am”, Who are M. Y 10
main contacts?”, etc.
– Text mining (opinion/sentiment analysis) enables to interpret the
emails subjects
– Finally, crossing both kinds of data will enable to characterize and
then detect behaviours
Case description
Operating cost optimisation
Risk optimisation Time Optimisation
Quality optimisation
Urgent
MeetingContract
Private
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
4
-3 -2 -1 0 1 2 3 4
Invitation
Presentation
Deal

27CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Credentials 4. Automatic and dynamic models backtesting
Summary
� Automatization of backtesting models, calibration of
alerts’ thresholds, identification of poor performances
and actions to takeObjectives
� Credit Risk
� Asset managementScope
� Scoring
� Classification
� Logistic regression
Techniques
Outcomes
� Context
– Backtesting of banking models is time-consuming and extremely
resource-intensive
– Targeting the causes of poor performances allows to automatize
actions to take
� Approach
– A first layer of tests is implemented (Gini, Stability, Conservatism
etc.).
– According to the results, several layers of tests are built in order to
identify the causes of model’s failures
– Actions to take are automatically established
– Thresholds are dynamically calibrated
Case description
Revenue generation
Operating cost optimisation
Risk optimisation
Time Optimisation
Quality optimisation
Test 1
Test 2
Test 3
No alert
Minor alert
type 1
Major alert
type 1
Minor alert
type 2
Major alert
type 2
No action
Action 1
Action 2
Action 3
Action 4

28CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Credentials 5. Improving and optimizing the stress-testing exercise
Summary
� Improving the modelling of the probability of
default (PD)
� Improving risk management
Objectives
� Credit Risk
� Risk managementScope
� Neuronal networks
� ClassificationTechniques
Outcomes
� Context
– Stress tests are required by financial institutions using internal
rating-based approaches for credit risk to assess the robustness of
their internal capital assessments
– Stress tests use very unstable models and many validation tests that
are time-consuming
� Approach
– Focusing on robust Default Rate modelling using machine learning
algorithms
– Important statistical properties that are to be verified by the model
are directly included into the model optimization
– Variables to take into account are automatically chosen
– Models thresholds are dynamically calibrated
Case description
Time optimization
Cost optimisation
Risk optimization
Quality Optimization
Projection du taux de défaut du protefeuille X
selon des scénarios central et adverse
Adverse Central

29CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Credentials 6. Scoring Risk Factor Estimation
Summary
� Segment credits into homogeneous risk classes
in order to affect a probability of default or a loss
given default rate to each credit
Objectives
� Credit Risk
� Risk capital management
� Insurance policies proposals
Scope
� Scoring
� Classification
� Logistic regression
Techniques
Outcomes
� Context
– To estimate the probability of default or the loss given default rate of
a portfolio, it has to be well segmented in order to get the best
estimations of the Basel parameters in each group
� Approach
– A logistic regression is calibrated on historical data
– It provides each individual with a score value
– The range of values is segmented in several layers
– According to the score value, each individuals is affected to a risk
class (a layer)
– The segmentation is made so that the individuals of each class have a
similar risk profile and the profiles of individuals of different classes
are significantly different
– In each homogeneous risk class, the Basel parameters are then
estimated for the entire group
Case description
Revenue generation
Operating cost optimisation
Risk optimisation
Time Optimisation
Quality optimisation
Definition of a new credit score and affectation to an HRC
X1X2
X3X4
X5 X6
Logistic
regression
Credit Score value
Credit characteristics
HRC 2
HRC 3
HRC 4
HRC 1
Sco
re v
alu
e
0

30CHAPPUIS HALDER & CO.GRA – Data Science Offer – March 2016 Strictly confidential - © Chappuis Halder & Co.
Agenda
Data Science | What is it?1
Data science in the financial industry | Challenges and opportunities2
Data science with CH&Co. | Our convictions, what we do and what we offer3
Data science a closer look | The deep dip4
To conclude | Our credentials…5
Appendices6

31CHAPPUIS HALDER & CO.GRA – Data Science Offer –2016 Strictly confidential - © Chappuis Halder & Co.
Machine LearningOverview
Linear Regression
Support Vector Machines
K Nearest Neighbor
Principal Components
Analysis
PerceptronNaïve Bayes
Backpropagation Gradient Descent
Neural Networks
Logistic Regression
� � � = � � � . �(�)�(�)
�� = ��(�|�)
@>>� ��(� = A� �(�|�)1 − �(�|�)
��(# = 1) = 11 + &=B(∑ DEFEGHIEJK
�(�) = ���� � ���
������
� � = �� + �. � �����
���
�� = �� − �̅%���&�'&�(� = Engeinvalue �� … ��
�(�) = %��&�'&�(�4 ��� … ���
∆���(�) = �67��� + 8∆���(� − 1) :� = :� − 8 � ℎ(��) − # . ���
���
�(�) = ���� �&��ℎ(. #. �(�� · ��)
�(�� · ��) =(��=��)"+(#� − #�)"$
��>(ℎ
$
�&��ℎ → MN = 0# = 1 ? # = −1
�(�) = ���� � ���� + �
���� �� , �� = (�� − ��)"+(#� − #�)"$
assumes that the presence of a
particular feature in a class is
unrelated to the presence of any
other feature
A network of neurons in which
the output(s) of some neurons
are connected through weighted
connections to the input(s) of
other neurons
used to describe data and to
explain the relationship between
one dependent (e.g. age)
variable and one or more
independent variables (e.g.
income)
classify an unknown example
with the most common class
among k closest examples (e.g.
“tell me who your neighbors are,
and I’ll tell you who you are”)
composed of a large number of
highly interconnected processing
elements (neurones) working in
unison to solve specific problems
(like the human brain)
a technique used to emphasize
variation and bring out strong
patterns in a dataset. It's often
used to make data easy to
explore and visualize
a supervised machine learning
algorithm which can be used for
both classification or regression
challenges
A popular algorithm used to
optimize neural networks by
starting with an initial set of
parameter values and
iteratively moving toward a
set of parameter values that
minimize the function
a statistical method for analyzing
a dataset in which there are one
or more independent variables
that determine an outcome
A common method of training a
neural net in which the initial
system output is compared to
the desired output, and the
system is adjusted until the
difference between the two is
minimized

CHAPPUIS HALDER & CO.
MONTREAL
1501 McGill College
avenue – Suite 2920
Montreal H3A 3MB,
Quebec
PARIS
20, rue de la Michodière
75002, Paris, France
NIORT
19 avenue Bujault
79000 Niort, France
NEW YORK
1441, Broadway
Suite 3015, New York
NY 10018, USA
SINGAPORE
60 Tras Street,
#03-01
Singapore 078999
HONG KONG
1205-06, 12/F,
Kinwick Centre
32 Hollywood Road,
Central, Hong Kong
LONDON
50 Great Portland Street
London W1W 7ND, UK
GENEVA
Rue de Lausanne 80
CH 1202 Genève, Suisse
Benoit Genest| Partner | London
Ziad Fares | Head of R&D | Paris
Patrick Bucquet | Partner | New [email protected]
CONTACTS