Data Science Governance
Transcript of Data Science Governance
DATA (SCIENCE) GOVERNANCE.
DATA SCIENCE IN BANKING, 23-5-2015 BRUSSELS DATA SCIENCE COMMUNITY.
Bart Hamers be.linkedin.com/in/hamersbart
DATA SCIENCE IN BANKING
Marketing • Customer segmentation • LTV • Cross & upselling • Churn
Risk Management • Credit Risk • Market Risk • Operational Risk
Markets • Pricing • Trading • High Frequency Trading
Security & Fraud • Intrusion detection • Anti Money Laundering • Rogue Trading
BANKING: RULES, RULES AND MORE RULES
[Figure: word cloud of regulatory text. Most prominent terms: risk, bank, data, reporting, aggregation, management, principles, supervisors.]
• Basel 3 • CRD IV • Solvency II • BCBS 239 • …
The regulatory texts also influence all aspects of data science modeling.
HOW SHOULD WE DEAL WITH THIS?
The results of all data science initiatives produce new information and data.
With data science, data becomes even more of a company asset.
All ‘traditional’ principles of data quality management and data governance remain applicable.
PRINCIPLES OF DATA (SCIENCE) QUALITY?
[Figure: quality dimensions. Time (Volatility, Timeliness, Recency) and Consistency (Inter-relational, Intra-relational).]
• Time: the time dimension of the data science.
  • Volatility: characterizes the frequency with which data vary in time and models need to be refreshed.
  • Timeliness: expresses how current the models are for the task at hand.
  • Recency: how promptly the data science results are updated (outdated information).
• Accuracy: the closeness between a real-life phenomenon and its representation.
• Validity: the semantic meaning of the data science results. Do the results follow the business logic?
• Comprehensiveness: the ability of the user to interpret the data science results correctly.
• Metadata: is there a formal description of the data science with respect to technical, operational and business information?
• Can the data science results be easily understood by non-technical users?
• Consistency: captures inconsistencies between similar data attributes in the data.
  • Inter-relational: captures violations or conflicting opinions among data science results on the same data.
  • Intra-relational: captures the risk of a too limited view on the subject (e.g. only cross-selling, no churn or LTV view).
• Completeness: the degree to which concepts are not missing.
  • Can we, and do we, cover the full client portfolio?
• Operational Risk: is the data secured against human and IT errors?
  • Human aspects: ad hoc human manipulation, unfollowed regulations and hierarchical access levels.
  • IT aspects: unrealistic implementation.
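Some of these dimensions can be checked mechanically. The following is a minimal Python sketch, not the speaker's own method. It assumes each data science result is stored with a client identifier and the date it was produced, and it reads completeness as portfolio coverage and recency as the age of a result. The names ScoredClient, completeness and stale_results, the 90-day threshold and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScoredClient:
    client_id: str
    score: float
    scored_on: date  # date on which the data science result was produced


def completeness(portfolio_ids, scored):
    """Share of the client portfolio that has a data science result at all."""
    portfolio = set(portfolio_ids)
    if not portfolio:
        return 0.0
    covered = {s.client_id for s in scored} & portfolio
    return len(covered) / len(portfolio)


def stale_results(scored, today, max_age_days):
    """Results older than max_age_days, i.e. failing the recency check."""
    return [s for s in scored if (today - s.scored_on).days > max_age_days]


# Hypothetical sample data: four clients in the portfolio, three of them scored.
portfolio = ["c1", "c2", "c3", "c4"]
scored = [
    ScoredClient("c1", 0.8, date(2015, 5, 1)),
    ScoredClient("c2", 0.1, date(2015, 1, 15)),
    ScoredClient("c3", 0.4, date(2015, 5, 20)),
]
today = date(2015, 5, 23)

coverage = completeness(portfolio, scored)
stale_ids = [s.client_id for s in stale_results(scored, today, max_age_days=90)]
print(coverage, stale_ids)
```

With this sample data, three of four clients are covered and only the result for c2 is older than 90 days. What counts as "too old" depends on the volatility of the underlying data, so the threshold should be set per use case.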
MY 6 PRINCIPLES OF DATA (SCIENCE) GOVERNANCE
1. Data science should focus on the end-user’s needs.
2. Data science should be well managed: it should be transparent who has the authority to create, modify, delete, use and control the data science initiatives.
3. The data science results should be trustworthy.
4. All data science should be easily available for the end-users.
5. Data science should be fit-for-purpose.
6. Data science initiatives should be globally managed in order to be lean, agile and forward looking.
1. Data science initiatives should focus on the end-user’s needs.
• What is the business problem we are trying to solve?
• Will the data science solution provide a measurable improvement and how will this be evaluated?
2. Data science should be well managed: it should be transparent who has the authority to create, modify, delete, use and control the data science initiatives.
• Apply data governance principles to data science in order to create policies and instill trust.
• Ownership, stewardship, end-users,…
• Ownership lies with the business side!
• Write guidelines on who can use the data science results and how, without constraining their use.
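One way to make such authority transparent is to write the policy down as data rather than leave it implicit. The sketch below is an illustration under assumptions, not a prescribed design. The five actions come from the principle above and the roles from the bullet on ownership, stewardship and end-users, but which role may perform which action is a hypothetical assignment, and the names Action, POLICY and is_allowed are invented for the example.

```python
from enum import Enum


class Action(Enum):
    CREATE = "create"
    MODIFY = "modify"
    DELETE = "delete"
    USE = "use"
    CONTROL = "control"


# Hypothetical role-to-action policy. The assignments are illustrative
# assumptions; a real policy would be agreed with the business owner.
POLICY = {
    "owner": {Action.CREATE, Action.MODIFY, Action.DELETE, Action.USE, Action.CONTROL},
    "steward": {Action.MODIFY, Action.USE, Action.CONTROL},
    "end_user": {Action.USE},
}


def is_allowed(role, action):
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in POLICY.get(role, set())


print(is_allowed("end_user", Action.USE))     # end-users may use results
print(is_allowed("end_user", Action.DELETE))  # but may not delete them
```

Keeping the policy in one readable table makes it easy to review and keeps usage open by default for end-users, in line with the guideline not to constrain the usage.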
3. The results of data science should be trustworthy.
• Guarantee the quality of the data used by the models.
• More (big) data is not a solution for bad quality data.
• Test and backtest the results of your model frequently.
• Test your results on accuracy, precision and stability.
• Evaluate the results quantitatively and qualitatively.
• Take into account the time dimension and expiration date of the results.
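The quantitative checks named above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the speaker's procedure. Accuracy and precision are computed for binary labels, stability is measured with the population stability index (PSI), which is one common choice and is not named in the slides, and the expiration check simply compares dates. All function names and sample values are hypothetical.

```python
import math
from datetime import date


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of predicted positives (label 1) that were truly positive."""
    predicted_pos = sum(1 for p in y_pred if p == 1)
    if predicted_pos == 0:
        return 0.0
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / predicted_pos


def psi(expected_shares, actual_shares, eps=1e-6):
    """Population stability index between two binned score distributions.

    Each argument is a list of bin shares summing to 1. Zero means the
    distributions are identical; larger values mean a larger shift.
    """
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def is_expired(valid_until, today):
    """True once the results have passed their expiration date."""
    return today > valid_until


# Hypothetical backtest: observed outcomes versus earlier predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), precision(y_true, y_pred))

# Hypothetical score distributions at development time and today.
print(psi([0.25, 0.50, 0.25], [0.10, 0.50, 0.40]))
print(is_expired(date(2015, 6, 30), date(2015, 5, 23)))
```

Running such checks on a schedule, and storing a validity date next to every published result, covers the quantitative part. The qualitative part, checking that the results make business sense, still needs a human reviewer.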
4. All data science results should be easily available for the end-users.
• Data science should not be something magical for the happy few.
• A data driven company is only created by sharing the data results at all levels of the company.
• Marketing predictions
• Sales predictions
• Risk and finance forecasting
• Business process optimization
5. Data science should be fit-for-purpose.
• Never forget Occam’s razor!
• Be aware of the risk of over-fitting!
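The risk of over-fitting can be shown with a toy example. The Python sketch below is an illustration chosen for this transcript, not something from the slides. It uses a k-nearest-neighbour classifier on hand-made one-dimensional data in which the true rule is "label 1 when x is at least 5" and two training labels are deliberately noisy. The functions knn_predict and error_rate and all data points are hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote over the k training points closest to x (k should be odd)."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_one = sum(label for _, label in neighbours)
    return 1 if votes_for_one * 2 > k else 0


def error_rate(train, data, k):
    """Share of points in data that the k-NN model built on train gets wrong."""
    wrong = sum(1 for x, label in data if knn_predict(train, x, k) != label)
    return wrong / len(data)


# Toy training set; the labels at x=3 and x=7 are noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
# Toy holdout set labelled with the true rule.
holdout = [(2.6, 0), (3.4, 0), (6.6, 1), (7.4, 1), (1.5, 0), (8.5, 1)]

for k in (1, 3):
    print(k, error_rate(train, train, k), error_rate(train, holdout, k))
```

With k=1 the model reproduces every training label, noise included, so its training error is zero while it misclassifies the holdout points that sit next to the noisy ones. With k=3 the model is less flexible, makes some errors on the training set, and classifies this holdout set correctly. A perfect fit on the training data is therefore not evidence of a good model; the comparison has to be made on data the model has not seen.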
6. All data science initiatives should be globally managed in order to be lean, agile and forward looking.
• Do not create data science silos.
• Share your experience, systems, methodologies and data.
• Create data sandboxes.
• Define a forward looking data strategy linked to your business plan. (Data is not collected overnight.)