© Cloudera, Inc. All rights reserved.
How Apache Spark and Apache Hadoop are helping to keep the banking regulators happy
Agenda
• Existing Architecture for Analytics & Risk
• Ever-changing Regulatory Landscape
• Challenges with existing architectures
• Modern architecture for Financial Risk
• Demo of key capabilities
Typical Existing Analytical Architecture
Data Sources
ETL/Staging
EDW
Archive
Data Marts
Canned Reports
Dashboards/Analytic Applications
Non-SQL Workloads
Self-Service BI/Ad Hoc
Regulatory Landscape
2012 2013 2014 2015 2016 2017 2018 2019
• ICB Ring-fencing
• ICB Loss Absorbency
• Leverage Ratio - Basel III
• NSFR – Basel III
• MiFID II
• T2S
• LCR - Basel III
• ICB / Competition
• Audit Policy
• Cross Border Debt Recovery
• Financial Transaction Tax
• Market Abuse Directive (MAD II)
• PRIP
• Accounting Directive Review
• AIFM Directive
• EU Transparency Directive
• EU Reg on Credit Rating Agencies
• CRDV
• Internal Governance Guidelines
• FATCA
• PD
• EMIR
• Swaps Push Out – Dodd Frank
• Securities Law Directive (SLD)
• Volcker Rule – Dodd Frank
• Short Selling
• Close Out Netting
• Crisis Management
• Recovery & Resolution
• BCBS 239
• FRTB
Effective dates yet to be confirmed
Existing Architectures under pressure
Limited Data – Incorporating new risk factors
Limited Data & Insight
• Adding new data sources
• Risk factors
Latent Value
• How long to get new reports with new risk factors
Existing Architectures under pressure
Missed SLAs for VaR, ES & stress scenarios
Overloaded Bottlenecks
• Ever-increasing ETL windows
• Ever-increasing batch windows to extract data
Existing Architectures under pressure
Frustrated Quants on the “edge” nodes (not-only-SQL)
Lack of Tooling
• Ad-hoc, on-demand complex risk modeling requirements
http://www.bis.org/publ/bcbs239.pdf
BCBS-239: Principles for Risk Data Aggregation
III - Accuracy & Integrity
Strive for a single authoritative source for risk data. Aggregate on an automated basis.
IV - Completeness
Capture and aggregate all material risk data. Data available by business line, legal entity, asset type, industry, region, …
V - Timeliness
Generate aggregate and up-to-date risk data in a timely manner.
VI - Adaptability
Meet a broad range of on-demand, ad-hoc risk management reporting requests.
• Data, models and processes live in silos
• Hard to get enterprise-wide view of risk
• Difficult to aggregate
• Lack of enterprise data taxonomy
• Failed audits
• Aggregate / reported risk data is infrequent and stale
• Unable to handle crisis situations
• Complex risk modeling process
• Unable to handle crisis situations
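The Completeness principle asks for risk data that can be sliced by business line, legal entity, region and similar dimensions. A minimal sketch of that kind of on-demand aggregation is shown here in plain Python. The record layout, the field names and the `aggregate` helper are all hypothetical and chosen only for illustration; on the platform described in this deck the same roll-up would more likely be a SQL or Spark query.

```python
from collections import defaultdict

# Hypothetical position-level risk records; field names are illustrative only.
POSITIONS = [
    {"business_line": "Rates", "legal_entity": "UK Ltd", "region": "EMEA", "exposure": 120.0},
    {"business_line": "Rates", "legal_entity": "US Inc", "region": "AMER", "exposure": 80.0},
    {"business_line": "FX", "legal_entity": "UK Ltd", "region": "EMEA", "exposure": 50.0},
    {"business_line": "FX", "legal_entity": "UK Ltd", "region": "APAC", "exposure": 30.0},
]


def aggregate(records, dimensions, measure="exposure"):
    """Sum `measure` over every combination of the requested dimensions."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)  # one key per dimension combination
        totals[key] += rec[measure]
    return dict(totals)


# The same records answer different ad-hoc questions without a new data mart.
by_entity = aggregate(POSITIONS, ["legal_entity"])
by_line_region = aggregate(POSITIONS, ["business_line", "region"])
```

Because the dimensions are passed in at call time, a new reporting request only changes the argument list, not the stored data.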
A modern risk platform calls for…
Scalability
More risk measures, more scenarios. Fine-grained risk data results in an order of magnitude increase in volume.
Speed
More frequent stress testing and regulatory reporting. High-velocity scenario development and deployment.
Agility
Support for a variety of languages. Pre-trade decisions. “What-if” scenarios.
Transparency
Verifiable data. Timely response to audits. Data quality and lineage. Data and model governance.
Evolution towards a modern risk platform
Risk & Regulatory Compliance Use Cases on Hadoop
Storage
• Archival
• Traceability
Batch
• ETL
• Data Validation
• Reg Reporting
Interactive
• Risk Aggregation
• Stress Testing
HPC
• Risk Modeling
• Backtesting
• Simulation
Streaming & Real Time
• Mkt Surveillance
• Best Execution
HDFS
High-throughput, scalable, fault-tolerant, distributed file system.
MapReduce
Distributed parallel processing framework.
Apache Impala
Massively Parallel Processing (MPP) SQL engine.
Apache Spark
In-memory distributed processing framework.
Apache Spark
Distributed compute framework. Can support Python / C++, as well as Java and Scala.
Cloudera Data Science Workbench
Fully integrated data science notebook application.
Apache Kudu
Real-time streaming architectures for true Aggregated Risk on Demand.
Modern Platform for Analytics and Machine Learning
Data Sources
EDW
Analytic Database
Operational Database
Data Science & Engineering
Shared Data Layer
Modern Data Platform
Fixed Reports
Dashboards/Analytic Applications
Non-SQL Workloads
Self-Service BI/Ad Hoc
Flexible Reporting
MiFID II, FRTB, IFRS-9, BCBS-239, MAD/MAR, GDPR, …
BCBS 239 / FRTB “Illustrative” Architecture
Market Data, Revaluation, Calculation & Aggregation, Reporting
Market Data Feeds
IPV: Independent Price Valuation Function
MRF / NMRF: Modelable & Non-Modelable Risk Factors
Calibration
Fixed Income: Front Office Pricing Engines
Equity Mkts: Front Office Pricing Engines
FX: Front Office Pricing Engines
… Other Mkts: Front Office Pricing Engines
Enterprise Data Hub: Static Data, Market Data, Configuration, P&L Vectors, Sensitivities, Events, Positions & Transaction Data
Scenarios: Current, Historic, Stressed, Projected
Risk Metrics: SA-related Risk Components, Counter-Party Credit Risk, XVA, ES & Stressed ES, P&L Attribution, VaR
Regulatory Applications: MiFID 2, Stress Testing, GDPR, FRTB SA, FRTB IMA, EMIR
Regulatory Reporting
Management Reporting
Scenarios
Risk Sensitivities
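The architecture lists P&L vectors as an input and VaR and ES among the risk metrics. As a rough illustration of that step, the sketch below derives both numbers from a single vector of scenario P&Ls using plain historical simulation. The function name, the sample values and the tail convention (the worst `floor(n * (1 - confidence))` scenarios, at least one) are assumptions made for this example. Other quantile conventions exist, and the deck does not say which one its platform uses.

```python
import math


def historical_var_es(pnl_vector, confidence=0.975):
    """Return (VaR, ES) as positive loss figures from scenario P&Ls.

    Assumed convention: the tail is the worst floor(n * (1 - confidence))
    scenarios (at least one); VaR is the smallest loss in that tail and
    ES is the average loss across it.
    """
    if not pnl_vector:
        raise ValueError("pnl_vector must not be empty")
    losses = sorted((-p for p in pnl_vector), reverse=True)  # largest loss first
    # Small epsilon guards against floating-point error in n * (1 - confidence).
    tail_size = max(1, math.floor(len(losses) * (1.0 - confidence) + 1e-9))
    tail = losses[:tail_size]
    var = tail[-1]
    es = sum(tail) / len(tail)
    return var, es


# Hypothetical P&L vector: one value per scenario for a single portfolio.
sample_pnl = [-10, -8, -5, -2, 0, 1, 3, 4, 6, 7]
var_80, es_80 = historical_var_es(sample_pnl, confidence=0.8)
```

With ten scenarios and 80% confidence the tail holds the two worst outcomes, so ES is their average and VaR is the milder of the two.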
BCBS 239 – Timeliness (Real-time risk)
Simplifying Lambda architectures with Apache Kudu
Kafka
Spark Streaming
Kudu
Spark MLlib
Application
Data Sources
Individual Session
Full Model/Learning
Genesis
Real-time Risk with Greeks
1. Event Occurs
2. Market Data
3. Stream Processing
4. Land in RDBMS
5. Batch Valuation
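One way to read "Real-time Risk with Greeks" is that sensitivities produced by the batch valuation are reused to re-estimate P&L on every market data event, with the latest estimate upserted into an updatable table. The sketch below illustrates that idea with a delta-gamma approximation. It is an interpretation, not the deck's implementation: the `Position` fields, the tick format and the in-memory dictionary standing in for a keyed table are all hypothetical, and no Kafka, Spark or Kudu API is used.

```python
from dataclasses import dataclass


@dataclass
class Position:
    position_id: str
    underlying: str
    quantity: float
    delta: float      # per-unit first-order sensitivity to the underlying price
    gamma: float      # per-unit second-order sensitivity
    ref_price: float  # underlying price at which the Greeks were computed


def delta_gamma_pnl(pos, new_price):
    """Second-order Taylor estimate of the P&L change since the reference price."""
    move = new_price - pos.ref_price
    return pos.quantity * (pos.delta * move + 0.5 * pos.gamma * move ** 2)


def process_ticks(positions, ticks):
    """Consume (underlying, price) events and upsert the latest estimate per position."""
    by_underlying = {}
    for pos in positions:
        by_underlying.setdefault(pos.underlying, []).append(pos)
    risk_table = {}  # stand-in for a keyed table that supports upserts
    for underlying, price in ticks:
        for pos in by_underlying.get(underlying, []):
            risk_table[pos.position_id] = delta_gamma_pnl(pos, price)  # overwrite by key
    return risk_table


# Hypothetical positions and a short stream of market data events.
positions = [
    Position("p1", "ABC", quantity=100, delta=0.5, gamma=0.02, ref_price=50.0),
    Position("p2", "XYZ", quantity=-10, delta=1.0, gamma=0.0, ref_price=20.0),
]
ticks = [("ABC", 51.0), ("XYZ", 19.0), ("ABC", 52.0)]
risk = process_ticks(positions, ticks)
```

Each event overwrites the previous value for the same key, so readers of the table always see the most recent estimate without waiting for the next batch run.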
Modern Platform for Analytics and Machine Learning
Supporting complete development lifecycle for risk
Data Management (User: Data Engineer)
• Ingest
• Validation
• Profiling
Exploration / Model Development (User: Quant Analyst)
• Feature Engineering
• Model Training & Testing
• Visualization
Production / Model Deployment (Users: Data / Dev / Ops Engineer)
• Production Feature Generation
• Production Model Port
• Production Testing
• Result Validation
• Serving
Metadata Management
Developer Tools: IDEs, Notebooks, SCM
Operations Tools: Scheduling, Workflow, Publishing
Risk Footprint with Apache Spark and Hadoop
• 19 GSIB customers
• 9 banks with risk use cases in production
• 6000+ nodes deployed
• >5 years in production
Market Risk aggregation platform for a Global Systemically Important Bank
55x faster processing, 8x more data capacity
300+ daily interactive users analyzing current and historical data
Global Systemically Important Bank
On-premises and cloud-based Hadoop clusters according to workload.
Tested on AWS to 40,000 cores. Demonstrated linear scaling of simulation workloads.
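Simulation workloads can scale close to linearly when each partition of paths is independent of the others. The sketch below shows that partitioning pattern. It is not the bank's workload and it does not measure scaling: the function names, the one-day normal P&L model and all parameter values are assumptions for illustration, and a local thread pool stands in for cluster executors.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_partition(seed, n_paths, spot=100.0, vol=0.2, horizon=1.0 / 252):
    """Simulate one partition of one-day P&L paths for a single unit of an asset."""
    rng = random.Random(seed)  # independent, reproducible stream per partition
    scale = vol * horizon ** 0.5
    return [spot * rng.gauss(0.0, scale) for _ in range(n_paths)]


def run_simulation(n_partitions, paths_per_partition, max_workers=4):
    """Fan partitions out to workers and concatenate their P&L vectors in order."""
    seeds = range(n_partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(lambda s: simulate_partition(s, paths_per_partition), seeds))
    return [pnl for chunk in chunks for pnl in chunk]


pnl_vector = run_simulation(n_partitions=8, paths_per_partition=1000)
```

Seeding each partition separately keeps the result identical regardless of how many workers run it, which makes a distributed run reproducible and auditable.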