Enhancing Catastrophic Risk Analysis with IBM Puredata for Analytics
© 2012 IBM Corporation
Enhancing Catastrophic Risk Analysis with IBM Puredata for Analytics
Agenda
Leveraging IBM Puredata in Catastrophic Risk Analysis
IBM Puredata Success Stories in Catastrophic Risk Analysis
IBM Puredata In-database Analytics
IBM Puredata User Defined Extensions (UDX)
Migration of a Catastrophic Risk Application to IBM Puredata
IBM Big Data Platform
InfoSphere BigInsights
Hadoop-based low latency analytics for variety and volume
IBM Puredata 2000
BI+Ad Hoc Analytics Structured Data
IBM Smart Analytics System
Operational Analytics on Structured Data
IBM InfoSphere Warehouse
Large volume structured data analytics
InfoSphere Streams
Low Latency Analytics for streaming data
MPP Data Appliances
Stream Computing
Information Integration
Hadoop (NoSQL)
InfoSphere Information Server
High volume data integration and transformation
On-Demand Catastrophe Risk Analysis with IBM Puredata for Analytics
Who is interested in Catastrophe Risk Models?
Insurers – Managing their exposure and filing for rates
Brokers – Assessing risk management strategies for clients
Reinsurers – Pricing reinsurance
Capital markets – Pricing cat bonds
Rating agencies – Evaluating a company's capital requirements
Leveraging Catastrophe Risk Modeling
Reduce the risk that an insurer is unable to meet claims
- Reduce policyholder loss if the firm is unable to fully meet all claims
- Provide an early warning system if capital falls below a required level
Promote confidence in financial stability
- Evaluate the company's risk profile and related reinsurance and investment strategies
- Discuss capital management with other external parties (rating agencies)
Evaluate returns on risk-adjusted capital for strategy development and implementation for individual business segments
Understand the relative contribution of the major risk categories to the overall risk profile (non-cat losses, catastrophes, reserve, credit and market)
Catastrophe Risk Modeling
Treaty Conditions
Standard Models
Scenario Based Models
Value at Risk
Underwriting
Re-Insurance
Policy Pricing
Policyholder Loss
Loss Estimating
Sensitivity Analysis
Capital Management
Geospatial Peril Models
Historical/Forecasted
Temporal/Real Time
Performance Improvement by Understanding Risk
Simulations
Temporal Correlation
Likelihood/Probability
Changing the Game in Catastrophe Risk Modeling
Back-office Applications Downstream Analytics
Catastrophe Modeling / Workflow Control / Policy Demographics
IBM Puredata Analytic Appliance
Netezza High-Speed Spatial Data Loader (AIR, RMS data)
SPSS
Workflow Management
Faster: Near-Real-Time Data Ingestion, Shortened Analytic Cycles
New Methods: Comprehensive Risk Analysis, In-process Risk Analysis
Flexibility & Understanding: What-if Modeling, High-Speed Risk Analysis
Increased Depth: Increased Analytic Dimensionality, Expanded Peril Models
Cognos
Treaty Conditions
Standard Models
Scenario Models
Simulations
Temporal Correlation
Likelihood/Probability
IBM Netezza In-Database Analytics
Embedded Customer Algorithms (SQL & UDX)
Stat & Treaty Engine
Value at Risk
Underwriting
Re-Insurance
Policy Pricing
Policyholder Loss
Loss Estimates
Sensitivity Analysis
Capital Management
Ad hoc Query / Data Mining
Complementing AIR & RMS with IBM Puredata for Analytics
Data Extraction & Grouping
Simulation
Recovery
Sort on Year
SQL Export
Stats Module
Sorted on Yearly Max Loss
Sorted on Yearly Total Loss
Calculation Engine
Pre-Cat Stats
Recovery Stats
Post-Cat Stats
Apply Treaty Data
Calc Net Losses
Report Generation / Ad Hoc Query
Pre-Cat Data
EP Definition
Initial Scope: upstream RMS & AIR application
Expanded capability by moving to in-database analytics
Key Points for Migrating to IBM Puredata for Analytics
Database Migration
- IBM Puredata is a SQL-92 compliant database
- If you are using SQL Server proprietary extensions, there will be some migration effort
- Initial review indicates we may not want to use the existing UDFs, but rather optimize the SQL for IBM Puredata

Analytic Applications
- The Netezza Analytics UDX framework essentially allows a wrapper to be put around typical "file-in – file-out" applications so they run in-database
- We may want to alter some of the existing application for improved parallelism (non-serial) as well as set-based logic

Long-term Simplicity
- IBM Puredata essentially eliminates the need for database tuning and the performance issues associated with analytics
- Consolidation of analytics into the database simplifies the entire architecture

Only the IBM Puredata analytic performance is proprietary
- Again, IBM Puredata is SQL-92 compliant
- Our UDX wrappers are similar to those on every other database platform
IBM Puredata Advanced Analytics: Improved Analytics for Catastrophe Risk
Up-to-the-minute Risk Modeling – Guy Carpenter
Large reinsurance company
Exposure management application calculates risk on insured properties
Risk data changes constantly as hurricane is approaching
4 million insured properties, tens of thousands of risk polygons
Previously analysis took 45 minutes using Oracle Spatial
Now takes 5 seconds using IBM Puredata
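The core test behind such an exposure query — deciding which insured properties fall inside a risk polygon and totaling their value — can be sketched with a standard ray-casting check (toy coordinates and values; the appliance uses its native geospatial functions):

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: count edge crossings of a ray from pt toward +x."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

# Aggregate insured value inside one hypothetical risk polygon
polygon = [(0, 0), (4, 0), (4, 4), (0, 4)]
properties = [((1, 1), 500_000), ((5, 5), 750_000), ((2, 3), 250_000)]
at_risk = sum(v for p, v in properties if point_in_polygon(p, polygon))
print(at_risk)  # -> 750000
```

Run against millions of properties and tens of thousands of polygons, this is exactly the join the appliance parallelizes.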
National Fire Station Alignment
Determine the 5 nearest fire stations to each household
- 41,000 US fire stations
- 114,000,000 ZIP 12 points (parcels) for the entire US
- Calculated all scenarios in 30 minutes!
- Analysis was never possible on Oracle!
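A minimal sketch of the underlying k-nearest-neighbor computation, assuming haversine great-circle distance and hypothetical coordinates (the appliance parallelizes the same brute-force join across data slices):

```python
import heapq
from math import asin, cos, radians, sin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def k_nearest_stations(parcel, stations, k=5):
    """Return the k stations closest to a parcel: a brute-force scan,
    like the database cross join the appliance runs per data slice."""
    return heapq.nsmallest(k, stations, key=lambda s: haversine_km(parcel, s))

# Hypothetical station coordinates
stations = [(40.7, -74.0), (40.8, -74.1), (41.0, -73.5), (40.71, -74.01)]
print(k_nearest_stations((40.70, -74.00), stations, k=2))
# -> [(40.7, -74.0), (40.71, -74.01)]
```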
Proximity to Coast
Shortest distance to coast: Florida
- 14,700 coast segments (each defined by 300 vertices on average)
- 8,500,000 ZIP 12 points
- Cartesian join
Netezza: 3 hours, 42 minutes
In-house GIS: 3 weeks!
(100x+ improvement)
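The Cartesian join reduces to a point-to-segment distance evaluated for every (point, segment) pair, keeping the minimum per point. A planar sketch with toy coordinates (a real run would use geodetic math and the appliance's spatial functions):

```python
def point_segment_dist(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate segment
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    # Project p onto the segment, clamped to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def dist_to_coast(point, coast_segments):
    """Cartesian join: test the point against every coast segment,
    and the minimum distance wins."""
    return min(point_segment_dist(point, a, b) for a, b in coast_segments)

coast = [((0, 0), (0, 10)), ((0, 10), (5, 15))]  # toy coastline
print(dist_to_coast((3, 4), coast))  # -> 3.0
```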
Policy Accumulation – Total Insured Value
Define a "buffer" around each insured property
Sum all the insured properties in each buffer
Calculate Total Insured Value
Sample data – Miami, Florida (Miami-Dade County)
- 939,000 properties; sum each value within a buffer centered around each point
- 1 km radius search; on average 600 properties summed into each calculation
- Individual calculation: < 1 second
- Bulk calculation: 2 hours
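The per-point accumulation described above can be sketched as a brute-force radius query (hypothetical parcels, planar distances in km in place of geodetic buffers):

```python
def total_insured_value(points, radius):
    """For each insured property, sum the insured value of all properties
    within `radius` of it (itself included) - one buffer per point."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    return [
        sum(v for q, v in points if dist(p, q) <= radius)
        for p, _ in points
    ]

# Hypothetical parcels: (x_km, y_km), insured value
parcels = [((0, 0), 300_000), ((0.5, 0), 200_000), ((3, 3), 400_000)]
print(total_insured_value(parcels, radius=1.0))  # -> [500000, 500000, 400000]
```

At the Miami scale this is 939,000 buffers of ~600 properties each, which is why the bulk run is the expensive case.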
Determining Portfolio Value-at-Risk In-Database
CHALLENGE
Evaluate massive portfolios as fast as possible to minimize future losses and risk exposure

SOLUTION
In-database analytics moves the complex calculations next to the data, harnessing the power of up to 920 CPU cores to attack one of the most challenging trading analytic processes, Value-at-Risk, which uses statistical simulations to compute forward-looking portfolio values, running in minutes as opposed to hours

BENEFITS
Real-time, high-performance, scalable in-database analytics enables faster risk analysis

"This technology will allow us to revolutionize our risk calculation environment... we will be able to completely change the way that we look at and calculate risk."
- Risk Quant at a Top 3 Bank
Calculating Value-at-Risk In-Database
Determine the Value-at-Risk for an equity options desk
- 200,000 positions – different instruments and maturities
- 1,000 underlying stocks

Required to do the following:
- Calculate daily returns on underlying stocks using historical prices
- Calculate the correlation of daily returns
- Perform Singular Value Decomposition (SVD)
- Simulate correlated returns for all underlying stocks using SVD for the next 1 year
- Perform 10,000 simulations and calculate the 95th percentile loss on each day for the entire portfolio
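The steps above can be sketched for a two-stock toy portfolio. Note one substitution: where the slide uses SVD to factor the correlation matrix, the 2×2 case below uses a closed-form Cholesky-style factor, which plays the same decorrelation role. All prices and position sizes are hypothetical:

```python
import random
from math import sqrt

def daily_returns(prices):
    """Daily simple returns from a price series."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

def correlation(xs, ys):
    """Pearson correlation of two equally long return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def simulate_var(prices_a, prices_b, positions, n_sims=10_000, level=0.95, seed=7):
    """Monte Carlo VaR: estimate correlation from history, factor it,
    simulate correlated one-day returns, read off the percentile loss."""
    ra, rb = daily_returns(prices_a), daily_returns(prices_b)
    rho = correlation(ra, rb)
    vol_a = sqrt(sum(r * r for r in ra) / len(ra))
    vol_b = sqrt(sum(r * r for r in rb) / len(rb))
    rng = random.Random(seed)
    pnls = []
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        sim_a = vol_a * z1                                   # first stock's return
        sim_b = vol_b * (rho * z1 + sqrt(1 - rho**2) * z2)   # correlated second return
        pnls.append(positions[0] * sim_a + positions[1] * sim_b)
    pnls.sort()
    return -pnls[int((1 - level) * n_sims)]  # loss at the requested percentile

# Hypothetical price histories and position sizes
prices_a = [100, 101, 99, 102, 100, 103, 101]
prices_b = [50, 50.4, 49.6, 51, 50.2, 51.5, 50.8]
var_95 = simulate_var(prices_a, prices_b, positions=(1_000_000, 500_000))
print(f"95% one-day VaR: ${var_95:,.0f}")
```

The appliance version runs the same recipe over 1,000 stocks, 250 days, and 10,000 paths, with the simulation loop distributed across cores.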
                                 Puredata TF-6               Puredata TF-12
Nodes                            12 CPU / 48 Core            24 CPU / 96 Core
Storage                          60 TB                       120 TB
#rows (data volume)              200K Positions              200K Positions
#columns (dimensions, features)  10,000 Simulations,         10,000 Simulations,
                                 1,000 Stocks, 250 Days      1,000 Stocks, 250 Days
Total Simulations                2.5 Billion – 3 minutes     2.5 Billion – 1.5 minutes
Calculations                     200 Thousand – 7 minutes    200 Thousand – 3.5 minutes
Total Elapsed Time               < 10 minutes                5 minutes
OpenRisk uses In-Database Scoring and Spatial Analytics on Netezza
CHALLENGE
Quickly and on demand, determine combined risk across all portfolios of any size (1M+) for all insured catastrophic events

SOLUTION
In-database analytics eliminate data movement and execute 500B+ complex calculations in minutes to determine risk across portfolios

BENEFITS
Real-time, high-performance, scalable in-database analytics enables broader risk analysis

"Because of Netezza, we were able to launch a new business model – an on-demand, software-as-a-service, large-scale catastrophic risk modeling service – that radically reduces the exposure for insurance companies."
- Shajy Mathai, CTO, OpenRisk
OpenRisk Natural Disaster Portfolio Loss Estimate
Statistical model with a stochastic set of hurricane events that are applied to a portfolio of properties to generate loss estimates over time
- 1M policies assessed for the entire state of Rhode Island

Required to do the following:
- Compute the nearest "surface roughness" coefficient
- Find the nearest GID for every impacted site (lat/long accuracy of 0.2 minute)
- Interpolation on continuous distribution functions
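The interpolation step can be sketched as a piecewise-linear lookup on a tabulated continuous function; the wind-speed-to-damage values below are invented for illustration:

```python
from bisect import bisect_right

def interpolate(xs, ys, x):
    """Piecewise-linear interpolation on a tabulated continuous function,
    clamped at both ends. xs must be sorted ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)              # first knot strictly above x
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical damage-ratio curve: wind speed (mph) -> fraction of value lost
wind_mph = [50, 75, 100, 125, 150]
damage   = [0.00, 0.02, 0.10, 0.35, 0.80]

print(round(interpolate(wind_mph, damage, 110), 3))  # -> 0.2
```

In the full model this lookup runs once per event-site pair, which is where the 100K events × 1M locations volume comes from.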
                                 Puredata TF-6               Puredata TF-12
Nodes                            12 CPU / 48 Core            24 CPU / 96 Core
Storage                          60 TB                       120 TB
#rows (data volume)              1 Million Policies          1 Million Policies
#columns (dimensions, features)  100K Events, 1M Locations,  100K Events, 1M Locations,
                                 40K Geographic Bins         40K Geographic Bins
Loss Matrix                      > 1 TB                      > 1 TB
Total Elapsed Time               45 minutes                  20 minutes
Optimizing Your Own Advanced Analytics: OpenRisk Hurricane Risk Model
Use Case Summary – Hurricane Risk Assessment
Catastrophe modelers run various models which simulate hazard and vulnerability over extremely large time periods (thousands of years) for portfolios of property risk.
- This process generates terabytes of data, which in turn is analyzed to make loss estimates.

Challenge: develop a framework for implementing a hurricane model that will:
- Improve performance from days to hours
- Reduce data movement
- Increase integration flexibility
- Reduce operational footprint by integrating the database with the analysis grid
Technical Architecture Imperatives
IBM Netezza Analytics as a SaaS platform
Facilitate rapid porting of existing hurricane insurance risk models
Maximum performance & scalability
- Millions of sites affected by a disaster event, e.g., a hurricane
Simplicity of a SQL call to run a sophisticated hurricane model
Leverage the flexibility of IBM Netezza Analytics to implement a hurricane risk model
- UDX: User Defined Extensions to incorporate legacy code
- Geospatial Analytics: run risk for sites impacted in the hurricane polygon
Facilitate rich, high-performance reporting and 3D map rendering
- Accurately forecast damage assessment to property
- Report discrepancy between coverage and damage assessment
The Existing Solution

Flow: database → ODBC → Fortran program → results to files → bulk load to database
- Process a site if it is in the hurricane
- Gather building structural characteristics
- Gather terrain data
- Apply mathematical modeling to score risk
- Compute predicted losses to the site in $

Problems:
- Single-threaded processing, very slow
- Potent risk-modeling intellectual property locked away in Fortran
- Difficult to apply parallel processing
- Lots of infrastructure
- Bulk movement of data

Challenges:
- How to leverage existing code without significant rewrite?
- How to apply parallel processing in a simple way?
- How to avoid massive data shipping?
IBM Netezza Solution: Multi-tenant Solution for Applying Advanced Analytics In-Database

Analytics computing grid: C++/Fortran UDX, Geospatial Analytics
Client Company 1 (proprietary risk model) ... Client Company n (proprietary risk model)

Simplicity of SQL! Two steps:
- 1. Run models on demand!
- 2. Execute reports!

Massively parallel!
- Speed!
- Optimal distribution of site, building, terrain, and physics data

In-database Analytics
- Geospatial analytics applies latitude & longitude appropriately
- C++/Fortran UDXs implement the model
- 1 thread per shared-nothing node
- Elimination of DATA SHIPPING
- Emphasis on FUNCTION SHIPPING
- True multi-tenant SaaS
Running the Model on Demand
ETL preprocessing
- SQL
- ETL tools

Populate input tables
- Simple SQL insert statements using pure C++ UDX

Run model
- Simple & elegant single SQL statement
- Uses C++/Fortran UDXs that, for each site:
  - Determine building characteristics
  - Determine terrain factors
  - Determine the physical forces in effect
  - Use proprietary mathematics
  - Output data in a complex, proprietary data structure

Process reporting tables
- A SQL stored procedure

All steps are orchestrated by one elegant master stored procedure, feeding the reporting layer.
THANKS!
QUESTIONS?