Analytics for the Insurance Industry_SAS
8/3/2019 Analytics for the Insurance Industry_SAS
http://slidepdf.com/reader/full/analytics-for-the-insurance-industrysas 1/22
WHITE PAPER
Data Mining in the Insurance Industry
Solving Business Problems Using SAS® Enterprise Miner™ Software
Data Mining in the Insurance Industry
Table of Contents
1. Introduction
2. Opportunities and Challenges for Insurance Firms
3. Using Data Mining in the Insurance Industry
3.1 Products and Pricing Optimization
3.2 Acquiring New Customers
3.3 Retaining Existing Customers
3.4 Performing Sophisticated Campaign Management
3.5 Detecting Fraudulent Claims
3.6 Estimating Outstanding Loss Reserves
4. Implementing Data Mining Projects
4.1 Accessing the Data
4.2 Warehousing the Data
4.3 Analyzing Data Using the SEMMA Methodology
4.3.1 Sample
4.3.2 Explore
4.3.3 Modify
4.3.4 Model
4.3.5 Assess
4.4 Reporting the Results
4.5 Exploiting the Results for Business Advantage
5. Summary
References
1. Introduction
Data mining can be defined as the process of selecting, exploring and modeling large
amounts of data to uncover previously unknown patterns. In the insurance industry,
data mining can help firms gain business advantage. For example, by applying
data mining techniques, companies can fully exploit data about customers’ buying
patterns and behavior, and gain a greater understanding of their business,
to help reduce fraud, improve underwriting and enhance risk management.
This paper discusses how insurance companies can benefit by using modern data
mining methodologies and thereby reduce costs, increase profits, acquire new
customers, retain current customers and develop new products.
2. Opportunities and Challenges for Insurance Firms
It is likely that the recent financial crisis will have a dramatic effect
on the insurance industry. Insurance companies will look at reducing operational
inefficiencies to combat a global soft market, and the industry can expect aggressive
regulatory activities as a result of the crisis. This will include a renewed push for an
Optional Federal Charter, designed to regulate the US insurance industry at a federal
level, plus the impending Solvency II legislation that will affect European insurers.
Other opportunities and challenges facing insurers include:
• Changes in information technology.
• Globalization and market consolidation.
• Multichannel distribution.
Changes in Information Technology
As with other industries, the insurance industry has experienced many changes
in information technology over the years. Advances in hardware, software and
telecommunications have offered benefits, such as reduced costs and real-time
processing, resulting in increased potential for profit. These advances have also
resulted in new challenges, particularly in the area of increased competition.
Technological innovations, such as data mining and data warehousing, have greatly
reduced the cost of storing, accessing and processing data. Business questions that
were previously impossible, impractical or unprofitable to address due to the lack
of availability or reliability of data can now be answered using data mining solutions.
For example, a common business question is, “How can insurance firms retain their
best customers?” Through data mining technology, insurance firms can tailor rates
and services to meet the customer’s needs, and over time, more accurately correlate
rates to the customer behaviors that increase exposure.
Globalization and Market Consolidation
The emergence of the insurance markets in Eastern Europe, India, Brazil and other
developing countries provides the opportunity for mature insurance organizations to
not only diversify geographically, but also identify new target markets for products
and services. However, one of the biggest challenges in expanding into new global
markets is navigating each country’s distinct regulatory environment.
The insurance industry has been in a consolidation mode for the past several years
and this is not expected to change in the near future; in fact, we may begin to see
some large acquisitions and the emergence of megacarriers.
Multichannel Distribution
Technology is changing the traditional insurance selling and distribution model.
Insurance companies are beginning to implement multichannel integration strategies
as people are buying insurance via the Internet and insurance aggregators.
To either survive or thrive, insurance firms will need sophisticated data warehousing,
data mining and reporting software that enables “deep dives” into their customer and
distribution data, allowing them to answer questions such as:
• “When is an agent most likely to leave?”
• “When and how are my sales channels most effective?”
3. Using Data Mining in the Insurance Industry
Data mining methodology often can improve upon traditional statistical approaches
to solving business problems. For example, linear regression may be used to solve
a problem because insurance industry regulators require easily interpretable models
and model parameters. Data mining often can improve existing models by finding
additional, important variables, identifying interaction terms and detecting nonlinear
relationships. Models that predict relationships and behaviors more accurately lead
to greater profits and reduced costs.
Specifically, data mining can help insurance firms in business practices such as:
• Optimizing products and pricing.
• Acquiring new customers.
• Retaining existing customers.
• Performing sophisticated campaign management.
• Detecting fraudulent claims.
• Estimating outstanding loss reserves.
3.1 Products and Pricing Optimization
As a result of changing demographics, economic factors and customer buying
habits, it is critical that insurance companies identify and monitor the varying needs
of their customers and adjust their product portfolio. Problems with profitability can
occur if firms do not offer the right policy at the right rate to the right customer
segment at the right time.
For example, the most profitable customer segment might be higher-risk customers,
which may command higher rates.
An important problem in actuarial science concerns rate setting or the pricing of
each policy. The goal is to set rates that reflect the risk level of the policyholder by
establishing the break-even rate (premium) for the policy. The lower the risk, the
lower the rate.
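As a rough illustration of the break-even rate, the sketch below uses a common actuarial simplification that the paper itself does not spell out: the break-even premium is taken as expected claim frequency multiplied by expected claim severity, ignoring expenses and investment income. The function name and all figures are hypothetical.

```python
def break_even_premium(claim_frequency: float, average_severity: float) -> float:
    """Expected annual claim cost per policy (frequency x severity).

    Simplifying assumption: no expenses, profit loading or investment income.
    """
    if claim_frequency < 0 or average_severity < 0:
        raise ValueError("frequency and severity must be non-negative")
    return claim_frequency * average_severity


# Hypothetical segments: same average claim size, different claim frequency.
low_risk = break_even_premium(claim_frequency=0.05, average_severity=4000.0)
high_risk = break_even_premium(claim_frequency=0.12, average_severity=4000.0)
```

Under these assumed inputs the lower-risk segment gets the lower break-even rate, which is the relationship the paragraph above describes.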
Identify Risk Factors that Predict Profits, Claims and Losses
The critical question in rate making is the following: “What are the risk factors or
variables that are important for predicting the likelihood of claims and the size
of a claim?” Although many risk factors that affect rates are obvious, subtle and
nonintuitive relationships can exist among variables that are difficult, if not impossible,
to identify without applying more sophisticated analyses. Modern data mining models
can more accurately predict risk, so insurance companies can set rates more
accurately, which in turn results in lower costs and greater profits. For example, in
the automobile insurance industry, companies are beginning to use telematic devices
that take into consideration the driver’s driving habits, as well as other factors such
as environmental and weather-related risk information.
Creating Geographic Exposure Reports
Insurance firms also can augment their business and demographic databases with
sociogeographic data, which is also referred to as spatial attribute data or
latitude/longitude data. The reason for augmenting existing data with sociogeographic data
is that the social profile, including geographic location of potential customers, can
be an important risk factor in the rate-making model. For example, driving conditions
and the likelihood of accidents and auto thefts vary across geographic regions.
Differences in risk factors indicate differences in the likelihood of claims, expected
claim amounts, and ultimately, rates.
Including purely geographic data in a data warehouse enables the insurance firm to
create digital maps. Business analysts can overlay (or map) the data, then assess
and monitor exposure by geographic region. Such data processing and data
mapping capabilities are not merely for the purpose of plotting the geographic data
for display. Instead, the data also can be included in rate making and other analytical
models. If an area of overexposure is identified, then the risk can be mitigated,
possibly by rate adjustment or reinsurance.
Figure 1: A geospatial mapping of claims data
Improve Predictive Accuracy – Segment Databases
To improve predictive accuracy, databases can be segmented into more
homogeneous groups. Then the data of each group can be explored, analyzed and
modeled. Depending on the business question, segmentation can be done using
variables associated with risk factors, profits or behaviors. Segments based on these
types of variables often provide sharp contrasts, which can be interpreted more
easily. As a result, actuaries can more accurately predict the likelihood of a claim and
the amount of the claim.
For example, one insurance company found that a segment of the 20- to 25-year-old
male drivers had a noticeably lower accident rate than the entire group of 20- to
25-year-old males. What variable did this subgroup share that could explain the
difference? Investigation of the data revealed that the members of the lower risk
subgroup drove cars that were significantly older than the average and that the
drivers of the older cars spent time customizing their “vintage autos.” As a result,
members of the subgroup were likely to be more cautious driving their customized
automobiles than others in their age group.
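The kind of comparison described in this example can be sketched in a few lines. The records, the field names (car_age, had_accident) and the 15-year cutoff below are all hypothetical; the point is only to show an overall rate being split into more homogeneous segments.

```python
from collections import defaultdict


def accident_rate_by_segment(records, segment_key):
    """Return {segment: accidents / drivers} for the given grouping function."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [accidents, drivers]
    for rec in records:
        seg = segment_key(rec)
        totals[seg][0] += rec["had_accident"]
        totals[seg][1] += 1
    return {seg: acc / n for seg, (acc, n) in totals.items()}


# Hypothetical records for 20- to 25-year-old male drivers.
drivers = [
    {"car_age": 22, "had_accident": 0},
    {"car_age": 25, "had_accident": 0},
    {"car_age": 18, "had_accident": 1},
    {"car_age": 30, "had_accident": 0},
    {"car_age": 3, "had_accident": 1},
    {"car_age": 5, "had_accident": 0},
    {"car_age": 2, "had_accident": 1},
    {"car_age": 7, "had_accident": 1},
]

# Accident rate for the whole age group, then split by an assumed car-age cutoff.
overall = sum(d["had_accident"] for d in drivers) / len(drivers)
by_car_age = accident_rate_by_segment(
    drivers, lambda d: "older car" if d["car_age"] >= 15 else "newer car"
)
```

With this made-up data the "older car" segment shows a lower accident rate than the group as a whole, mirroring the contrast the insurer found.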
Figure 2: A data exploration diagram for the distribution of educational degrees. The
shaded area represents the amount of claims filed. Notice that the higher the education
level, the lower the percentage of claims. Claimants with a PhD had the lowest rate, while
those without high school diplomas had the highest rate.
3.2 Acquiring New Customers
Another important business problem is the acquisition of new customers. Although
traditional approaches involve attempts to increase the customer base by simply
expanding the efforts of the sales department, sales efforts that are guided by more
quantitative data mining approaches can lead to more focused and more successful
results.
Focusing Marketing Strategy to Reach Targeted Group
A traditional sales approach is to increase the number of policyholders by simply
targeting those who meet certain policy constraints (illustrated in Figure 3). A
drawback to this approach is that much of the marketing effort may yield little return.
At some point, sales become more difficult and greater marketing budgets lead to
lower and lower returns.
Figure 3: Marketing to all those meeting policy criteria
Increasing a Marketing Campaign’s Return on Investment
In contrast to the traditional sales approach, data mining strategies enable analysts
to refine the marketing focus. For example, the focus could be refined by maximizing
the lifetime value of policyholders; that is, the profits expected from policyholders
over an extended period of time. Thus as Figure 4 illustrates, the crucial marketing
question becomes, “Who of those meeting the criteria are most likely to actually
purchase a policy?”
Figure 4: Marketing to those most likely to purchase
Because only the segmented group of those likely to purchase is targeted, the return
per unit of marketing effort is greater.
Can even better results be obtained? In other words, as more data is collected,
can better models be developed and can marketing efforts be focused further?
To sharpen the focus, analysts in the insurance industry can utilize advanced data
mining techniques that combine segmentations to group (or profile) the high
lifetime-value customer and produce predictive models to identify those in this group who are
likely to respond.
For example, perhaps the first group for the marketing campaign is made up of
those who meet the policy criteria, are likely to purchase and are likely to remain
loyal by not switching to another company (as illustrated in Figure 5). Segmenting
the universe of potential customers to focus on specific groups can make marketing
campaigns more efficient and further increase the return per unit of marketing effort.
[Figure 3 diagram labels: Existing and potential customers; Those meeting policy criteria]
[Figure 4 diagram labels: Existing and potential customers; Those meeting policy criteria; Profile of those most likely to purchase]
Figure 5: Marketing to those most likely to retain policies
Companies can increase response rates and profitability by first targeting those
prospects that have characteristics similar to those of the high lifetime-value
customers. Moreover, additional data mining work could be performed to identify
the best time of day, the best season and best media for marketing to the
targeted group.
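One simple way to picture this kind of targeting, offered only as a sketch under stated assumptions, is to rank prospects by purchase probability multiplied by expected lifetime value and contact the top of the list. The probabilities and lifetime values below are hypothetical inputs that would normally come from fitted models; none of the names come from the paper.

```python
def expected_value(prospect):
    """Assumed score: probability of purchase times expected lifetime value."""
    return prospect["p_purchase"] * prospect["lifetime_value"]


def select_targets(prospects, budget):
    """Return the ids of the `budget` prospects with the highest expected value."""
    ranked = sorted(prospects, key=expected_value, reverse=True)
    return [p["id"] for p in ranked[:budget]]


# Hypothetical prospects who already meet the policy criteria.
prospects = [
    {"id": "P1", "p_purchase": 0.10, "lifetime_value": 5000.0},
    {"id": "P2", "p_purchase": 0.40, "lifetime_value": 2000.0},
    {"id": "P3", "p_purchase": 0.05, "lifetime_value": 3000.0},
]

targets = select_targets(prospects, budget=2)
```

Ranking by the product rather than by probability alone keeps a high-value but less certain prospect (P1 here) ahead of a low-value one.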
3.3 Retaining Existing Customers
As acquisition costs increase, insurance companies are beginning to place a greater
emphasis on customer retention programs. Experience shows that a customer
holding two policies with the same company is much more likely to renew than is
a customer holding a single policy. Similarly, a customer holding three policies is
less likely to switch than a customer holding fewer than three. By offering quantity
discounts and selling bundled packages to customers, such as home and auto
policies, a firm adds value and thereby increases customer loyalty, reducing the
likelihood the customer will switch to a rival firm.
Analyze at the Customer Level
Successfully retaining customers requires analyzing data at the most appropriate
level, the customer level. Unfortunately, the insurance industry tends to be a laggard
and in most cases continues to analyze profitability at a policy level.
Using a data mining technique called association analysis, insurance firms can more
accurately select which policies and services to offer to which customers. With
this technique, insurance companies can perform sequential (over time) market
basket analyses on customer segments. For example, what percentage of new
auto insurance policyholders also purchases a homeowners insurance policy within
five years?
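The question above can be answered with a small sequential calculation. The sketch below is a minimal illustration with hypothetical customer records and product names; it computes the share of customers who bought one product and then a second within a time window, which corresponds to the confidence of a sequential association rule.

```python
from datetime import date


def share_buying_within(purchases, first, then, max_years):
    """Share of customers who bought `then` within `max_years` after `first`.

    `purchases` maps customer id -> {product name: purchase date}.
    """
    base = [p for p in purchases.values() if first in p]
    if not base:
        return 0.0
    hits = 0
    for p in base:
        if then in p:
            gap_days = (p[then] - p[first]).days
            if 0 <= gap_days <= max_years * 365.25:
                hits += 1
    return hits / len(base)


# Hypothetical purchase histories.
purchases = {
    "c1": {"auto": date(2010, 1, 15), "home": date(2012, 6, 1)},
    "c2": {"auto": date(2011, 3, 1)},
    "c3": {"auto": date(2009, 5, 20), "home": date(2016, 5, 20)},  # 7 years later
    "c4": {"home": date(2013, 2, 2)},  # never bought auto, so not in the base
}

rate = share_buying_within(purchases, "auto", "home", max_years=5)
```

Here three customers hold an auto policy and one of them added a homeowners policy within five years, so the rate is one third.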
[Figure 5 diagram labels: Existing and potential customers; Those meeting policy criteria; Profile of those most likely to purchase; Profile of those most likely to remain loyal (for example, retain policy for more than 10 years)]
Aim Retention Campaigns at Those Most Likely to Switch Firms
Database segmentation and more advanced modeling techniques enable analysts
to more accurately choose whom to target for retention campaigns. Current
policyholders that are likely to switch can be identified through predictive modeling.
A logistic regression model is a traditional approach, and those policyholders who
have larger predicted probabilities of switching are the target group.
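To make the logistic regression approach concrete, the sketch below scores policyholders with an already-fitted model and selects those above a cutoff. The coefficients, feature names and 0.5 threshold are hypothetical and are not taken from the paper; fitting the model is outside the scope of this example.

```python
import math


def switch_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-fitted coefficients (assumed for illustration only).
COEFFICIENTS = {"num_policies": -0.8, "recent_rate_increase": 1.2}
INTERCEPT = 0.0

policyholders = {
    "A": {"num_policies": 1, "recent_rate_increase": 1},
    "B": {"num_policies": 3, "recent_rate_increase": 0},
    "C": {"num_policies": 2, "recent_rate_increase": 1},
}

scores = {
    pid: switch_probability(x, COEFFICIENTS, INTERCEPT)
    for pid, x in policyholders.items()
}

# Target group: policyholders whose predicted switching probability is at least 0.5.
target_group = [
    pid
    for pid, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if p >= 0.5
]
```

The negative coefficient on num_policies reflects the earlier observation that customers holding more policies are less likely to switch; its size is an assumption.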
Identifying the target group may be improved by modeling the behavior of
policyholders. By including nonlinear terms and more interaction terms, neural
network models can generate more accurate estimates of the probability of policyholders
switching. Additionally, decision tree models may provide more accurate identification
by dividing (segmenting) the policyholders into more homogeneous groups. Greater
accuracy in identifying the target group can reduce costs and has the potential for
greatly improving the results of a retention campaign.
3.4 Performing Sophisticated Campaign Management
Developing a customer relationship has a long-standing tradition in business.
Small firms and many retailers are able to relate to their customers individually.
However, as organizations grow larger, marketing departments often begin to think
in terms of product development instead of customer relationship development and
maintenance. It is not unusual for the sales and marketing units to focus on how
fast the firm can bring a mass-appeal product to market rather than how they might
better serve the needs of the individual customer.
Ultimately, the difficulty is that as markets become saturated, the effectiveness
of mass marketing slows or halts completely. Fortunately, advanced data mining
technology enables insurance companies to return their marketing focus to the
aspects of individual customer loyalty. Creative, data-driven, scientific marketing
strategies are now paving the way back to the customer relationship management of
simpler, efficient economies, while on a much grander, comprehensive scale.
A Customer-Centric Focus
Many leading insurance companies are making an effort to move away from the
product-oriented architectures of the past and toward a customer-centric focus
to better serve their customers. Data mining technology can be utilized to better
understand customers’ needs and desires. Analysis of marketing campaigns
provides in-depth feedback and serves as the foundation of future campaign
development.
Marketing – Another Frontier of Automation
To know your customers in detail - their needs, desires and responses - is now
possible through the power of data mining methodologies and supporting computer
technologies. Campaign management solutions consist of graphical tools that enable
marketers to analyze customer data, analyze previous marketing campaigns, design
new campaigns, assess the new campaigns prior to implementation, monitor the
campaigns as they are presented and evaluate the effectiveness of the completed
campaigns.
Customer-centric marketing is accomplished by integrating data mining and
campaign management. The integration sets up a cyclical relationship in that data
mining analysts can develop and test customer behavior models as required by marketers.
Then, marketers can use the models to predict customer behavior. By assessing
predicted customer behavior, marketing professionals can further refine the
marketing campaigns.
Campaign management tools should support the entire direct marketing life cycle
- analysis, planning, execution and evaluation - not just one or two phases of the
life cycle. Unfortunately, many campaign management products contain separate
tools embedded in the system for querying, updating, analyzing and extracting
data. Often, these systems require significant customizations to accommodate the
promotional campaign strategies of individual enterprises. The best solution is one
that integrates the major functional areas of data access, data warehousing and data
mining, as well as campaign management. If the tools are not explicitly designed to
work together, problems of portability and throughput can arise. In most cases, a
workable piecemeal solution may be far from a good overall solution.
Effective Campaign Management
The integration of data access, data warehousing, data mining and campaign
management technologies enables marketing professionals to utilize pre-established
data mining models within the context of their campaign management system.
Marketers are able to select from a list of such models and apply the model to
selected target subsets identified using the campaign management system. The
scoring code is typically executed on the selected subset within the data mining
product, and the scored file¹ is then returned to the marketing professional, enabling
marketing to further refine their target-marketing campaigns. This form of integration
is commonly referred to as “dynamic scoring” because it reflects the real-time
execution of the scoring code.
¹ Scored file refers to a data set in which values for a target variable have been predicted by a model.
Scored data sets consist of a set of posterior probabilities for each level of a (noninterval level) variable.
3.5 Detecting Fraudulent Claims
Obviously, fraudulent claims are an ever-present problem for insurance firms, and
techniques for identifying and mitigating fraud are critical for their long-term success.
Quite often, successful fraud detection analyses, such as those from a data mining
project, can provide a very high return on investment.
Better Fraud Detection Results in Lower Cost of Claims
Fraud is a thriving industry. For example, it is estimated that 10 percent of claims
are fraudulent, costing the US property and casualty industry $30 billion per
year. The sheer magnitude of the fraud problem implies that firms able to detect
fraudulent claims, or better prevent fraudulent claims, are in a position to offer more
competitively priced products, reduce costs and maintain long-term profitability.
Just Random Chance or Is There a Pattern?
Fraudulent claims are typically not the biggest claims, because perpetrators are
well aware that the big claims are scrutinized more rigorously than average claims.
Perpetrators of fraud use more subtle approaches. As a result, when searching
for fraudulent claims, analysts must look for unusual associations, anomalies or
outlying patterns in the data. Specific analytical techniques adept at finding such
subtleties are social network link analysis, market basket analysis, cluster analysis
and predictive modeling. For example, AXA OYAK (a Turkish insurer) used SAS to
segment customer data by uncovering certain relationships between data sets,
which are red flags for fraud-related losses. Using this technique, AXA OYAK
discovered that 5 percent of its claims payouts were fraudulent, which can now be
corrected and prevented in the future.
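As a deliberately simple stand-in for the techniques named above (it is not link analysis or cluster analysis), the sketch below flags values that lie far from the mean in standard-deviation terms. The data, the threshold of 2.0 and the idea of counting claims per repair shop are all hypothetical.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical number of claims routed through each repair shop in one region.
claims_per_shop = [4, 5, 6, 5, 4, 6, 5, 5, 4, 21]

suspicious = flag_outliers(claims_per_shop)
```

A flagged index is only a prompt for investigation; an unusual count is not in itself evidence of fraud.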
Figure 6: A series of cluster analysis diagrams that represent the difference between
members of a cluster and the overall population.
3.6 Estimating Outstanding Loss Reserves
The settlement of claims is often subject to delay. For example, in liability insurance,
the severity (magnitude) of a claim may not be known until years after the claim
is reported. In cases of an employer’s liability, there is even a delay involved in
reporting the claim. Such delays may result in non-normal distribution of claims;
specifically, skewed distributions and long-tailed distribution across time and across
business classes.
Still the everyday running of the firm must continue, and an estimate of the claim
severity is often used until the actual value of the settled claim is available. The
estimate can depend on the following:
• Severity of the claim.
• Likely amount of time before settlement.
• Effects of financial variables, such as inflation and interest rates.
Predicting Actual Settlement Values
A loss reserve, which is necessary for continued business operations, is developed
by estimating insurance claims. The accuracy of the loss reserve is important
because the funds set aside for paying claims typically cannot be invested in long-term,
higher-yielding assets. If the loss reserve is too small, the firm may experience
financial problems. Conversely, if the loss reserve is too large, the firm may become
unprofitable. Thus, the more accurate the estimation, the greater opportunity for
profit. The analysis of the distribution of claims across customers, geography and
time leads to better estimates of the loss reserve.
Data mining technology can be utilized to establish the distribution of claims and
the pattern of past claim runoffs. The data is analyzed and modeled, and when
a predictive model is developed, the current outstanding claims are scored.
Specifically, the model parameters and the claims data are used to predict the
magnitude of the actual settlement value of the outstanding claims. This estimate of
the actual settlement value can be used to develop a loss reserve.
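The step from predicted settlement values to a reserve can be sketched as follows. The prediction rule here is a placeholder assumption, not the paper's model: the current case estimate is simply inflated over the expected years to settlement. The reserve is then the sum of predicted settlements less amounts already paid. All field names and figures are hypothetical.

```python
def predicted_settlement(case_estimate, years_to_settlement, inflation_rate):
    """Assumed rule: inflate the case estimate until the expected settlement date."""
    return case_estimate * (1.0 + inflation_rate) ** years_to_settlement


def loss_reserve(outstanding_claims, inflation_rate):
    """Sum of predicted settlement values minus payments already made."""
    return sum(
        predicted_settlement(
            c["case_estimate"], c["years_to_settlement"], inflation_rate
        )
        - c["paid_to_date"]
        for c in outstanding_claims
    )


# Hypothetical outstanding claims.
claims = [
    {"case_estimate": 10000.0, "years_to_settlement": 2, "paid_to_date": 1000.0},
    {"case_estimate": 5000.0, "years_to_settlement": 0, "paid_to_date": 0.0},
]

reserve = loss_reserve(claims, inflation_rate=0.10)
```

In practice the placeholder rule would be replaced by the scored output of whatever predictive model has been fitted to past claim runoffs.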
The estimate of the loss reserve generated from a predictive model is based on the
assumption that the future will be much like the past. If the model is not updated
over time, then the assumption becomes that the future will be much like the distant
past. However, as more data becomes available, the predictive data mining model
can be updated and the assumption becomes that the future will be much like the
recent past.
Data mining technology enables insurance analysts to compare models and
to assess them based on their performance. When the newly updated model
outperforms the old model, it is time to switch to the new model. Given the new
technologies, analysts can now monitor predictive models and update as needed.
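A minimal way to compare an old and an updated model is to score both against recently settled claims and keep whichever has the smaller error. The choice of mean absolute error as the performance measure is an assumption made for this sketch, and the figures are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual settlement values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical recently settled claims and each model's earlier predictions.
actual = [1000.0, 2500.0, 400.0]
old_model = [1500.0, 2000.0, 700.0]
new_model = [1100.0, 2400.0, 450.0]

# Switch only if the updated model is more accurate on the recent claims.
switch_to_new = mean_absolute_error(new_model, actual) < mean_absolute_error(
    old_model, actual
)
```

A profit-based measure could be substituted for the error function without changing the structure of the comparison.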
4. Implementing Data Mining Projects
Much has been written about the best way to implement data mining projects.
A common message found in many of these works is that implementing a data
mining project must take into account real-world, practical challenges. A data-centric
approach is especially effective and can be divided into the following functional areas:
• Access the data.
• Warehouse the data.
• Analyze the data.
• Report the results of the analysis.
• Exploit the results for business advantage.
4.1 Accessing the Data
Reliable, accurate data is a prerequisite for data mining. A complete data access
strategy should include the following key elements:
• Access to any or all types of data sources.
• Access to data sources regardless of the platform on which they reside.
• Preservation of the source data through the use of security routines.
• An easy-to-use GUI that does not require extensive knowledge of each data type
and provides the flexibility to meet the specific needs.
• Integration with the existing technology rather than access routines that require
retooling of hardware and/or software or extensive, additional learning by users.
A properly designed and implemented data warehouse can help accomplish these
key elements of a data access strategy.
4.2 Warehousing the Data
A data warehouse enables insurance industry researchers to easily access data that
might be stored in various data tables or data sets across a wide variety of platforms.
Through data warehouses, research analysts can merge and aggregate data into
subject areas. However, prior to analysis, data that contains errors, missing values
or other problems needs to be cleaned. One approach to cleaning data is to simply
delete cases that contain missing values. However, the results can be biased because
the deleted data may have otherwise been an important part of various relationships.
Major data cleaning tasks - such as making variable names consistent, imputing
missing values, identifying errors, correcting errors and detecting outliers - can be
performed relatively easily using data mining technology.
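The difference between deleting incomplete cases and imputing missing values can be seen in a small sketch. Mean imputation is used here only because it is the simplest option to show; it carries its own distortions and is not presented as the method the paper recommends. The rows and field names are hypothetical.

```python
from statistics import mean


def drop_incomplete(rows, field):
    """Listwise deletion: keep only rows where `field` is present."""
    return [r for r in rows if r[field] is not None]


def impute_mean(rows, field):
    """Return copies of the rows with missing `field` values set to the observed mean."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [dict(r, **{field: fill if r[field] is None else r[field]}) for r in rows]


# Hypothetical policy records; None marks a missing value.
rows = [
    {"policy_id": 1, "claim_amount": 1200.0},
    {"policy_id": 2, "claim_amount": None},
    {"policy_id": 3, "claim_amount": 800.0},
]

complete_cases = drop_incomplete(rows, "claim_amount")
imputed = impute_mean(rows, "claim_amount")
```

Deletion loses policy 2 entirely, while imputation keeps the record and its other attributes available for analysis.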
Good data warehousing tools can vastly improve the productivity of the data mining
team. Important results are obtained faster and often at much lower cost.
4.3 Analyzing Data Using the SEMMA Methodology
Even after data is merged and aggregated into subject areas, highly complex
relationships often are not easily seen by visually inspecting the raw data or
by applying basic statistical analyses. Actual patterns may appear random or
go undetected. Additionally, linear models may poorly characterize nonlinear
relationships. Modern data mining technology can overcome these limitations by
employing approaches such as the following:
• Sophisticated GUI data exploration and plotting tools to better display
relationships among variables.
• Variable selection methodologies to identify the most important variables to
include in models.
• Advanced modeling techniques, such as linear models with interactions.
• Nonlinear neural networks and tree models.
• Assessment techniques to assist analysts in selecting the best performing model
based on profit and loss criteria.
Once accessed, the data can be explored using GUIs that utilize sophisticated data
mining algorithms. For example, subsetting data can reveal important relationships
for marketing campaigns. Disaggregating along region and firm might reveal costly
anomalies of operations. Drilling down into the data might reveal missed profit
opportunities.
The actual data analyses for data mining projects involve selecting, exploring and
modeling large amounts of data to uncover hidden information that can then be used
for business advantage. The answer to one question often leads to new and more
specific questions. Hence, data mining is an iterative process and the data mining
methodology should incorporate this iterative, exploratory approach.
To provide a predictable yet flexible path for the data mining analysis to follow, SAS
developed a data mining analysis cycle known as SEMMA. This acronym stands
for the five analysis steps that are ordinarily part of a data mining project. Those five
steps are:
• Sample.
• Explore.
• Modify.
• Model.
• Assess.
Beginning with a representative sample, the SEMMA analysis cycle guides analysts
through the process of exploring data using visual and statistical techniques,
transforming data to uncover the most significant predictive variables, modeling the
variables to predict outcomes and assessing the model by testing it with new data.
Thus, the SEMMA analysis cycle is a modern extension of the scientific method.
4.3.1 Sample
The first step in a data mining analysis methodology is to create one or more data
tables by sampling data from the data warehouse. The samples should be big
enough to contain the significant information, yet small enough to process quickly.
This approach enables the most cost-effective performance by using a reliable,
statistically representative sample of the entire database. Mining a representative
sample instead of the whole volume drastically reduces the processing time required
to get crucial business information.
If general patterns appear in the data as a whole, these will be traceable in a
representative sample. If a niche is so tiny that it is not represented in a sample
and yet so important that it influences the big picture, it can be discovered using
summary methods.
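The sampling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the "warehouse" is a synthetic list of claim amounts, the 5% sampling fraction and the one-fifth holdout are arbitrary choices, and simple random sampling stands in for whatever sampling scheme a real project would use.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical "warehouse": 100,000 claim amounts from a skewed distribution.
population = [rng.lognormvariate(7.0, 1.0) for _ in range(100_000)]

# Simple random sample of 5% of the rows, drawn without replacement.
sample = rng.sample(population, k=5_000)

# Set aside part of the sample now so it can be used later to assess models.
holdout_size = len(sample) // 5
holdout, training = sample[:holdout_size], sample[holdout_size:]

# Compare a summary statistic on the sample with the same statistic on all rows.
population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
relative_error = abs(sample_mean - population_mean) / population_mean

print(len(training), len(holdout))
print(round(population_mean, 2), round(sample_mean, 2), round(relative_error, 4))
```

The sample mean should land close to the mean of the full table while touching only a fraction of the rows, which is the trade-off the section describes.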
4.3.2 Explore
After sampling the data, the next step is to explore them visually or numerically
for inherent trends or groupings. Exploration helps refine the discovery process.
If visual exploration does not reveal clear trends, analysts can explore the data
through statistical techniques, including factor analysis, correspondence analysis and
clustering. For example, new parents are often more acutely aware of their need for
life insurance, but may be seeking the most insurance for the least amount of money.
This group may be more likely to respond to direct mailings for term insurance.
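A simple numerical exploration of the new-parent example might compare response rates across customer groups. The sketch below uses a small invented mailing history; the segment names and outcomes are hypothetical, and a cross-tabulation like this is only a starting point compared with the factor analysis or clustering mentioned above.

```python
from collections import Counter

# Hypothetical mailing history: (household segment, responded to term-life mailing).
history = [
    ("new_parent", True), ("new_parent", True), ("new_parent", False),
    ("new_parent", True), ("other", False), ("other", False),
    ("other", True), ("other", False), ("other", False), ("other", False),
]

# Count mailings sent and responses received per segment.
mailed = Counter(segment for segment, _ in history)
responded = Counter(segment for segment, answered in history if answered)

# Response rate per segment; a large gap suggests a grouping worth modeling.
response_rate = {segment: responded[segment] / mailed[segment] for segment in mailed}

for segment, rate in sorted(response_rate.items()):
    print(f"{segment}: {rate:.2f}")
```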
4.3.3 Modify
Modifying the data refers to creating, selecting and transforming one or more
variables to focus the model selection process in a particular direction or to augment
the data for clarity or consistency.
Based on the discoveries in the exploration phase, analysts may need to modify the
data to include information (e.g., grouping customers and significant subgroups) or
to introduce new variables, such as a ratio obtained by comparing two previously
defined variables. Analysts may also need to look for outliers and reduce the number
of variables to narrow them down to the most significant ones. In addition, because
data mining is a dynamic, iterative process, there often is a need to modify data when
the previously mined data changes in some way.
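Two of the modifications named here, introducing a ratio variable and looking for outliers, can be sketched as follows. The policy rows are invented, and the 1.5 × IQR rule is just one common convention for flagging outliers; the source does not say which rule to use.

```python
import statistics

# Hypothetical policy rows: (policy id, claims paid, annual premium).
policies = [
    (1, 200.0, 1000.0), (2, 440.0, 2000.0), (3, 240.0, 1000.0),
    (4, 500.0, 2000.0), (5, 260.0, 1000.0), (6, 280.0, 1000.0),
    (7, 600.0, 2000.0), (8, 320.0, 1000.0), (9, 5000.0, 1000.0),
]

# New variable: a ratio obtained from two previously defined variables.
claims_to_premium = {pid: claims / premium for pid, claims, premium in policies}

# Flag outliers with the 1.5 * IQR rule (an assumed convention, not the only one).
q1, _, q3 = statistics.quantiles(claims_to_premium.values(), n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_ids = [pid for pid, ratio in claims_to_premium.items() if not low <= ratio <= high]

print(outlier_ids)
```

Whether a flagged policy is a data error or a genuinely large claim is a judgment for the analyst; the sketch only surfaces candidates for review.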
4.3.4 Model
Creating a data model involves using the data mining software to search
automatically for a combination of data that reliably predicts a desired outcome.
After the data has been accessed and modified, analysts can use data modeling
techniques to construct models that explain patterns in the data. Modeling
techniques in data mining include neural networks, tree-based models,
logistic models and other statistical models, such as time series analysis and
survival analysis.
Figure 7: A section of a model that predicts automobile claims using a neural network, decision tree and logistic regression model.
Each type of model has particular strengths, and is appropriate within specific data
mining situations depending on the data. For example, neural networks are good at
combining information from many predictors without overfitting, and therefore work
well when many of the predictors are partially redundant.
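To make the modeling step concrete, here is a minimal logistic model fitted by batch gradient descent in plain Python. It is a teaching sketch only: the training rows, the two predictors, the learning rate and the number of epochs are all assumptions, and it does not represent how SAS Enterprise Miner fits its models.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            x = [1.0] + list(features)
            error = sigmoid(sum(w * v for w, v in zip(weights, x))) - label
            for i, value in enumerate(x):
                gradient[i] += error * value
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

def predict(weights, features):
    """Return the predicted probability of a claim for one policyholder."""
    x = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))

# Hypothetical training data: (prior claims, annual mileage in units of 10,000).
rows = [(0, 0.5), (0, 1.0), (1, 0.8), (0, 1.2), (2, 1.5), (3, 1.0), (2, 2.0), (3, 2.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = filed a claim

weights = fit_logistic(rows, labels)
low_risk = predict(weights, (0, 0.6))
high_risk = predict(weights, (3, 2.2))
print(round(low_risk, 3), round(high_risk, 3))
```

A tree-based model or a neural network would be trained on the same inputs and compared with this one in the assessment step.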
4.3.5 Assess
The next step in data mining is to assess the model to determine how well it
performs. A common means of assessing a model is to apply it to a portion of the
data that was set aside during the sampling stage. If the model is valid, it should
work for this reserved sample as well as for the sample used to construct the model.
Figure 8: This is a ROC curve used to compare three claims models. The wider the
bow in the curve, the more accurate is the model. Curves such as this make model
comparison a simple matter of visual inspection.
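The visual comparison of ROC curves has a numerical counterpart: the area under the curve (AUC), which grows as the bow in the curve widens. The sketch below computes ROC points and AUC for two models on a held-out sample. The outcomes and scores are invented for illustration, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical held-out claims: true outcome and the score from two candidate models.
actual = [1, 1, 1, 0, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
model_b = [0.8, 0.7, 0.5, 0.6, 0.3, 0.4, 0.9, 0.2]

auc_a = auc(roc_points(model_a, actual))
auc_b = auc(roc_points(model_b, actual))
print(auc_a, auc_b)
```

An AUC near 1.0 corresponds to a curve that bows strongly toward the top-left corner, while a value near 0.5 corresponds to a curve close to the diagonal, meaning the model ranks claims little better than chance.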
Similarly, analysts can test the model against known data. For example, if one knows
which customers in a file had high retention rates and the model predicts retention,
analysts can check to see whether the model selects these customers accurately. In
addition, practical applications of the model, such as partial mailings in a direct mail
campaign, help prove its validity.
Although assessing the data models is the last step in the SEMMA methodology,
assessing the effectiveness of data models is often not the final step in an actual
implementation of SEMMA. Because SEMMA is a cycle, the internal steps are often
performed iteratively as needed within a particular data mining project.
4.4 Reporting the Results
Enterprise reporting capabilities are essential to making data useful by enabling users
to create and publish reports from data warehouses and other information sources.
Key features of a reporting system should include:
• Complete system integration.
• Simplified warehouse reporting.
• Ease of use through graphical user interfaces.
Modern reporting tools, such as those found in SAS Enterprise Miner, enable
business users to create, publish and print richly formatted reports from information
stored in their data warehouse. Through easy-to-use interfaces, users have the ability
to create graphs, tables, charts and text within a single report from their desktops.
4.5 Exploiting the Results for Business Advantage
The new information obtained from data mining can be incorporated into an
executive information or online analytical processing and reporting system, and then
disseminated as needed throughout the organization. The firm’s decision makers can
use the data mining results to answer important business-related questions such as,
“How can we increase the ROI of our marketing campaigns?” for strategic planning
and action. By exploiting data mining results in this manner, firms can better prepare
for long-term growth and improve their opportunities for long-term prosperity.
5. Summary
Insurance is a data-rich industry; unfortunately, most of that data is underutilized. The
key to gaining a competitive advantage in the insurance industry is found in analyzing
this data and gaining greater insight into the business. Insurance firms can unlock
the intelligence contained in their operational applications - like policy administration,
claims management and CRM solutions - through modern data mining technology.
Data mining uses predictive modeling, database segmentation, cluster analysis,
neural networks and combinations thereof to quickly answer crucial business
questions with greater accuracy. New products can be developed and marketing
strategies can be implemented, enabling the insurance firm to transform a wealth of
information into a wealth of predictability, stability and profits.
References
The Business Case for Data Mining in the Insurance Industry: Using Enterprise Miner
to Model Pure Premium and Establish Policy Rating Structures
Sanford Gayle, SAS Institute Inc., Cary, NC (1999)
SAS Institute Inc. World Headquarters +1 919 677 8000. To contact your local SAS office, please visit: www..om/of
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.