Analytics for the Insurance Industry_SAS

WHITE PAPER: Data Mining in the Insurance Industry - Solving Business Problems Using SAS® Enterprise Miner™ Software

8/3/2019 Analytics for the Insurance Industry_SAS

http://slidepdf.com/reader/full/analytics-for-the-insurance-industrysas 1/22

Data Mining in the Insurance Industry

Table of Contents

1. Introduction
2. Opportunities and Challenges for Insurance Firms
3. Using Data Mining in the Insurance Industry
3.1 Products and Pricing Optimization
3.2 Acquiring New Customers
3.3 Retaining Existing Customers
3.4 Performing Sophisticated Campaign Management
3.5 Detecting Fraudulent Claims
3.6 Estimating Outstanding Loss Reserves
4. Implementing Data Mining Projects
4.1 Accessing the Data
4.2 Warehousing the Data
4.3 Analyzing Data Using the SEMMA Methodology
4.3.1 Sample
4.3.2 Explore
4.3.3 Modify
4.3.4 Model
4.3.5 Assess
4.4 Reporting the Results
4.5 Exploiting the Results for Business Advantage
5. Summary
References

1. Introduction

Data mining can be defined as the process of selecting, exploring and modeling large amounts of data to uncover previously unknown patterns. In the insurance industry, data mining can help firms gain business advantage. For example, by applying data mining techniques, companies can fully exploit data about customers’ buying patterns and behavior – as well as gaining a greater understanding of their business to help reduce fraud, improve underwriting and enhance risk management.

This paper discusses how insurance companies can benefit by using modern data mining methodologies and thereby reduce costs, increase profits, acquire new customers, retain current customers and develop new products.

2. Opportunities and Challenges for Insurance Firms

It is likely that the impact of the recent financial crisis will have a dramatic effect on the insurance industry. Insurance companies will look at reducing operational inefficiencies to combat a global soft market, and the industry can expect aggressive regulatory activities as a result of the crisis. This will include a renewed push for an Optional Federal Charter, designed to regulate the US insurance industry at a federal level, plus the impending Solvency II legislation that will affect European insurers.

Other opportunities and challenges facing insurers include:

• Changes in information technology.

• Globalization and market consolidation.

• Multichannel distribution.

Changes in Information Technology

As with other industries, the insurance industry has experienced many changes in information technology over the years. Advances in hardware, software and telecommunications have offered benefits, such as reduced costs and real-time processing, resulting in increased potential for profit. These advances have also resulted in new challenges, particularly in the area of increased competition.

Technological innovations, such as data mining and data warehousing, have greatly reduced the cost of storing, accessing and processing data. Business questions that were previously impossible, impractical or unprofitable to address due to the lack of availability or reliability of data can now be answered using data mining solutions. For example, a common business question is, “How can insurance firms retain their best customers?” Through data mining technology, insurance firms can tailor rates and services to meet the customer’s needs, and over time, more accurately correlate rates to the customer behaviors that increase exposure.

Globalization and Market Consolidation

The emergence of the insurance markets in Eastern Europe, India, Brazil and other developing countries provides the opportunity for mature insurance organizations to not only diversify geographically, but also identify new target markets for products and services. However, one of the biggest challenges in expanding into new global markets is navigating each country’s distinct regulatory environment.

The insurance industry has been in a consolidation mode for the past several years and this is not expected to change in the near future; in fact, we may begin to see some large acquisitions and the emergence of megacarriers.

Multichannel Distribution

Technology is changing the traditional insurance selling and distribution model. Insurance companies are beginning to implement multichannel integration strategies as people are buying insurance via the Internet and insurance aggregators.

To either survive or thrive, insurance firms will need sophisticated data warehousing, data mining and reporting software that enables “deep dives” into their customer and distribution data, allowing them to answer questions such as:

• “When is an agent most likely to leave?”

• “When and how are my sales channels most effective?”

3. Using Data Mining in the Insurance Industry 

Data mining methodology often can improve upon traditional statistical approaches to solving business problems. For example, linear regression may be used to solve a problem because insurance industry regulators require easily interpretable models and model parameters. Data mining often can improve existing models by finding additional, important variables, identifying interaction terms and detecting nonlinear relationships. Models that predict relationships and behaviors more accurately lead to greater profits and reduced costs.

Specifically, data mining can help insurance firms in business practices such as:

• Optimizing products and pricing.

• Acquiring new customers.

• Retaining existing customers.

• Performing sophisticated campaign management.

• Detecting fraudulent claims.

• Estimating outstanding loss reserve.

3.1 Products and Pricing Optimization

As a result of changing demographics, economic factors and customer buying habits, it is critical that insurance companies identify and monitor the varying needs of their customers and adjust their product portfolio. Problems with profitability can occur if firms do not offer the right policy or rate to the right customer segment at the right time. For example, the most profitable customer segment might be higher-risk customers, which may command higher rates.

An important problem in actuarial science concerns rate setting or the pricing of each policy. The goal is to set rates that reflect the risk level of the policyholder by establishing the break-even rate (premium) for the policy. The lower the risk, the lower the rate.
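
As a rough illustration of the break-even idea, the sketch below computes a premium as expected claim frequency times expected claim size. The function name, the two risk profiles and all figures are hypothetical, and practical rate making would add loadings (expenses, profit margin) that this paper does not detail.

```python
def break_even_premium(claim_probability: float, expected_claim_size: float) -> float:
    """Expected annual loss for one policy: frequency times severity."""
    if not 0.0 <= claim_probability <= 1.0:
        raise ValueError("claim_probability must be between 0 and 1")
    return claim_probability * expected_claim_size


# Hypothetical risk profiles (illustrative numbers only).
low_risk = break_even_premium(claim_probability=0.02, expected_claim_size=5000.0)
high_risk = break_even_premium(claim_probability=0.08, expected_claim_size=5000.0)

print(f"low-risk break-even premium:  {low_risk:.2f}")
print(f"high-risk break-even premium: {high_risk:.2f}")
```

With the same expected claim size, the policyholder with the lower claim probability receives the lower break-even rate, which is the relationship the paragraph above describes.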

Identify Risk Factors that Predict Profits, Claims and Losses

The critical question in rate making is the following: “What are the risk factors or variables that are important for predicting the likelihood of claims and the size of a claim?” Although many risk factors that affect rates are obvious, subtle and nonintuitive relationships can exist among variables that are difficult, if not impossible, to identify without applying more sophisticated analyses. Modern data mining models can more accurately predict risk; therefore, insurance companies can set rates more accurately, which in turn results in lower costs and greater profits. For example, in the automobile insurance industry, companies are beginning to use telematic devices that take into consideration the driver’s driving habits, as well as other factors such as environmental and weather-related risk information.

Creating Geographic Exposure Reports

Insurance firms also can augment their business and demographic databases with sociogeographic data, which is also referred to as spatial attribute data or latitude/longitude data. The reason for augmenting existing data with sociogeographic data is that the social profile, including geographic location of potential customers, can be an important risk factor in the rate-making model. For example, driving conditions and the likelihood of accidents and auto thefts vary across geographic regions. Differences in risk factors indicate differences in the likelihood of claims, expected claim amounts, and ultimately, rates.

Including purely geographic data in a data warehouse enables the insurance firm to create digital maps. Business analysts can overlay (or map) the data, then assess and monitor exposure by geographic region. Such data processing and data mapping capabilities are not merely for the purpose of plotting the geographic data for display. Instead, the data also can be included in rate making and other analytical models. If an area of overexposure is identified, then the risk can be mitigated, possibly by rate adjustment or reinsurance.

Figure 1: A geospatial mapping of claims data

Improve Predictive Accuracy – Segment Databases

To improve predictive accuracy, databases can be segmented into more homogeneous groups. Then the data of each group can be explored, analyzed and modeled. Depending on the business question, segmentation can be done using variables associated with risk factors, profits or behaviors. Segments based on these types of variables often provide sharp contrasts, which can be interpreted more easily. As a result, actuaries can more accurately predict the likelihood of a claim and the amount of the claim.

For example, one insurance company found that a segment of the 20- to 25-year-old male drivers had a noticeably lower accident rate than the entire group of 20- to 25-year-old males. What variable did this subgroup share that could explain the difference? Investigation of the data revealed that the members of the lower risk subgroup drove cars that were significantly older than the average and that the drivers of the older cars spent time customizing their “vintage autos.” As a result, members of the subgroup were likely to be more cautious driving their customized automobiles than others in their age group.
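
The kind of drill-down described in this example can be sketched by comparing each segment's accident rate with the rate of the whole group. The records, field names and the vehicle-age split below are made up for illustration; they are not the insurer's data.

```python
from collections import defaultdict

# Hypothetical policy records for 20- to 25-year-old male drivers.
drivers = [
    {"vehicle_age_band": "older than 15 years", "had_accident": False},
    {"vehicle_age_band": "older than 15 years", "had_accident": False},
    {"vehicle_age_band": "older than 15 years", "had_accident": True},
    {"vehicle_age_band": "older than 15 years", "had_accident": False},
    {"vehicle_age_band": "15 years or newer", "had_accident": True},
    {"vehicle_age_band": "15 years or newer", "had_accident": True},
    {"vehicle_age_band": "15 years or newer", "had_accident": False},
    {"vehicle_age_band": "15 years or newer", "had_accident": True},
]


def accident_rate_by_segment(records, segment_key):
    """Return {segment value: share of records with an accident}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [accidents, total]
    for record in records:
        tally = counts[record[segment_key]]
        tally[0] += int(record["had_accident"])
        tally[1] += 1
    return {segment: acc / total for segment, (acc, total) in counts.items()}


overall_rate = sum(r["had_accident"] for r in drivers) / len(drivers)
segment_rates = accident_rate_by_segment(drivers, "vehicle_age_band")

print(f"overall accident rate: {overall_rate:.2f}")
for segment, rate in segment_rates.items():
    print(f"{segment}: {rate:.2f}")
```

A segment whose rate sits well below the overall rate is a candidate for the follow-up question asked above: what do its members share that explains the difference?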

Figure 2: A data exploration diagram for the distribution of educational degrees. The shaded area represents the amount of claims filed. Notice that the higher the education level, the lower the percentage of claims. Claimants with a PhD had the lowest rate, while those without high school diplomas had the highest rate.

3.2 Acquiring New Customers

Another important business problem is the acquisition of new customers. Although traditional approaches involve attempts to increase the customer base by simply expanding the efforts of the sales department, sales efforts that are guided by more quantitative data mining approaches can lead to more focused and more successful results.

Focusing Marketing Strategy to Reach Targeted Group

A traditional sales approach is to increase the number of policyholders by simply targeting those who meet certain policy constraints (illustrated in Figure 3). A drawback to this approach is that much of the marketing effort may yield little return. At some point, sales become more difficult and greater marketing budgets lead to lower and lower returns.

Figure 3: Marketing to all those meeting policy criteria

Increasing a Marketing Campaign’s Return on Investment

In contrast to the traditional sales approach, data mining strategies enable analysts to refine the marketing focus. For example, the focus could be refined by maximizing the lifetime value of policyholders; that is, the profits expected from policyholders over an extended period of time. Thus, as Figure 4 illustrates, the crucial marketing question becomes, “Who of those meeting the criteria are most likely to actually purchase a policy?”

Figure 4: Marketing to those most likely to purchase

Because only the segmented group of those likely to purchase is targeted, the return per unit of marketing effort is greater.

Can even better results be obtained? In other words, as more data is collected, can better models be developed and can marketing efforts be focused further? To sharpen the focus, analysts in the insurance industry can utilize advanced data mining techniques that combine segmentations to group (or profile) the high lifetime-value customer and produce predictive models to identify those in this group who are likely to respond.

For example, perhaps the first group for the marketing campaign is made up of those who meet the policy criteria, are likely to purchase and are likely to remain loyal by not switching to another company (as illustrated in Figure 5). Segmenting the universe of potential customers to focus on specific groups can make marketing campaigns more efficient and further increase the return per unit of marketing effort.

[Figure 3 diagram labels: Existing and potential customers; Those meeting policy criteria]

[Figure 4 diagram labels: Existing and potential customers; Those meeting policy criteria; Profile of those most likely to purchase]

 

Figure 5: Marketing to those most likely to retain policies

Companies can increase response rates and profitability by first targeting those prospects that have characteristics similar to those of the high lifetime-value customers. Moreover, additional data mining work could be performed to identify the best time of day, the best season and best media for marketing to the targeted group.

3.3 Retaining Existing Customers

As acquisition costs increase, insurance companies are beginning to place a greater emphasis on customer retention programs. Experience shows that a customer holding two policies with the same company is much more likely to renew than is a customer holding a single policy. Similarly, a customer holding three policies is less likely to switch than a customer holding fewer than three. By offering quantity discounts and selling bundled packages to customers, such as home and auto policies, a firm adds value and thereby increases customer loyalty, reducing the likelihood the customer will switch to a rival firm.

 Analyze at the Customer Level

Successfully retaining customers requires analyzing data at the most appropriate level, the customer level. Unfortunately, the insurance industry tends to be a laggard and in most cases continues to analyze profitability at a policy level.

Using a data mining technique called association analysis, insurance firms can more accurately select which policies and services to offer to which customers. With this technique, insurance companies can perform sequential (over time) market basket analyses on customer segments. For example, what percentage of new auto insurance policyholders also purchases a homeowners insurance policy within five years?
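
A question like this one reduces to a conditional share computed over purchase histories. The following is a minimal sketch using made-up customers and dates; the record layout and the 5 × 365-day window are assumptions for illustration, not a description of how SAS Enterprise Miner performs association analysis.

```python
from datetime import date, timedelta

# Hypothetical purchase histories: customer -> {product: first purchase date}.
purchases = {
    "C001": {"auto": date(2001, 3, 1), "home": date(2003, 6, 15)},
    "C002": {"auto": date(2000, 1, 10)},
    "C003": {"auto": date(2002, 7, 4), "home": date(2009, 1, 20)},
    "C004": {"home": date(2001, 5, 5)},
    "C005": {"auto": date(2004, 9, 30), "home": date(2005, 2, 1)},
}

FIVE_YEARS = timedelta(days=5 * 365)  # approximate window, ignores leap days


def share_buying_within(histories, first, second, window):
    """Share of `first` buyers who bought `second` on or after that date, within `window`."""
    first_buyers = [h for h in histories.values() if first in h]
    if not first_buyers:
        return 0.0
    followers = [
        h for h in first_buyers
        if second in h and timedelta(0) <= h[second] - h[first] <= window
    ]
    return len(followers) / len(first_buyers)


share = share_buying_within(purchases, "auto", "home", FIVE_YEARS)
print(f"auto buyers who added a home policy within five years: {share:.0%}")
```

Customers who never bought the first product are excluded from the denominator, so the result is a share of auto policyholders rather than of all customers.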

 

[Figure 5 diagram labels: Existing and potential customers; Those meeting policy criteria; Profile of those most likely to purchase; Profile of those most likely to remain loyal (for example, retain policy for more than 10 years)]

 Aim Retention Campaigns at Those Most Likely to Switch Firms

Database segmentation and more advanced modeling techniques enable analysts to more accurately choose whom to target for retention campaigns. Current policyholders that are likely to switch can be identified through predictive modeling. A logistic regression model is a traditional approach, and those policyholders who have larger predicted probabilities of switching are the target group.
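
The following is a minimal sketch of how a fitted logistic regression turns policyholder attributes into a switching probability and a target list. The coefficients, feature names and the 0.5 cutoff are invented for illustration; in practice they would come from a model estimated on historical data.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression.
INTERCEPT = -1.0
COEFFICIENTS = {
    "premium_increase_pct": 0.08,   # larger rate increases raise switching risk
    "policies_held": -0.9,          # more policies lower switching risk
    "complaints_last_year": 0.6,
}


def switch_probability(policyholder: dict) -> float:
    """Logistic function applied to the linear predictor."""
    linear = INTERCEPT + sum(
        weight * policyholder[name] for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


policyholders = {
    "P1": {"premium_increase_pct": 20, "policies_held": 1, "complaints_last_year": 2},
    "P2": {"premium_increase_pct": 5, "policies_held": 3, "complaints_last_year": 0},
    "P3": {"premium_increase_pct": 12, "policies_held": 1, "complaints_last_year": 1},
}

scores = {pid: switch_probability(p) for pid, p in policyholders.items()}

# Target the policyholders with the largest predicted probabilities.
target_group = sorted(
    (pid for pid, prob in scores.items() if prob >= 0.5),
    key=lambda pid: scores[pid],
    reverse=True,
)
print(scores)
print("retention campaign targets:", target_group)
```

The cutoff is a business decision: lowering it widens the campaign, raising it concentrates effort on the policyholders the model considers most at risk.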

Identifying the target group may be improved by modeling the behavior of policyholders. By including nonlinear terms and more interaction terms, neural network models can generate more accurate estimates of the probability of policyholders switching. Additionally, decision tree models may provide more accurate identification by dividing (segmenting) the policyholders into more homogeneous groups. Greater accuracy in identifying the target group can reduce costs and has the potential for greatly improving the results of a retention campaign.

3.4 Performing Sophisticated Campaign Management

Developing a customer relationship has a long-standing tradition in business. Small firms and many retailers are able to relate to their customers individually. However, as organizations grow larger, marketing departments often begin to think in terms of product development instead of customer relationship development and maintenance. It is not unusual for the sales and marketing units to focus on how fast the firm can bring a mass-appeal product to market rather than how they might better serve the needs of the individual customer.

Ultimately, the difficulty is that as markets become saturated, the effectiveness of mass marketing slows or halts completely. Fortunately, advanced data mining technology enables insurance companies to return their marketing focus to the aspects of individual customer loyalty. Creative, data-driven, scientific marketing strategies are now paving the way back to the customer relationship management of simpler, efficient economies, while on a much grander, comprehensive scale.

 A Customer-Centric Focus

Many leading insurance companies are making an effort to move away from the product-oriented architectures of the past and toward a customer-centric focus to better serve their customers. Data mining technology can be utilized to better understand customers’ needs and desires. Analysis of marketing campaigns provides in-depth feedback and serves as the foundation of future campaign development.

Marketing – Another Frontier of Automation

To know your customers in detail - their needs, desires and responses - is now possible through the power of data mining methodologies and supporting computer technologies. Campaign management solutions consist of graphical tools that enable marketers to analyze customer data, analyze previous marketing campaigns, design new campaigns, assess the new campaigns prior to implementation, monitor the campaigns as they are presented and evaluate the effectiveness of the completed campaigns.

Customer-centric marketing is accomplished by integrating data mining and campaign management. The integration sets up a cyclical relationship in that data mining analysts can develop and test models of customer behavior as required by marketers. Then, marketers can use the models to predict customer behavior. By assessing predicted customer behavior, marketing professionals can further refine the marketing campaigns.

Campaign management tools should support the entire direct marketing life cycle - analysis, planning, execution and evaluation - not just one or two phases of the life cycle. Unfortunately, many campaign management products contain separate tools embedded in the system for querying, updating, analyzing and extracting data. Often, these systems require significant customizations to accommodate the promotional campaign strategies of individual enterprises. The best solution is one that integrates the major functional areas of data access, data warehousing and data mining, as well as campaign management. If the tools are not explicitly designed to work together, problems of portability and throughput can arise. In most cases, a workable piecemeal solution may be far from a good overall solution.

Effective Campaign Management

The integration of data access, data warehousing, data mining and campaign management technologies enables marketing professionals to utilize pre-established data mining models within the context of their campaign management system. Marketers are able to select from a list of such models and apply the model to selected target subsets identified using the campaign management system. The scoring code is typically executed on the selected subset within the data mining product, and the scored file¹ is then returned to the marketing professional, enabling marketing to further refine their target-marketing campaigns. This form of integration is commonly referred to as “dynamic scoring” because it reflects the real-time execution of the scoring code.

¹ Scored file refers to a data set in which values for a target variable have been predicted by a model. Scored data sets consist of a set of posterior probabilities for each level of a (noninterval level) variable.

3.5 Detecting Fraudulent Claims

Obviously, fraudulent claims are an ever-present problem for insurance firms, and techniques for identifying and mitigating fraud are critical for their long-term success. Quite often, successful fraud detection analyses, such as those from a data mining project, can provide a very high return on investment.

Better Fraud Detection Results in Lower Cost of Claims

Fraud is a thriving industry. For example, it is estimated that 10 percent of claims are fraudulent, costing the US property and casualty industry $30 billion per year. The sheer magnitude of the fraud problem implies that firms able to detect fraudulent claims, or better, prevent fraudulent claims, are in a position to offer more competitively priced products, reduce costs and maintain long-term profitability.

Just Random Chance or Is There a Pattern?

Fraudulent claims are typically not the biggest claims, because perpetrators are well aware that the big claims are scrutinized more rigorously than average claims. Perpetrators of fraud use more subtle approaches. As a result, when searching for fraudulent claims, analysts must look for unusual associations, anomalies or outlying patterns in the data. Specific analytical techniques adept at finding such subtleties are social network link analysis, market basket analysis, cluster analysis and predictive modeling. For example, AXA OYAK (a Turkish insurer) used SAS to segment customer data by uncovering certain relationships between data sets, which are red flags for fraud-related losses. Using this technique, AXA OYAK discovered that 5 percent of its claims payouts were fraudulent, which can now be corrected and prevented in the future.
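
One simple form of the link analysis mentioned above is to look for contact details shared by claimants who are supposedly unrelated. The claims, field names and the threshold of three claimants in this sketch are hypothetical, and a shared detail is only a prompt for investigation, not evidence of fraud.

```python
from collections import defaultdict

# Hypothetical claims; the linked claims have unremarkable amounts on purpose.
claims = [
    {"claim_id": 1, "claimant": "Claimant A", "phone": "555-0101", "amount": 1800},
    {"claim_id": 2, "claimant": "Claimant B", "phone": "555-0101", "amount": 2100},
    {"claim_id": 3, "claimant": "Claimant C", "phone": "555-0101", "amount": 1950},
    {"claim_id": 4, "claimant": "Claimant D", "phone": "555-0144", "amount": 9200},
    {"claim_id": 5, "claimant": "Claimant E", "phone": "555-0177", "amount": 1500},
]


def shared_links(records, link_field, min_claimants=3):
    """Return link values used by at least `min_claimants` distinct claimants."""
    claimants_by_link = defaultdict(set)
    for record in records:
        claimants_by_link[record[link_field]].add(record["claimant"])
    return {
        link: sorted(names)
        for link, names in claimants_by_link.items()
        if len(names) >= min_claimants
    }


flagged = shared_links(claims, "phone")
print(flagged)
```

Note that the largest claim in the sample is not flagged; the pattern surfaces through the association between claims rather than through their size, which matches the point made above about subtle approaches.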

Figure 6: A series of cluster analysis diagrams that represent the difference between members of a cluster and the overall population.

3.6 Estimating Outstanding Loss Reserves

The settlement of claims is often subject to delay. For example, in liability insurance, the severity (magnitude) of a claim may not be known until years after the claim is reported. In cases of an employer’s liability, there is even a delay involved in reporting the claim. Such delays may result in non-normal distribution of claims; specifically, skewed distributions and long-tailed distribution across time and across business classes.

Still the everyday running of the firm must continue, and an estimate of the claim severity is often used until the actual value of the settled claim is available. The estimate can depend on the following:

• Severity of the claim.

• Likely amount of time before settlement.

• Effects of financial variables, such as inflation and interest rates.

Predicting Actual Settlement Values

A loss reserve, which is necessary for continued business operations, is developed by estimating insurance claims. The accuracy of the loss reserve is important because the funds set aside for paying claims typically cannot be invested in long-term, higher-yielding assets. If the loss reserve is too small, the firm may experience financial problems. Conversely, if the loss reserve is too large, the firm may become unprofitable. Thus, the more accurate the estimation, the greater the opportunity for profit. The analysis of the distribution of claims across customers, geography and time leads to better estimates of the loss reserve.

Data mining technology can be utilized to establish the distribution of claims and the pattern of past claim runoffs. The data is analyzed and modeled, and when a predictive model is developed, the current outstanding claims are scored. Specifically, the model parameters and the claims data are used to predict the magnitude of the actual settlement value of the outstanding claims. This estimate of the actual settlement value can be used to develop a loss reserve.
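
To make the scoring step concrete, the sketch below applies a hypothetical fitted model to open claims and sums the predicted remaining payments into a reserve estimate. The linear form, its parameters and the claim records are assumptions chosen for illustration; the paper does not specify the model.

```python
# Hypothetical parameters of a fitted severity model:
# settlement = BASE + PER_YEAR_OPEN * years_open + INITIAL_FACTOR * initial_estimate
BASE = 500.0
PER_YEAR_OPEN = 1200.0   # assumed: long-open claims tend to settle higher
INITIAL_FACTOR = 1.1     # assumed: first estimates run slightly low

outstanding_claims = [
    {"claim_id": "L-1", "initial_estimate": 10000.0, "years_open": 1, "paid_to_date": 2000.0},
    {"claim_id": "L-2", "initial_estimate": 4000.0, "years_open": 3, "paid_to_date": 0.0},
    {"claim_id": "L-3", "initial_estimate": 25000.0, "years_open": 0, "paid_to_date": 5000.0},
]


def predicted_settlement(claim: dict) -> float:
    """Score one open claim with the hypothetical model."""
    return (
        BASE
        + PER_YEAR_OPEN * claim["years_open"]
        + INITIAL_FACTOR * claim["initial_estimate"]
    )


def loss_reserve(claims) -> float:
    """Sum of predicted settlements net of payments already made."""
    return sum(
        max(predicted_settlement(c) - c["paid_to_date"], 0.0) for c in claims
    )


reserve = loss_reserve(outstanding_claims)
print(f"estimated loss reserve: {reserve:,.2f}")
```

Re-estimating the parameters as new settlements arrive changes every predicted value and therefore the reserve, which is why the model needs to be kept current.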

The estimate of the loss reserve generated from a predictive model is based on the assumption that the future will be much like the past. If the model is not updated over time, then the assumption becomes that the future will be much like the distant past. However, as more data becomes available, the predictive data mining model can be updated and the assumption becomes that the future will be much like the recent past.

Data mining technology enables insurance analysts to compare models and to assess them based on their performance. When the newly updated model outperforms the old model, it is time to switch to the new model. Given the new technologies, analysts can now monitor predictive models and update as needed.
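
A bare-bones version of this comparison is to score the same recently settled claims with both models and keep whichever has the lower error. The predictions and the choice of mean absolute error are assumptions for illustration; an insurer might instead assess models on profit and loss criteria.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted settlement values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical settled claims and the two models' earlier predictions for them.
actual_settlements = [12000.0, 8500.0, 30000.0, 4200.0]
old_model_predictions = [10000.0, 9500.0, 24000.0, 5200.0]
new_model_predictions = [11500.0, 8000.0, 28500.0, 4500.0]

old_error = mean_absolute_error(actual_settlements, old_model_predictions)
new_error = mean_absolute_error(actual_settlements, new_model_predictions)

# Switch only when the updated model actually performs better.
model_in_use = "new" if new_error < old_error else "old"
print(f"old model MAE: {old_error:.2f}, new model MAE: {new_error:.2f}")
print("model in use:", model_in_use)
```

Both models are judged on claims that neither saw during fitting, so the comparison reflects how each would have performed in production.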

4. Implementing Data Mining Projects

Much has been written about the best way to implement data mining projects. A common message found in many of these works is that implementing a data mining project must take into account real-world, practical challenges. A data-centric approach is especially effective and can be divided into the following functional areas:

• Access the data.

• Warehouse the data.

• Analyze the data.

• Report the results of the analysis.

• Exploit the results for business advantage.

4.1 Accessing the Data

Reliable, accurate data is a prerequisite for data mining. A complete data access strategy should include the following key elements:

• Access to any or all types of data sources.

• Access to data sources regardless of the platform on which they reside.

• Preservation of the source data through the use of security routines.

• An easy-to-use GUI that does not require extensive knowledge of each data type and provides the flexibility to meet the specific needs.

• Integration with the existing technology rather than access routines that require retooling of hardware and/or software or extensive, additional learning by users.

A properly designed and implemented data warehouse can help accomplish these key elements of a data access strategy.

4.2 Warehousing the Data

A data warehouse enables insurance industry researchers to easily access data that might be stored in various data tables or data sets across a wide variety of platforms. Through data warehouses, research analysts can merge and aggregate data into subject areas. However, prior to analysis, data that contains errors, missing values or other problems needs to be cleaned. One approach to cleaning data is to simply delete cases that contain missing values. However, the results may be biased because the deleted data may have otherwise been an important part of various relationships. Major data cleaning tasks - such as making variable names consistent, imputing missing values, identifying errors, correcting errors and detecting outliers - can be performed relatively easily using data mining technology.
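
The difference between deleting incomplete cases and imputing missing values can be seen in a few lines. This sketch uses median imputation on invented records; the field names are hypothetical, and median imputation is only one of several possible strategies.

```python
from statistics import median

# Hypothetical policyholder records; None marks a missing value.
records = [
    {"policy_id": 1, "annual_mileage": 12000, "claims": 0},
    {"policy_id": 2, "annual_mileage": None, "claims": 2},
    {"policy_id": 3, "annual_mileage": 8000, "claims": 0},
    {"policy_id": 4, "annual_mileage": None, "claims": 1},
    {"policy_id": 5, "annual_mileage": 15000, "claims": 1},
]


def drop_incomplete(rows, field):
    """Listwise deletion: discard rows where `field` is missing."""
    return [row for row in rows if row[field] is not None]


def impute_median(rows, field):
    """Replace missing `field` values with the median of the observed ones."""
    fill = median(row[field] for row in rows if row[field] is not None)
    return [
        {**row, field: fill if row[field] is None else row[field]} for row in rows
    ]


def mean_claims(rows):
    """Average number of claims per record."""
    return sum(row["claims"] for row in rows) / len(rows)


complete_cases = drop_incomplete(records, "annual_mileage")
imputed = impute_median(records, "annual_mileage")

# Deletion removes the two rows that carry most of the claims.
print(f"mean claims after deletion:   {mean_claims(complete_cases):.2f}")
print(f"mean claims after imputation: {mean_claims(imputed):.2f}")
```

In this made-up sample the records with missing mileage also have the most claims, so deleting them understates claim frequency, which is the bias described above.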

Good data warehousing tools can vastly improve the productivity of the data mining team. Important results are obtained faster and often at much lower cost.

4.3 Analyzing Data Using the SEMMA Methodology

Even ater data is merged and aggregated into subject areas, highly complex

relationships oten are not easily seen by visually inspecting the raw data or

by applying basic statistical analyses. Actual patterns may appear random or

goundetected.Additionally,linearmodelsmaypoorlycharacterizenonlinear

relationships. Modern data mining technology can overcome these limitations by

employing approaches such as the ollowing:

• SophisticatedGUIdataexplorationandplottingtoolstobetterdisplay

relationships among variables.

• Variableselectionmethodologiestoidentifythemostimportantvariablesto

include in models.

• Advancedmodelingtechniques,suchaslinearmodelswithinteractions.

• Nonlinearneuralnetworksandtreemodels.

• Assessmenttechniquestoassistanalystsinselectingthebestperformingmodel

based on prot and loss criteria.

Once accessed, the data can be explored using GUIs that utilize sophisticated data mining algorithms. For example, subsetting data can reveal important relationships for marketing campaigns. Disaggregating along region and firm might reveal costly anomalies of operations. Drilling down into the data might reveal missed profit opportunities.

The actual data analyses for data mining projects involve selecting, exploring and modeling large amounts of data to uncover hidden information that can then be used for business advantage. The answer to one question often leads to new and more specific questions. Hence, data mining is an iterative process and the data mining methodology should incorporate this iterative, exploratory approach.


To provide a predictable yet flexible path for the data mining analysis to follow, SAS developed a data mining analysis cycle known as SEMMA. This acronym stands for the five analysis steps that are ordinarily part of a data mining project. Those five steps are:

• Sample.

• Explore.

• Modify.

• Model.

• Assess.

Beginning with a representative sample, the SEMMA analysis cycle guides analysts through the process of exploring data using visual and statistical techniques, transforming data to uncover the most significant predictive variables, modeling the variables to predict outcomes and assessing the model by testing it with new data. Thus, the SEMMA analysis cycle is a modern extension of the scientific method.

4.3.1 Sample

The first step in a data mining analysis methodology is to create one or more data tables by sampling data from the data warehouse. The samples should be big enough to contain the significant information, yet small enough to process quickly. This approach enables the most cost-effective performance by using a reliable, statistically representative sample of the entire database. Mining a representative sample instead of the whole volume drastically reduces the processing time required to get crucial business information.

If general patterns appear in the data as a whole, these will be traceable in a representative sample. If a niche is so tiny that it is not represented in a sample and yet so important that it influences the big picture, it can be discovered using summary methods.
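One way to keep a sample representative is to draw the same fraction from each subgroup of interest, so that a small but important group (for example, policies with a claim) appears in the sample in its true proportion. The following Python sketch illustrates the idea under stated assumptions: the population, the field names and the function `stratified_sample` are all hypothetical and do not describe how SAS Enterprise Miner performs sampling.

```python
import random

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction from every stratum so the sample mirrors the population mix."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = {}
    for row in rows:
        strata.setdefault(row[key], []).append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one row per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical warehouse extract: 1,000 policies, 10% of which had a claim.
population = [{"policy_id": i, "had_claim": i % 10 == 0} for i in range(1000)]
sample = stratified_sample(population, key="had_claim", fraction=0.1)

claim_rate = sum(r["had_claim"] for r in sample) / len(sample)
print(len(sample), claim_rate)
```

The sample is one-tenth the size of the population but preserves the claim rate, so patterns that depend on that rate remain visible while processing time drops.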

4.3.2 Explore

After sampling the data, the next step is to explore it visually or numerically for inherent trends or groupings. Exploration helps refine the discovery process. If visual exploration does not reveal clear trends, analysts can explore the data through statistical techniques, including factor analysis, correspondence analysis and clustering. For example, new parents are often more acutely aware of their need for life insurance, but may be seeking the most insurance for the least amount of money. This group may be more likely to respond to direct mailings for term insurance.
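A simple numerical exploration of this kind is to compare response rates across customer groups. The Python sketch below uses invented campaign history and hypothetical segment labels purely for illustration; it is a grouped summary, not one of the statistical techniques (factor analysis, correspondence analysis, clustering) named above.

```python
from collections import defaultdict

# Hypothetical campaign history: (segment, responded to a term insurance mailing).
history = [
    ("new_parent", True), ("new_parent", True), ("new_parent", False),
    ("new_parent", True), ("retiree", False), ("retiree", False),
    ("retiree", True), ("single", False), ("single", False), ("single", False),
]

def response_rate_by_segment(rows):
    """Return the share of mailings that drew a response, per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, mailings]
    for segment, responded in rows:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: resp / total for seg, (resp, total) in counts.items()}

rates = response_rate_by_segment(history)
for segment, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{segment:12s} {rate:.2f}")
```

A table like this is often the first hint that a group deserves its own variable or its own model in the later Modify and Model steps.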


4.3.3 Modify

Modifying the data refers to creating, selecting and transforming one or more variables to focus the model selection process in a particular direction or to augment the data for clarity or consistency.

Based on the discoveries in the exploration phase, analysts may need to modify the data to include information (e.g., grouping customers and significant subgroups) or to introduce new variables, such as a ratio obtained by comparing two previously defined variables. Analysts may also need to look for outliers and reduce the number of variables to narrow them down to the most significant ones. In addition, because data mining is a dynamic, iterative process, there often is a need to modify data when the previously mined data changes in some way.
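Two of the modifications mentioned above, deriving a ratio from two existing variables and looking for outliers, can be sketched in a few lines of Python. The loss ratio (incurred losses divided by earned premium) is used here as an assumed example of such a ratio, and the 1.5 × interquartile range rule is one common outlier convention chosen for illustration; the records and function names are hypothetical.

```python
from statistics import quantiles

# Hypothetical per-policy totals; field names are illustrative only.
policies = [
    {"policy_id": 1, "earned_premium": 1000.0, "incurred_losses": 400.0},
    {"policy_id": 2, "earned_premium": 1000.0, "incurred_losses": 500.0},
    {"policy_id": 3, "earned_premium": 1000.0, "incurred_losses": 450.0},
    {"policy_id": 4, "earned_premium": 1000.0, "incurred_losses": 550.0},
    {"policy_id": 5, "earned_premium": 1000.0, "incurred_losses": 500.0},
    {"policy_id": 6, "earned_premium": 1000.0, "incurred_losses": 600.0},
    {"policy_id": 7, "earned_premium": 1000.0, "incurred_losses": 3000.0},
]

def add_loss_ratio(rows):
    """Derive a new variable as the ratio of two existing ones."""
    return [{**r, "loss_ratio": r["incurred_losses"] / r["earned_premium"]} for r in rows]

def flag_outliers(rows, field, k=1.5):
    """Return rows whose `field` lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles([r[field] for r in rows], n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in rows if r[field] < low or r[field] > high]

modified = add_loss_ratio(policies)
outliers = flag_outliers(modified, "loss_ratio")
print([r["policy_id"] for r in outliers])
```

Whether a flagged case is an error to correct or a genuine extreme to keep is a judgment for the analyst; the sketch only surfaces candidates.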

4.3.4 Model

Creating a data model involves using the data mining software to search automatically for a combination of data that reliably predicts a desired outcome.

After the data has been accessed and modified, analysts can use data modeling techniques to construct models that explain patterns in the data. Modeling techniques in data mining include neural networks, tree-based models, logistic models and other statistical models, such as time series analysis and survival analysis.

Figure 7: A section of a model that predicts automobile claims using a neural network, decision tree and logistic regression model.


Each type of model has particular strengths, and is appropriate within specific data mining situations depending on the data. For example, neural networks are good at combining information from many predictors without overfitting, and therefore work well when many of the predictors are partially redundant.
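To make the idea of searching automatically for a predictive rule concrete, the following Python sketch fits the simplest possible tree-based model: a single split on one predictor (often called a decision stump). It is a toy illustration under stated assumptions, not the algorithm used by SAS Enterprise Miner; the mileage figures, claim flags and the function `fit_stump` are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the threshold t that minimises misclassifications for the rule
    'predict a claim when x > t'. Returns (threshold, training errors)."""
    best_t, best_err = None, len(ys) + 1
    candidates = sorted(set(xs))
    # Try the midpoint between each pair of adjacent observed values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Hypothetical data: annual mileage (thousands) and whether a claim was filed.
mileage = [5, 8, 10, 12, 15, 18, 20, 25]
claimed = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, errors = fit_stump(mileage, claimed)
print(threshold, errors)
```

A full decision tree repeats this search recursively within each branch and across many predictors, which is why tree models can capture nonlinear relationships that a single linear term would miss.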

4.3.5 Assess

The next step in data mining is to assess the model to determine how well it performs. A common means of assessing a model is to apply it to a portion of the data that was set aside during the sampling stage. If the model is valid, it should work for this reserved sample as well as for the sample used to construct the model.

Figure 8: This is a ROC curve used to compare three claims models. The wider the bow in the curve, the more accurate the model. Curves such as this make model comparison a simple matter of visual inspection.

Similarly, analysts can test the model against known data. For example, if one knows which customers in a file had high retention rates and the model predicts retention, analysts can check to see whether the model selects these customers accurately. In addition, practical applications of the model, such as partial mailings in a direct mail campaign, help prove its validity.
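The visual comparison of ROC curves can also be summarized numerically by the area under the curve (AUC): the wider the bow, the larger the area. The Python sketch below computes AUC on a reserved sample as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The labels, scores and model names are invented for illustration and do not come from the models shown in the figures.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the share of (positive, negative)
    pairs in which the positive case receives the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical reserved sample: 1 = claim filed, 0 = no claim.
holdout_labels = [1, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical scores from three candidate models on the same reserved sample.
model_scores = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.75],
    "model_b": [0.8, 0.3, 0.4, 0.6, 0.2, 0.7, 0.5, 0.1],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uninformative
}

aucs = {name: roc_auc(holdout_labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
for name, auc in aucs.items():
    print(f"{name}: {auc:.3f}")
print("best:", best)
```

An AUC of 0.5 corresponds to a model no better than chance (a ROC curve on the diagonal), while values closer to 1.0 correspond to a more pronounced bow.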

Although assessing the data models is the last step in the SEMMA methodology, assessing the effectiveness of data models is often not the final step in an actual implementation of SEMMA. Because SEMMA is a cycle, the internal steps are often performed iteratively as needed within a particular data mining project.


4.4 Reporting the Results

Enterprise reporting capabilities are essential to making data useful by enabling users to create and publish reports from data warehouses and other information sources. Key features of a reporting system should include:

• Complete system integration.

• Simplified warehouse reporting.

• Ease of use through graphical user interfaces.

Modern reporting tools, such as those found in SAS Enterprise Miner, enable business users to create, publish and print richly formatted reports from information stored in their data warehouse. Through easy-to-use interfaces, users have the ability to create graphs, tables, charts and text within a single report from their desktops.

4.5 Exploiting the Results for Business Advantage

The new information obtained from data mining can be incorporated into an executive information or online analytical processing and reporting system, and then disseminated as needed throughout the organization. The firm’s decision makers can use the data mining results to answer important business-related questions such as, “How can we increase the ROI of our marketing campaigns?” for strategic planning and action. By exploiting data mining results in this manner, firms can better prepare for long-term growth and improve their opportunities for long-term prosperity.


5. Summary 

Insurance is a data-rich industry; unfortunately, most of that data is underutilized. The key to gaining a competitive advantage in the insurance industry is found in analyzing this data and getting a greater insight into the business. Insurance firms can unlock the intelligence contained in their operational applications, like policy administration, claims management and CRM solutions, through modern data mining technology.

Data mining uses predictive modeling, database segmentation, cluster analysis, neural networks and combinations thereof to quickly answer crucial business questions with greater accuracy. New products can be developed and marketing strategies can be implemented, enabling the insurance firm to transform a wealth of information into a wealth of predictability, stability and profits.

References

The Business Case for Data Mining in the Insurance Industry: Using Enterprise Miner to Model Pure Premium and Establish Policy Rating Structures. Sanford Gayle, SAS Institute Inc., Cary, NC (1999).


SAS Institute Inc. World Headquarters +1 919 677 8000. To contact your local SAS office, please visit: www.sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.