Project 1 - files.transtutors.com · Recently, S.C. has been trying to increase revenues by...

S.C. INSURANCE COMPANY Project 1 Individual Portion Amanda Fisher 6/4/2014


S.C. INSURANCE COMPANY

Project 1 Individual Portion

Amanda Fisher

6/4/2014


Life Expectancy in Response to Adult Smokers


Contents

Table of Figures
Executive Summary
Introduction
Simple Regression Analysis
Hypothesis Testing for Model Significance
Test for Overall Fit
Residual Tests
    Errors are Normally Distributed
    Errors Have Constant Variance
    Errors Are Independent
Unusual Data
Conclusion
Bibliography
Appendix A
Appendix B


Table of Figures

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9


Executive Summary

This study was conducted for the S.C. Insurance Company to determine whether there is a direct relationship between the percentage of adults who smoke and life expectancy in the United States in 2010. S.C. Insurance Company believes its expenses are affected by life expectancy in each state, and this relationship needs to be evaluated before marketing campaigns are planned. Statisticians were brought in to evaluate the data, and the MegaStat software was used to produce output that the statisticians then interpreted. Regression analysis, hypothesis testing, and error evaluation were performed to verify the model.

The correlation between life expectancy and smoking adults was found to be strongly negative, r = -0.838. This means that states with a higher share of smoking adults tend to have a lower life expectancy. Hypothesis tests were conducted to verify significance, and all results proved to be significant at the 5 and 10 percent levels. Error evaluation was then conducted to check that the errors are normally distributed, have constant variance, and are independent of one another. All error evaluations came back satisfactory, with no extreme outliers.

From this information, about 70 percent of the variation in life expectancy is explained by the smoking adults variable; the remaining 30 percent is unexplained. Further testing should be conducted to determine what additional factors affect life expectancy, for example alcohol consumption, hobbies, and cancer rates. It was also decided that marketing campaigns will not be focused on smoking adults.


Introduction

S.C. Insurance Company is a for-profit company that provides insurance to Americans. Recently, S.C. has been trying to increase revenues by targeting specific clients through marketing campaigns, and wants to forecast the next fiscal year to determine the best course of action to increase revenues. The forecasting department at S.C. has determined, based on 2010 data, that life expectancies have an impact on yearly expenses. A group of statisticians has been brought in to review the situation, and it was determined that one possible explanation is that clients who smoke could have a shorter life expectancy.

The statisticians have determined that a correlation and regression analysis will test the independent variable (the predictor variable), smoking adults, against the dependent variable (the response variable), life expectancy. Based on the Kaiser Family Foundation's findings (note 1), the average life expectancy (in years) is determined from each state's own observations (State Health Facts, 2014). The percentage of adults (18 years or older) who smoke is calculated per state as well (State Health Facts, 2014). The target population for the 2010 samples consists of persons aged 18 and older living in households who have a working cellphone and received 90 percent or more of their calls on cellphones. Samples are chosen randomly for the 50 states from a set area, determined by area code and response to phone calls (Behavioral Risk Factor Surveillance System, 2013).

The methods used for testing will be simple regression analysis, hypothesis testing for regression significance and slope significance, normal probability analysis, constant variance analysis, and independent error analysis. The simple regression analysis will determine the correlation coefficient, which measures the relationship between life expectancy and smoking adults. The hypothesis tests will determine the significance of the intercept and slope, while the test for overall fit will assess the regression for overall significance. The three regression assumptions about the errors are evaluated through three separate error tests.

Note 1: Statistics are taken from the CDC, the Centers for Disease Control and Prevention.

Simple Regression Analysis

Table 1

The simple regression analysis was performed on the bivariate data using Microsoft Excel with the MegaStat add-in. The data set used for all analysis can be found in Appendix A, and a summary of the regression output is given in Table 1. The S.C. Insurance statisticians concluded that the estimated regression equation is: Life Expectancy = 86.2535 - 38.2715 Smoking Adults, where Smoking Adults is expressed as a proportion, as in Appendix A (for example, 0.238 for Alabama). A one-unit increase in this variable corresponds to 100 percentage points, so each one percentage point increase in adult smokers is associated with a decrease in average life expectancy of about 0.38 years. The statisticians predict that a state with an adult smoking rate of 10% would have a life expectancy of 82.43 years. Higher smoking rates lower the predicted life expectancy, which the statisticians expect to cause health insurance premiums to rise and more coverage expenses to occur.
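The arithmetic behind these predictions can be checked with a minimal Python sketch. It assumes the smoking rate is entered as a proportion, as in Appendix A; the function name `predict_life_expectancy` is illustrative and is not part of the MegaStat output.

```python
# Coefficients as reported in Table 1 of this report.
INTERCEPT = 86.2535  # b0: predicted years when the smoking rate is zero
SLOPE = -38.2715     # b1: change in years per one unit (100 percentage points) of smoking rate


def predict_life_expectancy(smoking_rate: float) -> float:
    """Predicted life expectancy in years for a smoking rate given as a proportion."""
    return INTERCEPT + SLOPE * smoking_rate


if __name__ == "__main__":
    print(round(predict_life_expectancy(0.10), 2))   # 82.43, a 10 percent smoking rate
    print(round(predict_life_expectancy(0.238), 2))  # 77.14, Alabama's rate from Appendix A
    print(round(SLOPE * 0.01, 3))                    # -0.383 years per percentage point
```

The Alabama value agrees with the predicted value of 77.14 listed for observation 1 in Appendix B.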

The simple regression analysis above provides the intercept and slope estimates needed to state the simple regression equation. The fitted equation is ŷ = 86.2535 - 38.2715x. The error term is not observable, but the regression assumes that errors are normally distributed, have constant variance, and are independent of each other. The slope (b1 = -38.272) states that for each one percentage point increase in smoking adults (an increase of 0.01 in x), the average life expectancy decreases by about 0.38 years. The intercept (b0 = 86.254) suggests that if the adult smoking percentage were zero, the predicted life expectancy would be about 86.25 years; since no state has a smoking rate near zero, this is an extrapolation beyond the observed data. For a representation of the data, see the scatterplot in Figure 1 below.

Figure 1

The scatterplot in Figure 1 shows that the two variables vary together, and that there is a negative correlation between them. Table 1 lists the correlation coefficient r = -0.838, which is consistent with the graph. To assess the fit, the coefficient of determination, r² = 0.702, is evaluated. The closer r² is to 1, the larger the share of variation that is explained. About 70% of the variation in life expectancy is explained by smoking adults; the remaining 30% is attributed to unknown factors. To further assess the significance of the intercept and slope, hypothesis tests are performed.
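For readers without MegaStat, the least-squares estimates and the correlation coefficient can be reproduced from the paired data. The following is a minimal sketch in plain Python; the function name `fit_simple_regression` is illustrative, and the five data points in the usage example are only the first five rows of Appendix A, so the printed values will not match Table 1 unless all 50 rows are supplied.

```python
import math


def fit_simple_regression(x, y):
    """Return (intercept, slope, r) for the least-squares line through paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


if __name__ == "__main__":
    # First five states from Appendix A, for illustration only.
    smoking = [0.238, 0.205, 0.171, 0.250, 0.126]
    life = [75.4, 78.3, 79.6, 76.0, 80.8]
    b0, b1, r = fit_simple_regression(smoking, life)
    print(f"intercept={b0:.3f} slope={b1:.3f} r={r:.3f} r^2={r * r:.3f}")
```

Squaring the reported r = -0.838 gives approximately 0.702, which is the coefficient of determination quoted above.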


Hypothesis Testing for Model Significance

Figure 2

The hypothesis test above in Figure 2 tests the intercept of the model for significance. It is fair to conclude that the intercept is not equal to zero. The p-value for the intercept (highlighted in yellow in Table 1) is 6.06E-61, which is far less than the significance level of α = 5%. The intercept is significant at the 5 and 10 percent significance levels.

Figure 3

The hypothesis test for slope significance is shown in Figure 3. It is fair to conclude that the slope is not equal to zero. The p-value for the slope (highlighted in yellow in Table 1) is 3.34E-14. The slope is significant at the 5 and 10 percent significance levels, since the p-value is far smaller than either level. The estimated slope is also negative for this data set.


Test for Overall Fit

Figure 4

The test for overall fit is shown in Figure 4. The F statistic is the ratio of the regression mean square to the error mean square; the larger its value, the stronger the evidence of a significant fit. Since the calculated F is greater than the critical F, it is fair to conclude that the overall fit for this data set is significant. The p-value of this test is 3.34E-14, shown highlighted in yellow in Table 1, which is less than the 5 percent significance level. It equals the p-value of the slope test, as expected in a simple regression with a single predictor. The model therefore shows a significant fit at the 5 and 10 percent significance levels.
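The link between the slope test and the overall fit test can be illustrated numerically. The sketch below uses the standard simple-regression identities t = r·sqrt(n - 2)/sqrt(1 - r²) and F = t², with the rounded r = -0.838 and n = 50 from this report. The function names are illustrative, and because r is rounded, the results are approximations that may differ slightly from the MegaStat values in Figures 3 and 4, which are not reproduced in this text.

```python
import math


def slope_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: slope = 0 in simple regression, computed from r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def overall_f_statistic(r: float, n: int) -> float:
    """F statistic for overall fit with one predictor: r^2 / ((1 - r^2) / (n - 2))."""
    r_squared = r ** 2
    return r_squared / ((1 - r_squared) / (n - 2))


if __name__ == "__main__":
    t = slope_t_statistic(-0.838, 50)
    f = overall_f_statistic(-0.838, 50)
    print(f"t = {t:.2f}")  # roughly -10.64
    print(f"F = {f:.1f}")  # roughly 113.2, equal to t squared
```

Because F equals t² when there is only one predictor, the two tests always yield the same p-value, which is why 3.34E-14 appears for both.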


Residual Tests

Errors are Normally Distributed

Figure 5

From the normal probability plot of residuals above in Figure 5, it can be seen that the points follow an approximately straight line. The residuals seem to be consistent, indicating that the errors are normally distributed. The r² value is close to 1, and there are few extreme outliers. In Figure 6 below, the histogram shows the distribution of the residuals from the regression on smoking adults. This histogram shows a roughly bell-shaped curve, with a few outliers. These are due to other variables that affect life expectancy; for example, there could be genetic health problems, poor lifestyle, or many other factors. Therefore, these two figures support the conclusion that the errors are normally distributed for this data set.


Figure 6

Errors Have Constant Variance

To determine whether there is constant variance, a plot of residuals and standard error was produced for the life expectancy in years for the United States. If the errors do not have constant variance (heteroscedasticity), the estimated variances of the coefficients could be biased, and the t values would not be reliable. See Figure 7 below for the variance test.

Figure 7

Based on the plot above in Figure 7, no pattern of changing variance is apparent amongst the errors, so the estimates are not biased by nonconstant variance and the errors can be considered homoscedastic.


Errors Are Independent

Autocorrelation is another possible violation, in which the errors are not independent of one another. It is tested here by performing a Durbin-Watson test and evaluating a runs test. The Durbin-Watson statistic lies on a scale of 0 to 4, with 2 indicating no autocorrelation. A value below 2 indicates positive autocorrelation, and conversely a value above 2 indicates negative autocorrelation. The runs test reviews the number of times the plotted residuals cross the zero axis. Fewer crossings than expected indicate positive autocorrelation; more crossings than expected indicate negative autocorrelation.

Figure 8

From the Durbin-Watson test performed above in Figure 8, it is apparent that no autocorrelation exists. The Durbin-Watson statistic for this model, reported at the end of Appendix B, is 2.43. If no Durbin-Watson table is available, then a runs test is necessary.
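The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The following minimal sketch shows the calculation; the function name `durbin_watson` is illustrative, and the short residual lists are made-up examples chosen so the result can be verified by hand. Applying the function to the residual column of Appendix B should give a value near the reported 2.43, though not necessarily identical because those residuals are rounded to two decimals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    # Numerator: squared differences between each residual and the one before it.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


if __name__ == "__main__":
    # Residuals that alternate in sign push the statistic above 2 (negative autocorrelation).
    print(durbin_watson([1, -1, 1, -1]))  # 3.0
    # Residuals that stay on one side for a while pull it below 2 (positive autocorrelation).
    print(durbin_watson([1, 1, -1, -1]))  # 1.0
```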


Figure 9

In Figure 9, the runs test plots the residuals in observation order. There are 26 observations with residuals above zero, while there are 24 below zero. The pattern indicates a very slight negative autocorrelation for the 50 observations of life expectancy. It is close enough that it is safe to say that there is no autocorrelation for the observations.
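The counts used in the runs test can be reproduced from the residual column of Appendix B. The sketch below counts positive and negative residuals and the number of runs (unbroken stretches of the same sign), then compares the runs with the number expected under randomness, 2·n₊·n₋/n + 1. That expected-runs formula is the standard Wald-Wolfowitz result and is an assumption here, since the report does not state how the comparison was made; the function names are illustrative.

```python
# Residuals for observations 1 to 50, copied from Appendix B.
RESIDUALS = [
    -1.74, -0.11, -0.11, -0.69, -0.63, 0.52, 0.67, -0.31, -0.08, -1.25,
    0.63, -0.48, -0.14, 0.53, 0.37, -0.13, 0.58, -1.06, 0.72, -1.25,
    0.52, 0.86, 2.04, -2.07, 0.39, -0.21, 1.09, -1.23, 0.63, 0.67,
    -0.47, 0.45, -0.45, 1.36, 0.46, -1.44, 0.10, 0.44, 0.31, -0.64,
    1.67, -0.42, -0.79, -2.00, 0.56, 0.02, 0.23, -0.06, 1.55, 0.39,
]


def count_signs_and_runs(residuals):
    """Return (positives, negatives, runs), ignoring residuals that are exactly zero."""
    signs = [e > 0 for e in residuals if e != 0]
    if not signs:
        return 0, 0, 0
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    # A new run starts every time the sign changes (one zero-axis crossing).
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return n_pos, n_neg, runs


def expected_runs(n_pos, n_neg):
    """Expected number of runs if the signs occur in random order."""
    return 2 * n_pos * n_neg / (n_pos + n_neg) + 1


if __name__ == "__main__":
    n_pos, n_neg, runs = count_signs_and_runs(RESIDUALS)
    print(n_pos, n_neg, runs)                     # 26 24 30
    print(round(expected_runs(n_pos, n_neg), 2))  # 25.96
```

Slightly more runs than expected points in the direction of negative autocorrelation, which agrees with a Durbin-Watson value a little above 2.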

Unusual Data

Leverage identifies observations that are far from the mean of the independent variable. Based on the table provided by MegaStat, found in Appendix B, there are 4 values with high leverage: California, Kentucky, Utah, and West Virginia. California and Utah have the lowest unusual values; they both have high life expectancy, but their adult smoking rates fall well below the mean. This means that there are other factors that affect life expectancy, since this wasn't a technical error. The high life expectancy can be attributed to the Mormon lifestyle that influences those areas, as there is a large religious population out west (Cannon, 2010). California also has bans on smoking in bars and public areas (McCarthy, 2014). West Virginia and Kentucky have the highest adult smoking rates and some of the lowest life expectancies. This is attributed to a lack of bans in public areas, leading to the rise in adult smoking rates (McCarthy, 2014).

The studentized residual values help reveal outliers, which fall outside the -2 to 2 range. For this data, there are three states that are unusual, with absolute values above 2 but not close to 3. Minnesota, Mississippi, and Utah have unusual studentized residuals. This means that either the dependent value is poorly predicted by the regression model or there is an unusual x value. Utah has an unusual x value, as its high rate of religious observance affects its smoking rate, while Mississippi has a high smoking rate, attributed to a lack of statewide bans (McCarthy, 2014), and the lowest life expectancy in the data set. Minnesota is unusual because of its high life expectancy; statewide bans are also present and have helped reduce public smoking.
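The screening described in this section can be sketched in a few lines of Python. The rows below are a selection from Appendix B, with state names matched by observation number from Appendix A. The leverage cutoff of 4/n is a common rule of thumb for simple regression and is an assumption here, since the report does not state which cutoff MegaStat applied; the function names are illustrative.

```python
# Selected rows from Appendix B: (state, leverage, studentized residual).
ROWS = [
    ("Alabama", 0.044, -1.953),
    ("California", 0.101, -0.729),
    ("Kentucky", 0.131, 0.678),
    ("Minnesota", 0.022, 2.258),
    ("Mississippi", 0.047, -2.318),
    ("Utah", 0.152, -2.373),
    ("Virginia", 0.021, 0.020),
    ("West Virginia", 0.129, -0.071),
]

N_OBSERVATIONS = 50
LEVERAGE_CUTOFF = 4 / N_OBSERVATIONS  # assumed rule of thumb: 0.08 for 50 states
RESIDUAL_CUTOFF = 2.0                 # the -2 to 2 range used in this report


def high_leverage(rows, cutoff=LEVERAGE_CUTOFF):
    """States whose leverage exceeds the cutoff (unusual x values)."""
    return [state for state, leverage, _ in rows if leverage > cutoff]


def unusual_residuals(rows, cutoff=RESIDUAL_CUTOFF):
    """States whose studentized residual falls outside -cutoff to +cutoff."""
    return [state for state, _, residual in rows if abs(residual) > cutoff]


if __name__ == "__main__":
    print(high_leverage(ROWS))      # ['California', 'Kentucky', 'Utah', 'West Virginia']
    print(unusual_residuals(ROWS))  # ['Minnesota', 'Mississippi', 'Utah']
```

Under these cutoffs Utah is the only state flagged on both measures, which is consistent with it having the lowest smoking rate in Appendix A.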

Conclusion

From the regression analysis, hypothesis tests, and error evaluation provided by the statisticians for S.C. Insurance Company, it can be concluded that there is a negative correlation between smoking adults and life expectancy. Based on the fitted slope, a 10 percentage point increase in smoking adults is associated with a decrease in life expectancy of about 3.8 years. About 70% of the variation in life expectancy is explained by the smoking adults variable; the remaining 30% is attributed to unexplained causes. S.C. Insurance Company should perform further statistical tests to determine whether other contributory factors, such as obesity, income level, education level, and alcohol consumption, have any impact on life expectancy. It is reasonable to conclude that potential clients who smoke could further increase the company's expenses per year, and the company should not market to this demographic.

Potential outliers could be problematic, as no two states are alike in their findings, and the units used could throw off the data. Smoking adults per state were only offered as average percentages and not as specific numbers. This is because there is no way to verify specific numbers; surveys could be conducted, but that does not ensure honest results.


Bibliography

Behavioral Risk Factor Surveillance System. (2013, August 15). Retrieved May 31, 2014, from Centers for Disease Control and Prevention: http://www.cdc.gov/brfss/data_documentation/index.htm

State Health Facts. (2014). Retrieved May 31, 2014, from The Henry J. Kaiser Family Foundation: http://kff.org/other/state-indicator/smoking-adults/#

Cannon, M. W. (2010, April 13). UCLA Study Proves Mormons live longer. Retrieved June 3, 2014, from Deseret News: http://www.deseretnews.com/article/705377709/UCLA-study-proves-Mormons-live-longer.html?pg=all

McCarthy, J. (2014, March 13). In U.S., Smoking Rate Highest in Kentucky, Lowest in Utah. Retrieved June 3, 2014, from Gallup Well-being: http://www.gallup.com/poll/167771/smoking-rate-lowest-utah-highest-kentucky.aspx?utm_source=rss&utm_medium=rss&utm_campaign=in-u-s-smoking-rate-lowest-in-utah-highest-in-kentucky-smoking-rate-in-alaska-has-dropped-the-most-since-2008


Appendix A

Observation  Location  Smoking Adults (proportion of adults)  Life Expectancy at Birth (years)

1 Alabama 0.238 75.4

2 Alaska 0.205 78.3

3 Arizona 0.171 79.6

4 Arkansas 0.250 76.0

5 California 0.126 80.8

6 Colorado 0.177 80.0

7 Connecticut 0.160 80.8

8 Delaware 0.197 78.4

9 Florida 0.177 79.4

10 Georgia 0.204 77.2

11 Hawaii 0.146 81.3

12 Idaho 0.164 79.5

13 Illinois 0.186 79.0

14 Indiana 0.240 77.6

15 Iowa 0.181 79.7

16 Kansas 0.194 78.7

17 Kentucky 0.283 76.0

18 Louisiana 0.248 75.7

19 Maine 0.203 79.2

20 Maryland 0.162 78.8

21 Massachusetts 0.164 80.5

22 Michigan 0.233 78.2

23 Minnesota 0.188 81.1

24 Mississippi 0.240 75.0

25 Missouri 0.239 77.5

26 Montana 0.197 78.5

27 Nebraska 0.197 79.8

28 Nevada 0.181 78.1

29 New Hampshire 0.172 80.3

30 New Jersey 0.173 80.3

31 New Mexico 0.193 78.4

32 New York 0.162 80.5

33 North Carolina 0.209 77.8

34 North Dakota 0.212 79.5

35 Ohio 0.233 77.8

36 Oklahoma 0.233 75.9

37 Oregon 0.179 79.5

38 Pennsylvania 0.214 78.5

39 Rhode Island 0.174 79.9

40 South Carolina 0.225 77.0


41 South Dakota 0.220 79.5

42 Tennessee 0.249 76.3

43 Texas 0.182 78.5

44 Utah 0.106 80.2

45 Vermont 0.165 80.5

46 Virginia 0.190 79.0

47 Washington 0.172 79.9

48 West Virginia 0.282 75.4

49 Wisconsin 0.204 80.0

50 Wyoming 0.218 78.3


Appendix B

Observation  Life Expectancy at Birth (years)  Predicted  Residual  Leverage  Studentized Residual  Studentized Deleted Residual

1 75.40 77.14 -1.74 0.044 -1.953 -2.014

2 78.30 78.41 -0.11 0.021 -0.119 -0.118

3 79.60 79.71 -0.11 0.032 -0.121 -0.120

4 76.00 76.69 -0.69 0.061 -0.774 -0.771

5 80.80 81.43 -0.63 0.101 -0.729 -0.725

6 80.00 79.48 0.52 0.027 0.577 0.573

7 80.80 80.13 0.67 0.043 0.749 0.746

8 78.40 78.71 -0.31 0.020 -0.347 -0.344

9 79.40 79.48 -0.08 0.027 -0.088 -0.087

10 77.20 78.45 -1.25 0.020 -1.378 -1.391

11 81.30 80.67 0.63 0.063 0.717 0.713

12 79.50 79.98 -0.48 0.038 -0.532 -0.528

13 79.00 79.14 -0.14 0.022 -0.149 -0.148

14 77.60 77.07 0.53 0.047 0.596 0.592

15 79.70 79.33 0.37 0.025 0.414 0.410

16 78.70 78.83 -0.13 0.020 -0.142 -0.141

17 76.00 75.42 0.58 0.131 0.678 0.674

18 75.70 76.76 -1.06 0.058 -1.198 -1.203

19 79.20 78.48 0.72 0.020 0.791 0.788

20 78.80 80.05 -1.25 0.041 -1.400 -1.415

21 80.50 79.98 0.52 0.038 0.584 0.579

22 78.20 77.34 0.86 0.039 0.964 0.963

23 81.10 79.06 2.04 0.022 2.258 2.364

24 75.00 77.07 -2.07 0.047 -2.318 -2.434

25 77.50 77.11 0.39 0.046 0.441 0.437

26 78.50 78.71 -0.21 0.020 -0.237 -0.234

27 79.80 78.71 1.09 0.020 1.200 1.206

28 78.10 79.33 -1.23 0.025 -1.359 -1.371

29 80.30 79.67 0.63 0.031 0.699 0.695

30 80.30 79.63 0.67 0.030 0.741 0.738

31 78.40 78.87 -0.47 0.020 -0.516 -0.512

32 80.50 80.05 0.45 0.041 0.499 0.495

33 77.80 78.25 -0.45 0.022 -0.503 -0.499

34 79.50 78.14 1.36 0.023 1.505 1.526

35 77.80 77.34 0.46 0.039 0.517 0.513

36 75.90 77.34 -1.44 0.039 -1.603 -1.630

37 79.50 79.40 0.10 0.026 0.108 0.106


38 78.50 78.06 0.44 0.024 0.483 0.480

39 79.90 79.59 0.31 0.029 0.339 0.336

40 77.00 77.64 -0.64 0.031 -0.714 -0.710

41 79.50 77.83 1.67 0.027 1.848 1.898

42 76.30 76.72 -0.42 0.060 -0.478 -0.474

43 78.50 79.29 -0.79 0.024 -0.873 -0.871

44 80.20 82.20 -2.00 0.152 -2.373 -2.499

45 80.50 79.94 0.56 0.037 0.626 0.622

46 79.00 78.98 0.02 0.021 0.020 0.020

47 79.90 79.67 0.23 0.031 0.255 0.252

48 75.40 75.46 -0.06 0.129 -0.071 -0.071

49 80.00 78.45 1.55 0.020 1.718 1.755

50 78.30 77.91 0.39 0.026 0.432 0.428

Durbin-Watson = 2.43