S.C. INSURANCE COMPANY
Project 1 Individual Portion
Amanda Fisher
6/4/2014
Life Expectancy in Response to Adult Smokers
Contents
Table of Figures
Executive Summary
Introduction
Simple Regression Analysis
Hypothesis Testing for Model Significance
Test for Overall Fit
Residual Tests
  Errors are Normally Distributed
  Errors Have Constant Variance
  Errors Are Independent
Unusual Data
Conclusion
Bibliography
Appendix A
Appendix B
Table of Figures
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Executive Summary
This study was conducted for the S.C. Insurance Company to determine whether there is a
relationship between the percentage of adults who smoke and life expectancy in the United States in
2010. The company's expenses are believed to be affected by life expectancy in each state, so this
relationship needs to be evaluated before marketing campaigns are planned. Statisticians were brought
in to evaluate the data, using the MegaStat add-in for Microsoft Excel to produce output that they then
interpreted. Regression analysis, hypothesis testing, and error evaluation were performed to verify the
results.
The correlation between life expectancy and the adult smoking rate was strongly negative,
r = -0.838. This means that states with a higher percentage of adult smokers tend to have lower life
expectancy. Hypothesis tests were conducted to verify significance, and all results were significant at
both the 5 and 10 percent levels. Error evaluation was then conducted to check that the errors are
normally distributed, have constant variance, and are independent of one another. All error
evaluations came back satisfactory, with no extreme outliers.
From this information, about 70 percent of the variation in life expectancy across states is
explained by the adult smoking rate; the remaining 30 percent is unexplained. Further testing should be
conducted to determine what additional factors affect life expectancy, such as alcohol consumption,
hobbies, and cancer rates. It was also decided that marketing campaigns will not be focused on adults
who smoke.
Introduction
S.C. Insurance Company is a for-profit company that provides insurance to Americans. Recently,
S.C. has been trying to increase revenues by targeting specific clients through marketing campaigns,
and it wants to forecast the next fiscal year to determine the best course of action. Based on data from
2010, the forecasting department at S.C. has determined that life expectancies have an impact on yearly
expenses. A group of statisticians has been brought in to review the situation, and one possible
explanation they identified is that clients who smoke could have a shorter life expectancy.
The statisticians determined that a simple regression analysis would be run with the adult
smoking rate as the independent (predictor) variable and life expectancy as the dependent (response)
variable. Based on the Kaiser Family Foundation's findings¹, the average life expectancy (in years) is
taken from each state's own observations (State Health Facts, 2014). The percentage of adults (18 years
or older) who smoke is calculated per state as well (State Health Facts, 2014). The target population for
the 2010 samples consists of persons living in households who have a working cellphone, are aged 18 or
older, and received 90 percent or more of their calls on cellphones. Samples are chosen randomly for the
50 states from a set area, determined by area code and response to phone calls (Behavioral Risk Factor
Surveillance System, 2013).
The methods used for testing are simple regression analysis, hypothesis testing for intercept and
slope significance, a test for overall fit, and residual tests for normality, constant variance, and
independence of the errors. The simple regression analysis determines the correlation coefficient, which
measures the relationship between life expectancy and smoking adults; the hypothesis tests determine
the significance of the intercept and slope; the test for overall fit assesses the regression for overall
significance; and the three regression assumptions about the errors are evaluated through three
separate residual tests.
¹ Statistics are taken from the CDC, the Centers for Disease Control and Prevention.
Simple Regression Analysis
Table 1
The simple regression analysis was performed on the bivariate data using Microsoft Excel with
the MegaStat add-in. The data set used for all analyses can be found in Appendix A, and the regression
output is summarized in Table 1. The S.C. Insurance statisticians concluded that the estimated
regression equation is: Life Expectancy = 86.2535 - 38.2715 × Smoking Adults, where Smoking Adults is
the proportion of adults who smoke (for example, 0.20 for 20 percent). For each one-percentage-point
increase in adult smokers (an increase of 0.01), average life expectancy decreases by about 0.38 years.
At an adult smoking rate of 10 percent (x = 0.10), the equation predicts a life expectancy of 82.43 years.
Because higher smoking rates lower predicted life expectancy, the statisticians expect them to raise
health insurance premiums and coverage expenses.
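As a quick check on how the fitted equation is read, the sketch below evaluates it in Python. The coefficients are the ones reported in the text; the function name is hypothetical, and the smoking rate is assumed to be entered as a proportion, as it is listed in Appendix A.

```python
# Coefficients as reported in the text; x is a proportion (0.20 means 20 percent).
INTERCEPT = 86.2535
SLOPE = -38.2715


def predict_life_expectancy(smoking_rate: float) -> float:
    """Predicted life expectancy in years for a smoking rate given as a proportion."""
    return INTERCEPT + SLOPE * smoking_rate


if __name__ == "__main__":
    # Prediction at a 10 percent smoking rate.
    print(f"{predict_life_expectancy(0.10):.2f}")
    # Change in predicted life expectancy for one additional percentage point.
    change = predict_life_expectancy(0.21) - predict_life_expectancy(0.20)
    print(f"{change:.3f}")
```

Entering Alabama's rate of 0.238 or Utah's rate of 0.106 reproduces the predicted values of 77.14 and 82.20 listed for those observations in Appendix B.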
The simple regression analysis above provides the intercept and slope needed to state the simple
regression equation. The fitted equation is ŷ = 86.2535 - 38.2715x. The error term is not observable, but
the regression model assumes that the errors are normally distributed, have constant variance, and are
independent of each other. The slope (b1 = -38.272) states that for each one-percentage-point increase
in smoking adults (an increase of 0.01 in x), average life expectancy decreases by about 0.38 years. The
intercept (b0 = 86.254) suggests that if the adult smoking percentage were zero, predicted life
expectancy would be 86.25 years; no state in the data has a smoking rate near zero, so this value is an
extrapolation. For a representation of the data, see the scatterplot in Figure 1 below.
Figure 1
The scatterplot in Figure 1 shows that the two variables vary together and that there is a
negative correlation between them. Table 1 lists the correlation coefficient r = -0.838, which is
consistent with the graph. To assess the fit, the coefficient of determination, r² = 0.702, is evaluated.
The closer r² is to 1, the more of the variation in the response the model explains. Here, about 70% of
the variation in life expectancy is explained by smoking adults, and the remaining 30% is attributed to
unknown factors. To further assess the significance of the intercept and slope, hypothesis tests are
performed.
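For readers without MegaStat, the following is a minimal sketch of the same least-squares fit in plain Python. The two lists copy the Appendix A columns in state order; the function name is hypothetical, and this is ordinary least squares written out by hand, not MegaStat's own routine.

```python
import math

# Appendix A, in state order (Alabama through Wyoming).
smoking = [
    0.238, 0.205, 0.171, 0.250, 0.126, 0.177, 0.160, 0.197, 0.177, 0.204,
    0.146, 0.164, 0.186, 0.240, 0.181, 0.194, 0.283, 0.248, 0.203, 0.162,
    0.164, 0.233, 0.188, 0.240, 0.239, 0.197, 0.197, 0.181, 0.172, 0.173,
    0.193, 0.162, 0.209, 0.212, 0.233, 0.233, 0.179, 0.214, 0.174, 0.225,
    0.220, 0.249, 0.182, 0.106, 0.165, 0.190, 0.172, 0.282, 0.204, 0.218,
]
life = [
    75.4, 78.3, 79.6, 76.0, 80.8, 80.0, 80.8, 78.4, 79.4, 77.2,
    81.3, 79.5, 79.0, 77.6, 79.7, 78.7, 76.0, 75.7, 79.2, 78.8,
    80.5, 78.2, 81.1, 75.0, 77.5, 78.5, 79.8, 78.1, 80.3, 80.3,
    78.4, 80.5, 77.8, 79.5, 77.8, 75.9, 79.5, 78.5, 79.9, 77.0,
    79.5, 76.3, 78.5, 80.2, 80.5, 79.0, 79.9, 75.4, 80.0, 78.3,
]


def fit_simple_regression(x, y):
    """Return (intercept, slope, r, r_squared) for a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r, r * r


if __name__ == "__main__":
    b0, b1, r, r2 = fit_simple_regression(smoking, life)
    print(f"intercept={b0:.4f} slope={b1:.4f} r={r:.3f} r2={r2:.3f}")
```

The printed values should land close to the intercept, slope, r, and r² reported in the text, with small differences possible because Appendix A lists rounded data.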
Hypothesis Testing for Model Significance
Figure 2
The hypothesis test in Figure 2 above tests the intercept of the model for significance. The
p-value for the intercept (highlighted in yellow in Table 1) is 6.06E-61, which is far less than the
significance level of α = 5%. It is therefore fair to conclude that the intercept is not equal to zero. The
intercept is significant at both the 5 and 10 percent significance levels.
Figure 3
The hypothesis test for slope significance is shown in Figure 3. The p-value for the slope
(highlighted in yellow in Table 1) is 3.34E-14, far smaller than either the 5 or 10 percent significance
level, so the slope is significant at both levels. It is fair to conclude that the slope is not equal to zero
and that it is negative for this data set.
Test for Overall Fit
Figure 4
The test for overall fit is shown in Figure 4. The F statistic is the ratio of the regression mean
square to the error mean square; the larger its value, the stronger the evidence that the model explains
a significant share of the variation. Since the calculated F is greater than the critical F, it is fair to
conclude that the overall fit for this data set is significant. The p-value of this test is 3.34E-14
(highlighted in yellow in Table 1), which is less than the 5 percent significance level and equals the
p-value of the slope test, as it must in a simple regression with one predictor. The model therefore fits
the data significantly at both the 5 and 10 percent significance levels.
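The link between the slope test and the overall-fit test can be illustrated with a short sketch. In a regression with one predictor, both statistics can be computed from the correlation coefficient and the sample size, and F equals t squared. The function name is hypothetical, and the inputs are the rounded r = -0.838 and n = 50 from the text, so the results will differ slightly from the values in the MegaStat output.

```python
import math


def slope_t_and_f(r: float, n: int) -> tuple[float, float]:
    """t statistic for the slope and F statistic for overall fit,
    computed from the correlation r and sample size n (one predictor)."""
    df_error = n - 2  # error degrees of freedom in simple regression
    t_stat = r * math.sqrt(df_error) / math.sqrt(1.0 - r * r)
    f_stat = (r * r) * df_error / (1.0 - r * r)
    return t_stat, f_stat


if __name__ == "__main__":
    t_stat, f_stat = slope_t_and_f(-0.838, 50)
    print(f"t = {t_stat:.2f}, F = {f_stat:.2f}, t squared = {t_stat ** 2:.2f}")
```

Because F is the square of t here, the two tests always return the same p-value, which is why both report 3.34E-14.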
Residual Tests
Errors are Normally Distributed
Figure 5
In the normal probability plot of residuals in Figure 5 above, the points follow an approximately
straight line, which indicates that the errors are normally distributed. The r² value of the plot is close
to 1, and there are few extreme outliers. In Figure 6 below, the histogram shows the distribution of the
residuals from the regression on smoking adults. This histogram shows a roughly bell-shaped curve with
a few outliers. The outliers are likely due to other variables that affect life expectancy; for example,
there could be genetic health problems, poor lifestyle, or many other factors. Therefore, these two
figures support the conclusion that the errors are normally distributed for this data set.
Figure 6
Errors Have Constant Variance
To determine whether there is constant variance, a plot of the residuals and standard error was
produced for life expectancy in years for the United States. If the errors do not have constant variance
(heteroscedasticity), the estimated variances of the coefficients can be biased, which makes the t tests
unreliable. See Figure 7 below for the variance test.
Figure 7
Based on the plot in Figure 7 above, no pattern of changing variance is evident among the
residuals, so the estimated variances are not biased and the errors can be confirmed as homoscedastic.
Errors Are Independent
Autocorrelation is another violation of the regression assumptions, in which errors are not
independent of one another. It can be checked by performing a Durbin-Watson test and by evaluating a
runs test. The Durbin-Watson statistic falls on a scale of 0 to 4, with 2 indicating no autocorrelation. A
value below 2 indicates positive autocorrelation, and conversely a value above 2 indicates negative
autocorrelation. The runs test is evaluated by counting the number of times the residual plot crosses
the zero axis. Fewer crossings than expected indicate positive autocorrelation; more crossings than
expected indicate negative autocorrelation.
Figure 8
From the Durbin-Watson test shown in Figure 8 above, the statisticians concluded that no
autocorrelation exists; the statistic of 2.43 reported in Appendix B is slightly above 2. If no
Durbin-Watson table is available, a runs test is necessary.
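To make the 0 to 4 scale concrete, here is a minimal sketch of the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. The function name and the two short residual sequences are hypothetical and are not taken from the report; they only show which direction the statistic moves.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences for illustration only.
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # long runs of one sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips at every step

if __name__ == "__main__":
    print(f"{durbin_watson(trending):.2f}")     # below 2: positive autocorrelation
    print(f"{durbin_watson(alternating):.2f}")  # above 2: negative autocorrelation
```

Residuals that stay on one side of zero for long stretches pull the statistic toward 0, while residuals that flip sign at every step push it toward 4.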
Figure 9
In Figure 9, the runs test plots the residuals against the observation number. There are 26
observations on the positive side of the zero line and 24 on the negative side. The statisticians read the
plot as indicating a very slight negative autocorrelation for the 50 observations of life expectancy,
slight enough that it is safe to say there is no meaningful autocorrelation among the observations.
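The counting behind a runs test can be sketched as follows. The list copies the Residual column of Appendix B in observation order; the function name is hypothetical. The expected number of runs uses the standard formula 2·n1·n2/(n1 + n2) + 1, which is an addition here and does not appear in the MegaStat output.

```python
# Residual column from Appendix B, observations 1 through 50.
residuals = [
    -1.74, -0.11, -0.11, -0.69, -0.63, 0.52, 0.67, -0.31, -0.08, -1.25,
    0.63, -0.48, -0.14, 0.53, 0.37, -0.13, 0.58, -1.06, 0.72, -1.25,
    0.52, 0.86, 2.04, -2.07, 0.39, -0.21, 1.09, -1.23, 0.63, 0.67,
    -0.47, 0.45, -0.45, 1.36, 0.46, -1.44, 0.10, 0.44, 0.31, -0.64,
    1.67, -0.42, -0.79, -2.00, 0.56, 0.02, 0.23, -0.06, 1.55, 0.39,
]


def count_runs(values):
    """Return (n_positive, n_negative, runs, expected_runs) for a residual sequence."""
    signs = [v > 0 for v in values]  # no residual in this list is exactly zero
    # A new run starts at every sign change, so runs = crossings + 1.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    expected = 2 * n_pos * n_neg / (n_pos + n_neg) + 1
    return n_pos, n_neg, runs, expected


if __name__ == "__main__":
    n_pos, n_neg, runs, expected = count_runs(residuals)
    print(n_pos, n_neg, runs, round(expected, 2))
```

On these rounded residuals the count comes to 26 positive and 24 negative values in 30 runs, against roughly 26 expected. Slightly more runs than expected points in the same direction as a Durbin-Watson value a little above 2.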
Unusual Data
Leverage identifies observations whose value of the independent variable is far from its mean.
Based on the table provided by MegaStat, found in Appendix B, there are 4 observations with high
leverage. California and Utah have the lowest smoking rates; both have high life expectancy, and their
smoking rates fall well below the mean. This suggests that other factors affect life expectancy, since
the unusual values are not the result of a technical error. The high life expectancy can be attributed to
the Mormon lifestyle that influences those areas, as there is a large religious population out west
(Cannon, 2010). California also has bans on smoking in bars and public areas (McCarthy, 2014). West
Virginia and Kentucky have the highest adult smoking rates and some of the lowest life expectancies.
This is attributed to a lack of bans in public areas, leading to the rise in adult smoking rates (McCarthy,
2014).
The studentized residuals help reveal outliers, which fall outside the -2 to 2 range. For this
data, three states are unusual, with studentized residuals beyond 2 in absolute value but not close to 3:
Minnesota, Mississippi, and Utah. An unusual studentized residual means that the regression model
predicts the dependent value poorly for that observation, sometimes in combination with an unusual x
value. Utah has an unusual x value, as its high rate of religious adherence affects its smoking rate,
while Mississippi has a high smoking rate due to the lack of statewide bans (McCarthy, 2014). Minnesota
is unusual because of its high life expectancy; statewide bans are also present there and have helped
reduce public smoking.
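The two screening rules can be sketched in a few lines. The rows below are a subset of Appendix B chosen for illustration, with state names matched from Appendix A. The leverage cutoff of 2(k + 1)/n, which is 0.08 for one predictor and 50 observations, is a common rule of thumb and an assumption here; the report does not state which cutoff MegaStat applied. The residual cutoff of 2 comes from the text.

```python
# (leverage, studentized residual) for selected Appendix B observations.
rows = {
    "Alabama": (0.044, -1.953),
    "California": (0.101, -0.729),
    "Hawaii": (0.063, 0.717),
    "Kentucky": (0.131, 0.678),
    "Minnesota": (0.022, 2.258),
    "Mississippi": (0.047, -2.318),
    "Utah": (0.152, -2.373),
    "West Virginia": (0.129, -0.071),
}

N_OBSERVATIONS = 50
N_PREDICTORS = 1
# Assumed rule of thumb: flag leverage above 2(k + 1)/n.
LEVERAGE_CUTOFF = 2 * (N_PREDICTORS + 1) / N_OBSERVATIONS
# From the text: studentized residuals outside the -2 to 2 range are unusual.
RESIDUAL_CUTOFF = 2.0

high_leverage = sorted(s for s, (h, _) in rows.items() if h > LEVERAGE_CUTOFF)
unusual_residual = sorted(s for s, (_, t) in rows.items() if abs(t) > RESIDUAL_CUTOFF)

if __name__ == "__main__":
    print("High leverage:", high_leverage)
    print("Unusual residual:", unusual_residual)
```

Under these cutoffs the flagged rows agree with the four high-leverage states and three unusual residuals named above, and Utah is the only state flagged on both measures.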
Conclusion
From the regression analysis, hypothesis tests, and error evaluation provided by the statisticians
for S.C. Insurance Company, it can be concluded that there is a negative correlation between smoking
adults and life expectancy. Based on the fitted slope, a 10-percentage-point increase in smoking adults
corresponds to a decrease in average life expectancy of about 3.8 years. About 70% of the variation in
life expectancy is explained by the smoking adults variable; the remaining 30% is attributed to
unexplained causes. S.C. Insurance Company should perform further statistical tests to determine
whether other factors, such as obesity, income, education level, and alcohol consumption, have any
impact on life expectancy. Potential clients who smoke could further increase the company's yearly
expenses, so the company should not market to this demographic.
Potential outliers could be problematic, as no two states are alike in their findings, and the units
used could distort the data. Smoking rates per state were only available as average percentages, not as
specific counts. This is because there is no way to verify specific numbers; surveys could be conducted,
but they do not ensure honest results.
Bibliography
Behavioral Risk Factor Surveillance System. (2013, August 15). Retrieved May 31, 2014, from Centers for
Disease Control and Prevention: http://www.cdc.gov/brfss/data_documentation/index.htm
State Health Facts. (2014). Retrieved May 31, 2014, from The Henry J. Kaiser Family Foundation:
http://kff.org/other/state-indicator/smoking-adults/#
Cannon, M. W. (2010, April 13). UCLA Study Proves Mormons Live Longer. Retrieved June 3, 2014, from
Deseret News: http://www.deseretnews.com/article/705377709/UCLA-study-proves-Mormons-live-longer.html?pg=all
McCarthy, J. (2014, March 13). In U.S., Smoking Rate Highest in Kentucky, Lowest in Utah. Retrieved June
3, 2014, from Gallup Well-being: http://www.gallup.com/poll/167771/smoking-rate-lowest-utah-highest-kentucky.aspx?utm_source=rss&utm_medium=rss&utm_campaign=in-u-s-smoking-rate-lowest-in-utah-highest-in-kentucky-smoking-rate-in-alaska-has-dropped-the-most-since-2008
Appendix A
No. Location Smoking Adults (proportion) Life Expectancy at Birth (years)
1 Alabama 0.238 75.4
2 Alaska 0.205 78.3
3 Arizona 0.171 79.6
4 Arkansas 0.250 76.0
5 California 0.126 80.8
6 Colorado 0.177 80.0
7 Connecticut 0.160 80.8
8 Delaware 0.197 78.4
9 Florida 0.177 79.4
10 Georgia 0.204 77.2
11 Hawaii 0.146 81.3
12 Idaho 0.164 79.5
13 Illinois 0.186 79.0
14 Indiana 0.240 77.6
15 Iowa 0.181 79.7
16 Kansas 0.194 78.7
17 Kentucky 0.283 76.0
18 Louisiana 0.248 75.7
19 Maine 0.203 79.2
20 Maryland 0.162 78.8
21 Massachusetts 0.164 80.5
22 Michigan 0.233 78.2
23 Minnesota 0.188 81.1
24 Mississippi 0.240 75.0
25 Missouri 0.239 77.5
26 Montana 0.197 78.5
27 Nebraska 0.197 79.8
28 Nevada 0.181 78.1
29 New Hampshire 0.172 80.3
30 New Jersey 0.173 80.3
31 New Mexico 0.193 78.4
32 New York 0.162 80.5
33 North Carolina 0.209 77.8
34 North Dakota 0.212 79.5
35 Ohio 0.233 77.8
36 Oklahoma 0.233 75.9
37 Oregon 0.179 79.5
38 Pennsylvania 0.214 78.5
39 Rhode Island 0.174 79.9
40 South Carolina 0.225 77.0
41 South Dakota 0.220 79.5
42 Tennessee 0.249 76.3
43 Texas 0.182 78.5
44 Utah 0.106 80.2
45 Vermont 0.165 80.5
46 Virginia 0.190 79.0
47 Washington 0.172 79.9
48 West Virginia 0.282 75.4
49 Wisconsin 0.204 80.0
50 Wyoming 0.218 78.3
Appendix B
Observation  Life Expectancy at Birth (years)  Predicted  Residual  Leverage  Studentized Residual  Studentized Deleted Residual
1 75.40 77.14 -1.74 0.044 -1.953 -2.014
2 78.30 78.41 -0.11 0.021 -0.119 -0.118
3 79.60 79.71 -0.11 0.032 -0.121 -0.120
4 76.00 76.69 -0.69 0.061 -0.774 -0.771
5 80.80 81.43 -0.63 0.101 -0.729 -0.725
6 80.00 79.48 0.52 0.027 0.577 0.573
7 80.80 80.13 0.67 0.043 0.749 0.746
8 78.40 78.71 -0.31 0.020 -0.347 -0.344
9 79.40 79.48 -0.08 0.027 -0.088 -0.087
10 77.20 78.45 -1.25 0.020 -1.378 -1.391
11 81.30 80.67 0.63 0.063 0.717 0.713
12 79.50 79.98 -0.48 0.038 -0.532 -0.528
13 79.00 79.14 -0.14 0.022 -0.149 -0.148
14 77.60 77.07 0.53 0.047 0.596 0.592
15 79.70 79.33 0.37 0.025 0.414 0.410
16 78.70 78.83 -0.13 0.020 -0.142 -0.141
17 76.00 75.42 0.58 0.131 0.678 0.674
18 75.70 76.76 -1.06 0.058 -1.198 -1.203
19 79.20 78.48 0.72 0.020 0.791 0.788
20 78.80 80.05 -1.25 0.041 -1.400 -1.415
21 80.50 79.98 0.52 0.038 0.584 0.579
22 78.20 77.34 0.86 0.039 0.964 0.963
23 81.10 79.06 2.04 0.022 2.258 2.364
24 75.00 77.07 -2.07 0.047 -2.318 -2.434
25 77.50 77.11 0.39 0.046 0.441 0.437
26 78.50 78.71 -0.21 0.020 -0.237 -0.234
27 79.80 78.71 1.09 0.020 1.200 1.206
28 78.10 79.33 -1.23 0.025 -1.359 -1.371
29 80.30 79.67 0.63 0.031 0.699 0.695
30 80.30 79.63 0.67 0.030 0.741 0.738
31 78.40 78.87 -0.47 0.020 -0.516 -0.512
32 80.50 80.05 0.45 0.041 0.499 0.495
33 77.80 78.25 -0.45 0.022 -0.503 -0.499
34 79.50 78.14 1.36 0.023 1.505 1.526
35 77.80 77.34 0.46 0.039 0.517 0.513
36 75.90 77.34 -1.44 0.039 -1.603 -1.630
37 79.50 79.40 0.10 0.026 0.108 0.106
38 78.50 78.06 0.44 0.024 0.483 0.480
39 79.90 79.59 0.31 0.029 0.339 0.336
40 77.00 77.64 -0.64 0.031 -0.714 -0.710
41 79.50 77.83 1.67 0.027 1.848 1.898
42 76.30 76.72 -0.42 0.060 -0.478 -0.474
43 78.50 79.29 -0.79 0.024 -0.873 -0.871
44 80.20 82.20 -2.00 0.152 -2.373 -2.499
45 80.50 79.94 0.56 0.037 0.626 0.622
46 79.00 78.98 0.02 0.021 0.020 0.020
47 79.90 79.67 0.23 0.031 0.255 0.252
48 75.40 75.46 -0.06 0.129 -0.071 -0.071
49 80.00 78.45 1.55 0.020 1.718 1.755
50 78.30 77.91 0.39 0.026 0.432 0.428
Durbin-Watson = 2.43