elitehomework.com€¦  · Web viewThe first assumption, which is; The dependent variable and...

Student name: Steve Tawil

Student number: n10449281

Title: Difference in resting heart rate between those with and without diabetes

Course Code: CS47

Data set: Framingham_16_519_143.RData

Word Count:

Question 1: 437

Question 2: 687

Question 3: 979

Question 4: 1373

Unit title: Difference between resting heart rate and diabetes

Unit Code: PUB561

Unit Coordinator: Darren Wraith

Contents

Question 2
  Analytical plan
    Study design
    Variables
    Hypotheses
    Univariate analysis
      Those with and without diabetes
    Bivariate analysis
    Statistical Tests and Assumptions
    Significance levels
  Analysis
    Univariate analysis
      Resting heart rate
      Those with and without diabetes
    Bivariate analysis
    Statistical tests and assumptions
    Summary
  Appendices
Question 3
  Analytical plan
    Study design
    Variables
    Hypothesis
    Univariate analysis
    Bivariate analysis
    Statistical Tests and Assumptions
    Significance
  Analysis
    Univariate analysis
    Bivariate analysis
    Statistical tests and Assumptions
    Summary
  Appendices
Question 4
  Analytical plan
    Variables
    Hypothesis
    Univariate Analysis
    Normality Check Rules
    Bivariate Analysis
    Multiple Linear Regression
    Statistical Significance
  Analysis
    Univariate analysis
    Bivariate Analysis
    Statistical tests and Assumptions
    Summary
  Appendices

Question 2

Is there a difference in resting heart rate between those with and without diabetes?

Analytical plan

Study design:

This is an observational cross-sectional study of 1200 participants, examining the difference in resting heart rate between people who do and do not have diabetes.

Variables:

Independent variable: Diabetes status (with or without diabetes) – Categorical, dichotomous (binary). I recoded this variable in R Commander by going to the ‘Data’ menu, then ‘Manage variables in active data set’, then ‘Convert numeric variables to factors…’, and selecting the appropriate variable.

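# Assumption: the active data set is named Framingham_16_519_143, after the .RData file.
Framingham_16_519_143 <- within(Framingham_16_519_143, {
  DIABETES <- factor(DIABETES, labels=c('No','Yes'))
})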

Possible values include:

Yes (with diabetes) (1)

No (without diabetes) (0)

Dependent variable: Resting heart rate – Continuous

Possible values include:

Numerical values between 44 and 120

Hypotheses:

H0: There is no difference in resting heart rate between those with and without diabetes. The mean heart rate of people with diabetes is equal to the mean heart rate of people without diabetes.

H1: There is a difference in resting heart rate between those with and without diabetes. The two means are different.

Univariate analysis:

Resting heart rate

Numerical summary: A numerical summary will be produced including mean, standard deviation, IQR, median, minimum and maximum values. Skewness and Kurtosis will be calculated, and normality of distribution will be assessed.

Graphical summary: A histogram will be used to show the distribution of the scores.

Those with and without diabetes

Numerical summary: A frequency distribution table will be used to present the number (and %) of people with and without diabetes

Graphical summary: A bar graph will be used to show the number of people with and without diabetes.

Bivariate analysis:

Numerical summary: A numerical summary will be produced including mean, standard deviation, IQR, median, minimum and maximum values. Skewness and Kurtosis will be calculated, and normality of distribution will be assessed for both groups (with/without diabetes).

Graphical summary: A side-by-side box and whisker plot will be used to compare the centre and variability of heart rate in the two groups (with/without diabetes), and a histogram will be used to show the distribution of heart rate in each group.

Statistical Tests and Assumptions:

An independent sample (two samples) t-test will be used to compare the values of the two groups (with/without diabetes). This test will be used because it compares the means of two independent groups in order to determine if there is statistical evidence that the associated population means are different.

Assumptions:

1. Resting heart rate is approximately normally distributed.

2. Resting heart rate in each group must be normally distributed, or n>30 in each group. If this assumption is not met, the Mann-Whitney U test will be used, as it is the non-parametric version of the two-sample t-test.

3. The variance of resting heart rate is the same for both groups. This will be checked using Levene’s test.

Significance levels:

P<0.05 will be used to indicate statistical significance

Analysis

Univariate analysis:

Resting heart rate

Numerical summary:

Data relating to resting heart rate was collected from 1199 of the 1200 participants (1 missing). The lowest resting heart rate was 44 and the highest resting heart rate was 120. The mean resting heart rate for the sample was 77.5 (SD=12.5), as shown below.

Graphical summary: Fig 1.1

The histogram below (Fig 1.1) shows the frequency distribution of the participants’ resting heart rate. The lowest resting heart rate was 44 and the highest was 120. The histogram is bell shaped.

Number of Subjects: 1199
Amount of Missing Data: 1
Median: 76
Range: 76
Interquartile Range: 15
Mean: 77.54629
Standard Deviation: 12.47432
Mean ± 3 SD: 40.12333 (-), 114.96925 (+)
Minimum Value: 44
Maximum Value: 120
Skewness Coefficient: 0.3687557
Kurtosis Coefficient: -0.02237312
Is data normally distributed? Not violating the assumption of normality.
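As a quick check of the mean ± 3 SD figures in the summary above, the limits can be recomputed from the reported mean and standard deviation. This is an illustrative Python sketch, not the R Commander output used in this report; the function name three_sd_limits is my own.

```python
def three_sd_limits(mean, sd):
    """Return the (lower, upper) limits of mean ± 3 standard deviations."""
    return mean - 3 * sd, mean + 3 * sd

# Values taken from the resting heart rate summary table.
low, high = three_sd_limits(77.54629, 12.47432)
minimum, maximum = 44, 120

print(round(low, 5), round(high, 5))
# Is each observed extreme inside the ± 3 SD limits?
min_inside = low <= minimum
max_inside = maximum <= high
print(min_inside, max_inside)
```

With these inputs the minimum of 44 lies inside the limits, while the maximum of 120 lies above the upper limit of about 114.97.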

Those with and without diabetes

Numerical Summary: Fig 1.2

Of the 1200 participants, 90 (7.5%) had diabetes and 1110 (92.5%) did not.

Graphical summary:

Diabetes    n       %
Yes         90      7.5%
No          1110    92.5%
Total       1200    100%

Altogether there are 1200 participants: 1110 (92.5%) do not have diabetes and 90 (7.5%) have diabetes. However, resting heart rate is missing for 1 participant without diabetes, which means that 1199 participants contribute to the heart rate comparison.

As shown in fig 1.3 below, there are 1110 (92.5%) participants that don’t have diabetes and 90 (7.5%) participants that do have diabetes.

Bivariate analysis:

For participants with diabetes, the median heart rate is 82 (IQR: 16.75). For participants without diabetes, the median heart rate is 75 (IQR: 16.00). There are more participants without diabetes than with diabetes. There appears to be a difference of approximately seven between the medians of the two groups, with participants with diabetes having the higher median heart rate. This is all shown in the table and the two graphs below.

Numerical summary:

                      Diabetes “yes”                Diabetes “no”
Number of subjects    90                            1110
Amount of NA          0                             1
Median                82                            75
Range                 58                            76
IQR                   16.75                         16.00
Mean                  83.38889                      77.07214
SD                    13.29923                      12.28998
Mean ± 3 SD           123.28658 (+), 43.4912 (-)    113.94208 (+), 40.2022 (-)
Minimum               54                            44
Maximum               112                           120
Skewness              0.1822640                     0.3716817
Kurtosis              -0.37699621                   0.01815067

Is data normally distributed? No – Max does not = + 3 SD; No – Min does not = - 3 SD

Graphical summary: fig 1.4

Fig 1.5 Frequency of participants with and without diabetes

Statistical tests and assumptions:

The independent sample (two sample) t-test was used to compare the mean resting heart rate of the two groups (people with/without diabetes). I used this test because it compares the means of two independent groups in order to determine if there is statistical evidence that the associated population means are different.

The first assumption is that the results are normally distributed. The histogram has a bell-shaped curve, which suggests that the data are approximately normally distributed.

The second assumption is that resting heart rate is normally distributed in each group, or n>30 in each group. If this assumption is not met, the Mann-Whitney U test is used instead, as it is the non-parametric version of the two-sample t-test. Here n>30 in both groups: 90 participants have diabetes and 1110 do not.

The final assumption is that the variance of resting heart rate is the same for both groups. This was checked using Levene’s test, which gave a p value of 0.2903. Because this is greater than 0.05, there is no evidence that the variances differ, so equal variance can be assumed.

The assumptions of this test (normally distributed, n>30 and equal variance) were met. The two-sample t-test produced p = 0.0000315, with a 95% confidence interval for the difference in means of -9.192171 to -3.441332. Therefore, the data reject the null hypothesis and support the alternative hypothesis.

Table: Levene’s test: DIABETES~ HEARTRATE

Levene’s test
df    F Value    P Value
1     1.1191     0.2903

Independent sample (two samples) t-test: DIABETES~ HEARTRATE

Independent sample (two sample) t-test, 95% confidence
P Value      df        t-value    Lower Limit    Upper Limit
0.0000315    101.72    -4.3575    -9.192171      -3.441332
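The t-value and degrees of freedom in the table can be reproduced from the group summaries in the bivariate table. Non-integer degrees of freedom (101.72) are characteristic of Welch's unequal-variance form of the two-sample t-test, so the sketch below assumes that form. It is illustrative Python, not the R Commander output; the function name welch_t and the group size of 1109 (1110 minus the one missing heart rate) are assumptions.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Group 1: without diabetes (assumed n = 1109 after dropping the missing value).
# Group 2: with diabetes (n = 90).
t_stat, df = welch_t(77.07214, 12.28998, 1109, 83.38889, 13.29923, 90)
print(round(t_stat, 3), round(df, 1))
```

Under these assumptions the result agrees, to rounding, with the reported t-value of -4.3575 and df of 101.72.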

Summary:

A cross-sectional study was conducted using data from randomly selected participants on whether they have diabetes and on their resting heart rate. Participants included 1110 (92.5%) without diabetes and 90 (7.5%) with diabetes; resting heart rate was missing for 1 participant, which means that 1199 participants were included in the comparison. The median heart rate for the total sample was 76 bpm. A large number of participants do not have diabetes and only a small number have diabetes.

An independent sample (two sample) t-test was used to compare the means of the two groups (with/without diabetes). I used Levene’s test to see if there were equal variances between the two groups. The p value was 0.2903, which is greater than 0.05, so equal variance can be assumed. The independent samples test gave a p value of 0.0000315, which is statistically significant (p<0.05). This means that the null hypothesis can be rejected and the alternative hypothesis, “There is a difference in resting heart rate between those with and without diabetes”, is supported.

Therefore, there is a difference in resting heart rate between those with and without diabetes. The clinical implication is that doctors need to be aware that patients with diabetes could have an increased heart rate.

Appendices:

Question 3

Are there differences in high density lipoprotein cholesterol (HDLC) for people who are underweight, normal, overweight or obese?

Analytical plan

Study design:

This is an observational cross-sectional study of 1200 participants, examining the differences in high density lipoprotein cholesterol (HDLC) between participants who are underweight, normal, overweight or obese.

Variables:

Dependent variable: High density lipoprotein cholesterol (HDLC) – Continuous variable, where responses range between 11 and 189. This is labelled as ‘HDLC’ in the data set.

Independent variable: Participants’ weight status – Categorical with four possible categories: underweight, normal, overweight, or obese. To categorize participants into these four groups, I created a new variable based on the existing variable BMI. This was done by going to ‘Data’ > ‘Manage variables in active data set’ > ‘Recode variables’ and selecting ‘BMI’. In ‘Enter recode directives’, the following codes were used: 0 : 19.999 = "underweight"; 19.999 : 24.999 = "normal"; 24.999 : 29.999 = "overweight"; 29.999 : 99.999 = "obese".
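The recode directives can be read as a simple threshold mapping. The sketch below is illustrative Python rather than the R Commander recode; the function name bmi_group is my own, and it assumes that a value sitting exactly on a shared boundary (for example 19.999) falls into the lower group, and that values outside 0 to 99.999 are treated as missing.

```python
def bmi_group(bmi):
    """Map a BMI value to one of the four groups in the recode directives."""
    if bmi is None or bmi < 0:
        return None  # missing or out-of-range values stay missing
    if bmi <= 19.999:
        return "underweight"
    if bmi <= 24.999:
        return "normal"
    if bmi <= 29.999:
        return "overweight"
    if bmi <= 99.999:
        return "obese"
    return None  # above the last directive's upper limit

# Hypothetical BMI values, used only to show the mapping.
examples = [18.0, 22.5, 27.0, 35.0, None]
groups = [bmi_group(value) for value in examples]
print(groups)
```

Keeping missing values as None mirrors the way the 5 participants without a BMI value are excluded from the group counts.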

Hypothesis:

H0: There is no difference in high density lipoprotein cholesterol (HDLC) for people who are underweight, normal, overweight or obese.

H1: There is a difference in high density lipoprotein cholesterol (HDLC) for people who are underweight, normal, overweight or obese.

Univariate analysis:

BMI

Numerical summary: A frequency distribution table will be used to present the number (and %) of people who are underweight, normal, overweight, or obese.

Graphical Summary: A bar graph will be produced to show the number of people who are underweight, normal, overweight, or obese.

HDLC

Numerical summary: A numerical summary will be produced including mean, standard deviation, IQR, median, minimum and maximum values. Skewness and Kurtosis will be calculated, and normality of distribution will be assessed.

Graphical summary: A histogram will be used to show the distribution of the scores.

Bivariate analysis:

Numerical Summary: A frequency distribution table will be presented.

Graphical Summary: I will use a box and whisker plot and a scatterplot to show the spread of HDLC across the 4 BMI groups.

Statistical Tests and Assumptions:

An ANOVA test will be used to compare mean HDLC across the four BMI groups. The ANOVA test compares the means of two or more groups for statistical significance.

The assumptions for the ANOVA test and Tukey test are:

1. Equal variance – this will be checked using Levene’s test

2. Independence of observations

3. The responses within each group are normally distributed, or n>30 in each group.

Assumptions for Levene’s test are;

1. Independent observations

2. Equal variance

Levene’s test
df    F Value    P Value
3     0.7545     0.5198

Post-hoc test graph

Significance:

P<0.05 will be used to indicate statistical significance

Analysis

Univariate analysis:

BMI

There were 1200 participants in this data set; however, 5 had missing values in the “BMI” column, leaving 1195 responses. Of these, 68 (5.69%) were underweight, 476 (39.83%) were normal, 496 (41.51%) were overweight, and 155 (12.97%) were obese. This is shown in the numerical and graphical summaries below.

Numerical Summary:

I have used a frequency distribution table below because it shows how many participants are in each group: 68 (5.69%) are underweight, 476 (39.83%) are normal, 496 (41.51%) are overweight and 155 (12.97%) are obese. This gives a total of 1195 (100%), with 5 missing values.

                          Underweight    Normal    Overweight    Obese     Total
Number of participants    68             476       496           155       1195
%                         5.69%          39.83%    41.51%        12.97%    100%
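The percentages in the frequency table use the 1195 participants with a BMI value as the denominator, not the full 1200. A minimal Python sketch of that calculation follows (illustrative only; the variable names are my own).

```python
# Group counts taken from the frequency distribution table.
counts = {"underweight": 68, "normal": 476, "overweight": 496, "obese": 155}

# The denominator is the number of non-missing BMI values.
total = sum(counts.values())

# Percentage of participants in each group, rounded to 2 decimal places.
percentages = {group: round(100 * n / total, 2) for group, n in counts.items()}

print(total)
for group, pct in percentages.items():
    print(group, pct)
```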

Graphical Summary

I used a bar graph for the BMI graphical summary because it compares the group sizes visually. As stated above, 68 (5.69%) participants are underweight, 476 (39.83%) are normal, 496 (41.51%) are overweight and 155 (12.97%) are obese, giving a total of 1195 (100%) with 5 missing values.

HDLC

Numerical summary:

The numerical summary below reports the mean, standard deviation, IQR, median, minimum and maximum values of HDLC, together with skewness and kurtosis, which were used to assess normality of the distribution. The minimum value is 11 and the maximum value is 189.

Graphical summary:

Number of Subjects: 1122
Amount of Missing Data: 78
Median: 48
Range: 178
Interquartile Range: 19
Mean: 49.52317
Standard Deviation: 16.4995
Mean ± 3 SD: 0.02467 (-), 99.02167 (+)
Minimum Value: 11
Maximum Value: 189
Skewness Coefficient: 1.355518
Kurtosis Coefficient: 6.039377
Is data normally distributed? Not violating the assumption of normality.

I have used a histogram to represent the high-density lipoprotein cholesterol data. As shown below, the graph has a bell-shaped curve, which suggests that the data could be normally distributed.

Bivariate analysis:

For the bivariate analysis, I used a box and whisker plot to compare high-density lipoprotein cholesterol (HDLC) for participants who are underweight, normal, overweight or obese. As the graph below shows, the groups are broadly similar, with no large differences. The ‘normal’ and ‘overweight’ groups have very similar results. HDLC values range from 11 to 189.

Statistical tests and Assumptions:

An ANOVA test was used to compare mean HDLC across the four BMI groups. The assumptions for an ANOVA test are the same as the t-test assumptions. The ANOVA test gave a p value of 0.000000218, which means that I reject the null hypothesis. The ANOVA test showed a statistically significant p value (p<0.05), but it does not show which specific groups differed, so I went on to conduct a post-hoc test, which does.

The post-hoc test showed that the obese and normal groups differed on average by 5.961 (95% CI -9.9295 to -1.9923), the overweight and normal groups by 4.302 (95% CI -7.04 to -1.57), the underweight and obese groups by 10.667 (95% CI 4.3131 to 17.0202), and the underweight and overweight groups by 9.007 (95% CI 3.3416 to 14.6732). The underweight and normal groups had a mean difference of 4.706 (p = 0.143) and the overweight and obese groups had a mean difference of 1.659 (p = 0.694). Therefore, these two comparisons did not reach the 0.05 significance level.
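Because Tukey confidence intervals are symmetric about the estimated mean difference, each reported difference should sit at the midpoint of its interval. The Python sketch below applies that consistency check to the four significant comparisons. It is illustrative only: the names are my own, and the upper bound of -1.9923 for the obese and normal comparison is an assumed reading of the garbled value in the text.

```python
# (reported absolute difference, CI lower, CI upper) for each comparison.
# Assumption: the obese vs normal upper bound is -1.9923.
comparisons = {
    "obese vs normal": (5.961, -9.9295, -1.9923),
    "overweight vs normal": (4.302, -7.04, -1.57),
    "underweight vs obese": (10.667, 4.3131, 17.0202),
    "underweight vs overweight": (9.007, 3.3416, 14.6732),
}

def midpoint_matches(diff, lower, upper, tol=0.01):
    """True if the interval midpoint equals the reported difference within tol."""
    midpoint = (lower + upper) / 2
    return abs(abs(midpoint) - diff) <= tol

checks = {label: midpoint_matches(*values) for label, values in comparisons.items()}
for label, ok in checks.items():
    print(label, ok)
```

All four comparisons pass the check, which supports reading the garbled bound as -1.9923. None of these intervals contains zero, consistent with these comparisons being significant at the 0.05 level.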

Levene’s test was used to check for equal variance. The p value was 0.5198, which is bigger than 0.05, so we can assume equal variance. In R Commander, Levene’s test is found under Statistics > Variances > Levene’s test.

The first assumption is equal variance. As reported above, Levene’s test gave a p value of 0.5198, which is bigger than 0.05, so this assumption is met.

The second assumption is independence of observations. This assumption is met because the responses are all independent.

The final assumption is that the responses within each group are normally distributed. As can be seen in the histogram above, the distribution is bell shaped, which suggests that the data are approximately normally distributed.

Summary:

This is an observational cross-sectional study of 1200 participants, examining the differences in high density lipoprotein cholesterol (HDLC) between participants who are underweight, normal, overweight or obese. Of the 1200 participants, 5 had missing values in the “BMI” column, leaving 1195 responses. Of these, 68 (5.69%) were underweight, 476 (39.83%) were normal, 496 (41.51%) were overweight, and 155 (12.97%) were obese. HDLC values range between 11 and 189.

The bivariate analysis involved a boxplot comparing HDLC across the 4 groups. The graph showed that the data were similar in spread, as all the groups had a similar range from the top of the whisker to the bottom. The group that showed the most difference was the normal group.

A one-way ANOVA test was run for the BMI groups. It gave a p value of 0.000000218, which means that I reject the null hypothesis, so I went on to conduct a post-hoc test. The ANOVA test tells you whether there is an overall difference between the groups, whereas the post-hoc test shows which specific groups differed.

The post-hoc test showed that the p values for the underweight and normal comparison (p = 0.143) and for the overweight and obese comparison (p = 0.694) were both >0.05, which means that they are not statistically significant, whereas the other four comparisons were statistically significant with p values <0.05.

Levene’s test was used to check for equal variance. The p value was 0.5198, which is bigger than 0.05, so we can assume equal variance. In R Commander, Levene’s test is found under Statistics > Variances > Levene’s test.

The assumptions listed for Levene’s test, independent observations and equal variance, were both met. The independence assumption is met because the responses are all independent, and the equal variance assumption is met because the p value was larger than 0.05. The clinical implication is that low high-density lipoprotein cholesterol is associated with an increased risk of coronary heart disease.

Appendices

Means, IQR, Min, Max, Skewness and Kurtosis R commander output

Recode data for BMI_4grps_16_519_143 R output

Levene’s test R commander output ~ “mean”

R commander ANOVA test output

Levene’s test R commander output ~ “Median”

Post-hoc test R commander output

Question 4

Are age and casual serum glucose levels significant predictors of a person’s systolic blood pressure?

Analytical plan

This is an observational cross-sectional study of 1200 participants, examining whether age and casual serum glucose levels are significant predictors of a person’s systolic blood pressure.

Variables:

Independent variable: Age – Continuous

Independent variable: Casual Serum Glucose levels – Continuous

This variable did not need to be coded using the coding manual.

Dependent variable: Systolic Blood Pressure – Continuous

This variable did not need to be coded using the coding manual.

Hypotheses:

H0: Combined, age and casual serum glucose levels do not explain variation in systolic blood

pressure.

H1: Combined, age and casual serum glucose levels do explain variation in systolic blood

pressure.

H0: b0 = 0

H1: b0 ≠ 0

H0: Age does not explain variation in systolic blood pressure.

H1: Age does explain variation in systolic blood pressure.

H0: Casual serum glucose levels do not explain variation in systolic blood pressure.

H1: Casual serum glucose levels do explain variation in systolic blood pressure.

Univariate Analysis:

Age

Numerical summary will be presented as the mean and standard deviation OR the median and interquartile range, after checking normality.

The graphical summary will be presented as a box plot.

Casual serum glucose levels

Numerical summary will be presented as the mean and standard deviation OR the median and interquartile range, after checking normality.

The graphical summary will be presented as a box plot.

Systolic Blood Pressure

Numerical summary will be presented as the mean and standard deviation OR the median and interquartile range, after checking normality.

The graphical summary will be presented as a box plot.

Normality Check Rules:

Median is within 10% of the mean;

Mean ± 3 SD approximates the min and max values (subjective);

Skewness coefficient is between -2 and +2;

Kurtosis coefficient is between -2 and +2.
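The rules above can be sketched as a small helper. This is only an illustration: the function name `normality_checks` is hypothetical, and because the mean ± 3 SD rule is subjective, the sketch returns the two bounds for comparison with the minimum and maximum rather than a pass or fail. The example values are the age and glucose summary statistics reported in the analysis section of this report.

```python
def normality_checks(mean, median, sd, skewness, kurtosis, tolerance=0.10):
    """Apply the rule-of-thumb normality checks to summary statistics.

    Returns a dict of True/False results for the objective rules and the
    (lower, upper) mean +/- 3 SD bounds for the subjective rule.
    """
    checks = {
        "median within 10% of mean": abs(median - mean) <= tolerance * abs(mean),
        "skewness between -2 and +2": -2 <= skewness <= 2,
        "kurtosis between -2 and +2": -2 <= kurtosis <= 2,
    }
    bounds = (mean - 3 * sd, mean + 3 * sd)
    return checks, bounds


# Age summary statistics as reported in this analysis.
age_checks, age_bounds = normality_checks(
    mean=60.44333, median=59, sd=8.2758,
    skewness=0.333865, kurtosis=-0.891805,
)

# Glucose summary statistics as reported in this analysis.
glucose_checks, glucose_bounds = normality_checks(
    mean=91.2983, median=84, sd=34.49212,
    skewness=5.636286, kurtosis=45.16248,
)
```

With these inputs, age passes all three objective rules, while glucose fails the skewness and kurtosis rules.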

Bivariate Analysis:

Age : Systolic Blood Pressure

Numerical Summary – Pearson’s correlation coefficient test will be run to assess whether there is a linear relationship, whether it is positive or negative, and whether the p-value is significant;

Graphical Summary – a scatterplot will be generated

Casual Serum Glucose Levels : Systolic Blood Pressure

Numerical Summary – Pearson’s correlation coefficient test will be run to assess whether there is a linear relationship, whether it is positive or negative, and whether the p-value is significant;

Graphical Summary – a scatterplot will be generated

Statistical Test and Assumptions:

Multiple Linear Regression will be used to model the data. This model will then be used to test whether the predictors are statistically significant and whether the result is useable in a clinical sense. Multiple linear regression has been selected because there is more than one independent variable in this scenario.

The proposed model (using variable names instead of x and y) would be:

SBP = b0 + b1 x AGE + b2 x GLUCOSE

The assumptions of Multiple Linear Regression are:

1. Assumes that the values of the dependent variable are independent from each other

2. Assumes that the residuals are normally distributed around zero with constant variance

3. Assumes that the dependent variable and independent variables are linearly related

Multiple Linear Regression

Systolic Blood Pressure and Age and Glucose

I used Pearson’s correlation test to determine what relationship exists between age and systolic blood pressure. The results indicate a significant linear relationship between age and systolic blood pressure, as the p-value is <0.05, so I reject the null hypothesis and conclude that these variables are significantly correlated. The scatterplot below shows that the relationship is positive, as the least-squares line follows an upward trend. In other words, older participants tend to have higher systolic blood pressure.

Statistical Significance:

P < 0.05 will be used to indicate statistical significance

Analysis:

Univariate analysis.

Numerical Summary:

Age

As shown in the numerical summary below, there were 1200 participants with 0 Missing

participants. The mean is 60.44333 and the median is 59. The standard deviation is 8.2758.

Fig. 1.1 Numerical summary table showing the

Number of Subjects: 1200
Amount of Missing Data: 0
Median: 59
Range: 36
Interquartile Range: 14
Mean: 60.44333
Standard Deviation: 8.2758
Mean ± 3 SD: 35.61593 (-), 85.27073 (+)
Minimum Value: 45
Maximum Value: 81
Skewness Coefficient: 0.333865
Kurtosis Coefficient: -0.891805
Is data normally distributed? Yes, the mean ± 3 SD are nearly equal

Graphical Summary:

There were 1200 participants aged between 45 and 81, as shown in the box plot below.

Fig. 1.1 Histogram showing the frequency of participants’ ages

Casual serum glucose levels

Numerical Summary:

Number of Subjects: 1200
Amount of Missing Data: 201
Median: 84
Range: 432
Interquartile Range: 20.5
Mean: 91.2983
Standard Deviation: 34.49212
Mean ± 3 SD: -12.17806 (-), 194.77466 (+)
Minimum Value: 46
Maximum Value: 478
Skewness Coefficient: 5.636286
Kurtosis Coefficient: 45.16248
Is data normally distributed? No

Graphical Summary:

Of the 1200 participants, 999 had a recorded casual serum glucose level (201 were missing), ranging between 46 and 478. The whiskers in the box plot below are longer than the box.

Fig. 1.2 Histogram showing the frequency of glucose levels of participants

Systolic Blood Pressure

Numerical Summary:

As shown in the numerical summary below, there were 1200 participants with 0 missing. The median is 136.25 and the mean is 140.2413. These data are not normally distributed because the mean ± 3 SD does not approximate the minimum and maximum values.

Graphical Summary:

As shown in the box plot below, there were 1200 participants with a systolic blood pressure between 86 and 254.

Fig. 1.3 Histogram showing the frequency of participants’ systolic blood pressure

Number of Subjects: 1200
Amount of Missing Data: 0
Median: 136.25
Range: 168
Interquartile Range: 31.625
Mean: 140.2413
Standard Deviation: 23.59538
Mean ± 3 SD: 69.45516 (-), 211.02744 (+)
Minimum Value: 86
Maximum Value: 254
Skewness Coefficient: 5.636286
Kurtosis Coefficient: 45.16248
Is data normally distributed? No

Bivariate Analysis:

The bivariate analysis used Pearson’s correlation coefficient test to assess whether there is a linear relationship, whether it is positive or negative, and whether the p-value is significant. The results of Pearson’s correlation test for AGE and SYSBP show that the null hypothesis can be rejected because the p value is < 2.2e-16.

Age : Systolic Blood Pressure

Numerical Summary:

In the numerical summary below, the adjusted R2 value is 0.1375. The F-statistic is 192.2 on 1 and 1198 DF. The p-value is very small (p < 2.2e-16), which means that the null hypothesis is unlikely to be supported.

Pearson’s correlation test : AGE and SYSBP

P-value < 2.2e-16

DF 1198

T-value 13.864

Correlation value 0.3718398

95% confidence intervals Lower: 0.3220264

Upper: 0.4196001
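The t-value and confidence interval in the table above can be reproduced from the correlation value and the sample size alone. The sketch below assumes the interval was obtained with the usual Fisher z-transformation; the function names are hypothetical and this is not the R commander code.

```python
import math


def pearson_t(r, n):
    """Return the t statistic and degrees of freedom for a Pearson correlation."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


def pearson_ci95(r, n):
    """Return an approximate 95% confidence interval via Fisher's z-transformation."""
    z = math.atanh(r)                      # transform r to the z scale
    half_width = 1.959964 / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Values reported for AGE and SYSBP (n = 1200, so DF = 1198).
t_age, df_age = pearson_t(0.3718398, 1200)
lower_age, upper_age = pearson_ci95(0.3718398, 1200)
```

With r = 0.3718398 and n = 1200 this gives a t-value of about 13.86 on 1198 DF and an interval of about 0.322 to 0.420, which agrees with the reported output.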

Graphical Summary:

For the graphical summary, I included two scatterplots showing the relationship between each independent variable and the dependent variable. The scatterplot for age and systolic blood pressure shows a positive relationship, with the line of best fit trending upward. The relationship between glucose and systolic blood pressure is weaker (r = 0.15 compared with r = 0.37 for age).

Fig. 1.4 Scatterplot comparing age and systolic blood pressure

Casual Serum Glucose Levels : Systolic Blood Pressure

Numerical Summary:

In the numerical summary below, the adjusted R2 value is 0.02181. The F-statistic is 23.25 on 1 and 997 DF. The p-value is 0.000001642, which is <0.05, so the null hypothesis is rejected. This means that glucose and systolic blood pressure are significantly correlated.

Pearson’s correlation test : GLUCOSE and SYSBP

P-value 0.000001642

DF 997

T-value 4.8221

Correlation value 0.1509672

95% confidence intervals Lower: 0.08978376

Upper: 0.21101543

Graphical Summary:

In the scatterplot below, the line of best fit goes in an upward direction, which shows that glucose and systolic blood pressure have a positive relationship. Glucose levels ranged between 46 and 478 and systolic blood pressure ranged between 86 and 254.

Fig. 1.5 Scatterplot comparing glucose levels and systolic blood pressure

Statistical tests and Assumptions:

The main statistical test for this question was multiple linear regression. The assumptions for this test are that the dependent variable and independent variables are linearly related, that the residuals are normally distributed, and that the values of the dependent variable are independent of each other.

The first assumption is that the dependent variable and independent variables are linearly related. This assumption is supported by the scatterplots in the bivariate analysis, both of which showed a positive linear relationship.

The second assumption is that the residuals are normally distributed. This assumption was tested and supported: the “Normal Q-Q” graph below shows that the residuals are normally distributed around zero with constant variance.

The third assumption is that the values of the dependent variable are independent of each other. This is met by the study design.

To check the first hypothesis, the F-statistic (ANOVA) was used because it provides a summary of the significance of the whole model. The p-value was very small (p < 2.2e-16), so the null hypothesis is unlikely and can be rejected.

The row called “intercept” in the R commander output gives the values for b0, the row “Age” gives the values for b1 and the row “glucose” gives the values for b2. The p-value in the “Age” row is < 2.2e-16. This means that the null hypotheses for hypotheses 2 and 3 are rejected. The p-value of each independent variable must be assessed to confirm statistical significance.

The p-value for the overall model is < 2.2e-16, so I would reject the null hypothesis and conclude that, combined, age and glucose level explain variation in systolic blood pressure (F(2, 996) = 93.97, p < 2.2e-16). The adjusted R2 = 0.157, which means that the model explains 15.7% of the variation in systolic blood pressure.
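The adjusted R2 can be cross-checked against the reported F-statistic. The sketch below is an illustration with hypothetical function names; it assumes n = 999 complete cases, which is what the 996 residual degrees of freedom and the 201 missing glucose values imply.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R-squared from an overall F statistic and its degrees of freedom."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for the number of predictors k in a sample of size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Reported overall test: F = 93.97 on 2 and 996 DF.
r2 = r_squared_from_f(93.97, df_model=2, df_resid=996)
adj_r2 = adjusted_r_squared(r2, n=999, k=2)
```

Under these assumptions the adjusted value comes out at about 0.157, consistent with the reported output.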

In the “intercept” row of the coefficients table, the p-value is < 2.2e-16, which means that we can reject the H0 for b0. For age, there is evidence to reject H0 and conclude that age does explain variation in Systolic Blood Pressure (t = 12.96, p < 2e-16).

For Glucose, there is also evidence to reject H0 because p < 0.001. Therefore, we can say that glucose does explain variation in Systolic Blood Pressure (t = 3.98, p = 0.0000738).

The fitted model is: SBP = 70.30 + 1.03 x age + 0.08 x Glucose
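To show how the fitted model is used, the sketch below plugs values into the equation. The function name `predict_sbp` and the example participant (age 60, glucose 90) are hypothetical; the coefficients are the ones reported above.

```python
def predict_sbp(age, glucose, b0=70.30, b1=1.03, b2=0.08):
    """Predict systolic blood pressure from the fitted regression equation."""
    return b0 + b1 * age + b2 * glucose


# Hypothetical participant aged 60 with a glucose level of 90.
example_sbp = predict_sbp(age=60, glucose=90)

# Effect of being one year older with glucose held constant.
age_effect = predict_sbp(61, 90) - predict_sbp(60, 90)
```

For this example the predicted systolic blood pressure is about 139.3. Holding glucose constant, each additional year of age adds 1.03 to the prediction, and holding age constant, each additional unit of glucose adds 0.08.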

To check the model validity, a correlation matrix was used to see whether the independent variables were correlated with each other.

Pearson’s correlations gave an estimate of the correlation between each pair of variables, and Holm’s method gave the adjusted p-values. From the R commander output, there appears to be no significant association.

We then used VIF (variance inflation factors) to further check the correlation between the variables. From the VIF R commander output, all the values are < 5. This means that there is no problematic multicollinearity between the independent variables.
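For a model with only two predictors, the VIF of each predictor reduces to 1 / (1 - r²), where r is the correlation between the two predictors. The sketch below illustrates this with hypothetical data and function names; it is not the R commander calculation and does not use the study data.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Return the VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictors: uncorrelated, then strongly correlated (r = 0.8).
vif_uncorrelated = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
vif_correlated = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

A VIF of 1 means the predictors are uncorrelated. Even a correlation of 0.8 gives a VIF of only about 2.8, which is below the cut-off of 5 used in this analysis.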

The 95% confidence interval for the correlation between age and systolic blood pressure is 0.3220264 to 0.4196001. The 95% confidence interval for the correlation between glucose and systolic blood pressure is 0.08978376 to 0.21101543.

Summary:

This study is an observational cross-sectional study involving 1200 participants, examining whether age and casual serum glucose levels are significant predictors of a person’s systolic blood pressure. There were no missing responses in the “Age” column, so all 1200 participants were included; the minimum age was 45 and the maximum was 81. There were 201 participants missing from the “glucose” column; the minimum glucose level was 46 and the maximum was 478. Finally, there were no missing participants in the “Systolic Blood Pressure” column; the minimum systolic blood pressure was 86 and the maximum was 254.

The first assumption is that the dependent variable and independent variables are linearly related. This assumption is supported by the scatterplots in the bivariate analysis, both of which showed a positive linear relationship. The second assumption is that the residuals are normally distributed. This assumption was tested and supported: the “Normal Q-Q” graph shows that the residuals are normally distributed around zero with constant variance. The third assumption is that the values of the dependent variable are independent of each other. This is met by the study design.

The main statistical test for this question was multiple linear regression. The assumptions for this test are that the dependent variable and independent variables are linearly related, that the residuals are normally distributed, and that the values of the dependent variable are independent of each other.

The 95% confidence interval for the correlation between age and systolic blood pressure is 0.3220264 to 0.4196001. The 95% confidence interval for the correlation between glucose and systolic blood pressure is 0.08978376 to 0.21101543.

The p-value for the overall model is < 2.2e-16, so I would reject the null hypothesis and conclude that, combined, age and glucose level explain variation in systolic blood pressure (F(2, 996) = 93.97, p < 2.2e-16).

Pearson’s correlations gave an estimate of the correlation between each pair of variables, and Holm’s method gave the adjusted p-values. From the R commander output,

there appears to be no significant association. We then used VIF (variance inflation factors) to further check the correlation between the variables. From the VIF R commander output, all the values are < 5. This means that there is no problematic multicollinearity between the independent variables. Clinically, there is a high risk associated with isolated systolic hypertension, which is much more common in the elderly than in young adults.

Appendices:

Appendix 1

Appendix 2

Appendix 3

Appendix 4

Appendix 5

Appendix 6