Introduction to Statistical Analysis Using Graphpad Prism 6

Post on 15-Jul-2015

Graphpad Prism 6

Introduction

drtamil@gmail.com

Download & Install

• You can download and install the 30-day evaluation version from http://www.graphpad.com/demos/
• Upon installation, do not start the evaluation period unless you are ready to start using it immediately.

Uniqueness of Prism

• Prism caters for analysis and graphs for scientific publication, especially for laboratory and biomedical research.
• Data are usually entered and manipulated using a spreadsheet such as Microsoft Excel. Data needed for analysis are copied into specific tables within Prism.
• Specific analyses require specific tables, so you must know exactly what analysis is required.

The 6 tables within Prism

• XY data tables
• Column tables
• Grouped tables
• Contingency tables
• Survival tables
• Parts of whole tables

Choosing the appropriate statistical tests

Use these tables to choose the appropriate statistical tests.

Parametric Statistical Tests

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative, dichotomous | Quantitative | Normally distributed data | Student's t test
Qualitative, polytomous (more than two categories) | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment). Normally distributed data | Paired t test
Quantitative, continuous | Quantitative, continuous | Normally distributed data | Pearson correlation & linear regression

Non-parametric Statistical Tests

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative, dichotomous | Quantitative | Data not normally distributed | Wilcoxon rank-sum test or Mann-Whitney U test
Qualitative, polytomous (more than two categories) | Quantitative | Data not normally distributed | Kruskal-Wallis one-way ANOVA test
Quantitative | Quantitative | Repeated measurement of the same individual & item | Wilcoxon signed-rank test
Quantitative, continuous/ordinal | Quantitative, continuous | Data not normally distributed | Spearman/Kendall rank correlation

Statistical Tests for Qualitative Data

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi-square test (χ²)
Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 30 | Proportionate test
Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 40 but with at least one expected value < 5 | χ² test with Yates correction
Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20 or (< 40 but with at least one expected value < 5) | Fisher's exact test

Prism Hands-on Exercise

http://drtamil.me

URL for data & submit answers

• Data: https://drive.google.com/file/d/0B_0qI7iLxVpmVWNXMnV3WWZMSWM/view?usp=sharing
• The analysis required: http://drtamil.me/2015/02/04/uninottichallenge/ (password tcr1; the exercise was done at teaching computer room 1, tcr1)
• Submit answers at this link: https://docs.google.com/forms/d/1o_L7ZjXF9Q1PON2zDs_VwkKsLCHT4v-8WruXhCiVq2Q/viewform

Data – Factors Related to SGA

A study to identify factors that can cause small for gestational age (SGA) was conducted. Among the factors studied was the mothers’ body mass index (BMI). It is believed that mothers with lower BMI are at higher risk of having SGA babies.

• 1. Create a new variable mBMI (Mothers’ Body Mass Index) from the mothers’ HEIGHT (in metres) & WEIGHT (first trimester weight in kg). mBMI = weight in kg / (height in metres)². Calculate the following for mBMI:
– Mean
– Standard deviation
• 2. Create a new variable OBESCLAS (Classification of Obesity) from mBMI. Use the following cutoff points:
– <20 = Underweight
– 20 – 24.99 = Normal
– 25 or larger = Overweight
– Create a frequency table for OBESCLAS.
• 3. Conduct the appropriate statistical test to test whether there is any association between OBESCLAS (Underweight/Normal/Overweight) and OUTCOME.
• 4. Conduct the appropriate statistical test to test whether there is any association between BMI and OUTCOME.
• 5. Conduct the appropriate statistical test to find any association between OBESCLAS (Underweight/Normal/Overweight) and BIRTHWGT.
• 6. Assuming that both variables mBMI & BIRTHWGT are normally distributed, conduct an appropriate statistical test to prove the association between the two variables.
– Demonstrate the association using the appropriate chart. Determine the coefficient of determination.
• 7. Conduct Simple Linear Regression using BIRTHWGT as the dependent variable. Try to come up with a formula that will predict the baby’s birth weight based on the mother’s BMI.
– y = a + bx

Online form for answers

Exercise 1 & 2

• 1. Create a new variable mBMI (Mothers’ Body Mass Index) from the mothers’ HEIGHT (in metres) & WEIGHT (first trimester weight in kg). mBMI = weight in kg / (height in metres)². Calculate the following for mBMI:
– Mean
– Standard deviation
• 2. Create a new variable OBESCLAS (Classification of Obesity) from mBMI. Use the following cutoff points:
– <20 = Underweight
– 20 – 24.99 = Normal
– 25 or larger = Overweight
– Create a frequency table for OBESCLAS.

Compute BMI

• Drag down to fill up the cells.

Recode BMI into OBESCLAS

• Type =IF(F2<20,"Underweight",IF(F2>=25,"Overweight","Normal")) in cell G2 and press Enter.
• Then drag down cell G2 until G101 to fill up the rest of the cells.

Recode BMI into 1,2 or 3

• We should also recode BMI into a numeric OBESCLAS2 for import into Prism, because Prism doesn’t accept string data.
• Type =IF(F2<20,1,IF(F2>=25,3,2)) in cell H2 and press Enter.
• Then drag down cell H2 until H101 to fill up the rest of the cells.
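The same compute-and-recode steps can be sketched outside Excel. This is only an illustration, assuming the exercise's cutoffs (<20, 20 to 24.99, 25 or larger); the function names and the three sample mothers are hypothetical and are not taken from the exercise file.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kg divided by height in metres squared."""
    return weight_kg / height_m ** 2


def obesclas(bmi_value: float) -> str:
    """Classify a BMI value using the exercise's cutoff points."""
    if bmi_value < 20:
        return "Underweight"
    if bmi_value < 25:
        return "Normal"
    return "Overweight"  # 25 or larger


# Numeric codes for OBESCLAS2, since Prism does not accept string data.
OBESCLAS2 = {"Underweight": 1, "Normal": 2, "Overweight": 3}

# Hypothetical mothers (weight in kg, height in metres), not the exercise data.
mothers = [(45.0, 1.60), (58.0, 1.55), (80.0, 1.65)]
for weight, height in mothers:
    value = bmi(weight, height)
    label = obesclas(value)
    print(f"{value:.2f} {label} {OBESCLAS2[label]}")
```

Testing "less than 25" for Normal, rather than "greater than 25" for Overweight, keeps a BMI of exactly 25 in the Overweight group, as the cutoff table specifies.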


Recode BMI into OBESCLAS

• If typing logical formulas is not your forte, you can just select all the data, then sort it by BMI. Then drag and fill the values 1, 2 or 3 beside it.

Add Column Freq with Value of 1

• Just add another column with the variable name “FREQ” and fill it with the value 1 from I2 to I101.
• This will help with the pivot table exercise later.

Import Excel Data Into Prism

• Select all the data from Excel and copy it.
• Open Prism, select “Column”, “Enter replicate values..” & click “Create”.

Paste Into Prism

• Click the cell between “Group A” and row Y and paste.

Checking Normality

• Click on the “Analyze” button.
• Select “Column Statistics”.
• Select the variables with continuous data.
• Then click “OK”.

Click on the following:

• Test if the values come from a Gaussian distribution.

Only Height is normally distributed

But for the purpose of today’s exercise, we are going to ASS-U-ME that all these continuous variables are normally distributed.

Question 1 – BMI

• Column Statistics also generates the Mean & S.D.:
– Mean 24.49
– S.D. 4.769
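For readers who want to cross-check the mean and S.D. outside Prism, a minimal sketch follows. The five BMI values are hypothetical placeholders, so the output will not match the 24.49 and 4.769 reported above; the sketch assumes the sample standard deviation (n − 1 denominator) is wanted.

```python
import statistics

# Hypothetical BMI values; the exercise file has 100 mothers.
bmi_values = [17.6, 22.4, 24.1, 26.8, 29.4]

mean_bmi = statistics.mean(bmi_values)
sd_bmi = statistics.stdev(bmi_values)  # sample S.D., n - 1 denominator

print(f"Mean {mean_bmi:.2f}, S.D. {sd_bmi:.3f}")
```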

Frequency Distribution

• Go back to the data by clicking on the data table on the left side of the screen. Then click on the “Analyze” button again.
• Select “Frequency Distribution”.
• Tick OBESCLAS2. Then click on “OK”.

Frequency Distribution

• Then click on OK again. You will get the following frequency distribution table.

Question 2 – Obese Classification

• UW – 17%

• N – 40%

• OW – 43%

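The frequency table can also be reproduced with a few lines of code. This sketch rebuilds the OBESCLAS column from the group sizes implied by the percentages above (17, 40 and 43 of the 100 mothers); the variable names are illustrative only.

```python
from collections import Counter

# Group sizes implied by the Question 2 answer (100 mothers in total).
obesclas = ["Underweight"] * 17 + ["Normal"] * 40 + ["Overweight"] * 43

counts = Counter(obesclas)
total = sum(counts.values())
percentages = {group: 100 * n / total for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group}: {n} ({percentages[group]:.0f}%)")
```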

Exercise 3

• 3. Conduct the appropriate statistical test to test whether there is any association between OBESCLAS (Underweight/Normal/Overweight) and OUTCOME.
• Therefore the most suitable analysis is Pearson chi-square.

       | SGA | Normal | TOTAL
UnderW |     |        |
Normal |     |        |
OverW  |     |        |
TOTAL  | 50  | 50     | 100


Pivot Table in Excel

• Click on “Insert”, “Pivot Table” in Excel.
• Select all your earlier Excel data.

Pivot Table

• On the right side of the screen, pull FREQ into values, OBESCLAS into row labels and OUTCOME into column labels.
• Now select the created contingency table (excluding the “Grand Total”), and copy it using Ctrl-C.

Paste Pivot Table Into Prism

• Click “New”, “New Data Table”.
• Select “Contingency”, “Start with an empty table”.
• Then paste the pivot table into Prism.

The Pasted Pivot Table

Chi-Square Analysis

• Click on “Analyze”, “Contingency table analysis”, then “Chi-square”, then click OK twice.

Chi-Square Results from Prism

• Prism only states that there is a significant association (p < 0.0001) between mothers’ weight classification and small for gestational age.
• But it doesn’t show which group has the higher rate of SGA.

[Figure: bar chart “Contingency”. X axis: Mothers' Weight Classification (Normal, Overweight, Underweight). Y axis: Frequency. Bars: Normal, SGA.]

Combine Results From Excel & Prism

• There is a significant difference (p<0.0001) in SGA rates between underweight, normal and overweight mothers.
• Underweight mothers have a higher rate (94%) of SGA, compared to normal mothers (58%) and overweight mothers (26%).
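To make the chi-square arithmetic concrete, here is a sketch of the Pearson chi-square calculation. The cell counts are an assumption: they are reconstructed from the reported group sizes (17, 40, 43) and SGA rates (94%, 58%, 26%), because the pivot table itself is not reproduced in this transcript. With 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exactly exp(−x/2), so no statistics library is needed.

```python
import math

# Counts reconstructed from the reported group sizes and SGA rates
# (an assumption; the actual pivot table is not shown in the text).
observed = {
    "Underweight": {"SGA": 16, "Normal": 1},
    "Normal": {"SGA": 23, "Normal": 17},
    "Overweight": {"SGA": 11, "Normal": 32},
}

outcomes = ["SGA", "Normal"]
row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {o: sum(cells[o] for cells in observed.values()) for o in outcomes}
grand_total = sum(row_totals.values())

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for group, cells in observed.items():
    for outcome in outcomes:
        expected = row_totals[group] * col_totals[outcome] / grand_total
        chi_square += (cells[outcome] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(outcomes) - 1)
# For df = 2 the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

# Row percentages, which Prism's chi-square output does not report.
sga_rates = {g: 100 * cells["SGA"] / row_totals[g] for g, cells in observed.items()}

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.2g}")
```

Under these reconstructed counts the p-value falls below 0.0001, consistent with the Prism result, and the row percentages supply the direction of the association.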

Underweight vs Normal?

• There is a significant difference (p<0.01) in SGA rates between underweight and normal mothers.
• Underweight mothers have a significantly higher rate (94%) of SGA, compared to normal mothers (58%).

Question 3

Exercise 4

• 4. Conduct the appropriate statistical test to test whether there is any association between BMI and OUTCOME.
• Basically we are comparing the mean BMI of SGA mothers against the mean BMI of Normal mothers.
• Therefore the appropriate test is Student’s t-test.

Copy BMI Column Into Prism

• Click “New”, “New Data Table”.
• Select “Column”, “Enter replicate values into stacked columns”.
• Then paste the BMI of SGA mothers into column A & the BMI of Normal mothers into column B.

The Pasted BMI Data

Student’s T-Test

• Click on “Analyze”, “Column analysis”, then “t-tests”, then OK again.
• Tick “Unpaired”, “Yes, parametric”, then “equal SDs”, then OK again.

T-Test Results from Prism

• Prism states that there is a significant difference in mean BMI (p < 0.0001) between SGA mothers (22.52) and normal mothers (26.46). Therefore the mean BMI of SGA mothers is significantly lower than that of normal mothers.
• It also shows that the variances of the two groups are equal.
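The statistic behind the unpaired, equal-SD t-test can be sketched as follows. The two samples are hypothetical and are not the exercise data, and the helper name unpaired_t is illustrative. The sketch stops at the t statistic and degrees of freedom; turning them into a p-value needs the t distribution, which Prism handles.

```python
import math
import statistics


def unpaired_t(sample_a, sample_b):
    """Student's t statistic and degrees of freedom, assuming equal SDs."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    dof = n_a + n_b - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, dof


# Hypothetical BMI values, not the exercise data.
bmi_sga = [19.0, 21.0, 23.0, 22.0, 20.0]
bmi_normal = [25.0, 27.0, 26.0, 28.0, 24.0]

t_stat, dof = unpaired_t(bmi_sga, bmi_normal)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A negative t here simply means the first group (SGA mothers) has the lower mean.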

Question 4


Exercise 5

• 5. Conduct the appropriate statistical test to find any association between OBESCLAS (Underweight/Normal/Overweight) and BIRTHWGT.
• Basically we are comparing the mean BIRTHWEIGHT of underweight mothers, normal weight mothers and overweight mothers.
• Therefore the appropriate test is Analysis of Variance (ANOVA).

Sort Excel Data By BMI To Facilitate Copy & Paste

Copy Birth Weight Column Into Prism

• Click “New”, “New Data Table”.
• Select “Column”, “Enter replicate values into stacked columns”.
• Then paste the babies’ birth weight of underweight mothers into column A, the babies’ birth weight of normal weight mothers into column B & the babies’ birth weight of overweight mothers into column C.

The Pasted Birth Weight Data

ANOVA

• Click on “Analyze”, “Column analysis”, then “One-way ANOVA”, then OK again.
• Tick “No matching”, “Yes, ANOVA”, then click the “Multiple Comparisons” tab.

ANOVA – post hoc

• Click OK.

ANOVA Results from Prism

• Prism states that there is a significant difference in mean birth weight (p < 0.0001) between the babies of underweight mothers (2.187), normal mothers (2.768) & overweight mothers (3.245).
• Unfortunately it also shows that the variances of the three groups are unequal, so the data fail the homogeneity of variances assumption.

ANOVA Results – post hoc

• Post-hoc tests indicate there is a significant difference in birth weight between ALL three groups. Babies of underweight mothers have the lowest mean birth weight of 2.187 kg.
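The F ratio that one-way ANOVA reports compares the variation between group means with the variation within groups. A minimal sketch, using hypothetical birth weights (not the exercise data) and an illustrative helper name, might look like this. It does not compute the p-value or the post-hoc comparisons.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical birth weights in kg, not the exercise data.
underweight = [2.0, 2.2, 2.4]
normal = [2.6, 2.8, 3.0]
overweight = [3.0, 3.2, 3.4]

f_ratio, df_between, df_within = one_way_anova([underweight, normal, overweight])
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```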

[Figure: “Compare Babies Birth Weight by Mother's Weight”. X axis: Underweight, Normal, Overweight. Y axis: Birth weight (0 to 5).]

Question 5


Exercise 6

• 6. Assuming that both variables mBMI & BIRTHWGT are normally distributed, conduct an appropriate statistical test to prove the association between the two variables.
– Demonstrate the association using the appropriate chart. Determine the coefficient of determination.

Pearson Correlation

• mBMI and birth weight are both normally distributed continuous data. Since the aim is to measure the strength and direction of the association between these two continuous variables, Pearson Correlation is the most appropriate test.

Copy BMI & Birth Weight Into Prism

• Click “New”, “New Data Table”.
• Select “XY”, “Enter and plot a single Y value for each point”.
• Then paste the BMI into column X & BIRTHWGT into column A.

The Pasted BMI & Birth weight Data

• BMI is coded as X since it is the risk factor.
• Birth weight is coded as Y since it is the outcome of interest.
• Risk factor first, then Outcome.
• X comes first before Y.
• Capisce? (Understand?)

Pearson’s Correlation

• Click on “Analyze”, “XY analysis”, then “Correlation”, then OK again.
• Tick “Compute r between two selected data sets”, “Yes, Pearson correlation coefficients”, then “Two-tailed”, then OK again.

Correlation Results from Prism

• Prism states that there is a significant, positive & fair (r=0.4812) correlation between mothers’ BMI and babies’ birth weight. Therefore as BMI increases, the birth weight also increases.
• 23.15% (r²=0.2315) of the variability of the birth weight is determined by the variability of the mothers’ BMI.
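The link between r and the coefficient of determination can be checked with a short sketch. The helper pearson_r and the five data points are hypothetical; only the value r = 0.4812 comes from the Prism output above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical BMI and birth weight (kg) pairs, not the exercise data.
bmi = [18.0, 21.0, 24.0, 27.0, 30.0]
birth_weight = [2.3, 2.9, 2.6, 3.3, 3.1]

r = pearson_r(bmi, birth_weight)
r_squared = r ** 2  # coefficient of determination for the sample data

# Coefficient of determination for the r reported by Prism.
reported_r_squared = 0.4812 ** 2

print(f"sample r = {r:.4f}, r squared = {r_squared:.4f}")
print(f"reported r squared = {reported_r_squared:.4f}")
```

Squaring the reported r gives roughly 0.2315, which is where the 23.15% figure comes from.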

[Figure: “Scatter Diagram - BMI vs Birth weight”. X axis: BMI (0 to 50). Y axis: Birth weight (0 to 5).]

Question 6


Exercise 7

• 7. Conduct Simple Linear Regression using BIRTHWGT as the dependent variable. Try to come up with a formula that will predict the baby’s birth weight based on the mother’s BMI.
– y = a + bx

Simple Linear Regression

• mBMI and birth weight are both normally distributed continuous data. Since the aim is to come up with a regression formula between these two continuous variables, Simple Linear Regression is the most appropriate test.

Reuse BMI & Birth weight Data

• BMI is coded as X since it is the risk factor.
• Birth weight is coded as Y since it is the outcome of interest.
• Since the SLR uses the same variables, we will reuse the XY table from Exercise 6.

Simple Linear Regression

• Click on the “SLR” icon; it is just above the “Analyze” icon.
• Just change the range so that the line will start at the y axis (X=0).
• We can set the line to end at the maximum value (it is X=41 in this exercise).
• Click OK.

SLR Results from Prism

• Prism states that there is a significant regression coefficient (b=0.07323).
• The constant (a) is 1.081.
• 23.15% (r²=0.2315) of the variability of the birth weight is determined by the variability of the mothers’ BMI.
• BW = 1.081 + 0.073 × BMI
• For every increase in BMI of 1 unit, BW increases by 0.07 kg.
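The fitted line can be turned into a small prediction helper. This sketch simply plugs in the constant and coefficient reported by Prism; the function name is illustrative, and predictions outside the observed BMI range should be treated with caution.

```python
INTERCEPT = 1.081  # constant (a) reported by Prism
SLOPE = 0.07323    # regression coefficient (b) reported by Prism


def predict_birth_weight(bmi: float) -> float:
    """Predicted birth weight in kg from the mother's BMI: y = a + bx."""
    return INTERCEPT + SLOPE * bmi


# Prediction at the mean BMI of the sample (24.49).
print(f"{predict_birth_weight(24.49):.2f} kg")
```

At the sample's mean BMI of 24.49, the formula predicts a birth weight of about 2.87 kg.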

[Figure: “Scatter Diagram - BMI vs Birth weight”. X axis: BMI (0 to 50). Y axis: Birth weight (0 to 5).]

Question 7

Slight difference in the constant value. Prism calculated 1.081 instead of 1.079. Maybe it was due to a difference in the decimal places of the BMI upon import.

The End

Thank you to Dr. Sue-Mian Then for challenging me to teach Graphpad Prism 6.