Dr.S.Nishan Silva (MBBS)

Post on 05-Jan-2016


Research Statistics 2

day  weight    day  weight    day  weight
  1  140        31  143.9      61  144
  2  140.1      32  144        62  144.2
  3  139.8      33  142.5      63  144.5
  4  140.6      34  142.9      64  144.2
  5  140        35  142.8      65  143.9
  6  139.8      36  143.9      66  144.2
  7  139.6      37  144        67  144.5
  8  140        38  144.8      68  144.3
  9  140.8      39  143.9      69  144.2
 10  139.7      40  144.5      70  144.9
 11  140.2      41  143.9      71  144
 12  141.7      42  144        72  143.8
 13  141.9      43  144.2      73  144
 14  141.4      44  143.8      74  143.8
 15  142.3      45  143.5      75  144
 16  142.3      46  143.8      76  144.5
 17  141.9      47  143.2      77  143.7
 18  142.1      48  143.5      78  143.9
 19  142.5      49  143.6      79  144
 20  142.3      50  143.4      80  144.2
 21  142.1      51  143.9      81  144
 22  142.5      52  143.6      82  144.4
 23  143.5      53  144        83  143.8
 24  143        54  143.8      84  144.1
 25  143.2      55  143.6
 26  143        56  143.8
 27  143.4      57  144
 28  143.5      58  144.2
 29  142.7      59  144
 30  143.7      60  143.9

My weight

Plot the data as a function of the time at which it was acquired:

[Figure: weight (lbs) vs. Day; y-axis 139 to 146 lbs, x-axis day 0 to 60]

Do not use curved lines to connect data points – that assumes you know more about the relationship of the data than you really do

Comments: background is white (less ink); Font size is larger than Excel default (use 14 or 16)


Assume my weight is a single, random set of similar data

Make a frequency chart (histogram) of the data

[Figure: histogram; x-axis: Weight (lbs); y-axis: # of Observations, 0 to 25]

Create a “model” of my weight and determine the average weight and how consistent my weight is

[Figure: weight (lbs) vs. Day plot shown next to the histogram (# of Observations vs. Weight (lbs)), with the inflection point ("Inflection pt") marked]

average = 143.11 lbs

s = 1.4 lbs

s = standard deviation = measure of the consistency, or similarity, of weights
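As a cross-check, the average and standard deviation can be recomputed from the weight table. The sketch below is only an illustration in Python (the slides use Excel); it assumes the 84 values were read correctly from the table and uses the sample standard deviation from the standard `statistics` module.

```python
import statistics

# Daily weights (lbs) for days 1 to 84, copied from the weight table.
weights = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
    143.9, 144, 142.5, 142.9, 142.8, 143.9, 144, 144.8, 143.9, 144.5,
    143.9, 144, 144.2, 143.8, 143.5, 143.8, 143.2, 143.5, 143.6, 143.4,
    143.9, 143.6, 144, 143.8, 143.6, 143.8, 144, 144.2, 144, 143.9,
    144, 144.2, 144.5, 144.2, 143.9, 144.2, 144.5, 144.3, 144.2, 144.9,
    144, 143.8, 144, 143.8, 144, 144.5, 143.7, 143.9, 144, 144.2,
    144, 144.4, 143.8, 144.1,
]

average = statistics.mean(weights)   # sum of the values divided by n
s = statistics.stdev(weights)        # sample standard deviation (n - 1 in the denominator)

print(f"n = {len(weights)}")
print(f"average = {average:.2f} lbs")
print(f"s = {s:.1f} lbs")
```

Rounded to the same precision, this should agree with the values quoted on the slide.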

[Figure: normal curve; x-axis: s, from -5 to 5; y-axis: Amplitude, 0 to 0.45; half-height width W1/2 marked]

Width is measured at the inflection point = s

Triangulated peak: base width is 2s < W < 4s

[Figure: normal curve; x-axis: s, from -5 to 5; y-axis: Amplitude, 0 to 0.45]

Area +/- 1s = 68.3%

Area +/- 2s = 95.4%

Area +/- 3s = 99.73%

pp ~ 6s

pp = peak to peak, or largest separation of measurements

Peak to peak is sometimes easier to “see” on the data vs. time plot
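The areas under the normal curve can be checked numerically. This is a minimal sketch using only Python's standard `math.erf`; the helper name `area_within` is made up for this illustration.

```python
import math

def area_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Area +/- {k}s = {100 * area_within(k):.2f}%")
```

Because almost all of the area lies within +/- 3s, the full spread of a normally distributed data set is roughly 6s wide, which is where the peak-to-peak rule of thumb comes from.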

[Figure: weight (lbs) vs. Day plot with the peak-to-peak range marked, from 139.5 to 144.9 lbs]

pp ~ 6s

s ~ pp/6 = (144.9 - 139.5)/6 ~ 0.9

(Calculated s = 1.4)
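The peak-to-peak shortcut is simple enough to write as a one-line function. The sketch below reuses the two values read off the plot (139.5 and 144.9); the function name is an assumption made for this example.

```python
def estimate_s_from_range(lowest: float, highest: float) -> float:
    """Rough standard deviation estimate: peak-to-peak range divided by 6."""
    return (highest - lowest) / 6

s_estimate = estimate_s_from_range(139.5, 144.9)
print(f"s ~ {s_estimate:.1f} lbs (calculated s = 1.4 lbs)")
```

The shortcut is only a quick visual estimate and assumes roughly normally distributed data, so some disagreement with the calculated value is to be expected.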


Inferential Statistics

Used to determine the likelihood that a conclusion based on data from a sample is true

Terms

p value: the probability that a difference at least as large as the one observed could have occurred by chance alone (that is, if the null hypothesis were true)

Standardised Normal distribution

• Formula: $Z = \dfrac{X - \mu}{\sigma}$

Z – standard normal deviate (SND); X – variable; µ – mean; σ – standard deviation (σ² is the variance)
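A short sketch of the standardisation step follows. It reuses the average (143.11) and s (1.4) from the weight example as µ and σ; the value 144.5 is just an illustrative choice, and `normal_cdf` stands in for looking the Z value up in an SND table.

```python
import math

def z_score(x: float, mean: float, sd: float) -> float:
    """Standard normal deviate: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z (what an SND table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(144.5, mean=143.11, sd=1.4)
print(f"Z = {z:.2f}")
print(f"Proportion of values below 144.5 lbs: {normal_cdf(z):.3f}")
```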

SND table of values

Regression and Correlation

• Correlation
– To analyze the relationship between two variables

• Regression
– Dependence of variable y on variable x
– In this course we consider only two variables
– In real life, multiple variable interactions are possible.

Example : X = Height, Y = Body weight

Basic Linear regression Equation

• Equation: Y` = a + bx
– b is the gradient, slope or regression coefficient
– a is the intercept of the line at the Y axis, or regression constant
– Y` is a value for the outcome
– x is a value for the predictor (real x value)
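The constants a and b are usually found by least squares. The sketch below is an illustration only: the height and weight figures are invented for this example (they are not from the slides), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for Y` = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # regression coefficient (slope)
    a = mean_y - b * mean_x    # regression constant (intercept)
    return a, b

# Hypothetical data: X = height (cm), Y = body weight (kg).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 66, 73, 81]

a, b = fit_line(heights_cm, weights_kg)
predicted = a + b * 175  # Y` for a new predictor value x = 175
print(f"Y` = {a:.1f} + {b:.2f}x, predicted weight at 175 cm = {predicted:.2f} kg")
```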

Correlation Coefficient

• Page 100 lower down

Correlation coefficient (r) ranges from −1 to +1
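Pearson's product-moment coefficient can be computed directly from its definition. In this sketch the function name and the small data sets are assumptions chosen for illustration; note that r is negative when one variable falls as the other rises.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical examples: a perfect positive and a perfect negative relationship.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
# Hypothetical height (cm) vs. body weight (kg) data.
r_height_weight = pearson_r([150, 160, 170, 180, 190], [52, 58, 66, 73, 81])

print(f"{r_up:.2f} {r_down:.2f} {r_height_weight:.3f}")
```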

Finding the significance of “r”

• Simple correlation significance
– http://www.biology.ed.ac.uk/archive/jdeacon/statistics/table6.html#Correlation coefficient

• Pearson product-moment coefficient
– http://www.experiment-resources.com/pearson-product-moment-correlation.html

• References
– Best: http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress11.html
– In detail: http://www.statsdirect.com/help/regression_and_correlation/rcr.htm

Inferential Statistics – Page 102

• Sample statistics – “Generalized” to the entire population

• Formulate hypothesis

• ? Null Hypothesis

• Prove hypothesis

Types of Errors

                                  Truth
Conclusion         No difference          Difference
No difference      -                      TYPE II ERROR (β)
Difference         TYPE I ERROR (α)       -

Power = 1 − β (β = the probability of a type II error)

Confidence interval:

The range of values we can be reasonably certain includes the true value.

If the “probability” of the true value not being included is less than 5%, we reject the null hypothesis.

Example

The Use of the Null Hypothesis

• Is the difference in two sample populations due to chance or a real statistical difference?

• The null hypothesis assumes that there will be no “difference” or no “change” or no “effect” of the experimental treatment.

• If treatment A is no better than treatment B then the null hypothesis is supported.

• If there is a significant difference between A and B then the null hypothesis is rejected...

Parametric tests

• T test Page 104

T Table

T-test

• T-test determines the probability that the null hypothesis concerning the means of two small samples is correct

• The probability that two samples are representative of a single population (supporting null hypothesis) OR two different populations (rejecting null hypothesis)

Use t-test to determine whether or not sample population A and B came from the same or different population

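$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$, where $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2}$ is the std error of the difference between the means

$\bar{x}_1$ = mean of A; $\bar{x}_2$ = mean of B

$s_{\bar{x}_1}$ = std error of A; $s_{\bar{x}_2}$ = std error of B

Example:
Sample A mean = 8
Sample B mean = 12
Std error of difference of populations = 1

t = (12 − 8) / 1 = 4 standard error units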
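The calculation can be sketched in a few lines. The first function reproduces the slide's worked example from summary figures; the second builds the standard error of the difference from two raw samples, assuming independent samples and unpooled variances (an assumption of this sketch, not something the slides specify). The sample values are invented.

```python
import math
import statistics

def t_from_summary(mean_a: float, mean_b: float, se_difference: float) -> float:
    """t statistic from two means and the standard error of their difference."""
    return (mean_b - mean_a) / se_difference

def standard_error(sample) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def t_from_samples(sample_a, sample_b) -> float:
    """t statistic for two independent samples (unpooled standard errors)."""
    se_difference = math.sqrt(standard_error(sample_a) ** 2 + standard_error(sample_b) ** 2)
    return t_from_summary(statistics.mean(sample_a), statistics.mean(sample_b), se_difference)

t_slide = t_from_summary(8, 12, 1)                  # the worked example above
t_raw = t_from_samples([6, 8, 10], [10, 12, 14])    # hypothetical raw samples
print(f"t (slide example) = {t_slide:.1f}")
print(f"t (raw samples)   = {t_raw:.2f}")
```

The resulting t is then compared with the T table at the appropriate degrees of freedom.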

Non Parametric test

• Chi Squared test – Page 108

– Test for goodness of fit
– Test of independence

Chi square

• Used with discrete values

• Phenotypes, choice chambers, etc.

• Not used with continuous variables (like height… use t-test for samples less than 30 and z-test for samples greater than 30)

• $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ (equation image: http://course1.winona.edu/sberg/Equation/chi-squ2.gif)

• O = observed values

• E = expected values

Interpreting a chi square

• Calculate degrees of freedom
• # of events, trials, phenotypes − 1
• Example: 2 phenotypes − 1 = 1
• Generally use the column labeled 0.05 (which means that, if the null hypothesis is true, a difference this large between what you expected and what you observed would arise by random chance only 5% of the time).

• Any calculated value larger than the table value means you reject your null hypothesis and there is a difference between observed and expected values.
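A goodness-of-fit calculation along these lines might look as follows. The phenotype counts and the 3:1 expected ratio are invented for illustration; 3.841 is the standard table value for 1 degree of freedom in the 0.05 column.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical genetics cross: 400 offspring, 3:1 ratio expected for 2 phenotypes.
observed = [290, 110]
expected = [300, 100]

CRITICAL_0_05_DF1 = 3.841  # chi-square table, 0.05 column, 1 degree of freedom

statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1
reject_null = statistic > CRITICAL_0_05_DF1

print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}")
print("reject null hypothesis" if reject_null else "do not reject null hypothesis")
```

Here the calculated value is smaller than the table value, so the difference between observed and expected counts is within what chance alone could produce.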

How to use a chi square chart

http://faculty.southwest.tn.edu/jiwilliams/probab2.gif

T-test or Chi Square? Testing the validity of the null hypothesis

• Use the T-test (also called Student’s T-test) if using continuous variables from normally distributed sample populations (ex. Height)

• Use the Chi Square (χ²) if using discrete variables (if you are evaluating the differences between experimental data and expected or hypothetical data)… Example: genetics experiments, expected distribution of organisms.

Qualitative Analysis – Pages 113-114

• Phenomenology
– Data collected using interviews, tapes etc.
– Analyzed as the researcher prefers
– Described using descriptive statistics

• Ethnography
– Data collected using note taking, observation etc.
– Categorised
– Relationships between patterns identified

• Concurrent Analysis
– Qualitative data is transformed to numerical data
– Qualitative value may be lost

Using Excel (Example)

Microsoft Excel

• A Spreadsheet Application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

• There are many versions of MS-Excel. Excel XP, Excel 2003, Excel 2007 are capable of performing a number of statistical analyses.

• Starting MS Excel: Double click on the Microsoft Excel icon on the desktop or Click on Start --> Programs --> Microsoft Excel.

• Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically-titled columns across the page. Each cell is referenced by its coordinates. For example, A3 is used to refer to the cell in column A and row 3. B10:B20 is used to refer to the range of cells in column B and rows 10 through 20.

Microsoft Excel

Creating Formulas: 1. Click the cell in which you want to enter the formula, 2. Type = (an equal sign), 3. Click the Function Button (fx), 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File --> Open (from an existing workbook). Change the directory area or drive to look for files in other locations.

Creating a new workbook: File --> New --> Blank Document

Saving a File: File --> Save

Selecting more than one cell: Click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4, or click on a cell and drag the mouse across the desired range.

Microsoft Excel

• Entering Date and Time: Dates are stored as MM/DD/YYYY. No need to enter in that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007 and jan 9, 1999 as 1/9/1999. To enter today’s date, press Ctrl and ; together. Use a or p to indicate am or pm. For example, 8:30 p is interpreted as 8:30 pm. To enter current time, press Ctrl and : together.

• Copy and Paste all cells in a Sheet: Ctrl+A for selecting, Ctrl +C for copying and Ctrl+V for Pasting.

• Sorting: Data --> Sort --> Sort By …

• Descriptive Statistics and other Statistical methods: Tools --> Data Analysis --> Statistical method. If Data Analysis is not available then click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA

Histograms in Excel

1. Select Tools/Data Analysis

2. Choose Histogram

3. Input data range and bin range (bin range is a cell range containing the upper class boundaries for each class grouping)

Select Chart Output and click “OK”
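To make the role of the bin range concrete, here is a small Python sketch of the same counting rule: each value goes into the first bin whose upper boundary it does not exceed, with a final "More" bin for anything larger. The handful of weights and the bin boundaries are chosen for illustration, and the function is only an approximation of what the Excel tool does.

```python
from bisect import bisect_left

def histogram(values, upper_bounds):
    """Count values per bin; upper_bounds must be sorted. The last count is the 'More' bin."""
    counts = [0] * (len(upper_bounds) + 1)
    for value in values:
        # Index of the first upper boundary that is >= value.
        counts[bisect_left(upper_bounds, value)] += 1
    return counts

# A few weights (lbs) and 1-lb bins given by their upper class boundaries.
weights = [139.6, 140.0, 140.8, 141.7, 142.3, 143.5, 144.9]
bins = [140, 141, 142, 143, 144, 145]

counts = histogram(weights, bins)
for label, count in zip([f"<= {b}" for b in bins] + ["More"], counts):
    print(f"{label:>7}: {'#' * count}")
```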


Microsoft Excel

Statistical and Mathematical Function: Start with the ‘=’ sign and then select a function from the function wizard (fx).

Inserting a Chart: Click on Chart Wizard (or Insert --> Chart), select the chart type, give the input data range, update the chart options, and select the output range/worksheet.

Importing Data in Excel: File --> Open --> File Type --> Click on File --> Choose Option (Delimited/Fixed Width) --> Choose Options (Tab/Semicolon/Comma/Space/Other) --> Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Computing the Mean

• Sum of the xi divided by n (or N for population mean)

• Excel
– =AVERAGE(cellrange)

Computing the Mode

• Value that occurs most often in discretized data

• Excel
– =MODE(cellrange)
– Reports first value seen if tie

Computing the Median

• The middle value in sorted data

• Excel
– =MEDIAN(cellrange)

Computing the Range

• Range is min to max values

• Excel
– =MIN(cellrange)
– =MAX(cellrange)

Computing the Standard Deviation

• Std. Dev. is Square-Root of Variance

• Excel
– =STDEV(cellrange) - sample
– =STDEVP(cellrange) - population
– =VAR(cellrange) - sample
– =VARP(cellrange) - population
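For readers working outside Excel, the same summary measures are available in Python's standard `statistics` module. The sketch below pairs each call with the Excel function it roughly corresponds to; the small data set is made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

summary = {
    "mean": statistics.mean(data),            # =AVERAGE(cellrange)
    "mode": statistics.mode(data),            # =MODE(cellrange)
    "median": statistics.median(data),        # =MEDIAN(cellrange)
    "min": min(data),                         # =MIN(cellrange)
    "max": max(data),                         # =MAX(cellrange)
    "stdev": statistics.stdev(data),          # =STDEV(cellrange), sample
    "pstdev": statistics.pstdev(data),        # =STDEVP(cellrange), population
    "variance": statistics.variance(data),    # =VAR(cellrange), sample
    "pvariance": statistics.pvariance(data),  # =VARP(cellrange), population
}

for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

The sample versions divide by n − 1 and the population versions by n, so the sample values are always slightly larger.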

Tables and Charts for Categorical Data: Univariate Data

Categorical Data

• Tabulating Data
– Summary Table

• Graphing Data
– Bar Charts
– Pie Charts
– Pareto Diagram

The Summary Table

Example: Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage (%)
Stocks              46.5                       42.27
Bonds               32.0                       29.09
CD                  15.5                       14.09
Savings             16.0                       14.55
Total              110.0                      100.0

(Variables are Categorical)

Summarize data by category
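The percentage column is just each amount divided by the total. A minimal Python sketch using the amounts from the Current Investment Portfolio example:

```python
# Amounts in thousands of dollars, from the Current Investment Portfolio example.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())
percentages = {category: round(100 * amount / total, 2) for category, amount in amounts.items()}

for category, amount in amounts.items():
    print(f"{category:<8} {amount:>6.1f} {percentages[category]:>6.2f}")
print(f"{'Total':<8} {total:>6.1f}")
```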

Bar and Pie Charts

• Bar charts and Pie charts are often used for qualitative (category) data

• Height of bar or size of pie slice shows the frequency or percentage for each category

Bar Chart Example

[Figure: horizontal bar chart “Investor's Portfolio”; categories: Stocks, Bonds, CD, Savings; x-axis: Amount in $1000's, 0 to 50; data from the Current Investment Portfolio summary table]

Pie Chart Example

[Figure: pie chart “Current Investment Portfolio”; Stocks 42%, Bonds 29%, Savings 15%, CD 14%; data from the Current Investment Portfolio summary table]

Percentages are rounded to the nearest percent

Pareto Diagram Example

[Figure: Pareto diagram “Current Investment Portfolio”; categories in order: Stocks, Bonds, Savings, CD; bar graph: % invested in each category (axis 0% to 45%); line graph: cumulative % invested (axis 0% to 100%)]
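The numbers behind a Pareto diagram come from sorting the categories from largest to smallest and accumulating the percentages. A minimal Python sketch using the portfolio amounts (the variable names are chosen for this illustration):

```python
# Amounts in thousands of dollars, from the Current Investment Portfolio example.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Largest category first, as on the horizontal axis of a Pareto diagram.
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

pareto = []   # rows of (category, % in category, cumulative %)
running = 0.0
for category, amount in ordered:
    running += amount
    pareto.append((category, round(100 * amount / total, 2), round(100 * running / total, 2)))

for category, percent, cumulative in pareto:
    print(f"{category:<8} {percent:>6.2f}% (bar) {cumulative:>7.2f}% (line)")
```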

Tabulating and Graphing Multivariate Categorical Data

• Side by side bar charts

[Figure: side-by-side bar chart “Comparing Investors”; categories: Stocks, Bonds, CD, Savings; one bar each for Investor A, Investor B, Investor C; axis 0 to 60]

Side-by-Side Chart Example

• Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East      20.4      27.4      59        20.4
West      30.6      38.6      34.6      31.6
North     45.9      46.9      45        43.9

[Figure: side-by-side bar chart of the table above; x-axis: 1st Qtr to 4th Qtr; bars for East, West, North; y-axis 0 to 60]

Best source for you: BMJ Statistics notes

http://www.bmj.com/bmj-series/statistics-notes