BIOSTAT - 2 The final averages for the last 200 students who took this course are Are you worried?

Post on 28-Dec-2015


BIOSTAT - 2

• The final averages for the last 200 students who took this course are

90 80 76 84 53 58 68 73 92 70
79 63 82 80 93 80 73 50 74 50
100 53 57 50 65 72 89 81 98 86
78 51 52 92 61 61 57 84 81 56
55 63 90 94 63 94 56 74 90 98
90 85 82 59 51 54 57 81 86 73
93 61 50 67 85 52 61 81 82 94
81 75 50 81 69 73 68 91 65 76
76 69 97 66 73 53 80 63 75 74
98 77 60 59 57 90 91 85 83 51
78 79 79 74 90 94 87 75 74 79
55 63 89 87 71 53 67 54 77 57
67 57 53 52 94 76 60 80 72 74
64 63 69 66 92 83 51 95 65 97
60 72 50 89 51 95 60 67 59 84
82 87 68 68 90 79 92 95 83 63
52 56 86 53 61 61 63 82 87 71
86 54 73 88 92 70 79 91 79 89
79 65 97 51 52 54 71 57 69 84
74 65 52 90 71 83 79 85 89 57

Are you worried?

BIOSTAT - 2

• Why not sort the grades from highest to lowest [ordered array]?

• Is this a more meaningful way to present the data?

100 92 87 82 79 74 69 63 57 53
98 92 87 82 79 74 68 63 57 53
98 91 87 82 79 73 68 63 57 52
98 91 86 82 79 73 68 63 57 52
97 91 86 81 78 73 68 63 57 52
97 90 86 81 78 73 67 61 57 52
97 90 86 81 77 73 67 61 57 52
95 90 85 81 77 73 67 61 56 52
95 90 85 81 76 72 67 61 56 51
95 90 85 81 76 72 66 61 56 51
94 90 85 80 76 72 66 61 55 51
94 90 84 80 76 71 65 60 55 51
94 90 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 50
93 89 83 79 74 70 65 59 54 50
93 89 83 79 74 70 64 59 53 50
92 89 83 79 74 69 63 59 53 50
92 88 83 79 74 69 63 58 53 50
92 87 82 79 74 69 63 57 53 50

(Read down each column: the grades run from highest to lowest, top to bottom and then left to right.)
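As a minimal sketch of building an ordered array outside Excel, the Python snippet below sorts the first ten grades of the unsorted list (one row only, to keep it short). The variable names are illustrative, not part of the course material.

```python
# First ten grades from the unsorted list of 200 (one row only).
grades = [90, 80, 76, 84, 53, 58, 68, 73, 92, 70]

# An ordered array is the same data sorted, here from highest to lowest.
ordered = sorted(grades, reverse=True)

print(ordered)                   # [92, 90, 84, 80, 76, 73, 70, 68, 58, 53]
print(ordered[0], ordered[-1])   # highest and lowest grade: 92 53
```

Once sorted, the highest and lowest values can be read directly from the two ends of the array.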

BIOSTAT - 2

• Why not group the data into grades of A, B, C, D, and F [frequency distribution]?

• That means we need to count the number of grades between 90 and 100, 80 and 89, etc.

• Go to “Tools”, “Data Analysis” (you might have to go to Tools, Add-Ins, and click on the 2 Data Analysis modules first), Histogram, and follow the directions.

BIOSTAT - 2

• Input range: sweep all your data

• Bin range: sweep the cell boundaries you input somewhere on your spreadsheet – cell widths should normally be equal.

• Now click on Cumulative % and Chart Output [this will plot your histogram]

• OK

Bins: 50, 60, 70, 80, 90, 100

BIOSTAT - 2

• Output:

Bin     Frequency   Cumulative %
50      6           3.00%
60      43          24.50%
70      36          42.50%
80      45          65.00%
90      45          87.50%
100     25          100.00%
More    0           100.00%

[Figure: Histogram. X-axis: Bin; Y-axis: Frequency; secondary axis: Cumulative %. Series: Frequency, Cumulative %.]

• Histogram does not look right?

BIOSTAT - 2

• Fix histogram by eliminating gaps between cells.

• Find “format data series” and set “gap width” to zero. How you do this depends on the version of Excel you have. Note the angle of the labels on the X-axis.

[Figure: Histogram with no gaps between cells. X-axis: Bin; Y-axis: Frequency; secondary axis: Cumulative %. Series: Frequency, Cumulative %.]

BIOSTAT - 2

• Unfortunately, grades of 50 were not included in cell 50-59. That’s because Excel treats each bin value as an inclusive upper limit, so it counts as follows:

Actual Cell    Bin     Frequency   Cumulative %
≤ 50           50      6           0.03
> 50 - 60      60      43          0.245
> 60 - 70      70      36          0.425
> 70 - 80      80      45          0.65
> 80 - 90      90      45          0.875
> 90 - 100     100     25          1
More                   0           1

Bins: 50, 60, 70, 80, 90, 100
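The counting rule described here can be checked with a short Python sketch. It assumes each bin value is an inclusive upper limit (a grade lands in the first bin whose value is greater than or equal to it) and uses the 200 grades from the ordered array. The function name `excel_style_counts` is hypothetical, and this is an imitation of the rule, not Excel's own code.

```python
from bisect import bisect_left

# The 200 grades, copied row by row from the ordered array in this transcript.
ROWS = """
100 92 87 82 79 74 69 63 57 53
98 92 87 82 79 74 68 63 57 53
98 91 87 82 79 73 68 63 57 52
98 91 86 82 79 73 68 63 57 52
97 91 86 81 78 73 68 63 57 52
97 90 86 81 78 73 67 61 57 52
97 90 86 81 77 73 67 61 57 52
95 90 85 81 77 73 67 61 56 52
95 90 85 81 76 72 67 61 56 51
95 90 85 81 76 72 66 61 56 51
94 90 85 80 76 72 66 61 55 51
94 90 84 80 76 71 65 60 55 51
94 90 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 50
93 89 83 79 74 70 65 59 54 50
93 89 83 79 74 70 64 59 53 50
92 89 83 79 74 69 63 59 53 50
92 88 83 79 74 69 63 58 53 50
92 87 82 79 74 69 63 57 53 50
"""
grades = [int(tok) for tok in ROWS.split()]
bins = [50, 60, 70, 80, 90, 100]


def excel_style_counts(values, upper_limits):
    """Count values per bin, treating each bin value as an inclusive upper limit."""
    counts = [0] * (len(upper_limits) + 1)   # the extra slot is the "More" row
    for v in values:
        # bisect_left gives the index of the first limit that is >= v
        counts[bisect_left(upper_limits, v)] += 1
    return counts


counts = excel_style_counts(grades, bins)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Under this rule a grade of 50 falls in the bin labelled 50 and a grade of 60 falls in the bin labelled 60, which is why the counts do not line up with the letter-grade ranges.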

BIOSTAT - 2

• The following bins seem to work


Actual Grades   Actual Cells      Bin     Frequency   Cumulative %
0 - 49          ≤ 49.9            49.9    0           0
50 - 59         > 49.9 - 59.9     59.9    45          0.225
60 - 69         > 59.9 - 69.9     69.9    38          0.415
70 - 79         > 69.9 - 79.9     79.9    42          0.625
80 - 89         > 79.9 - 89.9     89.9    42          0.835
90 - 100        > 89.9 - 100      100     33          1
More                                      0           1

BIOSTAT - 2

• Final frequency table and histogram

[Figure: Histogram. X-axis: Bin; Y-axis: Frequency; secondary axis: Cumulative %. Series: Frequency, Cumulative %.]

Actual Grades   Frequency   Relative Frequency   Percent
50 - 59         45          0.225                22.5%
60 - 69         38          0.19                 19.0%
70 - 79         42          0.21                 21.0%
80 - 89         42          0.21                 21.0%
90 - 100        33          0.165                16.5%
Total =         200         1                    100.0%
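The relative frequency and percent columns are each class frequency divided by the total. A minimal Python sketch, using the frequencies from the table (the dictionary layout and names are illustrative):

```python
# Class frequencies taken from the final frequency table.
frequencies = {
    "50 - 59": 45,
    "60 - 69": 38,
    "70 - 79": 42,
    "80 - 89": 42,
    "90 - 100": 33,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {label: count / total for label, count in frequencies.items()}

for label, count in frequencies.items():
    print(f"{label:<9} {count:>4} {relative[label]:>7.3f} {relative[label]:>7.1%}")
print(f"{'Total':<9} {total:>4} {sum(relative.values()):>7.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped or counted twice.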

BIOSTAT - 2

• Other statistical software will do the same thing, but you should always try out a small test case of data just to make sure that data is being placed into the proper cells.
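One way to build such a test case is to use only values that sit on the class boundaries, so that any misplaced value shows up immediately. The Python sketch below assumes integer grades from 50 to 100, as in this data set. The function `count_by_grade_range` is hypothetical.

```python
def count_by_grade_range(grades):
    """Count integer grades in the ranges 50-59, 60-69, 70-79, 80-89, 90-100."""
    labels = ["50 - 59", "60 - 69", "70 - 79", "80 - 89", "90 - 100"]
    counts = dict.fromkeys(labels, 0)
    for g in grades:
        if not 50 <= g <= 100:
            raise ValueError(f"grade outside 50-100: {g}")
        index = min((g - 50) // 10, 4)   # a grade of 100 belongs with the 90s
        counts[labels[index]] += 1
    return counts


# Small test case: the lowest and highest grade of every class, once each.
test_grades = [50, 59, 60, 69, 70, 79, 80, 89, 90, 100]
result = count_by_grade_range(test_grades)
print(result)   # every class should contain exactly 2 grades
```

If any class shows a count other than 2, a boundary value has landed in the wrong cell.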

BIOSTAT - 2

• Some key decisions:

– How many cells should you have? [We had 5 cells in this example.] In general, you would have between 5 and 25 cells. The more data you have, the more cells you would want to use.

– How do you determine the Bin Ranges? Most statistical software will determine these bin ranges for you, but they might not be “neat” numbers. In this case, if you did not input specific bin ranges, you would get

Bin     Frequency
50      6
62.5    49
75      53
87.5    53
More    39

BIOSTAT - 2

• Problems
– Work problems 2.3.1 and 2.3.5
– Look at data for problems 2.3.6 and 2.3.9

BIOSTAT - 2

• Numerical Techniques:
– Measures of Central Tendency [Location]
  • Arithmetic Mean
  • Median
  • Mode
– Measures of Dispersion [Variability]
  • Range
  • Variance
  • Standard Deviation

Measures of Central Location…

• The arithmetic mean, a.k.a. average, shortened to mean, is the most popular & useful measure of central location.

• It is computed by simply adding up all the observations and dividing by the total number of observations:

$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$

Arithmetic Mean…

Population Mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
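As a small worked example of the mean, the Python sketch below adds up the first ten grades of the unsorted list and divides by the number of observations. The variable names are illustrative.

```python
# First ten grades from the unsorted list of 200.
grades = [90, 80, 76, 84, 53, 58, 68, 73, 92, 70]

total = sum(grades)      # sum of the observations
n = len(grades)          # number of observations
mean = total / n

print(total, n, mean)    # 744 10 74.4
```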

Measures of Central Location…

• The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N=9 (odd)
Sort them bottom to top, find the middle:
0 0 5 7 8 9 12 14 22
median = 8

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 (even)
Sort them bottom to top; the middle is the simple average of 8 and 9:
0 0 5 7 8 9 12 14 22 33
median = (8+9)÷2 = 8.5
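The two cases (odd and even number of observations) can be sketched in Python as follows. The function assumes a non-empty list, and its name is illustrative.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd N: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the middle pair


print(median([0, 7, 12, 5, 14, 8, 0, 9, 22]))       # 8
print(median([0, 7, 12, 5, 14, 8, 0, 9, 22, 33]))   # 8.5
```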

Measures of Central Location…

• The mode of a set of observations is the value that occurs most frequently.

• A set of data may have one mode (or modal class), or two, or more modes. If no values occur more than one time each, it is said that the data has no mode.
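A minimal Python sketch of this definition is shown below. It returns every value tied for the highest count, and an empty list when no value repeats. The function name `modes` and the second and third data sets are made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value occurs more than once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([0, 7, 12, 5, 14, 8, 0, 9, 22]))   # [0]      one mode
print(modes([1, 1, 2, 2, 3]))                  # [1, 2]   two modes
print(modes([1, 2, 3]))                        # []       no mode
```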

Measures of Variability…

• Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?

For example, two sets of class grades are shown. The mean (=50) is the same in each case… But, the red class has greater variability than the blue class.

Range…

• The range is the simplest measure of variability, calculated as:
• Range = Largest observation – Smallest observation
• E.g.
• Data: {4, 4, 4, 4, 50} Range = 46
• Data: {4, 8, 15, 24, 39, 50} Range = 46
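The two examples can be reproduced with a short Python sketch (the function name is illustrative). Both data sets give the same range even though their values are spread very differently.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


print(value_range([4, 4, 4, 4, 50]))          # 46
print(value_range([4, 8, 15, 24, 39, 50]))    # 46
```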

Variance…

• Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.

• Population variance is denoted by $\sigma^2$ (lower case Greek letter “sigma” squared)

• Sample variance is denoted by $s^2$ (lower case “S” squared)

Statistical Symbols

            Population    Sample
Size        $N$           $n$
Mean        $\mu$         $\bar{x}$
Variance    $\sigma^2$    $s^2$

Variance

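• Population Variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• Sample Variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$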

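Sample Mean & Variance…

Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Sample Variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Sample Variance (shortcut method): $s^2 = \dfrac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$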
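The Python sketch below computes the sample variance two ways and compares both with the standard library. It assumes the usual textbook forms: the definitional form divides the sum of squared deviations from the sample mean by n - 1, and the shortcut form is [Σx² - (Σx)²/n] / (n - 1). The function names are illustrative, and the data set is the second range example.

```python
import statistics


def sample_variance(values):
    """Definitional form: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: needs only the sum and the sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)


data = [4, 8, 15, 24, 39, 50]
print(sample_variance(data))
print(sample_variance_shortcut(data))
print(statistics.variance(data))   # library sample variance, for comparison
```

The shortcut avoids computing the mean first, which mattered for hand calculation. The two forms are algebraically equal, so any difference in the output is floating-point rounding.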

Standard Deviation…

• The standard deviation is simply the square root of the variance, thus:

• Population standard deviation: $\sigma = \sqrt{\sigma^2}$

• Sample standard deviation: $s = \sqrt{s^2}$
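As a minimal sketch, the Python code below takes the square root of the sample and population variances for the two data sets from the range example. The helper names are illustrative. The variance functions come from Python's standard `statistics` module.

```python
import math
import statistics


def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    return math.sqrt(statistics.variance(values))


def population_std(values):
    """Square root of the population variance (divisor N)."""
    return math.sqrt(statistics.pvariance(values))


first = [4, 4, 4, 4, 50]
second = [4, 8, 15, 24, 39, 50]

for data in (first, second):
    print(data, round(sample_std(data), 2), round(population_std(data), 2))
```

Both data sets have a range of 46, yet their standard deviations differ, because the standard deviation uses every observation rather than only the two extremes.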

Excel Computations from Previous Data

• Data: the 200 grades from the ordered array shown earlier, entered in cells A1:J20 (20 rows by 10 columns).

Excel Computations from Previous Data

• Formulas:

Mean: =AVERAGE(A1:J20)
Median: =MEDIAN(A1:J20)
Mode: =MODE(A1:J20) [Excel will show only one mode, even if you have more than one mode]
Variance: =VAR(A1:J20)
Std. Dev.: =STDEV(A1:J20)

• Results:

Mean = 73.11
Median = 74
Mode = 79
Variance = 200.62
Std. Dev. = 14.16

• Work Problem 2.5.7
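For readers without Excel, the same five statistics can be sketched with Python's standard `statistics` module. The grades are the 200 values from the ordered array. `statistics.variance` and `statistics.stdev` are the sample versions (divisor n - 1), which is assumed here to be what VAR and STDEV compute.

```python
import statistics

# The 200 grades, copied row by row from the ordered array in this transcript.
ROWS = """
100 92 87 82 79 74 69 63 57 53
98 92 87 82 79 74 68 63 57 53
98 91 87 82 79 73 68 63 57 52
98 91 86 82 79 73 68 63 57 52
97 91 86 81 78 73 68 63 57 52
97 90 86 81 78 73 67 61 57 52
97 90 86 81 77 73 67 61 57 52
95 90 85 81 77 73 67 61 56 52
95 90 85 81 76 72 67 61 56 51
95 90 85 81 76 72 66 61 56 51
94 90 85 80 76 72 66 61 55 51
94 90 84 80 76 71 65 60 55 51
94 90 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 51
94 89 84 80 75 71 65 60 54 50
93 89 83 79 74 70 65 59 54 50
93 89 83 79 74 70 64 59 53 50
92 89 83 79 74 69 63 59 53 50
92 88 83 79 74 69 63 58 53 50
92 87 82 79 74 69 63 57 53 50
"""
grades = [int(tok) for tok in ROWS.split()]

mean = statistics.mean(grades)
median = statistics.median(grades)
mode = statistics.mode(grades)          # returns a single mode, like Excel's MODE
variance = statistics.variance(grades)  # sample variance (divisor n - 1)
std_dev = statistics.stdev(grades)      # sample standard deviation

print("Mean      =", round(mean, 2))
print("Median    =", median)
print("Mode      =", mode)
print("Variance  =", round(variance, 2))
print("Std. Dev. =", round(std_dev, 2))
```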