Numerical descriptions of distributions
Uploaded by: shona-chapman
Describe the shape, center, and spread of a distribution:
- Shape: see slide #6 below.
- Center: mean and median.
- Spread: range, IQR, standard deviation.
We treat these as aids to understanding the distribution of the variable at hand.
We'll start with the mean. The mean is often called the "average" and is in fact the arithmetic average ("add all the values and divide by the number of observations"). Mathematical notation:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
[Table: woman (i) and height (x), for i = 1 to 25]
Learn right away how to get the mean with calculators & JMP.
Your numerical summary must be meaningful!
Height of 25 women in a class: the distribution of women's heights appears symmetrical, so the mean is a good numerical summary.
Here the shape of the distribution is wildly irregular. Why? Could we have more than one plant species or phenotype? A single numerical summary here would not make sense.
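The arithmetic mean described earlier ("add all the values and divide by the number of observations") can be sketched in a few lines of Python, as an alternative to a calculator or JMP. The heights below are made-up values for illustration only; they are not the class data.

```python
def mean(values):
    """Arithmetic mean: add all the values and divide by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical heights in inches (illustrative only, not the class data).
heights = [60, 62, 63, 64, 66]

x_bar = mean(heights)
print(x_bar)  # 63.0
```

The computation always returns a number, which is why the shape should be checked first: for an irregular distribution the result may not describe anything meaningful.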
The median (M) is often called the "middle" value: it is the value at the midpoint of the observations when they are ranked from smallest to largest.
- Arrange the data from smallest to largest.
- If n is odd, the median is the single observation in the center, at position (n+1)/2 in the ordering.
- If n is even, the median is the average of the two middle observations; position (n+1)/2 falls halfway between them.
In Table 1.10 (1.2, 1/11), calculate the mean and median of the city m.p.g. for the 2-seater cars to see that the mean is more sensitive to outliers than the median. Use JMP; get the data from the eBook.
Skewness
[Figure: three distributions. Symmetric: mode = mean = median. Skewed left (negatively): mean, then median, then mode, from left to right. Skewed right (positively): mode, then median, then mean, from left to right.]
Mean and median of a distribution with outliers
[Figure: percent of people dying, shown without the outliers and with the outliers.]
The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2). The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).
Mean and median of a symmetric and a right-skewed distribution
Impact of skewed data
- Disease X: mean and median are the same.
- Multiple myeloma: the mean is pulled toward the direction of the skew.
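As a rough illustration of the median rule and of how a single outlier moves the mean much more than the median, here is a small Python sketch. The data are made up for illustration; they are not the values from the figures or from Table 1.10.

```python
def median(values):
    """Median by the positional rule: sort, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # position (n+1)/2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

def mean(values):
    return sum(values) / len(values)

# Hypothetical data (illustrative only).
without_outlier = [1, 2, 3, 4, 5]
with_outlier = without_outlier + [45]

print(mean(without_outlier), median(without_outlier))  # 3.0 3
print(mean(with_outlier), median(with_outlier))        # 10.0 3.5
```

Adding the one extreme value moves the mean from 3.0 to 10.0 but the median only from 3 to 3.5, the same pattern the slides describe.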
Spread: percentiles, quartiles (Q1 and Q3), IQR, 5-number summary (and boxplots), range, standard deviation
- The pth percentile of a variable is a data value such that p% of the values of the variable are less than or equal to it.
- The lower (Q1) and upper (Q3) quartiles are special percentiles that divide the data into quarters (fourths). Get them by finding the medians of the lower and upper halves of the data.
- IQR = interquartile range = Q3 - Q1 = spread of the middle 50% of the data. The IQR is used with the so-called 1.5*IQR criterion for outliers. Know this!
Measure of spread: the quartiles
The first quartile, Q1, is the value in the sample that has 25% of the data less than or equal to it (it is the median of the lower half of the sorted data, excluding M). The third quartile, Q3, is the value in the sample that has 75% of the data less than or equal to it (it is the median of the upper half of the sorted data, excluding M).
Q1 = first quartile = 2.2
M = median = 3.4
Q3 = third quartile = 4.35
Five-number summary and boxplot
[Figure: boxplot annotated with the five-number summary: min, Q1, M, Q3, max.]
Largest = max = 6.1
Q3 = third quartile = 4.35
M = median = 3.4
Q1 = first quartile = 2.2
Smallest = min = 0.6
Boxplots for skewed data
Comparing boxplots for a normal and a right-skewed distribution: boxplots remain true to the data and clearly depict symmetry or skew.
5-number summary: min, Q1, median, Q3, max
- When plotted, the 5-number summary is a boxplot.
- We can also draw a modified boxplot to show outliers (mild and extreme).
- Boxplots have less detail than histograms and are often used for comparing distributions, e.g., Fig. 1.19, p. 37 and below.
Figure 1.19, Introduction to the Practice of Statistics, Sixth Edition, 2009 W.H. Freeman and Company
Distance to Q3: 7.9 - 4.35 = 3.55
Interquartile range: Q3 - Q1 = 4.35 - 2.2 = 2.15
Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than 3.225 years, 1.5 * IQR. Thus, individual #25 is an outlier by our 1.5 * IQR rule.
Definition, pp. 40-41, 2009 W.H. Freeman and Company
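The 1.5 * IQR check for individual #25 can be reproduced with a short Python sketch. The function names are hypothetical. The quartiles and the value 7.9 come from the slide, and the lower cutoff is included because the criterion applies on both sides of the box.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the k * IQR criterion."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_suspected_outlier(x, q1, q3, k=1.5):
    """True if x lies more than k * IQR below Q1 or above Q3."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper

# Quartiles and the value for individual #25, as given on the slide.
q1, q3 = 2.2, 4.35
print(is_suspected_outlier(7.9, q1, q3))  # True
print(is_suspected_outlier(6.1, q1, q3))  # False
```

The upper cutoff here is about 4.35 + 3.225 = 7.575 years, so 7.9 is flagged while 6.1 is not.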
Look at Example 1.19 on page 41 (1.2, 8/11) and see the figure for a graph of deviations from the mean.
Metabolic rates for 7 men in a dieting study: 1792, 1666, 1362, 1614, 1460, 1867, [seventh value missing]. Mean = 1600 calories, s = [value missing] calories.
Figure 1.21, Introduction to the Practice of Statistics, Sixth Edition, 2009 W.H. Freeman and Company
Be sure you know how to compute the standard deviation with JMP, since it's almost never done by hand with the formula on the previous page. Put the metabolic rates into a JMP table and analyze.
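For readers without JMP at hand, the same computation can be sketched in Python. Two assumptions are made here and flagged in the code: the standard deviation is the usual sample version (squared deviations from the mean, summed, divided by n - 1, then square-rooted), and the seventh metabolic rate, which is missing from the text, is inferred by treating the stated mean of 1600 as exact.

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (n - 1))

listed = [1792, 1666, 1362, 1614, 1460, 1867]  # the six rates given in the text
# ASSUMPTION: the seventh rate is not in the text. If the stated mean of 1600
# is exact for n = 7, it must equal 7 * 1600 minus the sum of the six above.
seventh = 7 * 1600 - sum(listed)
rates = listed + [seventh]

print(seventh)                      # 1439
print(sum(rates) / len(rates))      # 1600.0
print(round(sample_std(rates), 2))  # 189.24
```

If the true seventh value differs from the inferred one, the standard deviation will differ as well, so treat the last figure as a consequence of the assumption and check it against JMP.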
- Why do we square the deviations? There are two technical reasons that we'll see when we discuss the normal distribution in the next section.
- Why do we use the standard deviation (s) instead of the variance (s^2)? s^2 has units that are the squares of the original units of the data.
- Why do we divide by n-1 instead of n? n-1 is called the number of degrees of freedom: since the sum of the deviations is zero, the last deviation can always be found if we know n-1 of them.
- Which measure of spread is best? The 5-number summary is better than the mean and s.d. for skewed data; use the mean and s.d. for symmetric data.
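The degrees-of-freedom remark above can be checked numerically. This is a small sketch on made-up data (not from the textbook): the deviations from the mean sum to zero, so the last one is fixed once the other n - 1 are known, and dividing by n - 1 gives a slightly larger variance than dividing by n.

```python
# Hypothetical data (illustrative only).
data = [2, 4, 9]
n = len(data)
x_bar = sum(data) / n                    # 5.0
deviations = [x - x_bar for x in data]   # [-3.0, -1.0, 4.0]

# The deviations sum to zero...
print(sum(deviations))                   # 0.0
# ...so the last one is determined by the other n - 1.
recovered_last = -sum(deviations[:-1])
print(recovered_last == deviations[-1])  # True

# Sum of squared deviations, divided by n - 1 versus by n.
ss = sum(d ** 2 for d in deviations)     # 26.0
print(ss / (n - 1))                      # 13.0
print(round(ss / n, 3))                  # 8.667
```

The small values were chosen so the deviations are exact; with other data, the sum of the deviations may differ from zero by a tiny floating-point rounding error.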
What should you use, when, and why?
Arithmetic mean or median?
- Middletown is considering imposing an income tax on citizens. City hall wants a numerical summary of its citizens' income to estimate the total tax base.
  Mean: although income is likely to be right-skewed, the city government wants to know about the total tax base.
- In a study of the standard of living of typical families in Middletown, a sociologist makes a numerical summary of family income in that city.
  Median: the sociologist is interested in a typical family and wants to lessen the impact of extreme incomes.
Finish reading section 1.2
- Be sure to go over the Summary at the end of each section and know all the terminology.
- Do # 1.56, , 1.67, 1.69, (Mean/Median Applet), 1.78, 1.79.
- Use JMP for any problem requiring more than very simple computations.