Statistics Review I
Class 14

WHAT WOULD YOU LIKE TO KNOW?

STATISTICS AS VOX POPULI, THE VOICE OF THE PEOPLE

STATISTICAL SKILLS AND DISCOVERY

CLASS OVERVIEW
Levels of Measurement
Measures of Centrality and Dispersion
 * Centrality (mean, median, mode)
 * Dispersion (range, variance, std. deviation, std. error)
 * Z scores and Z distribution
Confidence Intervals
Exploring Data Sets
 * Reasons
 * Methods (histograms, features of distributions)
Dealing with Outliers

LEVELS OF MEASUREMENT
1. Categorical
2. Ordinal
3. Continuous
 a. Interval
 b. Ratio
 c. Discrete

CATEGORICAL VARIABLES
1. Refer to categories: human, cat, eggplant
2. All or none: can't be one-third human, two-thirds eggplant
3. Numbers serve as labels, not values: 1 = human, 2 = eggplant. 1 is not less than 2; human is not less than eggplant.
4. Common kinds of categorical variables: gender, race, major
5. Binary: only two values: Yes/No, Day/Night, Present/Absent
6. Non-binary: multiple values: Animal, Vegetable, Mineral; Democrat, Republican, Independent
7. Nominal: values are known signifiers: Did Joey go potty? Yes? Was it Number 1 or Number 2? In some sports, numbers on jerseys represent player position; e.g., 1 = tackler, 2 = runner, etc.

ORDINAL VARIABLES
Numeric values refer to the ordering of things.
Rankings: 1 = first place, 2 = second place
Chronology: 1 = occurred first, 2 = occurred second, etc.
Numeric values DO NOT indicate how much 1 differs from 2:
 Bike race: 1st place (27.24); 2nd place (27.28); 3rd place (33.10). The gap between 1st and 2nd is 0.04, but between 2nd and 3rd it is 5.82.
 Grant scores: winners, losers

CONTINUOUS VARIABLES
Interval: Most stat tests rely on interval data. Equal intervals represent equal differences.
Discrete: Virtually the same as interval, but there is a finite range of values, as in Likert scales.
 How happy are you with your cell phone service?
 Not at all / Barely / Somewhat / Very / Greatly
Ratio: Ratios of values on the scale are meaningful, so the scale must have a meaningful 0 point.
 The Likert scale above is NOT ratio, b/c 2:4 ≠ 1:2 (a rating of 4 is not "twice as happy" as a rating of 2).
 Temperature (in Kelvin; Celsius and Fahrenheit have no true zero and are interval), RT, and number of yawns in class ARE ratio.

GUESS THAT VARIABLE
Example | Variable
1 = female, 2 = male | Categorical, binary
32.75 miles per gallon | Ratio
1 = slightly tired, 2 = moderately tired, 3 = very tired | Interval
352 Smith Hall | Categorical, non-binary
Top 4 Reasons to Learn Stats: 1. Necessary for career, 2. Source of serenity, 3. Great ice-breaker, 4. Fun for whole family | Ordinal

DISTRESS AND DISCLOSURE: A SAMPLE EXPERIMENT THAT NEVER OCCURRED!!!
Hyp: Increased anxiety leads to disclosure.
 Ss see a scary movie or a neutral movie.
 Ss are asked to rate how scary they found the movie.
 Ss write about the thoughts and feelings the movie created.

MEASURES OF CENTRALITY
MODE: Most frequent value (occurrence)
MEDIAN: Middle-most value; 50% of scores above, 50% below
MEAN: Arithmetic average
How many words were written after seeing the scary movie?
Number of words written: 2, 2, 3, 5, 8
 MODE = ? 2
 MEDIAN = ? 3
 MEAN = ? 4 [(2 + 2 + 3 + 5 + 8) / 5 = 20 / 5 = 4]

RELATIONS BETWEEN MEAN, MEDIAN, AND MODE
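As a quick check on these relations, here is a minimal sketch using Python's standard `statistics` module. It computes the mode, median, and mean for the words-written example above (2, 2, 3, 5, 8) and for the N = 5, 10, and 20 samples listed on this slide. The dictionary labels and the `centrality` helper are illustrative names, not part of the original material.

```python
from statistics import mean, median, mode

# Words-written data from the (imaginary) disclosure experiment on these slides.
samples = {
    "first example": [2, 2, 3, 5, 8],
    "N = 5": [1, 2, 2, 3, 8],
    "N = 10": [1, 2, 3, 3, 3, 4, 5, 5, 6, 8],
    "N = 20": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8],
}

def centrality(scores):
    """Return (mode, median, mean) for a list of scores."""
    return mode(scores), median(scores), mean(scores)

for label, scores in samples.items():
    mo, md, mn = centrality(scores)
    print(f"{label:>14}: mode = {mo}, median = {md}, mean = {mn:.2f}")
```

For these samples the three measures move closer together as N grows: (2, 2, 3.2) for N = 5 versus (4, 4, 4.35) for N = 20. In Python 3.8+, `statistics.mode` returns only the first mode it meets when there is a tie; `statistics.multimode` lists them all.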
Number of words written?
 N = 5: 1, 2, 2, 3, 8 (mode = 2, median = 2, mean = 3.2)
 N = 10: 1, 2, 3, 3, 3, 4, 5, 5, 6, 8 (mode = 3, median = 3.5, mean = 4.0)
 N = 20: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8 (mode = 4, median = 4, mean = 4.35)
How does a change in N affect the relation between the mean, median, and mode?
If the true distribution is normal, then as the sample size increases, the mean, median, and mode converge.

MEASURES OF DISPERSION
N = 20: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8 (mode = 4, median = 4, mean = 4.35)
Range: Difference between the highest score and the lowest score: 8 - 1 = 7 = range
Deviation (from the mean), AKA Error: Difference between an individual score and the mean: 8 - 4.35 = 3.65 = the deviation for the score 8
Sum of Squared Errors (SS): Why? To get a meaningful index of average dispersion.
 $\sum (X - \bar{X}) = (1 - 4.35) + (2 - 4.35) + \dots + (8 - 4.35) = 0$: Useless! (positive and negative deviations always cancel)
 $SS = \sum (X - \bar{X})^2 = (1 - 4.35)^2 + (2 - 4.35)^2 + \dots + (7 - 4.35)^2 + (8 - 4.35)^2 = 58.55$: Useful!

VARIANCE AND STANDARD DEVIATION
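A minimal sketch of the dispersion calculations for the N = 20 sample: range, deviations (errors), SS, and the sample variance and standard deviation, both of which divide SS by N - 1. The `dispersion` helper is an illustrative name; the result is cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which also use N - 1.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

words = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8]

def dispersion(scores):
    """Range, sum of deviations, SS, sample variance (SS / (N - 1)) and SD."""
    m = mean(scores)
    deviations = [x - m for x in scores]      # "errors" from the mean
    ss = sum(d ** 2 for d in deviations)      # sum of squared errors
    var = ss / (len(scores) - 1)              # sample variance, s^2
    return {
        "range": max(scores) - min(scores),
        "sum_of_deviations": sum(deviations), # ~0 up to rounding: why raw deviations are useless
        "SS": ss,
        "variance": var,
        "SD": sqrt(var),
    }

d = dispersion(words)
print(d)

# Cross-check against the standard library (also N - 1 in the denominator).
assert isclose(d["variance"], variance(words))
assert isclose(d["SD"], stdev(words))
```

For this sample SS = 58.55, so $s^2 = 58.55 / 19 \approx 3.08$ and $s \approx 1.76$.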
We need an estimate of average dispersion from the mean, just as the mean gives an estimate of the average score.
Variance = $s^2$ = average squared deviation in the sample: $s^2 = \frac{SS}{N - 1} = \frac{58.55}{19} = 3.08$
Two problems with variance:
 1) Its units are squared deviations, so they are not relatable to the actual scores.
 2) Variance tends to be a large, unwieldy number.
Standard Deviation = $s = \sqrt{s^2} = \sqrt{3.08} = 1.76$
In a normal distribution:
 1 SD above and below the mean covers about 68% of the distribution.
 2 SD above and below the mean covers about 95% of the distribution.

Z SCORES AND Z DISTRIBUTION
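The Z-score conversion described on this slide can be sketched in a few lines. The sketch applies it to the N = 20 words-written sample purely as an illustration; the `z_scores` helper is a hypothetical name, and it uses the sample SD (N - 1 denominator), matching the variance formula above.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Z = (X - mean) / s, using the sample SD (N - 1 denominator)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# N = 20 words-written sample from the dispersion slides (illustration only).
words = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8]
z = z_scores(words)
print([round(v, 2) for v in z])

# Whatever the original metric, Z scores have mean 0 and SD 1.
print(f"mean = {mean(z):.2f}, SD = {stdev(z):.2f}")
```

A score of 8 words, for instance, becomes Z ≈ 2.08, about two SDs above the mean. Because every variable ends up on this same scale, an anxiety rating and a word count can be compared directly.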
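[Table: mean and SD of each DV]
DV 1: How anxious were you during the movie? (discrete data)
DV 2: Number of words written about the movie (ratio data)
Issue: How do we compare anxiety with word production?
Z-score conversion: converts different metrics into a common metric.
 $Z = \frac{X - \bar{X}}{s}$
Subject 24: anxious = 3; words = 22
 $Z_{anxious} = \frac{3 - \bar{X}_{anxious}}{s_{anxious}}$, $Z_{words} = \frac{22 - \bar{X}_{words}}{s_{words}} = -.58$
Z scores always have mean = 0 and SD = 1. When the raw scores are normally distributed, their Z scores follow the standard normal (Z) distribution.
SPSS: Descriptives, "Save standardized values as variables"

STANDARD ERROR OF THE MEAN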
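A minimal sketch of the standard error formula on this slide, using the movie anxiety values (n = 43, s = 2.71). The second half is an illustrative simulation under an assumed normal population (mean 4.23, SD 2.71, both borrowed from the study's numbers): it draws many samples of n = 43 and shows that the SD of their means lands close to s / √n. The helper name and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def standard_error(s, n):
    """Estimated standard error of the mean: SE = s / sqrt(n)."""
    return s / sqrt(n)

# Movie anxiety study values from the slide: n = 43, s = 2.71.
se = standard_error(2.71, 43)
print(f"SE = {se:.2f}")  # 0.41

# Illustration under an assumed normal population (mean 4.23, SD 2.71):
# the spread of many sample means is roughly the SE.
random.seed(1)
sample_means = [fmean(random.gauss(4.23, 2.71) for _ in range(43)) for _ in range(5000)]
print(f"SD of 5000 sample means = {stdev(sample_means):.2f}")
```

Because SE divides s by √n, it is always smaller than the SD for n > 1, and it shrinks as the sample grows.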
The sample mean ($\bar{X}$) estimates the true population mean ($\mu$).
Many sample means from the same population will vary.
Standard Error of the Mean (SE) = the standard deviation of sample means around the true mean, i.e., roughly how much sample means typically vary.
If the sample n ≥ 30, SE can be estimated from s (the sample standard deviation) and the sample n.
Formula for SE: $SE_{\bar{X}} = \frac{s}{\sqrt{n}}$
Movie anxiety study: DV = reported anxiety; n = 43, s = 2.71
 $SE = \frac{2.71}{\sqrt{43}} = 0.41$
Note: SE is much smaller than SD. Why?

CONFIDENCE INTERVALS
Issue: How do we know if the sample mean is a good estimate of the true mean? In other words, how do we estimate a mean's accuracy?
Confidence Intervals (CI) estimate the accuracy of sample means.
A CI shows the boundary values (highest and lowest) within which the true mean is likely to fall.
The conventional 95% boundary is built so that, over repeated samples, it captures the true mean 95% of the time.
Calculation:
 Upper boundary = $\bar{X} + (1.96 \times SE)$
 Lower boundary = $\bar{X} - (1.96 \times SE)$
Movie anxiety study: $\bar{X}$ = 4.23, SE = 0.41
 Lower CI = $4.23 - (1.96 \times 0.41) = 3.43$
 Upper CI = $4.23 + (1.96 \times 0.41) = 5.03$

GRAPHIC REPRESENTATION OF CI
Error bars overlap: the means are likely from the same distribution; differences are not meaningful.
Error bars DON'T overlap: the means are likely from different distributions; differences are meaningful.
(This is a rough rule of thumb: 95% CIs can overlap somewhat even when the difference between two means is statistically significant.)
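The CI calculation and the error-bar comparison can be sketched as follows. The helper names are hypothetical, the second group's mean and SE are invented purely to have something to compare against, and the 1.96 multiplier is the large-sample normal value used on the slide.

```python
def ci95(mean, se, z=1.96):
    """95% CI for a mean: mean +/- 1.96 * SE (large-sample normal approximation)."""
    margin = z * se
    return mean - margin, mean + margin

def intervals_overlap(a, b):
    """True if two (lower, upper) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# Movie anxiety study values from the slide: mean = 4.23, SE = 0.41.
lower, upper = ci95(4.23, 0.41)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 3.43 to 5.03

# Hypothetical second group, only to illustrate the error-bar comparison.
other = ci95(5.60, 0.38)
print("overlap:", intervals_overlap((lower, upper), other))
```

Using the unrounded SE (2.71 / √43 ≈ 0.413) gives about 3.42 to 5.04. With small samples, a t-distribution critical value is typically used in place of 1.96.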
GRAPHICALLY EXPLORING DATA USING CENTRALITY AND DISPERSION
Why explore data?
1. Get a general sense or feel for your data.
2. Determine if the distribution is normal, skewed, kurtotic, or multimodal (more on this soon).
3. Identify outliers.
4. Identify possible data entry errors.
DATA BUGS ARE A HAZARD: KNOW WHAT'S IN YOUR DATA!
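A first pass at these goals can be scripted. This is a minimal sketch, assuming hypothetical anxiety ratings on a 1-7 scale with one planted data-entry error. It flags out-of-range values, prints a text histogram, and reports moment-based skewness and excess kurtosis. All helper names are illustrative. Statistics packages such as SPSS typically report small-sample-adjusted versions of skewness and kurtosis, so their numbers will differ somewhat.

```python
from collections import Counter

def out_of_range(scores, low, high):
    """Return (position, value) pairs outside the plausible range [low, high]."""
    return [(i, x) for i, x in enumerate(scores) if not low <= x <= high]

def text_histogram(scores):
    """Print one row per observed value, with one '*' per occurrence."""
    counts = Counter(scores)
    for value in sorted(counts):
        print(f"{value:>4} | {'*' * counts[value]}")

def _central_moments(scores):
    """Second, third and fourth central moments (dividing by N)."""
    n = len(scores)
    m = sum(scores) / n
    return [sum((x - m) ** k for x in scores) / n for k in (2, 3, 4)]

def skewness(scores):
    """g1 = m3 / m2**1.5: 0 if symmetric, > 0 positive skew, < 0 negative skew."""
    m2, m3, _ = _central_moments(scores)
    return m3 / m2 ** 1.5

def excess_kurtosis(scores):
    """g2 = m4 / m2**2 - 3: 0 for a normal curve, > 0 leptokurtic, < 0 platykurtic."""
    m2, _, m4 = _central_moments(scores)
    return m4 / m2 ** 2 - 3

# Hypothetical ratings on an assumed 1-7 scale; 77 is a typo for 7.
ratings = [3, 4, 4, 5, 2, 77, 4, 3, 5, 6, 4, 1]
print(out_of_range(ratings, 1, 7))  # [(5, 77)]
clean = [x for x in ratings if 1 <= x <= 7]
text_histogram(clean)
print(f"skew = {skewness(clean):.2f}, excess kurtosis = {excess_kurtosis(clean):.2f}")
```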
NORMALLY DISTRIBUTED DATA SET
[Figure: histogram; values shown include 12, 19, 17, 14, 17, 13, 17, 15]
SPSS output: note the similarity between the mean, median, and mode.
SKEWED DISTRIBUTION
Positive skew: possible "floor effect"
Negative skew: possible "ceiling effect"

KURTOSIS
[Figure panels: Neuroticism Measure; Drinks Per Week]
Positive kurtosis (leptokurtic). Problems?
 "Normativity bias?"
 DV doesn't discriminate.
 IV wasn't impactful.
Negative kurtosis (platykurtic). Problems?
 "Distinctiveness bias?"
 IV and/or DV too ambiguous.
 Population too diverse.

BIMODALITY
Note: What clues in the statistics output suggest that the distribution may be bimodal?
Bimodality suggests 2 (or more) populations.
Multimodal: more than two modes.

OUTLIERS
BOX AND WHISKER GRAPH
[Figure: box-and-whisker graph labeled, top to bottom: Top 25%, Upper Quartile, Median (50%), Lower Quartile, Bottom 25%]

BOX AND WHISKER GRAPH, AND DATA CHECKING
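The quartiles that define the box can be computed directly. So can two of the outlier checks from the "Dealing with Outliers" list: the 2.5 SD cutoff and replacing a high outlier with the next-highest score plus 1. This is a minimal sketch on hypothetical data. The 1.5 × IQR fence is a common (Tukey-style) box-plot convention assumed here rather than taken from the slides, and quartile conventions differ slightly between packages, so SPSS may place Q1 and Q3 a little differently than `statistics.quantiles` does.

```python
from statistics import mean, quantiles, stdev

def box_summary(scores):
    """Quartiles plus Tukey-style fences at 1.5 x IQR beyond the box (assumed convention)."""
    q1, q2, q3 = quantiles(scores, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"Q1": q1, "median": q2, "Q3": q3,
            "outliers": [x for x in scores if x < low or x > high]}

def sd_outliers(scores, cutoff=2.5):
    """Scores at least `cutoff` SDs from the mean (2.5 here; some use 3)."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) / s >= cutoff]

def next_highest_plus_one(scores, outlier):
    """Replace a high outlier with the next-highest score + 1."""
    next_highest = max(x for x in scores if x < outlier)
    return [next_highest + 1 if x == outlier else x for x in scores]

# Hypothetical words-written scores with one extreme value (30).
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 30]
print(box_summary(scores))
print(sd_outliers(scores))
print(next_highest_plus_one(scores, 30))
```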
[Figure panels: Detecting Skew; Detecting Outliers; label: subject number]

DEALING WITH OUTLIERS
1. Check raw data: Entry problem? Coding problem?
2. Remove the outlier:
 It must be at least 2.5 SD from the mean (some say 3 SD).
 You must declare deletions in pubs.
 Try to identify a reason for the outlier (e.g., other anomalous responses).
3. Transform data: Convert the data to a metric that reduces deviation. (More on this in the next slide.)
4. Change the score to a more conservative one (Field, 2009):
 Next highest score plus 1, or
 2 SD or 3 SD above (or below) the mean.
 ISN'T THIS CHEATING? No (says Field), b/c retaining the score biases the outcome.
 Again, report this step in pubs.
5. Run more subjects!

DATA TRANSFORMATIONS
1. Log Transformation ($\log X$): Converting scores into $\log X$ reduces positive skew by drawing in scores on the far right side of the distribution.
 NOTE: This only works on sets where the lowest value is greater than 0. Easy fix: add a constant to all values.
2. Square Root Transformation ($\sqrt{X}$): Square roots reduce large numbers more than small ones, so they pull in extreme outliers. (Scores must not be negative; the same constant fix applies.)
3. Reciprocal Transformation ($1/X$): Dividing 1 by each score reduces large values. BUT remember that this effectively reverses valence, so that scores above the mean flip over to below the mean, and vice versa.
 Fix: Restore the original order after transforming, e.g., with $-1/X$ (or a constant minus $1/X$). Reversing the scores before the transform, $\frac{1}{X_{highest} - X}$, keeps the order but stretches the right tail instead of pulling it in, and it divides by zero at the highest score.
4. Correcting negative skew: All of these steps work on negative skew, but you must first reverse the scores: subtract each score from the highest score. Then re-reverse back to the original scale after the transform is completed.
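These transformations can be sketched with a moment-based skewness check to confirm that each one actually reduced skew. The shift constants, the $-1/X$ form of the reciprocal (chosen so high scores stay high), and both data sets are assumptions for illustration. The helper names are hypothetical.

```python
from math import log10, sqrt

def skewness(scores):
    """Moment-based skewness g1 = m3 / m2**1.5 (positive = long right tail)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

def log_t(scores):
    """log10(X); if any score is below 1, first add a constant so the lowest becomes 1."""
    shift = 1 - min(scores) if min(scores) < 1 else 0
    return [log10(x + shift) for x in scores]

def sqrt_t(scores):
    """sqrt(X); if any score is negative, first add a constant so the lowest becomes 0."""
    shift = -min(scores) if min(scores) < 0 else 0
    return [sqrt(x + shift) for x in scores]

def reciprocal_t(scores):
    """-1/X: the minus sign keeps high scores high (plain 1/X reverses them). Requires X > 0."""
    return [-1 / x for x in scores]

def fix_negative_skew(scores, transform):
    """Reverse (highest - X), apply a skew-reducing transform, then re-reverse by negating."""
    top = max(scores)
    return [-t for t in transform([top - x for x in scores])]

# Hypothetical drinks-per-week counts (positive skew).
drinks = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12, 25]
for name, t in [("raw", list), ("log", log_t), ("sqrt", sqrt_t), ("reciprocal", reciprocal_t)]:
    print(f"{name:>10}: skew = {skewness(t(drinks)):.2f}")

# Hypothetical quiz scores out of 10 (negative skew, possible ceiling effect).
quiz = [3, 7, 8, 8, 9, 9, 9, 10, 10, 10, 10]
print(f"raw skew = {skewness(quiz):.2f}, "
      f"after reverse/sqrt/re-reverse = {skewness(fix_negative_skew(quiz, sqrt_t)):.2f}")
```

In `fix_negative_skew`, negating at the end is one simple way to re-reverse; subtracting from a constant works equally well, since only the order and spacing of scores matter for skew.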