QUIZ 2 POSTPONED TO NOV. 12

GRAPHICALLY EXPLORING DATA USING CENTRALITY AND DISPERSION

Why explore data?
1. Get a general sense or feel for your data.
2. Determine if the distribution is normal, skewed, kurtotic, or multi-modal (more on this soon).
3. Identify outliers.
4. Identify possible data entry errors. Example data: 12, 19, 17, 14, ...
DATA BUGS ARE A HAZARD: KNOW WHAT'S IN YOUR DATA!

Normally Distributed Data Set
SPSS output: Note the similarity between mean, median, and mode.

Skewed Distribution
Positive skew: possible "floor effect." Negative skew: possible "ceiling effect."

Kurtosis
Positive kurtosis: leptokurtic. Negative kurtosis: platykurtic.
Possible problems? "Normativity bias": the DV doesn't discriminate, or the IV wasn't impactful. "Distinctiveness bias": the IV and/or DV is too ambiguous, or the population is too diverse.
[Example distributions shown on slide: Neuroticism Measure, Drinks Per Week]

Bimodality
Note: What clues in the statistics output suggest that the distribution may be bimodal?
Bimodality suggests 2 (or more) populations. Multimodal: more than two modes.

Outliers: BOX AND WHISKER GRAPH
Median (50%), upper quartile (top 25%), lower quartile (bottom 25%).

BOX AND WHISKER GRAPH, AND DATA CHECKING
Detecting skew; detecting outliers (by subject number).

DEALING WITH OUTLIERS
1. Check raw data: Entry problem? Coding problem?
2. Remove the outlier:
   a. Must be at least 2.5 SD from the mean (some say 3 SD).
   b. Must declare deletions in publications.
   c. Try to identify the reason for the outlier (e.g., other anomalous responses).
3. Transform the data: convert the data to a metric that reduces deviation. (More on this in the next slide.)
4. Change the score to a more conservative one (Field, 2009):
   a. Next highest score plus 1.
   b. 2 SD or 3 SD above (or below) the mean.
   c. ISN'T THIS CHEATING? No (says Field), because retaining the score biases the outcome. Again, report this step in publications.
5. Run more subjects!

Data Transformations
1. Log Transformation (log(X)): Converting scores to log X reduces positive skew by drawing in scores on the far right side of the distribution. NOTE: This only works on sets where the lowest value is greater than 0.
Easy fix: add a constant to all values.
2. Square Root Transformation (√X): Square roots reduce large numbers more than small ones, so this pulls in extreme high outliers.
3. Reciprocal Transformation (1/X): Dividing 1 by each score reduces large values. BUT remember that this effectively reverses valence, so scores above the mean flip to below the mean, and vice versa. Fix: first do a preliminary transform, changing each score to the highest score minus the target score. Do it all at the same time with 1/(X_highest - X).
4. Correcting negative skew: All the steps above work on negative skew, but you must first reverse the scores: subtract each score from the highest score. Then re-reverse back to the original scale after the transform is completed.

Comparing Two Means: Dependent and Independent T-Tests (Class 14)

Generating Anxiety (Photos vs. Reality): Within-Subjects and Between-Subjects Designs
Problem Statement: Are people as aroused by photos of threatening things as by the physical presence of threatening things?
Hypothesis: Physical presence will arouse more anxiety than pictures.
Experimental Hypothesis: Seeing a real tarantula will arouse more anxiety than spider photos will.

WITHIN-SUBJECTS DESIGN
1. All subjects see both the spider pictures and the real tarantula.
2. Counterbalance the order of presentation. Why?
3. DV: Anxiety after the picture and after the real tarantula.

Data (from spiderRM.sav): Subject | Picture (anxiety score) | Real Tarantula (anxiety score)
[per-subject values not recoverable from transcript]

Results: Anxiety Due to Pictures vs. Real Tarantula
Do the means LOOK different? Yes. Are they SIGNIFICANTLY DIFFERENT? We need a t-test.

WHY MUST WE LEARN FORMULAS?
Don't computers make stat formulas unnecessary?
1. SPSS conducts most computations, error free.
2. In the old days, a team of 3-4 would work all night to complete a statistic that SPSS does in .05 seconds.
Fundamental formulas explain the logic of stats:
1. They give you more conceptual control over your work.
2. They give you more integrity as a researcher.
3. They make you more comfortable in psych forums.
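The four transformations described above can be sketched with NumPy. The sample scores are illustrative (not from the lecture data sets), and the "+1" constants are one way to apply the slide's "add a constant" fix so that the highest score does not produce log(0) or 1/0:

```python
import numpy as np

# Illustrative positively skewed scores (not from the lecture data sets).
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 20.0])

# 1. Log transform: requires all values > 0; if not, add a constant first.
shift = 1.0 - x.min() if x.min() <= 0 else 0.0
x_log = np.log(x + shift)

# 2. Square-root transform: shrinks large values more than small ones.
x_sqrt = np.sqrt(x)

# 3. Reciprocal transform: 1/X alone reverses valence, so first reverse the
#    scores via (highest - X); the +1 is an illustrative tweak to avoid 1/0.
x_recip = 1.0 / (x.max() - x + 1.0)

# 4. Negative skew: reverse the scores, transform, then re-reverse afterward.
x_rev = x.max() + 1.0 - x   # reversal; +1 keeps every value > 0 for the log
x_rev_log = np.log(x_rev)   # would be re-reversed back to the original scale
```

Note that step 3 preserves the original ordering of the scores: the largest raw score maps to the largest transformed score.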
BUT KNOWING THE FORMULA IS NOT ENOUGH: THE "TODDLER FORMULA"
Point: Knowing a formula without understanding the concepts leads to an impoverished understanding.

Logic of Testing the Null Hypothesis
Inferential stats test the null hypothesis ("null hyp."): the test asks whether the data are consistent with the null hyp. being true.
In a WITHIN-GROUPS t-test (AKA "dependent" t-test), the null hyp. is that responses in Cond. A and in Cond. B come from the same population of responses. Null hyp.: Cond. A and Cond. B DON'T differ.
In a BETWEEN-GROUPS t-test (AKA "independent" t-test), the null hyp. is that responses from Group A and from Group B DON'T differ.
If the test does not confirm the null hyp., then we must accept the ALT. HYP.
Alt. hyp., within groups: Cond. A differs from Cond. B.
Alt. hyp., between groups: Group A differs from Group B.

Null Hyp. and Alt. Hyp. in the Pictures vs. Reality Study
Within-groups design: Cond. A (all subjects see photos), then Cond. B (all subjects see the actual tarantula).
Null hyp.? No difference in anxiety ratings between seeing photos (Cond. A) and seeing the real tarantula (Cond. B).
Alt. hyp.? There is a difference between seeing photos (Cond. A) and seeing the real tarantula (Cond. B).

T-Test as a Measure of the Difference Between Two Means
1. Two data samples: do the means of the samples differ significantly?
2. Do the samples represent the same underlying population (null hyp.: small diffs) or two distinct populations (alt. hyp.: big diffs)?
3. Compare the diff. between sample means to the diff. we'd expect if the null hyp. is true.
4. Use the Standard Error (SE) to gauge variability between means.
   a. If SE is small and the null hyp. is true, mean diffs. should be smaller.
   b. If SE is big and the null hyp. is true, mean diffs. should be larger.
5. If the sample means differ much more than the SE, then either:
   a. The diff. reflects an improbable but true random difference within one true population, or
   b. The diff. indicates that the samples reflect two distinct true populations.
6. Larger diffs. between sample means, relative to SE, support the alt. hyp.
7. All these points apply to both dependent and independent t-tests.

Logic of the T-Test

t = [(observed difference between sample means) - (expected difference between population means, if the null hyp. is true)] / (SE of the difference between sample means)

Note: The logic is the same for dependent and independent t-tests; however, the specific formulas differ.
If the mean difference relative to SE (overlap) is small: null hyp. supported. If it is large: alternative hyp. supported.

S_D: The Standard Error of Differences Between Means
Sampling distribution: the spread of many sample means around a true mean.
SE: the average amount that sample means vary around the true mean. SE = std. deviation of sample means.
Formula for SE: SE = s/√n, when n > 30. If sample N > 30, the sampling distribution should be normal, and the mean of the sampling distribution = the true mean.
S_D = the average amount the Var. 1 mean differs from the Var. 2 mean in Sample 1, then in Sample 2, then in Sample 3, ... then in Sample N.
Note: S_D is computed differently in between-subjects designs.

Study-level illustration: for Studies 1-4, D = (tarantula mean - picture mean); average D = 2.25. [Per-study values not recoverable from transcript.]

Subject-level computation (per-subject values not recoverable from transcript):
X̄_Tarantula = 5.00, X̄_Picture = 2.75, D̄ = 2.25, Σ(D - D̄)² = .77
S_D² = Σ(D - D̄)² / (N - 1) = .77 / 3 = .26 [VARIANCE OF DIFFS]
S_D = √(S_D²) = √.26 = .51 [STD. DEV. OF DIFFS]
SE of D̄ = σ_D̄ = S_D/√N = .51/√4 = .51/2 = .255 [STD. ERROR OF DIFFS]
t = D̄ / SE of D̄ = 2.25 / .255 = 8.82
The difference between means is 8.82 times greater than the error.
Note: D̄ is the average diff. between the tarantula mean and the picture mean.

If the null hyp. is true, does a small S_D indicate that the average difference between pairs of variable means should be large or small? Small.
Will a small S_D therefore increase or decrease our chance of confirming the experimental prediction? Increase it.

Understanding S_D and Experimental Power
Power of an experiment: the ability of the experiment to detect actual differences.
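The S_D computation above can be sketched in NumPy. The difference scores below are hypothetical (the slide's per-subject values were lost in the transcript), chosen only so that D̄ = 2.25 matches the slide; the intermediate variance, SE, and t therefore differ from the slide's .26, .255, and 8.82:

```python
import numpy as np

# Hypothetical difference scores D = (real-tarantula anxiety - picture anxiety).
d = np.array([2.0, 3.0, 1.5, 2.5])

d_bar = d.mean()                  # average difference (experimental effect)
ss = ((d - d_bar) ** 2).sum()     # sum of squared deviations, sigma(D - Dbar)^2
s2_d = ss / (len(d) - 1)          # variance of the differences
s_d = np.sqrt(s2_d)               # standard deviation of the differences
se_d = s_d / np.sqrt(len(d))      # standard error of the mean difference
t = (d_bar - 0.0) / se_d          # null hypothesis: population mean diff = 0
```

The t statistic is the mean difference expressed in units of its standard error, exactly as in the slide's D̄ / SE of D̄.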
Assumptions of the Dependent T-Test
1. Samples are normally distributed.
2. Data are measured at the interval level (not ordinal or categorical).

Conceptual Formula for the Dependent-Samples (Within-Subjects) T-Test

t = (D̄ - μ_D) / (s_D/√N)

D̄ = average difference between the Var. 1 mean and the Var. 2 mean. It represents systematic variation, aka the experimental effect.
μ_D = expected difference in the true population = 0. It represents random variation, aka the null effect.
s_D/√N = standard error of mean differences: the estimated standard error of differences between all potential sample means. It represents the likely random variation between means.
t = experimental effect / random variation.

The within-subjects t-test tests the difference between the obtained difference between means and the null difference between means: it tests a difference between differences!
Mean and SD of the obtained diff. (picture vs. real tarantula) vs. mean and SD of the null effect.
If the diff. between means (effect) is less than the shared variance (error): null supported. If the diff. between means (effect) is more than the shared variance (error): alternative supported.

Dependent (Within-Subjects) T-Test SPSS Output
t = expt. effect / error = X̄_D / SE = -7 / 2.83 = -2.47
SE = SD/√n: 2.83 = SD/√12
Note: Mean = mean diff. (picture anxiety - real anxiety) = -7.

Independent (Between-Subjects) T-Test
1. Subjects see either the spider pictures OR the real tarantula.
2. Counterbalancing is less critical (but still important). Why?
3. DV: Anxiety after the picture OR after the real tarantula.

Data (from spiderBG.sav): Subject | Condition | Anxiety
[per-subject values not recoverable from transcript]

Assumptions of the Independent T-Test
LIKE THE DEPENDENT T-TEST:
1. Samples are normally distributed.
2. Data are measured at least at the interval level (not ordinal or categorical).
INDEPENDENT T-TESTS ALSO ASSUME:
3. Homogeneity of variance.
4. Scores are independent (because they come from different people).

Logic of the Independent-Samples T-Test (Same as the Dependent T-Test)
t = [(observed difference between sample means) - (expected difference between population means, if the null hyp. is true)] / (SE of the difference between sample means)
Note: The SE of the difference between sample means in the independent t-test differs from the SE in the dependent-samples t-test.

Conceptual Formula for the Independent-Samples T-Test

t = [(X̄1 - X̄2) - (μ1 - μ2)] / est. SE

(X̄1 - X̄2) = difference between samples. It represents systematic variation, aka the experimental effect.
(μ1 - μ2) = expected difference in the true populations = 0. It represents random variation, aka the null effect.
Est. SE = estimated standard error of differences between all potential sample means. It represents the likely random variation between means.
t = experimental effect / random variation.

Computational Formulas for Independent-Samples T-Tests
When N1 = N2:
t = (X̄1 - X̄2) / √(s1²/n1 + s2²/n2)
When N1 ≠ N2, use the pooled variance (a weighted average of each group's variance):
s_p² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
t = (X̄1 - X̄2) / √(s_p²/n1 + s_p²/n2)

Independent (Between-Subjects) T-Test SPSS Output
t = expt. effect / error = (X̄1 - X̄2) / SE = -7 / 4.16 = -1.68
Note: The CI crosses 0.
Compare with the dependent t-test output: t = -7 / 2.83 = -2.47, and the CI does not cross 0.

The Dependent T-Test is significant; the Independent T-Test is not. A Tale of Two Variances:
Dependent T-Test SE = 2.83; Independent T-Test SE = 4.16.
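The pooled-variance computation above can be sketched as follows. The group scores are hypothetical (the spiderBG.sav values are not in the transcript), so the resulting t differs from the SPSS output quoted above; the formulas mirror the slide's s_p² and t for equal group sizes:

```python
import numpy as np

# Hypothetical anxiety scores for two separate groups (between-subjects design).
picture = np.array([30.0, 35.0, 45.0, 40.0, 50.0, 35.0])
real    = np.array([40.0, 55.0, 50.0, 45.0, 60.0, 55.0])

n1, n2 = len(picture), len(real)
s1_sq = picture.var(ddof=1)   # sample variance of group 1
s2_sq = real.var(ddof=1)      # sample variance of group 2

# Pooled variance: weighted average of the two group variances.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Standard error of the difference between independent group means.
se = np.sqrt(sp_sq / n1 + sp_sq / n2)

t = (picture.mean() - real.mean()) / se   # expected diff under the null = 0
```

Because the two groups contain different people, the subject-to-subject variability stays in the error term, which is why the independent-samples SE tends to be larger than the dependent-samples SE computed from difference scores.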