CS130 – Software Tools
Fall 2010
Statistics and PASW Wrap-up
T-Test
Testing the difference between the means of two samples
If those samples are taken from the same population, you would expect their means to be largely equal
In words, this simple test checks whether the means observed in the two samples are equivalent to the means we would EXPECT from the two samples
“Equivalent” here means within the standard error, the amount of difference you might expect between any two samples
Source: geography.dur.ac.uk
Remember: the t-test assumes the data is taken from a normally distributed population
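As a rough sketch of the arithmetic behind this comparison (not PASW's own code), the t statistic for two independent samples can be worked out by hand in Python. The function name `independent_t` and the scores are hypothetical, and the sketch assumes the two groups have equal variances (the pooled form).

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    # Standard error of the difference between the two means.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Hypothetical scores for two groups (illustration only).
group_a = [1, 2, 3]
group_b = [2, 3, 4]
t_stat, df = independent_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

The further t is from zero, the larger the difference between the means relative to the standard error. PASW then converts t and df into the Sig. (2-tailed) value.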
The key concept here is that PASW tells you whether the difference between the means of the two conditions or groups is large enough that it is unlikely to be due to chance
Types of t-Tests
All t-tests have the principle of comparison of means as their basis
This explains why the PASW menu item for all t-tests is called Compare Means
There are several variants of t-tests, as you have already learned: independent, paired (or dependent), and one-sample
There are also several “assumption” tests that check whether the sample data is suitable for a parametric test such as a t-test, e.g. Levene’s Test, which evaluates equality of variances; we used it for our independent t-test
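To see how the variants relate, here is a minimal Python sketch of the one-sample and paired t statistics. The function names and the before/after scores are hypothetical. The point is that a paired t-test is simply a one-sample t-test on the per-subject differences, tested against a mean of zero.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, expected_mean):
    """t statistic comparing a sample mean with a fixed expected value."""
    std_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - expected_mean) / std_error


def paired_t(after, before):
    """t statistic for paired measurements taken on the same subjects."""
    differences = [a - b for a, b in zip(after, before)]
    # A paired test asks whether the mean difference departs from zero.
    return one_sample_t(differences, 0)


# Hypothetical before/after scores for three subjects (illustration only).
before = [1, 2, 3]
after = [2, 4, 6]
print(f"one-sample t = {one_sample_t(before, 1):.3f}")
print(f"paired t     = {paired_t(after, before):.3f}")
```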
Speaking of P-Values
You were introduced to P-values, or Sig. (2-tailed), as a method for determining when you can reject or fail to reject the null hypothesis
However, before we wrap up the course, you should be aware of their general-purpose nature
P-values are compared against a threshold sometimes called α (alpha)
We have been using 0.05
It is important to note that the design of the study determines the alpha. We have been using 0.05 because it is common, but it can be another value depending on what you are trying to do
The smaller the p-value, the more evidence there is against the hypothesis (in this case our null hypothesis)
If you want an even stronger case for rejecting, you could insist on a threshold of 0.01, so that a result this extreme would turn up by chance less than 1% of the time if the null hypothesis were true
However… all p-values pertain to the probability that the means of the data would differ this much by chance
A p-value has nothing to do with, nor does it know anything about, the nature of your hypothesis
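The role of the threshold can be sketched in a few lines of Python. The `decide` helper and the p-value of 0.03 are hypothetical. The same result is significant at an alpha of 0.05 but not at the stricter 0.01.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value against the threshold alpha chosen for the study."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical Sig. (2-tailed) value, as it might be read from an output table.
p = 0.03
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p, alpha)}")
```

Note that the function only compares two numbers. Nothing in it knows what the hypothesis is about, which is why the interpretation has to come from the study design.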
The Prosecutor’s Fallacy (Shaughnessy and Chance, 2005)
The fallacious reading: “The p-value is .001. This means that the chance is only 1 in 1000 that the null hypothesis is true”
It is the data in the sample that carries the probability, not the interpretation
That variable data is then interpreted within the context of the hypothesis
The hypothesis is a statement of how we might see the data based on the samples that we have collected
A classic example
You take 1 random coin out of your bank
You want to test the fairness of this one coin
You flip it 10 times in a row and you get heads every time
Null Hypothesis: The coin is fair and it flips honestly and independently
Observed data: In 10 tries all are heads
Now calculate the p-value
P(10H in 10) = P(H) x P(H) x … x P(H) = (1/2)^10 ≈ .001
This is strong evidence that the null hypothesis can be rejected
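The coin calculation can be checked with a short Python sketch. The helper name `p_at_least` is hypothetical. It adds up the binomial probabilities of getting a given number of heads or more, assuming the null hypothesis of a fair coin and independent flips.

```python
from math import comb


def p_at_least(heads, flips, p_heads=0.5):
    """Probability of `heads` or more heads in `flips` independent flips."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# 10 heads in 10 flips of a fair coin: (1/2)^10, roughly 0.001.
print(p_at_least(10, 10))

# A slightly less extreme result, 9 or more heads, is still very unlikely.
print(p_at_least(9, 10))
```

This is the one-sided calculation used in the example. If a run of 10 tails would count as equally suspicious, the two-sided probability would be twice as large.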
Introduction to Analysis of Variance
And finally, a brief introduction to another major family of statistical tests that compare an attribute of a variable. This time we will look at the variance, not the mean
This is ANOVA, or Analysis of Variance
It’s here that we answer the age-old question (at least a 7-week-course-old question)
What happens if I want to compare several independent variables to see how they interact with each other?
As with t-tests, there are many kinds of ANOVA methods: Factorial ANOVA, MANOVA, ANCOVA, and so on.
For this intro, we will just look at what you need to know to decide whether you should invest time in understanding this method
The simplest ANOVA, for example, might compare the effects of caffeine on learning by using a placebo (Decaf…wow, that is mean) and a specific level of caffeinated beverage
How about adding more groups as independent variables, though? For example, the effect of caffeine and weight on learning, with the control being a placebo. Now you start to leave the domain of a t-test
Analysis of Variance is just what it says: it looks at the total variance of the data, then compares the variance of data within each group against the variance of data across the groups (in our case caffeine, placebo, and weight as independent variables, and maybe test score as the indicator of learning)
A few terms to remember… ANOVA uses the F-ratio to compare the variances: the variance between groups divided by the variance within groups.
A high F-ratio means that there is more “planned” variance than “unplanned” variance, or error
And again, it has a Significance value, just like our t-tests
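As a hedged illustration of the F-ratio for the simplest case (a one-way ANOVA, not the factorial designs), the Python sketch below divides the between-group variance by the within-group variance. The function name and the learning scores for the three groups are hypothetical.

```python
from statistics import mean


def one_way_f_ratio(groups):
    """F-ratio for a one-way ANOVA: between-group over within-group variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # "Planned" variance: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # "Unplanned" variance (error): spread of the scores inside each group.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical learning scores (illustration only).
placebo = [1, 2, 3]
low_caffeine = [2, 3, 4]
high_caffeine = [3, 4, 5]
print(f"F = {one_way_f_ratio([placebo, low_caffeine, high_caffeine]):.2f}")
```

When the group means are identical the numerator is zero and so is F. The further apart the group means are relative to the spread inside each group, the larger F becomes.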
One example to consider: I have created a research question… I am interested to see if job satisfaction and gender have any influence on what type of car a person might buy
My two independent factors, or variables, are job satisfaction and gender; my dependent variable is car category
My null hypothesis is that there is no significant relationship between the type of car I buy and my relative job satisfaction and gender
Of course, in PASW there is no menu pick for this factor-based ANOVA; they call it the General Linear Model (GLM) with Univariate. Of course!!
Or I could use a One-Way ANOVA, which is found under Compare Means, but that does not allow for two independent variables
My data was given to me in the form of a .sav file
The results show that, in fact, there is a high degree of “similarity” in the variance between the groups of independent variables
I see this in the F-ratios. I also see a very low Sig. for all of them for car category, which means there is a very low probability that the variance in the data is due to chance
Therefore, I can reject my null hypothesis and say that there is a statistically significant relationship between my gender, job satisfaction and the type of car I might purchase.
One final note on the introduction: this is meant to give you an additional pathway to investigate when you have a statistical project and maybe the design of experiment is slightly more complex
You will need a fair amount of study to understand the details and proper use of ANOVA and its variants (no pun intended there)
CS130 Conclusion
So, this concludes our CS130 section for the Fall.
You have covered a myriad of topics and tools:
Excel
Equation Editor
Word – Templates, Styles, Merge
PowerPoint – Presenting and Information Visualization (Tufte, Klass)
PASW and Statistics
All in the context of Academic Research and Design of Experiments
You should feel armed and ready to take on interesting scholarly questions and present your important work