CS130 – Software Tools


Fall 2010: Statistics and PASW Wrap-up

T-Test

Testing the difference between the means of two samples

If both samples are taken from the same population, you would expect their means to be roughly equal

In words, this simple test checks whether the difference between the means observed in the two samples is consistent with the difference we would EXPECT between two samples from the same population

The comparison is made relative to the standard error, i.e. the amount of variation you might expect between any two samples

Source: geography.dur.ac.uk

Remember – the t-test assumes the data is taken from a normally distributed population

T-Test

The key concept here is that PASW tells you whether the difference between the means of the two conditions or groups is large enough that it is unlikely to be due to chance
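As a rough illustration of what sits behind that output, the sketch below computes the t statistic for two independent samples using only the Python standard library. It assumes equal variances (the pooled, Student's form), which is not necessarily how PASW was configured in class. The function name `independent_t` and the scores are made up for illustration.

```python
import math
import statistics


def independent_t(sample_a, sample_b):
    """Pooled-variance (Student's) t statistic for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical scores for two groups (illustration data only)
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10]
t_value, df = independent_t(group_1, group_2)
print(f"t = {t_value:.3f}, df = {df}")
```

The further the t value is from zero, relative to the degrees of freedom, the smaller the Sig. value a statistics package would report.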

Types of t-Tests

All t-tests have the principle of comparison of means as their basis

This explains why, in PASW, the menu item for all t-tests is called Compare Means

There are several variants of t-tests, as you have already learned

Independent, Paired (or Dependent), One-sample

There are also several “assumption” tests that check whether the sample data is suitable for a parametric test such as a t-test, e.g. Levene’s Test, which evaluates equality of variances. We used this for our independent t-test
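To show how the variants relate, here is a minimal standard-library sketch of the one-sample and paired forms. It treats a paired test as a one-sample test on the per-subject differences against an expected mean of 0. The function names and the before/after values are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, expected_mean):
    """t statistic comparing a sample mean against a fixed expected mean."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - expected_mean) / std_error, n - 1


def paired_t(before, after):
    """Paired t statistic: a one-sample test on the per-subject differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0)


# Hypothetical before/after measurements on the same four subjects
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]
t_paired, df_paired = paired_t(before, after)
print(f"paired t = {t_paired:.3f}, df = {df_paired}")
```

The independent form differs because the two samples are not matched subject by subject, so it has to combine two separate variances rather than work with one set of differences.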

Speaking of P-Values

You were introduced to P-values, or Sig. (2-tailed), as a method for determining when you can reject or fail to reject the null hypothesis

However, before we wrap up the course, you should be aware of its general purpose nature

P-values are compared against a threshold, sometimes called α (alpha)

We have been using 0.05


It is important to note that the design of the study controls the alpha. We have been using 0.05 because it is common, but it can be a different value based on what you are trying to do

The smaller the p-value the more evidence there is against the hypothesis (in this case our null hypothesis)

If you want an even stronger case for rejecting, you could insist on a threshold of 0.01, i.e. accept only a 1% chance of seeing a result this extreme when the null hypothesis is true

However… all p-values pertain to the probability that a difference between the means this large would arise by chance

A p-value has nothing to do with, and knows nothing about, the nature of your hypothesis


The Prosecutor’s Fallacy – (Shaughnessy and Chance – 2005)

“The p-value is .001. This means that the chance is only 1 in 1000 that the null hypothesis is true”

It is the data in the sample that contains the probability, not the interpretation

Then that variable data is interpreted within the context of the hypothesis

The hypothesis is a statement of how we might see the data based on the samples that we have collected

A classic example

You take 1 random coin out of your bank

You want to test the fairness of this one coin

You flip it 10 times in a row and you get heads every time

Null Hypothesis: The coin is fair and it flips honestly and independently

Observed data: In 10 tries all are heads

Now calculate the p-value

$P(10\text{H in }10) = P(H) \times P(H) \times \cdots \times P(H) = (1/2)^{10} = 1/1024 \approx .001$

This is strong evidence that the null hypothesis can be rejected
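The coin calculation can be checked with a short standard-library sketch. It computes the one-sided probability of getting at least a given number of heads from a fair coin, then compares the result with the two alpha thresholds mentioned earlier. The function name `prob_at_least_k_heads` is made up for illustration, and a two-sided test (counting 10 tails as equally extreme) would double the value.

```python
from math import comb


def prob_at_least_k_heads(n_flips, k_heads, p_heads=0.5):
    """One-sided binomial probability of k_heads or more in n_flips."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(k_heads, n_flips + 1)
    )


# Null hypothesis: the coin is fair (p_heads = 0.5) and flips are independent
p_value = prob_at_least_k_heads(10, 10)
print(f"p-value = {p_value:.6f}")

for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

Note that the p-value is computed entirely from the data under the assumption that the null hypothesis is true. It is not the probability that the coin is fair.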

Introduction to Analysis of Variance

And finally, a brief introduction to another major family of statistical tests that compares an attribute of a variable – this time we will look at the variance, not the mean

This is ANOVA, or Analysis of Variance

It's here that we answer the age-old question (at least a 7-week-course-old question)

What happens if I want to compare several independent variables to see how they interact with each other?


Like a t-test, there are many kinds of ANOVA methods – Factorial ANOVA, MANOVA, ANCOVA, and so on.

For this intro, we will just look at what you need to know to understand if you should consider investing time in understanding this method

The simplest ANOVA, for example, might compare the effects of caffeine on learning by using a placebo (Decaf…wow, that is mean) and a specific level of caffeinated beverage


How about adding more groups as independent variables, though? For example, the effect of caffeine and weight on learning, with the control being a placebo. Now you start to leave the domain of a t-test

Analysis of Variance is just what it says: a comparison of the total variance of the data, the variance of the data within each group, and the variance of the data across the groups (in our case caffeine, placebo and weight as independent variables, with test score perhaps as the indicator of learning)



A few terms to remember…ANOVA uses the F-ratio to compare the variances.

A high F-ratio means that there is more “planned” variance than “unplanned” variance, or error

And again it has a Significance value just like our t-tests
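For the simplest (one-way) case, the F-ratio can be sketched with the standard library as the between-group variance divided by the within-group variance. This is a minimal illustration, not the factorial model PASW fits under the General Linear Model. The function name and the three groups of scores are hypothetical.

```python
import statistics


def one_way_f_ratio(groups):
    """F-ratio for a one-way ANOVA: between-group over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # "Planned" variance: how far each group mean sits from the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # "Unplanned" variance (error): spread of values around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical test scores for three groups (illustration data only)
placebo = [1, 2, 3]
low_caffeine = [2, 3, 4]
high_caffeine = [5, 6, 7]
f_ratio, df_between, df_within = one_way_f_ratio(
    [placebo, low_caffeine, high_caffeine]
)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A statistics package turns the F-ratio and its two degrees of freedom into the Sig. value, in the same way a t value is turned into one for a t-test.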


One example to consider: I have created a research question…I am interested to see if job satisfaction and gender have any influence on what type of car a person might buy

My two independent factors, or variables, are job satisfaction and gender; my dependent variable is car category

My null hypothesis is that there is no significant relationship between the type of car I buy and my relative job satisfaction and gender


Of course, in PASW there is no menu pick for this factor-based ANOVA; they call it the General Linear Model (GLM), Univariate. Of course!!

Or I could use a One-Way ANOVA, which is found under Compare Means, but that does not allow for two independent variables

My data was given to me in the form of a .sav file



The results show that, in fact, there is a high degree of “similarity” in the variance between the groups of independent variables

I see this in the F-ratios. I also see a very low Sig. value in every case for car category, which means it is very unlikely that variance like this would arise by chance alone

Therefore, I can reject my null hypothesis and say that there is a statistically significant relationship between my gender, job satisfaction and the type of car I might purchase.


One final note on the introduction: this is meant to give you an additional pathway to investigate when you have a statistical project and the design of the experiment is slightly more complex

You will need a fair amount of study to understand the details and proper use of ANOVA and its variants (no pun intended there)

CS130 Conclusion

So, this concludes our CS130 section for the Fall.

You have covered a myriad of topics and tools

Excel

Equation Editor

Word – Templates, Styles, Merge

PowerPoint – Presenting and Information Visualization (Tufte, Klass)

PASW and Statistics

All in the context of Academic Research and Design of Experiments

You should feel armed and ready to take on interesting scholarly questions and present your important work
