Stats Workshop 2010
MCT Mathematics & Statistics
Paul Garthwaite
http://statistics.open.ac.uk/advisory.html
Introduction to Statistical Analysis
The Scientific Method
• Deductive reasoning:
– from the general to the specific ("top-down" approach)
Theory: In a pig’s digestive system, all phosphate ions are the same, regardless of what they
were bound with.
Theory: If you are a diabetic, losing weight will help you live longer.
Study Design (deductive reasoning)
Hypothesis testing is like a court of law: You aim to disprove the null hypothesis.
The hypothesis of a court: The person in the dock is innocent.
The aim is to gather evidence that is inconsistent with this hypothesis. We reject the hypothesis (and decide the person is guilty) if the evidence makes the hypothesis unlikely (beyond all reasonable doubt).
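The court analogy can be made concrete with a permutation test. The sketch below assumes the null hypothesis ("the two groups do not differ") and asks how often randomly shuffled group labels give a difference at least as large as the one observed. The bone-strength figures and the function name `permutation_p_value` are invented for illustration and do not come from the workshop.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the share of label shufflings whose absolute mean difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Hypothetical bone-strength scores for pigs fed two phosphate compounds.
compound_1 = [41.2, 39.8, 43.1, 40.5, 42.0, 38.9]
compound_2 = [44.9, 46.3, 43.8, 47.1, 45.2, 44.0]

p_value = permutation_p_value(compound_1, compound_2)
print(f"p-value = {p_value:.4f}")
```

A small p-value plays the role of evidence "beyond reasonable doubt": data this extreme would be rare if the null hypothesis were true.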
Inductive Reasoning
• From a set of specific observations to broader generalizations and theories ("bottom-up" approach)
Observational Study (inductive reasoning)
Observational studies could feed into inductive reasoning.
Pilot studies have a place in forming hypotheses.
Some disciplines (e.g. psychology) seem to disapprove of observational studies. Presumably such studies are written up as if the hypotheses were decided before gathering the data. (A dangerous practice!)
Statistical Design
• A study can be:
– Observational: analyse existing data (inductive)
– Experimental: produce new data (deductive)
• Relies on random sampling
– Obtain information about the whole from analysing the part (inferential statistics)
• Experimental design:
– randomly allocates conditions/treatments to subjects to observe their response
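Random allocation of treatments to subjects can be sketched in a few lines. This is a minimal illustration only: the function name `randomly_allocate`, the pig identifiers and the compound labels are all hypothetical.

```python
import random


def randomly_allocate(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn.

    Group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    allocation = {treatment: [] for treatment in treatments}
    for index, subject in enumerate(shuffled):
        treatment = treatments[index % len(treatments)]
        allocation[treatment].append(subject)
    return allocation


# Hypothetical example: 12 pigs shared among three phosphate compounds.
pigs = [f"pig_{i:02d}" for i in range(1, 13)]
compounds = ["compound_A", "compound_B", "compound_C"]
allocation = randomly_allocate(pigs, compounds, seed=2010)
for compound, group in allocation.items():
    print(compound, sorted(group))
```

Fixing the seed makes the allocation reproducible, which helps when the design has to be documented later.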
Warning
Poor designs can lead to:
• Inefficient use of collected data
• Difficult statistical analysis
• Inability to draw meaningful conclusions
Use Common Sense
• Think about questions your research might answer.
• Can you gather data related to those questions?
• Using common sense, would the data answer those questions?
Pigs and phosphates: feed pigs different phosphate compounds and see if their bone strengths differ?
Diabetes and diet: use patient notes to get age at death, age at diagnosis, and weight loss in first year after diagnosis.
• In many ways, statistics just makes common sense rigorous.
• Think about what covariates may be relevant and try to measure them (gender and age in many social contexts; smoking in medical studies; etc.)
• Try to reduce random variation.
Gather lots of data
• A decent experiment will generally form about a quarter of a PhD (perhaps more) – four papers are enough for a PhD in most disciplines.
• Designing an experiment, collecting data, analysing it, writing a paper, revising the paper, and so on, will take several months.
• People typically do not spend enough time gathering data. The data drives the conclusions you can reach.
More data = Firmer conclusions
How much data? (My rules of thumb.)
• In a controlled experiment where the quantity of interest is a measurement, forty or so independent observations will typically enable modest-sized differences to be identified.
• With observational data and questionnaire data, gathering 150 observations or more should typically be the aim: you want 25 observations in each category of interest.
• More data is needed with counts than measurements.
• More data is needed with binary quantities (yes/no; cured/not cured; success/failure) than with Likert scores.
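These rules of thumb can be compared with the standard normal-approximation sample-size formulas for comparing two groups. The sketch below is a rough check, not a replacement for proper design advice. The 5% two-sided significance level, the 80% power, and the example effect sizes are assumptions chosen for illustration; none of them come from the slides.

```python
import math
from statistics import NormalDist


def n_per_group_means(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a difference in means."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_total * sd / difference) ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect p1 versus p2."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z_total ** 2 * variance / (p1 - p2) ** 2)


# Measurement: a difference of 0.9 standard deviations between two groups.
print(n_per_group_means(difference=0.9, sd=1.0))

# Binary outcome: a success rate of 50% in one group versus 70% in the other.
print(n_per_group_proportions(0.5, 0.7))
```

Under these assumptions the measurement example needs about 20 subjects per group (forty in total), while the binary example needs about 91 per group, which is in line with the remark that binary quantities need more data.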
Questionnaires
Likert scales are good:
strongly agree / weakly agree / indifferent / disagree / strongly disagree.
Having five points on a Likert scale is often about right. Code the values as 1, 2, 3, 4, 5 and it is usually OK to treat them as measurements.
Open-ended questions are hard to analyse.
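Coding a five-point Likert scale as 1 to 5 might look like the sketch below. The direction of the coding (1 for "strongly disagree", 5 for "strongly agree"), the exact label spellings, and the sample answers are assumptions made for illustration.

```python
from statistics import mean, stdev

# Assumed coding: higher numbers mean stronger agreement.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "indifferent": 3,
    "weakly agree": 4,
    "strongly agree": 5,
}


def code_responses(responses):
    """Map Likert labels to the integers 1-5, ignoring case and padding."""
    return [LIKERT_CODES[r.strip().lower()] for r in responses]


# Hypothetical answers to one questionnaire item.
answers = ["Weakly agree", "strongly agree", "indifferent",
           "weakly agree", "Disagree", "weakly agree"]
scores = code_responses(answers)
print(scores, round(mean(scores), 2), round(stdev(scores), 2))
```

Once coded, the scores can be summarised with means and standard deviations like any other measurement.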
Statistical Data Analysis
• Turning data into information: First produce summary statistics (means, percentages, standard deviations), graphs, bar-charts, cross-tabulations.
• Try to get a feel for your data – what does it tell you? (If you feel you are non-numerate, work at becoming numerate.)
• Try to form quantitative hypotheses that you think the data will refute. (e.g. “The proportions in the ‘strongly agree’ category are the same in these two sub-populations” or “As this quantity changes, the average value of this other quantity does not change”.)
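The first example hypothesis above (equal proportions of "strongly agree" in two sub-populations) can be tested with a two-proportion z-test. The sketch below uses the usual pooled normal approximation, which is reasonable only when the counts are not too small. The counts and the function name `two_proportion_z_test` are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test of equal proportions, using the pooled estimate."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 80 versus 18 of 90 answered "strongly agree".
z, p_value = two_proportion_z_test(30, 80, 18, 90)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value refutes the hypothesis that the two proportions are the same, which is exactly the kind of refutable statement the slide recommends forming.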
Common fundamental statistical methods
• t-tests
• Comparison of proportions
• Contingency tables
• Regression
• Analysis of variance
It is worth knowing when these are useful.
Regression
• In many ways regression is the most useful statistical method.
• It lets you test whether one variable affects another (while controlling for other covariates if necessary).
• It also describes the relationship.
• Stepwise methods help you find/test which variables are important.
• Generalised linear models add flexibility.
$$\text{survival time} = a + b\,(\text{weight change}) + c\,\text{age} + d\,\text{gender} + e\,\text{BMI} + f\,\text{IHD} + g\,(\text{blood pressure})$$
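A regression of this kind is fitted by least squares. The sketch below solves the normal equations in plain Python for a cut-down version of the model with only two covariates (weight change and age). The data are invented and noise-free, built so that the fit should recover the coefficients used to generate them. In practice a statistical package would do this and would also report standard errors and tests, which this sketch does not.

```python
def fit_linear_model(rows, responses):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Each row lists the covariates for one subject; an intercept is added.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Build the augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, responses))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical subjects: (weight change in kg, age at diagnosis in years).
subjects = [(-5.0, 52.0), (0.0, 61.0), (3.0, 58.0), (-2.0, 70.0),
            (4.0, 66.0), (1.0, 47.0), (-6.0, 63.0), (2.0, 55.0)]
# Invented, noise-free survival times: 30 - 0.8 * weight_change - 0.3 * age.
survival = [30.0 - 0.8 * w - 0.3 * a for w, a in subjects]

a, b, c = fit_linear_model(subjects, survival)
print(f"intercept = {a:.3f}, weight change = {b:.3f}, age = {c:.3f}")
```

The coefficient on weight change is then read as the change in survival time per unit of weight change, holding age fixed.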
• There is an advisory service that can help on:
– Designing an experiment
– How to approach the analysis of data
– Choosing appropriate techniques
– Interpreting results
– Understanding outputs from statistical packages
• Too few people ask for advice before gathering data.
Statistical Software
• Packages are only tools (‘number crunchers’)
Most important is to choose an adequate method for your problem
Remember:
Garbage in, garbage out
Some Statistical Packages
• General software (e.g. spreadsheets)
• Specialised:
– Genstat, Minitab, SAS, Statistica
– SPSS
• wide range of statistical procedures
• good graphical capability
• fairly easy to use (menu-driven option)
• good help facility with case studies
Statistics Courses
• M248: Analysing Data
– Exploratory data analysis. Models for data. Estimation. Confidence intervals. Hypothesis testing. Regression and two-variable problems. (Minitab)
• M249: Practical Modern Statistics
– Medical statistics. Time series analysis. Multivariate statistics. Bayesian methods.
– Focus on applications: SPSS and WinBUGS.
Statistics Courses
• M343: Applications of Probability
– Models to describe patterns in time and space. Epidemiological models. Genetics and stock market price applications.
• M346: Linear Statistical Modelling
– ANOVA. Design of experiments. Linear regression. Generalized linear models. Diagnostic checking. Log-linear models. (GenStat)
The Stats-Advisory Service
• Drop-in sessions
– Mondays: 2:00 – 4:00 (M216)
– Thursdays: 10:30 – 12:20 (M214)
(Both in Maths and Computing Building)
• Web:
– http://statistics.open.ac.uk/advisory.html
• E-mail: