Hss2381A – stats... or whatever Univariate Analysis, part 1 Descriptive Statistics.
Date posted: 19-Dec-2015
HSS2381A – Quantitative Methods in Health Sciences I
Professor Raywat Deonandan
[email protected]
43 Templeton, Room 111
Class website:
Eventually, all materials will be on Virtual Campus.
However, since the University I.T. Department is run by monkeys, for the time being I will be using my own server:
Classes.deonandan.com
Lectures
• Mondays 11:30-1pm
• Thursdays 1:00-2:30pm
• SMD224
• You are responsible for all material covered in the lectures, whether or not it appears in the slides
– i.e., take your own notes and don’t get lazy
Lectures
• There are many sections of HSS2381, and ultimately we try to cover the same thing
• But the different sections are not interchangeable
• And our exams and assignments will also be different
• i.e., feel free to study with students from other classes, but they may employ different textbooks, methods, etc.
TEXTBOOKS
Required:
Data-Analysis & Statistics for Nursing Research by Denise F. Polit, published by Appleton & Lange, Stamford, Connecticut, USA (second edition).
$73.98 + tax
Agora Books, 145 Besserer
Recommended (but not required):
An Introduction to Statistics for Canadian Social Scientists by Michael Haan, published by Oxford University Press
Evaluation:
Item                          Marks   Date
Data analysis assignment #1   15%     Oct 20
Data analysis assignment #2   15%     Nov 28
Midterm exam                  35%     Nov 3
Final exam                    35%     Dec 9-22
Total                         100%
Lectures
• Note that I do not take attendance
• Attendance in lectures and labs is voluntary (hey, you’re all grown-ups)
• But whether or not you attend, you are still responsible for what is covered in class and in labs
Labs
• Each group of 20 has a one-hour lab on Wednesday morning (MNT 140)
• Each lab will be supervised by Teaching Assistant Armin Yazdani ([email protected])
Labs
• The purpose of the labs is to:
– Introduce you to using computers to do basic statistics
– Give you protected time to work on your homework and assignments
– Allow you to approach the TA to go over anything that is unclear from the lectures
Labs
• In a few of the labs, the TA will have you do specific exercises
• In others, you will have free time to explore on your own
• Please be respectful of the TA and others, and not use the time for socializing or for activities unrelated to this class
Exams
• Both exams (midterm and final) will be entirely multiple choice
• The final exam will NOT be cumulative, but will only cover material since the midterm
Assignments
• There will be TWO assignments, to be completed individually (not in groups)
• They can be completed either by using a computer or by hand.
• Details about the assignments will be posted soon
Contacting Me
• Of course, I am willing and eager to speak to you about anything
• However, I’m pretty hard to get hold of at times
• So.....
Contacting Me
• For issues relating to the course, especially regarding course content or marking, please contact the TA first:
– Tiffany will be available via email and during office hours (to be posted soon)
– Armin will be available via email and during the lab time
Contacting Me
• I don’t maintain regular office hours, but I try my best to be in my office on Mondays from 2-4pm
• It’s best to email me for an appointment
Rules of Engagement
• I don’t require attendance
• But if you do come, please pay attention
Rules of Engagement
• I do not negotiate marks
• TAs are instructed not to change marks for any reason except when there has been a clear error in marking
• All suspected cases of academic fraud are reported to the Dean’s office
– This includes cheating on exams and collaborating on assignments (in cases where that has been prohibited)
Rules of Engagement
• The 2 exams and 2 assignments are the only ways to earn marks... This means no make-up assignments if you’re doing poorly
• The only acceptable excuses for missing an exam or for submitting a late assignment (without penalty) are:
– Medical (with documentation)
– Family or personal tragedy (with evidence)
• This means that the demands of your vacation plans, sporting events and part-time job are not acceptable reasons
As a Result...
• I won’t be here Thursday Sep 22
• You will receive a guest lecture that day from the TAs
• I will be here on Monday Nov 28
• However, the TAs will also be giving that day’s lecture
Let’s Review
• I hate statistics
• Anyone else?
And yet I have a PhD in Biostatistics.
So what does this tell us?
The Power of Statistics
• If you really understand statistics, then you really understand the fundamentals of modern scientific research
• Gives you a grounding to assess the quality of pretty much any quantitative statement
– Never be manipulated again!
The Origin of Statistics
• What we would call modern statistics began in the 1700s
• “statistics” = accounts of the “state”
• Obviously, the use of population data goes back centuries before
The Origin of Statistics
• There has been a revolution in the last 200 years or so...
– There was a further computer revolution in the past 50 years, which has allowed for rapid advances in multivariable techniques
• Statistics has become one of the foundations of all quantitative sciences
• It’s one of the defining tools of population health, especially epidemiology
What is statistics?
Statistics is the term for a collection of mathematical methods of organizing, summarizing, analyzing, and interpreting information gathered in a study
Math vs Statistics
• Is there a difference?
[Figure: Weights of members of 2000 U.S. Men’s Olympic Rowing team; axis: weight (lbs), with ticks at 120, 170 and 220]
Data vs Information
• 50, 52, 56
• The ages of Barack Obama, Stephen Harper and Nicolas Sarkozy
What is Measurement?
• One definition: assigning a quantity to a quality
– E.g. How old are you? 25
– E.g. What’s your gender? female
What is a Variable?
• Math: A value that may change within the scope of a problem or situation (vs a “constant”)
• Research: A logical set of attributes (gender, age, etc)
• Computers: A symbolic name given to an unknown quantity
What is a Variable?
• Math: “x”
• Research: Age
• Computers: A$
Relationships Between Variables
• In research, we can focus on just one variable
– E.g. What is the average age of students in this classroom?
• Or we can try to describe relationships between 2 or more variables
– E.g. In this classroom, is the average age of women different from the average age of men?
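The two classroom questions can be sketched in Python. The records below are invented purely for illustration (they are not real class data), and the names `students`, `average_age` and `average_age_by_sex` are hypothetical.

```python
from statistics import mean

# Hypothetical (age, sex) records for a small classroom; invented for illustration.
students = [
    (19, "F"), (21, "M"), (20, "F"), (23, "M"), (22, "F"), (20, "M"),
]

# One variable: the average age of everyone in the room.
all_ages = [age for age, _ in students]
average_age = mean(all_ages)

# Two variables (age and sex): the average age within each sex.
ages_by_sex = {}
for age, sex in students:
    ages_by_sex.setdefault(sex, []).append(age)
average_age_by_sex = {sex: mean(ages) for sex, ages in ages_by_sex.items()}

print("Average age overall:", average_age)
print("Average age by sex:", average_age_by_sex)
```

The first result describes a single variable; the second compares one variable across the levels of another, which is the simplest kind of relationship between variables.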
Relationships Between Variables
• In math, we write the relationship between 2 variables as a “function”:
e.g. F(x) = 210 - x
(Maybe this is the relationship between age and maximum attainable heart rate)
F(x) = max heart rate = HR
x = age
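As a small sketch, the example function can be written directly in Python. The function name `max_heart_rate` and the ages tried below are illustrative choices; the formula itself is just the slide's example, not a clinical rule.

```python
def max_heart_rate(age):
    """Return F(x) = 210 - x, where x is age in years (the slide's example)."""
    return 210 - age


# Evaluate the function at a few arbitrary ages.
for age in (20, 40, 60):
    print(age, max_heart_rate(age))
```

Each input (age) produces exactly one output (HR), which is what makes this a function.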
Relationships Between Variables
HR = 210 - x
Dependent variable: HR
Independent variable: x
Relationships Between Variables
HR = 210 - x
Epidemiology:
Outcome: HR
Exposure: x
Relationships Between Variables
Cancer rate = 210 - smoking
Epidemiology:
Outcome: cancer rate
Exposure: smoking
Two Flavours of Variables
• Continuous
– Age, height, distance, temperature...
• Categorical (also called “Discrete”)
– Age group, gender, number of siblings, citizenship, race...
Most Common Type of Categorical
• Dichotomous
– Meaning “having two levels”
– E.g., sex
• “Dichotomize”
– Convert “age” to “under 40” and “over 39”
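Dichotomizing can be sketched in a few lines of Python. The cut-off at 40 comes from the slide; the function name `dichotomize_age` and the sample ages are made up for illustration.

```python
def dichotomize_age(age):
    """Collapse a continuous age into the slide's two levels."""
    return "under 40" if age < 40 else "over 39"


# Hypothetical ages chosen to sit on both sides of the cut-off.
ages = [18, 39, 40, 65]
labels = [dichotomize_age(age) for age in ages]

for age, label in zip(ages, labels):
    print(age, "->", label)
```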
Levels of Measurement
• Level of Measurement: A system of classification with four types of measurement rules that affect the kind of statistical analysis that is appropriate:
– Nominal
– Ordinal
– Interval
– Ratio
Nominal Measurement
• Think of “name” when you think of “nominal”
• Nominal Measurement:
– Lowest form of measurement
– Numbers are used simply as labels to name categories
• E.g. Assigning 2 arbitrary numbers to code for sex: 0=male, 1=female
• It does not matter what the codes are, the numbers have no quantitative meaning
• Therefore we can’t treat these arbitrary numbers like we would any other numbers in math
– E.g. in class we have 30 men (all coded “0”) and 70 women (all coded “1”). Average score is 0.7… which means nothing
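One way to see why the average of nominal codes is not a real quantity is to recode the same people with different labels. In this sketch the 30/70 split and the 0/1 coding come from the slide; the alternative 1/2 coding is an assumption added for illustration.

```python
from collections import Counter
from statistics import mean

# 30 men coded 0 and 70 women coded 1, as in the slide's example.
codes = [0] * 30 + [1] * 70
mean_original = mean(codes)

# Recode the same people with different, equally arbitrary labels
# (1 = male, 2 = female). This recoding is an illustrative assumption.
recoded = [1 if code == 0 else 2 for code in codes]
mean_recoded = mean(recoded)

# Counting how many people fall in each category is unaffected by the labels.
counts_original = Counter(codes)
counts_recoded = Counter(recoded)

print("Mean with 0/1 codes:", mean_original)
print("Mean with 1/2 codes:", mean_recoded)
print("Counts with 0/1 codes:", dict(counts_original))
print("Counts with 1/2 codes:", dict(counts_recoded))
```

The "average" moves when the labels change even though nobody in the class changed, while the category counts stay at 30 and 70. That is why counting is the operation that suits nominal data.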
Ordinal Measurement
• Ordinal Measurement:
– Uses numbers to designate ordering on an attribute
– Conveys some information about amount
– But does not indicate distance between values
• Example: Degree of pain 1 = None 2 = Some 3 = A lot
– Pain of 1.7 means nothing
• Distances are not equal, and are not known
• Averages do not make sense
Interval Measurement
• Interval Measurement:
– Also uses numbers to designate ordering on an attribute and conveys information about amount
– Distances between values are assumed to be equal
– Averages can be computed
• Example: Ambient temperature (Fahrenheit)
|___|___|___|___|___|___|___|___|___|___|
70  71  72  73  74  75  76  77  78  79  80
The difference between 70 and 75 degrees is the same as the difference between 75 and 80 degrees
• Note: The term “interval” measurement is used in the textbook, but I don’t encounter it often in real life. Usually, we just call this a continuous variable and are done with it.
Ratio Measurement
• Ratio Measurement:
– Uses numbers to designate ordering, conveys information about amount, distances are equal
– AND there is a real, rational zero
– Averages can be computed
• Example: Medication dose (e.g., number of milligrams, number of pills)
• Note: The term “ratio” measurement is used in the textbook, but I don’t encounter it often in real life. Usually, we just call this a continuous variable and are done with it.
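The practical difference between interval and ratio scales shows up when you change units. The sketch below assumes the standard Fahrenheit-to-Celsius conversion; the 40-degree comparison and the 250 mg and 500 mg doses are made-up illustration values.

```python
def fahrenheit_to_celsius(temp_f):
    """Standard conversion from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Interval scale: equal differences stay equal after changing units.
gap_low_c = fahrenheit_to_celsius(75) - fahrenheit_to_celsius(70)
gap_high_c = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(75)
print("Celsius gaps:", gap_low_c, gap_high_c)

# But ratios are not preserved, because 0 degrees is not a true zero.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print("Temperature ratio in F:", ratio_f, "and in C:", ratio_c)

# Ratio scale: a dose of zero means no drug, so ratios survive a unit change.
dose_ratio_mg = 500 / 250      # hypothetical doses in milligrams
dose_ratio_g = 0.5 / 0.25      # the same doses in grams
print("Dose ratio in mg:", dose_ratio_mg, "and in g:", dose_ratio_g)
```

Saying that 80 degrees is "twice as hot" as 40 degrees depends entirely on the unit chosen, whereas 500 mg is twice 250 mg in any unit.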
Levels of Measurement
• At each successive measurement level, there is more information, and greater analytic flexibility
• If you start with ratio measures, you can collapse information to a lower-level measure, but the reverse is not true
– i.e. you can “dichotomize” a continuous variable, but you can’t turn a dichotomous variable into a continuous one.
• Higher-level scales are usually (though not always) preferred
– Moving from continuous to ordinal causes us to lose information, but it’s often done for convenience
– E.g. age → age group
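Collapsing age into age groups might look like the sketch below. The cut points, group labels and sample ages are hypothetical choices for illustration, not categories from the course.

```python
from bisect import bisect_right

# Hypothetical cut points and labels; any other grouping would work the same way.
cut_points = [20, 40, 60]
group_labels = ["under 20", "20-39", "40-59", "60 and over"]


def age_group(age):
    """Map a continuous age to an ordinal age group."""
    return group_labels[bisect_right(cut_points, age)]


ages = [19, 20, 39, 40, 59, 60]
groups = [age_group(age) for age in ages]

for age, group in zip(ages, groups):
    print(age, "->", group)
```

Ages 20 and 39 both land in "20-39", so once only the group is recorded there is no way to recover the original age. That is the information loss the slide describes.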
Comparison of Levels
                   Nominal   Ordinal       Interval                     Ratio
Classification     ✓         ✓             ✓                            ✓
Magnitude                    ✓             ✓                            ✓
Equal Interval                             ✓                            ✓
True Zero                                                               ✓
Math Permissible   Count     Count, Rank   Count, Rank, Add, Subtract   Count, Rank, Add, Subtract, Multiply, Divide
Sampling
[Diagram: a sample drawn from the POPULATION (also called “REFERENCE POPULATION”)]
Sampling
POPULATION = Students at U of O
Sample = this class
If I compute the average number of women in this class, I can generalize to the whole university.
Sampling Bias
Is the sample “representative”?
Sampling
• The target (or reference) population is the group of individuals to which one wishes to generalize findings.
• The accessible population is the portion of the target population that has a chance of being selected. (Also called the “study population”)
• The sample is selected from the accessible population.
Sampling
• There’s also something called a “Sampling Frame” that is not discussed in the textbook
• Sampling Frame is a subset of the Accessible Population, from which the Sample is taken
Target and accessible populations
[Diagram: the accessible population (accessible pop) shown within the target population (ref pop)]
We’ll do more about sampling later....
Descriptive Stats vs Inferential Stats
• Descriptive statistics describe and summarize data about the sample
– E.g. Average age of the women in THIS CLASS (vs. the women in the whole university)
• Inferential statistics attempt to make conclusions about the reference population from examining the sample, based upon the Laws of Probability
Parameter vs Statistic
• A “statistic” is collected about a sample
• A “parameter” is collected about the reference population
– Average daily calories consumed by all children in Toronto = parameter
– Average daily calories consumed by 300 children in one school district in Toronto = statistic
• Ultimately, we use the statistic to estimate the parameter
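The idea of using a statistic to estimate a parameter can be illustrated with a small simulation. Everything numeric here is an assumption: the population is simulated (10,000 children with normally distributed calorie intakes), and the sample is a simple random sample of 300, which is more idealized than sampling a single school district.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated "population": daily calories for 10,000 children (not real data).
population = [random.gauss(1800, 300) for _ in range(10_000)]
parameter = mean(population)  # describes the whole population

# Simple random sample of 300 children drawn from that population.
sample = random.sample(population, 300)
statistic = mean(sample)  # describes only the sample

print("Parameter (population mean):", round(parameter, 1))
print("Statistic (sample mean):", round(statistic, 1))
print("Difference:", round(statistic - parameter, 1))
```

In real research the parameter is usually unknown, which is why the sample statistic is all we have to work with; the simulation only lets us peek at both at once.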
Statistical Programs
• There are scores of programs out there, each with its own strengths and weaknesses
• In this class, we will introduce you to basic stats in Microsoft Excel and dabble in SPSS
• Other options that may be available to you in the lab are SAS, S+ and R
That is All
• See you Monday
• No homework this week