Psyc 235: Introduction to Statistics
DON’T FORGET TO SIGN IN FOR CREDIT!
http://www.psych.uiuc.edu/~jrfinley/p235/
Independent vs. Dependent Events
• Independent Events: unrelated events that intersect at chance levels given relative probabilities of each event
• Dependent Events: events that are related in some way
• So... how to tell if two events are independent or dependent? Look at the INTERSECTION: P(AB)
• if P(AB) = P(A)*P(B) --> independent
• if P(AB) ≠ P(A)*P(B) --> dependent
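The intersection check can be tried on a small example. The sketch below is not from the lecture: it assumes one roll of a fair six-sided die, and the events and the helper names `prob` and `independent` are made up for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

def independent(a, b):
    """True when P(AB) equals P(A)*P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}              # A: roll is even
at_most_four = {1, 2, 3, 4}   # B: roll is 4 or less
at_most_three = {1, 2, 3}     # C: roll is 3 or less

# P(AB) = 2/6 and P(A)*P(B) = 1/2 * 2/3 = 1/3, so A and B are independent.
print(independent(even, at_most_four))
# P(AC) = 1/6 but P(A)*P(C) = 1/2 * 1/2 = 1/4, so A and C are dependent.
print(independent(even, at_most_three))
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.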
Random Variables
• Random Variable: variable that takes on a particular
numerical value based on outcome of a random experiment
• Random Experiment (aka Random Phenomenon): trial that will result in one of several possible outcomes
  - can’t predict outcome of any specific trial
  - can predict pattern in the LONG RUN
Random Variables
• Example:
  - Random Experiment: flip a coin 3 times
  - Random Variable: # of heads
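As a rough illustration of this example, the sketch below lists every outcome of 3 flips and counts how often each number of heads occurs. It assumes a fair coin, which the slide does not state explicitly.

```python
from itertools import product
from collections import Counter

# All 2**3 = 8 equally likely outcomes of flipping a fair coin 3 times.
outcomes = list(product("HT", repeat=3))

# The random variable maps each outcome to a number: its count of heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Print each possible value of the random variable with its probability.
for k in sorted(heads_counts):
    print(k, heads_counts[k] / len(outcomes))
```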
Random Variables
• Discrete vs. Continuous: countable (often finite) vs. uncountably infinite # of possible outcomes
• Scales of Measurement: Categorical/Nominal, Ordinal, Interval, Ratio
Data World vs. Theory World
• Theory World: Idealization of reality (idealization of what you might expect from a simple experiment)
  - Theoretical probability distribution
  - POPULATION parameter: a number that describes the population. Fixed but usually unknown
• Data World: data that results from an actual simple experiment
  - Frequency distribution
  - SAMPLE statistic: a number that describes the sample (ex: mean, standard deviation, sum, ...)
So far...
• Graphing & summarizing sample distributions (DESCRIPTIVE)
• Counting Rules
• Probability
• Random Variables
• one more key concept is needed to start doing INFERENTIAL statistics: SAMPLING DISTRIBUTION
Binomial Situation
• Bernoulli Trial: a random experiment having exactly two possible outcomes, generically called “Success” and “Failure”
  - probability of “Success” = p
  - probability of “Failure” = q = (1-p)
Examples:
  - Coin toss (Heads / Tails): “Success” = Heads, p=.5
  - Robot Factory (Good Robot / Bad Robot): “Success” = Good Robot, p=.75
Binomial Situation
• Binomial Situation:
  - n: # of Bernoulli trials
  - trials are independent
  - p (probability of “success”) remains constant across trials
• Binomial Random Variable: X = # of the n trials that are “successes”
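A minimal sketch of a binomial situation in code, assuming Python's pseudo-random generator is an acceptable stand-in for real trials. The function name `binomial_draw` and the seed value are illustrative choices, not part of the lecture.

```python
import random

def binomial_draw(n, p, rng):
    """Run n independent Bernoulli trials with success probability p; return # of successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(235)  # fixed seed (arbitrary) so the run is repeatable
x = binomial_draw(10, 0.5, rng)
print(x)  # a whole number from 0 to 10
```

Each call to `rng.random()` plays the role of one Bernoulli trial, and p stays the same on every trial, matching the conditions listed above.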
Binomial Situation: collect data!
Population: Outcomes of all possible coin tosses (for a fair coin)
Success = Heads, p=.5
Let’s do 10 tosses: n=10 (sample size)
Bernoulli Trial: one coin toss
Binomial Random Variable: X = # of the 10 tosses that come up heads (aka Sample Statistic)
Sample: X = ....
[Figure: Binomial Distribution, p=.5, n=10; x-axis: # of successes (0 to 10); y-axis: probability]
This is the SAMPLING DISTRIBUTION of X!
Sampling Distribution
• Sampling Distribution: Distribution of values that your sample statistic would take on, if you kept taking samples of the same size, from the same population, FOREVER (infinitely many times).
• Note: this is a THEORETICAL PROBABILITY DISTRIBUTION
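The "forever" idea can be approximated by simulation. The sketch below assumes the coin-toss setup from this lecture (p=.5, n=10) and uses 20,000 simulated samples as a stand-in for infinitely many. The sample count and seed are arbitrary choices.

```python
import random
from collections import Counter
from math import comb

n, p = 10, 0.5
num_samples = 20_000    # "forever" is not possible, so a large number stands in for it
rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# Each sample is 10 tosses; record the sample statistic X = # of heads.
counts = Counter(
    sum(rng.random() < p for _ in range(n)) for _ in range(num_samples)
)

# Compare the simulated relative frequency with the theoretical binomial probability.
for k in range(n + 1):
    simulated = counts[k] / num_samples
    theoretical = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k:2d}  simulated={simulated:.3f}  theoretical={theoretical:.3f}")
```

The simulated column belongs to the Data World and the theoretical column to the Theory World. The two should get closer as `num_samples` grows.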
Binomial Situation: collect data!
Same population and setup (fair coin, Success = Heads, p=.5, n=10 tosses, X = # of the 10 tosses that come up heads)
Sample: X = .... 3 5 6
[Figure: Sampling Distribution; x-axis: # of successes (0 to 10); y-axis: probability]
Binomial Situation: collect data!
Same population and setup (fair coin, Success = Heads, p=.5, n=10 tosses, X = # of the 10 tosses that come up heads)
Sample: X = 3
[Figure: Sampling Distribution; x-axis: # of successes (0 to 10); y-axis: probability]
Binomial Formula
$$P(X = k) = P(\text{exactly } k \text{ many successes})$$
$$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$$
where:
• $X$: Binomial Random Variable
• $k$: specific # of successes you could get
• $\binom{n}{k} = \frac{n!}{k!(n-k)!}$: combination called the Binomial Coefficient
• $p$: probability of success
• $(1-p)$: probability of failure
• $n-k$: specific # of failures
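The formula translates almost directly into code. This is a small sketch rather than a library routine: the function name `binomial_pmf` is an illustrative choice, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Fair coin, 10 tosses: probability of exactly 3 heads.
print(binomial_pmf(3, 10, 0.5))  # 120 / 1024 = 0.1171875
```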
Binomial Formula
[Figure: Sampling Distribution, with the value 3 marked; x-axis: # of successes (0 to 10); y-axis: probability]
p(X=3) =
Remember this idea....
Hmm... what if we had gotten X=0? ...pretty unlikely outcome... fair coin?
Population: Outcomes of all possible coin tosses (for a fair coin), p=.5, n=10
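To put a number on "pretty unlikely", the X=0 case can be worked through with the binomial formula, assuming the coin really is fair (p=.5) and n=10:

```latex
P(X = 0) = \binom{10}{0} (0.5)^{0} (1 - 0.5)^{10} = \frac{1}{1024} \approx 0.00098
```

Under that assumption, a sample with no heads at all would turn up in roughly 1 of every 1000 samples.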
More on the Binomial Distribution
• X ~ B(n,p)
Expected Value and Variance for X~B(n,p)
$$\mu_X = np$$
$$\sigma_X^2 = np(1-p)$$
Standard Deviation: $$\sigma_X = \sqrt{np(1-p)}$$
these are the parameters for the sampling distribution of X
Ex: # heads in 5 tosses of a coin: X~B(5,1/2)
                                  Expectation   Variance   Std. Dev.
# heads in 5 tosses of a coin:        2.5          1.25       1.12
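The example values can be checked with a few lines of code. This is a sketch, and the function name `binomial_summary` is an illustrative choice.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (expected value, variance, standard deviation) for X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)

# Number of heads in 5 tosses of a fair coin: X ~ B(5, 1/2).
mean, variance, sd = binomial_summary(5, 0.5)
print(mean, variance, round(sd, 2))  # 2.5 1.25 1.12
```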
Let’s see some more Binomial Distributions
• What happens if we try doing a different # of trials (n) ?
• That is, try a different sample size...
[Figures: Binomial Distribution, p=.5, for n=5, n=10, n=20, n=50, and n=100; x-axis: # of successes (0 to n); y-axis: probability]
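The binomial distributions for different n can be regenerated as rough text plots. The sketch below is one way to do it, not the method used to draw the slides. The function names and the bar width are illustrative choices.

```python
from math import comb

def binomial_distribution(n, p):
    """List of P(X = k) for k = 0..n, where X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def text_plot(n, p, width=50):
    """Print one row per k with a bar whose length is proportional to P(X = k)."""
    probs = binomial_distribution(n, p)
    tallest = max(probs)
    print(f"Binomial Distribution, p={p}, n={n}")
    for k, prob in enumerate(probs):
        bar = "#" * round(width * prob / tallest)
        print(f"{k:3d} {prob:.4f} {bar}")

for n in (5, 10, 20):
    text_plot(n, 0.5)
```

Adding 50 and 100 to the loop gives the larger sample sizes. The output just gets longer.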
Whoa.
• Anyone else notice those DISCRETE distributions starting to look smoother as sample size (n) increased?
• Let’s look at a few more binomial distributions, this time with a different probability of success...
Binomial Robot Factory
• 2 possible outcomes:
  - Good Robot: 90%
  - Bad Robot: 10%
You’d like to know about how many BAD robots you’re likely to get before placing an order...
p = .10 (here “success” = Bad Robot)
n = 5, 10, 20, 50, 100
[Figures: Binomial Distribution, p=.1, for n=5, n=10, n=20, n=50, and n=100; x-axis: # of successes (0 to n); y-axis: probability]
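For the robot order, a quick sketch like the one below gives the expected number of bad robots and the single most likely count for each order size. It assumes the binomial situation holds: robots are independent and p = .10 stays constant. The variable names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_bad = 0.10  # "success" here means getting a Bad Robot

for n in (5, 10, 20, 50, 100):
    probs = [binomial_pmf(k, n, p_bad) for k in range(n + 1)]
    most_likely = probs.index(max(probs))  # k with the highest probability
    expected = n * p_bad                   # mean of B(n, p) is np
    print(f"n={n:3d}  expected bad robots={expected:4.1f}  most likely count={most_likely}")
```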
Normal Approximation of the Binomial
If n is large, then
X ~ B(n,p) {Binomial Distribution}
can be approximated by a NORMAL DISTRIBUTION with parameters:
$$\mu = np$$
$$\sigma = \sqrt{np(1-p)}$$
[Figure: probability distribution plot; y-axis: probability]
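One way to see how close the approximation is: compare exact binomial probabilities with the height of the normal curve at the same points. The sketch below assumes n=100 and p=.5, and uses the standard normal density formula, which does not appear on the slides.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

n, p = 100, 0.5
mu = n * p                     # 50.0
sigma = sqrt(n * p * (1 - p))  # 5.0

# Columns: k, exact binomial probability, normal curve height at k.
for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_density(k, mu, sigma), 4))
```

Rerunning with a small n (say n=5) or with p far from .5 shows the two columns drifting apart, which is why the slide says "if n is large".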
Normal Distributions
• (aka “Bell Curve”)
• Probability Distributions of a Continuous Random Variable (smooth curve!)
• Class of distributions, all with the same overall shape
• Any specific Normal Distribution is characterized by two parameters:
  - mean: μ
  - standard deviation: σ
[Figures: Normal Distributions with different means; Normal Distributions with different standard deviations]
Standardizing
• “Standardizing” a distribution of values results in re-labeling & stretching/squishing the x-axis
• useful: gets rid of units, puts all distributions on same scale for comparison
• HOWTO: simply convert every value to a Z SCORE:
$$z = \frac{x - \mu}{\sigma}$$
Standardizing
• Z score: $$z = \frac{x - \mu}{\sigma}$$
• Conceptual meaning: how many standard deviations from the mean a given score is (in a given distribution)
• Any distribution can be standardized
• Especially useful for Normal Distributions...
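A short sketch of the z-score calculation. The input values are illustrative, reusing this lecture's coin example (X = 3 heads in 10 tosses of a fair coin), and the function name `z_score` is an assumption.

```python
from math import sqrt

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Illustrative values: X = 3 heads in 10 tosses of a fair coin, X ~ B(10, .5).
n, p = 10, 0.5
mu = n * p                     # 5.0
sigma = sqrt(n * p * (1 - p))  # sqrt(2.5), about 1.58

print(round(z_score(3, mu, sigma), 2))  # -1.26
```

The negative sign says the score is below the mean. The size says it is a bit more than one standard deviation away.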
Standard Normal Distribution
• has mean: μ=0
• has standard deviation: σ=1
• ANY Normal Distribution can be converted to the Standard Normal Distribution...
[Figure: Standard Normal Distribution]
Normal Distributions & Probability
• Probability = area under the curve
  - intervals
  - cumulative probability
  [draw on board]
• For the Standard Normal Distribution: These areas have already been calculated for us (by someone else)
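Those pre-calculated areas are usually read from a table. As a sketch, they can also be computed with the error function in Python's standard library. The function name `standard_normal_cdf` is an illustrative choice.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Cumulative probability P(Z <= z) for the Standard Normal Distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Cumulative probability: area to the left of z = 1.96.
print(round(standard_normal_cdf(1.96), 3))  # 0.975

# Interval probability: area between z = -1 and z = 1.
print(round(standard_normal_cdf(1) - standard_normal_cdf(-1), 3))  # 0.683
```

An interval's area is the difference of two cumulative areas, so one function covers both cases named on the slide.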
Standard Normal Distribution
So, if this were a Sampling Distribution, ...
Next Time
• More different types of distributions: Binomial, Normal, t, Chi-square, F
• And then... how will we use these to do inference?
• Remember: biggest new idea today was: SAMPLING DISTRIBUTION