Numerical Descriptive Measures Interpreting Correlation ...
Class intervals: Width of interval ≈ range / number of desired class groupings
Arithmetic mean: $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
Median (position): $\dfrac{n + 1}{2}$
Range: $X_{\max} - X_{\min}$
Z score: $Z = \dfrac{X - \bar{X}}{S}$
Z outliers: $Z > 3.0$ or $Z < -3.0$
Measures of Central Tendency: Arithmetic Mean, Median, Mode
Quartile (position): $Q_1 = 0.25(n + 1)$, $Q_2 = 0.50(n + 1)$, $Q_3 = 0.75(n + 1)$
Inter-Quartile Range: $IQR = Q_3 - Q_1$
Measures of Dispersion: Variance, Standard Deviation, Coefficient of Variation
Covariance tells us only the direction of association.
Sample coefficient of correlation r: $r = \dfrac{\operatorname{cov}(X, Y)}{S_X S_Y}$, where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.
Question 3 Continuous Probability Distribution
Find the following probabilities
1. $P(Z < -1.67) = 0.0475$. Read straight from the table. Note: for P(Z < 1.846), the table only lists z values to two decimal places, so round 1.846 up to 1.85.
2. $P(Z > -2.78) = 1 - P(Z < -2.78) = 1 - 0.0027 = 0.9973$
3. $P(0.15 < Z < 1.99) = P(Z < 1.99) - P(Z < 0.15) = 0.9767 - 0.5596 = 0.4171$
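The three table lookups above can be cross-checked in code. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p1 = Z.cdf(-1.67)               # P(Z < -1.67)
p2 = 1 - Z.cdf(-2.78)           # P(Z > -2.78), via the complement rule
p3 = Z.cdf(1.99) - Z.cdf(0.15)  # P(0.15 < Z < 1.99)

print(round(p1, 4), round(p2, 4), round(p3, 4))
```

The results should agree with the table values to four decimal places.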
Solve the following inverse problems for the standard normal distribution: $P(Z > \_\_\_\_) = 0.01$
Look up the Inverse Normal Table: $P(Z > 2.3263) = 0.01$
The Inverse table only gives the Z values for upper-tail areas, but because the normal distribution is symmetric about zero, we find the upper-tail Z value, and the lower-tail Z value that we need is the same value but negative.
Find the two values of Z (symmetrically distributed around the mean) such that the following statement is true: $P(\_\_\_\_ < Z < \_\_\_\_) = 0.80$
Each tail will have an area of 0.10, so looking up the Inverse table to get the two Z values: $Z_{LOWER} = -1.2816$, $Z_{UPPER} = +1.2816$
$P(-1.2816 < Z < 1.2816) = 0.80$
Continuous Probability Distribution cont.
Between what two values of Z (symmetrically distributed around the mean) will 68.26% of all possible Z values be contained?
Each tail has an area α = 0.1587, i.e. (1 - 0.6826)/2. Using the Cumulative Normal Distribution table and looking for the area 0.1587, we find that P(Z < -1) = 0.1587. The right tail, beyond Z = +1, has the same area, so the two values of Z that we are looking for are -1 and +1, i.e. P(-1 < Z < 1) = 0.6826.
Using the Inverse Normal table, we can only look up an area to two decimal places: 0.16 (i.e. 0.1587 rounded to two decimal places). We would then conclude that the two values of Z are Z = 0.9945 and Z = -0.9945, i.e. P(-0.9945 < Z < 0.9945) = 0.68.
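The inverse problems above can also be checked without tables. A minimal sketch, assuming Python 3.8 or later; `inv_cdf` takes a lower-tail (cumulative) area, so an upper-tail area of 0.01 is entered as 0.99. Variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

z_top_1pct = Z.inv_cdf(1 - 0.01)     # z with P(Z > z) = 0.01
z_mid_80 = Z.inv_cdf(1 - 0.10)       # upper limit for a central area of 0.80
z_exact_tail = Z.inv_cdf(1 - 0.1587) # tail area 0.1587, close to z = 1
z_rounded_tail = Z.inv_cdf(1 - 0.16) # tail area rounded to 0.16

# By symmetry, each lower limit is the negative of the upper limit.
print(round(z_top_1pct, 4), round(z_mid_80, 4))
print(round(z_exact_tail, 4), round(z_rounded_tail, 4))
```

The last two values show why rounding the tail area to 0.16 moves the answer from about 1 to about 0.9945.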
Question 4. Sampling Distribution
Sampling Distribution cont.
Sampling Distribution cont. I
Estimation
Estimation cont. / Confidence Intervals.
A population consists of all the members of a group about which you want to draw a conclusion (Greek letters (μ, σ, π) are used).
A sample is the portion of the population selected for analysis (Roman letters (x, s, n) are used for sample data).
A parameter is a numerical measure that describes a characteristic of a population.
A statistic is a numerical measure that describes a characteristic of a sample.
Numerical data is measured on a natural numerical scale (age)
  Continuous – Data that can take on any real number (time/length)
  Discrete – Countable number of responses (cannot have 0.5)
Categorical data can only be named or categorised
  Nominal – No order, no response is considered better (gender)
  Ordinal – There is an order (very good, good, average)
Descriptive Statistics – Collect, Present, Characterise data
Inferential Statistics – Drawing conclusions about a population based on sample data
Frequency Distributions – Summary table in which data are arranged into numerically ordered classes or intervals
Ordered array – Sequence of data in rank order
Time Series – Data collected through time (e.g. monthly sales)
Cross Sectional – Data collected for a point in time (my height today)
Question 1. (Topics 1-3)
Numerical Descriptive Measures
Reordered data: 3, 4, 7, 9
Variance: firstly find $\bar{x} = \dfrac{3 + 4 + 7 + 9}{4} = 5.75$
Sample variance:
$s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \dfrac{(3 - 5.75)^2 + (4 - 5.75)^2 + (7 - 5.75)^2 + (9 - 5.75)^2}{4 - 1}$
$= \dfrac{7.5625 + 3.0625 + 1.5625 + 10.5625}{3} = \dfrac{22.75}{3} = 7.583$
Standard deviation: $s = \sqrt{s^2} = \sqrt{7.583} = 2.754$
Coefficient of variation: $CV = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{2.754}{5.75} \times 100\% = 47.9\%$
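As a cross-check on the hand calculation for the data 3, 4, 7, 9, the same measures can be computed with Python's standard `statistics` module. This is a minimal sketch; `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor, and the variable names are illustrative.

```python
import statistics

data = [3, 4, 7, 9]

mean = statistics.mean(data)          # arithmetic mean
variance = statistics.variance(data)  # sample variance, divisor n - 1
std_dev = statistics.stdev(data)      # sample standard deviation
cv = std_dev / mean * 100             # coefficient of variation, in percent

print(mean, round(variance, 3), round(std_dev, 3), round(cv, 1))
```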
Sample of n = 4: (2, 3), (7, 9), (4, 5), (4, 6)
$\bar{x} = \dfrac{2 + 7 + 4 + 4}{4} = \dfrac{17}{4} = 4.25$
$\bar{y} = \dfrac{3 + 9 + 5 + 6}{4} = \dfrac{23}{4} = 5.75$

x    y    (x - x̄)    (y - ȳ)    (x - x̄)(y - ȳ)
2    3    -2.25      -2.75      6.19
7    9     2.75       3.25      8.94
4    5    -0.25      -0.75      0.19
4    6    -0.25       0.25     -0.06

$\sum (x - \bar{x})(y - \bar{y}) = 15.26$
$\text{covariance} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \dfrac{15.26}{4 - 1} = 5.09$
$\text{correlation} = r = \dfrac{\operatorname{cov}(X, Y)}{S_X \times S_Y} = \dfrac{5.09}{2.06 \times 2.5} = 0.99$
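The covariance and correlation for the four (x, y) pairs can be recomputed directly from the definitions. A minimal sketch with illustrative variable names; because it works with unrounded products, the sum of cross-products comes out as 15.25 rather than 15.26, so the covariance differs slightly in the second decimal place while r still rounds to 0.99.

```python
from math import sqrt

pairs = [(2, 3), (7, 9), (4, 5), (4, 6)]
n = len(pairs)

x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sample covariance: sum of cross-products divided by n - 1
cross_sum = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
cov_xy = cross_sum / (n - 1)

# Sample standard deviations of x and y
s_x = sqrt(sum((x - x_bar) ** 2 for x, _ in pairs) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for _, y in pairs) / (n - 1))

r = cov_xy / (s_x * s_y)
print(round(cov_xy, 2), round(s_x, 2), round(s_y, 2), round(r, 2))
```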
Interpreting Correlation Coefficient

r                    Interpretation (strength, direction)
r = -1               PERFECT negative linear
-1 < r ≤ -0.7        STRONG negative linear
-0.7 < r ≤ -0.3      MODERATE negative linear
-0.3 < r < 0         WEAK negative linear
r = 0                No relationship
0 < r < 0.3          WEAK positive linear
0.3 ≤ r < 0.7        MODERATE positive linear
0.7 ≤ r < 1          STRONG positive linear
r = 1                PERFECT positive linear

Notation: Population mean – μ; Sample mean – $\bar{X}$; Population variance – $\sigma^2$; Sample proportion – p; Standard deviation – S; Variance – $S^2$
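The interpretation table for r can be expressed as a small lookup function. This is a sketch only: the function name `interpret_r` is hypothetical, and the cut-offs (0.3 and 0.7) are the ones listed in the table above.

```python
def interpret_r(r: float) -> str:
    """Describe a correlation coefficient using the table's cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No relationship"

    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on the magnitude of r
    if size == 1:
        strength = "PERFECT"
    elif size >= 0.7:
        strength = "STRONG"
    elif size >= 0.3:
        strength = "MODERATE"
    else:
        strength = "WEAK"
    return f"{strength} {direction} linear"


print(interpret_r(0.99))
print(interpret_r(-0.3))
```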
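Is it for μ?
  No: $\chi^2 = \dfrac{(n - 1)S^2}{\sigma^2}$
  Yes: Is σ known?
    No: $t = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$
    Yes:
      Quantitative: $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
      Qualitative: $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$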
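The one-sample test statistics in the decision tree above can be written as small functions. This is a sketch under stated assumptions: the function names are hypothetical, and the inputs in the usage lines are illustrative values, not results from the source.

```python
from math import sqrt


def z_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Z statistic for a mean when the population sigma is known."""
    return (x_bar - mu) / (sigma / sqrt(n))


def t_mean(x_bar: float, mu: float, s: float, n: int) -> float:
    """t statistic for a mean when sigma is unknown (uses sample S)."""
    return (x_bar - mu) / (s / sqrt(n))


def z_proportion(p: float, pi: float, n: int) -> float:
    """Z statistic for a sample proportion p against a hypothesised pi."""
    return (p - pi) / sqrt(pi * (1 - pi) / n)


def chi2_variance(s: float, sigma: float, n: int) -> float:
    """Chi-square statistic for a variance."""
    return (n - 1) * s ** 2 / sigma ** 2


# Illustrative inputs only
print(z_mean(350, 400, 100, 64))
print(round(z_proportion(0.55, 0.50, 100), 2))
```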
Pooled-Variance t Test Example – Two Sample (σ unknown, variances equal; assume a minimum of n = 30 so that the Central Limit Theorem applies)
Critical value: $t_{0.05,\,1998}$, with $df = n_1 + n_2 - 2 = 1000 + 1000 - 2 = 1998$
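A pooled-variance t statistic combines the two sample variances, weighted by their degrees of freedom. The sketch below uses the standard pooled-variance formula; the function name and the sample means and standard deviations are hypothetical, since the source gives only the sample sizes (1000 each).

```python
from math import sqrt


def pooled_t(x1: float, s1: float, n1: int, x2: float, s2: float, n2: int):
    """Return (t statistic, degrees of freedom) for a pooled-variance t test."""
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical sample statistics; only n1 = n2 = 1000 comes from the text
t_stat, df = pooled_t(10.0, 2.0, 1000, 9.8, 2.0, 1000)
print(round(t_stat, 3), df)
```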
F Test Example – Two Sample (use the F table for the rejection regions)
$F_U = F_{0.025,\,99,\,71} \approx F_{0.025,\,60,\,60} = 1.67$
$F_U^* = F_{0.025,\,71,\,99} \approx F_{0.025,\,60,\,60} = 1.67$
$F_L = \dfrac{1}{F_U^*} = \dfrac{1}{1.67} = 0.599$
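The lower critical value is the reciprocal of the upper critical value taken with the degrees of freedom swapped. A minimal sketch of that arithmetic, using the table value 1.67 quoted above; the variable names are illustrative.

```python
# Upper-tail F critical value with swapped degrees of freedom (table value from the text)
f_upper_star = 1.67

# Lower critical value is the reciprocal
f_lower = 1 / f_upper_star

print(round(f_lower, 3))
```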
Two Population Proportion Example – Two Sample (for the rejection region, use the inverse normal table)
Critical value: Z = 1.6449
Analysis of Variance (ANOVA)
Question 2 Simple Linear Regression & Probability
Probability & Discrete Probability Distributions
Binomial Distribution (the question will provide n, x and the proportion as a %)
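Given n, x and the proportion, a binomial probability follows from the standard formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). The sketch below assumes Python 3.8 or later for `math.comb`; the function name and the example inputs are hypothetical.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Hypothetical example: n = 4 trials, proportion 50%, exactly x = 2 successes
prob = binomial_pmf(2, 4, 0.5)
print(prob)
```

Remember to convert a percentage such as 50% to a proportion (0.5) before using it as p.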
Question 5 Hypothesis Testing
Hypothesis Testing cont.
BSB123 Data Analysis Semester 2 2015
Workshop 8 (Week 10) – Estimation
Question 1
The quality control manager at a light bulb factory needs to estimate the mean life of a large
shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light bulbs
indicates a sample mean life of 350 hours.
(a) Construct a 95% confidence interval estimate of the population mean life of light bulbs in this
shipment.
(b) Do you think that the manufacturer has the right to state that the light bulbs last an average
of 400 hours? Explain.
The first approach is simply to say that 400 is outside the confidence interval. The second approach is to
take that value of 400 and convert it to a Z value, so you can determine the probability that the statement
is correct.
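Both approaches for Question 1 can be sketched in a few lines, using the figures given in the question (σ = 100, n = 64, sample mean 350). This is a minimal illustration with hypothetical variable names; the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 350, 100, 64

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
std_error = sigma / sqrt(n)           # standard error of the mean

lower = x_bar - z_crit * std_error
upper = x_bar + z_crit * std_error

# Second approach: how many standard errors is the claimed 400 from the sample mean?
z_400 = (400 - x_bar) / std_error

print(round(lower, 1), round(upper, 1), z_400)
```

A claimed mean of 400 hours lies above the upper limit of the interval and is 4 standard errors from the sample mean.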
(c) Must you assume that the population of light bulb life is normally distributed? Explain.
No, because my sample size is greater than 30. Therefore, according to the CLT (central limit theorem), at
the very least I will end up with an approximately normal sampling distribution.
In other words, if we have 30 observations or more, under the CLT we have an approximately normal distribution.
Question 2
If $\bar{X} = 75$, S = 24, n = 36, and assuming that the population is normally distributed, construct a 95%
confidence interval estimate of the population mean ฮผ.
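Because σ is unknown here and S is used in its place, the interval is based on the t distribution with n - 1 = 35 degrees of freedom. The Python standard library has no t quantile function, so this sketch assumes the critical value 2.0301 read from a t table (upper-tail area 0.025, 35 degrees of freedom); variable names are illustrative.

```python
from math import sqrt

x_bar, s, n = 75, 24, 36

# Assumption: t critical value taken from a t table (0.025 upper tail, df = 35)
t_crit = 2.0301

std_error = s / sqrt(n)  # estimated standard error of the mean
lower = x_bar - t_crit * std_error
upper = x_bar + t_crit * std_error

print(round(lower, 2), round(upper, 2))
```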
Question 3
A study conducted by the Australian Stock Exchange found that 46% of 2,405 Australian adults
surveyed in 2006 held shares, either directly or indirectly through managed funds or self-managed
superannuation funds (2006 Australian Share Ownership Study, ASX).
(a) Construct a 95% confidence interval for the proportion of Australian adults who held shares
in 2006.
When dealing with population proportions we always use Z.
(b) Interpret the interval constructed in (a).
I am 95% confident that the true proportion of Australian adults who held shares in 2006 is
between 44% and 48%.
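The interval quoted in (b) can be reproduced from the survey figures (p = 0.46, n = 2,405) with the usual Z interval for a proportion. A minimal sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.46, 2405

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
std_error = sqrt(p * (1 - p) / n)     # standard error of the sample proportion

lower = p - z_crit * std_error
upper = p + z_crit * std_error

print(round(lower, 4), round(upper, 4))
```

Rounded to whole percentages, the limits are 44% and 48%.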
(c) To construct a follow-up study to estimate the population proportion of adults who currently
hold shares to within 0.01 with 95% confidence, how many adults would you interview?
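The required sample size follows from the standard formula n = Z²π(1 - π)/e², rounded up to the next whole person. This sketch assumes the 2006 sample proportion of 0.46 is used as the planning value for π (the question does not state which value to use); variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

pi_guess = 0.46  # assumption: 2006 estimate used as the planning value
error = 0.01     # desired margin of error
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

n_required = ceil(z_crit ** 2 * pi_guess * (1 - pi_guess) / error ** 2)
print(n_required)
```

Using the most conservative planning value, π = 0.5, would give a somewhat larger sample size.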