Numerical Descriptive Measures Interpreting Correlation ...



Class intervals: Width of interval ≅ range / number of desired class groupings

Arithmetic Mean: $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$

Median (Position): $\frac{n + 1}{2}$

Range: $X_{max} - X_{min}$

Z Score: $Z = \frac{X - \bar{X}}{S}$

Z Outliers: Z > 3.0 or Z < -3.0

Measures of Central Tendency: Arithmetic Mean, Median, Mode

Quartile (Position): $Q_1 = 0.25(n + 1)$, $Q_2 = 0.50(n + 1)$, $Q_3 = 0.75(n + 1)$

Inter-Quartile Range: $IQR = Q_3 - Q_1$

Measures of Dispersion: Variance, Standard Deviation, Coefficient of Variation

Covariance tells us only the direction of association.

Sample coefficient of correlation r: $r = \frac{cov(X, Y)}{S_X S_Y}$, where $S_X$ and $S_Y$ come from the standard deviation formula.

Question 3 Continuous Probability Distribution

Find the following probabilities:

1. $P(Z < -1.67) = 0.0475$. Read straight from the table. Note: the table only lists Z values to two decimal places, so for P(Z < 1.846) round 1.846 up to 1.85.

2. $P(Z > -2.78) = 1 - P(Z < -2.78) = 1 - 0.0027 = 0.9973$

3. $P(0.15 < Z < 1.99) = P(Z < 1.99) - P(Z < 0.15) = 0.9767 - 0.5596 = 0.4171$
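The three table look-ups above can be cross-checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist` in place of the printed table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)

p1 = Z.cdf(-1.67)               # P(Z < -1.67): a direct lower-tail look-up
p2 = 1 - Z.cdf(-2.78)           # P(Z > -2.78): complement of the lower tail
p3 = Z.cdf(1.99) - Z.cdf(0.15)  # P(0.15 < Z < 1.99): difference of two lower tails

print(round(p1, 4), round(p2, 4), round(p3, 4))
```

The results agree with the table values to four decimal places; small differences in the last digit can appear because the table itself is rounded.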

Solve the following inverse problems for the standard normal distribution:

P(Z > ____ ) = 0.01

Look up the Inverse Normal Table: P(Z > 2.3263) = 0.01

The Inverse table only gives the Z values for upper-tail areas. Because the normal distribution is symmetric about zero, when a lower-tail Z value is needed, find the upper-tail Z value and make it negative.

Find the two values of Z (symmetrically distributed around the mean) such that the following statement is true: P(____ < Z < ____ ) = 0.80

Each tail will have an area of 0.10, so looking up the Inverse table gives the two Z values: $Z_{LOWER} = -1.2816$ and $Z_{UPPER} = +1.2816$

P(-1.2816 < Z < 1.2816) = 0.80

Continuous Probability Distribution cont.

Between what two values of Z (symmetrically distributed around the mean) will 68.26% of all possible Z values be contained?

Each tail has an area α = 0.1587, i.e. (1 - 0.6826)/2. Using the Cumulative Normal Distribution table and looking for the area 0.1587, we find that P(Z < -1) = 0.1587. The right tail beyond Z = +1 has the same area, so the two values of Z that we are looking for are -1 and +1, i.e. P(-1 < Z < 1) = 0.6826, as in the diagram.

The Inverse Normal table only lists areas to two decimal places. Looking up 0.16 (0.1587 rounded to two decimal places), we would conclude that the two values of Z were Z = 0.9945 and Z = -0.9945, i.e. P(-0.9945 < Z < 0.9945) = 0.68.
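The inverse look-ups can be reproduced in the same way. The sketch below assumes `statistics.NormalDist.inv_cdf` as a stand-in for the Inverse Normal table; the helper `symmetric_bounds` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def symmetric_bounds(central_area):
    """Return (lower, upper) Z values that enclose central_area symmetrically about 0."""
    tail = (1 - central_area) / 2   # area left in each tail
    upper = Z.inv_cdf(1 - tail)     # upper-tail Z value
    return -upper, upper            # symmetry gives the lower value

z_upper_1pct = Z.inv_cdf(1 - 0.01)      # P(Z > z) = 0.01
lo80, hi80 = symmetric_bounds(0.80)     # each tail has area 0.10
lo68, hi68 = symmetric_bounds(0.6826)   # each tail has area 0.1587
z_rounded = Z.inv_cdf(1 - 0.16)         # tail area rounded to 0.16

print(round(z_upper_1pct, 4), round(hi80, 4), round(hi68, 2), round(z_rounded, 4))
```

Comparing `hi68` with `z_rounded` shows how rounding the tail area to two decimal places moves the answer from about 1 to about 0.9945.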

Question 4. Sampling Distribution

Sampling Distribution cont.

Sampling Distribution cont. I

Estimation

Estimation cont. / Confidence Intervals.

A population consists of all the members of a group about which you want to draw a conclusion (Greek letters (μ, σ, π) are used)
A sample is the portion of the population selected for analysis (Roman letters (x, s, n) are used for sample data)
A parameter is a numerical measure that describes a characteristic of a population
A statistic is a numerical measure that describes a characteristic of a sample

Numerical data is measured on a natural numerical scale (age)
Continuous - Data that can take on any real number (time/length)
Discrete - Countable number of responses (cannot have 0.5)
Categorical data can only be named or categorised
Nominal - No order, no response is considered better (gender)
Ordinal - There is an order (very good, good, average)

Descriptive Statistics - Collect, Present, Characterise data
Inferential Statistics - Drawing conclusions about a population based on sample data
Frequency Distributions - Summary table in which data are arranged into numerically ordered classes or intervals
Ordered array - Sequence of data in rank order
Time Series - Data collected through time (monthly sales for May)
Cross Sectional - Collected for a point in time (my height today)

Question 1. (Topics 1-3)

Numerical Descriptive Measures

Reordered data: 3, 4, 7, 9

Variance: firstly find $\bar{x} = \frac{3 + 4 + 7 + 9}{4} = 5.75$

Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

$s^2 = \frac{(3 - 5.75)^2 + (4 - 5.75)^2 + (7 - 5.75)^2 + (9 - 5.75)^2}{4 - 1} = \frac{7.5625 + 3.0625 + 1.5625 + 10.5625}{3} = \frac{22.75}{3} = 7.5833$

Standard deviation: $s = \sqrt{s^2} = \sqrt{7.5833} = 2.754$

Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\% = \frac{2.754}{5.75} \times 100\% = 47.9\%$
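As a check on the arithmetic for the data 3, 4, 7, 9, the sample variance (divisor n - 1), standard deviation and coefficient of variation can be recomputed with the standard library. This is a sketch only; `statistics.variance` and `statistics.stdev` both use the n - 1 divisor, which matches the sample formula.

```python
from statistics import mean, stdev, variance

data = [3, 4, 7, 9]

x_bar = mean(data)     # sample mean
s2 = variance(data)    # sample variance, divisor n - 1
s = stdev(data)        # sample standard deviation, sqrt(s2)
cv = s / x_bar * 100   # coefficient of variation as a percentage

print(x_bar, round(s2, 2), round(s, 2), round(cv, 1))  # 5.75 7.58 2.75 47.9
```

Note that the deviations are taken from the mean 5.75 and the divisor is n - 1 = 3, not the mean minus one.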

Sample of n = 4: (2, 3), (7, 9), (4, 5), (4, 6)

$\bar{x} = \frac{2 + 7 + 4 + 4}{4} = \frac{17}{4} = 4.25$

$\bar{y} = \frac{3 + 9 + 5 + 6}{4} = \frac{23}{4} = 5.75$

x    y    (x - x̄)    (y - ȳ)    (x - x̄)(y - ȳ)
2    3    -2.25      -2.75      6.1875
7    9     2.75       3.25      8.9375
4    5    -0.25      -0.75      0.1875
4    6    -0.25       0.25     -0.0625

$\sum (x - \bar{x})(y - \bar{y}) = 15.25$

$covariance = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \frac{15.25}{4 - 1} = 5.08$

$correlation = r = \frac{cov(X, Y)}{S_X \times S_Y} = \frac{5.08}{2.06 \times 2.5} = 0.99$
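The covariance and correlation for the four (x, y) pairs can be verified step by step. The sketch below follows the same hand calculation (deviations, cross-products, divisor n - 1) rather than calling a library routine; the variable names are illustrative.

```python
from math import sqrt

pairs = [(2, 3), (7, 9), (4, 5), (4, 6)]
n = len(pairs)

x_bar = sum(x for x, _ in pairs) / n   # 4.25
y_bar = sum(y for _, y in pairs) / n   # 5.75

# Sum of cross-products of deviations from the means.
cross = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
cov_xy = cross / (n - 1)               # sample covariance

# Sample standard deviations, divisor n - 1.
s_x = sqrt(sum((x - x_bar) ** 2 for x, _ in pairs) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for _, y in pairs) / (n - 1))

r = cov_xy / (s_x * s_y)               # sample correlation coefficient

print(cross, round(cov_xy, 2), round(s_x, 2), round(s_y, 2), round(r, 2))
```

Working with unrounded cross-products gives a sum of exactly 15.25; summing products that were first rounded to two decimals gives 15.26, a small rounding artefact that does not change r at two decimal places.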

Interpreting Correlation

Coefficient r        Interpretation
r = -1               PERFECT negative linear
-1 < r ≤ -0.7        STRONG negative linear
-0.7 < r ≤ -0.3      MODERATE negative linear
-0.3 < r < 0         WEAK negative linear
r = 0                No linear relationship
0 < r < 0.3          WEAK positive linear
0.3 ≤ r < 0.7        MODERATE positive linear
0.7 ≤ r < 1          STRONG positive linear
r = 1                PERFECT positive linear

The sign of r gives the direction of the relationship; its magnitude gives the strength.

Population mean - μ
Sample mean - $\bar{X}$
Population variance - σ²
Sample proportion - p
Sample standard deviation - S
Sample variance - S²

Choosing the test statistic:

Is it for μ?
  No: $\chi^2 = \frac{(n - 1) S^2}{\sigma^2}$
  Yes: Is σ known?
    No: $t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
    Yes:
      Quantitative: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
      Qualitative: $Z = \frac{p - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}}$
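The four test statistics in the decision tree above can be written as small functions. This is a sketch under stated assumptions: the function names and the input values in the usage lines are hypothetical and do not come from the source; only the formulas do.

```python
from math import sqrt

def z_stat_mean(x_bar, mu0, sigma, n):
    """Z statistic for a mean when the population sigma is known."""
    return (x_bar - mu0) / (sigma / sqrt(n))

def t_stat_mean(x_bar, mu0, s, n):
    """t statistic for a mean when sigma is unknown (uses sample S, df = n - 1)."""
    return (x_bar - mu0) / (s / sqrt(n))

def z_stat_proportion(p, pi0, n):
    """Z statistic for a sample proportion p against a hypothesised pi0."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

def chi2_stat_variance(s2, sigma0_sq, n):
    """Chi-square statistic for a variance (df = n - 1)."""
    return (n - 1) * s2 / sigma0_sq

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
z_mean = z_stat_mean(52, 50, 10, 25)       # 2 / (10 / 5)
t_mean = t_stat_mean(52, 50, 8, 16)        # 2 / (8 / 4)
z_prop = z_stat_proportion(0.6, 0.5, 100)  # 0.1 / 0.05
chi2 = chi2_stat_variance(30, 25, 11)      # 10 * 30 / 25
```

Each statistic is then compared with the critical value from the matching table (Z, t or chi-square).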


Pooled-Variance t Test Example - Two Sample (sigma unknown, variances equal; assume n = 30 minimum so that the Central Limit Theorem applies)

Critical value $t_{0.05, 1998}$, where $df = n_1 + n_2 - 2 = 1000 + 1000 - 2 = 1998$
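For reference, a pooled-variance t statistic can be sketched as below. Only the sample sizes (1000 and 1000) come from the source; the sample means and standard deviations are hypothetical placeholders, and `pooled_t` is an illustrative name.

```python
from math import sqrt

def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2, diff0=0.0):
    """Pooled-variance t statistic and its degrees of freedom.

    Assumes equal population variances; diff0 is the hypothesised
    difference mu1 - mu2 (0 by default).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (x1_bar - x2_bar - diff0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical means and standard deviations; n1 = n2 = 1000 as in the example.
t_value, df = pooled_t(52.0, 10.0, 1000, 51.0, 10.0, 1000)
print(round(t_value, 4), df)
```

The degrees of freedom returned here match the 1998 used to look up the critical t value.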

F Test Example - Two Sample (use the F table for the rejection regions)

$F_U = F_{0.025, 99, 71} \approx F_{0.025, 60, 60} = 1.67$ (nearest table entry)

$F_U^* = F_{0.025, 71, 90} \approx F_{0.025, 60, 60} = 1.67$ (nearest table entry)

$F_L = \frac{1}{F_U^*} = \frac{1}{1.67} = 0.599$

Two Population Proportion Example - Two Sample (for the rejection region, use the inverse normal table)

Critical value: 1.6449 (the Z value for an upper-tail area of 0.05)

Analysis of Variance (ANOVA)

Question 2 Simple Linear Regression & Probability

Probability & Discrete Probability Distributions


Binomial Distribution (the question will provide n, x and the proportion as a %)
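Given n, x and the proportion, the binomial probability P(X = x) follows from the standard formula C(n, x) p^x (1 - p)^(n - x). The sketch below uses hypothetical values (n = 10, x = 3, proportion 25%) purely for illustration; none of them come from the source.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical question: n = 10 trials, x = 3 successes, proportion 25%.
n, x, p = 10, 3, 0.25
prob_exactly_3 = binomial_pmf(x, n, p)

# Cumulative probability P(X <= 3) is the sum of the individual probabilities.
prob_at_most_3 = sum(binomial_pmf(k, n, p) for k in range(x + 1))

print(round(prob_exactly_3, 4), round(prob_at_most_3, 4))
```

Remember to convert the percentage to a proportion (25% becomes 0.25) before substituting.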

Question 5 Hypothesis testing Hypothesis Testing cont.


BSB123 Data Analysis Semester 2 2015

Workshop 8 (Week 10) - Estimation

Question 1

The quality control manager at a light bulb factory needs to estimate the mean life of a large shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light bulbs indicates a sample mean life of 350 hours.

(a) Construct a 95% confidence interval estimate of the population mean life of light bulbs in this shipment.

(b) Do you think that the manufacturer has the right to state that the light bulbs last an average of 400 hours? Explain.

The first approach is simply to say that 400 is outside the confidence interval. The second approach is to take the value of 400 and convert it to a Z value, so you can determine the probability that the statement is correct.
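Parts (a) and (b) can be worked through numerically. This is a minimal sketch assuming the stated 100 hours is the known population standard deviation, so the interval is the sample mean plus or minus Z times sigma over root n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 350, 100, 64            # values given in the question
confidence = 0.95

# Z value leaving (1 - 0.95) / 2 = 0.025 in each tail (about 1.96).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

std_error = sigma / sqrt(n)               # 100 / 8 = 12.5
lower = x_bar - z_crit * std_error        # about 325.5
upper = x_bar + z_crit * std_error        # about 374.5

# Part (b): how far is the claimed mean of 400 from the sample mean?
z_400 = (400 - x_bar) / std_error         # 50 / 12.5 = 4.0
upper_tail = 1 - NormalDist().cdf(z_400)  # area beyond Z = 4

print(round(lower, 1), round(upper, 1), z_400)
```

Under these assumptions the interval runs from about 325.5 to 374.5 hours, so 400 lies well outside it, and a Z value of 4 leaves only a tiny upper-tail area.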


(c) Must you assume that the population of light bulb life is normally distributed? Explain.

No, because the sample size (n = 64) is greater than 30. According to the Central Limit Theorem (CLT), the sampling distribution of the mean will be at least approximately normal.

In other words, if we have 30 observations or more, under the CLT we have an approximately normal (≈ Normal) sampling distribution.

Question 2

If $\bar{X} = 75$, S = 24, n = 36, and assuming that the population is normally distributed, construct a 95% confidence interval estimate of the population mean μ.
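Because σ is unknown here and only the sample S is given, the interval uses the t distribution with n - 1 = 35 degrees of freedom. The sketch below hard-codes the critical value 2.0301 as read from a t table (the Python standard library has no t distribution), so treat that constant as a table look-up rather than a computed value.

```python
from math import sqrt

x_bar, s, n = 75, 24, 36          # values given in the question
df = n - 1                        # 35 degrees of freedom

# Table value for t with 0.025 in the upper tail and df = 35 (assumed look-up).
t_crit = 2.0301

std_error = s / sqrt(n)           # 24 / 6 = 4.0
margin = t_crit * std_error       # about 8.12
lower = x_bar - margin            # about 66.88
upper = x_bar + margin            # about 83.12

print(round(lower, 2), round(upper, 2))
```

Under these assumptions the 95% interval for μ is roughly 66.88 to 83.12.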


Question 3

A study conducted by the Australian Stock Exchange found that 46% of 2,405 Australian adults surveyed in 2006 held shares, either directly or indirectly through managed funds or self-managed superannuation funds (2006 Australian Share Ownership Study, ASX).

(a) Construct a 95% confidence interval for the proportion of Australian adults who held shares in 2006.

When dealing with population proportions we always use a Z.

(b) Interpret the interval constructed in (a).

I am 95% confident that the true proportion of Australian adults who held shares in 2006 is between 44% and 48%.
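The interval quoted in (b) can be reproduced from the figures in the question. This is a sketch of the usual large-sample Z interval for a proportion, p plus or minus Z times the square root of p(1 - p)/n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.46, 2405                        # sample proportion and sample size
z_crit = NormalDist().inv_cdf(0.975)     # about 1.96 for 95% confidence

std_error = sqrt(p * (1 - p) / n)        # standard error of the proportion
lower = p - z_crit * std_error
upper = p + z_crit * std_error

print(round(lower, 4), round(upper, 4))
```

The result is approximately 0.4401 to 0.4799, which rounds to the 44% to 48% stated in the interpretation.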

(c) To construct a follow-up study to estimate the population proportion of adults who currently hold shares to within 0.01 with 95% confidence, how many adults would you interview?
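One way to approach (c) is the standard sample-size formula for a proportion, n = Z² π(1 - π) / e², rounded up. The sketch below assumes the 2006 estimate of 0.46 is used as the planning value for π; using 0.5 instead gives the most conservative (largest) answer. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(pi, error, confidence=0.95):
    """Smallest n so that the margin of error for a proportion is at most `error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * pi * (1 - pi) / error ** 2)

n_using_prior = sample_size_for_proportion(0.46, 0.01)    # prior estimate from 2006
n_conservative = sample_size_for_proportion(0.50, 0.01)   # worst case, pi = 0.5

print(n_using_prior, n_conservative)  # 9543 9604
```

The answer is always rounded up, since rounding down would leave the margin of error slightly above 0.01.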