Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data...

46
Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Transcript of Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data...

Page 1: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Chapter 6 - Random Sampling and Data Description

More joy of dealing with large quantities of data

Chapter 6B

You can never have too much

data.

Page 2: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Today in Prob & Stat

Page 3: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

6-2 Stem-and-Leaf Diagrams

Steps for Constructing a Stem-and-Leaf Diagram

Page 4: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

6-2 Stem-and-Leaf Diagrams

Page 5: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Example 6-4

Page 6: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-4

Stem-and-leaf diagram for the compressive strength data in Table 6-2.

Page 7: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-5

25 observations on batch yields

Stem-and-leaf displays for Example 6-5. Stem: Tens digits. Leaf: Ones digits.

too few

too many

just right

Page 8: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-6

Stem-and-leaf diagram from Minitab.

Number of observationsIn the middle stem

Page 9: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

6-4 Box Plots

• The box plot is a graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data.

• Whisker• Outlier• Extreme outlier

Page 10: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-13

Description of a box plot.

Page 11: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-14

Box plot for compressive strength data in Table 6-2.

Page 12: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-15

Comparative box plots of a quality index at three plants.

Page 13: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

6-5 Time Sequence Plots

• A time series or time sequence is a data set in which the observations are recorded in the order in which they occur. • A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.). • When measurements are plotted as a time series, weoften see

•trends, •cycles, or •other broad features of the data

Page 14: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-16

Company sales by year (a) and by quarter (b).

Page 15: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-17 gosh! – a stem and leaf diagram combined with a time series plot

A digidot plot of the compressive strength data in Table 6-2.

Page 16: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-18

A digidot plot of chemical process concentration readings, observed hourly.

Page 17: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

6-6 Probability Plots

• Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data.

• Probability plotting typically uses special graph paper, known as probability paper, that has been designed for the hypothesized distribution. Probability paper is widely available for the normal, lognormal, Weibull, and various chi-square and gamma distributions.

Page 18: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Probability (Q-Q)* Plots

•Forget ‘normal probability paper’

•Plot the z score versus the ranked observations, x(j)

•Subjective, visual technique usually applied to test normality. Can also be adapted to other distributions.

•Method (for normal distribution):

•Rank the observations x(1), x(2), …, x(n) from smallest to largest

•Compute the (j-1/2)/n value for each x(j)

•Plot zj=F-1((j-1/2)/n) versus x(j)

Parentheses usually indicate

ordering of data.

Page 19: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Computing zj, where zj = -1(j – ½)/n

25 5.65 1.01 0.84526 5.75 1.17 0.87927 5.79 1.36 0.91428 5.85 1.63 0.94829 5.86 2.11 0.983

29 values xj values zj (j-1/2)/n1 4.07 -2.11 0.0172 4.88 -1.63 0.0523 5.10 -1.36 0.0864 5.26 -1.17 0.1215 5.27 -1.01 0.155

xj values are ordered least to greatest

Page 20: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Example in EXCEL – Table 6-6, pp. 214

Cavendish Earth Density Data

-2.50

-2.00

-1.50

-1.00

-0.50

0.00

0.50

1.00

1.50

2.00

2.50

4.0 4.5 5.0 5.5 6.0

xj values

zj v

alu

es

zj is the function NORMSINV

Page 21: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Example in EXCEL – Table 6-6, cont’d

Cavendish Earth Density Data(censored)

-2.50

-2.00

-1.50

-1.00

-0.50

0.00

0.50

1.00

1.50

2.00

4.0 4.5 5.0 5.5 6.0

xj values

zj v

alu

es

zj is the function NORMSINV

Page 22: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Example 6-7

Page 23: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Example 6-7 (continued)

Page 24: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-19

Normal probability

plot for battery life.

Page 25: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-20

Normal probability plot obtained from standardized normal scores.

Page 26: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Figure 6-21

Normal probability plots indicating a nonnormal distribution. (a) Light-tailed distribution. (b) Heavy-tailed distribution. (c ) A distribution with positive (or right) skew.

Page 27: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

The Beginning of a Comprehensive Example

Descriptive Statistics in Action see real numbers, real data watch as they are manipulated in perverse ways be thrilled as they are sorted and be amazed as they are compressed into a single numbers

Page 28: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

The Raw Data

As part of a life span study of a particular type of lithium

polymer rechargable battery, 120 batteries were operated and their life span in operating hours determined.

1676.5 2386.6 1347.6 1840.2 916.0 1592.0895.6 1663.8 2420.6 943.5 1351.9 1395.4

1682.0 2045.9 2450.0 2210.5 1823.0 2401.81913.6 1985.2 2319.7 2856.3 1944.9 2968.72881.9 1387.6 2560.1 745.0 1641.3 1952.32007.8 718.3 884.1 2125.3 1694.0 2430.53313.4 1088.7 596.2 1759.9 1378.0 999.12156.4 1879.4 1779.7 1297.0 849.4 1608.41954.7 2056.6 908.3 2210.1 1882.6 983.82210.4 1740.2 955.4 543.4 2323.8 1831.11630.3 2791.8 2383.4 891.5 807.0 1307.41818.8 2476.0 1577.6 1818.8 2088.8 2139.01779.5 845.1 2365.4 1803.7 2940.7 1552.6984.2 1581.8 1527.9 1460.3 2004.6 1808.1

1512.6 2713.7 2749.2 1753.3 1714.3 2398.02046.1 2238.5 2439.7 2633.1 2039.1 2398.81613.3 1314.2 2016.2 4300.8 1760.5 2824.32066.1 729.3 1757.8 1250.8 577.8 715.22926.9 1898.7 1022.7 1005.2 1945.6 2277.31995.7 1377.2 2063.8 667.1 1299.9 1941.2

Data generated from a Weibull distribution with = 2.8 and = 2000

Page 29: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Descriptive Statistics - Minitab

Variable N Mean Median TrMean StDev SE MeanBattery Life 120 1789.4 1813.4 1773.9 661.5 60.4

Variable Minimum Maximum Q1 Q3Battery Life 543.4 4300.8 1348.7 2210.3

trimmed mean

Page 30: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

More Minitab

450040003500300025002000150010005000

20

10

0

Battery Life

Fre

quen

cy

Histogram of Battery Life

Page 31: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

More Minitab

450040003500300025002000150010005000

20

10

0

Battery Life

Fre

quen

cy

Histogram of Battery Life, with Normal Curve

Page 32: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Stem and Leaf Plot

Leaf Unit = 100

21 0 555677778888889999999 36 1 000222333333334 (40) 1 5555556666666677777777888888888899999999 44 2 0000000000111222223333333444444 13 2 56777888999 2 3 3 1 3 1 4 3

Page 33: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

4000300020001000

Battery Life

Dotplot for Battery Life

Page 34: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

More Minitab

40003000200010000

Battery Life

Boxplot of Battery Life

Page 35: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

More Minitab

430037003100250019001300700

95% Confidence Interval for Mu

1950185017501650

95% Confidence Interval for Median

Variable: Battery Life

1691.57

587.10

1669.87

Maximum3rd QuartileMedian1st QuartileMinimum

NKurtosisSkewnessVarianceStDevMean

P-Value:A-Squared:

1945.04

757.74

1909.03

4300.802210.321813.451348.68 543.40

1200.7366280.359617

437627 661.531789.45

0.1390.566

95% Confidence Interval for Median

95% Confidence Interval for Sigma

95% Confidence Interval for Mu

Anderson-Darling Normality Test

Descriptive Statistics

Page 36: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Time Series Plot

12010080604020

4000

3000

2000

1000

0

Index

Battery

Life

Based upon the order that the data was generated

Page 37: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Time Series Plot

12010080604020

4000

3000

2000

1000

0

Index

sort

ed

Sorted by failure time

Page 38: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

40003000200010000

99

95

90

80706050403020

10

5

1

Data

Per

cent

Normal Probability Plot for Battery Life

ML Estimates

Mean:

StDev:

1789.45

658.771

Page 39: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

1000100

99959080706050403020

10

5

3 2

1

Data

Per

cent

Weibull Probability Plot for Battery Life

ML Estimates

Shape:

Scale:

2.92250

2005.35

Page 40: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

1000050000

99

98

97

95

90

807060503010

Data

Per

cent

Exponential Probability Plot for Battery Life

ML Estimates

Mean: 1789.45

Page 41: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Computer Support

This is easy if you use the computer.

hang on, we are going to Excel…

Page 42: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

A Recap …

•Population – the totality of observations with which we are concerned. Issue: conceptual vs. actual.

•Sample – subset of observations selected from a population.

•Statistic – any function of the observations in a sample.

•Sample range – If the n observations in a sample are denoted by x1, x2, …,xn, then the sample range is r = max(xi) – min(xi).

•Sample mean and variance.

1n

xnx

1n

)xx(s

xx

2n

1i

2i

n

1i

2i

2

n

1ii

Note that these are

functions of the observations in a sample and are,

therefore, statistics.

Page 43: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

More Recapping …

N

Nx

N

)x(

x

2N

1i

2i

N

1i

2i

2

N

1ii

1n

xnx

1n

)xx(s

2n

1i

2i

n

1i

2i

2

Note difference in denominators

Sample variance uses an estimate of the mean (xbar) in its calculation. If divided by n, the sample variance would be a

biased estimate – biased low.

Note terminology – ‘population parameter’ vs.

‘sample statistic’

Page 44: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Sampling Process

X a random variable that represents one selection from a population.

Each observation in the sample is obtained under identical conditions. The population does not change during sampling. The probability distribution of values does not change

during sampling. f(x1,x2,…,xn) = f(x1)f(x2)…f(xn) if the sample is

independent. Notation

X1, X2,…, Xn are the random variables.

x1, x2,…, xn are the values of the random variables.

Page 45: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

A Final Recap…

A probability distribution is often a model for a population.

This is often the case when the population is conceptual or infinite.

The histogram should resemble to distribution of population values. The bigger the sample

the stronger the resemblance.

Page 46: Chapter 6 - Random Sampling and Data Description More joy of dealing with large quantities of data Chapter 6B You can never have too much data.

Our Work Here Today is Done

Next Week: The Glorious Midterm

Prob/Stat studentsDiscussing stem and leaf plots