Random Sampling Approximations of E(X), p.m.f, and p.d.f.


Important

• Read through the simulation slides for Thursday

• Homework #8 is due on Thursday

• Check the web page on Wednesday night and print off any simulation worksheets that might be there for Thursday

• The study guide put online had major mistakes; there is a new one there now
– It gives a different definition of the p.d.f and c.d.f for the uniform random variable than the one given in class, but they mean the same thing.

• P.M.F versus P.D.F: this needs clarification because I misspoke

P.M.F versus P.D.F

• Either graph can be a histogram
– I was assuming that the bin width would always be 1 for a finite random variable, but that is not necessarily the case

• Take X = 0, ½, 1, etc. (here the bin width is ½)

• Probability Mass Function
– The values along the y-axis of a histogram represent probabilities
– If you sum up the probabilities, they should add up to 1 every time (regardless of the bin width)
– Thus, to determine if a graph is a p.m.f, add up the heights of the rectangles: if they add up to 1, then it is a p.m.f.

P.M.F versus P.D.F

• A probability density function can also be a histogram

• For a continuous random variable, the values along the y-axis do not represent probabilities

• However, the area under the graph must equal 1
– How can you check whether a histogram represents a p.d.f? Compute the area of each rectangle (height × bin width): the areas must sum to 1, even though the heights in general do not.

In conclusion

• Both a p.m.f and a p.d.f graph can be histograms
– To tell if a histogram represents a p.m.f, the sum of the HEIGHTS of the rectangles must equal 1 (because the heights represent probabilities)
– To tell if a histogram represents a p.d.f, the sum of the AREAS of the rectangles must equal 1; the heights need not sum to 1 (with equal bin widths w, the heights sum to 1/w, which is 1 only when w = 1)
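The two checks above can be sketched in Python. This is a minimal illustration, assuming every rectangle has the same bin width; the probabilities 0.25, 0.5, 0.25 for X = 0, ½, 1 are made-up values, not taken from the homework.

```python
import math


def heights_sum_to_one(heights):
    """p.m.f check: the heights are probabilities, so they must add up to 1."""
    return math.isclose(sum(heights), 1.0)


def areas_sum_to_one(heights, bin_width):
    """p.d.f check: each rectangle has area height * bin_width; the areas must add up to 1."""
    return math.isclose(sum(h * bin_width for h in heights), 1.0)


# Hypothetical histogram for X = 0, 1/2, 1, so the bin width is 1/2.
bin_width = 0.5
pmf_heights = [0.25, 0.50, 0.25]  # heights read directly as probabilities
pdf_heights = [0.50, 1.00, 0.50]  # the same probabilities divided by the bin width

print(heights_sum_to_one(pmf_heights), areas_sum_to_one(pmf_heights, bin_width))  # True False
print(heights_sum_to_one(pdf_heights), areas_sum_to_one(pdf_heights, bin_width))  # False True
```

Dividing each probability by the bin width turns p.m.f heights into p.d.f heights: the heights then sum to 1/0.5 = 2, but the areas sum to 1.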

• Let’s look at number 4b from the homework just turned in ….

Why do we use Random Sampling?

• In business, we identify a random variable

• We want its probability information

• Problem: We do not know its distribution OR its expected value

• Solution: Estimate E(X), and estimate $F_X(x)$ and $f_X(x)$, using random sampling

Definitions

• A number x that results from a trial of the process is called an observation of X

• A set $\{x_1, x_2, \ldots, x_n\}$ of n independent observations of the same random variable X is called a random sample of size n.

Example #1

• Suppose that X is the number of assembly line stoppages that occur during an 8-hour shift in a manufacturing plant. We could obtain a random sample of size 10 by watching the line for 10 different shifts and recording the number of stoppages during each 8-hour shift.

Example #1 (continued)

Shift observed          1   2   3   4   5   6   7   8   9  10
Number of stoppages     2  11   6   8   6   5  10   4   8   3

• The table above shows the information recorded during the 10 different shift observations

• We can compute the sample mean of the observations

• The sample mean is denoted by $\bar{x}$

Statistics and Probability

• There is a difference between probability and statistics, even though people use the terms interchangeably

• A number that describes a sample is called a statistic

• THEOREM: The statistic $\bar{x}$ can be used as an estimate of E(X).

• In general, the larger the sample size n, the better the estimate will be
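The claim that larger samples tend to give better estimates can be illustrated with a small simulation. This is a sketch under an assumed distribution: X is taken to be uniform on the integers 0 through 12, so E(X) = 6 is known exactly. That distribution is hypothetical and is not the stoppage data. Individual runs vary, but the error typically shrinks as n grows.

```python
import random


def sample_mean(sample):
    """The statistic x-bar: the sum of the observations divided by n."""
    return sum(sample) / len(sample)


# Hypothetical random variable: X uniform on {0, 1, ..., 12}, so E(X) = 6 exactly.
TRUE_MEAN = 6.0
rng = random.Random(2024)  # fixed seed so the run is reproducible

errors = {}
for n in (10, 100, 1_000, 100_000):
    # A random sample of size n: n independent observations of X.
    sample = [rng.randint(0, 12) for _ in range(n)]
    x_bar = sample_mean(sample)
    errors[n] = abs(x_bar - TRUE_MEAN)
    print(f"n = {n:>6}: x_bar = {x_bar:.4f}, |x_bar - E(X)| = {errors[n]:.4f}")
```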

Sample Mean

• We can find the sample mean for Example #1:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(2 + 11 + 6 + 8 + 6 + 5 + 10 + 4 + 8 + 3) = \frac{63}{10} = 6.3 \approx E(X)$$

• So, 6.3 is the approximate expected number of line stoppages during an 8-hour shift

• However, 10 observations is a little small to base an approximation on
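The same arithmetic as a quick check in Python, using only the ten observations from the Example #1 table:

```python
# Number of stoppages in each of the 10 observed shifts (Example #1).
stoppages = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]

n = len(stoppages)
total = sum(stoppages)
x_bar = total / n  # sample mean, our estimate of E(X)

print(n, total, x_bar)  # 10 63 6.3
```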

Approximating Probability Mass and Density Functions

• If we have a large enough sample, we can approximate the functions themselves, not just E(X)
– I.e., we can approximate a p.m.f or a p.d.f, depending on the random variable

• If we approximate a p.m.f or p.d.f, we can also look at the corresponding graphs

Example #3

Suppose that the assembly line discussed in Example #1 runs 24 hours a day, with workers in three shifts. Observations of the number of stoppages during an 8-hour shift were recorded for a nine month period. I.e., 819 different shifts were observed and recorded in the file Stoppages.xls.

Relative Frequencies

• Relative frequencies were plotted to obtain the histogram seen in Stoppages.xls

• The relative frequency of each value x in the sample gives an estimate for the probability that X will assume that value. WHY?

• How did we obtain the relative frequencies?

• A histogram of the relative frequencies will give a good approximation for the graph of $f_X$
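A minimal sketch of how relative frequencies are obtained: count how often each value occurs and divide by the sample size n. The full Stoppages.xls sample is not reproduced here, so this uses the ten Example #1 observations as a stand-in; with only 10 shifts the estimates are rough.

```python
from collections import Counter

# The ten observations from Example #1 (stand-in for the 819 shifts in Stoppages.xls).
stoppages = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]
n = len(stoppages)

counts = Counter(stoppages)  # frequency of each observed value
rel_freq = {x: counts[x] / n for x in sorted(counts)}  # relative frequency = count / n

for x, p in rel_freq.items():
    # Each relative frequency estimates P(X = x); it is the bar height at x.
    print(f"P(X = {x:>2}) is approximately {p:.2f}")

# The heights of a p.m.f histogram add up to 1.
print(f"sum of heights = {sum(rel_freq.values()):.2f}")
```

Values that never occurred in this small sample (0, 1, 7, 9) get no bar, i.e., an estimated probability of 0, which is one reason a larger sample gives a better approximation.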

Continuous Random Variables

• A large random sample can also be used to approximate the p.d.f of a continuous random variable

• One way to obtain our approximate p.d.f is to look at smaller and smaller bin widths for our data (this needs a large sample, so that each narrow bin still contains enough observations)

• Use the HISTOGRAM function in Excel to find the approximation of the graph of the p.d.f

Example #4

• The manager of the plant that was described in the previous examples wants to get a better understanding of the delays caused by stoppages of the assembly line. So, in addition to knowing how many stoppages there are, the manager wants to know how long they last.

Example #4 (continued)

• Let T be the (exact) length of time, in minutes, that a randomly selected stoppage will last

• QUESTION: Is T a continuous random variable?

• In Stoppages.xls, the duration of each stoppage was also recorded for all 819 shifts

• Therefore, we have a random sample of observations for T

Example #4 (continued)

• We used the HISTOGRAM function in Excel to plot an approximation of the p.d.f., $f_T$

• In Stoppages.xls, bin widths of 2 minutes are used

• Since our bin width is 2, to make the area under the graph equal 1, we had to divide each relative frequency by 2 and then plot those "new" relative frequencies
– Note: Here you are dividing the relative frequency by the bin width, not the frequency by the bin width as stated in class
– Thus, you find the relative frequency as you did before and then divide it by the bin width

• Connecting the midpoints of the tops of the rectangles gives us an approximate curve
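The divide-by-bin-width step and the midpoint curve can be sketched in plain Python. This is the same arithmetic, not a description of how Excel's HISTOGRAM works internally. Assumptions: equal-width bins that include their upper edge (intervals like (2, 4]), all values at or above the starting edge, and a small made-up list of durations standing in for the Stoppages.xls data. The function name density_histogram is hypothetical.

```python
import math


def density_histogram(data, bin_width, start=0.0):
    """Approximate a p.d.f from a sample (assumes every value is >= start).

    Bins are (start, start + w], (start + w, start + 2w], ..., each including its
    upper edge. Each bar height is (relative frequency of the bin) / bin_width,
    so the total area of the bars is 1. Returns (midpoint, height) pairs, which
    can be connected to draw the approximate curve.
    """
    n = len(data)
    num_bins = max(math.ceil((max(data) - start) / bin_width), 1)
    counts = [0] * num_bins
    for t in data:
        # Index k of the bin (start + k*w, start + (k+1)*w] that contains t;
        # a value exactly equal to start is put in the first bin.
        k = max(math.ceil((t - start) / bin_width) - 1, 0)
        counts[k] += 1
    return [
        (start + (k + 0.5) * bin_width, (c / n) / bin_width)
        for k, c in enumerate(counts)
    ]


# Hypothetical stoppage durations in minutes (made up, not from Stoppages.xls).
durations = [1.5, 3.2, 0.8, 2.0, 5.9, 3.7, 1.1, 4.4, 2.6, 7.3]
points = density_histogram(durations, bin_width=2)
for midpoint, height in points:
    print(f"midpoint {midpoint:.0f} min: height {height:.3f}")

# Total area = sum of height * bin width, which should be 1.
print(f"total area = {sum(h * 2 for _, h in points):.3f}")
```

The relative frequencies here are 0.4, 0.3, 0.2, 0.1; dividing by the bin width 2 gives heights 0.2, 0.15, 0.1, 0.05, whose rectangles have total area 1.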

Using the approximated p.d.f

• We can use our plot to calculate probabilities

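• For example, if we wanted to know P(2 < T ≤ 4), we could look at the corresponding area under the graph

• Note: P(2 < T ≤ 4) corresponds to the area under the graph over the interval (2, 4], which (with bin width 2) is a single rectangle

• So, to find our probability, find the area of that rectangle: height × width = (relative frequency ÷ 2) × 2, which is just the relative frequency of the bin (2, 4]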
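Reading P(2 < T ≤ 4) off the approximate p.d.f can be sketched the same way. The function name prob_between and the durations are hypothetical (a made-up list, not the Stoppages.xls data), and the code assumes (2, 4] is exactly one bin of width 2.

```python
def prob_between(data, a, b):
    """Estimate P(a < T <= b) from a sample as the area of one density-histogram bar.

    Assumes (a, b] is exactly one bin, so the bin width is b - a.
    """
    width = b - a
    rel_freq = sum(1 for t in data if a < t <= b) / len(data)
    height = rel_freq / width  # bar height plotted on the approximate p.d.f
    return height * width      # area of the rectangle = estimated probability


# Hypothetical stoppage durations in minutes (made up, not from Stoppages.xls).
durations = [1.5, 3.2, 0.8, 2.0, 5.9, 3.7, 1.1, 4.4, 2.6, 7.3]
p = prob_between(durations, 2, 4)
print(f"P(2 < T <= 4) is approximately {p:.2f}")
```

The height and the width cancel, so the area is simply the fraction of observations in (2, 4]: here 3 of the 10 made-up durations (2.0 is excluded because the interval is open on the left), giving 0.3.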

Focus on the Project

• We have a continuous random variable Rnorm, which gives the normalized ratio of weekly closing prices of Disney stock (class project)

• Option Focus.xls contains 417 values of Rnorm from 417 weekly closing ratios
– They are considered to be independent observations
– Thus, they make up a random sample of size 417 for Rnorm

Focus on the Project

• We can calculate the sample mean, which we know should be equal to what?

• We can create a plot using the relative frequencies

• Note: If your bin width is not 1, you will have to divide each relative frequency by your bin width to make the area under the curve equal 1

• Connecting the midpoints of the tops of the bars will produce a line-graph approximation of $f_{norm}$, the p.d.f of Rnorm

What should you do?

• Plot an approximation of the probability density function for the normalized ratios of weekly closing prices

• The plot should be a line graph, where you are connecting the midpoints of the tops of the bars

• Remember, if your bin width is not 1, you will have to divide the relative frequencies by that width before you plot

• Find the sample mean of the normalized ratios – you already know what it should equal
