What is statistics? Statistics is a tool for creating an understanding from a set of numbers.
What is statistics?
• Statistics is a tool for creating an understanding from a set of numbers.
An Example: Stats Anxiety.
Key Statistical Concepts. . .
• Population
  – A population is the group of all items of interest to a statistics practitioner.
  – Frequently very large; sometimes infinite.
  – E.g. all 5 million Florida voters.
• Sample
  – A sample is a set of data drawn from the population.
  – Potentially very large, but less than the population.
  – E.g. a sample of 765 voters exit polled on election day.
• Parameter
  – A descriptive measure of a population.
• Statistic
  – A descriptive measure of a sample.
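The distinction between a parameter and a statistic can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration: the population of 100,000 voters and the 52% support figure are invented, and only the sample size of 765 comes from the slide.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = votes for candidate A, 0 = otherwise.
# The 52% support figure is invented for illustration only.
population = [1] * 52_000 + [0] * 48_000

# Parameter: a descriptive measure of the whole population.
parameter = statistics.mean(population)

# Sample: a subset drawn at random, without replacement.
sample = random.sample(population, 765)

# Statistic: the same descriptive measure, computed on the sample only.
statistic = statistics.mean(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

The statistic will usually be close to the parameter without matching it exactly, which is the gap that inferential statistics tries to quantify.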
Descriptive Statistics. . .
• . . . are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
  – Graphical Techniques, and
  – Numerical Techniques.
• The actual method used depends on what information we would like to extract.
Inferential Statistics. . .
• Descriptive statistics describe the data set that is being analyzed, but do not allow us to draw any conclusions or make any inferences beyond that data. Hence we need another branch of statistics: inferential statistics.
• Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
• We use statistics to make inferences about parameters.
• Therefore, we can make an estimate, prediction, or decision about a population based on sample data.
• Thus, we can apply what we know about a sample to the larger population from which it was drawn!
• Rationale:
  – Large populations make investigating each member impractical and expensive.
  – It is easier and cheaper to take a sample and make estimates about the population from the sample.
• However:
  – Such conclusions and estimates are not always going to be correct. For this reason, we build into the statistical inference “measures of reliability,” namely confidence level and significance level.
Confidence and Significance Levels. . .
• The confidence level is the proportion of times that an estimating procedure will be correct.
• E.g. a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.
• When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run.
• E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
• If we use α (Greek letter “alpha”) to represent significance, then our confidence level is 1 − α.
• This relationship can also be stated as:
  – Confidence Level + Significance Level = 1
• Consider a statement from polling data you may hear about in the news:
  – “This poll is considered accurate within 3.4 percentage points, 19 times out of 20.”
• In this case, our confidence level is 95% (19/20 = 0.95), while our significance level is 5%.
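The "correct 95% of the time" reading of a confidence level can be checked with a small simulation. This is a sketch under stated assumptions, not a method taken from the slides: the population is assumed normal with a known standard deviation, and the names MU, SIGMA, N and TRIALS are hypothetical values chosen for the illustration.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical setup: a normal population with known mean and standard deviation.
MU, SIGMA = 50.0, 10.0
N = 40          # observations per sample
TRIALS = 2000   # number of repeated samples
ALPHA = 0.05    # significance level, so the confidence level is 1 - ALPHA

# Critical value z with P(-z <= Z <= z) = 1 - ALPHA for a standard normal Z.
z = NormalDist().inv_cdf(1 - ALPHA / 2)
half_width = z * SIGMA / N ** 0.5

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    centre = mean(sample)
    # The interval is "correct" when it contains the true population mean.
    if centre - half_width <= MU <= centre + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"confidence level: {1 - ALPHA:.2f}, observed coverage: {coverage:.3f}")
```

The observed coverage should land near 0.95, and the share of intervals that miss the true mean should land near the significance level of 0.05.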
Random Variables. .
• Variability is omnipresent in the business world. To model variability probabilistically, we need the concept of a random variable.
• A random variable is a numerically valued variable which takes on different values with given probabilities.
• Examples:
  – The return on an investment in a one-year period
  – The price of an equity
  – The number of customers entering a store
  – The sales volume of a store on a particular day
  – The turnover rate at your organization next year
Types of Random Variables. . .
• Discrete Random Variable:
  – one that takes on a countable number of possible values, e.g.,
    • total of roll of two dice: 2, 3, . . . , 12
    • number of desktops sold: 0, 1, . . .
    • customer count: 0, 1, . . .
• Continuous Random Variable:
  – one that takes on an uncountable number of possible values, e.g.,
    • interest rate: 3.25%, 6.125%, . . .
    • task completion time: a nonnegative value
    • price of a stock: a nonnegative value
• Basic Concept: Integer or rational numbers are discrete, while real numbers are continuous.
Probability Distributions. . .
• Random variables have values that are determined by chance events. The future price of a share of stock is a random variable because its value is determined by chance factors such as market conditions, the accomplishment of revenue targets by the company, interest rates, and so on.
• Random variables can be either discrete or continuous. A random variable is discrete if it can assume only a finite or countably infinite number of values, that is, if its values are distinct and separate units.
• For example, the number of boxes of cookies produced during a given shift is a discrete random variable, because each box is a distinct, whole unit; a manufacturer would not produce or measure half a box of cookies.
• Continuous random variables can assume any value within a range along a continuum. Consider boxes of cookies again. The weight of a box of cookies is a continuous random variable because it can be measured using an infinite range of fractional values.
• For example, the weight could assume values such as 16 ounces, 16.24 ounces, 16.2411 ounces, or any of a range of fractional values.
• Consider the experiment of tossing a single die. Define X as the number of spots on the up face of the die after a toss. Then the range of X is R = {1, 2, 3, 4, 5, 6}. Assume the die is loaded so that the probability that a given face lands up is proportional to the number of spots showing. The discrete probability distribution for this random experiment is given by p(x) = P(X = x) = x/21 for x = 1, 2, . . . , 6, since the probabilities must sum to 1 and 1 + 2 + · · · + 6 = 21.
Population Mean — Expected Value. . .
• The population mean is the weighted average of all of its values. The weights are specified by the probability mass function. This parameter is also called the expected value of X and is denoted by E(X).
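• The formal definition is similar to computing the sample mean for grouped data:
$$E(X) = \mu = \sum_{x} x\,p(x) \qquad (1)$$
  where p(x) = P(X = x) is the probability mass function and the sum runs over all possible values x.
• Example: Expected No. of TVs
  – Let X be the number of TVs in a household.
  – Then, E(X) = 0 · 0.012 + 1 · 0.319 + · · · + 5 · 0.028 = 2.084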
Population Variance. . .
• The population variance is calculated similarly. It is the weighted average of the squared deviations from the mean. Formally,
$$\sigma^2 = E\big[(X-\mu)^2\big] = \sum_{x} (x-\mu)^2\,p(x) \qquad (2)$$
• Since (2) is an expected value (of (X − µ)²), it should be interpreted as the long-run average of squared deviations from the mean. Thus, the parameter σ² is a measure of the extent of variability in successive realizations of X.
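As a worked instance of the expected value and the variance, the sketch below applies both definitions to the loaded die described earlier, where the probability of a face is proportional to its number of spots. Exact fractions are used so that no rounding enters; the variable names are illustrative only.

```python
from fractions import Fraction

# Loaded die: P(X = x) is proportional to x, so p(x) = x / (1 + 2 + ... + 6) = x / 21.
faces = range(1, 7)
total = sum(faces)
pmf = {x: Fraction(x, total) for x in faces}

# Expected value: probability-weighted average of the values.
mu = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted average of squared deviations from the mean.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print("E(X)   =", mu, "which is about", round(float(mu), 4))
print("Var(X) =", var, "which is about", round(float(var), 4))
```

For this die the mean works out to 13/3 and the variance to 20/9, both noticeably different from the fair-die values of 3.5 and 35/12.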
• 1. Terminals on an on-line computer system are attached to a communication line to the central computer system. The probability that any terminal is ready to transmit is 0.95.
  – Let X = number of terminals polled until the first ready terminal is located.
• 2. Toss a coin repeatedly.
  – Let X = number of tosses to first head.
• 3. It is known that 20% of products on a production line are defective. Products are inspected until the first defective is encountered.
  – Let X = number of inspections to obtain first defective.
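The three examples above share one structure: independent trials are repeated until the first success, and X counts the trials. That situation is usually modelled with the geometric distribution, P(X = k) = (1 − p)^(k−1) p. The sketch below assumes independent trials and, for the coin example, a fair coin with p = 0.5; neither assumption is stated on the slide.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p


# Success probabilities for the three examples.
# The fair coin (p = 0.5) is an assumption, not given in the source.
examples = {"ready terminal": 0.95, "first head": 0.5, "first defective": 0.20}

for name, p in examples.items():
    first_three = [round(geometric_pmf(k, p), 4) for k in (1, 2, 3)]
    mean_trials = 1 / p  # expected number of trials until the first success
    print(f"{name}: P(X=1..3) = {first_three}, mean trials = {mean_trials:.2f}")
```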
Poisson distribution
• The Poisson distribution is a discrete distribution. It is often used as a model for the number of events (such as the number of telephone calls at a business, number of customers in waiting lines, number of defects in a given surface area, airplane arrivals, or the number of accidents at an intersection) in a specific time period.
• The major difference between Poisson and Binomial distributions is that the Poisson does not have a fixed number of trials. Instead, it uses the fixed interval of time or space in which the number of successes is recorded.
• λ is the parameter which indicates the average number of events in the given time interval.
• Parameters: The mean is λ. The variance is λ.
• Consider a computer system with a Poisson job-arrival stream at an average of 2 jobs per minute. Determine the probability that in any one-minute interval there will be
  – (i) 0 jobs;
  – (ii) exactly 2 jobs;
  – (iii) at most 3 arrivals.
  – (iv) What is the maximum number of jobs that should arrive in one minute with 90% certainty?
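The job-arrival exercise can be worked through with the standard Poisson probability mass function, P(X = x) = e^(−λ) λ^x / x!. The sketch below is one way to do it; part (iv) is read here as "the smallest k such that P(X ≤ k) is at least 0.90", which is an interpretation of the question rather than something the slide spells out.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))


lam = 2.0  # average of 2 jobs per minute, from the exercise

p_zero = poisson_pmf(0, lam)           # (i)   no jobs
p_two = poisson_pmf(2, lam)            # (ii)  exactly 2 jobs
p_at_most_three = poisson_cdf(3, lam)  # (iii) at most 3 arrivals

# (iv) smallest k with P(X <= k) >= 0.90 (assumed reading of the question)
k = 0
while poisson_cdf(k, lam) < 0.90:
    k += 1

print(f"(i) {p_zero:.4f}  (ii) {p_two:.4f}  (iii) {p_at_most_three:.4f}  (iv) k = {k}")
```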
Hypergeometric Distribution
• The probability distribution of a hypergeometric random variable is called a hypergeometric distribution
• The following notation is helpful when we talk about hypergeometric distributions and hypergeometric probability.
  – N: The number of items in the population.
  – k: The number of items in the population that are classified as successes.
  – n: The number of items in the sample.
  – x: The number of items in the sample that are classified as successes.
  – kCx: The number of combinations of k things, taken x at a time.
  – h(x; N, n, k): hypergeometric probability, i.e. the probability that an n-trial hypergeometric experiment results in exactly x successes, when the population consists of N items, k of which are classified as successes.
Hypergeometric Experiments
• A hypergeometric experiment is a statistical experiment that has the following properties:
• A sample of size n is randomly selected without replacement from a population of N items.
• In the population, k items can be classified as successes, and N - k items can be classified as failures.
• Consider the following statistical experiment. You have an urn of 10 marbles - 5 red and 5 green. You randomly select 2 marbles without replacement and count the number of red marbles you have selected. This would be a hypergeometric experiment.
Hypergeometric Distribution
• A hypergeometric random variable is the number of successes that result from a hypergeometric experiment. The probability distribution of a hypergeometric random variable is called a hypergeometric distribution.
• Given x, N, n, and k, we can compute the hypergeometric probability based on the following formula:
• Hypergeometric Formula. Suppose a population consists of N items, k of which are successes, and a random sample drawn from that population consists of n items, x of which are successes. Then the hypergeometric probability is:
  h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]
• The hypergeometric distribution has the following properties:
• The mean of the distribution is equal to n * k / N .
• The variance is n * k * ( N - k ) * ( N - n ) / [ N^2 * ( N - 1 ) ] .
Example 1
• Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards. What is the probability of getting exactly 2 red cards (i.e., hearts or diamonds)?
• Solution: This is a hypergeometric experiment in which we know the following:
  – N = 52; since there are 52 cards in a deck.
  – k = 26; since there are 26 red cards in a deck.
  – n = 5; since we randomly select 5 cards from the deck.
  – x = 2; since 2 of the cards we select are red.
• We plug these values into the hypergeometric formula as follows:
  h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]
  h(2; 52, 5, 26) = [ 26C2 ] [ 26C3 ] / [ 52C5 ]
  h(2; 52, 5, 26) = [ 325 ] [ 2600 ] / [ 2,598,960 ] = 0.32513
• Thus, the probability of randomly selecting exactly 2 red cards is 0.32513.
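The card example can be reproduced directly from the hypergeometric formula using Python's built-in binomial coefficient. This is a minimal sketch; the function name hypergeom_pmf is chosen here for illustration and is not from the source.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, n: int, k: int) -> float:
    """h(x; N, n, k) = C(k, x) * C(N - k, n - x) / C(N, n)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


N, k, n, x = 52, 26, 5, 2  # deck size, red cards, cards drawn, red cards wanted

p = hypergeom_pmf(x, N, n, k)

# Mean and variance from the properties listed for the distribution.
mean = n * k / N
variance = n * k * (N - k) * (N - n) / (N ** 2 * (N - 1))

print(f"h({x}; {N}, {n}, {k}) = {p:.5f}")
print(f"mean = {mean}, variance = {variance:.5f}")
```

Summing hypergeom_pmf over x = 0, ..., 5 gives 1, which is a quick sanity check that the formula has been entered correctly.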
Multinomial
• The Binomial distribution was based on having a series of events that could take on only two states: success/failure, sick/well, heads/tails, et cetera.
• But what if there are several possible events, like left/right/center, or Africa/Eurasia/Australia/Americas? The Multinomial distribution extends the Binomial distribution for such cases.
• The Binomial case could be expressed with one parameter, p, which indicated success with probability p and failure with probability 1 − p. The Multinomial case requires k variables, p1, . . . , pk, such that p1 + p2 + · · · + pk = 1.
• The binomial distribution allows one to compute the probability of obtaining a given number of binary outcomes. For example, it can be used to compute the probability of getting 6 heads out of 10 coin flips. The flip of a coin is a binary outcome because it has only two possible outcomes: heads and tails. The multinomial distribution can be used to compute the probabilities in situations in which there are more than two possible outcomes.
• For example, suppose that two chess players had played numerous games and it was determined that the probability that Player A would win is 0.40, the probability that Player B would win is 0.35, and the probability that the game would end in a draw is 0.25. The multinomial distribution can be used to answer questions such as: "If these two chess players played 12 games, what is the probability that Player A would win 7 games, Player B would win 2 games, and the remaining 3 games would be drawn?" The following formula gives the probability of obtaining a specific set of outcomes when there are three possible outcomes for each event:
$$p = \frac{n!}{n_1!\,n_2!\,n_3!}\; p_1^{n_1}\, p_2^{n_2}\, p_3^{n_3}$$
• where
  – p is the probability,
  – n is the total number of events,
  – n1 is the number of times Outcome 1 occurs,
  – n2 is the number of times Outcome 2 occurs,
  – n3 is the number of times Outcome 3 occurs,
  – p1 is the probability of Outcome 1,
  – p2 is the probability of Outcome 2, and
  – p3 is the probability of Outcome 3.
• For the chess example,
  – n = 12 (12 games are played),
  – n1 = 7 (number won by Player A),
  – n2 = 2 (number won by Player B),
  – n3 = 3 (the number drawn),
  – p1 = 0.40 (probability Player A wins),
  – p2 = 0.35 (probability Player B wins),
  – p3 = 0.25 (probability of a draw).
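Plugging the chess values into the multinomial probability, n!/(n1! n2! n3!) · p1^n1 · p2^n2 · p3^n3, can be sketched as follows. The function is written for any number of outcomes; its name and its input checks are choices made for this illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` outcomes, given outcome probabilities `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n1! * n2! * ... * nk!), kept as an exact integer.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# Chess example: 12 games, A wins 7, B wins 2, 3 draws.
p = multinomial_pmf([7, 2, 3], [0.40, 0.35, 0.25])
print(f"P(A wins 7, B wins 2, 3 draws) = {p:.4f}")
```

With two outcomes the same function reduces to the binomial probability, for example multinomial_pmf([6, 4], [0.5, 0.5]) for 6 heads in 10 flips of a fair coin.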
Continuous Distributions
Normal Distribution
The lognormal distribution
• A random variable x is lognormally distributed if ln(x) is normally distributed
  – If x is normal, and ln(y) = x (or y = e^x), then y is lognormal
  – If continuously compounded stock returns are normal then the stock price is lognormally distributed
• Product of lognormal variables is lognormal
  – If x1 and x2 are normal, then y1 = e^(x1) and y2 = e^(x2) are lognormal.
  – The product of y1 and y2: y1 × y2 = e^(x1) × e^(x2) = e^(x1+x2)
  – Since x1 + x2 is normal, e^(x1+x2) is lognormal
Lognormal Distribution – Probability Density Function
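• A random variable X is said to have the Lognormal Distribution with parameters µ and σ, where µ > 0 and σ > 0, if the probability density function of X is:
$$f(x) = \begin{cases} \dfrac{1}{x\,\sigma\sqrt{2\pi}}\; e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}, & \text{for } x > 0 \\[1ex] 0, & \text{for } x \le 0 \end{cases}$$
[Figure: lognormal density f(x) plotted against x]
• If X ~ LN(µ, σ), then Y = ln(X) ~ N(µ, σ)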
Lognormal Distribution - Probability Distribution Function
$$F(x) = P(X \le x) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right)$$
where Φ(z) is the cumulative probability distribution function of N(0,1)
Lognormal Distribution - Example
• A theoretical justification based on a certain material failure mechanism underlies the assumption that ductile strength X of a material has a lognormal distribution.
• If the parameters are µ = 5 and σ = 0.1, find:
  – (a) µx and σx
  – (b) P(X > 120)
  – (c) P(110 ≤ X ≤ 130)
  – (d) The median ductile strength
  – (e) The expected number having strength at least 120, if ten different samples of an alloy steel of this type were subjected to a strength test.
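The five parts of the ductile-strength example can be worked through with the standard lognormal results: mean exp(µ + σ²/2), standard deviation equal to the mean times sqrt(exp(σ²) − 1), median exp(µ), and the distribution function Φ((ln x − µ)/σ). The sketch below uses only the Python standard library; treating the ten samples in part (e) as independent is an assumption.

```python
from math import exp, log, sqrt
from statistics import NormalDist

mu, sigma = 5.0, 0.1    # parameters of ln(X), from the example
phi = NormalDist().cdf  # standard normal cumulative distribution function


def lognormal_cdf(x: float) -> float:
    """P(X <= x) = Phi((ln x - mu) / sigma) for x > 0."""
    return phi((log(x) - mu) / sigma)


# (a) mean and standard deviation of X
mean_x = exp(mu + sigma ** 2 / 2)
sd_x = mean_x * sqrt(exp(sigma ** 2) - 1)

# (b) P(X > 120) and (c) P(110 <= X <= 130)
p_gt_120 = 1 - lognormal_cdf(120)
p_110_130 = lognormal_cdf(130) - lognormal_cdf(110)

# (d) median of X
median_x = exp(mu)

# (e) expected count out of ten samples with strength at least 120
#     (ten independent samples is an assumption)
expected_count = 10 * p_gt_120

print(f"(a) mean = {mean_x:.2f}, sd = {sd_x:.2f}")
print(f"(b) {p_gt_120:.4f}  (c) {p_110_130:.4f}")
print(f"(d) median = {median_x:.2f}  (e) expected count = {expected_count:.2f}")
```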
The probability density function of a log-normal distribution is
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\; e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}, \qquad x > 0.$$
Negative binomial
• Say that we have a sequence of Bernoulli draws. How many failures will we see before we see n successes? If a proportion p of cars are illegally parked, and a meter reader hopes to write n parking tickets, the Negative binomial tells her the odds that she will be able to stop after n + x cars, that is, after passing x legally parked cars on the way to the nth ticket.
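The meter-reader question can be made concrete with the usual negative binomial probability, P(X = x) = C(x + n − 1, x) · p^n · (1 − p)^x, where X is the number of failures before the nth success. The values p = 0.20 and n = 3 below are hypothetical, chosen only to have something to compute, and cars are assumed to be independent.

```python
from math import comb


def neg_binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x failures occur before the n-th success)."""
    return comb(x + n - 1, x) * p ** n * (1 - p) ** x


p = 0.20  # hypothetical share of illegally parked cars
n = 3     # hypothetical number of tickets the meter reader wants to write

for x in range(6):
    prob = neg_binomial_pmf(x, n, p)
    print(f"stop after {n + x} cars ({x} legally parked): {prob:.4f}")
```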
Gamma distribution
• A better name in the statistical context would be ‘Negative Poisson,’ because it relates to the Poisson distribution in the same way the Negative binomial relates to the Binomial.
• If the timing of events follows a Poisson distribution, meaning that events come by at the rate of λ per period, then this distribution tells us how long we would have to wait until the nth event occurs.
• The form of the Gamma distribution is typically expressed in terms of a scale parameter θ ≡ 1/λ, where λ is the Poisson parameter; the event count n then plays the role of the shape parameter.
• Just as the Gamma distribution is named for the Gamma function, the Beta distribution is named after the Beta function, whose parameters are typically notated as α and β.
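The waiting-time reading of the Gamma distribution described above can be checked numerically. When the shape n is a whole number, the nth event has occurred by time t exactly when a Poisson count with mean λt is at least n, so P(T ≤ t) = 1 − Σ_{k<n} e^(−λt)(λt)^k / k!. The rate and event count below are hypothetical.

```python
from math import exp, factorial


def waiting_time_cdf(t: float, n: int, lam: float) -> float:
    """P(T_n <= t): the n-th event of a Poisson stream with rate lam has occurred by time t."""
    m = lam * t  # expected number of events in [0, t]
    # Probability of fewer than n events in [0, t], from the Poisson pmf.
    p_fewer_than_n = sum(exp(-m) * m ** k / factorial(k) for k in range(n))
    return 1 - p_fewer_than_n


lam = 2.0  # hypothetical rate: 2 events per period
n = 3      # hypothetical: wait for the 3rd event

mean_wait = n / lam  # mean of a Gamma with shape n and scale 1/lam
print(f"mean wait for event {n}: {mean_wait} periods")
for t in (0.5, 1.0, 2.0, 4.0):
    print(f"P(wait <= {t}) = {waiting_time_cdf(t, n, lam):.4f}")
```

With n = 1 the function reduces to 1 − e^(−λt), the exponential distribution of the wait for the first event.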
Bivariate Distributions. . .
• Up to now, we have looked at univariate distributions, i.e., probability distributions in one variable.
• Bivariate distributions, also called joint distributions, are probabilities of combinations of two variables.
• For discrete variables X and Y , the joint probability distribution or joint probability mass function of X and Y is defined as:
  P(x, y) ≡ P(X = x and Y = y), for all pairs of values x and y.
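A small joint probability mass function, together with the marginal probabilities obtained by summing it over the other variable, can be sketched as follows. The two-coin experiment is a hypothetical example that does not appear in the slides: X is the result of the first flip (1 = head) and Y is the total number of heads in two fair flips.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint pmf P(x, y): enumerate the four equally likely outcomes of two fair flips.
joint = defaultdict(Fraction)
for first, second in product((0, 1), repeat=2):
    x, y = first, first + second
    joint[(x, y)] += Fraction(1, 4)

# Marginal pmfs: sum the joint pmf over the other variable.
marginal_x = defaultdict(Fraction)
marginal_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p
    marginal_y[y] += p

print("joint:     ", dict(joint))
print("marginal X:", dict(marginal_x))
print("marginal Y:", dict(marginal_y))
```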
Marginal Probabilities
• Covariance and correlation describe how two variables are related.
• Variables are positively related if they move in the same direction.
• Variables are inversely related if they move in opposite directions.
• Both covariance and correlation indicate whether variables are positively or inversely related. Correlation also tells you the degree to which the variables tend to move together.
• You are probably already familiar with statements about covariance and correlation that appear in the news almost daily.
• For example, you might hear that as economic growth increases, stock market returns tend to increase as well. These variables are said to be positively related because they move in the same direction. You may also hear that as world oil production increases, gasoline prices fall. These variables are said to be negatively, or inversely, related because they move in opposite directions.
• To determine the actual relationships of these variables, you would use the formulas for covariance and correlation.
Covariance
• Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related. The formula for calculating covariance of sample data is:
$$\mathrm{COV}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
• To understand how covariance is used, consider the table below, which describes the rate of economic growth (xi) and the rate of return on the S&P 500 (yi).
• Using the covariance formula, you can determine whether economic growth and S&P 500 returns have a positive or inverse relationship. Before you compute the covariance, calculate the mean of x and the mean of y.
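The two steps just described, computing the means and then averaging the products of the deviations over n − 1, can be sketched as below. The data table did not survive in this transcript, so the four pairs of values are placeholders: they are an assumption, chosen only so that the result is consistent with the covariance of about 1.53 quoted later in the text.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - mean_x) * (y - mean_y), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Placeholder data (assumption): NOT the missing source table.
growth = [2.1, 2.5, 4.0, 3.6]   # rate of economic growth, x_i
returns = [8, 12, 14, 10]       # rate of return on the S&P 500, y_i

cov = sample_covariance(growth, returns)
print(f"COV(x, y) = {cov:.2f}")
```

A positive result indicates that the two series tend to move in the same direction; the magnitude by itself is hard to interpret, which is what motivates correlation.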
Correlation
• Correlation also tells you the degree to which the variables tend to move together.
• Covariance is expressed in the combined units of the two variables, so its size depends on how they are measured. Using covariance, you can determine whether the variables move in the same or in opposite directions, but you cannot measure the degree to which they move together, because covariance does not use one standard unit of measurement. To measure the degree to which variables move together, you must use correlation.
• Correlation standardizes the measure of interdependence between two variables and, consequently, tells you how closely the two variables move. The correlation measurement, called a correlation coefficient, will always take on a value between –1 and 1:
• If the correlation coefficient is one, the variables have a perfect positive correlation. This means that if one variable moves a given amount, the second moves proportionally in the same direction.
• If the correlation coefficient is zero, no linear relationship exists between the variables. If one variable moves, you can make no linear prediction about the movement of the other variable; they are uncorrelated.
• If the correlation coefficient is –1, the variables are perfectly negatively correlated (or inversely correlated) and move in opposition to each other. If one variable increases, the other variable decreases proportionally.
• To understand how correlation is used, consider the table below, which describes the rate of economic growth (xi) and the rate of return on the S&P 500 (yi).
• Using the correlation formula, you can determine whether economic growth and S&P 500 returns have a positive or inverse relationship.
• You know that the covariance of S&P 500 returns and economic growth was calculated to be 1.53. Now you need to determine the standard deviation of each of the variables. You would calculate the standard deviation of the S&P 500 returns and of the economic growth.
• Using the information from above, you know that
  – COV(x,y) = 1.53
  – sx = 0.90
  – sy = 2.58
• Now you can calculate the correlation coefficient by substituting the numbers above into the correlation formula:
$$r = \frac{\mathrm{COV}(x,y)}{s_x\, s_y} = \frac{1.53}{0.90 \times 2.58} \approx 0.66$$
• A correlation coefficient of 0.66 tells you two important things:
• Because the correlation coefficient is a positive number, returns on the S&P 500 and economic growth are positively related.
• Because 0.66 is well above zero and closer to 1 than to 0, the correlation between returns on the S&P 500 and economic growth is fairly strong.
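The correlation coefficient quoted above follows from the three summary values given in the text (covariance 1.53, standard deviations 0.90 and 2.58). A minimal sketch of that arithmetic, with a helper name chosen here for illustration:

```python
def correlation_from_summary(cov_xy: float, s_x: float, s_y: float) -> float:
    """Correlation coefficient r = COV(x, y) / (s_x * s_y)."""
    if s_x <= 0 or s_y <= 0:
        raise ValueError("standard deviations must be positive")
    return cov_xy / (s_x * s_y)


# Summary values quoted in the text.
r = correlation_from_summary(cov_xy=1.53, s_x=0.90, s_y=2.58)

print(f"r = {r:.2f}")
print("direction:", "positive" if r > 0 else "inverse or none")
```

Because the covariance is divided by both standard deviations, r carries no units and always falls between –1 and 1 when it is computed from consistent data.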