Comp. Genomics Recitation 3 The statistics of database searching.
Recitation 3
+
Recitation 3
+The Normal Distribution
+Probability Distributions
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
+Distributions fit with different types of variables:
Discrete variables: take on a countable number of values
-the number of job classifications in an agency
-the number of employees in a department
-the number of training sessions
Continuous variables: take on an uncountable (or very large) range of numerical values
-temperature
-pressure
-height, weight, time
-dollars (budgets, income): not strictly continuous, but they can take so many closely spaced values that you may as well treat them as continuous
+Visualized Discrete vs. Continuous
The difference: on the discrete histogram you can ask "what is the probability of a 3?" and read off 0.4. For a continuous variable, you have to ask about a range, such as between 1.1 and 2.9, and take the area under the curve over that range (integrate).
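A minimal sketch of that contrast, using only the Python standard library. Only the 0.4 for outcome 3 comes from the slide; the other histogram bars and the normal curve's mean and SD are made-up values for illustration.

```python
from statistics import NormalDist

# Discrete: the probability of a single outcome is read straight off the table.
# Only the 0.4 for outcome 3 is from the slide; the other bars are assumed.
discrete_pmf = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}
p_three = discrete_pmf[3]

# Continuous: a single point has no area, so ask about a range instead.
# Hypothetical normal variable with mean 2 and SD 1.
continuous = NormalDist(mu=2.0, sigma=1.0)
p_range = continuous.cdf(2.9) - continuous.cdf(1.1)  # area between 1.1 and 2.9

print("P(X = 3) =", p_three)
print("P(1.1 < X < 2.9) =", round(p_range, 4))
```

The subtraction of two cdf values is the integration step: it is the area under the curve between the two endpoints.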
+Real life normal distribution
+The Normal Distribution Characteristics
-continuous variables only
-The bell curve shape is familiar
-most values cluster around the mean mu
-As values fall at a greater distance from the mean, their likelihood of occurring shrinks
-Its shape is completely determined by its mean and its standard deviation
-The height of the curve is the greatest at the mean (where probability of occurrence is highest)
+ -68.26% of values fall within one standard deviation of the mean in either direction
-95.44% of values fall within 2 standard deviations of the mean in either direction
-99.73% of values fall within 3 standard deviations of the mean in either direction
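These shares can be checked numerically. The sketch below assumes a standard normal curve (mean 0, SD 1) from Python's `statistics` module; the exact values can differ in the last digit from figures obtained by doubling rounded z-table entries.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Share of values within k standard deviations of the mean, in either direction
shares = {}
for k in (1, 2, 3):
    shares[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.2%}")
```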
+z scores
• The number of standard deviations a score of interest lies away from the mean in a normal distribution
• It is used to convert raw data into their associated probability of occurrence with reference to the mean
• The score we are interested in is X. To find the z score of X, subtract the mean mu from it, then divide by the standard deviation (sigma): z = (X - mu) / sigma. The result tells you how many SD’s the score is from the mean
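As a small sketch, the conversion is a one-line function. The function name and the sample scores are illustrative only (a test with mean 100 and SD 10 is assumed here).

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical scores on a test with mean 100 and SD 10
above = z_score(110, 100, 10)  # one SD above the mean
below = z_score(85, 100, 10)   # one and a half SDs below the mean
print(above, below)
```

A positive result means the score is above the mean, a negative result means it is below.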
+z scores
• The z score itself equals the number of SD’s (sigma) that a score of interest (X) is from the mean (mu) in a normal distribution
• A data value X one standard deviation above the mean has a z score of 1
• A data value X 2 SD’s above the mean has a z score of 2
• The probability associated with a z score of one is 0.3413; see below in the blue oval: (68.26/2)=34.13% of the data values lie between the mean and 1 SD above it
+z scores
• The z score for 1 SD below the mean is -1.0; the associated probability has the same magnitude (0.3413)
• Thus, 34.13% of the data lie between the mean and a z score of -1.0, i.e. just over one third of the data fall between mu and 1 SD below it
+Example: What is the likelihood that a value falls between the mean and a z score of 2.0?
It is equal to 95.44/2=47.72% (meaning just over 47% of the data fall between mu and 2 SD above it)
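The "between the mean and z" areas used in these examples can be reproduced with a short sketch. The helper name is hypothetical; it relies on the fact that half of a normal curve lies below the mean, so the area is the cumulative probability minus 0.5.

```python
from statistics import NormalDist

def area_mean_to_z(z):
    """Share of data between the mean and a given z score (either side)."""
    return abs(NormalDist().cdf(z) - 0.5)

areas = {z: round(area_mean_to_z(z), 4) for z in (1.0, -1.0, 2.0)}
for z, area in areas.items():
    print(f"z = {z:+.1f}: {area:.4f}")
```

Because the curve is symmetric, z = 1.0 and z = -1.0 give the same area.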
+The normal distribution table
• Displays the percentage of data values falling between the mean mu and each z score
• The first 2 digits of the z score are in the far left column
• The third digit is on the top row
• The associated probability is where they meet
+Locating z score for 1.0
+Example: What percent of the data lies between mu and 1.33 SD away?
+Locate 1.33 on the z table
The answer is that 40.824% of the data fall between the mean and 1.33 SD from the mean
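The table lookup can be mimicked in code. This is only a sketch: the function name is made up, and instead of a printed table it computes the entry from the standard normal curve, rounded to four places as z tables usually are.

```python
from statistics import NormalDist

def z_table_lookup(z):
    """Split a non-negative z score into table row and column, then return
    (row, column, area between the mean and z)."""
    hundredths = round(z * 100)       # 1.33 -> 133
    row = hundredths // 10 / 10       # first two digits, far left column
    col = hundredths % 10 / 100       # third digit, top row
    area = NormalDist().cdf(hundredths / 100) - 0.5
    return row, col, round(area, 4)

row, col, area = z_table_lookup(1.33)
print(f"row {row}, column {col}, area {area}")
```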
+Application Example:
The police chief is reviewing the academy’s exam scores. The police department’s entrance exam has a normal distribution with a mean of 100 and SD of 10. Someone scored 119.2 on the exam. Is this a good score?
+Solution
-Another way of asking this is: what is the probability that a random applicant takes the test and scores 119.2 or better?
-If the probability is high, then it is an average or mediocre score; if the probability is low, then it is an exceptional score
-Step 1: convert the test score to a z score using the formula:
(119.2-100)/10=1.92
+Solution
-Step 2: Use the z score of 1.92 (how many standard deviations the score is above the mean, since it is a positive z score). Look it up in the z table.
+Solution
-Step 3: The value here is .4726
-But you’re not done. Here is what you just found: the share of scores between the mean and 119.2.
-We also need to add in the part of the curve shaded in green, or all of the scores under the mean.
-0.5+0.4726=0.9726, so the score is at the 97.26th percentile; in other words, 97.26% of the scores fall below this score
-The probability that a randomly selected individual will get this score or better is 1-0.9726=.0274
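The three steps can be sketched end to end. The mean, SD, and score come from the example; the variable names are illustrative.

```python
from statistics import NormalDist

exam = NormalDist(mu=100, sigma=10)  # entrance exam: mean 100, SD 10
score = 119.2

# Step 1: convert the raw score to a z score
z = (score - exam.mean) / exam.stdev

# Steps 2-3: share of applicants scoring below, then at or above, this score
below = exam.cdf(score)        # percentile as a proportion
at_or_above = 1 - below        # probability of this score or better

print(f"z = {z:.2f}")
print(f"share below = {below:.4f}")
print(f"share at or above = {at_or_above:.4f}")
```

Here `cdf` already includes the lower half of the curve, so there is no separate step of adding 0.5.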
+Tips
• Always draw a picture; it helps you reason through your answer
• The z curve is symmetric, so if your z score were -1.92, the area between it and the mean would still contain ~47.26% of the data.
+
The Binomial Distribution
The last section, I promise.
+A gem from the reading
+Probability Distributions
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
+Binomial Distribution Definition
• The probability an event will occur a specified number of times within a specified number of trials
• Examples: mail will be delivered before a certain time every day this week; equipment in a factory remains operational in a 10 day period
• This is a DISCRETE distribution that deals with the likelihood of observing a certain number of events in a set number of repeated trials
+A Bernoulli process
• The Binomial distribution can be used when the process is Bernoulli
• Bernoulli characteristics:
• The outcome of a trial is either a success or a failure
• The outcomes are mutually exclusive
• The probability of a success is constant from trial to trial
• One trial’s probability of success is not affected by the trial before it (INDEPENDENCE)
• Examples of independent events: multiple coin tosses; a fire occurring in a community isn’t affected by whether one happened the night before
+When looking at Bernoulli Events
You can calculate their probability with the binomial distribution
• Examples of Bernoulli events:
• A coin flip is either heads, or not heads
• A crime is either solved or not solved
+To calculate a probability using the Binomial Distribution you need
• n=number of trials
• r=number of successes
• p=probability that the event will be a success
• q=(1-p)
+Breaking down the formula
C(n, r) is a combination; it is read, “a combination of n THINGS taken r at a time”:
C(n, r) = n! / (r! (n-r)!)
The formula is:
P(r) = C(n, r) * p^r * q^(n-r)
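A minimal sketch of the two pieces, assuming the standard definitions: a combination is n!/(r!(n-r)!), and the binomial probability multiplies it by p^r and q^(n-r). The function names are illustrative.

```python
from math import factorial

def combinations(n, r):
    """A combination of n things taken r at a time: n! / (r! (n-r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n Bernoulli trials."""
    q = 1 - p  # probability of failure on a single trial
    return combinations(n, r) * p**r * q**(n - r)

print(combinations(10, 2))            # ways to choose 2 things out of 10
print(binomial_probability(3, 3, 0.5))
```

Python's `math.comb(n, r)` returns the same count as `combinations(n, r)`; the factorial version is spelled out here only to mirror the formula.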
+Example
We flip a coin three times, and we want to know the probability of getting three heads
+Step 1
Define n, p, r, and q
n (number of trials)=3
r (successes)=3 [number of heads]
p (probability of getting a heads on a flip)=0.5
q (1-p)=0.5
Now fill in the formula:
(3!/3!0!) * 0.5^3 * 0.5^0 = 0.125
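To see the three-heads case in context, the sketch below computes the probability of every possible number of heads in three fair flips. It uses `math.comb` for the combination term; the variable names are illustrative.

```python
from math import comb

n, p = 3, 0.5   # three flips of a fair coin
q = 1 - p

# Probability of exactly r heads, for every possible r
distribution = {r: comb(n, r) * p**r * q**(n - r) for r in range(n + 1)}

for r, prob in distribution.items():
    print(f"{r} heads: {prob}")
```

The probabilities over all outcomes sum to 1, which is a quick check that the formula was filled in correctly.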
+Important when solving
• 0!=1
• Any number raised to the power of 0 = 1
+Example 2
A Public Works department has been charged with discrimination. Last year, 40% of people who passed the civil service exam were minorities (eligible to be hired). From this group, Public Works hired 10 people, and 2 were minorities. What is the probability that if Public Works DID NOT discriminate it still would have hired 2 or fewer minorities? (assuming everyone had the same probability of getting hired)
+Step 1
Identify n, p, r, and q
• n (number of trials=number of people hired)=10
• r (successes)=2 [number of hired minorities]
• p (probability of getting hired=% of minorities in the pool)=0.4
• q (1-p)=0.6
+Step 2
Reason through the problem. It asks the likelihood that Public Works hired 2 or fewer minorities. Thus, we need to calculate the binomial for 2 hires, 1 hire, and 0 hires.
+Step 3
Set up the probability calculations:
Two minorities:
n (number of trials=number of people hired)=10
r (successes)=2 [number of hired minorities]
p (probability of getting hired=% of minorities in the pool)=0.4
q (1-p)=0.6
(10!/2!8!) * 0.4^2 * .6^8 = 0.12
+Formula reminder
Binomial: P(r) = C(n, r) * p^r * q^(n-r)
Combinations: C(n, r) = n! / (r! (n-r)!)
+Step 4 Repeat for one minority hired
One minority
n (number of trials=number of people hired)=10
r (successes)=1 [number of hired minorities]
p (probability of getting hired=% of minorities in the pool)=0.4
q (1-p)=0.6
(10!/1!9!) * 0.4^1 * 0.6 ^9 = 0.04
+Step 5 Repeat for 0 minorities hired
No minorities at all
n (number of trials=number of people hired)=10
r (successes)=0 [number of hired minorities]
p (probability of getting hired=% of minorities in the pool)=0.4
q (1-p)=0.6
(10!/0!10!) * 0.4^0 * 0.6^10=.006
+Step 6
Add these probabilities together: 0.12+0.04+0.006=0.166
The likelihood of hiring 2 or fewer minorities by chance is about 16.6%
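The whole calculation can be sketched in a few lines. The values n=10 and p=0.4 come from the example; the helper name is illustrative. Without rounding each term first, the total comes out near 0.167, slightly above the figure obtained from the rounded terms.

```python
from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.4  # 10 hires, 40% of the eligible pool are minorities

# Binomial probability for 2, 1, and 0 minority hires
terms = {r: binomial_pmf(n, r, p) for r in (2, 1, 0)}
total = sum(terms.values())

for r, prob in terms.items():
    print(f"exactly {r} minority hires: {prob:.4f}")
print(f"2 or fewer minority hires: {total:.4f}")
```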