Probability and Statistics Review pt 1
PROBABILITY &
STATISTICS
Prepared by:
Ms. KAREN S. TAFALLA
INTRODUCTION
STATISTICS is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, analyzing, interpreting, and drawing conclusions based on the data.
DESCRIPTIVE STATISTICS consists of procedures used to summarize and describe the important characteristics of a set of measurements.
INFERENTIAL STATISTICS consists of procedures used to make inferences about population characteristics from
information contained in a sample drawn from this population.
"The theory of statistics uses probability to measure the
uncertainty associated with an inference. It enables us to
calculate the probabilities of observing specific samples,
under specific assumptions about the population. The
statistician uses these probabilities to evaluate the
uncertainties associated with sample inferences."
Definition of terms:
Data are information or facts necessary to conduct a certain
study.
A variable is a characteristic that changes or varies over time
and/or for different individuals or objects under
consideration.
A random variable is a variable whose numerical value is
determined by the outcome of some chance experiment.
An experimental unit is the individual or object on which a
variable is measured. A single measurement or data
value results when a variable is actually measured on an
experimental unit.
The population in a statistical study is the group of objects
about which conclusions are to be drawn.
A sample is a subset of measurements selected from the
population of interest.
A parameter is a numerical measurement describing some
characteristic of a population, and a statistic is a
numerical measurement describing some characteristic
of a sample.
Univariate data result when a single variable is measured on
a single experimental unit.
Bivariate data result when two variables are
measured on a single experimental unit.
Multivariate data result when more than two variables are
measured.
A. Types of variables:
A qualitative variable measures a quality or characteristic on
each experimental unit.
Ex. - taste ranking: excellent, good, fair, poor
- color of M&M candy: brown, yellow, red, orange,
green, blue
A quantitative variable measures a numerical quantity or
amount on each experimental unit.
Ex. - weight of package ready to be shipped
- volume of orange juice in a glass
Types of Quantitative Data:
Discrete data result from either a finite number of possible values or
a countable number of possible values (that is, the number
of possible values is 0, 1, 2, and so on).
Continuous data result from infinitely many possible values that can
be associated with points on a continuous scale in such a
way that there are no gaps or interruptions.
B. Four Levels of Measurement:
The nominal level of measurement is characterized by the
data that consist of names, labels or categories only, and the
data cannot be arranged in an ordering scheme.
Ex. - collection of “yes, no, undecided” responses to a
survey question.
- responses consisting of 10 nurses, 15 teachers,
16 engineers, 5 priests, 20 businessmen.
The ordinal level of measurement involves data that may
be arranged in some order, but differences between data
values either cannot be determined or are meaningless.
Ex. - In a sample of 24 car stereos, 15 were rated
“good”, 6 were rated “better”, 3 were rated “best”
- In considering employee promotion, a manager
ranked Myrna 3rd, Al 7th, and Jena 10th
The interval level of measurement is like the ordinal level, with
the additional property that meaningful amounts of difference
between data values can be determined. However, there is no
inherent zero starting point.
Ex. -body temperatures ( in degrees Celsius )
The ratio level of measurement is the interval level modified
to include the inherent zero starting point. For values at
this level, differences and ratios are meaningful.
Ex. -heights of pine trees along Session road.
- temperature readings on the Kelvin scale, since
the scale has an absolute zero
Classify the following statements as belonging to the area of descriptive statistics or statistical inference:
(a) As a result of recent cutbacks by the oil-producing nations, we can expect the price of gasoline to double in the next year.
(b) At least 5% of all fires reported last year in a certain city were deliberately set by arsonists.
(c) Of all patients who have received this particular type of drug at a local clinic, 60% later developed significant side effects.
(d) Assuming that less than 20% of the Colombian coffee beans were destroyed by frost this past winter, we should expect an increase of no more than 30 cents for a kilogram of coffee by the end of the year.
(e) As a result of a recent poll, most Americans are in favor of building additional nuclear power plants.
EXERCISES: Understanding the concepts
A. Identify the experimental units on which the following
variables are measured:
1. Gender of student
2. Number of errors on a midterm exam
3. Age of a cancer patient
4. Number of flowers on an azalea plant
5. Color of a car entering the parking lot
B. Identify each variable as quantitative or qualitative:
1. Amount of time it takes to assemble a simple puzzle
2. Number of students in a first grade classroom
3. Rating of newly elected politician ( excellent, good,
fair, poor )
4. State in which a person lives.
C. Identify the following quantitative variables as discrete
or continuous:
1. Population in a particular area of the Philippines
2. Weight of newspapers recovered for recycling on a
single day.
3. Time to complete a probability exam
D. A data set consists of the ages at death for each of the
41 past presidents of the United States.
1. Is this set of measurements a population or a
sample?
2. What is the variable being measured?
3. Is the variable in item 2 quantitative or qualitative?
E. Determine which of the four levels of measurement is
most appropriate:
1. Weights of a sample of M&M candies
2. Instructors rated as superior, above average, average,
or poor
3. Lengths (in minutes) of movies
4. Zip codes
5. Movies listed according to their genre, such as comedy,
adventure, and romance
FREQUENCY DISTRIBUTION
When a set of data includes a large number of
observed values, it becomes practical to group the data into
classes or categories with the corresponding number of
observations falling into each class. The result is a tabular
arrangement called a frequency distribution.
Definition of terms:
A frequency table lists categories (or classes) of scores,
along with counts (or frequencies) of the number of scores
that fall into each category.
The frequency for a particular class is the number of
original scores that fall into that class.
Lower class limits are the smallest numbers that can actually
belong to the different classes.
Upper class limits are the largest numbers that can actually
belong to the different classes.
Class boundaries are the numbers used to separate
classes, but without the gaps created by the class limits.
They are obtained by increasing the upper class limits and
decreasing the lower class limits by the same amount so
that there are no gaps between consecutive classes. The
amount to be added or subtracted is one-half the difference
between the upper limit of one class and the lower limit of
the following class.
Class marks are the midpoints of the classes. They can be
found by adding the lower and upper class limits of a class and dividing by 2.
Class width or class size is the difference between two
consecutive lower class limits or two consecutive lower
class boundaries.
Relative frequency is the ratio of the class frequency to the total
frequency.
Cumulative frequency is the accumulated frequency less than (<) or
greater than (>) a stated value. We obtain the greater than (>)
cumulative frequency when the frequencies are summed from the
bottom up to find the number of observations greater than a specified lower
class boundary. The less than (<) cumulative frequency is constructed when
the frequencies are summed from the top down to find the
number of observations less than a particular upper class
boundary.
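As a rough illustration of the relative and cumulative frequency definitions above, the Python sketch below builds all three columns from a list of class frequencies. The frequencies are illustrative values only, and the variable names are assumptions made for this example.

```python
from itertools import accumulate

# Illustrative class frequencies, listed from the lowest class to the highest.
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
n = sum(freqs)

# Relative frequency: class frequency divided by the total frequency.
rel_freq = [f / n for f in freqs]

# "Less than" cumulative frequency: running total from the top of the table down.
less_cf = list(accumulate(freqs))

# "Greater than" cumulative frequency: running total from the bottom of the table up.
greater_cf = list(accumulate(reversed(freqs)))[::-1]

print("  f    rf   <CF  >CF")
for f, rf, lcf, gcf in zip(freqs, rel_freq, less_cf, greater_cf):
    print(f"{f:>3} {rf:6.3f} {lcf:>4} {gcf:>4}")
```

The last "less than" value and the first "greater than" value both equal the total frequency, which is a quick check on the table.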
A. Steps in constructing Frequency table.
Step 1: Count the number of data points in the set of data.
Step 2: Determine the range R, for the entire data set. The
range is the smallest value in the set of data subtracted
from the largest value
Step 3: Decide on the number of class intervals. The
ideal number of class intervals is somewhere between 5
and 15. To approximate the appropriate number of class
intervals, we may use Herbert Sturges’ formula
K = 1 + 3.322 log n
where K stands for the number of classes suggested, log is the
common (base-10) logarithm, and
n represents the total frequency. Avoid having too many
classes or too few classes. Too many classes may lead to
several empty classes. Too few classes tend to lose
important details of the data.
Step 4: Determine the class width by dividing the range
by the number of classes. Round the result up to a
convenient number. This rounding up (not off) is not only
convenient, but also guarantees that all of the data will
be included in the frequency table.
Class width (i) = round up of (range / number of classes)
Step 5: Select as the lower limit of the first class either the
lowest score or a convenient value slightly less than the
lowest score. This value serves as the starting point.
Step 6: Add the class width to the starting point to get
the second lower class limit. Add the class width to the
second lower class limit to get the third, and so on.
Step 7: List the lower class limits in a vertical column,
and enter the upper class limits, which can be easily
identified at this stage.
Step 8: Represent each score by a tally in the
appropriate class.
Step 9: Replace the tally marks in each class with the
total frequency count for that class.
Example: The test scores of sixty students in Statistics are recorded as follows:
78 51 61 74 68 78 62 71 88 72
66 77 82 68 68 73 56 82 66 71
58 75 67 75 86 66 70 71 64 73
85 74 62 84 66 92 91 57 61 78
63 73 58 79 61 83 88 81 75 57
68 70 54 79 62 78 59 70 66 81
1. Number of data points = 60
2. Range = 92 – 51 = 41
3. Using Sturges’ formula, K = 1 + 3.322 log 60 ≈ 6.91, which rounds to 7.
Therefore, the suggested number of class intervals is seven.
4. The class size or width is computed as i = 41/7 ≈ 5.86, rounded up to 6.
Instead of starting the first class at 51, choose to start
at the nice round number 50.
Thus, the first class is 50 – 55. Adding 6 to both limits, we
obtain the next interval, 56 – 61. Because the starting point is below
the lowest score, eight classes are needed to reach the highest score, 92.
CLASS INTERVAL   CLASS BOUNDARIES   MIDPOINT   TALLY   FREQUENCY
50 – 55          49.5 – 55.5
56 – 61          55.5 – 61.5
62 – 67          61.5 – 67.5
68 – 73          67.5 – 73.5
74 – 79          73.5 – 79.5
80 – 85          79.5 – 85.5
86 – 91          85.5 – 91.5
92 – 97          91.5 – 97.5
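The construction steps can be sketched in Python for the sixty test scores. This is a minimal sketch rather than a prescribed procedure: it assumes integer scores (so class boundaries sit 0.5 beyond the class limits), rounds Sturges' K to the nearest whole number, and uses the starting point of 50 chosen in the example. All variable names are illustrative.

```python
import math

scores = [
    78, 51, 61, 74, 68, 78, 62, 71, 88, 72,
    66, 77, 82, 68, 68, 73, 56, 82, 66, 71,
    58, 75, 67, 75, 86, 66, 70, 71, 64, 73,
    85, 74, 62, 84, 66, 92, 91, 57, 61, 78,
    63, 73, 58, 79, 61, 83, 88, 81, 75, 57,
    68, 70, 54, 79, 62, 78, 59, 70, 66, 81,
]

n = len(scores)                          # Step 1: number of data points
data_range = max(scores) - min(scores)   # Step 2: range
k = round(1 + 3.322 * math.log10(n))     # Step 3: Sturges' formula (base-10 log)
width = math.ceil(data_range / k)        # Step 4: round the class width UP
start = 50                               # Step 5: convenient value below the minimum

# Steps 6-9: build the classes and count the scores falling in each one.
freqs = {}
lower = start
while lower <= max(scores):
    upper = lower + width - 1            # integer data: limits are 1 apart
    freqs[(lower, upper)] = sum(lower <= s <= upper for s in scores)
    lower += width

print(f"n={n}, range={data_range}, K={k}, width={width}")
for (lo, hi), f in freqs.items():
    midpoint = (lo + hi) / 2
    print(f"{lo}-{hi}  {lo - 0.5}-{hi + 0.5}  {midpoint}  {f}")
```

Counting with `lower <= s <= upper` plays the role of the tally column; the frequencies should add up to the number of data points.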
3. The number of television viewing hours per household and the prime viewing times are two factors that affect television advertising income. A random sample of 50 households in a particular viewing area produced the following estimates of viewing hours per household.
a. Starting with the lowest value as the lower class limit,
construct a frequency distribution.
b. Determine the class marks, class boundaries, relative
frequency, <CF, and >CF.
3.0 6.0 7.5 15.0 12.0 6.6 9.5 14.5 10.5 11.0
6.5 8.0 4.0 5.5 6.0 5.6 13.3 13.1 5.5 12.5
5.0 12.0 1.0 3.5 3.0 2.4 3.8 4.5 8.0 2.5
7.5 5.0 10.0 8.0 3.5 2.6 8.5 2.5 6.4 7.6
9.0 2.0 6.5 1.0 5.0 7.7 9.3 6.5 8.2 8.8
GRAPHICAL REPRESENTATION OF FREQUENCY
DISTRIBUTION
A histogram, or frequency histogram, is a bar
graph which consists of a set of rectangles, while the
frequency polygon is a line graph. Both graphs are
intended to show the more salient features of the frequency
distribution.
a. HISTOGRAM
The histogram is a set of vertical bars having their bases
on the horizontal axis, centered on the class marks.
The width of each bar corresponds to the class width and the height
corresponds to the class frequency.
A histogram differs from a bar chart in that the bases of the
bars are the class boundaries rather than the class limits.
b. FREQUENCY POLYGON
The frequency polygon is a modification of the histogram:
it is a line graph in which the class
frequencies are plotted against the class marks. To close the
polygon, an extra class mark with zero frequency must be added at each end. The
frequency polygon can also be obtained by connecting the
midpoints of the tops of the rectangles in the histogram.
c. OGIVES
A line graph showing the cumulative frequencies of a distribution
is called an ogive. For the “less than” ogive, the “less than”
cumulative frequencies are plotted against the upper class
boundaries. For the “greater than” ogive, the “greater than”
cumulative frequencies are plotted against the lower
class boundaries. These graphs are useful in estimating the
number of observations that are less than or more than a
specified value.
STEM AND LEAF PLOTS
Another simple way to display the distribution of a
quantitative data set is the stem and leaf plot. This
procedure was introduced by Tukey and is one of the
primary tools of exploratory data analysis. A stem and leaf
diagram consists of a series of horizontal rows of
numbers. The number used to label a row is called a stem,
and the remaining numbers in the row are called leaves.
Steps:
1. Divide each measurement into two parts: the stem and
the leaf.
2. List the stems in a column, with a vertical line to their right.
3. For each measurement, record the leaf portion in the
same row as its corresponding stem.
4. Order the leaves from the lowest to highest in each stem.
5. Provide a key to your stem and leaf coding so that the
reader can recreate the actual measurements if
necessary.
Sometimes the available stem choices result in a plot that
contains too few stems and a large number of leaves
within each stem. In this situation, you can stretch the
stems by dividing each one into several lines, depending
on the leaf values assigned to them. Stems are usually
divided in one of two ways:
Into two lines, with leaves 0-4 in the first line and
leaves 5-9 in the second line.
Into five lines, with leaves 0-1, 2-3, 4-5, 6-7, and
8-9 in the five lines respectively.
Example:
The data below are the GPAs of 30 Adamson University
freshmen, recorded at the end of the freshman year.
Construct a stem and leaf plot to display the distribution
of the data.
2.0 3.1 1.9 2.5 1.9 2.3 2.6 3.1 2.5 2.1
2.9 3.0 2.7 2.5 2.4 2.7 2.5 2.4 3.0 3.4
2.6 2.8 2.5 2.7 2.9 2.7 2.8 2.2 2.7 2.1
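The five steps can be sketched in Python for the GPA data. This is only one possible coding, chosen for illustration: the units digit is taken as the stem and the tenths digit as the leaf, and the stems are not stretched into multiple lines.

```python
from collections import defaultdict

gpas = [
    2.0, 3.1, 1.9, 2.5, 1.9, 2.3, 2.6, 3.1, 2.5, 2.1,
    2.9, 3.0, 2.7, 2.5, 2.4, 2.7, 2.5, 2.4, 3.0, 3.4,
    2.6, 2.8, 2.5, 2.7, 2.9, 2.7, 2.8, 2.2, 2.7, 2.1,
]

# Step 1: split each measurement into a stem (units) and a leaf (tenths).
# Working in whole tenths avoids floating-point surprises such as 2.3 * 10.
stems = defaultdict(list)
for g in gpas:
    stem, leaf = divmod(round(g * 10), 10)
    stems[stem].append(leaf)

# Steps 2-4: list the stems in a column and order the leaves within each stem.
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in sorted(stems[stem]))
    print(f"{stem} | {leaves}")

# Step 5: provide a key so the reader can recover the measurements.
print("Key: 2 | 5 means 2.5")
```

With only three stems, most leaves land on stem 2, which is the situation where stretching the stems into two or five lines becomes useful.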
DESCRIPTIVE STATISTICS
MEASURES OF CENTRAL TENDENCY
A measure of central tendency gives a single
value that acts as a representative average of
the values of all the outcomes of your
experiment. Three parameters that measure the
center of the distribution in some sense are of
interest. These parameters are called the
population mean, the population median, and the
population mode.
a. THE MEAN
For Ungrouped Data:
Let x1, x2, x3, …, xn be n observations of a random variable X. The sample mean, denoted by $\bar{x}$ (x-bar), is the arithmetic average of these values. That is,
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
For Grouped Data
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where: $f_i$ is the frequency of class interval i
$x_i$ is the class midpoint of class interval i
$k$ is the number of class intervals
B. THE MEDIAN
For Ungrouped Data: Let x1, x2, x3, …, xn be sample observations arranged in order from smallest to largest. The
sample median for this collection is given by the middle observation if n is odd. If n is even, the sample median is the average of the two middle observations.
For Grouped Data: When the data are grouped into a frequency distribution, the median is obtained by finding the cell
that contains the middle observation and then interpolating within the cell.
$$\tilde{x} = L_b + \frac{n/2 - {<}cf_{i-1}}{f_i}\,(i) \quad \text{OR} \quad \tilde{x} = U_b - \frac{n/2 - {>}cf_{i+1}}{f_i}\,(i)$$
where:
$L_b$ = lower class boundary of the interpolated interval
$U_b$ = upper class boundary of the interpolated interval
${<}cf_{i-1}$ = less than cumulative frequency of the class before the interpolated interval
${>}cf_{i+1}$ = greater than cumulative frequency of the class just above the interpolated interval (the class before it when summing from the bottom up)
$f_i$ = frequency of the interpolated interval
$(i)$ = class size
$n$ = number of data points
C. THE MODE
The last measure of central tendency is the mode. For a finite population, the population mode is the value of X that occurs most often. The mode of a sample is the value that occurs most often in the sample. The drawback to this measure is that there might not be a unique mode: there might be no single number that occurs more often than any other. For this reason, the mode is not a particularly useful descriptive measure.
When the data are grouped into a frequency distribution, the midpoint of the cell with the highest frequency is the mode, since this point represents the highest point (greatest frequency).
EXAMPLES:
1. The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 4.3 seconds. Calculate the mean, median and mode.
$$\text{Mean} = \frac{2.5 + 3.6 + 3.1 + 4.3 + 2.9 + 2.3 + 2.6 + 4.1 + 4.3}{9} = \frac{29.7}{9}$$
Mean = 3.3
Ordered data: 2.3, 2.5, 2.6, 2.9, 3.1, 3.6, 4.1, 4.3, 4.3
Median = 3.1 (the fifth of the nine ordered values)
Mode = 4.3
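The same three measures can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original solution; `multimode` is used because it returns every most-frequent value, which matters when the mode is not unique.

```python
import statistics

# Reaction times (seconds) from the example.
times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]

mean = statistics.mean(times)        # arithmetic average
median = statistics.median(times)    # middle value of the sorted data (n is odd)
modes = statistics.multimode(times)  # list of all most-frequent values

print(f"mean   = {mean:.1f}")
print(f"median = {median}")
print(f"mode   = {modes}")
```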
2. The frequency table below represents the final
examination scores for a statistics course. Find the mean, the median, and the mode.
Class Interval   Frequency   Class Mark   Cumulative Frequency (<CF)
10 – 19          3           14.5         3
20 – 29          2           24.5         5
30 – 39          3           34.5         8
40 – 49          4           44.5         12
50 – 59          5           54.5         17
60 – 69          11          64.5         28
70 – 79          14          74.5         42
80 – 89          14          84.5         56
90 – 99          4           94.5         60
$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$$
$$\text{Mean} = \frac{(3)(14.5) + (2)(24.5) + (3)(34.5) + (4)(44.5) + (5)(54.5) + (11)(64.5) + (14)(74.5) + (14)(84.5) + (4)(94.5)}{3 + 2 + 3 + 4 + 5 + 11 + 14 + 14 + 4} = \frac{3960}{60}$$
Mean = 66
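$$\text{Median} = L_b + \frac{n/2 - {<}cf_{i-1}}{f_i}\,(i)$$
The median class is 70 – 79, the first class whose <CF reaches n/2 = 30.
$$\text{Median} = 69.5 + \frac{60/2 - 28}{14}\,(10)$$
Median = 70.93
Mode = class mark with the highest frequency
Mode = 74.5 and 84.5 (both classes have a frequency of 14)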
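The grouped-data computations can be reproduced with a short Python sketch. It assumes integer class limits, so each class boundary lies 0.5 below the lower limit, and it uses the "less than" form of the median formula. The variable names are illustrative.

```python
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
width = 10  # class size (i)

# Class marks: midpoint of each class, e.g. (10 + 19) / 2 = 14.5.
marks = [lo + (width - 1) / 2 for lo in lower_limits]
n = sum(freqs)

# Grouped mean: sum of f*x divided by sum of f.
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Grouped median: find the first class whose <CF reaches n/2, then interpolate.
cum_before = 0  # <CF of the class before the current one
for lo, f in zip(lower_limits, freqs):
    if cum_before + f >= n / 2:
        lower_boundary = lo - 0.5
        median = lower_boundary + (n / 2 - cum_before) / f * width
        break
    cum_before += f

# Grouped mode: class mark(s) of the class(es) with the highest frequency.
top = max(freqs)
modes = [x for f, x in zip(freqs, marks) if f == top]

print(f"mean = {mean}, median = {median:.2f}, modes = {modes}")
```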
MEASURES OF VARIABILITY
Variability refers to the extent of scatter or dispersion around the
zone of central tendency.
A. RANGE
One measure of variation is the range, which has the advantage of
being very easy to compute. The range, R, of a set of n measurements is defined as the difference between the largest and smallest measurements.
Formula: Range = Highest score – Lowest score, or R = H – L
B. VARIANCE and STANDARD DEVIATION
The variance of a population of N measurements is defined to be the
average of the squares of the deviations of the measurements about their mean μ. The population variance is denoted by σ² and is given by the formula
For ungrouped data:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
For grouped data:
$$\sigma^2 = \frac{\sum f (x - \mu)^2}{\sum f}$$
The variance of a sample of n measurements is defined to be the sum of the squared deviations of the measurements about their mean $\bar{x}$ divided by (n – 1). The sample variance is denoted by s² and is given by the formula
For ungrouped data:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
For grouped data:
$$s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f - 1}$$
The standard deviation, in essence, represents the “average amount of variability” in a set of measures, using the mean as a reference point. Strictly speaking, the standard deviation is the positive square root of the average of the squared deviations about the mean, that is, the positive square root of the variance. The standard deviation is basically a measure of how far each score, on the average, is from the mean.
1. The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 4.3 seconds. Calculate the range, variance and standard deviation.
Range = HV – LV
= 4.3 – 2.3 = 2.0
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{(2.5-3.3)^2 + (3.6-3.3)^2 + (3.1-3.3)^2 + (4.3-3.3)^2 + (2.9-3.3)^2 + (2.3-3.3)^2 + (2.6-3.3)^2 + (4.1-3.3)^2 + (4.3-3.3)^2}{9 - 1} = \frac{5.06}{8}$$
= 0.6325 (sample variance)
s = sqrt(0.6325)
= 0.795298686 or 0.80 (sample standard deviation)
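As a cross-check on the arithmetic, the sketch below computes the sample variance both from the definition (divide by n – 1) and with the standard `statistics` module. The variable names are illustrative.

```python
import math
import statistics

times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]
n = len(times)

data_range = max(times) - min(times)

# Sample variance from the definition: squared deviations about x-bar over (n - 1).
x_bar = sum(times) / n
s2_manual = sum((x - x_bar) ** 2 for x in times) / (n - 1)

# The library equivalents use the same (n - 1) divisor.
s2 = statistics.variance(times)
s = statistics.stdev(times)

print(f"range = {data_range:.1f}")
print(f"s^2   = {s2:.4f}")
print(f"s     = {s:.4f}")
```

`statistics.pvariance` and `statistics.pstdev` are the population counterparts, which divide by N instead of n – 1.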
2. The frequency table below represents the final
examination scores for a statistics course. Find the population
range, population variance, and population standard
deviation.
Class Interval   Frequency   Class Mark   Cumulative Frequency
10 – 19          3           14.5         3
20 – 29          2           24.5         5
30 – 39          3           34.5         8
40 – 49          4           44.5         12
50 – 59          5           54.5         17
60 – 69          11          64.5         28
70 – 79          14          74.5         42
80 – 89          14          84.5         56
90 – 99          4           94.5         60
Range = Highest upper class boundary – Lowest lower class boundary
= 99.5 – 9.5
= 90
$$\sigma^2 = \frac{\sum f (x - \mu)^2}{\sum f}$$
$$\sigma^2 = \frac{3(14.5-66)^2 + 2(24.5-66)^2 + 3(34.5-66)^2 + 4(44.5-66)^2 + 5(54.5-66)^2 + 11(64.5-66)^2 + 14(74.5-66)^2 + 14(84.5-66)^2 + 4(94.5-66)^2}{60} = \frac{25965}{60}$$
σ² = 432.75
σ = sqrt(432.75) = 20.80264406 or 20.80
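A short Python sketch of the grouped population variance follows, using the class marks and frequencies from the table. It is meant only as a check on the hand computation; the names are illustrative.

```python
import math

marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]

n = sum(freqs)

# Grouped mean: each class mark stands in for every score in its class.
mu = sum(f * x for f, x in zip(freqs, marks)) / n

# Population variance for grouped data: sum of f*(x - mu)^2 divided by sum of f.
sum_sq = sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks))
variance = sum_sq / n
sigma = math.sqrt(variance)

print(f"mu = {mu}, sum of squares = {sum_sq}")
print(f"variance = {variance}, sigma = {sigma:.2f}")
```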
MEASURES OF SHAPE
- refer to the visual characteristics of a certain
distribution.
- knowledge of the shape of the distribution can
help in concluding whether the distribution is
normal or not
Two (2) Principal Measures of Shape
SKEWNESS refers to the symmetry of a
distribution. A distribution
which is not symmetric with
respect to its mean can be
termed as either positively skewed
or negatively skewed.
KURTOSIS refers to the flatness or
peakedness of a particular
distribution.
Skewness
$$SK = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - \mu}{\sigma} \right)^3$$
where:
Xi - individual reading
σ - standard deviation
μ - mean
N - population size
SK = 0 Symmetric (Normal)
SK > 0 Positively Skewed
SK < 0 Negatively Skewed
Negative skew: The left tail is longer than the right tail. The
distribution has relatively few low values and is said to
be left-skewed or “skewed to the left”. Example
(observations): 1, 1000, 1001, 1002, 1003.
Positive skew: The right tail is longer than the left tail. The
distribution has relatively few high values and is said to be
right-skewed or “skewed to the right”. Example
(observations): 1, 2, 3, 4, 100.
The skewness for a normal distribution is zero, and any
symmetric data should have a skewness near zero.
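To make the sign convention concrete, the sketch below computes the population skewness, taken here as the average cubed standardized deviation with σ computed using divisor N, for the two example data sets. The function name `skewness` is an illustrative choice, not a library call.

```python
import math

def skewness(values):
    """Population skewness: mean of ((x - mu) / sigma) ** 3."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum(((x - mu) / sigma) ** 3 for x in values) / n

left_skewed = [1, 1000, 1001, 1002, 1003]   # one very low value: long left tail
right_skewed = [1, 2, 3, 4, 100]            # one very high value: long right tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("left", left_skewed), ("right", right_skewed), ("symmetric", symmetric)]:
    print(f"{name:>9}: SK = {skewness(data):+.3f}")
```

The single extreme value dominates the cubed deviations, so its side of the mean determines the sign of SK.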
Kurtosis
$$k = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - \mu}{\sigma} \right)^4$$
where:
Xi - individual reading
σ - standard deviation
μ - mean
N - population size
k = 3 Mesokurtic (Normal)
k > 3 Leptokurtic
k < 3 Platykurtic
A platykurtic data set has a flatter peak around its mean
and thin tails. The
flatness results from the data being less concentrated
around the mean, due to large variations within
observations.
Mesokurtic describes a distribution whose
kurtosis is similar, or identical, to
the kurtosis of a normally distributed data set.
Leptokurtic distributions have higher peaks around the
mean compared to normal distributions, together with
thick tails on both sides. These peaks result from the data
being highly concentrated around the mean, due to lower
variations within observations.
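The classification by k can be illustrated with a small Python sketch. It takes kurtosis as the average fourth power of the standardized deviations (σ computed with divisor N), and the two data sets are made-up values chosen so the results are easy to verify by hand.

```python
import math

def kurtosis(values):
    """Population kurtosis: mean of ((x - mu) / sigma) ** 4 (normal = 3)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum(((x - mu) / sigma) ** 4 for x in values) / n

def classify(k, tol=1e-9):
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]                          # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]      # concentrated center, two far values

for data in (flat, peaked):
    k = kurtosis(data)
    print(f"k = {k:.2f} -> {classify(k)}")
```

For `flat`, the variance is 2 and the mean fourth power is 6.8, giving k = 1.7; for `peaked`, the variance is 20 and the mean fourth power is 2000, giving k = 5.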
Examples
1. A technician checks the resistance value of 5 coils and
records the values in ohms: 3.35, 3.37, 3.28, 3.34 and
3.30. Determine the average.
2. Tensile tests on aluminum alloy rods are conducted at
three different times, which results in three different
average values in megapascals (MPa). On the first
occasion, 5 tests are conducted with an average of 207
MPa; on the second occasion, 6 tests, with an average of
203 MPa; and on the last occasion, 3 tests, with an
average of 206 MPa. Determine the weighted average.
3. Determine the standard deviation of the moisture content
of a roll of kraft paper. The results of six readings across
the paper web are 6.7, 6.0, 6.4, 6.4, 5.9, and 5.8%.
4. Given the frequency distribution of the life of 320
automotive tires in 1000 km as shown in the table below,
determine the average and standard deviation.
Boundaries    Midpoint   Frequency
23.5 – 26.5   25.0       4
26.5 – 29.5   28.0       36
29.5 – 32.5   31.0       51
32.5 – 35.5   34.0       63
35.5 – 38.5   37.0       58
38.5 – 41.5   40.0       52
41.5 – 44.5   43.0       34
44.5 – 47.5   46.0       16
47.5 – 50.5   49.0       6
PRACTICAL SIGNIFICANCE OF THE
STANDARD DEVIATION
A. TCHEBYSHEFF’S THEOREM
Tchebysheff’s theorem applies to any set of measurements and can be used to describe either a sample or a population. The idea involved in this theorem is as follows. An interval is constructed by measuring a distance kσ on either side of the mean μ. The theorem is true for any number we choose for k, as long as it is greater than or equal to 1. Then at least 1 – (1/k²) of the total number n of measurements lie within the constructed interval.
The theorem states that:
At least none of the measurements lie in the interval μ-σ to μ+σ.
At least ¾ of the measurements lie in the interval μ-2σ to μ+2σ.
At least 8/9 of the measurements lie in the interval μ-3σ to μ+3σ.
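The bound 1 – (1/k²) is easy to tabulate. The sketch below uses exact fractions so the values for k = 1, 2, and 3 can be compared directly with the statements above; the function name is an illustrative choice.

```python
from fractions import Fraction

def tchebysheff_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the theorem requires k >= 1")
    return 1 - 1 / Fraction(k) ** 2

for k in (1, 2, 3):
    print(f"k = {k}: at least {tchebysheff_bound(k)} of the measurements")
```

Because the theorem makes no assumption about the shape of the distribution, these are lower bounds only; for k = 1 the bound is 0 and so carries no information.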
B. EMPIRICAL RULE
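Another rule helpful in interpreting a value for a standard deviation is the Empirical rule, which applies to a data set having a distribution that is approximately bell-shaped. The empirical rule is often stated in abbreviated form, sometimes called the 68-95-99.7 rule: approximately 68% of the measurements lie within one standard deviation of the mean, approximately 95% within two standard deviations, and approximately 99.7% within three standard deviations.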
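As an illustration, the sketch below lists the three empirical-rule intervals for a bell-shaped data set, using the commonly quoted percentages of 68%, 95%, and 99.7%. The mean of 100 and standard deviation of 15 are made-up values, and the function name is an assumption for this example.

```python
# Approximate percentages within 1, 2, and 3 standard deviations of the mean.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_intervals(mean, sd):
    """Map k to (lower end, upper end, approximate percent inside)."""
    return {
        k: (mean - k * sd, mean + k * sd, pct)
        for k, pct in EMPIRICAL_PERCENT.items()
    }

intervals = empirical_intervals(100, 15)  # hypothetical mean and standard deviation
for k, (lo, hi, pct) in intervals.items():
    print(f"mean ± {k} sd: {lo} to {hi} holds about {pct}% of the data")
```

To answer a question of the form "what percentage lies between a and b", express a and b as the mean plus or minus a whole number of standard deviations and read off the matching row.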
1. A sample of 3000 observations has a mean of 82
and a standard deviation of 16.
Using the empirical rule, find what percentage of the
observations fall in the intervals $\bar{x} \pm 2s$ and $\bar{x} \pm 3s$.
2. The mean life of a certain brand of auto batteries is
44 months with a standard deviation of three
months. Assume that the lives of all auto batteries of
this brand have a bell-shaped distribution. Using the
empirical rule, find the percentage of auto batteries
of this brand that have a life of
a. 41 to 47 months b. 38 to 50 months c. 35 to
53 months
3. The ages of cars owned by all employees of
a large company have a bell-shaped
distribution with a mean of seven years and
a standard deviation of 2 years.
a. Using the empirical rule, find the
percentage of cars owned by these
employees that are i. 5 to 9 years old ii. 1 to 13
years old.
b. Using the empirical rule, find the interval
that contains the ages of the cars owned by
95% of all employees of this company.
MEASURES OF POSITION
A. PERCENTILE
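A set of n measurements on the variable
x has been arranged in order of
magnitude. The pth percentile is the value
that separates the bottom p% of the ranked
scores from the top (100-p)%.
For ungrouped data, with p expressed as a proportion (for example, p = 0.25 for the 25th percentile) and $X_j$ denoting the jth ranked value:
$$\text{Any percentile} = \begin{cases} \dfrac{X_{np} + X_{np+1}}{2} & \text{if } np \text{ is an integer} \\ X_{\lceil np \rceil} & \text{if } np \text{ is not an integer (round } np \text{ up to the next larger integer)} \end{cases}$$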
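The rank rule for ungrouped data can be sketched in Python as follows. To keep the integer test exact, this sketch takes p as a whole-number percent (25 for the 25th percentile) rather than as a proportion; that choice, the function name, and the sample data are assumptions made for the example.

```python
def percentile(data, p):
    """p-th percentile by the rank rule; p is an integer with 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(data)
    n = len(xs)
    # n * p / 100 = q + r / 100, so r == 0 exactly when np is an integer.
    q, r = divmod(n * p, 100)
    if r == 0:
        # np is an integer: average the np-th and (np + 1)-th ranked values.
        return (xs[q - 1] + xs[q]) / 2
    # np is not an integer: round up, i.e. take the (q + 1)-th ranked value.
    return xs[q]

times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]
print(percentile(times, 50))              # n*p/100 = 4.5, so the 5th ranked value
print(percentile(times, 25))              # n*p/100 = 2.25, so the 3rd ranked value
print(percentile(list(range(1, 11)), 50)) # n*p/100 = 5, so average the 5th and 6th
```

Other textbooks and software packages use slightly different interpolation rules, so results may differ a little from library functions.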
For Grouped Data
$$\text{Any percentile} = L_b + \frac{np - {<}cf_i}{f_i}\,(i)$$
OR
$$\text{Any percentile} = U_b - \frac{n(1-p) - {>}cf_i}{f_i}\,(i)$$
where:
$L_b$ = lower class boundary of the interpolated interval
$U_b$ = upper class boundary of the interpolated interval
${<}cf_i$ = less than cumulative frequency of the class before the interpolated interval
${>}cf_i$ = greater than cumulative frequency of the class just above the interpolated interval (the class before it when summing from the bottom up)
$f_i$ = frequency of the interpolated interval
$(i)$ = class size
$n$ = number of data points
$p$ = the desired proportion or percentile
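A minimal Python sketch of the "less than" form of the grouped-data formula is given below. It assumes integer class limits (so the lower class boundary is the lower limit minus 0.5) and takes p as a proportion between 0 and 1. The frequency table is an illustrative one, and the function name is hypothetical.

```python
def grouped_percentile(lower_limits, freqs, width, p):
    """Interpolated percentile for grouped data; p is a proportion, 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = sum(freqs)
    target = n * p
    cum_before = 0  # <CF of the class before the current one
    for lo, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= target:
            lower_boundary = lo - 0.5  # assumes integer class limits
            return lower_boundary + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("no class reaches the requested proportion")

# Illustrative frequency table: classes 10-19, 20-29, ..., 90-99.
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]

q1 = grouped_percentile(lower_limits, freqs, 10, 0.25)
p50 = grouped_percentile(lower_limits, freqs, 10, 0.50)
print(f"25th percentile = {q1:.2f}, 50th percentile = {p50:.2f}")
```

For p = 0.25, np = 15 falls in the class 50 – 59, which has 12 observations below it, giving 49.5 + (3/5)(10) = 55.5.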
B. QUARTILES are values that divide a set of
observations into 4 equal parts. These
values, denoted by Q1, Q2, and Q3, are
such that 25% of the data fall below Q1,
50% fall below Q2, and 75% fall below
Q3.
C. DECILES are values that divide a set of
observations into 10 equal parts.