Statistics Lecture Notes
Dr. Halil İbrahim CEBECİ
Chapter 02Graphical and TabularDescriptive Techniques
A variable is some characteristic of a population or sample.
E.g. student grades.Typically denoted with a capital letter: X, Y, Z…
The values of the variable are the range of possible values for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Definitions
Statistics Lecture Notes – Chapter 02
Interval data
Real numbers, i.e. heights, weights, prices, etc. Also referred to as quantitative or numerical.
Arithmetic operations can be performed on Interval Data, thus its meaningful to talk about 2*Height, or Price + $1, and so on.
Statistics Lecture Notes – Chapter 02
Definitions
Nominal Data
The values of nominal data are categories.E.g. responses to questions about marital status,
coded as:Single =1, Married =2, Divorced =3, Widowed =4
Because the numbers are arbitrary arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!)
Nominal data are also called qualitative or categorical.
Statistics Lecture Notes – Chapter 02
Definitions
Ordinal Data
appear to be categorical in nature, but their values have an order; a ranking to them:
E.g. College course rating system:poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While its still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
Statistics Lecture Notes – Chapter 02
Definitions
Time Series Data: Ordered data values observed over time
Cross Section Data: Data values observed at a fixed point in time
Statistics Lecture Notes – Chapter 02
Definitions
Ex2.1 -For each of the following examples of data, determine whether the data are quantitative,qualitative, or ranked.
a. the month of the highest sales for each firm in a sample (qualitative)
b. the department in which each of a sample of university professors teaches (qualitative)
c. the weekly closing price of gold throughout a year (quantitative)
d. the size of soft drink (large, medium, or small) ordered by a sample of customers in a restaurant (ranked)
e. the number of barrels of crude oil imported monthly by the United States (quantitative)
Statistics Lecture Notes – Chapter 02
Definitions
Statistics Lecture Notes – Chapter 02
Tabular and Graphical Techniques for Nominal Data
Categorical Variables
• Frequency distribution
• Bar chart• Pie chart• Pareto diagram
Numerical Variables
• Line chart• Frequency
distribution• Histogram and
ogive• Stem-and-leaf
display• Scatter plot
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
We can summarize the data in a table that presents the categories and their counts called a frequency distribution.
A relative frequency distribution lists the categories and the proportion with which each occurs.
Statistics Lecture Notes – Chapter 02
Tabular and Graphical Techniques for Nominal Data
Ex2.2 – The student placement office at university conducted a survey of last year’s business school graduates to determine the general areas in which the graduates found jobs. The placement office intended to use the resulting information to help decide where to concentrate its efforts in attracting companies to campus to conduct job interviews The areas of employment are;
1. Accounting2. Finance3. General Management4. Marketing/Sales5. Other
Statistics Lecture Notes – Chapter 02
Tabular and Graphical Techniques for Nominal Data
Statistics Lecture Notes – Chapter 02
1 1 2 4 1 4 2 4 5 2 5 4 1 1 4 2 3
4 5 1 4 1 3 2 4 3 1 2 5 4 2 3 3 2
5 4 1 4 1 4 5 5 1 4 2 4 2 2 5 2 5
1 5 3 4 1 4 1 2 1 3 4 2 4 5 5 1 2
2 1 4 3 3 1 4 1 1 1 1 2 4 1 4 3 2
2 4 1 1 2 4 4 4 5 4 5 1 1 3 2 1 3
3 1 5 3 1 3 2 1 1 1 5 3 2 3 4 2 5
1 3 1 1 1 4 2 4 4 2 1 4 4 5 5 2 1
4 4 2 5 3 2 4 1 1 4 3 2 4 2 3 1 1
1 2 1 1 4 1 4 3 4 4 2 3 1 4 5 3 3
1 4 1 2 4 1 4 5 2 2 2 5 4 4 4 1 4
4 1 4 4 1 2 4 2 2 3 2 1 4 4 3 4 1
3 4 5 3 3 1 5 1 4 2 2 1 5 5 4 1 1
1 4 3 2 2 1 1 4 2 3 1 3 3 2 2 3
4 2 2 1 4 2 3 1 1 5 1 1 2 1 1 1
Tabular and Graphical Techniques for Nominal Data
Statistics Lecture Notes – Chapter 02
Area Frequency Relative Frequency(%)
Accounting 73 28.9
Finance 52 20.6
General Management 36 14.2
Marketing/Sales 64 25.3
Other 28 11.1
Total 253 100
It all the same information,(based on the same data).Just different presentation.
Tabular and Graphical Techniques for Nominal Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical).
The most important of these graphical methods is the histogram.
The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
Statistics Lecture Notes – Chapter 02
Graphical Techniques for Ordinal Data
1. Collect the Data 2. Create a frequency distribution for the data.3. Draw the Histogram.
Ex2.3 – Draw a histogram for long distance telephone bills data
Statistics Lecture Notes – Chapter 02
Graphical Techniques for Ordinal Data
42.19 39.21 75.71 8.37 … 114.67 15.30
38.45 48.54 88.62 7.18 … 27.57 75.49
29.23 93.31 99.50 11.07 … 64.78 68.69
89.35 104.88 85.00 1.47 … 45.81 35.00
… … … … … … …
74.01 93.57 23.31 9.01 … 3.03 41.38
56.01 0 11.05 84.77 … 9.16 45.77
Statistics Lecture Notes – Chapter 02
Graphical Techniques for Ordinal Data
Class Limits Frequency
0 to 15 71
15 to 30 37
30 to 45 13
45 to 60 9
60 to 75 10
75 to 90 18
90 to 105 28
105 to 120 14
Total 200
Histogram
Statistics Lecture Notes – Chapter 02
Graphical Techniques for Ordinal Data
Skewness
Measure of the degree of asymmetry of a frequency distribution
Skewed to left Skewed to right Symmetric or unskewed
Statistics Lecture Notes – Chapter 02
Graphical Techniques for Ordinal Data
KurtosisMeasure of flatness or peakedness of a frequency distribution
relatively peaked
relatively flat
normal
A stemplot (or stem-and-leaf display), in statistics, is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution.
Ex2.4 - The weights in pounds of a group of workers are as follows
Construct a stem and leaf display for these data
Statistics Lecture Notes – Chapter 02
Stem-and-Leaf Display (Stemplot)
173 165 171 175 188
183 177 160 151 169
162 179 145 171 175
168 158 186 182 162
154 180 164 166 157
The first step in constructing a stem and leaf display is to decide how to split each observation (weight) into two parts: a stem and a leaf.Thus, the first two weights are split into a stem and a leaf as follows:
Statistics Lecture Notes – Chapter 02
Stem-and-Leaf Display (Stemplot)
Weight Stem Leaf
173 17 3
183 18 3
Next, we consider each observation in turn and place its leaf in the same row as its stem, to the right of the vertical line. The resulting stem and leaf display shown below has grouped the 25 weights into five categories.
Stem Leaf
14 5
15 4 8 1 7
16 2 8 5 0 4 6 9 2
17 3 7 9 1 5 1 5
18 3 0 6 2 8
Is a graph of a cumulative frequency distribution.
We create an ogive in three steps…
1. Calculate relative frequencies.2. Calculate cumulative relative frequencies by
adding the current class’ relative frequency to the previous class’ cumulative relative frequency.
3. Draw the Ogive
(For the first class, its cumulative relative frequency is just its relative frequency)
Statistics Lecture Notes – Chapter 02
Ogive
Statistics Lecture Notes – Chapter 02
Ogive
Class Limits Relative Frequency
CumulativeRel. Freq.
0 to 15 71/200=0.355 0.355
15 to 30 37/200 = 0.185 0.540
30 to 45 13/200 = 0.065 0.605
45 to 60 9/200 = 0.045 0.650
60 to 75 10/200 = 0.050 0.700
75 to 90 18/200 = 0.090 0.790
90 to 105 28/200 = 0.140 0.930
105 to 120 14/200 = 0.070 1.00
Total 200
Ex2.5 – Draw an Ogive for the value given in ex2.3
Ogive for Telephone Bills
What telephone bill value is at the 50th percentile?
Tabular Method:
A Contingency table (Cross-Tab Classification table) is used to describe the relationship between two nominal variables.
Ex2.5 –To help advertising campaigns, the advertising managers of the newspapers need to know which segments of the newspaper market are reading their papers. Discuss the relationship between newspaper and occupation with contingency table given below
Statistics Lecture Notes – Chapter 02
Describing the Relationship Between Two Variables
Occupation
Newspaper Blue Collar White Collar Professional Total
G&M 27 39 33 89
Post 18 43 51 112
Star 38 21 22 81
Sun 37 15 20 72
Total 120 108 126 354
Graphic Method (Scatter Diagram):
We create a scatter diagram in three steps
1) Collect the data2) Determine the independent variable (X – house size)
and the dependent variable (Y – selling price)3) Use Excel to create a “scatter diagram”…
Ex2.6 - A real estate agent wanted to know to what extent the selling price of a home is related to its size. Use a graphical technique to describe the relationship between size and price
Statistics Lecture Notes – Chapter 02
Describing the Relationship Between Two Variables
Size 23 16 26 20 22 14 33 28 23 20 27 18
Price 315 229 355 261 234 216 308 306 289 204 265 195
It appears that in fact there is a relationship, that is, the greater the house size the greater the selling price
Statistics Lecture Notes – Chapter 02
Scatter Diagram
Linearity and Direction are two concepts we are interested in
Statistics Lecture Notes – Chapter 02
Scatter Diagram
Observations measured at the same point in time are called cross-sectional data. Observations measured at successive points in time are called time-series data.
Time-series data graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
Statistics Lecture Notes – Chapter 02
Describing Time Series Data
Q2.1 - Identify the type of data observed for each of the following variables.
a. the number of students in a statistics classb. the student evaluations of the professor (1 = poor, 5
= excellent)c. the political preferences of votersd. the states in the United States of Americae. the size of a condominium (in square feet)
Statistics Lecture Notes – Chapter 02
Exercises
Q2.2 - The salaries (in hundreds of dollars) of a sample of 40 government employees are as follows:
a. Construct a stem and leaf display for these data. (When leaves consist of two digits, they should be separated from one another by commas.)
b. Construct a frequency distribution for these data.c. Construct a relative frequency histogram for the data.d. Construct an ogive for the data.
Statistics Lecture Notes – Chapter 02
208 160 175 334 228 211 179 354
265 215 191 239 298 226 220 260
173 263 226 165 252 422 284 232
225 348 290 180 300 200 245 204
256 281 230 275 158 224 315 217
Exercises
Q2.3 - The number of men and women who have received an M.B.A. degree from a particular universityin each of five years is shown below:
a. Use a component bar chart to depict these datab. Use a line chart to depict these data
Statistics Lecture Notes – Chapter 02
Year Men Women
1988 74 12
1989 85 20
1990 90 32
1991 112 48
1992 128 67
Exercises
Q2.4 - The manager of a large furniture store wanted to know if the number of ads influenced the number of customers. During the past eight months, she kept track of both figures, which are shown below. Construct a scatter diagram for these data, and describe the relationship between the number of ads and the number of customers.
Statistics Lecture Notes – Chapter 02
Month Number of Ads (x) Number of Customer (y)
1 5 528
2 12 876
3 8 653
4 6 571
5 4 556
6 15 1058
7 10 963
8 7 719
Exercises
Keller, Gerald; Statistics for Management and Economics, 9e, 2012
Groebner, D.F.; Shannon, P.W., Fry, P.C, Smith, K.D; Business Statistics: A decision Making Approach, 7e, 2007
Azcel, A.D; Complete Business Statistics, 7e, 2009
Newbold, P., Carlson, W., Thorne B.; Statistics for Business and Economics, 6e, 2007
Ott, R.L., Longnecker, M.; An Introduction to Statistical Methods and Data Analysis, 6e, 2010
Black, K.; Business Statistics for Contemporary Decision Making, 6e, 2010
References
Statistics Lecture Notes – Chapter 01