Variable
An item of data
Examples:
– gender
– test scores
– weight
Value varies from one observation to another
Qualitative Data
Describes the quality
Non-numerical format
Counts
Cannot order or measure
Examples:
– gender
– marital status
– geographical region
– job title…
Categorical data
Non-overlapping categories or characteristics
Examples:
– Completes/Incompletes
– Professions
– Gender
Discrete
Measurements are integers
Examples:
– number of employees of a company
– number of incorrect answers on a test
– number of participants in a program…
Continuous
Measurements can take on any value - usually within some range
Examples:
– Age
– Income
Arithmetic operations such as differences and averages make sense.
Qualitative or Quantitative? Discrete or Continuous?
– Score on a placement exam
– Preferred restaurant
– Dollar amount of a loan
– Height
– Salary
– Length of time to complete a task
– Number of applicants
– Ethnic origin
Treatment as Ranks
Natural order
Not strictly measured
Examples:
– Age group
– Likert Scale data
Distinction between adjacent points on the scale is not necessarily the same
Analysis: Qualitative Data
Frequency tables
Modes - most frequently occurring
Graphs: Bar Charts and Pie Charts
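A frequency table and a mode for a qualitative variable can be sketched in a few lines of Python. The responses below are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical responses for one qualitative variable (marital status).
responses = ["single", "married", "married", "divorced", "single", "married"]

# Frequency table: number of observations in each category.
frequency_table = Counter(responses)

# Mode: the most frequently occurring category.
mode_category, mode_count = frequency_table.most_common(1)[0]

for category, count in frequency_table.most_common():
    print(f"{category:<10}{count:>3}")
print("Mode:", mode_category)
```

No averages are computed here, because arithmetic on category labels would not be meaningful.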
Analysis: Quantitative Data
Any form
Create groups or categories and generate frequency tables
All descriptive statistics
Effective Graphs: Quantitative Data
Histograms
Stem-and-Leaf plots
Dot Plots
Box plots
XY Scatter Plots (2 variables).
Stem and Leaf Plot: Weight of Meat
 7 | 5
 8 | 3
 8 | 7999
 9 | 23
 9 | 66789
10 |
10 | 688
11 | 2244
11 | 788
12 | 4
12 | 8
13 |
13 | 8
14 | 1
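One possible way to build a stem-and-leaf display is sketched below. It assumes non-negative integers, uses the last digit as the leaf, and prints one row per stem. The function name and the sample weights are hypothetical, not taken from the slide.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display rows: stem = all digits but the last, leaf = last digit."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    rows = []
    # Walk every stem in the range so that empty stems stay visible as gaps.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows

weights = [75, 83, 87, 92, 93, 96, 106, 108, 112, 124]  # hypothetical data
for row in stem_and_leaf(weights):
    print(row)
```

Unlike a histogram, the display keeps every individual value readable while still showing the shape of the distribution.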
Analyze Ranked Data
Frequency tables
Mode, Median, Quartiles
Graphs:
– Bar Charts
– Dot Plots, Pie Charts
– Line Charts (2 variables)
Data Example
Suggest some ways you could analyze these items.
– Score on a placement exam
– Preferred restaurant
– Dollar amount of a loan
– Height
– Salary
– Length of time to complete a task
– Number of applicants
– Ethnic origin
Tables and Graphs
Note: Excel will create any graph that you specify.
Consider the type of data before selecting your graph.
Frequency Table/Frequency Distribution
Summarize data:
– categorical
– nominal
– continuous data - the data set has been divided into meaningful groups
Frequency Distribution
Count the number of observations that fall into each category.
Frequency: the number associated with each category
Relative Frequency Distribution
Proportion of observations falling in a given category
Report relative frequencies or percentages
Example: Frequency Distribution
No. of Defective Parts    n
 0 to < 2                 0
 2 to < 4                 4
 4 to < 6                 5
 6 to < 8                12
 8 to < 10               14
10 to < 12                9
12 to < 14               10
14 to < 16                6
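The counts in the defective-parts example can be converted to relative frequencies by dividing each by the total. The sketch below reads the example as eight classes of width 2 with counts 0, 4, 5, 12, 14, 9, 10, and 6. The class boundaries are an interpretation of the table layout.

```python
# Class intervals (lower bound included, upper bound excluded) and their counts,
# as read from the defective-parts example.
classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14), (14, 16)]
frequencies = [0, 4, 5, 12, 14, 9, 10, 6]

total = sum(frequencies)

# Relative frequency: proportion of all observations that fall in each class.
relative = [count / total for count in frequencies]

for (low, high), count, share in zip(classes, frequencies, relative):
    print(f"{low:>2} to < {high:<3}{count:>4}{share:>8.1%}")
```

The relative frequencies sum to 1, which is a quick check that no observation was dropped or counted twice.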
Pie Charts
Circle - divided proportionately
Segment - percentage of the whole that falls into each category
[Figure: pie chart "Native Language": English 55%, Spanish 25%, Vietnamese 15%, Swedish 5%]
Bar Charts
Bar charts - % in various categories
Vertical scale - frequencies, relative frequencies
Horizontal scale - categories
Allows comparisons
[Figure: bar chart "Average Units Sold (per person) by Product"; horizontal axis: Product (B41, BA42, B41F, C21, Other); vertical axis: Average Sold/Person, 0 to 20; series: Before Training, After Training]
Constructing Bar Charts
All bars should have the same width
Gaps between the bars - no connection between categories
Any order
Can be used to represent two categorical variables simultaneously
Graphs: Measured, Continuous Quantitative Data
Histograms
Stem and Leaf
Box plots
Line Graphs
XY Scatter Charts (2 variables)
Histograms
Frequency distributions of continuous variables
Drawn without gaps between the bars
[Figure: histogram "Grade Distribution"; horizontal axis: Grade (59, 69, 79, 89, 99); vertical axis: Frequency, 0 to 12]
Constructing Histograms
Non-overlapping intervals
Intervals - generally the same length
Number of values in each interval - class frequency
Relative frequencies
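The construction rules above can be sketched as a small binning routine: equal-width, non-overlapping intervals, with each value counted in exactly one class. The function name, the grades, and the class limits are hypothetical.

```python
def class_frequencies(values, start, width, n_classes):
    """Count values in n_classes equal-width intervals [low, low + width)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:  # values outside the intervals are ignored
            counts[index] += 1
    return counts

grades = [52, 61, 67, 70, 74, 75, 78, 81, 83, 88, 90, 95]  # hypothetical data
counts = class_frequencies(grades, start=50, width=10, n_classes=5)

# Text histogram: one row per class, one mark per observation.
for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count}")
```

Dividing each count by the number of grades would give the relative frequencies for the same classes.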
XY Scatter Chart
Two variables
Variables: quantitative and continuous
Plot pairs - rectangular coordinate system
Examine the relationship between two variables
[Figure: scatter chart "Absent by Age"; horizontal axis: Age, 0 to 70; vertical axis: Days Absent, 0 to 20]
Line Chart
Similar to the scatter chart
Values of the independent variable (shown on the horizontal axis) can be ranked values (i.e., they do not have to be continuous variables).
[Figure: line chart "1997 Monthly Sales"; horizontal axis: Month (Jan to June); vertical axis: Sales (x $10,000), 125 to 170]
Basic Principles for Constructing All Plots
Data should stand out clearly from background
The information should be clearly labeled
– title
– axes, bars, pie segments, etc. - include units that are needed to interpret data
– scale including starting points
Describing Data
Shape of the Distribution
– Symmetry
– Skewness
– Modality: most frequently occurring value
– Unimodal or bimodal or uniform
[Figure: three histograms of Grade (59 to 99) against Frequency (0 to 12), labeled Right Skewed, Left Skewed, and Symmetrical]
Mean
Most common measure of central tendency
Extremely large values in a data set will increase the value of the mean
Extremely low values will decrease it
Median
Central point
Half of the data has a lower value than the median
Half of the data has a higher value than the median
Not affected by extremely large or small values
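The different sensitivity of the mean and the median to extreme values can be seen in a short example. The salary figures are hypothetical.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]   # hypothetical values, in $1,000s
with_outlier = salaries + [400]   # the same data plus one extremely large value

print("Without outlier: mean", mean(salaries), "median", median(salaries))
print("With outlier:    mean", round(mean(with_outlier), 1),
      "median", median(with_outlier))
```

One extreme value pulls the mean far above most of the data, while the median moves only from 35 to 36.5.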
Range
Subtract the smallest value from the largest
Report the smallest and largest values.
Scores: 85 90 75 92 95
Range: 75 to 95, or 20
Variance/Standard Deviation
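Measure the average variation of the data values from the mean of the values
Variance: the average squared deviation from the mean
Standard deviation: the square root of the variance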
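As a worked illustration, the range, variance, and standard deviation can be computed for the five scores from the range example (85, 90, 75, 92, 95). The slide does not say whether the population form (divide by n) or the sample form (divide by n - 1) is intended, so both are shown.

```python
from math import sqrt

scores = [85, 90, 75, 92, 95]  # the scores from the range example

data_range = max(scores) - min(scores)

n = len(scores)
mean = sum(scores) / n
squared_deviations = [(x - mean) ** 2 for x in scores]

# Population form divides by n; the sample form divides by n - 1.
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

print("Range:", min(scores), "to", max(scores), "or", data_range)
print("Mean:", mean)
print("Population variance:", round(population_variance, 2))
print("Population standard deviation:", round(sqrt(population_variance), 2))
print("Sample variance:", round(sample_variance, 2))
print("Sample standard deviation:", round(sqrt(sample_variance), 2))
```

The standard deviation is in the same units as the scores, which is why it is usually the figure reported.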
The Empirical Rule
Symmetrical (bell-shaped) data. Approximately:
68% of the data values are within one standard deviation of the mean
95% of the data values are within two standard deviations of the mean
99.7% of the data values are within three standard deviations of the mean
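For a normal (bell-shaped) distribution, the share of values within one, two, and three standard deviations of the mean can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data that are only roughly bell-shaped will match these shares only approximately.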
Chebyshev's Inequality
Any data, including skewed data. At least:
75% of the data values are within two standard deviations of the mean
88.9% of the data values are within three standard deviations of the mean
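Chebyshev's inequality guarantees that at least 1 - 1/k² of the values lie within k standard deviations of the mean, for any k greater than 1 and any shape of distribution. The sketch below computes that bound and compares it with a hypothetical right-skewed sample. The function names are illustrative.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def observed_share(values, k):
    """Observed share of values within k population standard deviations of the mean."""
    center, spread = mean(values), pstdev(values)
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)

skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]  # hypothetical right-skewed sample

for k in (2, 3):
    print(f"k = {k}: bound {chebyshev_lower_bound(k):.1%}, "
          f"observed {observed_share(skewed, k):.1%}")
```

The bound is a guaranteed minimum, so the observed share is usually well above it.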
Quartiles
The lower quartile is the same as the 25th percentile.
– 25% of the scores are lower and
– 75% of the scores are higher than the lower quartile.
The upper quartile is the same as the 75th percentile.
– 75% of the scores are lower and
– 25% of the scores are higher than the upper quartile.
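Quartiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8 or later). The scores are hypothetical. Several conventions for quartiles exist, so other tools, Excel included, may return slightly different cut points for the same data.

```python
from statistics import quantiles

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105]  # hypothetical data

# n=4 asks for the three cut points that split the data into four equal parts.
lower_quartile, middle, upper_quartile = quantiles(scores, n=4)

print("Lower quartile (25th percentile):", lower_quartile)
print("Median (50th percentile):", middle)
print("Upper quartile (75th percentile):", upper_quartile)
```

The middle cut point is the median, so the same call also covers the 50th percentile.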
Correlation
Describes the strength of the relationship
between two (or more) variables
Pearson Product-moment Correlation
Coefficient - assumes continuous
quantitative data
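A minimal sketch of the Pearson product-moment correlation coefficient follows. It assumes two equal-length lists of quantitative values. The ages and days absent are hypothetical and only echo the kind of data shown in the scatter chart.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

ages = [25, 30, 35, 40, 45, 50]     # hypothetical data
days_absent = [4, 6, 5, 9, 8, 11]   # hypothetical data

r = pearson_r(ages, days_absent)
print(f"r = {r:.3f}")
```

The coefficient always lies between -1 and 1. The sign gives the direction of the relationship and the magnitude gives its strength.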
Interpreting Correlation Coefficients.
0.20 to 0.35 - show a slight relationship (little value in practical prediction situations)
0.50 - crude group prediction (correlations this low do not suggest a good relationship)
0.65 to 0.85 - group predictions that are good
Over 0.85 - a close relationship between the two variables
Coefficient of Determination
Square of the correlation coefficient
Gives the percent of variation in the dependent variable that is ‘explained’ by the independent variable.
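The coefficient of determination is obtained by squaring the correlation coefficient. The sketch below applies that to the correlation values used as interpretation thresholds earlier (0.20, 0.35, 0.50, 0.65, 0.85). The function name is hypothetical.

```python
def coefficient_of_determination(r):
    """Share of variation in the dependent variable 'explained' by the independent one."""
    return r ** 2

for r in (0.20, 0.35, 0.50, 0.65, 0.85):
    share = coefficient_of_determination(r)
    print(f"r = {r:.2f} -> r squared = {share:.4f} ({share:.0%} of variation explained)")
```

A correlation of 0.50 explains only a quarter of the variation, which fits the description of correlations at that level as suited to crude group prediction only.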
Look at an XY scatter plot
Least Squares Line
Describe the relationship between the two
variables
Make predictions of the dependent variable
from the independent variable
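A least squares line can be fitted with two formulas: the slope is the co-variation of x and y divided by the variation of x, and the intercept makes the line pass through the point of means. The sketch below is a minimal version under those formulas. The data and the prediction at age 42 are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing squared vertical errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

ages = [25, 30, 35, 40, 45, 50]     # independent variable (hypothetical data)
days_absent = [4, 6, 5, 9, 8, 11]   # dependent variable (hypothetical data)

slope, intercept = least_squares_line(ages, days_absent)
predicted = intercept + slope * 42  # predicted days absent at age 42

print(f"days absent = {intercept:.2f} + {slope:.3f} * age")
print(f"Predicted at age 42: {predicted:.1f}")
```

Predictions are most trustworthy inside the range of the observed independent variable. Extending the line well beyond that range is an extrapolation.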