Skyline Technologies presents
StatPad™
Quick and Easy Data Analysis Using Excel
[Cover chart: Amazon Revenue ($billions) plotted against Time, 2004 to 2014, showing Revenue, Trend, and Forecast]
Copyright 1988, 1997, 2000, 2003, 2011 by Skyline Technologies, Inc.
StatPad is a trademark of Skyline Technologies, Inc.
Excel is a registered trademark of Microsoft Corporation.
Table of Contents
What is StatPad?
How to Install StatPad
How to Use StatPad
Overview of StatPad Features
One-Sample Analysis
    Summaries
    Histogram
    Histogram (With Customized Bin Width and Landmark)
    Box Plot
    Cumulative Distribution
    Confidence Interval
    Confidence Interval (One-Sided, 99%)
    Hypothesis Test
    Hypothesis Test (One-Sided)
    Percentile
    Percentile Ranking
Sampling
    Random Sample Without Replacement
    Random Sample With Replacement
    Uniform Distribution
    Normal Distribution
    Binomial Distribution
    Binomial Percentages
Probability Calculations
    Normal Probability (Greater Than)
    Normal Probability (Between)
    Binomial Probability (Equal to)
    Binomial Probability (This or Less)
    Binomial Percent (Equal to)
    Binomial Percent (Between)
    Poisson Probability (Equal to)
    Poisson Probability (This or Less)
    Exponential Probability (This or More)
    Exponential Probability (Between)
    Discrete Probability
Two-Sample Analysis
    Summaries
    Histograms
    Box Plots
    Confidence Interval
    Hypothesis Test
Many-Sample Analysis
    Summaries
    Histograms
    Box Plots
    F Test for One-Way ANOVA
    Mean Differences
Bivariate Analysis
    Scatterplot
    Scatterplot with Least-Squares Line
    Correlation
    Correlation with Test
    Regression
    Predicted and Residuals
    Univariate Summaries
    Histograms
    Box Plots
Multivariate Analysis and Multiple Regression
    Scatterplots
    Correlations
    Multiple Regression
    Predicted and Residuals
    Diagnostic Plot
    Univariate Summaries
    Histograms
    Box Plots
Time-Series Analysis
    Trend-Seasonal
    Forecast with Series
    Moving Average (Smooth)
    Seasonal Index
    Seasonally Adjusted Series
    Long-Term Trend
    Seasonalized Trend
    A Combination: Data Series With Long-Term Trend and Forecast
    Numeric Output
Quality Control
    X-Bar, R Charts (No Standard Given)
    X-Bar, R Charts (Standard Given)
    Percentage or Count Chart (No Standard Given)
    Percentage or Count Chart (Standard Given)
What is StatPad?
Welcome to StatPad¹, a software system designed for people who wish to perform statistical
analysis within their Microsoft Excel² computer spreadsheets. StatPad was designed to make
statistical analysis as accessible, painless, and easy to understand as possible by bringing basic
statistical analysis and its interpretation into the environment where business and other data are
often found: namely within an Excel spreadsheet. Whenever possible, the analysis is guided by
choices from a dialog box that adapts itself automatically to your situation. The results,
consisting of charts, explanatory text, and computations, then become part of your worksheet.
StatPad will perform all aspects of basic statistics: design using a random sample, exploration
through graphic representations of data, estimation with summaries and confidence intervals
(both one- and two-sided at various confidence levels), hypothesis testing, normal and binomial
probability calculations, multiple regression analysis, trend-seasonal time series analysis, and
statistical quality control charts.
Here's how to get started if you are in a hurry: after you open the file STATPAD.XLA, you will
find StatPad listed under Excel's Add-Ins Ribbon (or the Tools menu for older versions of Excel),
ready for you to select. When selected, StatPad greets you with its main dialog box, ready for
analysis.
¹ StatPad is a trademark of Skyline Technologies, Inc.
² Excel is a registered trademark of Microsoft Corporation.
How to Install StatPad
All you need in order to run StatPad is a computer running Microsoft Excel for Windows. There
are two ways to install StatPad, depending upon whether or not you want StatPad to be there
automatically whenever you work in Excel. Please begin by copying the file STATPAD.XLA to a
folder on your computer.
If you wish StatPad to be available automatically when you run Excel:
1. In Excel, choose File/Options, select Add-Ins at the left, wait a moment, then choose
"Go" near the bottom to manage Excel Add-Ins (Excel 2007 users will start by clicking
on the Office Button at the top left, then choose Excel Options at the bottom before continuing
by selecting Add-Ins at the left and choosing "Go").
2. Browse to the folder where you put the file STATPAD.XLA, select the file, and click OK.
3. Be sure the StatPad entry is checked in the list of add-ins, then choose OK.
4. StatPad will be available in the Add-Ins Ribbon near the top (or Tools menu for older
versions of Excel).
If you wish to load StatPad manually each time you open Excel:
Either double-click the file STATPAD.XLA or use Excel's File Open menu commands to
open this file from its folder on your computer. Choose Enable Macros if necessary.
The choice StatPad will then be available under Excel's Add-Ins Ribbon (or Tools menu
for older versions of Excel). StatPad will remain available until you close Excel.
If you need to change Excel's macro security level, you will find this at File / Options /
Trust Center / Trust Center Settings / Add-Ins.
How to Use StatPad
Here's how to use StatPad:
1. Get into Excel and bring your data (if any) into the worksheet.
2. If StatPad has already been installed, simply select StatPad from Excel's Add-Ins Ribbon
near the top of the screen to begin statistical analysis.
If StatPad has not yet been installed, either open the file STATPAD.XLA using Excel's File
Open menu command near the top of the screen or read the previous section, "How to
Install StatPad," to see how to make StatPad available whenever you are in Excel.
3. You will see StatPad's main dialog box, ready to guide you through the analysis:
4. Select a situation from the list near the top left (One Sample, Sampling, Probability, Two
Sample, Many Sample, Bivariate, Multivariate, Time Series, or Quality Control).
5. Select the analysis you want from the list near the top right. Note that this analysis list
changes automatically for you, depending on the situation you choose. For a One Sample
situation, the analysis choices are Summaries, Histogram, etc. But if you select
Probability instead, the analysis choices instantly change to Normal Probability, Binomial
Probability, and Binomial Percent.
6. Give StatPad the additional information it needs. StatPad will automatically change to
show you what is needed, so you may fill in the blanks as they appear. For One Sample,
Summaries, you need to give StatPad a data set name and an output range. For One
Sample, Confidence Interval, so that you can tell StatPad which confidence level you
wish, an edit box will appear automatically for this purpose (you may also decide to
choose a one-sided interval). Here's how the main dialog box changes:
For a multiple regression analysis, StatPad's main dialog changes again (automatically!)
allowing you to select the X variables (for example, income, percent male, and
readership) to use to explain the Y variable (for example, the cost of a full-page color
magazine ad).
7. Here's how to select your data set(s) from the list(s). StatPad puts into its lists each Excel
range name that identifies a single column of numbers.³ When you name your data with
StatPad, the name also becomes an Excel range name.
a. If just one data set is needed (e.g., for one-sample analysis), you may choose one of
the following:
i. Click on its name, in the list.
or
ii. Type its name into the edit-box, just above the list.
or
iii. Click on the edit-box, just above the list, and then drag in the worksheet with the
mouse to identify your column of numbers. This is useful for a quick analysis
when you do not care to use a name to identify the data.
b. If more than one data set can be specified (e.g., many-sample analysis, or the X
variables for a multiple regression), you may choose one of the following:
i. Click on each name that you wish to select, scrolling up and down as needed. If
you click again on a selected name, it is unselected (be careful not to click quickly
twice on the same name; Excel will interpret this as a double-click and StatPad
will immediately begin the analysis).
or
ii. Move through the list using the cursor (arrow) keys, selecting and unselecting by
hitting the spacebar.
8. If your data are in the worksheet, but are not offered to you as a choice⁴ in StatPad's lists,
here's how to proceed:
a. Click on StatPad's Add Data button (at the right, just above the middle of StatPad's
main dialog box) to put a data set name into the list. You then see the following
dialog box, and you may drag with the mouse to select the data (one column of
numbers) and specify the name you want. This name will then appear in StatPad's
lists along with the other data sets.
³ Here's a quick way to find out the name (if any) associated with a list of numbers. Highlight the list (drag with the
mouse), then look for the name in Excel's Name Box near the top left corner of the worksheet. StatPad limits the size
of each list to a maximum of 65,000 numbers.
⁴ If you have used Excel to name a column of numbers (e.g., with Excel's Insert Name Define menu items), this name
will appear automatically in StatPad's list. When you name a column of numbers within StatPad, this name also
becomes an Excel range name for your data. Names can be deleted using Excel's Insert Name Define Delete menu
items.
Here's how the screen might look after you (1) click in the Range box of the above
dialog box, (2) highlight your data in the worksheet, (3) click in the Name box of the
dialog box, and (4) type in the name ("Prices" for this example, but please don't use
spaces or special characters):
b. Alternatively, you may feel free to type a name for the data set into the edit-box in
StatPad's main dialog box, even if that name is not proposed for you. This can be
done whenever only one data set can be used for the chosen situation (but please don't
use spaces or special characters in the name). Once you begin the analysis (by hitting
Enter, clicking on Do It, or double-clicking), StatPad will ask you to select the column of
numbers you want, using the following dialog box. After this, the name will
automatically show up in StatPad's data set lists.
9. Use the Output Range box at the lower right of the main dialog box to tell StatPad where
to put the results.
a. If you've asked for a chart:
i. If you provide a single cell as the Output Range, then StatPad will place a chart of
the default size with upper-left corner at this cell.
ii. If you provide a rectangular range of cells as the Output Range, then StatPad will
make the chart the same size as your range.
b. If you've asked for numbers and text:
i. If there is enough room without erasing any of your data, StatPad will place the
upper-left cell of the output at the Output Range you specified.
ii. If your results would overwrite any of your data, StatPad will give you the option
of either specifying a different Output Range, or (use caution!) going ahead and
erasing some of your data to make room for the results if you wish.
10. After StatPad performs the analysis you requested (or asks for clarification, if needed),
you will again find the StatPad main dialog box on your screen, ready for further analysis.
You may either continue your analysis with StatPad, or leave StatPad (select Cancel or
hit the Esc key) to return control to Excel and your worksheet.
11. Because StatPad's results are part of your Excel spreadsheet, you can format them after
leaving the StatPad dialog box by hitting the Esc key or selecting Cancel.
a. You can select and format numbers in individual cells as you ordinarily would in
Excel (for example, using the Number Group of the Home Ribbon). For example, you
can format with dollar signs, set the number of decimal places, format as percentages,
etc.
b. You can customize StatPad's charts as you would for any Excel chart. For example,
you might select the chart and then use the Chart Tools Ribbons (Design, Layout, and
Format) at the top of the Excel window. Another method would be to double-click the
part of the chart you wish to change, for example the x axis, to bring up the relevant
formatting options. You might then choose to set the scale under Axis Options (e.g., to
change the minimum and/or maximum) or select Number (e.g., to change the number
formatting).
12. You can copy StatPad's results to your word processor, after leaving the StatPad dialog
box by hitting the Esc key or selecting Cancel.
a. To copy text and numbers to your word processor, proceed as follows:
i. Highlight your cell(s) and choose Copy from the Clipboard Group of the Home
Ribbon.
ii. Activate your word processor and move the cursor to where you want the results
to go.
iii. Depending upon your word processor, you may wish to paste as unformatted text.
The text then becomes part of the text document and you may format it as you
like. For example, with Microsoft Word 2010, you might click on the word
"Paste" in the Clipboard Group of the Home Ribbon, then choose Paste Special
from the Paste Options, to obtain the Unformatted Text choice.
b. To copy charts to your word processor, proceed as follows:
i. Click on the edge of a chart (just one at a time) to select it, then choose Copy from
the Clipboard Group of Excel's Home Ribbon.
ii. Activate your word processor and move the cursor to where you want the chart to
go.
iii. Depending upon your word processor, you may wish to paste as a Picture
(instead of as an Excel object). The chart then becomes part of the text document
and you would be able to place and size it using your word processor's
commands. For example, with Microsoft Word 2010, you might click on the
word "Paste" in the Clipboard Group of the Home Ribbon, then choose Paste
Special from the Paste Options, to obtain the Picture (Enhanced Metafile)
choice.
13. For more information about statistical analysis, its applications and interpretation, please
consult a book such as Practical Business Statistics by Andrew F. Siegel (Elsevier /
Academic Press, sixth edition, 2012).
Overview of StatPad Features
StatPad's statistical analyses are grouped into the following situations:
One Sample
Sampling
Probability
Two Sample
Many Sample
Bivariate
Multivariate
Time Series
Quality Control
These situations are presented in a list at the left in StatPad's main dialog box. When you select a
situation, the appropriate analyses are available in a list to the right in this dialog box. When you
select a situation and analysis, an explanation also appears in the dialog box and the dialog box
changes to allow you to specify what is needed for the analysis (e.g., a confidence level). Here is
a list of the situations, analyses, and explanations available within StatPad. More details about
each one, with an example, are given on the pages that follow.
One Sample
Summaries
    Compute statistical summaries for the data: count, average or mean, median,
    smallest, largest, quartiles, standard deviation, and standard error.
Histogram
    Draw a histogram to explore the data, showing the shape of the distribution,
    typical values, variability, and outliers. Data are concentrated where the
    histogram bars are high. Check 'Customize' to specify optional bin width and
    landmark point.
Box Plot
    Draw a box plot to explore the data, showing the 5-number summary (smallest,
    lower quartile, median, upper quartile, and largest). In the ordinary box plot, a
    line extends from the box on each side to the most extreme value. Check
    'Detailed box plot' to indicate outliers separately and have the lines extend from
    the box on each side to the most extreme value (adjacent value) that is not an
    outlier.
Cumulative Distribution
    Draw a cumulative distribution function for the data, showing the percentage of
    data values less than each given number. This shows you the percentiles.
Confidence Interval
    Compute a confidence interval for the population mean. This is statistical
    inference about the population, based on random sampling. Two-sided or
    one-sided interval, with your chosen confidence level.
Hypothesis Test
    Test the null hypothesis that the population mean is equal to a given reference
    value. This is statistical inference about the population, based on random
    sampling. Two-sided or one-sided testing (Student's t test) is used.
Percentile
    Given a percentage, find the percentile value. This data value has approximately
    this percentage of the data values smaller than it.
Percentile Ranking
    Find the percentage ranking for a given value. This is the approximate
    percentage of data values that are less than the given value.
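The One Sample summaries and the percentile ranking described above follow standard formulas, so a short Python sketch may help if you want to check StatPad's numbers by hand. The function names and the sample data are illustrative and are not part of StatPad. The quartile rule is an assumption (Python's default "exclusive" method), so quartiles may differ slightly from the ones StatPad reports.

```python
import math
import statistics


def summaries(data):
    """Return the one-sample summaries named in the overview."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    q1, _, q3 = statistics.quantiles(data, n=4)  # assumed quartile rule
    return {
        "count": n,
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "smallest": min(data),
        "largest": max(data),
        "lower quartile": q1,
        "upper quartile": q3,
        "standard deviation": sd,
        "standard error": sd / math.sqrt(n),  # standard error of the mean
    }


def percentile_ranking(data, value):
    """Percentage of data values that are less than the given value."""
    return 100.0 * sum(x < value for x in data) / len(data)


prices = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
result = summaries(prices)
ranking = percentile_ranking(prices, 5)
```

Here the standard error is the standard deviation divided by the square root of the count, which is the usual standard error of the mean.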
Sampling
Sample Without Replacement
    Select a random sample from a larger population, without replacement so that
    no item can be selected more than once. All population items are equally likely
    to appear in the sample, and they are chosen independently of one another.
Sample With Replacement
    Select a random sample from a larger population, with replacement so that an
    item may be selected more than once. All population items are equally likely to
    appear in the sample, and they are chosen independently of one another.
Uniform Distribution
    Select a random sample from a uniform distribution, where all values are
    equally likely between the smallest and largest possible value. By specifying a
    name, you will be able to easily use the result later.
Normal Distribution
    Select a random sample from a normal distribution, given the mean and
    standard deviation. By specifying a name, you will be able to easily use the
    result later.
Binomial Distribution
    Select a random sample from a binomial distribution (the number of
    occurrences) given the number of trials and the probability of occurrence. By
    specifying a name, you will be able to easily use the result later.
Binomial Percentages
    Select a random sample of binomial percentages, given the number of trials and
    the probability of occurrence. By specifying a name, you will be able to easily
    use the result later.
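To make the Sampling choices above concrete, here is a minimal Python sketch of each kind of random sample, using only the standard `random` module. It illustrates the ideas and is not StatPad's own random number generator. The population, sample sizes, seed, and distribution parameters are all made-up values.

```python
import random

rng = random.Random(12345)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical population of 100 item numbers

# Without replacement: no item can appear more than once.
without_replacement = rng.sample(population, k=10)

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Uniform distribution between a smallest and a largest possible value.
uniform_sample = [rng.uniform(0.0, 1.0) for _ in range(10)]

# Normal distribution with a given mean and standard deviation.
normal_sample = [rng.gauss(100.0, 15.0) for _ in range(10)]


def binomial_count(n_trials, p):
    """Number of occurrences in n_trials independent trials with probability p."""
    return sum(rng.random() < p for _ in range(n_trials))


# Binomial counts, and the same counts expressed as percentages of the trials.
binomial_sample = [binomial_count(20, 0.3) for _ in range(10)]
binomial_percentages = [100.0 * count / 20 for count in binomial_sample]
```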
Probability
Normal Probability
    Probabilities for a normal distribution: the symmetric bell-shaped curve, given a
    mean and a positive standard deviation.
Binomial Probability
    Probabilities for a binomial distribution: the number of occurrences out of a
    given number of independent trials with a given probability.
Binomial Percent
    Probabilities for a binomial percentage, given the number of independent trials
    and the probability for each trial.
Poisson Probability
    Probabilities for a Poisson distribution: the number of random occurrences
    where the rate is fixed, given the mean number. For example, the number of
    orders you will receive next week, if orders occur at a constant rate with an
    average of 5 per week.
Exponential Probability
    Probabilities for an exponential distribution: a highly skewed distribution with
    no memory, given the mean. For example, the length of a telephone call or the
    time until the next customer arrives where the mean is 9 minutes.
Discrete Probability
    Mean (expected value) and standard deviation for a discrete random variable,
    given a set of values and their associated probabilities.
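The probability calculations above can be sketched with textbook formulas in a few lines of Python. The function names are illustrative (they loosely echo StatPad's analysis names but are not StatPad functions), and the two worked values reuse the examples from the text: orders averaging 5 per week, and a waiting time with a mean of 9 minutes.

```python
import math
from statistics import NormalDist


def normal_greater_than(x, mean, sd):
    """P(X > x) for a normal distribution."""
    return 1.0 - NormalDist(mean, sd).cdf(x)


def normal_between(a, b, mean, sd):
    """P(a < X < b) for a normal distribution."""
    dist = NormalDist(mean, sd)
    return dist.cdf(b) - dist.cdf(a)


def binomial_equal_to(k, n, p):
    """P(exactly k occurrences in n independent trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_this_or_less(k, n, p):
    """P(k or fewer occurrences in n independent trials)."""
    return sum(binomial_equal_to(i, n, p) for i in range(k + 1))


def poisson_equal_to(k, mean):
    """P(exactly k occurrences) when the mean number of occurrences is given."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def exponential_this_or_more(t, mean):
    """P(T >= t) for an exponential distribution with the given mean."""
    return math.exp(-t / mean)


def discrete_mean_sd(values, probabilities):
    """Mean (expected value) and standard deviation of a discrete random variable."""
    mu = sum(v * p for v, p in zip(values, probabilities))
    variance = sum(p * (v - mu) ** 2 for v, p in zip(values, probabilities))
    return mu, math.sqrt(variance)


# Probability of exactly 5 orders next week when the average is 5 per week.
five_orders = poisson_equal_to(5, 5)
# Probability that the next customer takes 9 minutes or more when the mean is 9.
long_wait = exponential_this_or_more(9, 9)
```

A binomial percent is the binomial count divided by the number of trials, so a probability about a percentage can be found by converting the percentage back to a count first.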
Two Samples
Summaries
    Compute univariate summaries for each data set. Also find the average
    difference and its standard error. If sample sizes are identical, you may indicate
    that a pair of measurements was made on each item.
Histograms
    Draw a histogram for each data set, for data exploration.
Box Plots
    Draw a box plot for each data set, for data exploration, using the same scale for
    comparison. In the ordinary box plot, a line extends from the box on each side
    to the most extreme value. Check 'Detailed box plot' to indicate outliers
    separately and have the lines extend from the box on each side to the most
    extreme value (adjacent value) that is not an outlier.
Confidence Interval
    Compute a confidence interval for the population mean difference. This is
    statistical inference. Two-sided interval, with chosen confidence level. If
    sample sizes are identical, you may indicate that a pair of measurements was
    made on each item.
Hypothesis Test
    Test the null hypothesis that the population mean difference is zero. This is
    statistical inference. Two-sided testing using Student's t test. If sample sizes are
    identical, you may indicate that a pair of measurements was made on each item.
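The difference between the paired and unpaired choices above shows up in the standard error of the average difference. The Python sketch below illustrates both cases. The unpaired formula used here (the square root of the sum of each sample variance divided by its sample size) is an assumption, since this overview does not say which formula StatPad uses. The function names and data are hypothetical.

```python
import math
import statistics


def unpaired_difference(sample1, sample2):
    """Average difference and its standard error for two independent samples."""
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    # Assumed large-sample formula: sqrt(s1^2 / n1 + s2^2 / n2).
    se = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    return diff, se


def paired_difference(sample1, sample2):
    """Average difference and its standard error when each item was measured twice."""
    if len(sample1) != len(sample2):
        raise ValueError("a paired analysis needs identical sample sizes")
    diffs = [a - b for a, b in zip(sample1, sample2)]
    return statistics.mean(diffs), statistics.stdev(diffs) / math.sqrt(len(diffs))


before = [5, 7, 9]  # hypothetical first measurement on each item
after = [4, 5, 6]   # hypothetical second measurement on the same items

unpaired = unpaired_difference(before, after)
paired = paired_difference(before, after)
```

With these numbers the average difference is 2 either way, but the paired standard error is smaller because pairing removes the item-to-item variability.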
Many Samples
Summaries
    Select as many data sets as you wish. Compute univariate summaries for each.
Histograms
    Draw a histogram for each sample, for data exploration.
Box Plots
    Draw a box plot for each sample, for data exploration, using the same scale for
    comparison. In the ordinary box plot, a line extends from the box on each side
    to the most extreme value. Check 'Detailed box plot' to indicate outliers
    separately and have the lines extend from the box on each side to the most
    extreme value (adjacent value) that is not an outlier.
F Test
    One-way analysis of variance (ANOVA). Test the null hypothesis that the
    population means are all identical. This is statistical inference.
Mean Differences
    Confidence intervals and hypothesis tests for the difference of each pair of
    population means (least-significant-difference test). This is statistical inference.
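The F statistic for one-way ANOVA compares the variability between the sample averages with the variability within the samples. A minimal Python sketch of the textbook calculation follows. The function name and the three small samples are hypothetical, and the sketch stops at the F statistic and its degrees of freedom rather than computing a p-value.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F statistic, numerator df, denominator df) for one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-sample variability: how far each sample mean is from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-sample variability: how far each value is from its own sample mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical data sets
f_statistic, df_between, df_within = one_way_anova_f(samples)
```

A large F value suggests that the population means are not all identical; the significance decision would come from comparing F with the F distribution for these degrees of freedom.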
Bivariate
Scatterplot
    Draw a scatterplot to explore the relationship between two variables.
Scatterplot with Line
    Draw a scatterplot with least-squares line to explore the relationship between
    two variables.
Correlation
    Find the strength of the relationship between two variables as a pure number
    where 1 indicates a perfect increasing relationship, -1 a perfect decreasing
    relationship, and 0 suggesting no relationship.
Correlation with Test
    Find and test the strength of the relationship between two variables. This is
    statistical inference.
Regression
    Predict the dependent Y variable from the independent X variable using a
    straight-line relationship.
Predicted and Residuals
    Predicted values of Y based on X, the residual differences (Actual Y - Predicted
    Y), and the standardized residuals.
Univariate Summaries
    Compute univariate summaries for each variable.
Histograms
    Draw a histogram for each variable, for data exploration.
Box Plots
    Draw a box plot for each variable, for data exploration. In the ordinary box plot,
    a line extends from the box on each side to the most extreme value. Check
    'Detailed box plot' to indicate outliers separately and have the lines extend from
    the box on each side to the most extreme value (adjacent value) that is not an
    outlier.
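For the Bivariate analyses above, the correlation, the least-squares line, and the residuals (Actual Y minus Predicted Y) all come from the same few sums. Here is a minimal Python sketch of those textbook formulas. The function names and data are hypothetical, and standardized residuals are left out because this overview does not state how StatPad scales them.

```python
import math


def least_squares(x, y):
    """Return (slope, intercept, correlation) for the straight line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


def predicted_and_residuals(x, y):
    """Predicted values of y and the residuals (actual y minus predicted y)."""
    slope, intercept, _ = least_squares(x, y)
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, residuals


x_values = [1, 2, 3, 4]   # hypothetical X variable
y_values = [3, 5, 7, 10]  # hypothetical Y variable

slope, intercept, correlation = least_squares(x_values, y_values)
predicted, residuals = predicted_and_residuals(x_values, y_values)
```

The residuals from a least-squares line with an intercept always sum to zero, which is a quick check on the arithmetic.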
Multivariate
Scatterplots Select as many X variables as you wish, but just one Y variable. Draw
scatterplots for all pairs of variables to explore their relationships.
Correlations Find the strength of the relationship between pairs of variables as a matrix of
correlation coefficients (1 is perfect positive correlation, -1 is perfect negative
correlation, and 0 suggests no relationship).
Regression Prediction of the dependent Y variable from the independent X variables using a
linear relationship.
Predicted and
Residuals
Predicted values of Y based on the X variables, the residual differences (Actual
Y - Predicted Y) and the standardized residuals.
Diagnostic
Plot
Look for problems in the regression linear model, such as unequal variability or
nonlinearity.
Univariate
Summaries
Compute univariate summaries for each variable.
Histograms Draw a histogram of each variable, for data exploration.
Box Plots Draw a box plot of each variable, for data exploration. In the ordinary box plot,
a line extends from the box on each side to the most extreme value. Check
Detailed box plot to indicate outliers separately and have the lines extend from
the box on each side to the most extreme value (adjacent value) that is not an
outlier.
Time Series
Trend-
Seasonal
A decomposition into (1) long-term trend (linear or exponential), (2) repeating
seasonal component (monthly or quarterly), (3) wandering cyclic component,
and (4) irregular component. Seasonal adjustment & forecasting. Time must
increase down data column.
Quality Control
X-Bar, R
Charts
Chart the averages and the ranges of your data to see if this process is in or out
of control. Choose a subgroup size from 2 to 25. You may specify a standard if
one is available.
Pct, Count
Chart
Chart the percents or counts to see if this process is in or out of control. Your
data may be either counts or percentages (counts divided by the sample size).
You may specify a standard if one is available.
One-Sample Analysis
Summaries
Summaries are used to give you selected
numbers that represent and describe your
data set.
StatPad's summaries (below) for Quality
scores show how many data values there are
(n = 50), typically how high the scores are
($\bar{X} = \sum_{i=1}^{n} X_i / n = 90.78$), and about how
far individual scores are
($S = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)} = 7.56$) from
the mean. The quartiles are about 1/4 of the
way in from each end (highest and lowest)
while the median is 1/2 way in. The standard
error of the average is $S_{\bar{X}} = S / \sqrt{n}$.
To compute the summaries using StatPad's main dialog box, select One Sample as the
situation and Summaries as the analysis. Select your data from the list (or use Add Data if your
column of numbers is in the worksheet but is not in the list), check the Output Range to be sure
that is where you want the results to appear, and then select Do It (or hit the Enter key).
Quality Summaries
50 Count n
90.78 Mean or average
7.56 Standard deviation (variability of individuals)
72 Smallest
86 Lower quartile
93 Median
97 Upper quartile
99 Largest
1.069 Standard error (variability of sample average, if random sample)
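
As a rough illustration of how the summaries above relate to one another (this is not StatPad's own code), the following Python sketch computes the count, average, standard deviation, and standard error for a column of numbers. The function name `summaries` and the short `scores` list are hypothetical; the last lines check the standard error against the rounded values reported for Quality.

```python
import math
import statistics

def summaries(data):
    """Return count, mean, sample standard deviation, and standard error."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)   # sample standard deviation, divides by n - 1
    se = sd / math.sqrt(n)        # standard error of the average
    return n, mean, sd, se

# Hypothetical data, for illustration only (these are not the Quality scores).
scores = [72, 86, 93, 97, 99]
n, mean, sd, se = summaries(scores)

# Standard error implied by the reported Quality summaries (n = 50, S = 7.56).
quality_se = 7.56 / math.sqrt(50)
print(round(quality_se, 3))   # 1.069
```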
Histogram
The histogram is used to visually explore
a data set. The data axis is horizontal, and
the bars show how many data values are
within each interval. Data are concentrated
where bars are tall. You can see typical
value, variability, and distribution shape.
StatPad's histogram (below) shows that
the Quality scores fall within the interval
from about 70 to 100. They are skewed with a
long tail towards lower values, being more
concentrated in the higher end of the range.
To create a histogram using StatPad's
main dialog box, select One Sample as the
situation and Histogram as the analysis. Select your data from the list (or use Add Data if your
column of numbers is in the worksheet but is not in the list), check the Output Range, and then
select Do It.
StatPad chooses a default bin width and landmark (which could be a left or right endpoint
of the histogram, or any bin boundary) for the histogram bars. These can be changed using the
Customize check-box (see next item). Note that Excel (not StatPad) chooses the minimum and
maximum horizontal scale. These may be changed (as was done for the chart below) by leaving
StatPad by hitting the Esc key or selecting Cancel, then double-clicking on the axis to find
Minimum and Maximum as Axis Options.
[Figure: histogram of Quality. Horizontal axis: Quality (60 to 100). Vertical axis: Frequency (0 to 20).]
Histogram (With Customized Bin Width and Landmark)
There are often several reasonable
choices for how wide to make the histogram
bars and where to place them left-to-right.
StatPad can choose a default bin width and
landmark for the histogram bars, or you can
specify customized values.
In the customized histogram below, the
bin width has been decreased to 1 to show
more detail (StatPad's default bin width for
this data set was 5).
To create a customized histogram using
StatPad's main dialog box, select One
Sample as the situation and Histogram as the
analysis. Select your data from the list (or use Add Data if your column of numbers is in the
worksheet but is not in the list). When you click on Customize, two edit-boxes appear: for Bin
Width and for the optional Landmark. You may then click on each and type the value you wish.
The Landmark setting shifts the bars left or right so that a bin boundary falls on the specified
value. Then check the Output Range and select Do It.
[Figure: customized histogram of Quality with bin width 1. Horizontal axis: Quality (60 to 100). Vertical axis: Frequency (0 to 10).]
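
One way to picture the roles of the bin width and the landmark is the Python sketch below, which assigns each value to the bin whose left edge is the landmark plus a whole number of bin widths. It is only an illustration: the function name `histogram_counts`, the choice of left-closed bins, and the sample values are assumptions, not StatPad's documented algorithm.

```python
import math
from collections import Counter

def histogram_counts(data, bin_width, landmark=0):
    """Count values per bin; bin boundaries sit at landmark + k * bin_width."""
    counts = Counter()
    for x in data:
        k = math.floor((x - landmark) / bin_width)   # which bin x falls in
        left_edge = landmark + k * bin_width         # assumed: bins are [left, left + width)
        counts[left_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical values, for illustration only.
values = [72, 86, 93, 97, 99]
print(histogram_counts(values, bin_width=5, landmark=70))
# {70: 1, 85: 1, 90: 1, 95: 2}
```

Decreasing the bin width gives more, narrower bars; changing the landmark slides every boundary by the same amount.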
Box Plot
The box plot is used to quickly and
visually explore a data set; it shows you a
central box defined by the quartiles, with the
median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
In StatPad's box plot for Sensitivity
(below left) you see that the middle half of the
data extends from about 60 to 100, with the
median at about 80. The line at the right
extends to the largest value, at about 180.
StatPad's detailed box plot (below right) shows outliers separately, revealing that the
largest value, at about 180, is an outlier.
To display a box plot using StatPad's main dialog box, select One Sample as the situation
and Box Plot as the analysis. Select your data from the list (or use Add Data if your column of
numbers is in the worksheet but is not in the list). Click on Detailed box plot if you wish outliers
to be displayed separately. Then check the Output Range, and then select Do It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile.
[Figure: ordinary box plot (left) and detailed box plot (right) of Sensitivity, each on a horizontal axis from 0 to 200.]
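
The outlier rule stated above can be sketched in a few lines of Python. The function names are hypothetical, and the quartiles used in the example (60 and 100) are only the approximate values read from the Sensitivity box plot, so the result is indicative rather than exact.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Limits beyond which a value counts as an outlier (1.5 x IQR rule)."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def is_outlier(value, lower_quartile, upper_quartile):
    low, high = outlier_fences(lower_quartile, upper_quartile)
    return value < low or value > high

# Approximate quartiles and largest value for Sensitivity, as described in the text.
print(outlier_fences(60, 100))    # (0.0, 160.0)
print(is_outlier(180, 60, 100))   # True
```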
Cumulative Distribution
The cumulative distribution function is
used to show you the percentiles of the data.
Percentages are shown vertically (from 0 to
100%) and data values are horizontal. The
chart shows the percentage of the data values
(vertical scale) that are less than or equal to the
given value (horizontal scale).
In StatPad's cumulative distribution
function for Quality (below) you can see that
about 10% of the Quality scores are less than
or equal to 80, about 25% of the Quality
scores are less than or equal to 85, and that
about a third are scores of 90 or less.
To compute a cumulative distribution function using StatPad's main dialog box, select One
Sample as the situation and Cumulative Distribution as the analysis. Select your data from the
list (or use Add Data if your column of numbers is in the worksheet but is not in the list), check
the Output Range, and then select Do It.
[Figure: cumulative distribution of Quality. Horizontal axis: Quality (60 to 100). Vertical axis: Cumulative Percent (0% to 100%).]
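
The height of the cumulative distribution at any value is simply the percentage of data values at or below it. A minimal Python sketch follows; the function name `cumulative_percent` and the sample data are hypothetical and are not the Quality scores.

```python
def cumulative_percent(data, value):
    """Percentage of data values that are less than or equal to value."""
    count = sum(1 for x in data if x <= value)
    return 100.0 * count / len(data)

# Hypothetical data, for illustration only.
scores = [72, 86, 93, 97, 99]
print(cumulative_percent(scores, 90))   # 40.0
```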
Confidence Interval
A confidence interval for the mean
includes the unknown population mean with
known confidence, e.g., 95%. Random
sampling from a normal population is
assumed.
StatPad's two-sided 95% confidence
interval results for Quality (below) tell you
that the bounds of the interval are 88.63 and
92.93.
To compute a confidence interval using
StatPad's main dialog box, select One
Sample as the situation and Confidence
Interval as the analysis. Select your data
from the list (or use Add Data if your column of numbers is in the worksheet but is not in the
list), check the Output Range, and then select Do It. You may also change the Confidence level
(from the default 95%) or select a one-sided interval instead of a two-sided interval (see next
item).
Confidence interval for Quality:
We are 95% sure that the
population mean for Quality
is somewhere between
88.63 and 92.93
(assuming a random sample from a normal population).
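
The reported bounds can be reproduced, approximately, from the summaries for Quality (n = 50, average 90.78, standard deviation 7.56). The Python sketch below is an illustration rather than StatPad's own method: the function name is hypothetical, and the critical value 2.0096 is taken from a standard t table for 49 degrees of freedom instead of being computed.

```python
import math

def mean_confidence_interval(mean, sd, n, t_critical):
    """Two-sided confidence interval: mean plus or minus t times the standard error."""
    se = sd / math.sqrt(n)
    margin = t_critical * se
    return mean - margin, mean + margin

# Assumed t table value: 95% two-sided confidence, n - 1 = 49 degrees of freedom.
T_95_TWO_SIDED_DF49 = 2.0096

low, high = mean_confidence_interval(90.78, 7.56, 50, T_95_TWO_SIDED_DF49)
print(round(low, 2), round(high, 2))   # 88.63 92.93
```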
Confidence Interval (One-Sided, 99%)
The one-sided interval says, with
specified confidence, that the unknown
population mean is either at least ... (for
an upper confidence interval) or no more
than ... (for a lower confidence interval).
You should decide whether to use a one-sided
or two-sided confidence interval before you
look at the data. You should not use both
upper and lower one-sided confidence
intervals on the same data set; either use a
two-sided interval, or choose just one side for
a one-sided confidence interval. If in doubt,
use a two-sided confidence interval.
StatPad's one-sided upper 99%
confidence interval for Quality (below) shows you that the bound is 88.21.
To compute a one-sided 99% confidence interval using StatPad's main dialog box, select
One Sample as the situation and Confidence Interval as the analysis. Select your data from the
list (or use Add Data if your column of numbers is in the worksheet but is not in the list), click on
the 1-sided box of your choice, set the Confidence Level to 99%, check the Output Range, and
then select Do It.
One-sided upper confidence interval for Quality:
We are 99% sure that the
population mean for Quality
is at least
88.21
(assuming a random sample from a normal population).
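
A one-sided bound uses the same ingredients as a two-sided interval, but all of the allowed error goes on one side. The Python sketch below is illustrative only: the function name is hypothetical, and the critical value 2.4049 is taken from a standard t table (99% one-sided, 49 degrees of freedom) rather than computed.

```python
import math

def one_sided_bound(mean, sd, n, t_critical, at_least=True):
    """Bound for a one-sided interval; at_least=True gives 'mean is at least ...'."""
    margin = t_critical * sd / math.sqrt(n)
    return mean - margin if at_least else mean + margin

# Assumed t table value: 99% one-sided confidence, n - 1 = 49 degrees of freedom.
T_99_ONE_SIDED_DF49 = 2.4049

bound = one_sided_bound(90.78, 7.56, 50, T_99_ONE_SIDED_DF49)
print(round(bound, 2))   # 88.21
```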
Hypothesis Test
A hypothesis test is used to decide, based
on data, whether or not the unobservable
population mean could reasonably be equal
to a given reference value. Because the
sample average represents (with statistical
error) the unknown population mean, the
result is often stated in terms of a significant
(or nonsignificant) difference between the
sample average and the reference value, both
of which are known. Random sampling from
a normal population is assumed.
StatPad's hypothesis test results for
Quality (below) show a highly
significant difference between the reference
value (given here as 87.5) and the observed average Quality score of 90.78. Results include the t
value, the p value, the practical interpretation of the results, and a formal statement of the null
hypothesis being tested.
To perform a hypothesis test using StatPad's main dialog box, select One Sample as the
situation and Hypothesis Test as the analysis. Select your data from the list (or use Add Data if
your column of numbers is in the worksheet but is not in the list), specify the Reference Value,
check the Output Range, and then select Do It. Optionally, you may specify a one-sided test
(upper or lower); see next item.
The p value says that, if the population mean had been equal to the reference value, then p is
the probability of observing such a large (or larger) difference between the sample average and
the reference value. Smaller p values indicate significance because such a large difference would be a rare event if the population mean really were equal to the reference value.
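
The t value and two-sided p value for Quality can be approximated from the reported summaries (n = 50, average 90.78, standard deviation 7.56, reference value 87.5). The Python sketch below is an illustration under stated assumptions, not StatPad's implementation: the function names are hypothetical, and the p value is obtained by numerically integrating the t density with Simpson's rule instead of calling a statistics library.

```python
import math

def t_statistic(mean, sd, n, reference):
    """Number of standard errors between the sample average and the reference value."""
    return (mean - reference) / (sd / math.sqrt(n))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on the t density over [0, |t|]."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # P(0 < T < |t|)
    return 2 * (0.5 - central_area)

t = t_statistic(90.78, 7.56, 50, 87.5)
p = two_sided_p(t, df=49)
print(round(t, 2))   # 3.07
print(p)             # should be close to the reported p value
```

Because the inputs are rounded summaries, the computed p value may differ slightly from the one StatPad reports.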
Hypothesis test for Quality:
t = 3.07
p = 0.00350
The sample average
90.78
is highly significantly different (p
Hypothesis Test (One-Sided)
A one-sided upper hypothesis test can
decide only whether the sample average is
significantly larger than the reference value.
A one-sided lower hypothesis test can decide
only whether the sample average is
significantly less than the reference value.
You should decide whether to use a one-sided
or two-sided test before you look at the data.
You should not use both upper and lower
one-sided hypothesis tests on the same data
set; either use a two-sided test, or choose
just one side for a one-sided hypothesis test.
If in doubt, use a two-sided test.
StatPad's one-sided upper hypothesis
test results for Quality (below) show that the scores are significantly larger, on average, than the
reference value (given here as 87.5). Results include the t value, the p value, the practical
interpretation of the results, and a formal statement of the null hypothesis being tested.
To perform a one-sided hypothesis test using StatPad's main dialog box, select One Sample
as the situation and Hypothesis Test as the analysis. Select your data from the list (or use Add
Data if your column of numbers is in the worksheet but is not in the list), click on the 1-sided box
of your choice, specify the Reference Value, check the Output Range, and then select Do It.
One-sided hypothesis test for Quality:
t = 3.07
p = 0.00175
The sample average
90.78
is highly significantly larger (p
Percentile
Percentiles are landmarks in the data
that are a known percentage (of the data
values) from smallest to largest. The smallest
data value is the 0th percentile, the largest is
the 100th percentile, the median is the 50th
percentile, and so forth.
In StatPad's percentile calculation
(below) the 85th percentile for the Quality
scores is found to be a score of 98. That is,
the score 98 is about 85% of the way (in the
ordered list of scores) from the smallest to the
largest score.
To find a percentile using StatPad's main
dialog box, select One Sample as the situation and Percentile as the analysis. Select your data
from the list (or use Add Data if your column of numbers is in the worksheet but is not in the
list), provide the Percentage for which you would like the percentile, check the Output Range,
and then select Do It.
For Quality:
85th percentile
is 98
Percentile Ranking
The percentile ranking of a given data
value gives you the percentage of the way
along in the list of data values (from smallest
to largest) that this given data value is.
In StatPad's percentile ranking calculation
(below) the Quality score 87.5 is found to be
30% of the way from smallest to largest.
To find a percentile ranking using
StatPad's main dialog box, select One
Sample as the situation and Percentile
Ranking as the analysis. Select your data
from the list (or use Add Data if your column
of numbers is in the worksheet but is not in
the list), provide the data Value for which you would like the percentile ranking, check the
Output Range, and then select Do It.
For Quality:
87.5 is the
30th percentile
Sampling
Random Sample Without Replacement
A random sample without replacement
is chosen from a population so that (1) all
population units are equally likely to be
chosen, (2) units are selected independently
of one another, and (3) once a unit is chosen,
it cannot be chosen again. All sampled units
are different when sampling without
replacement.
StatPad's results (below) show a sample
of 5 selected at random (without replacement)
from a population of size 100. The selected
items (in order) are 19, 25, 59, 67, and 89.
This list of five numbers has also been given a
name (firstSample was chosen here) which
will appear in StatPad's lists of data sets.
To select a random sample without replacement using StatPad's main dialog box, select
Sampling as the situation and Sample Without Replacement as the analysis. Specify a
Population Size and a Sample Size. Provide an optional name for the resulting data in case you
plan to refer to it later, check the Output Range, and then select Do It.
Random sample of size 5
from population numbered from 1 to 100
chosen without replacement:
firstSample
19
25
59
67
89
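
For readers who want to see the idea outside of Excel, here is a minimal Python sketch of sampling without replacement from units numbered 1 to the population size. The function name and the optional `seed` parameter are hypothetical; the numbers drawn will differ from the StatPad output above because the draws are random.

```python
import random

def sample_without_replacement(population_size, sample_size, seed=None):
    """Choose distinct unit numbers from 1..population_size, returned in order."""
    rng = random.Random(seed)
    units = range(1, population_size + 1)
    return sorted(rng.sample(units, sample_size))   # sample() never repeats a unit

first_sample = sample_without_replacement(100, 5)
print(first_sample)
```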
Random Sample With Replacement
A random sample with replacement is
chosen from a population so that (1) all
population units are equally likely to be
chosen, (2) units are selected independently
of one another, and (3) once a unit is chosen,
it is replaced so that it may be chosen again.
The sampled units may or may not all be
different when sampling with replacement.
The StatPad results below show a sample
of 5 selected at random (with replacement)
from a population of size 100. The selected
items (in order) are 43, 51, 55, 55, and 82. Note that an item (55) was chosen twice. This can
happen when sampling with replacement. This
list of five numbers has also been given a name (secondSample was chosen here) which will
appear in StatPad's lists of data sets.
To select a random sample with replacement using StatPad's main dialog box, select
Sampling as the situation and Sample With Replacement as the analysis. Specify a Population
Size and a Sample Size. Provide an optional name for the resulting data in case you plan to
refer to it later, check the Output Range, and then select Do It.
Random sample of size 5
from population numbered from 1 to 100
chosen with replacement:
secondSample
43
51
55
55
82
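
Sampling with replacement can be sketched in Python by drawing each unit number independently, so that repeats are possible. The function name and the optional `seed` parameter are hypothetical, and the numbers drawn will not match the StatPad output above.

```python
import random

def sample_with_replacement(population_size, sample_size, seed=None):
    """Draw unit numbers from 1..population_size independently; repeats are allowed."""
    rng = random.Random(seed)
    return sorted(rng.randint(1, population_size) for _ in range(sample_size))

second_sample = sample_with_replacement(100, 5)
print(second_sample)
```

Unlike sampling without replacement, the sample size here may exceed the population size.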
Uniform Distribution
A uniform distribution generates
numbers, chosen independently of one
another, that are equally likely to fall
anywhere within a specified interval.
In StatPad's results (below) five numbers
were selected uniformly from 35 to 45. This
list of five numbers has also been given a
name (uniformSample was chosen here)
which will appear in StatPad's lists of data
sets.
To select a uniform sample using
StatPad's main dialog box, select Sampling
as the situation and Uniform Distribution as
the analysis. Specify the Smallest and Largest values of the distribution. Specify the Sample
Size. Provide an optional name for the resulting data in case you plan to refer to it later, check
the Output Range, and then select Do It.
Random sample of size 5
selected from a uniform distribution from 35 to 45:
uniformSample
37.28
41.19
39.90
41.81
43.87
Normal Distribution
A normal distribution generates
numbers, chosen independently of one
another, that follow a bell-shaped
distribution, with values most likely to fall
near the mean and the width of the bell
defined by the standard deviation (Std dev).
Observations fall within one standard
deviation of the mean about 68% of the time.
In StatPad's results (below) five numbers
were selected from a normal distribution with
mean 65 and standard deviation 20. This list
of five numbers has also been given a name
(simulatedScores was chosen here) which
will appear in StatPad's lists of data sets.
To select a normal sample using StatPad's main dialog box, select Sampling as the
situation and Normal Distribution as the analysis. Specify the Mean and Standard Deviation
(Std dev) values of the distribution. Specify the Sample Size. Provide an optional name for the
resulting data in case you plan to refer to it later, check the Output Range, and then select Do It.
Random sample of size 5
selected from a normal distribution with mean 65 and standard deviation 20:
simulatedScores
49.78
69.27
58.02
88.10
63.90
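
The 68% rule mentioned above can be checked by simulation. The Python sketch below draws a large normal sample with mean 65 and standard deviation 20 and counts how often a value falls within one standard deviation of the mean. The function name, the sample size of 100,000, and the seed are illustrative assumptions.

```python
import random

def normal_sample(mean, std_dev, sample_size, seed=None):
    """Independent draws from a normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std_dev) for _ in range(sample_size)]

simulated = normal_sample(65, 20, 100_000, seed=1)

# Fraction of draws between 45 and 85, that is, within one standard deviation of 65.
within_one_sd = sum(1 for x in simulated if 45 <= x <= 85) / len(simulated)
print(within_one_sd)   # should be close to 0.68
```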
Binomial Distribution
A binomial distribution is used to
describe the number of times an event
happens out of n trials, where each trial was
performed independently with a fixed
probability.
In StatPad's results (below) five numbers
are selected from a binomial distribution with
10 trials each having probability 0.5 of
success. In the first of the five samples, there
were 4 out of 10 successes. In the second
sample, 6 of 10 were successful.
To select a binomial sample using
StatPad's main dialog box, select Sampling
as the situation and Binomial Distribution as the analysis. Specify the Number n of trials and
the Probability of each trial. Specify the Sample Size. Provide an optional name for the resulting
data in case you plan to refer to it later, check the Output Range, and then select Do It.
Random sample of size 5 selected from a binomial distribution
representing the number of successes in 10 trials, each with probability 0.5:
4
6
6
5
6
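
A binomial value is just a count of successes, which the Python sketch below makes explicit by simulating each trial. The function name and the optional `seed` parameter are hypothetical, and the counts drawn will differ from the StatPad output above.

```python
import random

def binomial_sample(n_trials, probability, sample_size, seed=None):
    """Each value is the number of successes in n_trials independent trials."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_trials) if rng.random() < probability)
        for _ in range(sample_size)
    ]

counts = binomial_sample(10, 0.5, 5)
print(counts)
```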
Binomial Percentages
Binomial percentages describe the
percent or proportion of the time an event
happens out of n trials, where each trial was
performed independently with a fixed
probability.
In StatPad's results (below) five binomial
percents were selected from a distribution
with 10 trials each having probability 0.5 of
success. In the first of the five samples, 0.3 or
30% of the 10 trials were successful. In the
second sample, 60% of the 10 were
successful.
To select a sample of binomial
percentages using StatPad's main dialog box, select Sampling as the situation and Binomial
Percentages as the analysis. Specify the Number n of trials and the Probability of each trial.
Specify the Sample Size. Provide an optional name for the resulting data in case you plan to
refer to it later, check the Output Range, and then select Do It.
Random sample of size 5 selected from a binomial distribution
representing the percentage of successes in 10 trials, each with probability 0.5:
0.3
0.6
0.4
0.8
0.3
Probability Calculations
Normal Probability (Greater Than)
A normal distribution generates numbers
according to a bell-shaped distribution, with
values most likely to fall near the mean and
the width of the bell defined by the standard
deviation. Observations fall within one
standard deviation of the mean about 68% of
the time. Probabilities for a normal
distribution are given by the area under the
bell-shaped curve.
StatPad's result (below) shows the
probability (0.401) that the specified normal
distribution (with mean 75 and standard
deviation 20) is greater than the given value
(80).
To find a normal probability using StatPad's main dialog box, select Probability as the
situation and Normal Probability as the analysis. Choose the type of probability you want
(Greater than, Less than, Between, or Not between), then give the Value(s) requested. Specify the
Mean and Standard Deviation of the normal distribution. Check the Output Range, and then
select Do It.
The probability that a normal random variable
with mean 75 and standard deviation 20
is greater than 80 is:
0.401
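
The same probability can be checked with Python's standard library, which provides the normal cumulative distribution through `statistics.NormalDist`. The wrapper function name below is hypothetical.

```python
from statistics import NormalDist

def normal_greater_than(value, mean, std_dev):
    """Area under the normal curve to the right of value."""
    return 1.0 - NormalDist(mean, std_dev).cdf(value)

print(round(normal_greater_than(80, 75, 20), 3))   # 0.401
```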
Normal Probability (Between)
When you ask StatPad to find the
probability of being between (or not
between), the dialog box changes to allow
you to specify the two values, lower and
upper.
StatPad's result (below) shows the
probability (0.175) that the specified normal
distribution (with mean 75 and standard
deviation 20) is between the two given values
(80 and 90).
To find a normal probability using
StatPad's main dialog box, select Probability
as the situation and Normal Probability as
the analysis. Choose the type of probability you want (Greater than, Less than, Between, or Not
between), then give the Value(s) requested. Specify the Mean and Standard Deviation of the
normal distribution. Check the Output Range, and then select Do It.
The probability that a normal random variable
with mean 75 and standard deviation 20
is between 80 and 90 is:
0.175
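
A "between" probability is the difference of two cumulative probabilities, and "not between" is its complement. A short Python sketch, with a hypothetical function name:

```python
from statistics import NormalDist

def normal_between(lower, upper, mean, std_dev):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, std_dev)
    return dist.cdf(upper) - dist.cdf(lower)

between = normal_between(80, 90, 75, 20)
print(round(between, 3))       # 0.175
print(round(1 - between, 3))   # 0.825, the probability of not being between 80 and 90
```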
Binomial Probability (Equal to)
A binomial distribution describes the
number of times an event happens out of n
trials, where each trial was performed
independently with a fixed probability.
StatPad's result (below) shows the
probability (0.205) that a specified binomial
distribution is exactly equal to 4. That is, the
probability is 0.205 of observing exactly 4
successes out of 10 independent trials with
probability 0.5 for each trial.
To find a binomial probability using
StatPad's main dialog box, select Probability
as the situation and Binomial Probability as
the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or
Between), then give the Value(s) requested. Specify the Number n of trials and the Probability
for each trial of the binomial distribution. Check the Output Range, and then select Do It.
If you specify an Equal to value that is not a whole number, StatPad correctly reports the
resulting probability as zero because a binomial random variable gives the number of successes
(which must be a whole number).
The probability that a binomial random variable
with 10 repeated trials, each with probability 0.5
is equal to 4 is:
0.205
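
The binomial formula behind this result, C(n, k) p^k (1 - p)^(n - k), is short enough to sketch in Python. The function name is hypothetical; the handling of non-whole values mirrors the behavior described above.

```python
from math import comb

def binomial_equal_to(k, n_trials, probability):
    """Probability of exactly k successes in n_trials independent trials."""
    if k != int(k) or k < 0 or k > n_trials:
        return 0.0   # not a possible number of successes
    k = int(k)
    return comb(n_trials, k) * probability**k * (1 - probability) ** (n_trials - k)

print(round(binomial_equal_to(4, 10, 0.5), 3))   # 0.205
print(binomial_equal_to(4.5, 10, 0.5))           # 0.0
```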
Binomial Probability (This or Less)
You can also ask StatPad to find the
probability that a binomial random variable
is This value or more, This value or
less, or Between two values.
StatPad's result (below) shows the
probability (0.377) that a specified binomial
distribution is 4 or less. That is, the
probability is 0.377 of observing exactly 0, 1,
2, 3, or 4 successes out of 10 independent
trials with probability 0.5 for each trial.
To find a binomial probability using
StatPad's main dialog box, select Probability
as the situation and Binomial Probability as
the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or
Between), then give the Value(s) requested. Specify the Number n of trials and the Probability
for each trial of the binomial distribution. Check the Output Range, and then select Do It.
The probability that a binomial random variable
with 10 repeated trials, each with probability 0.5
is 4 or less is:
0.377
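
A "this or less" probability adds up the individual binomial probabilities from 0 through the given value. A minimal Python sketch, with a hypothetical function name:

```python
from math import comb

def binomial_this_or_less(k, n_trials, probability):
    """Probability of k or fewer successes in n_trials independent trials."""
    return sum(
        comb(n_trials, j) * probability**j * (1 - probability) ** (n_trials - j)
        for j in range(0, k + 1)
    )

print(round(binomial_this_or_less(4, 10, 0.5), 3))   # 0.377
```

A "this or more" probability follows as the complement, for example 1 - binomial_this_or_less(k - 1, n_trials, probability).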
Binomial Percent (Equal to)
A binomial percentage describes the
percent or proportion of the time an event
happens out of n trials, where each trial was
performed independently with a fixed
probability.
StatPad's result (below) shows the
probability (0.117) that a specified binomial
percentage distribution is exactly equal to
70%. That is, the probability is 0.117 of
observing exactly 70% successes out of 10
independent trials (this would be 7 successes
out of the 10 trials) with probability 0.5 for
each trial.
To find a probability for a binomial percent using StatPad's main dialog box, select
Probability as the situation and Binomial Percent as the analysis. Choose the type of probability
you want (Equal to, This or more, This or less, or Between), then give the Value(s) requested as
a percentage. Specify the Number n of trials and the Probability for each trial of the binomial
distribution. Check the Output Range, and then select Do It.
The probability that a binomial percentage
with 10 repeated trials, each with probability 0.5
is equal to 70% is:
0.117
Binomial Percent (Between)
You can also ask StatPad to find the
probability that a binomial percent is This
value or more, This value or less, or
Between two values.
StatPad's result (below) shows the
probability (0.171) that a specified binomial
percentage distribution is between 70% and
90%. That is, the probability is 0.171 of
observing between 70% and 90% successes
out of 10 independent trials with probability
0.5 for each trial (which, in this case,
corresponds to 7, 8, or 9 occurrences
representing 70%, 80%, or 90% successes).
To find a probability for a binomial percent using StatPad's main dialog box, select
Probability as the situation and Binomial Percent as the analysis. Choose the type of probability
you want (Equal to, This or more, This or less, or Between), then give the Value(s) requested as
a percentage. Specify the Number n of trials and the Probability for each trial of the binomial
distribution. Check the Output Range, and then select Do It.
The probability that a binomial percentage
with 10 repeated trials, each with probability 0.5
is between 70% and 90% is:
0.171
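
Probabilities for a binomial percent reduce to probabilities for the corresponding counts. The Python sketch below, with a hypothetical function name, adds up the probabilities of every count whose percentage lies in the requested range, endpoints included (as in the example above, where 70% and 90% both count).

```python
from math import comb

def binomial_percent_between(low_pct, high_pct, n_trials, probability):
    """Probability that the percent of successes lies from low_pct to high_pct inclusive."""
    total = 0.0
    for k in range(n_trials + 1):
        # Compare 100 * k / n_trials with the limits without dividing.
        if low_pct * n_trials <= 100 * k <= high_pct * n_trials:
            total += comb(n_trials, k) * probability**k * (1 - probability) ** (n_trials - k)
    return total

print(round(binomial_percent_between(70, 90, 10, 0.5), 3))   # 0.171
print(round(binomial_percent_between(70, 70, 10, 0.5), 3))   # 0.117
```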
Poisson Probability (Equal to)
A Poisson distribution describes the
number of times an event happens, where the
event happens independently at a fixed mean
rate.
StatPad's result (below) shows the
probability (0.0337) that a specified Poisson
distribution is exactly equal to 1. That is, the
probability is 0.0337 of observing exactly 1
occurrence of the event when we expect on
average to see 5 occurrences. The probability
is small because we expect many more (5
occurrences), on average, but may
occasionally (about 3% of the time) see just
one.
To find a Poisson probability using StatPad's main dialog box, select Probability as the
situation and Poisson Probability as the analysis. Choose the type of probability you want
(Equal to, This or more, This or less, or Between), then specify the whole-number Value(s) and
the Mean rate of occurrence (which is not required to be a whole number). Check the Output
Range, and then select Do It.
The probability that a Poisson random variable
with mean 5
is equal to 1 is:
0.0337
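
The Poisson probability of exactly k occurrences is e^(-mean) mean^k / k!. A minimal Python sketch, with a hypothetical function name:

```python
from math import exp, factorial

def poisson_equal_to(k, mean):
    """Probability of exactly k occurrences when the mean rate is mean."""
    return exp(-mean) * mean**k / factorial(k)

print(round(poisson_equal_to(1, 5), 4))   # 0.0337
```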
Poisson Probability (This or Less)
You can also ask StatPad to find the
probability that a Poisson random variable is
This value or more, This value or less,
or Between two values.
StatPad's result (below) shows the
probability (0.265) that the specified Poisson
distribution is 3 or less. That is, the
probability is 0.265 of observing exactly 0, 1,
2, or 3 occurrences when we expect on
average to see 5 occurrences.
To find a Poisson probability using
StatPad's main dialog box, select Probability
as the situation and Poisson Probability as
the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or
Between), then give the whole-number Value(s) requested. Specify the Mean rate of occurrence
of the Poisson distribution (which is not required to be a whole number). Check the Output
Range, and then select Do It.
The probability that a Poisson random variable
with mean 5
is 3 or less is:
0.265
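
A cumulative Poisson probability adds the individual probabilities for 0, 1, and so on up to the given value. A short Python sketch, with a hypothetical function name:

```python
from math import exp, factorial

def poisson_this_or_less(k, mean):
    """Probability of k or fewer occurrences when the mean rate is mean."""
    return sum(exp(-mean) * mean**j / factorial(j) for j in range(k + 1))

print(round(poisson_this_or_less(3, 5), 3))   # 0.265
```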
Exponential Probability (This or More)
The exponential distribution is a skewed
continuous distribution that is often used to
model the amount of time until a task is
completed or until an event happens. The
distribution is specified by giving its mean,
which is not required to be a whole number.
StatPad's result (below) shows the
probability (0.768) that the specified
exponential distribution is 2.38 or more. That
is, the probability is 0.768 of observing 2.38
or more when we expect 9 on average.
To find an exponential probability using
StatPad's main dialog box, select Probability
as the situation and Exponential Probability as the analysis. Choose the type of probability you
want (This or more, This or less, Between, or Not between), then specify the Value(s) and the
Mean, check the Output Range, and then select Do It.
The probability that an exponential random variable
with mean 9
is 2.38 or more is:
0.768
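
For an exponential distribution with a given mean, the probability of observing a value x or more is e^(-x / mean). A minimal Python sketch, with a hypothetical function name:

```python
from math import exp

def exponential_this_or_more(value, mean):
    """Probability that an exponential random variable is at least value."""
    return exp(-value / mean)

print(round(exponential_this_or_more(2.38, 9), 3))   # 0.768
```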
Exponential Probability (Between)
You can also ask StatPad to find the
probability that an exponential random
variable is This value or less, Between
two values, or Not between two values.
StatPad's result (below) shows the
probability (0.243) that the specified
exponential distribution is between 5.2 and
10.3. That is, the probability is 0.243 of
observing a value between 5.2 and 10.3 when
we expect 9 on average.
To find an exponential probability using
StatPad's main dialog box, select Probability
as the situation and Exponential Probability
as the analysis. Choose the type of probability you want (This or more, This or less, Between, or
Not between), then specify the Value(s) and the Mean, check the Output Range, and then select
Do It.
The probability that an exponential random variable
with mean 9
is between 5.2 and 10.3 is:
0.243
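
The "between" probability for an exponential distribution is the difference of two tail probabilities, e^(-lower / mean) - e^(-upper / mean). A short Python sketch, with a hypothetical function name:

```python
from math import exp

def exponential_between(lower, upper, mean):
    """Probability that an exponential random variable falls between lower and upper."""
    return exp(-lower / mean) - exp(-upper / mean)

between = exponential_between(5.2, 10.3, 9)
print(round(between, 3))       # 0.243
print(round(1 - between, 3))   # 0.757, the probability of not being between the two values
```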
Discrete Probability
A discrete probability distribution is
characterized by two lists: a list of values and
a list of probabilities (where the probabilities
must add up to 1). StatPad computes the
Expected Value (also called the Mean) as the
weighted average of the values (using
probabilities as the weights) and also
computes the standard deviation, once you
specify these two columns of numbers.
StatPad's results for a situation with
three possibilities are shown below, where the
probability is 0.2 that profit is 3
($thousands), the probability is 0.5 that profit
is 5, and the probability is 0.3 that profit is 8.
These are specified as two separate columns of numbers, each with its name (Profit is a
column containing 3, 5, and 8, while ProbabilityOfProfit is a column containing 0.2, 0.5, and
0.3 which properly add up to 1). We see from the results below that the expected value is $5.5
thousand and the standard deviation (measuring the risk of this situation) is $1.8 thousand for
this discrete random variable.
To compute the mean and standard deviation for a discrete random variable using StatPad's
main dialog box, select Probability as the situation and Discrete Probability as the analysis.
Select one from each of the two lists (or use Add Data if your columns of numbers are in the
worksheet but are not in the lists) being sure to correctly specify which one contains the values
and which one contains the probabilities. Next check the Output Range and then select Do It.
For the discrete random variable with values in Profit
and probabilities in ProbabilityOfProfit, we have:
5.50 Mean (or expected value)
1.80 Standard Deviation
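The calculation described above (a probability-weighted average, then the square root of the probability-weighted squared deviations) can be sketched as follows. The function name discrete_summary is an assumption for illustration and is not part of StatPad.

```python
import math

def discrete_summary(values, probabilities):
    """Return (mean, standard deviation) of a discrete random variable."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must add up to 1")
    # Expected value: weighted average of the values, using probabilities as weights.
    mean = sum(v * p for v, p in zip(values, probabilities))
    # Variance: weighted average of squared deviations from the mean.
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probabilities))
    return mean, math.sqrt(variance)

# Profit (in $thousands) and ProbabilityOfProfit from the example.
mean, sd = discrete_summary([3, 5, 8], [0.2, 0.5, 0.3])
print(round(mean, 2), round(sd, 2))
```

Rounded to two decimals, the results should match the 5.50 and 1.80 reported by StatPad.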
Two-Sample Analysis
Summaries
Summaries are used to give you selected
numbers that represent and describe your
data sets. When you have two samples,
StatPad first reports summaries for each
sample separately, then gives the average
difference and the standard error of the
average difference, indicating the sampling
variability of the average difference. Note
that the two samples are assumed to have the
same measurement units (e.g., dollars).
StatPad's two-sample summaries (below)
are shown for the results of a survey sent to
customers in the East and to those in the
West.
To compute summaries for two samples using StatPad's main dialog box, select Two
Samples as the situation and Summaries as the analysis. Select a data set from each list (or use
Add Data if your columns of numbers are in the worksheet but are not in the lists). You may
(optionally) click on Paired to specify that the data sets have a natural pairing if the counts are
equal for the two data sets. Next check the Output Range and then select Do It.
The Paired check-box only affects the standard error of the difference. For a paired
situation, StatPad gives the ordinary standard error for the paired differences. For an unpaired
situation, StatPad uses the large-sample formula $\sqrt{S_1^2/n_1 + S_2^2/n_2}$ if both counts are at least 30.
Otherwise, StatPad uses the small-sample formula (assuming equal population variabilities)
$\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$.
East     West     Summaries
17       19       Count n
1,834    2,390    Mean or average
661      761      Standard deviation (variability of individuals)
752      836      Smallest
1,295    2,004    Lower quartile
1,931    2,294    Median
2,426    2,853    Upper quartile
2,975    4,085    Largest
160      175      Standard error (variability of sample average, if random sample)

557      Average difference, West minus East
239      Standard error of the difference
         using the small-sample unpaired formula,
         which assumes equal population variabilities.
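The unpaired standard error of the difference can be checked by hand from the counts and standard deviations in the summaries. The sketch below follows the rule stated in the text (large-sample formula when both counts are at least 30, pooled small-sample formula otherwise); the function name se_difference_unpaired is hypothetical.

```python
import math

def se_difference_unpaired(n1, s1, n2, s2):
    """Standard error of the difference between two unpaired sample averages."""
    if n1 >= 30 and n2 >= 30:
        # Large-sample formula: no equal-variability assumption.
        return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # Small-sample formula: pooled variance, assuming equal population variabilities.
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# East: n = 17, S = 661. West: n = 19, S = 761.
se = se_difference_unpaired(17, 661, 19, 761)
print(round(se))
```

Because both counts are below 30, the small-sample branch is used, and the rounded result should agree with the 239 shown in the summaries.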
Histograms
Histograms are used to visually explore
data sets. The data axis is horizontal, and the
bars show how many data values are within
each interval. Data are concentrated where
bars are tall. You can see typical value,
variability, and distribution shape.
StatPad's histograms are shown below
for the East and West survey data, one
histogram for each data set.
To create histograms for two samples
using StatPad's main dialog box, select Two
Samples as the situation and Histograms as
the analysis. Select a data set from each list
(or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next
check the Output Range and then select Do It.
StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
change these, use the Customize check-box found under One Sample, Histogram. Note that
Excel (not StatPad) chooses the minimum and maximum of the horizontal scale. To change them,
leave StatPad by pressing the Esc key or selecting Cancel, then double-click the axis to
find Minimum and Maximum under Axis Options.
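To see how a bin width and a landmark determine the histogram bars, here is a minimal sketch. It assumes that the landmark is a value at which one bin boundary falls and that each bin includes its left edge; StatPad's exact conventions may differ. The function name bin_counts and the sample values are made up for illustration and are not the survey data.

```python
import math
from collections import Counter

def bin_counts(data, bin_width, landmark=0):
    """Count data values per bin; bins are [edge, edge + bin_width), aligned to landmark."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter()
    for x in data:
        # Index of the bin containing x, measured from the landmark.
        index = math.floor((x - landmark) / bin_width)
        left_edge = landmark + index * bin_width
        counts[left_edge] += 1
    # Return bins in left-to-right order, keyed by their left edge.
    return dict(sorted(counts.items()))

# Illustrative values only.
sample = [752, 1295, 1300, 1931, 2426, 2975]
counts = bin_counts(sample, bin_width=500, landmark=0)
print(counts)
```

Each key is the left edge of a bar and each count is the bar's height, so tall bars mark where the data are concentrated.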
[Figure: histograms of the East and West survey data, one panel per data set; vertical axis: Frequency]
Box Plots
Box plots are used to visually explore
and compare data sets; they show you a
central box defined by the quartiles, with the
median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
StatPad's detailed box plots are shown
below, on the same scale, for the East and
West survey data. Note that the western
values are generally somewhat higher,
although there is considerable overlap. There are no outliers.
To create box plots for two samples using StatPad's main dialog box, select Two Samples
as the situation and Box Plots as the analysis. Select a data set from each list (or use Add Data
if your columns of numbers are in the worksheet but are not in the lists). Click on Detailed box
plot if you wish outliers to be displayed separately. Next check the Output Range and then select
Do It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile.
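The outlier rule can be checked against the East and West summaries reported earlier (quartiles, smallest, and largest values). This is a sketch of the 1.5 times interquartile range rule as stated above, with hypothetical function names, and it only tests the extremes rather than every data value.

```python
def outlier_fences(lower_quartile, upper_quartile, multiplier=1.5):
    """Return the (low, high) limits beyond which a value counts as an outlier."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - multiplier * iqr, upper_quartile + multiplier * iqr

def has_outliers(smallest, largest, lower_quartile, upper_quartile):
    """True if the smallest or largest value lies outside the fences."""
    low_fence, high_fence = outlier_fences(lower_quartile, upper_quartile)
    return smallest < low_fence or largest > high_fence

# East: smallest 752, largest 2,975, quartiles 1,295 and 2,426.
east_has_outliers = has_outliers(752, 2975, 1295, 2426)
# West: smallest 836, largest 4,085, quartiles 2,004 and 2,853.
west_has_outliers = has_outliers(836, 4085, 2004, 2853)
print(east_has_outliers, west_has_outliers)
```

Both checks come out False, which is consistent with the statement that there are no outliers in either data set.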
[Figure: detailed box plots on a common horizontal scale from 0 to 5000; East (bottom), West (top)]
Confidence Interval
A two-sample confidence interval for the
population mean difference includes this
unknown population mean difference with
known confidence, e.g., 95%, when random
sampling is used and normal distributions are
assumed.
StatPad's 95% confidence interval
results (below) for the mean difference, West
minus East, tell you that the bounds of the
interval are 71.16 and 1,042.42.
To compute a two-sample confidence
interval using StatPad's main dialog box,
select Two Samples as the situation and Confidence Interval as the analysis. Select a data set
from each list (or use Add Data if your columns of numbers are in the worksheet but are not in
the lists). You may (optionally) change the Confidence level (from the default 95%). You may
also (optionally) click on Paired to specify that the data sets have a natural pairing if the counts
are equal for the two data sets. Next check the Output Range and then select Do It.
The two-sample confidence interval is based on the standard error of the difference,
described previously under Two Sample, Summaries. If unpaired, random sampling from each of
two normal populations is assumed (also assuming equal population variabilities if the small-
sample standard error is used). If paired, random sampling from a normal population is
assumed for the differences formed from the two measurements on each unit sampled.
Confidence interval for the difference, West minus East:
We are 95% sure that the
population mean difference is between
71.16 and 1,042.42
using the small-sample unpaired standard error,
which assumes equal population variabilities, and also
assuming random samples from normal populations.
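The interval is the average difference plus or minus a critical t value times the standard error of the difference. The sketch below uses the rounded values from the two-sample summaries (557 and 239) and takes the critical value 2.032 (two-sided 95%, 17 + 19 - 2 = 34 degrees of freedom) from a t table as an input; the function name confidence_interval is hypothetical. Because the inputs are rounded, the bounds agree with StatPad's only approximately.

```python
def confidence_interval(difference, standard_error, t_critical):
    """Return (lower, upper) bounds: difference +/- t_critical * standard_error."""
    margin = t_critical * standard_error
    return difference - margin, difference + margin

# Rounded inputs from the summaries; 2.032 is the t-table value for 34 degrees of freedom.
lower, upper = confidence_interval(557, 239, 2.032)
print(round(lower, 1), round(upper, 1))
```

The bounds land within about one unit of the reported 71.16 and 1,042.42; the small gap comes from using the rounded difference and standard error.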
Hypothesis Test
A two-sample hypothesis test is used to
decide, based on data, whether or not the
unobservable population means could
reasonably be equal to each other. Because
the sample averages represent (with
statistical error) their respective unknown
population means, the result is often stated in
terms of a significant (or nonsignificant)
difference between the sample averages, both
of which are known.
StatPad's two-sample hypothesis test
results for the East and West survey (below)
show a significant difference between the two
regions (East and West) on average. Results
include the t value, the p value, the practical interpretation of the results, and a formal statement
of the null hypothesis being tested.
To perform a two-sample hypothesis test using StatPad's main dialog box, select Two
Samples as the situation and Hypothesis Test as the analysis. Select a data set from each list (or
use Add Data if your columns of numbers are in the worksheet but are not in the lists). You may
(optionally) click on Paired to specify that the data sets have a natural pairing if the counts are
equal for the two data sets. Next check the Output Range and then select Do It.
The two-sample hypothesis test is based on the standard error of the difference, described
previously under Two Sample Summaries. If unpaired, random sampling from each of two
normal populations is assumed (also assuming equal population variabilities if the small-sample
standard error is used). If paired, random sampling from a normal population is assumed for the
differences formed from the two measurements on each unit sampled.
The p value says that, if the population means had been equal to each other, then p is the
probability of observing such a large (or larger) difference between the sample averages.
Smaller p values indicate significance because rare events are unlikely.
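For the unpaired East and West example, the t value is the average difference divided by its standard error, and the two-sided p value is the area in both tails of a t distribution with n1 + n2 - 2 degrees of freedom. The sketch below uses the rounded summary values (557 and 239) and integrates the t density numerically with Simpson's rule, which is a convenience chosen here and not necessarily how StatPad computes it; the function names are hypothetical.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two-sided p value: 1 minus the area under the density between -|t| and |t|."""
    upper = abs(t)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice the one-sided area.
    return 1.0 - 2.0 * area_zero_to_t

t_value = 557 / 239                               # average difference / standard error
p_value = two_sided_p_value(t_value, 17 + 19 - 2) # 34 degrees of freedom
print(round(t_value, 2), round(p_value, 3))
```

The t value rounds to 2.33, and the p value should come out close to the 0.026 that StatPad reports, with any small difference due to the rounded inputs.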
Hypothesis test for East and West:
t = 2.33
p = 0.026
The sample averages
1,834 and 2,390
are significantly different (p
Many-Sample Analysis
Summaries
Summaries are used to give you selected
numbers that represent and describe your
data sets. When you have many samples,
StatPad reports summaries for each sample
separately.
StatPad's many-sample summaries
(below) are shown for the quality scores of
four suppliers (defining four samples). For
example, supplier B had 35 scores listed, with
an average of 85.14.
To compute summaries for many samples
using StatPad's main dialog box, select Many
Samples as the situation and Summaries as
the analysis. Select your data sets from the list (or use Add Data if your columns of numbers are
in the worksheet but are not in the list). Next check the Output Range and then select Do It.
SupplierA  SupplierB  SupplierC  SupplierD  Summaries
20         35         15         25         Count n
90.97      85.14      76.00      89.35      Mean or average
4.68       4.84       3.73       5.05       Standard deviation (variability of individuals)
82.41      74.31      71.11      75.97      Smallest
88.57      81.61      73.10      86.77      Lower quartile
90.13      84.97      75.22      88.80      Median
93.00      89.05      79.56      93.24      Upper quartile
99.63      94.61      82.44      98.54      Largest
1.05       0.82       0.96       1.01       Standard error (variability of sample average, if random sample)
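The standard error row can be reproduced from the count and standard deviation rows, since the standard error of a sample average is the standard deviation divided by the square root of the count. This is a small sketch using the rounded table values; the dictionary layout is chosen only for illustration.

```python
import math

# (count n, standard deviation) for each supplier, taken from the summaries.
suppliers = {
    "SupplierA": (20, 4.68),
    "SupplierB": (35, 4.84),
    "SupplierC": (15, 3.73),
    "SupplierD": (25, 5.05),
}

# Standard error of the average = standard deviation / sqrt(count).
standard_errors = {
    name: round(sd / math.sqrt(n), 2) for name, (n, sd) in suppliers.items()
}
print(standard_errors)
```

The rounded values should match the standard error row of the summaries for all four suppliers.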
Histograms
Histograms are used to visually explore
data sets. The data axis is horizontal, and the
bars show how many data values are within
each interval. Data are concentrated where
bars are tall. You can see typical value,
variability, and distribution shape.
StatPad's many-sample histograms
(below) are shown for the quality scores of
the four suppliers. Some of the horizontal
scales have been changed using Excel chart
commands (see below) because Excel's
choice did not show enough detail.
To create histograms for many samples
using StatPad's main dialog box, select Many Samples as the situation and Histograms as the
analysis. Select your data sets from the list (or use Add Data if your columns of numbers are in
the worksheet but are not in the list). Next check the Output Range and then select Do It.
StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
change these, use the Customize check-box found under One Sample, Histogram. Note that
Excel (not StatPad) chooses the minimum and maximum of the horizontal scale. To change them,
leave StatPad by pressing the Esc key or selecting Cancel, then double-click the axis to
find Minimum and Maximum under Axis Options.
[Figure: histograms of the quality scores for SupplierA, SupplierB, SupplierC, and SupplierD, one panel per supplier; vertical axis: Frequency]
Box Plots
Box plots are used to visually explore
and quickly compare data sets; they show you
a central box defined by the quartiles, with
the median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
StatPad's many-sample detailed box
plots (below) are shown for the quality scores
of the four suppliers, arranged on the same
scale for easy comparison. There is one box
plot for each supplier. Suppliers A and D seem to have the highest scores overall, while supplier
C has the lowest. Supplier D has a low outlier score. The horizontal scale was changed using
Excel chart commands (see below) because Excel's choice did not show enough detail.
To create box plots for many samples using StatPad's main dialog box, select Many
Samples as the situation and Box Plots as the analysis. Select your data sets from the list (or use
Add Data if your columns of numbers are in the worksheet but are not in the list). Click on
Detailed box plot if you wish outliers to be displayed separately. Next check the Output Range
and then select Do It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile. Note that Excel (not StatPad) chooses the minimum and maximum of the horizontal
scale. To change them, leave StatPad by pressing the Esc key or selecting Cancel, then
double-click the axis to find Minimum and Maximum under Axis Options.
[Figure: detailed box plots on a common horizontal scale from 70 to 100; bottom to top: SupplierA, SupplierB, SupplierC, SupplierD]
F Test for One-Way ANOVA
The F test for one-way ANOVA (analysis
of variance) is used to decide, based on data,
whether or not the unobservable population
means could reasonably be equal to each
other. Because the sample averages represent
(with statistical error) their respective
unknown population means, the result is often
stated in terms of a