How to Reject Outliers in Data_ 4 Steps (With Pictures) - WikiHow

Post on 02-Jun-2017

3/23/2014 How to Reject Outliers in Data: 4 Steps (with Pictures) - wikiHow

http://www.wikihow.com/index.php?title=Reject-Outliers-in-Data&printable=yes 1/3

How to Reject Outliers in Data

Experimental data must be scrutinized for outliers in order to draw meaningful conclusions from it. In the simplest case, this is done by computing the mean and the standard deviation of all the data points and rejecting any point that lies more than 3 standard deviations from the mean.
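
The simple 3-standard-deviation rule described above can be sketched in Python as follows. This is only an illustration: the function name three_sigma_filter and the parameter k are made up for this example, and it assumes the sample standard deviation (n - 1 denominator), which the article does not specify.

```python
import statistics

def three_sigma_filter(data, k=3.0):
    """Split data into (kept, rejected) using the mean and SD of all points."""
    mean = statistics.mean(data)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(data)
    kept = [x for x in data if abs(x - mean) <= k * sd]
    rejected = [x for x in data if abs(x - mean) > k * sd]
    return kept, rejected
```

Note that the suspected outliers are included when the mean and standard deviation are computed, so a single extreme value inflates the standard deviation and can only be rejected if the dataset is reasonably large.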

However, as the number of samples in the dataset increases, the probability of seeing extreme samples also increases. To account for this increased likelihood of coming across extreme values, the following modifications are suggested.
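
To see why the dataset size matters, the short sketch below estimates how many points are expected to fall more than 3 standard deviations from the mean purely by chance, assuming the data are normally distributed. The function name expected_beyond is hypothetical; NormalDist comes from Python's standard statistics module.

```python
from statistics import NormalDist

def expected_beyond(n, k=3.0):
    """Expected number of points farther than k SDs from the mean among n normal samples."""
    # Two-sided tail probability of a standard normal beyond +/- k.
    tail = 2.0 * (1.0 - NormalDist().cdf(k))
    return n * tail

for n in (10, 100, 1000, 10000):
    print(n, round(expected_beyond(n), 2))
```

With about 0.27% of normal samples falling beyond 3 standard deviations, a dataset of 1000 points is expected to contain roughly 2 or 3 such values even when nothing is wrong with the data.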

Steps

1. Compute the mean using all the data points, including suspected outliers.

2. Compute the standard deviation using [formula not reproduced].

3. For each data point, xi, compute, in a separate column, z, the number of standard deviations that the data point lies away from the mean: z = (xi - mean) / standard deviation.

4. Use the following steps to calculate the probability of each data point occurring:

   - For each z > 0, compute Nα in a separate column, where N is the number of data points and α is the area under the normal distribution curve between z and ∞. You may do this in Excel using N * (1 - NORMSDIST(z)), or using the following formula: [formula not reproduced]
   - For each z < 0, compute Nα in a separate column, where α is the area under the normal distribution curve between -∞ and z. You may do this in Excel using N * NORMSDIST(z), or using the following formula: [formula not reproduced]
   - If Nα < 0.05, reject the data point as an outlier.

The figure below shows a series of data points with the first two intentionally set to be visibly different from the others. There were 80 data points, with a mean of 1122.6 and a standard deviation of 1.430.

The low outlier was 1117, with a computed z = -3.899. The Nα value was 0.004, which is less than 0.05, so this point may be safely rejected as an outlier.

The high outlier was 1128, with a computed z = 3.794. The Nα value was 0.006, which is less than 0.05, so this point may also be safely rejected as an outlier.
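
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's own implementation: the function names n_alpha and reject_outliers are invented for this example, the sample standard deviation (n - 1 denominator) is assumed because the article's formula is not available, and NormalDist().cdf from Python's standard library stands in for Excel's NORMSDIST.

```python
from statistics import NormalDist, mean, stdev

def n_alpha(z, n):
    """N times the normal tail area beyond z.

    Equals N * (1 - Phi(z)) for z > 0 and N * Phi(z) for z < 0,
    because the standard normal curve is symmetric about 0.
    """
    return n * NormalDist().cdf(-abs(z))

def reject_outliers(data, threshold=0.05):
    """Return (kept, rejected) using the N-alpha < threshold criterion."""
    n = len(data)
    m = mean(data)   # mean of all points, including suspected outliers
    s = stdev(data)  # assumption: sample standard deviation
    kept, rejected = [], []
    for x in data:
        z = (x - m) / s  # signed number of SDs from the mean
        if n_alpha(z, n) < threshold:
            rejected.append(x)
        else:
            kept.append(x)
    return kept, rejected
```

As a check against the worked example, n_alpha(-3.899, 80) and n_alpha(3.794, 80) come out at about 0.004 and 0.006 respectively, matching the Nα values quoted for the low and high outliers.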

Tips

If outliers occur, the reason for each outlier should be identified before discarding it. If a value is a data entry error or comes from another process, it should be corrected if possible rather than deleted. If the value is from the process or population you are studying and is not a data entry error, it should not be deleted. It is part of the natural variability in the data and should be included in quantifying that variability.

Warnings

This procedure assumes that the values generated by the process or population follow a normal distribution. Although measurement errors may follow a normal distribution in many cases, many populations and processes do not. As a result, the procedure described in this article may incorrectly delete values from the data. Also, even with normally distributed data, some values beyond 3 standard deviations will occur when the number of observations is large.

It is not considered good statistical practice to discard outliers without strong cause. Discarding outliers without cause typically results in underestimating the actual variability of the process that generates the data. Outliers typically arise from three possible causes:

- Data entry error.
- Values from another population or process.
- Actual unusual values in the data.

Article Info

Thanks to all authors for creating a page that has been read 37,901 times.

Categories: Probability and Statistics

Recent edits by: Teresa, Luv_sarah, Lucky7