
Applied Mathematical Sciences, Vol. 15, 2021, no. 4, 189 - 200

HIKARI Ltd, www.m-hikari.com

https://doi.org/10.12988/ams.2021.914276

A Simulation Study to Examine the Bias of

Some Sample Measures of Skewness

Nana Kena Frempong

Department of Statistics and Actuarial Science

Kwame Nkrumah University of Science and Technology-Kumasi, Ghana

Ransmond Opoku Berchie

Department of Statistics and Actuarial Science

Kwame Nkrumah University of Science and Technology-Kumasi, Ghana

Richard Baidoo

Department of Statistics and Actuarial Science

Kwame Nkrumah University of Science and Technology-Kumasi, Ghana

Benjamin Abijah

Department of Statistics and Actuarial Science

Kwame Nkrumah University of Science and Technology-Kumasi, Ghana

Osei Yaa Oforiwaa-Amanfo

Department of Statistics and Actuarial Science

Kwame Nkrumah University of Science and Technology-Kumasi, Ghana

This article is distributed under the Creative Commons by-nc-nd Attribution License.

Copyright © 2021 Hikari Ltd.

Abstract

In the last two decades, a number of modified and new measures of skewness have been introduced for population data. Tajuddin (2016) attempted to examine the performance of different measures of skewness. In this paper, we examine the performance of the Holgersson, Pearson, Classical and Tajuddin skewness measures, based on bias, by a simulation study. From the Monte Carlo simulation using the Inverse Transform method, the sample skewness measure proposed by Tajuddin (1999) performs best, in terms of least bias, on the Weibull model. The Classical measure can be used to compute the skewness of any income and wealth data with sample size below 100, and for larger income datasets (100 and above) the Pearson measure can be used to estimate skewness with minimal bias.

Keywords: Skewness, Monte Carlo, Bias

1 Introduction

During the late 19th century, it was common practice among statisticians to treat any frequency distribution as normal. Histogram data displaying plurimodality were typically fitted with normal mixtures, and skewness was removed at the outset by transformations to normality. In modern times, many inferences about the population distribution rely on non-normal responses, which means the data do not need to be transformed to satisfy normality assumptions. The Classical measure of skewness, "γ", measured by the standardized third moment, and the Pearson measure (1905), "ξ", were proposed decades ago and are used in many applications, but both have been criticized in the literature. Tajuddin (1999) argued that the Pearson measure is not reliable for less skewed distributions. Brys, Hubert and Struyf (2003) highlighted that the Classical measure may be strongly affected by just a single outlier. Doane and Seward (2011) nevertheless recommended the Pearson measure "ξ" over other measures, since it is easy to implement and is the only way to measure skewness when the original sample data are not available. Holgersson (2010) established that the Pearson measure does not uniquely determine symmetry for general distributions and hence proposed a modified skewness measure, which is a function of both the Classical and Pearson measures. A recent paper by Tajuddin (2016) examined the performance of some sample measures of skewness for a number of distributions using a simulation study. Tajuddin criticized the Classical, Pearson and Holgersson measures, alongside other measures, for suffering in the presence of outliers. However, comparing measures only by their sensitivity to outliers is restrictive and cannot be accepted as a general basis for choosing a measure. Criteria such as bias and efficiency were not used in his paper to recommend a particular measure. The focus here is on which of these sample measures has minimal bias across different distributions. The objective of this paper is to estimate skewness and examine the performance of the sample measures using bias estimates under different skewed distributions in a simulation study. Section 2 describes the statistical techniques used in analyzing and implementing the simulations. Section 3 presents the results and a detailed discussion of the outputs from the simulations. Section 4 presents the conclusion of the paper.


2 Methodology

Skewness is a measure of the asymmetry of the probability distribution of a

real-valued random variable about its mean. For the purpose of this study, we

examine the bias of Classical, Pearson (1905), Tajuddin (1999) and Holgersson

(2010) measures introduced extensively by Tajuddin (2016).

2.1 Review of Sample Measures of Skewness

This section reviews the existing sample measures of skewness. Both the theoretical definitions and the sample definitions are presented.

2.1.1 Classical Measure

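Theoretically, the Classical measure is defined as:

$$\gamma = \frac{E[(X-\mu)^3]}{\sigma^3} = \frac{\mu_3}{\sigma^3} \qquad (2.1)$$

where µ is the mean, $\mu_3$ is the third central moment, and σ is the standard deviation. The sample version of (2.1), designated by "C", is defined as:

$$C = \frac{\sqrt{n}\,\sum_{i=1}^{n}(X_i-\bar{X})^3}{\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right]^{1.5}} \qquad (2.2)$$

where $\bar{X}$ is the mean of a sample of size n.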

2.1.2 Pearson Measure

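The Pearson measure, denoted theoretically by ξ, is defined as:

$$\xi = \frac{\mu - m}{\sigma} \qquad (2.3)$$

where m is the population median. The sample version of (2.3), denoted by "P", is defined as:

$$P = \frac{\bar{X} - \text{median}}{s} \qquad (2.4)$$

where $\bar{X}$ is the sample mean and s is the sample standard deviation (the square root of the unbiased variance estimator), which estimates σ.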

2.1.3 Holgersson Measure

The theoretical measure of skewness suggested by Holgersson is defined as the third moment about the median, standardized by σ³:

$$\frac{E[(X-m)^3]}{\sigma^3} \qquad (2.5)$$

where m is the median and σ is the population standard deviation. The sample version of (2.5), denoted by "H", is defined as:

$$H = \frac{\sum_{i=1}^{n}(X_i-\text{median})^3}{n\,\hat{\sigma}^3} \qquad (2.6)$$

where $\hat{\sigma}$ is the sample estimate of the standard deviation.
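The link between the Holgersson, Classical and Pearson measures can be made explicit. The short derivation below is a sketch that assumes (2.5) is the third moment about the median m divided by σ³, and uses γ and ξ as in (2.1) and (2.3). The label $\gamma_m$ for the Holgersson quantity is introduced here only for illustration and is not the paper's notation.

```latex
\begin{align*}
E[(X-m)^3] &= E\big[\{(X-\mu) + (\mu-m)\}^3\big] \\
           &= \mu_3 + 3(\mu-m)\,\sigma^2 + (\mu-m)^3
              \qquad \text{since } E(X-\mu) = 0, \\
\gamma_m = \frac{E[(X-m)^3]}{\sigma^3} &= \gamma + 3\xi + \xi^3 .
\end{align*}
```

Under this reading the Holgersson measure vanishes only when the combination γ + 3ξ + ξ³ does, which is why it is described as a function of both the Classical and the Pearson measures.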

2.1.4 Tajuddin Measure

Tajuddin’s measure is defined as;

= 2F(µ)-1 (2.7)

The sample version of (2.7) is obtained after removing the median value from the

sample and then considering;

T = (2.8)
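The sample measures C, P and H can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the paper used R). It takes C as the ratio of the third central sample moment to the 1.5 power of the second, uses the n − 1 standard deviation for s, and assumes that H divides the summed cubed deviations from the median by n·s³. The function names and the small sample are hypothetical.

```python
import statistics


def classical(xs):
    """C: third central sample moment over the 1.5 power of the second."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def pearson(xs):
    """P: (sample mean - sample median) / s, with s the n-1 standard deviation."""
    return (statistics.mean(xs) - statistics.median(xs)) / statistics.stdev(xs)


def holgersson(xs):
    """H: summed cubed deviations from the median over n * s**3 (assumed divisor)."""
    med = statistics.median(xs)
    s = statistics.stdev(xs)
    return sum((x - med) ** 3 for x in xs) / (len(xs) * s ** 3)


# Hypothetical right-skewed sample, for illustration only.
sample = [1.0, 2.0, 3.0, 10.0]
print(classical(sample), pearson(sample), holgersson(sample))
```

All three functions return zero for a perfectly symmetric sample such as 1, 2, 3, 4, 5 and positive values for the right-skewed sample above.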

2.2 Bias

The bias of an estimator is the difference between the expected value of the estimator and the true value of the parameter being estimated. The theoretical bias of an estimator $T_n$ (relative to its parameter θ) is defined as:

$$\mathrm{Bias}(T_n) = E(T_n) - \theta \qquad (2.9)$$

The sample bias of an estimator is given as:

$$\mathrm{Bias}[G] = G - \theta$$

where θ is the true parameter and G is the sample measure. We use delta as the measure of deviation of the shape parameter from its initial value. In this study, the shape parameters of the two distributions were varied for the simulations:

Delta (δ) = |a − a₀|, where a₀ is the initial shape parameter and a is the varying shape parameter of the distribution.
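As a small worked illustration of these definitions, the sketch below estimates the bias as the average of replicated estimates minus the true value, and computes delta as the absolute change in the shape parameter. The function names, the symbol a0 for the initial shape parameter, and the numbers are hypothetical and chosen only to show the arithmetic.

```python
import statistics


def estimated_bias(estimates, true_value):
    """Average of the replicated estimates minus the true parameter value."""
    return statistics.fmean(estimates) - true_value


def delta(a, a0):
    """Absolute change of the shape parameter a from the initial value a0."""
    return abs(a - a0)


# Hypothetical replicated estimates of a measure whose true value is 0.50.
estimates = [0.40, 0.50, 0.45, 0.45]
bias = estimated_bias(estimates, 0.50)
print(bias, abs(bias), delta(0.4, 0.1))
```

The absolute bias reported later in the tables corresponds to `abs(bias)` in this sketch.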

2.3 Simulation Study

To implement the simulation, we considered two positively skewed continuous distributions. These distributions were chosen because of their broad application in medicine, engineering and economics. We illustrate the techniques with the Weibull and Pareto distributions. Kalbfleisch (1985) showed extensively the properties of these two distributions.

The PDF, CDF and moments of the Weibull distribution are shown below:

$$f(x) = \frac{a}{\lambda}\left(\frac{x}{\lambda}\right)^{a-1}\exp\left[-\left(\frac{x}{\lambda}\right)^{a}\right], \quad x \ge 0$$

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\lambda}\right)^{a}\right], \quad x \ge 0$$

$$E(X^r) = \lambda^r\,\Gamma\left(1 + \frac{r}{a}\right),$$

where a > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution.

The Moment Generating Function is given as;

, for λ = 1,2, 3, …
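The population values of the four measures for a Weibull variable follow from the gamma-function moments. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the scale parameterisation F(x) = 1 − exp[−(x/λ)^a], the definitions (2.1), (2.3) and (2.7), and the reading of (2.5) as the third moment about the median over σ³, which equals γ + 3ξ + ξ³. The function name and dictionary keys are hypothetical.

```python
import math


def weibull_population_measures(a, lam=1.0):
    """Population skewness measures of a Weibull(shape a, scale lam) variable."""
    g1 = math.gamma(1 + 1 / a)
    g2 = math.gamma(1 + 2 / a)
    g3 = math.gamma(1 + 3 / a)
    mu = lam * g1
    sigma = lam * math.sqrt(g2 - g1 ** 2)
    mu3 = lam ** 3 * (g3 - 3 * g1 * g2 + 2 * g1 ** 3)  # third central moment
    median = lam * math.log(2) ** (1 / a)
    gamma_c = mu3 / sigma ** 3                     # Classical, (2.1)
    xi = (mu - median) / sigma                     # Pearson, (2.3)
    holg = gamma_c + 3 * xi + xi ** 3              # E[(X - m)^3] / sigma^3
    taj = 1 - 2 * math.exp(-((mu / lam) ** a))     # 2F(mu) - 1, (2.7)
    return {"classical": gamma_c, "pearson": xi,
            "holgersson": holg, "tajuddin": taj}


for shape in (0.5, 1.0, 2.0):
    print(shape, weibull_population_measures(shape))
```

For a = 1 (the unit exponential) this returns γ = 2, ξ ≈ 0.3069 and 2F(µ) − 1 ≈ 0.2642. The last two agree with the True Value entries reported for a = 1 in Table 3.1, while the C and H entries there are larger, so those columns may rest on a different moment convention than (2.1).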

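The PDF, CDF and moments of the Pareto distribution are shown below:

$$f(x) = \frac{k\,\sigma^{k}}{x^{k+1}}, \quad x \ge \sigma$$

$$F(x) = 1 - \left(\frac{\sigma}{x}\right)^{k}, \quad \text{for } x \ge \sigma$$

where k is the shape parameter and σ is the scale parameter.

The coefficient of skewness is given as:

$$\gamma = \frac{2(1+k)}{k-3}\sqrt{\frac{k-2}{k}}, \quad \text{for } k > 3$$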

The Moment Generating Function of the Pareto distribution is given as;

Random samples from the Weibull and Pareto distributions were generated using the Monte Carlo simulation technique, specifically the Inverse Transform method, which generates random samples by applying the inverse CDF of the target distribution to draws from the uniform distribution on (0, 1). The simulations were implemented in RStudio version 1.0.143. One thousand samples of sizes n = 20, 50 and 100 were obtained from each of the Weibull and Pareto distributions. For each sample, the four sample measures C, P, H and T were computed, and each measure was then averaged over the 1000 samples. The averaged estimates of the skewness measures are compared with the corresponding population values to estimate the bias.
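The sampling step can be sketched as follows. This is a Python illustration of the inverse transform idea, not the authors' R code. It assumes the Weibull CDF F(x) = 1 − exp[−(x/λ)^a] and the Pareto (Type I) CDF F(x) = 1 − (σ/x)^k, and inverts each in closed form. The function names, the seed, and the choice of the Pearson measure as the example estimator are hypothetical.

```python
import math
import random
import statistics


def weibull_inverse_cdf(u, a, lam=1.0):
    """Solve u = 1 - exp(-(x/lam)**a) for x."""
    return lam * (-math.log(1.0 - u)) ** (1.0 / a)


def pareto_inverse_cdf(u, k, sigma=1.0):
    """Solve u = 1 - (sigma/x)**k for x."""
    return sigma * (1.0 - u) ** (-1.0 / k)


def pearson(xs):
    """Sample Pearson measure: (mean - median) / s."""
    return (statistics.fmean(xs) - statistics.median(xs)) / statistics.stdev(xs)


def average_estimate(inverse_cdf, estimator, n, reps=1000, seed=1, **params):
    """Average an estimator over `reps` inverse-transform samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Each uniform draw is pushed through the inverse CDF of the target.
        sample = [inverse_cdf(rng.random(), **params) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates)


for n in (20, 50, 100):
    print(n, average_estimate(weibull_inverse_cdf, pearson, n, a=1.0))
```

Subtracting the corresponding population value from each printed average gives the estimated bias for that sample size; swapping in `pareto_inverse_cdf` with a value of `k` repeats the exercise for the Pareto case.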

3 Results

In this section, we present a detailed discussion of the findings from the simulated data.

3.1 Skewness of Weibull Distribution

Table 3.1 shows the simulated values of the different skewness measures for different sample sizes over varying shape parameter of the Weibull distribution. For each value of a (the shape parameter) with fixed scale parameter (λ = 1), the true population skewness is shown as True Value in the tabulated results.

Table 3.1: Estimated Average Skewness of Weibull (1, a) with different Sample Sizes

a     n                     H            C        P        T
0.1   20               4.6047       3.7338   0.2747    0.813
0.1   50               6.6102       6.0314   0.1883    0.884
0.1   100              8.9573       8.5319   0.1400    0.916
0.1   True Value   69899.9265   69899.9195   0.0023   0.9784
0.2   20               4.4008       3.3066   0.3392    0.719
0.2   50               6.0518       5.2635   0.2533    0.781
0.2   100              7.7589       7.1292   0.2052    0.807
0.2   True Value     190.4922     190.3028   0.0630   0.8522
0.4   20               3.9397       2.6069   0.4066    0.538
0.4   50               5.0067       3.8502   0.3634    0.574
0.4   100              5.7935       4.7059   0.3454    0.590
0.4   True Value      13.2023      12.3402   0.2801   0.6029
0.5   20               3.6317       2.3207   0.4005    0.464
0.5   50               4.5031       3.2712   0.3853    0.496
0.5   100              5.0842       3.9085   0.3715    0.505
0.5   True Value       9.1084       8.0498   0.3398   0.5138
1     20               2.2243       1.3206   0.2814    0.239
1     50               2.5412       1.6140   0.2952    0.256
1     100              2.7325       1.7792   0.3055    0.263
1     True Value      6.94945       6.0000   0.3069   0.2642
2     20               0.8201       0.4728   0.1010    0.082
2     50               0.9108       0.5651   0.1124    0.080
2     100              0.9352       0.5904   0.1132    0.089
2     True Value      13.7208      13.3717   0.1159   0.0881

Key observations from Table 3.1:

- For all the measures except P, the average skewness values for a ≤ 2 are no more than the True skewness values. For the measure P, the True Values are larger than the estimated skewness values at a ≥ 1.

- The estimated skewness values for all the measures, with the exception of P, generally increase and get closer to the population values as the sample size increases.

- For the C and H measures, at a < 1, the True Values decrease exponentially as a increases. However, the True Values increase at a = 2.

- The P measure behaves oddly: its skewness values for both the sample and population measures increase as a increases, for a < 1.

- Generally, the estimated values for the measure T increase as the sample size increases for a < 2. At a = 2, the estimated value decreases for sample size 50.

3.1.1 Absolute Bias of Weibull Distribution

The absolute bias associated with each sample estimate is computed and displayed in Table 3.2. The values were computed by subtracting the True Values in Table 3.1 from the sample estimates in the same table. The absolute values of these differences are taken and presented in Table 3.2.
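The arithmetic can be checked directly against the tabulated values. The short sketch below recomputes the absolute bias for the row a = 0.1, n = 20, using the estimates and True Values from Table 3.1 and comparing them with the entries reported in Table 3.2. The variable and function names are hypothetical.

```python
# (estimate, True Value, reported absolute bias) for a = 0.1, n = 20,
# copied from Tables 3.1 and 3.2.
rows = {
    "H": (4.6047, 69899.9265, 69895.3218),
    "C": (3.7338, 69899.9195, 69896.1857),
    "P": (0.2747, 0.0023, 0.2724),
    "T": (0.813, 0.9784, 0.1654),
}


def absolute_bias(estimate, true_value):
    """Absolute difference between the averaged estimate and the True Value."""
    return abs(estimate - true_value)


for name, (estimate, true_value, reported) in rows.items():
    computed = absolute_bias(estimate, true_value)
    print(name, round(computed, 4), reported)
```

Each recomputed value agrees with the reported entry to four decimal places.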

The following observations can be made from Table 3.2:

- The absolute bias values decrease as a increases up to 1 for the H, C and T measures for all sample sizes. However, at a = 2 the absolute bias for H and C increases. This is a result of the True Values of these measures increasing at a = 2 in Table 3.1.

Table 3.2: Absolute Bias of Sample Measures for Weibull (1, a) with different Sample Sizes

a     n                     H            C        P        T
0.1   20           69895.3218   69896.1857   0.2724   0.1654
0.1   50           69893.3163   69893.8881   0.1860   0.0944
0.1   100          69890.9692   69891.3876   0.1377   0.0624
0.2   20             186.0194     186.9962   0.2762   0.1332
0.2   50             184.4404     185.0393   0.1903   0.0712
0.2   100            182.7333     183.1736   0.1422   0.0452
0.4   20               9.2626       9.7333   0.1265   0.0649
0.4   50               8.1956       8.4900   0.0833   0.0289
0.4   100              7.4088       7.6343   0.0653   0.0129
0.5   20               5.4767       5.7291   0.0607   0.0498
0.5   50               4.6053       4.7786   0.0455   0.0178
0.5   100              4.0242       4.1413   0.0317   0.0088
1     20               4.7252       4.6794   0.0255   0.0252
1     50               4.4083       4.3860   0.0117   0.0082
1     100              4.2170       4.2208   0.0014   0.0012
2     20              12.9007      12.8989   0.0149   0.0061
2     50              12.8100      12.8066   0.0035   0.0081
2     100             12.7856      12.7813   0.0027   0.0009

- For the P measure, however, the absolute bias values increase marginally from a = 0.1 to a = 0.2 and then decrease as a increases beyond 0.2, for each sample size.

- As the sample size increases, the absolute bias generally decreases at each level of a for all measures (the T measure at a = 2 and n = 50 is an exception).

From Figure 1 (upper panel), it can be observed that:

- The bias of the P statistic shows a steady decline for δ ≤ 0.4 and then rises at δ = 0.5. For sample size 20, the bias declines sharply at δ = 1 and then increases steadily as δ increases. Moreover, the bias gets closer to 0 at δ = 1.4 and δ = 1.6 for sample sizes 50 and 100 respectively.

- The bias of T decreases steadily for δ ≤ 0.4 and then gradually declines to 0 for δ > 0.5.

- From Figure 1 (lower panel), the bias of C shows a sharp decrease from δ = 0.1 to δ = 0.2 and trails down to 0 as δ increases for all sample sizes.

- The bias of H decreases sharply from δ = 0.1 to δ = 0.2 and again from δ = 0.2 to δ = 0.3, and then gradually increases for δ > 0.5 for all sample sizes.

Figure 1: Bias plots of sample skewness measures for Weibull

3.2 Skewness of Pareto Distribution

The Pareto distribution is a skewed, heavy-tailed distribution that is usually used

to model the distribution of incomes and describe the allocation of wealth among

individuals in the theory of economics.

Table 3.3 shows the simulated skewness for the different skewness measures over different sample sizes and different shape parameters of the Pareto distribution. For each value of k (the shape parameter) with fixed scale parameter (σ = 1), the true population skewness is shown as True Value in the tabulated results.

Table 3.3: Estimated Average Skewness of Pareto (1, k) with different Sample Sizes

k     n                       H          C           P
4.1   20               4.607073   3.725002   0.2774955
4.1   50               6.821715   6.286283   0.1745165
4.1   100              9.422526   9.049177   0.1230689
4.1   True Value      2.9456675   9.149889   -1.313202
4.2   20               4.633934   3.760779   0.2750581
4.2   50               6.810550   6.277746   0.1736626
4.2   100              9.446849   9.076998   0.1219333
4.2   True Value      1.8903061   8.822819   -1.398706
4.4   20               4.627718   3.763452   0.2724791
4.4   50               6.879753   6.358331   0.1700825
4.4   100              9.482156   9.114652   0.1211623
4.4   True Value     -0.2830036   8.315869   -1.571824
4.5   20               4.638873   3.780531   0.2706314
4.5   50               6.883433   6.363539   0.1696082
4.5   100              9.476721   9.110023   0.1209194
4.5   True Value     -1.4307318   8.116099   -1.659337
5.0   20               4.661283   3.826743   0.2636517
5.0   50               6.855861   6.338895   0.1687025
5.0   100              9.637297   9.286892   0.1156306
5.0   True Value     -8.2028360   7.436128   -2.104795
6.0   20               4.665908   3.844712   0.2597022
6.0   50               7.010276   6.523988   0.1589670
6.0   100              9.641479   9.294849   0.1143992
6.0   True Value    -29.9231864   6.804138   -3.024070

Table 3.3 presents the estimated sample skewness values and True Values of the different measures of skewness, namely the Holgersson measure (H), the Classical measure (C) and the Pearson measure (P). The Tajuddin measure (T) was not considered due to convergence issues with the simulations.

Key observations from Table 3.3:

- For the H measure, as n increases the sample estimates tend to increase and get farther from the True Values. The True Values also decrease and take negative values as the shape parameter (k) increases.

- At k = 4.1 the C measure values increase and approach the True Value as the sample size (n) increases from 20 to 100. However, for k ≥ 4.2, the estimated values for sample size 100 tend to be larger than the True Values.

- For the P measure, the True Values decrease as k increases. The estimated values, however, decrease and approach the True Values as the sample size increases at each k.

3.2.1 Absolute Bias of Pareto Distribution

The absolute bias associated with each sample estimate is computed and displayed in Table 3.4. The values were computed by subtracting the True Values in Table 3.3 from the sample estimates and taking absolute values.

Table 3.4: Absolute Bias of Pareto (1, k) with different Sample Sizes

k     n                       H          C           P
4.1   20               1.661406   5.424887    1.590697
4.1   50               3.876047  2.8636059    1.487718
4.1   100              6.476858  0.1007119    1.436271
4.2   20               2.743628   5.062040    1.673764
4.2   50               4.920244  2.5450734    1.572368
4.2   100              7.556543  0.2541793    1.520639
4.4   20               4.910721   4.552417    1.844303
4.4   50               7.162756  1.9575387    1.741907
4.4   100              9.765160  0.7987829    1.692987
4.5   20               6.069605   4.335568    1.929969
4.5   50               8.314165  1.7525595    1.828945
4.5   100             10.907453  0.9939242    1.780257
5.0   20              12.864119   3.609385    2.368446
5.0   50              15.058697  1.0972331    2.273497
5.0   100             17.840133  1.8507636    2.220425
6.0   20              34.589094   2.959426    3.283772
6.0   50              36.933462  0.2801503    3.183037
6.0   100             39.564665  2.4907107    3.138469

The following observations can be made from Table 3.4:

- For the H measure, the absolute bias values increase as k increases for each sample size. Also, the absolute bias gets larger as the sample size increases from 20 to 100 for all k.

- For the C measure, the absolute bias decreases as k increases for sample sizes 20 and 50, but increases with k for sample size 100. Also, the absolute bias gets smaller as the sample size increases from 20 to 100 for k ≤ 4.5. However, for k > 4.5, the absolute bias tends to increase as n increases from 50 to 100.

- For the P measure, the absolute bias values increase as k increases for each individual sample size. Again, the absolute bias decreases marginally as the sample size increases from 20 to 100 for each k.

Figure 2: Bias Plots of Sample Measures for Pareto (Type I) Distribution

From Figure 2, it can be observed that:

- Changes in the shape parameter result in an increase in the bias of the H measure for the Pareto distribution. Also, the bias gets larger as the sample size increases.

- As the change in the shape parameter increases, the bias of the C measure decreases for sample sizes 20 and 50. However, for sample size 100 the bias increases as the change in the shape parameter increases.

- The bias of the P measure increases as the change in the shape parameter increases. Also, the bias gets smaller as the sample size increases.

4 Conclusion

4.1 Summary of Findings

The plot of the bias of the Tajuddin measure (T) shows that it is adequate for estimating the skewness of the Weibull distribution with minimal bias. We observed that the bias of T at each shape parameter gets closer to zero (0) as the sample size increases. It shows a positively skewed plot, as expected of a Weibull distribution with shape parameter a ≤ 2.6. The T measure seems to perform well when bias is considered.

In the case of the Pareto distribution (Type I), we observe that the bias of all three measures was sensitive to sample size. The absolute bias is smaller for the C and P measures than for the H measure. The Classical measure tends to estimate the skewness of the Pareto better for smaller samples (< 100). However, the Pearson measure performs well for larger samples (100 and above), even though its bias increases as the shape parameter increases.

We conclude that, for any Weibull model, the Tajuddin measure performs better than the other measures in estimating skewness. The Classical measure can be used to compute the skewness of any income and wealth data with sample size below 100, and for larger income datasets (100 and above) the Pearson measure can be used to estimate skewness.

References

[1] Brys, G., Hubert, M. and Struyf, A., A comparison of some new measures of

skewness, in: Developments in Robust Statistics, ICORS 2001, eds. R. Dutter, P.

Filzmoser, U. Gather, and P.J. Rousseeuw, Springer-Verlag Heidelberg, 2003, 98-

113. https://doi.org/10.1007/978-3-642-57338-5_8

[2] Doane, D.P. and Seward, L.E., Measuring skewness: A forgotten statistic,

Journal of Statistics Education, 19 (2) (2011), 1-18.

https://doi.org/10.1080/10691898.2011.11889611

[3] Holgersson, H.E.T., A Modified Skewness Measure for Testing Asymmetry,

Communications in Statistics - Simulation and Computation, 39 (2010), 335-346.

https://doi.org/10.1080/03610910903453419

[4] Kalbfleisch, J.G., Probability and Statistical Inference, Vol. 2: Statistical

Inference, Springer, 1985. https://doi.org/10.1007/978-1-4612-1096-2

[5] Pearson, K., Contributions to the mathematical theory of evolution. II. Skew

variation in homogeneous material, Philos. Trans. Roy. Soc. Lond., A186 (1895),

343–414. https://doi.org/10.1098/rsta.1895.0010

[6] Tajuddin, I.H., A simple measure of skewness, Statistica Neerlandica, 50

(1996), 362-366. https://doi.org/10.1111/j.1467-9574.1996.tb01502.x

[7] Tajuddin, I.H., A comparison between two simple measures of skewness,

Journal of Applied Statistics, 26 (1999), 767-774.

https://doi.org/10.1080/02664769922205

[8] Tajuddin I.H., A simulation study of some sample measures of skewness, Pak.

J. Statist., 32 (1) (2016), 49-62.

Received: October 5, 2020; Published: April 7, 2021