
Cairo University

Faculty of Economics and Political Science

Department of Statistics

A Mathematical Programming Approach to

Stratified Random Sampling

Prepared by

Dina Mohsen Mohamed Sabry

Supervised by

Prof. Ramadan Hamed Mohamed
Professor of Statistics
Department of Statistics

Prof. Reda Ibrahim Mazloum
Professor of Statistics
Department of Statistics

Dr. Mahmoud Mostafa Rashwan

Assistant Professor of Statistics

Department of Statistics

A Thesis Submitted to the Department of Statistics, Faculty of Economics and

Political Science in Partial Fulfilment of the Requirements for the M.Sc. Degree in Statistics

2012


A Mathematical Programming Approach to Stratified Random Sampling

Abstract

When applying stratified sampling, the problem of allocating the sample to different strata

arises. Many classical methods are available to allocate the sample to the different strata.

Nevertheless, mathematical programming methods have many advantages and can handle the

allocation problem while overcoming the limitations of the classical methods. Thus, there

have been many attempts by researchers to apply mathematical programming in the field of

sampling. Most of these attempts concentrate on minimizing the variances of the overall

estimators when optimally allocating the sample to the different strata. However, none of these models focuses on minimizing the variances of the estimators within the strata, and this is the gap the present study addresses. In many practical situations, the purpose of a survey may be to obtain overall estimates in addition to separate estimates within each stratum. Hence, the present study aims to minimize the coefficients of variation of the overall estimators in addition to the coefficients of variation of the estimators within the strata when

optimally allocating the sample. This creates a multiple objective problem that needs to be

dealt with using the appropriate approach. As a result, this study adopts a goal programming

approach that tries to tackle this problem in multivariate surveys by maximizing the precision

of the overall estimators in addition to the precision of the estimators within each stratum

under a fixed cost. Integer programming is used to guarantee integer values for the optimal

allocation. The proposed approach is compared with three of the classical methods of

allocation in addition to five mathematical programming models suggested in the literature

using a simulation study. Based on the criteria used for comparison, it is shown that the

suggested models have the highest efficiency in obtaining the estimators within the strata in

certain cases.

Keywords: Multivariate Stratified Sampling; Optimum Allocation; Goal Programming.


Name: Dina Mohsen Mohamed Sabry Youssef

Nationality: Egyptian

Date and Place of Birth: 9/12/1985, Giza – Egypt

Degree: Master of Science

Specialization: Statistics

Supervisors:

Prof. Ramadan Hamed Mohamed
Professor of Statistics
Department of Statistics

Prof. Reda Ibrahim Mazloum
Professor of Statistics
Department of Statistics

Dr. Mahmoud Mostafa Rashwan
Assistant Professor of Statistics
Department of Statistics

Title of the Thesis:

A Mathematical Programming Approach to Stratified Random Sampling

Summary of the Thesis:

The main objective of this study is to introduce goal programming models that try to

tackle the problem of sample allocation in stratified random sampling by taking into account

the precision of the overall estimators in addition to the precision of the estimators within the

strata under a fixed budget. Hence, the present thesis focuses on the formulation of the

proposed models. Moreover, the proposed models are compared with other models presented

in the literature through a simulation study. The performance of the models is evaluated using

three criteria that measure the efficiency of the models in obtaining the overall estimators in

addition to the estimators within the strata.

The present thesis is divided into five chapters which are organized in the following manner:

Chapter 1: Introduces the main objectives of this study in addition to outlining the contents

of the thesis.

Chapter 2: Presents a review of stratified random sampling and some of the classical methods of sample allocation. It also introduces the notation used throughout the thesis.


Chapter 3: Presents a review of the various mathematical programming approaches suggested in the literature for the problem of sample allocation in stratified random sampling.

Chapter 4: Introduces the proposed goal programming approach, the criteria used for comparison, the simulation study conducted and the conclusions drawn from it.

Chapter 5: Discusses the main concluding remarks reached and presents some points for

future work.


Acknowledgments

I would like to express my most profound gratitude and appreciation to Prof. Ramadan Hamed for his patience, guidance and continuous help during the preparation of this thesis.

Also, my deepest gratitude goes to Prof. Reda Mazloum for her support, care and

co-operation in providing me with her knowledge and expertise whenever needed.

I would also like to genuinely and sincerely thank Dr. Mahmoud Rashwan who never

hesitated in helping and assisting me. Dr. Mahmoud was very supportive, encouraging and

always provided me with positive energy that motivated me during the tough times of my

research.

My warm and heartfelt gratitude goes to my family, especially my parents, who were always there for me, for their unconditional love and support throughout my whole life.

Last but not least, I would like to extend very special thanks to my professors, colleagues and friends at the Faculty of Economics and Political Science for their continuous support.


Table of Contents

Chapter 1: Introduction
1.1 Research Objective
1.2 Thesis Outline

Chapter 2: Review on Stratified Random Sampling
2.1 Stratified Random Sampling
2.2 Types of Sample Allocation
2.3 Sample Allocation with More than One Variable

Chapter 3: Review on Mathematical Programming Approaches to Sample Allocation in Stratified Random Sampling
3.1 Univariate Case
3.2 Multivariate Case (correlation is not taken into account)
3.2.1 Cost As An Objective
3.2.2 Precision As An Objective
3.3 Multivariate Case (correlation is taken into account)
3.4 Precision of Stratum Estimators

Chapter 4: The Suggested Mathematical Programming Approach
4.1 The Suggested Mathematical Programming Approach
4.1.1 The Suggested Objectives
4.1.2 The Proposed Models
4.1.3 The Criteria for Comparison
4.2 Simulation Study
4.2.1 The Design of the Simulation Study
4.2.2 Data generation
4.2.3 Software Packages
4.3 Simulation Results
4.3.1 Mean of Relative Efficiencies (MRE)
4.3.2 Total Sample Size
4.3.3 Mean of Coefficients of Variation (MCV)
4.3.4 Relative Mean Index (RMI)
4.3.5 The Effect of Varying the Budget on the Models’ Performance

Chapter 5: Conclusions and Further Research

References

List of Tables

Table 4.1: Summary of the Models under Comparison with the Proposed Approach
Table 4.2: Simulation Design
Table 4.3: Combination 1: 2x2 (2 strata and 2 variables)
Table 4.4: Combination 2: 3x2 (3 strata and 2 variables)
Table 4.5: Combination 3: 4x2 (4 strata and 2 variables)
Table 4.6: Combination 4: 2x3 (2 strata and 3 variables)
Table 4.7: Combination 5: 3x3 (3 strata and 3 variables)
Table 4.8: Combination 6: 4x3 (4 strata and 3 variables)
Table 4.9: Mean of Relative Efficiencies (MRE)
Table 4.10: Total Sample Size “n”
Table 4.11: Mean of Coefficients of Variation in the 2 Strata Case
Table 4.12: Mean of Coefficients of Variation in the 3 Strata Case
Table 4.13: Mean of Coefficients of Variation in the 4 Strata Case
Table 4.14: Relative Mean Index (RMI)
Table 4.15: Mean of Relative Efficiencies (MRE) Under Different Budgets
Table 4.16: Total Sample Size “n” Under Different Budgets

Glossary of Notation

$N_h$ Total number of units in stratum $h$
$N$ Total population size
$n_h$ Number of units in the sample drawn from stratum $h$
$n$ Total sample size
$y_{hi}$ Value obtained for the $i$th unit in the $h$th stratum
$W_h$ Stratum weight
$\bar{Y}_h$ True population mean in stratum $h$
$\bar{y}_h$ Sample mean in stratum $h$
$S_h^2$ True population variance in stratum $h$
$s_h^2$ Sample variance in stratum $h$
$\bar{Y}$ Overall population mean
$\bar{y}_{st}$ Overall sample mean
$V(\bar{y}_h)$ Variance of the sample mean in stratum $h$
$V(\bar{y}_{st})$ Variance of the overall sample mean
$n_{hj}$ Sample size in the $h$th stratum for the $j$th variable
$y_{hij}$ Value obtained for the $i$th unit in the $h$th stratum for the $j$th variable
$\bar{Y}_{hj}$ True population mean of the $j$th variable in stratum $h$
$\bar{y}_{hj}$ Sample mean of the $j$th variable in stratum $h$
$S_{hj}^2$ True population variance of the $j$th variable in stratum $h$
$s_{hj}^2$ Sample variance of the $j$th variable in stratum $h$
$V(\bar{y}_{hj})$ Variance of the sample mean of the $j$th variable in stratum $h$
$V(\bar{y}_{st(j)})$ Variance of the overall sample mean of the $j$th variable
$C$ Total budget
$c_0$ Fixed cost
$c_h$ Cost per sampling unit in the $h$th stratum

Weights representing the importance of the $j$th variable

Individual desired variance of the overall sample mean of the $j$th variable

Compromise variance of the overall sample mean of the $j$th variable under the optimum compromise strata sample sizes

Compromise variance of the sample mean of the $j$th variable in the $h$th stratum under the optimum compromise strata sample sizes

Positive deviation: the amount by which a given goal exceeds its aspired level (target)

Negative deviation: the amount by which a given goal falls short of its aspired level (target)

The lower bound on the sample size to be drawn from the $h$th stratum


Chapter 1

Introduction

“Sampling is the process by which inference is made to the whole by examining only a part.” Sample surveys are conducted in many cultural and scientific fields [18]. The use of sample surveys arose from the need to reduce the time and effort consumed by complete enumeration. Moreover, although the cost per observation in a sample survey is higher than in complete enumeration, the overall cost of the sample survey is much lower. Furthermore, obtaining data by complete enumeration is sometimes impossible, as in destructive tests such as testing the life of electric bulbs and haematological testing [18].

In addition, more comprehensive (and more frequent) data can be obtained using sample surveys, since it is possible to make use of highly trained and competent personnel or specialized equipment that are limited in availability. Hence, sample surveys offer more scope and flexibility regarding the types of information that can be collected, some of which would be impractical to obtain by complete enumeration [5].

Furthermore, sample surveys can produce more accurate results than complete enumeration. This is because the volume of work in a sample survey is much smaller, so it is possible to employ staff of higher quality and to provide more careful supervision of the processing of the results [5].

Nevertheless, there are situations where complete enumeration appears to be essential, for example when basic information is needed for every unit, as in counting the population for census purposes or compiling a voters' list [18]. In addition, sampling may not be useful if the population is small or the variance of the variable being measured is high [1].

In practice, post-enumeration sample surveys are usually conducted in order to

evaluate and supplement censuses by assessing the coverage and the errors that will

inevitably take place. Hence, it can be observed that sample surveys are often used in

conjunction with censuses and as a result sampling and complete enumeration are

“complementary and, in general, not competitive” [18].

Many sampling designs are available when conducting surveys. One of the most

frequently used designs is stratified sampling. In this design the population is divided

into separate sub-populations called strata. The main problem that faces researchers


when applying this design is to determine the sample size that is to be selected from

each stratum. This is known as the sample allocation problem.

This allocation problem was dealt with by many classical methods such as: equal

share allocation, proportional allocation and optimum allocation. In the optimum

allocation method, the allocation to the different strata is determined by minimizing

the variance of the overall estimator for a given total cost or minimizing the cost for a

given level of precision (measured by the variance of the overall estimator). However,

classical methods sometimes suffer from limitations such as: the inability to optimize

several objectives simultaneously, producing non-integer values for the sample sizes

and in some cases, producing a sample size larger than the corresponding stratum

size. However, mathematical programming offers many tools that can overcome these limitations of the classical methods. Thus, many researchers have tried to tackle this problem using mathematical programming approaches.

Most of the mathematical programming models available in the literature deal with

the allocation problem in the multivariate case. In these models, the allocation is

considered to be optimum if it minimizes the variances of the overall estimators

subject to a fixed cost or if it minimizes the total cost subject to a given level of

precision. However, none of these models concentrates on minimizing the variances of the estimators within the strata. In many surveys, the objective of the study is to obtain overall estimates in addition to separate estimates within the strata. Hence, the precision of both the overall estimators and the estimators within the strata should be taken into account when finding the optimal allocation.

In the following section, the main research objectives are introduced and section

1.2 will outline the main contents of the thesis.

1.1 Research Objective:

This study aims to develop a goal programming approach that tackles the

allocation problem in multivariate surveys by maximizing the precision of the

overall estimators in addition to the precision of the estimators within each

stratum under a fixed cost. Integer programming is applied to guarantee integer

values for the sample sizes. The performance of the proposed approach is

compared with three of the classical methods of allocation in addition to five

mathematical programming models available in the literature using a simulation

study.


1.2 Thesis Outline:

Chapter 1: Presents an introduction to the thesis.

Chapter 2: Presents a review of stratified random sampling, stating the main reasons for using it, the properties of the estimators and the main notation used throughout the study. Moreover, some of the classical methods of sample allocation are presented in this chapter.

Chapter 3: Reviews previous research that applies mathematical programming to the allocation problem in stratified random sampling. The literature is divided into models for the univariate case, models for the multivariate case that do not take the correlation between the variables into account, and models for the multivariate case that do. Finally, the chapter ends with a brief review of some of the attempts that take the precision of the estimators within the strata into account.

Chapter 4: Introduces the suggested goal programming approach discussing the

suggested objectives, the different proposed models and the criteria used for

comparison. Moreover, this chapter demonstrates the design of the simulation study,

the procedures used for data generation and the different software packages used in

conducting the simulation. Finally, the chapter will end with an analysis of the main

results obtained from the simulation.

Chapter 5: Summarizes the main conclusions reached from the simulation study and suggests some points for further research.


Chapter 2

Review on Stratified Random Sampling

This chapter first reviews stratified random sampling, indicating the reasons that may lead to stratifying a population into distinct sub-divisions (strata), and introduces the notation used throughout this study. Furthermore, the general properties of the estimators used are discussed. Finally, the chapter considers the different types of allocation of the total sample to the sub-populations and illustrates an allocation method used when there is more than one important variable.

2.1 Stratified Random Sampling:

There are different sampling designs available when conducting surveys. The

simplest design that is considered to be the basic sampling technique is simple

random sampling. In this sampling design each unit in the population has the same

chance of selection. Simple random sampling forms the basis of most of the other

designs [5], [18].

Another technique, and the most frequently used one, is stratified sampling, where the population is divided into suitable sub-populations that are internally homogeneous but heterogeneous with respect to each other. There are many reasons for dividing the population into distinct sub-populations [2], [5], [16], [18]:

1- When the variability in the population is very large, the use of stratified sampling appears to be advantageous. Moreover, if it is required to give a larger weight to units that occur rarely in the population (such as respondents with very high income), stratified sampling is useful.

2- Stratified sampling can produce estimates for each stratum of the population

separately, such as estimates for each geographical sub-population.

3- Stratified sampling offers the flexibility of using different sampling techniques in the different strata. For example, simple random sampling or systematic random sampling could be applied in the different strata.

4- Stratified sampling produces more precise estimates than those produced by simple random sampling of the same size (especially when the measurements within the strata are homogeneous).

5- The cost per observation may be reduced when using stratified sampling (the cost per observation includes the cost of the interviewer, time and travel).

6- Administrative convenience may dictate the use of stratified sampling. For instance, the agency conducting the survey may have field offices, each of which can supervise the survey for a part of the population.

In stratified sampling, the population consists of $N$ units, and it is divided into $L$ non-overlapping sub-populations (called strata) of sizes $N_1, N_2, \ldots, N_L$ units, so that $N_1 + N_2 + \cdots + N_L = N$. The values of $N_h$ ($h = 1, \ldots, L$) are known in advance, and once the strata have been determined, a sample is drawn from each stratum independently; the sample sizes are denoted by $n_1, n_2, \ldots, n_L$ respectively.

Throughout this study, it is going to be taken for granted that the strata have

already been determined, the technique used in the different strata is simple random

sampling, and that sampling is done without replacement. Furthermore, this study will

only be concerned with the estimation of the mean.

Notation and Properties of the Estimators:

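Throughout this study, the notation of Cochran (1977) [5] will be adopted, where the subscript $h$ denotes the stratum and $i$ denotes the unit within the stratum:

$N_h$ total number of units in stratum $h$,
$N = \sum_{h=1}^{L} N_h$ total population size,
$n_h$ number of units in the sample drawn from stratum $h$,
$n = \sum_{h=1}^{L} n_h$ total sample size,
$y_{hi}$ value obtained for the $i$th unit in the $h$th stratum,
$W_h = N_h / N$ stratum weight,
$\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$ true population mean in stratum $h$,
$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$ sample mean in stratum $h$,
$S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} \left(y_{hi} - \bar{Y}_h\right)^2$ true population variance in stratum $h$.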

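In stratified sampling, the population mean is denoted by $\bar{Y}$ and has the following formula:

$$\bar{Y} = \frac{1}{N}\sum_{h=1}^{L}\sum_{i=1}^{N_h} y_{hi} = \sum_{h=1}^{L} W_h \bar{Y}_h \tag{2.1}$$

An unbiased estimator for the population mean is $\bar{y}_{st}$ ($st$ stands for stratified), where

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{L} N_h \bar{y}_h = \sum_{h=1}^{L} W_h \bar{y}_h \tag{2.2}$$

Since, as previously mentioned, sampling is done independently in the different strata,

$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \, V(\bar{y}_h) \tag{2.3}$$

And provided that simple random sampling is applied in the different strata (which is the case in this study),

$$V(\bar{y}_h) = \frac{S_h^2}{n_h}\left(\frac{N_h - n_h}{N_h}\right) \tag{2.4}$$

As a result, the variance of the estimator in stratified random sampling has the following formula:

$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \, \frac{S_h^2}{n_h}\left(\frac{N_h - n_h}{N_h}\right) \tag{2.5}$$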
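The estimator (2.2) and its variance (2.5) can be computed directly from the stratum data. The following Python sketch is illustrative only: the function name and the numbers are hypothetical, and the stratum variances $S_h^2$ are passed in as if known (in practice they come from earlier studies or a pilot survey, or are replaced by the sample variances $s_h^2$).

```python
def stratified_mean_and_variance(N_h, samples, S2_h):
    """Return (ybar_st, V(ybar_st)) following equations (2.2) and (2.5).

    N_h     -- stratum sizes N_h
    samples -- observed values y_hi drawn without replacement from each stratum
    S2_h    -- stratum variances S_h^2, treated as known in this illustration
    """
    N = sum(N_h)
    ybar_st = 0.0
    var_st = 0.0
    for Nh, ys, S2 in zip(N_h, samples, S2_h):
        nh = len(ys)
        W = Nh / N                  # stratum weight W_h = N_h / N
        ybar_h = sum(ys) / nh       # sample mean in stratum h
        ybar_st += W * ybar_h       # equation (2.2)
        # equation (2.5): W_h^2 * (S_h^2 / n_h) * (N_h - n_h) / N_h
        var_st += W ** 2 * S2 / nh * (Nh - nh) / Nh
    return ybar_st, var_st


# Hypothetical two-stratum example.
mean, var = stratified_mean_and_variance(
    N_h=[100, 300],
    samples=[[10, 12, 14, 16], [20, 22, 24, 26, 28, 30]],
    S2_h=[4.0, 9.0],
)
print(mean, var)
```

Here $W_1 = 0.25$ and $W_2 = 0.75$, so $\bar{y}_{st} = 0.25 \times 13 + 0.75 \times 25 = 22$ and $V(\bar{y}_{st}) \approx 0.887$; the factor $(N_h - n_h)/N_h$ is the finite population correction from (2.4).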

2.2 Types of Sample Allocation:

In stratified random sampling, the problem of finding the values of the sample

sizes in the respective strata (i.e. allocating the sample) arises. There are several

methods of allocation such as: optimum allocation, Neyman allocation, equal share

allocation, proportional allocation and predetermined allocation. In this section the

different types of allocation are briefly discussed.

1- Optimum Allocation:

The allocation of the sample to the different strata is determined by either minimizing the variance of the estimator $V(\bar{y}_{st})$ for a given total cost $C$ or minimizing the cost for a given level of precision (i.e. $V(\bar{y}_{st}) = V$). The simplest form of the cost function is:

$$C = c_0 + \sum_{h=1}^{L} c_h n_h \tag{2.6}$$

where $c_h$ is the cost per sampling unit in the $h$th stratum, $C$ is the total budget available and $c_0$ is the overhead (fixed) cost. There are other forms of the cost function; however, only the linear form is considered in this study.

The optimum allocation formula (in terms of the total sample size $n$) has the following form:

$$n_h = n \, \frac{N_h S_h / \sqrt{c_h}}{\sum_{k=1}^{L} N_k S_k / \sqrt{c_k}} \tag{2.7}$$

Hence, we can conclude from this formula that the sample size in a certain

stratum increases as the size of the stratum increases, as the variability

within the stratum increases and as the cost per unit in the stratum

decreases.

The previous formula is in terms of the sample size $n$, which may not be known in advance. Thus, if the cost is fixed, the optimum values of $n_h$ can be substituted in the cost function, giving:

$$n = \frac{(C - c_0)\sum_{h=1}^{L} N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} N_h S_h \sqrt{c_h}} \tag{2.8}$$

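On the other hand, if the variance of the estimator is fixed (say $V(\bar{y}_{st}) = V$), then the optimum values of $n_h$ can be substituted in $V(\bar{y}_{st})$, giving

$$n = \frac{\left(\sum_{h=1}^{L} W_h S_h \sqrt{c_h}\right)\sum_{h=1}^{L} W_h S_h / \sqrt{c_h}}{V + (1/N)\sum_{h=1}^{L} W_h S_h^2} \tag{2.9}$$

It should be noted that the values of $S_h$ are unknown. Hence, they are either obtained from previous studies or estimated from a pilot investigation.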

2- Neyman Allocation:

If the cost per unit is assumed to be equal for all the strata (i.e. $c_h = c$), then the cost function reduces to:

$$C = c_0 + c\,n \tag{2.10}$$

Hence, for a given total cost, the total sample size is:

$$n = \frac{C - c_0}{c} \tag{2.11}$$

And the optimum allocation formula becomes [from equation (2.7)]:

$$n_h = n \, \frac{N_h S_h}{\sum_{k=1}^{L} N_k S_k} \tag{2.12}$$

This type of allocation is known as “Neyman allocation” [5].

3- Equal Share Allocation:

This type of allocation divides the total sample into equal shares for the different strata in the population,

$$n_h = \frac{n}{L} \tag{2.13}$$

Given that the total cost is fixed and takes the linear form (2.6), the total sample size takes the following form:

$$n = \frac{L\,(C - c_0)}{\sum_{h=1}^{L} c_h} \tag{2.14}$$

4- Proportional Allocation:

Here, the total sample is allocated to the different strata in proportion to the total number of units in the sub-populations (i.e. $n_h$ is proportional to $N_h$),

$$n_h = n \, \frac{N_h}{N} = n W_h \tag{2.15}$$

In this type of allocation, the same proportion of units is selected from each stratum.

For a given total cost, the linear cost function (2.6) gives the total sample size in proportional allocation as follows [18]:

$$n = \frac{C - c_0}{\sum_{h=1}^{L} W_h c_h} \tag{2.16}$$

where $W_h = N_h / N$ is the stratum weight.

If, on the other hand, the cost per observation is equal for all the strata, yielding the cost function (2.10), then the sample size is given by formula (2.11).

5- Predetermined Allocation:

Predetermined allocation divides the total sample size (which could be

determined in a subjective way) among the different strata according to the

researcher’s judgement.
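The closed-form allocations above are straightforward to evaluate. The Python sketch below is a hypothetical illustration (the function names and the numbers are not from the thesis): it computes the total sample size under a fixed cost (2.8) or a fixed variance (2.9), together with the optimum (2.7), Neyman (2.12), equal share (2.13), (2.14) and proportional (2.15), (2.16) allocations. Note that the results are generally non-integer.

```python
import math


def optimum_allocation(n, N_h, S_h, c_h):
    """Equation (2.7): n_h proportional to N_h S_h / sqrt(c_h)."""
    a = [N * S / math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    return [n * x / sum(a) for x in a]


def n_fixed_cost(C, c0, N_h, S_h, c_h):
    """Equation (2.8): total sample size when the budget C is fixed."""
    num = sum(N * S / math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h))
    den = sum(N * S * math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h))
    return (C - c0) * num / den


def n_fixed_variance(V, N_h, S_h, c_h):
    """Equation (2.9): total sample size when V(ybar_st) = V is fixed."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    a = sum(w * S * math.sqrt(c) for w, S, c in zip(W, S_h, c_h))
    b = sum(w * S / math.sqrt(c) for w, S, c in zip(W, S_h, c_h))
    return a * b / (V + sum(w * S ** 2 for w, S in zip(W, S_h)) / N)


def neyman_allocation(n, N_h, S_h):
    """Equation (2.12): the special case c_h = c of (2.7)."""
    return optimum_allocation(n, N_h, S_h, [1.0] * len(N_h))


def equal_share(n, L):
    """Equation (2.13)."""
    return [n / L] * L


def n_equal_share_fixed_cost(C, c0, c_h):
    """Equation (2.14)."""
    return len(c_h) * (C - c0) / sum(c_h)


def proportional_allocation(n, N_h):
    """Equation (2.15)."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]


def n_proportional_fixed_cost(C, c0, N_h, c_h):
    """Equation (2.16)."""
    N = sum(N_h)
    return (C - c0) / sum(Nh / N * c for Nh, c in zip(N_h, c_h))


# Hypothetical population with two strata.
N_h, S_h, c_h, C, c0 = [400, 600], [10.0, 20.0], [4.0, 9.0], 1000.0, 100.0
n = n_fixed_cost(C, c0, N_h, S_h, c_h)
alloc = optimum_allocation(n, N_h, S_h, c_h)
print(round(n, 2), [round(x, 2) for x in alloc])
```

For these hypothetical values, (2.8) gives $n \approx 122.73$, split by (2.7) into about 40.91 and 81.82 units, which uses the variable budget $C - c_0 = 900$ exactly. Any rounding of these values changes the cost, which is the difficulty discussed at the end of this chapter.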


2.3 Sample Allocation with More than One Variable:

In all the previously presented types of allocation, it was assumed that there is only

one important variable that we base the allocation upon. However, this is usually not

the case, since sample surveys usually include more than one important variable, and an optimum allocation for one variable will not necessarily be optimum for another [5]. Many researchers, such as Chatterjee and Yates (see [5]), have suggested solutions to this problem. However, only one method is presented in this study, namely “Cochran’s average”.

Cochran’s Average (i.e. compromise optimal allocation):

A few of the most important variables are chosen in order to allocate the sample optimally. Let the subscript $j$ denote the variable, where $j = 1, 2, \ldots, p$. As mentioned earlier, equation (2.7) gives the optimum allocation in terms of the total sample size $n$, and equation (2.8) gives the total sample size in the case of a fixed total cost. Substituting (2.8) in (2.7), we get

$$n_h = \frac{(C - c_0)\, N_h S_h / \sqrt{c_h}}{\sum_{k=1}^{L} N_k S_k \sqrt{c_k}} \tag{2.17}$$

which represents the optimum allocation under a fixed total cost.

By applying this formula to each variable separately, we get the optimum individual strata sample sizes,

$$n_{hj} = \frac{(C - c_0)\, N_h S_{hj} / \sqrt{c_h}}{\sum_{k=1}^{L} N_k S_{kj} \sqrt{c_k}} \tag{2.18}$$

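where

$y_{hij}$ value obtained for the $i$th unit in the $h$th stratum for the $j$th variable,
$\bar{Y}_{hj} = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hij}$ true population mean of the $j$th variable in stratum $h$,
$\bar{y}_{hj} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hij}$ sample mean of the $j$th variable in stratum $h$,
$S_{hj}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} \left(y_{hij} - \bar{Y}_{hj}\right)^2$ true population variance of the $j$th variable in stratum $h$.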

The individual strata sample sizes given by (2.18) are averaged over all the variables, giving an optimum compromise allocation that takes all the variables into account, i.e.:

$$n_h = \frac{1}{p}\sum_{j=1}^{p} n_{hj} \tag{2.19}$$
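As a sketch of how (2.17) to (2.19) combine, the Python code below computes the individual allocations $n_{hj}$ for each variable under a fixed budget and averages them. The function names, the row-per-variable layout of the standard deviations $S_{hj}$ and the data are assumptions made for illustration.

```python
import math


def fixed_cost_allocation(C, c0, N_h, S_h, c_h):
    """Equations (2.17)/(2.18) for one variable: optimum n_h under budget C."""
    den = sum(N * S * math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h))
    return [(C - c0) * N * S / math.sqrt(c) / den
            for N, S, c in zip(N_h, S_h, c_h)]


def cochran_average(C, c0, N_h, S_by_variable, c_h):
    """Equation (2.19): average the individual allocations n_hj over the p variables.

    S_by_variable[j][h] holds S_hj, the standard deviation of variable j in
    stratum h (this layout is an assumption of the sketch).
    """
    individual = [fixed_cost_allocation(C, c0, N_h, S_h, c_h)
                  for S_h in S_by_variable]
    p = len(individual)
    return [sum(alloc[h] for alloc in individual) / p for h in range(len(N_h))]


# Hypothetical example: two strata, two variables with opposite variability patterns.
N_h, c_h, C, c0 = [400, 600], [4.0, 9.0], 1000.0, 100.0
S_by_variable = [[10.0, 20.0],   # variable 1
                 [20.0, 10.0]]   # variable 2
compromise = cochran_average(C, c0, N_h, S_by_variable, c_h)
print([round(x, 2) for x in compromise])
```

Variable 1 alone would ask for about 40.91 and 81.82 units and variable 2 for about 105.88 and 52.94, so the compromise is roughly 73.40 and 67.38. Because each individual allocation spends exactly $C - c_0$ and the cost is linear, their average does too, but neither value is an integer.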

In all the previous methods of allocation, there is no guarantee that the resulting optimum allocation will be integer. This requires rounding the sample sizes in the different strata, which could produce a total cost that exceeds the specified budget (in the case of a fixed total cost), hence providing infeasible solutions. Moreover, in the previous allocation methods the problem of oversampling can occur (oversampling happens when the sample size in one or more strata is larger than the stratum size [6]). As noted by [5], the optimum allocation formula can produce values of $n_h$ in some strata that are larger than the corresponding $N_h$, and this problem has happened in practice on several occasions. This problem arises only when the overall sampling fraction (i.e. $n/N$) is large and the variability in some strata is greater than in the others [5].
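The two failure modes just described can be checked mechanically. The sketch below uses a hypothetical continuous allocation (it is not produced by any particular formula in this chapter), rounds it half-up, and reports the resulting cost under the linear cost function (2.6) together with any strata where $n_h > N_h$. The function name and data are illustrative only.

```python
import math


def check_allocation(n_h, N_h, c_h, c0, C):
    """Return (total cost, whether the budget C is exceeded, oversampled strata).

    Strata are reported by 0-based index.
    """
    cost = c0 + sum(c * n for c, n in zip(c_h, n_h))   # linear cost function (2.6)
    oversampled = [h for h, (n, N) in enumerate(zip(n_h, N_h)) if n > N]
    return cost, cost > C, oversampled


# Hypothetical three-stratum case with c_h = 1, c_0 = 0 and budget C = 10.
continuous = [3.6, 3.6, 2.8]                          # costs exactly 10 before rounding
rounded = [math.floor(x + 0.5) for x in continuous]   # round half up: [4, 4, 3]
N_h = [50, 3, 100]                                    # the second stratum has only 3 units
print(check_allocation(rounded, N_h, [1, 1, 1], 0, 10))
```

Rounding raises the cost to 11, above the budget of 10, and the second stratum is asked for 4 units although it contains only 3. Integer programming with explicit bounds $n_h \le N_h$ avoids both problems, which motivates the approaches reviewed next.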

Therefore, alternatives to the classical methods that can overcome these problems have been sought, and many researchers have applied mathematical programming in the field of sampling. This is the subject of the next chapter.


Chapter 3

Review on Mathematical Programming Approaches to Sample Allocation in Stratified Random Sampling

From the previous chapter, it can be seen that classical methods of sample allocation optimize only one objective subject to one constraint, which can be viewed as a limitation. Also, as stated before, classical methods suffer from the problem of producing non-integer sample sizes for the different strata. Because of rounding, this could lead to infeasible solutions [i.e. a total cost that exceeds the specified budget (in the case of a fixed cost)]. Moreover, the problem of oversampling can arise when using the classical methods of allocation. Hence, the use of mathematical programming appears to be advantageous, as it can overcome these limitations.

Mathematical programming has several advantages over classical methods. First, it offers the ability to optimize several objectives simultaneously, with the benefit of assigning priorities to the different objectives, and several constraints can be imposed. Second, mathematical programming can guarantee that the optimal allocation has integer values through the use of integer programming. Third, it can ensure that oversampling does not occur. Accordingly, this chapter reviews the different mathematical programming approaches to sample allocation suggested in the literature.

As previously mentioned, mathematical programming tools offer researchers the advantage of optimizing more than one objective at the same time, and this is one of the main benefits that many authors in the field of sampling have exploited. In the coming sections, some of the mathematical programming models suggested in the literature for determining the optimal sampling scheme are presented.

In most cases, we may want to estimate parameters for more than one variable;

therefore those variables should all be taken into consideration as the key variables

when determining the optimal strata sample sizes. Hence, the review will begin with

the models that were developed in the univariate case, the multivariate case without

taking the correlation between the variables into account, and then the multivariate

case where the correlation was taken into consideration. The present chapter will

finally end with a brief review on some attempts that take the precision of the

estimators within the strata into account.


Thus, the classification of the literature will be as follows:

[Figure: Classification of mathematical programming models for sample allocation: Univariate; Multivariate (no correlation), subdivided into Cost as an Objective and Precision as an Objective (Approaches A, B and C); Multivariate (with correlation).]

All these cases will be presented in the following sections.

3.1 Univariate Case:

This section presents different mathematical programming models dealing with the allocation problem when only one variable is of interest.

Arthanari and Dodge [2] presented a review of the use of mathematical programming for the optimal allocation of sample sizes in stratified random sampling. They formulated the problem of obtaining statistical information on population characteristics based on sample data as an optimization problem.

In the univariate case, the authors considered the problem of having $L$ strata, where it was assumed that the samples were drawn independently from the different strata. The problem of choosing the optimal $n_h$'s is known as the “optimal allocation problem”. In such a problem, the $n_h$'s are the decision variables, and the objective can be the minimization of the variance of the estimator of the variable under study (in this case the estimator is $\bar{y}_{st}$), subject to a fixed total sample size $n$. Hence, the problem is formulated as:

$$\text{Minimize} \quad V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \, \frac{S_h^2}{n_h}\left(\frac{N_h - n_h}{N_h}\right), \tag{3.1}$$

$$\text{subject to} \quad \sum_{h=1}^{L} n_h = n, \tag{3.2}$$

$$1 \le n_h \le N_h, \quad n_h \text{ integer}, \quad h = 1, \ldots, L, \tag{3.3}$$

where $S_h^2$ is the true population variance in stratum $h$; as mentioned before, its value is either known from prior studies of the same kind or estimated from pilot investigations.
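Because the objective (3.1) is a sum of terms that are each convex and decreasing in $n_h$, and (3.2) is a single sum constraint, problem (3.1) to (3.3) can be solved exactly by greedy marginal allocation: start from the lower bounds and repeatedly give one more unit to the stratum whose variance term drops the most, never exceeding $N_h$. This is a standard technique for separable convex integer resource allocation, substituted here for illustration; it is not necessarily the solution method used in [2]. The function names and data below are hypothetical.

```python
import heapq


def greedy_integer_allocation(n, N_h, S2_h):
    """Solve (3.1)-(3.3): minimize V(ybar_st) s.t. sum(n_h) = n, 1 <= n_h <= N_h, integer.

    Moving stratum h from n_h to n_h + 1 lowers V(ybar_st) by
    W_h^2 S_h^2 / (n_h (n_h + 1)); the finite-population term -W_h^2 S_h^2 / N_h
    does not depend on n_h, so it plays no role in the choice.
    """
    L, N = len(N_h), sum(N_h)
    if not L <= n <= N:
        raise ValueError("need L <= n <= N for a feasible allocation")
    a = [(Nh / N) ** 2 * S2 for Nh, S2 in zip(N_h, S2_h)]   # W_h^2 S_h^2
    alloc = [1] * L                                        # lower bounds in (3.3)
    # Max-heap (via negated keys) of the variance reduction from one more unit.
    heap = [(-a[h] / (alloc[h] * (alloc[h] + 1)), h)
            for h in range(L) if alloc[h] < N_h[h]]
    heapq.heapify(heap)
    for _ in range(n - L):
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        if alloc[h] < N_h[h]:                              # upper bound: no oversampling
            heapq.heappush(heap, (-a[h] / (alloc[h] * (alloc[h] + 1)), h))
    return alloc


def stratified_variance(alloc, N_h, S2_h):
    """Objective (3.1) evaluated at a given allocation."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * S2 / nh * (Nh - nh) / Nh
               for nh, Nh, S2 in zip(alloc, N_h, S2_h))


# Hypothetical data: three small strata and a total sample size of n = 10.
N_h, S2_h = [5, 8, 6], [4.0, 25.0, 9.0]
best = greedy_integer_allocation(10, N_h, S2_h)
print(best, stratified_variance(best, N_h, S2_h))
```

For these hypothetical data the greedy allocation is [1, 6, 3], an integer solution close to the continuous Neyman allocation (2.12) of about 1.47, 5.88 and 2.65. Unlike rounding that continuous solution, it satisfies (3.2) and the bounds in (3.3) by construction.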