his BA and MA in Economics from the University of Negative Binomial Distribution 184 Mean and...


Farouk A. Benghezal has been teaching management science, statistics, operations management, and other business courses for more than thirty years. He received his BA and MA in Economics from the University of Algiers. He was awarded an AMIDEAST scholarship to study at Michigan State University, where he completed an MSc in Operations Research and Statistics and a PhD in Management Science. He has previously taught at Michigan State University, the School of Statistics and Planning in Algiers, the University of Algiers, and several universities in the United Arab Emirates (Ajman University of Science and Technology, Abu Dhabi University, and the University of Sharjah). In addition, he has held various positions at research institutions and consultancy firms. He is currently based at The American University in the Emirates.

Dr Benghezal has previously authored a textbook, Programmation Linéaire (linear programming), in French. He has published numerous papers on modeling and has co-authored several monographs.

ABOUT THE AUTHOR


BRIEF CONTENTS

CHAPTER 1: Introduction to Statistics
CHAPTER 2: Descriptive Statistics
CHAPTER 3: Probability Concepts and Theory
CHAPTER 4: Discrete Probability Distributions
CHAPTER 5: Continuous Probability Distributions
CHAPTER 6: Sampling Distributions
CHAPTER 7: Estimation and Confidence Intervals
CHAPTER 8: One-Sample Hypothesis Tests
CHAPTER 9: Inference from Two Samples
CHAPTER 10: Chi-Square Tests
CHAPTER 11: Analysis of Variance
CHAPTER 12: Simple Linear Regression
CHAPTER 13: Nonparametric Tests (Part A)
CHAPTER 14: Statistical Quality Control

ADDITIONAL CHAPTERS ON CD:

CHAPTER 15: Multiple Linear Regression
CHAPTER 16: Nonparametric Tests (Part B)
CHAPTER 17: Time Series, Forecasting, and Index Numbers


CONTENTS

PREFACE
ACKNOWLEDGMENTS

CHAPTER 1 INTRODUCTION TO STATISTICS
What is Statistics?
Data Collection
Concepts in Statistics
Levels of Data Measurement
Types of Statistics
  Descriptive Statistics
  Inferential Statistics
Sampling Methods
  Simple Random Sampling
  Systematic Sampling
  Stratified Sampling
  Cluster Sampling
Frequency Distribution
  Qualitative Data
  Quantitative Data
Check your Understanding
Graphic Presentations of a Frequency Distribution
  Bar Chart
  Histogram
  Technology: Template for Histograms
  Frequency Polygon
  Technology: Template for Frequency Polygons
  Ogive
  Technology: Template for Ogives
  Pie Chart
  Technology: Template for Pie Charts
  Stem and Leaf Display
Other Graphic Presentations of Data
  Time Series
  Technology: Template for Time Series
  Scatter Plots
  Technology: Template for Scatter Plots
  Pareto Chart
  Technology: Template for Pareto Chart
Check your Understanding
Chapter Summary
Key Terms
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 2 DESCRIPTIVE STATISTICS
Measures of Central Tendency
  Mean
  Weighted Mean
  Median
  Midrange
  Mode
  Geometric Mean
  Trimmed Mean
  Harmonic Mean
  Technology: Template for Measures of Central Tendency
Check your Understanding
Measures of Dispersion
  Range
  Variance
  Standard Deviation
  Coefficient of Variation
  Chebyshev’s Theorem
  Technology: Template for Measures of Dispersion
Check your Understanding
Measures of Location
  Z-Score
  Percentile
  Quartiles
  Technology: Template for Percentile Graphs
Exploratory Data Analysis
  Outliers
  Box Plots
Measures of Shape
  Skewness
  Kurtosis
  Technology: Template for Measures of Location and Shape
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
  Problem D
Problems
Miniprojects

CHAPTER 3 PROBABILITY CONCEPTS AND THEORY
The Concept of Probability
  Classical Approach
  Empirical Approach
  Subjective Approach
Counting Rules
  The Multiplication Rule
  The Permutation Rule
  The Combination Rule
  Technology: Template for Counting Rules
Check your Understanding
Laws of Probabilities
  Addition Law of Probability
  Conditional Law of Probability
  Relationship among Joint, Conditional and Marginal Probabilities
  Technology: Template for Conditional Probabilities
Check your Understanding
Posterior Probabilities and Bayes’ Theorem
  Technology: Template for Bayesian Probabilities
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS
Random Variables
Probability Distribution
Discrete Probability Distributions
  Mean, Variance and Standard Deviation of a Probability Distribution
  Technology: Template for Discrete Random Variables
The Binomial Distribution
  Binomial Probability Tables
  Mean of the Binomial Distribution
  Variance of the Binomial Distribution
  Technology: Template for the Binomial Distribution
Check your Understanding
The Negative Binomial Distribution
  Mean and Variance of the Negative Binomial Distribution
  Technology: Template for the Negative Binomial Distribution
The Geometric Distribution
  Mean and Variance of the Geometric Distribution
  Technology: Template for the Geometric Distribution
Check your Understanding
The Hypergeometric Distribution
  Mean and Variance of the Hypergeometric Distribution
  Technology: Template for the Hypergeometric Distribution
The Poisson Distribution
  Poisson Probability Tables
  Mean and Variance of the Poisson Distribution
  Poisson Approximation to the Binomial
  Technology: Template for the Poisson Distribution
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 5 CONTINUOUS PROBABILITY DISTRIBUTIONS
The Uniform Distribution
  Mean and Variance of the Uniform Distribution
  Technology: Template for the Uniform Distribution
The Exponential Distribution
  Mean and Variance of the Exponential Distribution
  Technology: Template for the Exponential Distribution
Check your Understanding
The Normal Distribution
  Standard Normal Table
  Finding Probabilities of the Normal Distribution
  Finding Values of Z Given Probabilities
  The Inverse Transformation
  Approximation of the Binomial Distribution by the Normal Distribution
  Technology: Template for the Normal Distribution
  Technology: Template for the Normal Approximation to Binomial Distributions
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
  Problem D
Problems
Miniprojects

CHAPTER 6 SAMPLING DISTRIBUTIONS
Sampling
  Population Parameters and Sample Statistics
  Reasons for Sampling
  Random Sampling
Sampling Distribution of the Mean
The Central Limit Theorem
  Technology: Template for the Sampling Distribution of the Mean
Check your Understanding
Sampling Distribution of the Sample Proportion
  Technology: Template for the Sampling Distribution of the Proportion
The Correction Factor
  Technology: Template for Finite Correction Factor
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
Problems
Miniprojects

CHAPTER 7 ESTIMATION AND CONFIDENCE INTERVALS
Estimation
Confidence Interval for the Population Mean
  Confidence Interval for the Population Mean when σ is Known
  Finite Correction Factor
  Technology: Template for Confidence Intervals for Means with σ known
  Confidence Interval for the Mean when σ is Unknown
  Technology: Template for Confidence Intervals for Means with σ unknown
Check your Understanding
Confidence Interval for a Proportion
  Technology: Confidence Intervals for Proportions
Confidence Interval for the Variance
  Using the Chi-Square Table
  Confidence Intervals with the Chi-Square Distribution
  Technology: Confidence Intervals for Variances
Check your Understanding
Estimation of the Sample Size
  Sample Size for Estimating μ when σ is Known
  Sample Size for Estimating μ when σ is Unknown
  Sample Size when Estimating the Population Proportion
  Technology: Template for Sample Size Determination
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 8 ONE-SAMPLE HYPOTHESIS TESTS
Hypothesis Testing: a Preview
Hypothesis Testing Procedure
Types of Hypothesis Tests
  One-Tailed Test
  Two-Tailed Test
Test for a Population Mean with Known Variance
  The Critical Value Approach
  The p-Value Approach
  Technology: Template for Hypothesis Test on the Mean with Known Variance
Test for a Population Mean with Unknown Variance
  Technology: Template for Hypothesis Test on the Mean with Unknown Variance
Check your Understanding
Test for a Population Proportion
  Technology: Template for Hypothesis Tests for Proportions
Check your Understanding
Test for a Population Variance
  Technology: Template for Hypothesis Tests for Variances
Check your Understanding
Confidence Interval versus Hypothesis Test
Test of Type II Errors
  Technology: Template for Beta and Power
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 9 INFERENCE FROM TWO SAMPLES
One-sample versus Two-sample Test
Testing the Difference between Two Means
  Testing the Difference between Two Means for Large and Independent Samples with Known Variances
  Testing the Difference between Two Means for Large and Independent Samples with Unknown Variances
  Testing the Difference between Two Means for Small and Independent Samples with Unknown and Unequal Variances
  Testing the Difference between Two Means for Paired Samples
  Confidence Intervals for the Difference of Two Means
  Confidence Intervals for the Difference between Two Means for Paired Samples
  Technology: Templates for Testing the Difference between Two Means
Check your Understanding
Testing the Difference between Two Proportions
  Confidence Intervals for the Difference of Two Proportions
  Technology: Template for Testing the Difference between Two Proportions
Check your Understanding
Testing the Difference between Two Variances
  Use of F-Tables
  The F-Test for Two Population Variances
  Technology: Template for Testing the Difference between Two Variances
Check your Understanding
Testing the Difference between Two Means for Small and Independent Samples when the Variances are Unknown and Equal
  Confidence Intervals for Means with Equal Variances
  Technology: Template for Testing the Difference between Two Means for Small Samples and Equal Variances
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 10 CHI-SQUARE TESTS
Test for Goodness of Fit
  Application to a Uniform Distribution
  Application to a Multinomial Distribution
  Application to a Normal Distribution
  Application to a Poisson Distribution
  Technology: Templates for Goodness-of-Fit Test
Check your Understanding
Contingency Analysis: a Chi-Square Test for Independence
  Technology: Template for Contingency Analysis: a Chi-Square Test for Independence
Contingency Analysis: a Test for Homogeneity of Proportions
  Technology: Template for Contingency Analysis: a Test for Homogeneity of Proportions
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 11 ANALYSIS OF VARIANCE
One-Way Analysis of Variance
  Technology: Template for One-Way ANOVA
Multiple Comparison Tests
Test of Homogeneity of Variances
  Technology: Template for One-Way ANOVA with Tukey-Kramer Criterion
Check your Understanding
Randomized Complete Block ANOVA
  Technology: Template for Randomized Complete Block ANOVA
Two-Way ANOVA with Replication
  A Word about Interaction
  Technology: Template for Two-Way ANOVA
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
Problems
Miniprojects

CHAPTER 12 SIMPLE LINEAR REGRESSION
Linear Regression: a Preview
Simple Linear Regression
  Scatter Diagram
  Least-squares Line
  Technology: Template for Simple Linear Regression
Check your Understanding
The Standard Error
The Coefficient of Determination
The Coefficient of Correlation
Inference about the Regression Relationship
  Tests of Hypotheses
  Confidence Intervals
  Analysis of Variance and the F-test of the Regression Model
Check your Understanding
Prediction of Y Using the Regression Model
Analysis of Residuals
  Normality Assumption
  Constant Variance Assumption
  Independence Assumption
  Technology: Template for Linear Regression Model
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
Problems
Miniprojects

CHAPTER 13 NONPARAMETRIC TESTS (PART A)
Nonparametric Tests
The Sign Test
  Tests on Categorical Data
  Tests on the Median
  Technology: Templates for the Sign Test
The Runs Test
  Small Samples
  Large Samples
  Technology: Template for the Runs Test
Check your Understanding
The Wilcoxon Signed-Rank Test for Paired Data
  Small Samples
  Large Samples
  Technology: Template for the Wilcoxon Signed-Rank Test
The Mann–Whitney U-Test for Independent Samples
  Small Samples
  Large Samples
  Technology: Template for the Mann–Whitney U-Test
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 14 STATISTICAL QUALITY CONTROL
A Brief History of Modern Quality Management
Tools of Total Quality Management
  Process Map
  Check Sheets
  Histograms
  Scatter Diagrams
  Pareto Analysis
  Cause-and-Effect Diagrams
  Control Charts
Check your Understanding
Statistical Process Control
  Causes of Variation
Statistical Process Control Charts
  Statistical Process Control Charts for Variables
  Technology: Template for X and R Charts
  Technology: Template for X and MR Charts
  Technology: Template for X and S Charts
Check your Understanding
  Statistical Process Control Charts for Attributes
  Technology: Template for the p Chart
  Technology: Template for the c Chart
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 15 MULTIPLE LINEAR REGRESSION
Multiple Linear Regression Model
The F-Test for Overall Significance
The Coefficient of Determination
Significance Tests for Regression Parameters
Confidence Intervals for Regression Coefficients
  Technology: Multiple Linear Regression with Excel
  Technology: Multiple Linear Regression with Minitab
Check your Understanding
Prediction using the Multiple Regression Model
Binary Independent Variables
Multicollinearity
Model Building
  Stepwise Regression
  Forward Selection
  Backward Elimination
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
Problems
Miniprojects

CHAPTER 16 NONPARAMETRIC TESTS (PART B)
The Kruskal–Wallis Test
  Technology: Template for the Kruskal–Wallis Test
Check your Understanding
The Friedman Test
  Technology: Template for the Friedman Test
Check your Understanding
The Spearman Rank Correlation Test
  Technology: Template for the Spearman Rank Correlation Test
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

CHAPTER 17 TIME SERIES, FORECASTING, AND INDEX NUMBERS
Forecasting
Qualitative Forecasting Methods
  Group Averaging
  Group Consensus
  Historical Analogy
  Delphi Method
Time Series Forecasting Methods
  Time Series Forecasting Based on Averages
  Technology: Template for Simple Moving Average
  Technology: Template for Weighted Moving Average
  Technology: Templates for Single Exponential Smoothing
Check your Understanding
  Time Series Forecasting Based on Trend
  Technology: Template for Simple Linear Regression
  Exponential Trend Model
  Quadratic Trend Model
  Technology: Templates for Double Exponential Smoothing
  Technology: Templates for Ratio-to-Moving-Average Model with Seasonality
Check your Understanding
  Time Series Forecasting Based on Seasonal Patterns
  Technology: Templates for Linear Trend Model with Seasonality
Check your Understanding
Causal Models
Controlling the Forecast
  Tracking Signal
  Control Chart
  Technology: Template for Tracking Signal and Control Chart
Check your Understanding
Index Numbers
  Unweighted Aggregate Price Index
  Weighted Aggregate Price Index
  Laspeyres Price Index
  Paasche Price Index
  Fisher’s Ideal Price Index
  Technology: Template for Index Numbers
Check your Understanding
Chapter Summary
Key Terms
Key Formulas
Solved Problems
  Problem A
  Problem B
  Problem C
Problems
Miniprojects

ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS
LIST OF APPENDIX TABLES
BIBLIOGRAPHY


Statistics is playing an increasingly vital role in practically all professions, and some familiarity with this subject is now an essential element of any college education. Most colleges include a semester of statistics in their curricula. The need to accommodate a growing list of academic requirements often means that the coverage must be succinct. With these conditions in mind, this book gives students a first exposure to the main ideas of modern statistics.

This book introduces and develops statistical methods in business contexts and is intended for those working or studying in the area of business. It presents the general principles and supports them with many worked examples. Most of the examples have been designed to demonstrate the relevance of statistics to making decisions in a day-to-day business environment. In each chapter, application-oriented problems test the student’s ability to use the tools learned to solve typical problems. The applications cover different areas of business: accounting, economics, finance, management, marketing, operations, and more.

Examples and problems are drawn from a wide range of applications from all facets of life and specifically from business. Many examples and problems are based on real data from the Arab world. Sources of real data are mentioned.

Managers make decisions every day. Some are routine and involve little thought, but many are more complex and depend on numbers to suggest and justify subsequent courses of action. Good data contain information and, when carefully interpreted, increase knowledge. Statistical methods coupled with sound organizational practice can be a key to good management. Throughout the text, the discussion of statistical methods is accompanied by practical business applications to help students see the significance of statistics to their daily lives.

At the end of the book, an English–Arabic glossary gives brief explanations of the key terms. Each key term is translated into Arabic.

Statistics for Business is written with the Arab student in mind. Names of persons, companies, and places refer to the student’s environment. The book is clearly written, with a vocabulary that is readable and understandable for students whose mother tongue is not English, while keeping to scientific standards. The text emphasizes clarity and concision of presentation, and the approach is user-friendly. There are no formal mathematical derivations or proofs; the mathematics is kept to a minimum to motivate the reader.

All students have either personal computers or access to computing facilities in a campus lab. Problem solving in Statistics for Business is spreadsheet-oriented wherever possible. Spreadsheets have become a primary medium of instruction in statistics. Microsoft Office is the standard and Excel, in particular, is ideal for manipulating quantitative data. This book aims to provide students with the skills to use Excel as a spreadsheet tool, as this is likely to be the prevalent software they will employ in the workplace. Excel templates are presented in special ‘Technology’ sections within each chapter. Screen captures are used to help the student become familiar with the nature of the software output.

Statistics for Business is written for one- or two-semester courses in statistics. The text is intended for students who do not have a background in mathematics. The only prerequisite is knowledge of elementary algebra.

PREFACE


KEY FEATURES

This book places an emphasis on problem solving. I believe people learn best when provided with motivation and structure. In order to facilitate the learning process, several pedagogical features are incorporated:

• Examples throughout the text are presented with a step-by-step approach to enable students to follow techniques easily and then solve other problems. More than one example is provided for each new concept. More than 240 examples are presented and solved in 17 chapters, covering all aspects of the concepts introduced. Each example is followed by a clear and concise solution that develops the step-by-step methodology.

• Throughout the text, important formulas and main results are highlighted, signaling to readers that this material is particularly relevant for their understanding.

• Learning objectives are presented at the beginning of every chapter, providing students with clear goals and direction, whilst also providing a list of the key topics that are covered.

• Each chapter begins with an opening case, which describes an interesting and relevant real-world application to the material covered in the chapter. Four of these openers include real data from the Arab world.

CHAPTER 2 DESCRIPTIVE STATISTICS

MEASURES OF SHAPE

CHECK YOUR UNDERSTANDING

2.29 Consider the following score set: 48, 45, 10, 26, 33, 40, and 47. What value corresponds to the 60th percentile?

2.30 A final exam for a history course has a mean of 81 and a standard deviation of 3. Find the corresponding z-score for each raw data value.

a) 83 b) 76 c) 90 d) 73 e) 79

2.31 Dr. Souad gives a 15-point test to 10 students. The scores are

13, 7, 10, 1, 2, 3, 14, 6, 5, 8

a) Find the percentile rank of the grade 7.

b) Find the percentile rank for the score of 10.

2.32 Assume that the data shown below represent the number of hours 12 part-time employees at Muskham Shopping Center (Ramallah, Palestine) worked during the week before and after Eid:

Before 27, 32, 29, 35, 13, 9, 28, 27, 21, 32, 15, 21

After 13, 15, 19, 9, 23, 21, 29, 11, 15, 12, 9, 15

Construct a box plot for each set of data and compare the distributions.

2.33 The total number of goals scored in one Football World Cup is as follows.

Number of goals 0 1 2 3 4 5 7

Frequency 4 12 12 18 11 6 1

Compute

a) the first quartile,

b) the third quartile.

2.34 Construct a box plot for the number of TV sets sold during a randomly selected week at Carrefour: 6, 10, 21, 3, 7, 13, 1. Comment on the shape of the distribution.

2.35 Golden Pizza claims your pizza is free if the delivery time is more than 30 minutes. An investigator monitored 20 consecutive deliveries. The delivery times in minutes are reported below.

15 21 18 26 24 42 9 27 17 28

30 18 25 33 19 24 22 17 26 27

a) Construct a box plot for the time it takes to deliver a pizza.

b) Does the distribution show any outliers?

2.36 The accumulated quantity of gold by December 31, 2010 (in tons) at the following Arab countries’ central banks is displayed in the following table.

Saudi Arabia 322.9

Lebanon 286.8

Algeria 173.6

Libya 143.8

Kuwait 79.0

Egypt 75.6

Syria 25.8

Morocco 22.0

Jordan 12.8

Qatar 12.4

Tunisia 6.8

Bahrain 4.7

Yemen 1.6

Mauritania 0.4

www.gold.org/government_affairs/gold_reserves

a) Construct a box plot.

b) Are there any outliers?

REVIEW PROBLEMS

FIGURE 2.7 Kurtosis Types (mesokurtic, leptokurtic, and platykurtic)

TECHNOLOGY TEMPLATE 2.4

TEMPLATE FOR MEASURES OF LOCATION AND SHAPE

If we want to obtain the data value that corresponds to a given percentile, we enter the percentile in cell C21, C22, or C23 and the result appears in the adjacent cell D21, D22, or D23. If we want to obtain the percentile for a given data value, we enter the data value in cell F21, F22, or F23 and the result appears in the adjacent cell G21, G22, or G23. The quartiles and IQR are computed in cells D26:D29. The coefficients of skewness and kurtosis are computed in cells D33 and D34, respectively.

CHAPTER 10 CHI-SQUARE TESTS

where n is the sample size and π is the probability that an element belongs to that category if the null hypothesis is true.

In a goodness-of-fit test, the number of degrees of freedom is

df = k - 1

where k is the number of categories (or outcomes) for the experiment. In our case, k = 4, the number of entrances.

Next, we present the procedure for performing a goodness-of-fit test, which involves the same five steps that were used in the previous chapters. The chi-square statistic is computed as follows:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (10.1)$$

We just saw an application of thedistribution. In the following sections, wto multinomial, normal, and Poisson di

APPLICATION TO A MULThe example presented in the previouof-fit test when the expected distributto compare sample frequencies withdistribution is defined by k probabilple, the theoretical probabilities of

.The follow

where

Oi = observed frequency for category i,
Ei = expected frequency for category i, which is given by nπi, where πi is the theoretical probability of an element being in category i.

The chi-square test for goodness of fit is always a right-tailed test because the Oi − Ei values are squared.

To perform a goodness-of-fit test, the sample size should be large enough so that the expected frequency for each category is at least 5. This is known as Cochran’s Rule.
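As an illustration only, the chi-square statistic and the expected-frequency check described above can be sketched in Python. This is not one of the book's Excel templates: the function name `chi_square_statistic` and the observed counts are hypothetical and chosen just to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Rule stated in the text: each expected frequency should be at least 5.
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency should be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for k = 4 categories (NOT the values of Table 10.2).
observed = [90, 110, 95, 105]
n = sum(observed)                      # sample size, here 400
expected = [n * 0.25] * len(observed)  # E_i = n * pi_i with pi_i = 0.25 (uniform)

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1                 # degrees of freedom: k - 1

print(f"chi-square = {statistic:.2f}, df = {df}")
```

With these made-up counts each expected frequency is 100, so the rule on expected frequencies is comfortably met.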

TABLE 10.2

Computations for goodness-of-fit test

Entrance A B C

Observed frequency Oi 96 108 8

Expected frequency Ei 100 100 10

(Oi − Ei)² 16 64 25

(Oi − Ei)²/Ei 0.16 0.64 2.

STEP 4 Decision rule. A 5% significaright tail of the chi-square ddegrees of freedom is

df = k -

From the chi-square table (Adf = 3 and an area of 0.05 isgreater than the critical valu

STEP 5 Make a decision. Since the ccritical value 7.815, we do nthe number of people enter uniformly distributed.

EXAMPLE 10.1

Recall the above example of door entrances to a mosque. At the 5% significance level, can we reject the null hypothesis that the distribution of people entering the mosque is uniform across the four entrances?

Solution

STEP 1 State the hypotheses. Because the number of people entering the mosque across the four entrances is supposed to be the same (i.e. uniformly distributed), we can write the null and alternative hypotheses as

H0: distribution of people is uniform across the four entrances;
H1: distribution of people is not uniform across the four entrances.

STEP 2 Select α, the level of significance: α = 0.05.

STEP 3 Select the test statistic. We use Formula 10.1 to compute the test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

The observed frequencies are given in Table 10.1. The expected frequency for each category is 100. Table 10.2 shows the computations necessary to obtain χ².
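The decision step of a test like Example 10.1 can be sketched as follows. This is a hedged illustration rather than the book's worked solution: the function `goodness_of_fit_test` and both sets of counts are hypothetical, and the only value taken from the text is the critical value 7.815 for df = 3 at the 5% level.

```python
def goodness_of_fit_test(observed, null_probabilities, critical_value):
    """Return (chi-square statistic, reject H0?) for a right-tailed test."""
    n = sum(observed)
    # Expected frequency under H0: E_i = n * pi_i.
    expected = [n * p for p in null_probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Right-tailed rule: reject H0 only if the statistic exceeds the critical value.
    return statistic, statistic > critical_value


CRITICAL_VALUE = 7.815        # chi-square table value for df = 3, alpha = 0.05
uniform = [0.25] * 4          # H0: uniform across four categories

# Two hypothetical samples of 400 people (not the data of Table 10.1).
stat_a, reject_a = goodness_of_fit_test([104, 92, 97, 107], uniform, CRITICAL_VALUE)
stat_b, reject_b = goodness_of_fit_test([120, 80, 110, 90], uniform, CRITICAL_VALUE)

print(f"sample A: chi-square = {stat_a:.2f}, reject H0: {reject_a}")
print(f"sample B: chi-square = {stat_b:.2f}, reject H0: {reject_b}")
```

Sample A gives a statistic of about 1.38, below the critical value, so H0 is not rejected; sample B gives 10.0, above it, so H0 is rejected. Passing probabilities other than 0.25 covers the multinomial case in the same way.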

• Check Your Understanding problems are found at the end of each major section of the chapter, giving students the chance to practise what they have learnt and ensure understanding before moving on to the next topic.

• For each chapter, a set of Excel spreadsheets for the Technology sections can be used to solve problems. The Excel templates are available on the CD that accompanies the text. Although templates are useful and efficient aids to solving problems, illustrations of manual calculations have been retained so that students can calculate manually any result found in the templates.

WHY USE STATISTICSIn Tunis, citizens living downtowto find parking spaces for their businesses taking up the parking sthat are rented to employees duHowever, these garages are not enare insufficient for residents and p

The municipality had develothe erection of a few high-rise offifor their employees). The result oin the number of people comingmunicipality decided to install pap.m. with a maximum stay of twoadditional resources for its budgetcame downtown for business, but

A petition of downtown isuggesting that they provide parowner would pay a monthly subsday in designated spaces for resthe mayor decided to conduct downtown was close to 3,700. TheThe proposal was to allow car oduring the day in designated spacthe percentage of residents favorinthat he would accept the proposathe sample favoring the proposal.

Using sampling distributionthe likelihood of this happeningdistribution methods that would e7.21% chance that he will accept t

1. Review sampling methods.

2. Distinguish between population parameters and sample statistics.

3. Explain the central limit theorem.

4. Use the sampling distributions of $\bar{X}$ and $\hat{p}$.

LEARNING OBJECTIVES

In this chapter, we focus on the properties of two important sampling distributions: the sampling distribution of the sample mean and the sampling distribution of the sample proportion. We also present the central limit theorem, which plays a crucial role in statistical analysis.

SAMPLING DISTRIBUTIONS

C H A P T E R S I X


xvii KEY FEATURES

MINIPROJECTS

MINIPROJECT 1.1
Consider data file “DJIA 2000–2006”. In this miniproject, you are asked to compute the percentage change between two consecutive values of the closing price.

a) Construct a frequency distribution of the percentage change of the closing price.

b) Draw a histogram.

c) Draw a frequency polygon.

MINIPROJECT 1.2
For this miniproject, you are asked to collect a data set of interest to you that you will use for this chapter and the following chapters. Your data set should contain at least one qualitative variable and at least one quantitative variable. The data should contain between 50 and 100 observations. One example of a data set could be cars: quantitative variables could be price, mileage, age of a car, etc., and qualitative variables could include the model and type (compact, SUV, minivan, etc.). Another example could be demographic data, where quantitative variables may consist of income, family size, birth and death rates, etc. The qualitative variables may include the regions, cities, gender, etc. Other examples from sports or other areas may be used for this miniproject.

Write a short report that answers the following questions:

a) Describe the variables that you collected infor

h) Build a histogram, a pie chart, and a stem and leaf.

MINIPROJECT 1.3
Below are 50 names of students with their major taking a General Statistics class in Ibn Khaldun Business School. See data file “Ibn Khaldun”.

Select a sample of nine students from this alphabetical list by using

a) a simple random sample,
b) systematic sampling,
c) a cluster sample.

Try to ensure that every student has an equal chance of being selected. Which sampling method seems most appropriate?

MINIPROJECT 1.4
Consider the data file Qatar-Labor-Force-Sample-Survey-2009. Answer the following questions.

a) Construct a pie chart for the labor force “economically active” using Table 1.

b) Construct a relative frequency distribution of the educational status using Table 8.

c) Construct a cumulative frequency distribution of Qatari females according to their occupation using Table 19.

d) Construct a Pareto chart of non-Qatari females ac-

• At the end of the chapter, a chapter summary reviews the main topics covered, a key terms list gives all bolded terms in the chapter, key formulas are listed to make them easy for a reader to locate, and Solved Problems provide a comprehensive review of the concepts and procedures for tackling problems.

• Chapter Review Problems at the end of the chapter contain problems that are more involved and cover all sections. Experience has shown that when students are asked to solve the problems at the end of each section, they tend to have less difficulty because they already know which formulas are directly involved in the solution.

• At the end of each chapter, Miniprojects provide supplementary illustrations of the applications of statistics. Their purpose is to give students the opportunity to carry out research projects in statistics by using appropriate statistical procedures. There are 76 real cases (54 from the Arab world) out of 97 Miniprojects.

CHAPTER 5 CONTINUOUS PROBABILITY DISTRIBUTIONS

CHAPTER SUMMARY

• In this chapter, we have discussed continuous probability distributions described by continuous probability curves. In this case, a probability is represented by an area under the probability curve. We studied three continuous probability distributions: uniform, exponential, and normal.

• For the uniform distribution, the height of the curve is the same over an interval defined by two parameters a and b, the lower and upper limits; the mean of this distribution is the average of its parameters (Formula 5.1).

• The exponential distribution describes waiting times between occurrences of two events; its parameter is $\lambda$, the mean number of occurrences per unit of time (Formula 5.4). The exponential distribution and the Poisson distribution are related: the parameter $\lambda$ is the mean of the Poisson distribution and $1/\lambda$ the mean of the exponential distribution; both of these distributions are widely used in analysis of waiting times.

• The normal distribution, defined by two parameters, $\mu$, its mean, and $\sigma$, its standard deviation, has a bell-shaped curve (Formula 5.9). Because different values of $\mu$ and $\sigma$ give different normal distributions, we apply the transformation $Z = (X - \mu)/\sigma$ to get a standard normal distribution with $\mu = 0$ and $\sigma = 1$.

A table is available for the standard normal distribution. The normal distribution can be used to approximate a binomial distribution when $n\pi$ and $n(1 - \pi)$ are both at least 5. Because the binomial distribution is a discrete distribution, we use a correction for continuity by adding or subtracting 0.5 to the X value being analyzed.
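As an illustration of the continuity correction, the following sketch compares an exact binomial probability with its normal approximation. The values n = 50, success probability 0.3, and x = 12 are hypothetical and chosen only so that both conditions above hold (15 and 35 are at least 5); the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for a binomial distribution with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p):
    """Approximate P(X <= x) with a normal curve, adding 0.5 as the continuity correction."""
    mu = n * p                       # mean of the binomial distribution
    sigma = sqrt(n * p * (1 - p))    # standard deviation of the binomial distribution
    return NormalDist(mu, sigma).cdf(x + 0.5)


# Hypothetical values: n * p = 15 and n * (1 - p) = 35, both at least 5.
n, p, x = 50, 0.3, 12

exact = binomial_cdf(x, n, p)
approx = normal_approx_cdf(x, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

For a probability of the form P(X ≥ x), the correction goes the other way: 0.5 is subtracted from x before the normal curve is used.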

KEY TERMS

Approximation to the binomial distribution 237

Density function 215

Distribution function 215

Exponential distribution 219

Normal distribution 225

Standard normal distribution 228

Uniform distribution 214

KEY FORMULAS

Uniform Distribution

$f(X) = \dfrac{1}{b - a}$ for $a \le X \le b$; $f(X) = 0$ otherwise (5.1)

Mean $\mu = \dfrac{a + b}{2}$ (5.2)

Variance $\sigma^2 = \dfrac{(b - a)^2}{12}$ (5.3)

Exponential Distribution

$f(X) = \lambda e^{-\lambda X}$ for $X > 0$ (5.4)

$E(X) = 1/\lambda$ (5.7)

$V(X) = 1/\lambda^2$ (5.8)

SOLVED PROBLEMS

PROBLEM A
A taxi driver claims that the minimum time it takes him to reach downtown Am
a) Assuming a uniform distribution, what is the upper limit if the probabilit
driver more than 15 minutes to reach downtown is 0.40?
b) What is the probability density function?

SOLUTION
a) We use Goal Seek.

Set cell B3 to 7 and cell A9 to 15. In ‘Set cell’ enter D9, in ‘To value’ enter 0.4, and in ‘By changing cell’ ente
b = 20.33 minutes.

b) $f(X) = 1/(20.33 - 7) = 0.075$ for X between 7 and 20.33.
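The Goal Seek result can be cross-checked algebraically. For a uniform distribution on [a, b], P(X > x) = (b - x)/(b - a); setting this equal to 0.40 with a = 7 and x = 15 and solving for b gives a closed form. The sketch below is only a check of the spreadsheet answer, not the template itself, and the function name `uniform_upper_limit` is an illustrative choice.

```python
def uniform_upper_limit(a, x, tail_prob):
    """Solve (b - x) / (b - a) = tail_prob for b, where tail_prob = P(X > x)."""
    if not 0 < tail_prob < 1:
        raise ValueError("tail_prob must be strictly between 0 and 1")
    return (x - tail_prob * a) / (1 - tail_prob)


a = 7             # lower limit (minimum time, in minutes)
x = 15            # time threshold, in minutes
tail_prob = 0.40  # P(X > 15)

b = uniform_upper_limit(a, x, tail_prob)
density = 1 / (b - a)  # height of the uniform density between a and b

print(f"b = {b:.2f} minutes, f(X) = {density:.3f}")
```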

PROBLEM B
Assume that it takes an exponential time with a mean of 2.5 minutes to answthe call center of Garyounis University in Benghazi in Libya.
a) What is the probability density function of X?
b) What is the probability that the length of a phone call is no more than 4
c) What is the probability that the length of a phone call is at least 3 minute
d) What is the probability that the length of a phone call is between 2 and 5
e) What is the probability that the length of a phone call is at most 45 secon

SOLUTION
$\lambda = 1/2.5 = 0.4$.

a) $f(X) = 0.4e^{-0.4X}$ for $X > 0$.
b) $P(X \le 4) = 0.7981$.
c) $P(X \ge 3) = 0.3012$.
d) $P(2 \le X \le 5) = 0.314$.
e) $P(X \le 0.75) = 0.2592$.
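The four probabilities in this solution follow from the exponential distribution function, P(X ≤ x) = 1 - e^(-λx). The short sketch below recomputes them as a check; the helper name `exp_cdf` is illustrative, and 45 seconds is expressed as 0.75 minutes.

```python
from math import exp


def exp_cdf(x, lam):
    """P(X <= x) for an exponential distribution with rate lam."""
    return 1 - exp(-lam * x)


lam = 1 / 2.5  # rate per minute, since the mean is 2.5 minutes

p_b = exp_cdf(4, lam)                    # at most 4 minutes
p_c = 1 - exp_cdf(3, lam)                # at least 3 minutes
p_d = exp_cdf(5, lam) - exp_cdf(2, lam)  # between 2 and 5 minutes
p_e = exp_cdf(0.75, lam)                 # at most 45 seconds = 0.75 minutes

for label, value in [("b", p_b), ("c", p_c), ("d", p_d), ("e", p_e)]:
    print(f"{label}) {value:.4f}")
```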

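Normal Distribution

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2} \quad \text{for } -\infty < X < +\infty$$

Mean $= \mu$; Variance $= \sigma^2$

Standard normal variable: $Z = \dfrac{X - \mu}{\sigma}$

Normal Approximation to the Binomial Distribution

$\mu = n\pi$

$\sigma^2 = n\pi(1 - \pi)$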

CHAPTER REVIEW PROBLEMS

2.37 Prima Sport was recently opened in Lattakia to provide equipment and supplies to players and teams. During the six months of operations, the manager kept track of the number of purchases made by customers each day. Assume she selected a random sample of 30 days. Below is the number of invoices issued for the sample.

14 14 19 19 21 23 24 23 22 25 24 27 28 29 27

25 24 27 30 33 30 28 29 31 38 31 28 24 29 23

Compute the mean, median, mode, mid range, 10% trimmed mean, and the average growth rate for these data and the standard deviation.

2.38 A bank conducted a study of its customers to determine the total credit card debt. Suppose that a sample of 40 customers shows the following results in dollars.

1,245 3,723 5,216 968 2,218 0 647 1,941

2,261 1,845 3,167 0 3,824 3,316 5,830 1,792

2,010 4,390 3,978 1,549 4,056 947 0 4,210

420 2,045 4,174 3,156 0 2,843 339 759

3,621 1,268 0 0 1,642 867 0 1,324

a) Compute the mean, median, mode, and midrange for these data

2.41 File Syria Stock Exchange represents data on the volume traded between January 3, 2011 and March 22, 2011

Source: www.dse.sy.

a) What is the daily average of volume traded?b) What is the standard deviation of volume traded?c) Develop a box plot.d) Are there any outliers?

2.42 Consider Problem 2.2 on p. 81. Compute the mean and the standard deviation. What can you say about the shape of this distribution?

2.43 Consider the aircraft movement in Manama airport (Problem 1.52 on p. 55). Compute the mean, median, mode and range.

2.44 Consider the case of expatriate students in Riyadh schools (Problem 1.55 on p. 56). Compute the mean and the standard deviation.

2.45 The following table contains the annual returns (in percent) of two stocks.

Stock A Stock B Stock A Stock B

21.5445 –3.3014 2.6765 2.9415

16.0590 –0.3612 –4.5580 19.1065

15.2640 5.8565 10.7590 13.5680

2.9415 13.2330 –2.8090 2.6765


OTHER FEATURES

Instructors can access a variety of print, digital, and presentation resources available with this text in downloadable format at the Instructors Resource Center, accessible via the link: www.pearsoned.co.uk/awe/benghezal. Registration is simple and gives you immediate access to new titles and new editions. As a registered faculty member, you can download resource files and receive immediate access to and instructions for installing course management content on your campus server.

The following supplements are available for download to instructors adopting this textbook:

• Solutions Manual• TestGen (test-generating program)• PowerPoint slides

While the book is comprehensive, its organization allows instructors to choose topics and depth of coverage as desired. The text contains more material than one could normally hope to cover in a one-semester course. Topics presented near the end of each chapter can be considered optional and hence be skipped without loss of continuity.

One of the primary objectives in writing this book is to provide you, the reader, with a book that enhances your learning experience in quantitative business analysis. However, the degree of success you achieve in your quantitative business analysis studies will depend in large measure on the effectiveness of your learning habits. The better you can explain the “how” and “why” of key concepts the more thorough will be your understanding.

Several other features make this book a valuable resource. Answers to odd-numbered problems are given at the end of the book. Extensive solutions are provided for some problems, so that students may review the tools and procedures used to solve these problems. A set of statistical tables is also provided in the appendix. A comprehensive index at the end of the book enables readers to use the book as a reference for their continued studies, while a bibliography provides a current selection of more advanced books and interesting articles.

CD

INSTRUCTOR SUPPLEMENTS

An accompanying CD contains Excel templates as well as Excel data sets for problems and Miniprojects indicated by the CD icon. This can save the student from having to enter data by hand, which takes up valuable time and increases the chances of error. The CD also contains the chapter slides and the additional material, Chapters 15 through 17.


I am grateful to my colleagues at the University of Algiers, Ajman University of Science and Technology, Abu Dhabi University and the University of Sharjah for their advice and help. I should not forget all those former students who had a hard time with Statistics and other quantitative topics. I am sure they will recognize themselves because I used their names in this book. I want to thank Dr. Maher who kindly agreed to translate glossary terms into Arabic.

The Pearson staff deserves my gratitude for their professionalism, guidance, suggestions and encouragement throughout what has been a hard, but rewarding, task. Special thanks to Rasheed Roussan, the Pearson Acquisitions Editor, who contacted me to participate in the Arab World Publishing Program. I extend my most heartfelt thanks to Sophie Bulbrook, my development editor who has done a fantastic job. I would also like to thank Kate Sherington, Project Editor, and Fay Gibbons, Editor.

I wish to acknowledge the continuing support, understanding, patience, and encouragement that I receive so generously from all the members of my family. My sons, Amin, Sami, and Rochdi, provided the necessary inspiration to undertake and complete this project.

I would like to thank the following reviewers for their thoughtful comments and suggestions:

Dr. Idries Al-Jarrah, University of Jordan, Jordan

Fadi Awawdeh, Hashemite University, Jordan

Edgard A. Rizk, Lebanese German University, Lebanon

Professor Fathi M. Allan, United Arab Emirates University, UAE

Professor Medhat Hassanein, American University in Cairo, Egypt

Dr. George Fahmy Rezk, Arab Academy for Science & Technology, Egypt

Prof. Dr. Akram M. Chaudhry, University of Bahrain, Bahrain

Dr. Kastoori Srinivas, Osmania University, India

Farouk Benghezal

ACKNOWLEDGEMENTS


1. Define estimators and describe their properties.

2. Determine confidence intervals for a mean.

3. Determine confidence intervals for a proportion.

4. Determine confidence intervals for a variance and a standard deviation.

5. Compute the minimum sample size for estimating a parameter given a confidence level.

LEARNING OBJECTIVES

In this chapter, we focus on the properties of estimators. We will use the estimates obtained from sampling to determine confidence interval estimates for parameters.

ESTIMATION AND CONFIDENCE INTERVALS

C H A P T E R S E V E N


WHY USE STATISTICS?
Almarai is the largest integrated dairy foods company in the Arab world. It produces a wide range of dairy products and juice. Almarai Laban, one of their dairy products, is available in four different sizes: 2 liters, 1 liter, 500 ml, and 200 ml. As part of quality assurance, the company conducts tests every day on Laban to make sure that it meets the product standards. The nutrition standards per 100 ml serving are as shown in the following panel:

How could Almarai be sure that its Laban meets these standards? One way would be to take a sample of 100 ml of Laban at different times every day. Suppose that a quality engineer, Saud, measures the content of calcium in the sample. A sample of 14 measurements yields the following: 99.7, 100.2, 100.3, 99.6, 100, 101.4, 99.5, 100.2, 99.7, 98.9, 100.3, 100.1, 99.6, and 100.1 mg of calcium. In this example, eight values are at or above the standard requirement (100 mg) and six values are below it. Is Saud going to reject the day’s production because there are six values below the industry standards?

There are tolerance limits within which products are accepted by the Dairy Association: Laban’s calcium content must be between 99 and 101 mg. One observation is below the lower limit of 99 and one is above the upper limit. Does this mean that Almarai has failed to meet the industry standards? How is the quality engineer going to use these data to check whether today’s Laban production meets the industry standards? If the standards are not met, Saud can conclude that the process of producing Laban is out of adjustment.

To make a decision about accepting the day’s production, Saud can compute the following values: the sample mean of 99.97 mg and the standard deviation of 0.568 mg. Using these two values he is able to conclude that the day’s production of Laban meets the industry standards. How did he do that when the mean 99.97 is below the standard? In this chapter, we will learn that the statistics 99.97 and 0.568 are defined as point estimates. Saud computed the following interval: [99.64, 100.30], which he compared to the industry tolerance limits (99 and 101 mg). Then, he concluded that the production meets the industry standards for calcium content and the production process does not need to be adjusted. The interval [99.64, 100.30] is a 95% confidence interval. In this chapter, you will learn how to compute confidence intervals.
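Saud's figures can be reproduced from the 14 measurements. The passage does not say which method produced the interval, so the sketch below makes an assumption: a Student t interval with 13 degrees of freedom, with the critical value 2.160 taken from a standard t table. Under that assumption the mean, the standard deviation, and limits of roughly 99.64 and 100.30 mg all come out as quoted.

```python
from math import sqrt
from statistics import mean, stdev

# The 14 calcium measurements (mg) listed in the text.
calcium = [99.7, 100.2, 100.3, 99.6, 100, 101.4, 99.5,
           100.2, 99.7, 98.9, 100.3, 100.1, 99.6, 100.1]

n = len(calcium)
x_bar = mean(calcium)  # point estimate of the mean
s = stdev(calcium)     # sample standard deviation (divisor n - 1)

# Assumption: t-table value for 13 degrees of freedom and 0.025 in each tail.
t_crit = 2.160
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.2f} mg, standard deviation = {s:.3f} mg")
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
```

Both limits fall inside the tolerance limits of 99 and 101 mg, which is the comparison Saud makes.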

Nutrition Information (per 100 ml serving)

Vitamin D3: 400 I.U/L

Calcium: 100mg

Carbohydrate: 4.7g

Protein: 3.1g

Source: "Our Products: Fresh Laban," Almarai website, www.almarai.com/main.html#/en/Our%20Products/4.


Estimation is one important aspect of inferential statistics.

Generally, populations are large, and a sample allows us to collect data from the population of interest and determine an estimate of the true population parameters. One question arises: How large should the sample size be in order to make an accurate estimate of a population parameter? The answer to this question depends on factors such as the desired precision and the probability of making an accurate estimate. The sample size will be determined according to the desired level of accuracy we want our estimate to have. We will be able to show that the mean of a random sample is a sufficient estimator of the mean of a normal population with known variance, and this implies that there is nothing to be gained for this purpose by actually specifying the individual values of the sample or the order in which they are obtained.

In Chapter 6, we used samples to obtain values such as the sample mean, sample variance, sample standard deviation, and sample proportion. These statistics give us an idea about the parameters of the population. For example, the sample mean is a statistic that tells us approximately what the value of the population mean is, provided the sample properly represents the population. In general, we calculate a sample mean or a sample proportion to make an inference about a population parameter (mean $\mu$ or proportion $\pi$). Different samples give different sample means or sample proportions.

For example, suppose 20 owners of Mercedes Class C200 cars are asked to drive on a highway for a distance of 100 km. The fuel consumption of each car is recorded. Average consumption of fuel for this sample is computed to be 7.2 liters/100 km. So in this example, 7.2 liters/100 km would be the estimate of fuel consumption of the Mercedes Class C200, the population. If three other samples of the same size were taken, we may get different estimates, such as 6.9, 7.6, and 6.7 liters/100 km. These values are estimates of the population mean $\mu$. Because these estimates are different, they are subject to sampling error.

To overcome this problem, and based on our knowledge of the sampling distribution (using the central limit theorem of Chapter 6), we can develop an interval estimate constructed around the sample mean so that we are reasonably sure that this interval contains the population mean. This interval is known as a confidence interval, which has a specific probability of containing the population parameter we want to estimate: $\mu$.

ESTIMATION

Assignment of value(s) to a population parameter based on a value of a sample statistic.

ESTIMATE

A statistic obtained from a sample to infer the value of a population parameter.

ESTIMATION

A point estimate is the value of the estimator in a given sample.


For example, assume that you go to the airport to pick up your father, who is coming home from a business trip. You check the arrivals board to see if the flight is late. Fortunately, the flight is not late but you notice that 3 other flights are late out of the 15 flights displayed on the board. The proportion of late flights is 3/15 = 0.2. Thus, you have just computed a point estimate of the proportion of late flights from a sample of 15 arrivals. If we use a sample mean to estimate the mean of a population, a sample proportion to estimate the parameter $\pi$ of a binomial distribution, or a sample variance to estimate the variance of a population, we are in each case using a point estimate of the parameter in question.

In this section, we present some important properties of a good estimator: unbiasedness, efficiency, consistency, and sufficiency.

Let us look at unbiasedness first.

In other words, it would seem desirable that the expected value of an estimator be equal to the parameter it is supposed to estimate. Recall that in Example 6.1 on p. 259 we computed the mean of all possible samples of size 2 taken from the population (refer to Table 6.3 on p. 260). When we computed the mean of the sample means, we reached the following conclusion:

The mean of the sample means, $\mu_{\bar{X}}$, is equal to the population mean $\mu$.

Therefore, the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$. This indicates that if we keep taking samples from this population and compute $\bar{X}$ for each of the samples, in the long run the average value of the $\bar{X}$s will be the parameter $\mu$.
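Unbiasedness can be illustrated by enumerating every possible sample from a small population. The population below is hypothetical (it is not the one in Table 6.3), and samples of size 2 are assumed to be drawn without replacement; the mean of all the sample means coincides with the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five values (not the data of Table 6.3).
population = [2, 4, 6, 8, 10]
population_mean = mean(population)

# Every possible sample of size 2, assumed drawn without replacement.
sample_means = [mean(pair) for pair in combinations(population, 2)]
mean_of_sample_means = mean(sample_means)

print(f"number of samples    = {len(sample_means)}")
print(f"population mean      = {population_mean}")
print(f"mean of sample means = {mean_of_sample_means}")
```

Individual sample means range from 3 to 9, so any single estimate may miss the population mean; it is their average over all samples that matches it.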

The next property of good estimators is efficiency.

If we compare two estimators, one estimator is relatively more efficient than the other if its variance is smaller.

Another desirable property of estimators is consistency.

UNBIASEDNESS

The property of an estimator that its expected value is equal to the population parameter it estimates.

EFFICIENCY

The property of an estimator that it has a relatively small variance.

POINT ESTIMATE

A value of a sample statistic that is used to estimate a population parameter.

CONSISTENCY

The property of an estimator that its probability of being close to the parameter it estimates increases as the sample size increases.


When we select a sample from a population, we compute the mean of this sample by dividing the sum of the values by the sample size. If a second sample is taken and a mean is calculated, it is very likely that the second sample will provide a different mean. Further samples will yield more (different) values for the sample mean. We note that the population stays the same during this process. Therefore, the point estimate assigns a value to $\mu$, the population parameter, that will almost always be different from the true parameter. So, instead of assigning a single value to a population parameter, an interval is constructed around the point estimate and then a probabilistic statement that this interval contains the population parameter is made.

The probabilistic statement is given by the confidence level.

An interval constructed based on a confidence level is called a confidence interval.

For example, as we have seen in Chapter 6, the sample mean $\bar{X}$ has a variance $\sigma_{\bar{X}}^2 = \sigma^2/n$. As the sample size n increases, the variance of $\bar{X}$ decreases, and therefore the probability of being close to the parameter $\mu$ increases: a consistent estimator converges towards the parameter it estimates.
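The effect of the sample size can be made concrete. If the sample mean is treated as normally distributed with standard deviation σ/√n, the probability that it lands within a distance ε of μ is 2Φ(ε√n/σ) - 1, where Φ is the standard normal distribution function. The values σ = 3 and ε = 0.5 below are hypothetical, and the function name `prob_within` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_within(eps, sigma, n):
    """P(|sample mean - mu| < eps), assuming a normal sample mean with sd sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return 2 * NormalDist().cdf(eps / standard_error) - 1


sigma, eps = 3.0, 0.5  # hypothetical population sd and tolerance
sizes = (10, 30, 100, 1000)
probabilities = [prob_within(eps, sigma, n) for n in sizes]

for n, p in zip(sizes, probabilities):
    print(f"n = {n:4d}: P(|X-bar - mu| < {eps}) = {p:.4f}")
```

The printed probabilities rise towards 1 as n grows, which is the behavior the definition of consistency describes.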

The last property of good estimators is sufficiency.

For example, we will be able to show that the mean of a random sample is a sufficient estimator of the mean of a normal population with known variance; this indicates that there is nothing to be gained by specifying the order in which the individual values of the sample were obtained. Similarly, we will be able to show that the sample proportion is a sufficient estimator of the parameter $\pi$ of the binomial distribution; this means that there is nothing to be gained by actually specifying the order in which successes and failures were obtained.

We want our estimators, such as the sample mean, sample proportion, and sample variance, to have the properties of a good estimator (unbiasedness, efficiency, consistency, and sufficiency) so that we can use them to make inferences about the population parameters $\mu$, $\pi$, and $\sigma^2$.

SUFFICIENCY

The property of an estimator that it utilizes all the information that is contained in a sample.

CONFIDENCE LEVEL

A degree of certainty, expressed as a percentage, that an interval would include the population parameter (for example, a 95% confidence level).

CONFIDENCE INTERVAL

A range of values within which we can declare, with some level of confidence, the population parameter lies.

CONFIDENCE INTERVAL FOR THE POPULATION MEAN


When calculating confidence intervals for population means, we need to consider whether the standard deviation $\sigma$ is known or not. We will look at both cases: where $\sigma$ is known and where $\sigma$ is unknown.

CONFIDENCE INTERVAL FOR THE POPULATION MEAN WHEN $\sigma$ IS KNOWN

To present the concept of confidence interval by means of an example, let us consider again the sampling distribution of $\bar{X}$ for random samples of size n taken from a population with a mean of $\mu$ and a known standard deviation of $\sigma$. We know (by the central limit theorem) that the random variable $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ when n is at least 30. The following Z-formula for sample means can be used to find probabilities:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (6.4)$$

The value of $\mu$ can be obtained by rearranging this formula algebraically:

$$\mu = \bar{X} - Z\,\sigma/\sqrt{n}$$

As the sample mean can be greater than or less than the population mean $\mu$, Z can be positive or negative. Therefore, we write the preceding expression in the form

$$\mu = \bar{X} \pm Z\,\sigma/\sqrt{n}$$

The interval in which $\mu$ is contained is

$$\bar{X} - Z\,\sigma/\sqrt{n} \le \mu \le \bar{X} + Z\,\sigma/\sqrt{n}$$

Figure 7.1 displays this interval; the area under the curve is the confidence level or probability that $\mu$ is contained within the confidence interval.

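FIGURE 7.1 Confidence Interval for $\mu$ for Known $\sigma$. The area under the curve between $\bar{X} - Z\,\sigma/\sqrt{n}$ and $\bar{X} + Z\,\sigma/\sqrt{n}$ is the confidence level; the range between these two limits is the confidence interval.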

The value of Z depends on the probability with which we want $\mu$ to fall in this interval. We know that the area under the normal curve between $Z = -1.96$ and $Z = +1.96$ is 0.95. The probability that the mean $\mu$ is located in this interval is shown in Figure 7.2:

$$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

This indicates that 0.05 of the area under the normal curve is located in the tails.

Let us denote by $z_{\alpha/2}$ the point on the horizontal axis under the standard normal curve that yields a right-hand tail equal to $\alpha/2$. As illustrated in Figure 7.2, the area under the standard normal curve to the right of $z_{\alpha/2}$ is $\alpha/2$ and, by symmetry of the standard normal distribution, the area under the curve to the left of $-z_{\alpha/2}$ is also $\alpha/2$. Therefore, the area under the standard normal curve between $-z_{\alpha/2}$ and $+z_{\alpha/2}$ is $1 - \alpha$.
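Instead of reading the cut-off value from the standard normal table, it can be computed from the inverse of the standard normal distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_cutoff` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def z_cutoff(confidence):
    """Return the point that leaves an area of alpha / 2 in the right-hand tail."""
    alpha = 1 - confidence
    return standard_normal.inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.99):
    z = z_cutoff(confidence)
    # Area between -z and +z should equal the confidence level, 1 - alpha.
    central_area = standard_normal.cdf(z) - standard_normal.cdf(-z)
    print(f"{confidence:.0%}: z = {z:.3f}, central area = {central_area:.2f}")
```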


When a large sample is selected from a population that is not normally distributed, we can use the results of the central limit theorem, which states that for large samples ($n \ge 30$) the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ whatever the shape of the population. If $\sigma$ is unknown, then S, the sample standard deviation, can be substituted for $\sigma$; when n is large, we can use Formula 7.2:

In general, the confidence level is expressed as a percentage: for example, 95% or 99%.

A $(1 - \alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is known and sampling is done from a normal population is the interval bounded by

$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad (7.1)$$

EXAMPLE 7.1
Dr. Abbas from the University of Baghdad selects a sample of 16 grades from a population that he assumes to be normally distributed with a standard deviation of 3. The sample mean turns out to be 76. Construct a 95% confidence interval for the population mean $\mu$.

Solution

We have $\bar{X} = 76$, $\sigma = 3$, and $n = 16$. A 95% confidence interval gives a cut-off value $z_{\alpha/2} = 1.96$:

$$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

The confidence interval is

$$76 - 1.96\,(3/\sqrt{16}) \le \mu \le 76 + 1.96\,(3/\sqrt{16})$$

$$74.53 \le \mu \le 77.47$$

There is a 95% chance that such an interval would include the population mean $\mu$.
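The arithmetic of this example can be checked with a small helper that applies Formula 7.1 directly. The function name `z_interval` is illustrative, and the cut-off 1.96 is the table value used in the solution.

```python
from math import sqrt


def z_interval(x_bar, sigma, n, z):
    """Return (lower, upper) limits of x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from Example 7.1: sample mean 76, sigma 3, n = 16, 95% cut-off 1.96.
lower, upper = z_interval(x_bar=76, sigma=3, n=16, z=1.96)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```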

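FIGURE 7.2 A 95% Confidence Interval for $\mu$ for Known $\sigma$. The central area between $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and $\bar{X} + 1.96\,\sigma/\sqrt{n}$ is 0.95; each tail has an area of 0.025.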

Formula 7.1 can be used as long as the population is normally distributed and the standard deviation is known. Moreover, there is no restriction on the sample size, whether small or large. The value $z_{\alpha/2}$ is called the cut-off value for the $(1 - \alpha)100\%$ confidence level.


A $(1 - \alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is unknown and the sample size is large ($n \ge 30$) is the interval bounded by

$$\bar{X} \pm z_{\alpha/2}\,\frac{S}{\sqrt{n}} \qquad (7.2)$$

where S is the sample standard deviation.

EXAMPLE 7.2
Assume that a sample of 49 accounts of credit card holders from Gulf International Bank yields a monthly mean balance of $675 and a standard deviation of $121. Construct a 90% confidence interval for the population mean $\mu$.

Solution

Here we have $\bar{X} = 675$, $S = 121$, and $n = 49$. Since the sample size is large ($n \ge 30$), we can use the normal distribution approximation and apply Formula 7.2. For a 90% confidence interval:

$$z_{\alpha/2} = 1.645$$

$$P\left(\bar{X} - 1.645\,S/\sqrt{n} \le \mu \le \bar{X} + 1.645\,S/\sqrt{n}\right) = 0.90$$

The confidence interval is

$$675 - 1.645\,(121/\sqrt{49}) \le \mu \le 675 + 1.645\,(121/\sqrt{49})$$

$$646.6 \le \mu \le 703.4$$

Gulf International Bank can be 90% confident that the average monthly balance of credit card holders is between $646.60 and $703.40.
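A large-sample interval of this kind can also be built straight from the confidence level, letting the program find the cut-off value rather than looking it up. The sketch below is one way to do so with the Python standard library; the function name `large_sample_interval` is an illustrative choice, and the normal approximation is assumed to hold because n is at least 30.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_interval(x_bar, s, n, confidence):
    """Limits of x_bar +/- z * s / sqrt(n), with z derived from the confidence level."""
    if n < 30:
        raise ValueError("the normal approximation is intended for n >= 30")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # cut-off leaving alpha / 2 in each tail
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from Example 7.2: mean balance 675, S = 121, n = 49, 90% confidence.
lower, upper = large_sample_interval(x_bar=675, s=121, n=49, confidence=0.90)
print(f"{lower:.1f} <= mu <= {upper:.1f}")
```

The computed cut-off is 1.6449 rather than the rounded table value 1.645, so the limits agree with the worked solution to one decimal place.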

FINITE CORRECTION FACTOR

Recall from Chapter 6 that if the sample is taken from a finite population, a finite population correction factor may be used to increase the accuracy of the solution. In the case of confidence intervals, the finite correction factor is used to reduce the width of the interval. As stated in Chapter 6, if the sample size is less than 5% of the population, the finite correction factor does not significantly change the solution. The following is Formula 7.1 modified to include the finite correction factor:

$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} \qquad (7.3)$$


EXAMPLE 7.3

Imagine that a sample of 40 employees is taken from Kwality Ice Cream, a Saudi ice-cream manufacturer that employs 500 people. Suppose that a random sample indicates that the average number of days of absence in a year is 5.3. Records show that the company has experienced in the past a standard deviation of 3.2 days of absence. Construct a 95% confidence interval to estimate the average number of days of absence of all employees in this company.

Solution

This example involves a finite population. The sample size is $n = 40$, which is greater than 5% of the population ($N = 500$). The sample mean is $\bar{X} = 5.3$ days, and the population standard deviation is $\sigma = 3.2$ days. The $z_{\alpha/2}$ value for a 95% confidence interval is 1.96. Substituting into Formula 7.3, we obtain

$$5.3 \pm 1.96\,\frac{3.2}{\sqrt{40}}\sqrt{\frac{500 - 40}{500 - 1}} = 5.3 \pm 0.95$$

The 95% confidence interval for the mean number of days of absence for the population of employees in this company is

$$4.35 \le \mu \le 6.25$$

Without the finite correction factor, the result would have been

$$4.31 \le \mu \le 6.29$$

which is a wider interval.

The square root of (500 − 40)/(500 − 1) is 0.96. Multiplying the margin of error $1.96\,(3.2/\sqrt{40})$ by this factor reduces it, and the standard error $3.2/\sqrt{40}$ along with it, by 4% (i.e. 1 − 0.96).

This reduction in the size of the standard error yields a smaller range of values for estimating the population mean. The larger the sample size the greater will be the reduction in the standard error. For example, if the sample size is 80, you can check that the reduction in the standard error is 8.3%.
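The effect of the finite correction factor in Example 7.3 can be reproduced with a short sketch. It uses only the Python standard library; `finite_correction` is a hypothetical helper name, and the cut-off value 1.96 is taken from the example rather than computed.

```python
from math import sqrt


def finite_correction(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


# Example 7.3: N = 500 employees, n = 40 sampled, sigma = 3.2, z = 1.96.
N, n, mean, sigma, z = 500, 40, 5.3, 3.2, 1.96

fpc = finite_correction(N, n)
margin_plain = z * sigma / sqrt(n)   # without the correction (Formula 7.1)
margin_fpc = margin_plain * fpc      # with the correction (Formula 7.3)

print(f"correction factor: {fpc:.3f}")
print(f"without correction: {mean - margin_plain:.2f} to {mean + margin_plain:.2f}")
print(f"with correction:    {mean - margin_fpc:.2f} to {mean + margin_fpc:.2f}")

# Percentage reduction in the standard error for n = 40 and n = 80.
for size in (40, 80):
    reduction = (1 - finite_correction(N, size)) * 100
    print(f"n = {size}: reduction of {reduction:.1f}%")
```

The loop at the end covers the check suggested in the text: with a sample of 80 out of 500, the reduction grows to about 8.3%.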

If the sample size is large enough ($n \ge 30$) and the population standard deviation is unknown, we can substitute in Formula 7.3 the sample standard deviation $S$ for the population standard deviation $\sigma$.

TECHNOLOGY

CONFIDENCE INTERVALS FOR MEANS WITH $\sigma$ KNOWN

TEMPLATE 7.1A

Template 7.1A presents the case where the standard deviation is known: we use the z-statistic. The population standard deviation, sample size, and sample mean are entered in cells C4, C5, and C6 respectively. The lower and upper limits (cells C10:D13) of the confidence interval are automatically computed for different levels of confidence (cells B10:B13). If the population is not normally distributed but the sample size n is at least 30, we use Formula 7.2 and enter in cell C4 the sample standard deviation as an estimate of the population standard deviation.[i]

If the population is finite and the sample size is greater than 5% of the population size, we use the finite correction factor to compute the confidence interval. We enter the population size in cell J4. The finite correction factor is automatically calculated in cell J5 and incorporated into the formula that computes the confidence intervals.

[i] You can check this for Example 7.2.

CONFIDENCE INTERVAL FOR THE MEAN WHEN $\sigma$ IS UNKNOWN

In this section, we consider the case of a small sample taken from a population that is normally distributed with an unknown standard deviation.

STUDENT'S t-DISTRIBUTION

When the population standard deviation is known, the sampling distribution of the mean has only one unknown parameter: its mean $\mu$. This is estimated by $\bar{X}$. In real sampling situations, however, the population standard deviation $\sigma$ is rarely known. The reason for this is that both $\mu$ and $\sigma$ are population parameters. When we select a sample from a normal population with the purpose of estimating its unknown mean $\mu$, the other parameter of the population, the standard deviation $\sigma$, is very unlikely to be known. If we use $S$, the sample standard deviation, as an estimate of the normal population parameter $\sigma$, we obtain the following random variable:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \qquad (7.4)$$

This random variable follows a distribution known as the t-distribution with (n − 1) degrees of freedom.[ii]

t-DISTRIBUTION

A distribution that describes the sample of data in small samples ($n < 30$) taken from a normal population with an unknown standard deviation.

DEGREES OF FREEDOM

The number of observations in a sample (n) minus the number of parameters being estimated.

[ii] In 1908, W. S. Gosset discovered the distribution of the random variable in Formula 7.4 and published his findings under the pen name Student.


We will explain shortly the concept of degrees of freedom. The t-distribution has the following characteristics.

a) The t-distribution and the standard normal distribution are shown graphically in Figure 7.3. Both distributions are continuous and bell-shaped, but the t-distribution is flatter than the normal distribution. This is because the standard deviation of the t-distribution is larger.

b) The t-distribution is characterized by its degrees-of-freedom parameter, denoted by df. For each number of degrees of freedom df = 1, 2, …, there is a corresponding t-distribution. Therefore, there is not only one t-distribution but rather a "family" of t-distributions. The mean of the t-distribution is zero, but the standard deviation depends on the degrees of freedom, and for $df > 2$ it is equal to $\sqrt{df/(df - 2)}$.

c) As the number of degrees of freedom increases, the t-distribution approaches the standard normal distribution, because the errors in using $S$ to estimate $\sigma$ decrease with larger samples. This is shown in Figure 7.3.

FIGURE 7.3 Normal and t-Distributions (standard normal distribution; t-distribution with df = 22; t-distribution with df = 6)
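As a quick numerical illustration of how the t-distribution approaches the standard normal, the sketch below evaluates the standard deviation of a t-distribution, which for df > 2 is the square root of df/(df − 2). The function name `t_std_dev` is a hypothetical helper used only for this illustration.

```python
from math import sqrt


def t_std_dev(df):
    """Standard deviation of the t-distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation requires df > 2")
    return sqrt(df / (df - 2))


# The values shrink toward 1, the standard deviation of the standard normal.
for df in (6, 22, 100, 1000):
    print(df, round(t_std_dev(df), 4))
```

The degrees of freedom 6 and 22 are the ones plotted in Figure 7.3; the larger values are added only to show the trend.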

DEGREES OF FREEDOM

Previously, we stated that the t-distribution has (n − 1) degrees of freedom. The degrees of freedom are associated with the sample standard deviation $S$. Recall that the sample standard deviation $S$ given by Formula 2.14 is equal to

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

When we compute $S$, we need the n values of $X$ and the sample mean $\bar{X}$, which is the sum of the n values of $X$ divided by n. If we know (n − 1) values of $X$ and the sample mean $\bar{X}$, we are able to determine the nth value. The nth value is equal to

$$n\bar{X} - \sum_{i=1}^{n-1} X_i$$

In this expression, we can freely select (n − 1) values, and the nth is obtained if we know $\bar{X}$. For example, assume we know $\bar{X} = 8$ and only five values out of six: 7, 12, 9, 5, and 10. The last value is computed as

$$6(8) - (7 + 12 + 9 + 5 + 10) = 48 - 43 = 5$$

The nth value depends on our choice for the (n − 1) values. If the values selected freely are 4, 12, 8, 6, and 14, then the last value is 4.


This is the reason why the number of degrees of freedom is (n − 1): the nth value is determined by the fact that we know the statistic $\bar{X}$.
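The idea that only (n − 1) values are free once the sample mean is fixed can be sketched in a few lines of Python. The function name `last_value` is hypothetical and is used only for this illustration.

```python
def last_value(sample_mean, n, free_values):
    """Return the nth observation implied by the mean and the other n - 1 values."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n - 1 values can be chosen freely")
    return n * sample_mean - sum(free_values)


# Sample mean 8 with six observations, five of them chosen freely.
print(last_value(8, 6, [7, 12, 9, 5, 10]))   # 48 - 43
print(last_value(8, 6, [4, 12, 8, 6, 14]))   # 48 - 44
```

Whatever five values are supplied, the sixth is forced, which is why one degree of freedom is lost.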

USING THE t-TABLE

To find a value in the t-distribution table (Appendix D) requires that we know the sample size n. The t-distribution table is a compilation of many t-distributions, where each line represents a different number of degrees of freedom. The sample size must therefore be converted to degrees of freedom (n − 1) before determining a table value. For each distribution, the table gives values that correspond to areas under the curve. To find a value of t, we use Appendix D, a portion of which is reproduced in Table 7.1.

A value of 2.3646 corresponds to seven degrees of freedom. This value also corresponds to

• a confidence level of 95%
• a one-tailed $\alpha$ of 0.025
• a two-tailed $\alpha$ of 0.05

Figure 7.4 gives the location of the t-statistic and the corresponding probabilities.

The area under the t-curve between −2.3646 and 2.3646 is 95%. The area under the t-curve to the right of 2.3646 corresponds to one tail, i.e. $\alpha = 0.025$. Finally, the areas under the t-curve to the right of 2.3646 and to the left of −2.3646 correspond to two tails, i.e. $\alpha = 0.05$.

TABLE 7.1 A portion of the t-distribution

Confidence level   50%      80%      90%      95%      98%      99%
One-tailed α       0.25     0.1      0.05     0.025    0.01     0.005
Two-tailed α       0.5      0.2      0.1      0.05     0.02     0.01

df
 1                 1.0000   3.0777   6.3137   12.706   31.821   63.656
 2                 0.8165   1.8856   2.9200   4.3027   6.9645   9.9250
 3                 0.7649   1.6377   2.3534   3.1824   4.5407   5.8408
 4                 0.7407   1.5332   2.1318   2.7765   3.7469   4.6041
 5                 0.7267   1.4759   2.0150   2.5706   3.3649   4.0321
 6                 0.7176   1.4398   1.9432   2.4469   3.1427   3.7074
 7                 0.7111   1.4149   1.8946   2.3646   2.9979   3.4995
 8                 0.7064   1.3968   1.8595   2.3060   2.8965   3.3554
 9                 0.7027   1.3830   1.8331   2.2622   2.8214   3.2498
10                 0.6998   1.3722   1.8125   2.2281   2.7638   3.1693
11                 0.6974   1.3634   1.7959   2.2010   2.7181   3.1058
12                 0.6955   1.3562   1.7823   2.1788   2.6810   3.0545
13                 0.6938   1.3502   1.7709   2.1604   2.6503   3.0123


CONFIDENCE INTERVALS WITH THE t-DISTRIBUTION

When the standard deviation $\sigma$ is unknown, the sample size is small, and the sample is drawn from a normal distribution, we use Formula 7.4

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

to build a confidence interval to estimate the mean $\mu$. This formula can be manipulated to yield the following boundaries for the interval containing $\mu$:

$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$

where $t_{\alpha/2,\,n-1}$ corresponds to the cut-off value for the specified $(1 - \alpha)100\%$ confidence level for (n − 1) degrees of freedom.
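The manipulation of Formula 7.4 can be spelled out in three steps. The derivation below is a sketch that starts from the fact that the t random variable lies between $-t_{\alpha/2,\,n-1}$ and $t_{\alpha/2,\,n-1}$ with probability $1 - \alpha$, multiplies through by $S/\sqrt{n}$, and then isolates $\mu$:

```latex
\begin{align*}
P\left(-t_{\alpha/2,\,n-1} \le \frac{\bar{X}-\mu}{S/\sqrt{n}} \le t_{\alpha/2,\,n-1}\right) &= 1-\alpha \\
P\left(-t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \bar{X}-\mu \le t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\right) &= 1-\alpha \\
P\left(\bar{X}-t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X}+t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\right) &= 1-\alpha
\end{align*}
```

The last line uses the symmetry of the t-distribution about zero: subtracting $\bar{X}$ and multiplying by −1 reverses the inequalities but leaves the same two limits.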

FIGURE 7.4 Values of t and Probabilities, df = 7 (central area $1 - \alpha = 0.95$ between −2.3646 and 2.3646, with $\alpha/2 = 0.025$ in each tail)

A $(1 - \alpha)100\%$ confidence interval for the mean $\mu$ when the standard deviation $\sigma$ is unknown and sampling is done from a normal population with sample size n less than 30 is the interval bounded by

$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \qquad (7.5)$$

EXAMPLE 7.4

The following data represent the average production of drinking water (in liters/second) per well in a sample of 24 Algerian governorates.

17.59 10.04 12.59 13.01 5.53 18.06

13.42 11.60 10.02 21.64 15.93 15.26

23.93 15.23 12.82 25.44 12.61 42.19

11.67 20.33 25.70 9.00 26.40 10.48

Source: Ministere des Ressources en Eau, www.ons.dz

Assume that the data are normally distributed; construct a 90% confidence interval for the population mean of this set of data.


Solution

First, we compute the mean and standard deviation of this sample of 24 observations:

$$\bar{X} = 16.69, \quad S = 7.9$$

The number of degrees of freedom for t is 24 − 1 = 23. The t-value for a 90% confidence interval when the number of degrees of freedom is 23, obtained from the t-table (Appendix D), is $t_{0.05,\,23} = 1.714$. The confidence interval has the limits

$$16.69 \pm 1.714\left(\frac{7.9}{\sqrt{24}}\right) = 16.69 \pm 2.76$$

$$13.93 \le \mu \le 19.45$$

$$P(13.93 \le \mu \le 19.45) = 0.90$$

We are 90% confident that the average production of drinking water per well is between 13.93 and 19.45 liters/second.
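The steps of Example 7.4 can be sketched in Python using only the standard library. Because the standard library has no t-distribution, the cut-off value 1.714 quoted in the solution is stored in a small lookup table; `T_TABLE` and `t_interval` are hypothetical names introduced for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Cut-off values t_{alpha/2, df}, keyed by (confidence level, df).
# 1.714 is the table value quoted in the solution for 90% and df = 23.
T_TABLE = {(0.90, 23): 1.714}

water = [
    17.59, 10.04, 12.59, 13.01, 5.53, 18.06,
    13.42, 11.60, 10.02, 21.64, 15.93, 15.26,
    23.93, 15.23, 12.82, 25.44, 12.61, 42.19,
    11.67, 20.33, 25.70, 9.00, 26.40, 10.48,
]


def t_interval(sample, confidence):
    """Confidence interval for the mean with sigma unknown (Formula 7.5)."""
    n = len(sample)
    t = T_TABLE[(confidence, n - 1)]   # the sample size is converted to df = n - 1
    x_bar = mean(sample)
    s = stdev(sample)                  # sample standard deviation, divides by n - 1
    margin = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = t_interval(water, 0.90)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The limits can differ from the worked solution in the second decimal place because the sketch does not round the mean and standard deviation before computing the margin.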

EXAMPLE 7.5

Suppose that Europcar Aden, a car rental agency in Yemen, collected the following mileages for a sample of 15 cars rented in January 2011. What assumption should the manager make before constructing a 95% confidence interval for the population mean? Interpret the results. Note that data are given in km/day.

85 48 74 146 75 92 59 67

72 114 230 83 55 172 95

Solution

First, the manager must assume that the population of mileages is approximately normally distributed in order to use the t-distribution to compute the confidence interval. To construct the confidence interval he needs to compute the sample mean and sample standard deviation:

$$\bar{X} = 97.8, \quad S = 49.6$$

The number of degrees of freedom for t is 15 − 1 = 14. The t-value for a 95% confidence interval when the number of degrees of freedom is 14, obtained from the t-table (Appendix D), is $t_{0.025,\,14} = 2.145$. The confidence interval has the limits

$$97.8 \pm 2.145\left(\frac{49.6}{\sqrt{15}}\right) = 97.8 \pm 27.5$$

$$70.3 \le \mu \le 125.3$$

$$P(70.3 \le \mu \le 125.3) = 0.95$$

Europcar Aden could therefore be 95% confident that the average daily mileage is between 70.3 and 125.3 km.


Recall the chapter opening case of Almarai, where a sample of 14 observations on the calcium content of laban gives a sample mean of 99.97 mg and a standard deviation of 0.568 mg. If we assume that the sample is taken from a normal population with unknown variance, we can use the t-distribution to compute the 95% confidence interval:

$$\bar{X} = 99.97, \quad S = 0.568$$

The number of degrees of freedom for t is 14 − 1 = 13. The t-value for a 95% confidence interval when the number of degrees of freedom is 13, obtained from the t-table (Appendix D), is $t_{0.025,\,13} = 2.1604$. The confidence interval is computed as

$$99.97 \pm 2.1604\left(\frac{0.568}{\sqrt{14}}\right) = 99.97 \pm 0.33$$

$$99.64 \le \mu \le 100.30$$

There are tolerance limits accepted by the Dairy Association requiring that the calcium content of laban be between 99 and 101 mg. The confidence interval [99.64, 100.30] is inside these tolerance limits; therefore, Almarai is meeting the required standard.

Students sometimes have difficulty deciding whether to use $z_{\alpha/2}$ or $t_{\alpha/2}$ values when finding confidence intervals for the mean:

• As stated previously, when $\sigma$ is known, $z_{\alpha/2}$ values can be used no matter what the sample size is, as long as the variable is normally distributed or $n \ge 30$.

• When $\sigma$ is unknown and $n \ge 30$, $S$ can be used as in Formula 7.2, and $z_{\alpha/2}$ values can be used.

• Finally, when $\sigma$ is unknown and $n < 30$, we use $S$ and $t_{\alpha/2}$ values as in Formula 7.5, as long as the variable is approximately normally distributed.

This is summarized in Figure 7.5.

FIGURE 7.5 When to Use Z- or t-Distributions (flowchart with Yes/No branches on three questions: Is the population normal? Is the population variance known? Is n ≥ 30? The outcomes are: use the Z distribution, use the t distribution, or use an appropriate nonparametric test)


TECHNOLOGY

CONFIDENCE INTERVALS FOR MEANS WITH $\sigma$ UNKNOWN

TEMPLATE 7.1B

Template 7.1B presents the case where the standard deviation is unknown and the sample is taken from a population that is normally distributed: we use the t-statistic. The sample size, the sample mean, and the sample standard deviation are entered in cells C17, C18, and C19 respectively. The confidence interval is automatically computed for different levels of confidence (cells B23:B26).

EXAMPLE 7.6

A sample of hospitals in 20 regions of Saudi Arabia yielded a national average length of stay for inpatients of 3.8 days, with a standard deviation of 0.96 days. Assume that the distribution is normal. Construct a 95% confidence interval for the length of stay in Saudi hospitals.[iii]

Solution

$$\bar{X} = 3.8, \quad S = 0.96, \quad n = 20$$

The number of degrees of freedom for t is 20 − 1 = 19. The t-value for a 95% confidence interval when the number of degrees of freedom is 19, obtained from the t-table (Appendix D), is $t_{0.025,\,19} = 2.093$. The confidence interval is given by

$$3.8 \pm 2.093\left(\frac{0.96}{\sqrt{20}}\right) = 3.8 \pm 0.45$$

$$3.35 \le \mu \le 4.25$$

In this case, we can be 95% confident that the mean length of stay is between 3.35 days and 4.25 days.

[iii] Source: 2009 Health Statistical Yearbook

CHECK YOUR UNDERSTANDING

REVIEW PROBLEMS

7.1 Find a confidence interval for $\mu$, assuming that each sample is taken from a normal population.

a) $\bar{X} = 14$, $\sigma = 4$, $n = 5$, 95% confidence level.

b) $\bar{X} = 22$, $\sigma = 6$, $n = 12$, 90% confidence level.

c) $\bar{X} = 56$, $\sigma = 9$, $n = 22$, 99% confidence level.

7.2 Find the t-values for the following $\alpha$ levels and degrees of freedom:

$t_{0.10,\,29}$, $t_{0.025,\,13}$, $t_{0.05,\,18}$, $t_{0.90,\,20}$, $t_{0.95,\,25}$, $t_{0.10,\,40}$

7.3 Suppose that a German car manufacturer wants to estimate the average km/liter highway rating for a new model. Suppose that a previous study indicated that the standard deviation for similar models is 2.5 km/liter. A random sample of 81 highway runs yields a sample mean of 15.34 km/liter. Find a 95% confidence interval for the population mean km/liter highway rating.

7.4 A gas station owner in Abu Dhabi would like to estimate the mean number of liters of gasoline sold to his customers. From past records, he selects a random sample of 70 sales and finds a mean of 36 liters and a standard deviation of 8 liters. Find a 99% confidence interval for the population mean. What did you assume about the population?

7.5 Alexandria Travel Service owns 500 rental cars. The manager is interested in estimating the mean number of kilometers for which cars are used during weekends. She selects a random sample of 35 cars for a particular weekend. The survey indicates an average of 128 km and a standard deviation of 31.5 km. Construct a 90% confidence interval for the population mean.

7.6 A survey of 26 Indian executives reveals that they spend an average of 52 hours in the office. The standard deviation is 4.3 hours. Assume that the sample is selected from a normal population. Find a 95% confidence interval for the population mean. Can we assume that the mean of the population is 54 hours?

7.7 A sample of six hotels shows that the average rate of occupancy of all types of hotels is 53.6% with a standard deviation of 7.28%. Assume the data come from a normal distribution; what is the 95% confidence interval for the population mean rate of occupancy?
Source: Oman Ministry of Tourism


REVIEW PROBLEMS (continued)

7.8 Forty-nine items are randomly selected from a population of 400 items. The sample mean is 38 and the sample standard deviation is 8. Find a 90% confidence interval for the population mean.

7.9 The following data represent a sample of the monthly average amount of spending ($) per tourist in 2010 in Jordan.

3,261.84 1,730.98 5,557.91 1,374.25

2,445.00 1,794.79 4,318.81 2,144.99

1,743.35 3,247.81 2,492.62 2,553.72

Sources: Jordan Ministry of Tourism and Central Bank

Assume a normal distribution; construct a 90% confidence interval for the population mean of monthly spending.

7.10 Habib Cortas selects a sample of 18 2-liter bottles of orange juice to check whether the filling machine is operating correctly. The sample, taken from a normal population, gives a mean of 1.96 liters and a standard deviation of 0.05. Construct a 98% confidence interval for the mean content of a 2-liter bottle. Is the filling machine out of adjustment?

CONFIDENCE INTERVAL FOR A PROPORTION

Methods similar to those presented in the previous sections can be applied to estimate the population proportion. In Chapter 5, we mentioned that for a binomial random variable, approximation by a normal distribution works well and, in general, offers a good estimate when both $n\pi$ and $n(1 - \pi)$ are greater than 5. In Chapter 6, we also indicated that the central limit theorem applies to sample proportions provided that $n\pi > 5$ and $n(1 - \pi) > 5$. The mean of the sample proportions over all samples of size n randomly selected from a population is $\pi$ (the population proportion), and the standard deviation of the sample proportion is (Formula 6.9 on p. 269)

$$\sqrt{\frac{\pi(1 - \pi)}{n}}$$

Probabilities concerning sample proportions are computed using Formula 6.6 on p. 269, namely

$$Z = \frac{\hat{p} - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$

where
$\hat{p}$ = sample proportion = x/n
$\pi$ = population proportion
x = number of successes
n = sample size

In Formula 6.6, we are trying to estimate $\pi$. However, the standard deviation of the sample proportion $\sqrt{\pi(1 - \pi)/n}$ requires that $\pi$ be known. To overcome this problem, we must estimate the standard deviation of the sample proportion by substituting the sample proportion $\hat{p}$ for the population proportion $\pi$, but this is valid only for large samples.

A $(1 - \alpha)100\%$ confidence interval for the population proportion $\pi$ is an interval bounded by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \qquad (7.6)$$

where the sample proportion is $\hat{p} = x/n$ (with x being the number of successes in a sample of n trials).


If the sample is selected from a finite population and if the sample size is greater than 5% of the population, we incorporate the finite correction factor

$$\sqrt{\frac{N - n}{N - 1}}$$

into Formula 7.6 as we did in Formula 7.3.

We illustrate the use of Formula 7.6 in the following example.

EXAMPLE 7.7

Imagine that a market research firm based in Dubai wants to estimate the share that local companies have in the Gulf market for food products. If a study of 110 randomly selected consumers revealed that 40 of them buy local products, find a 95% confidence interval for the share of local products in the market.

Solution

We have $n = 110$, $x = 40$. The sample proportion estimate is

$$\hat{p} = 40/110 = 0.364$$

A 95% confidence interval for $\pi$ has the limits

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.364 \pm 1.96\sqrt{\frac{0.364(1 - 0.364)}{110}} = 0.364 \pm 1.96(0.0459) = 0.364 \pm 0.09$$

The firm can be 95% confident that local products represent anywhere from 27.4% to 45.4% of the market.
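The proportion interval of Example 7.7 can be sketched as follows, using only the Python standard library. The function name `proportion_interval` is hypothetical. As an assumption of this sketch, the large-sample condition is checked with the sample proportion in place of the unknown population proportion.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Large-sample confidence interval for a proportion (Formula 7.6)."""
    p_hat = x / n
    # Rule of thumb for the normal approximation, applied here to p_hat
    # because the population proportion is unknown (an assumption of this sketch).
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 7.7: 40 of 110 consumers buy local products, 95% confidence.
lower, upper = proportion_interval(40, 110, 0.95)
print(f"{lower:.1%} to {upper:.1%}")
```

The same helper can be used to explore how a larger sample with the same proportion narrows the interval.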

EXAMPLE 7.8

A sample shows that the Lebanese Central Bank has increased the interest rate 11 times from January to February and decreased it 17 times. The data concern the period between January 1982 and December 2010.[iv] Find a 90% confidence interval for the proportion of interest rate increases.

Solution

We have $n = 28$, $x = 11$. The sample proportion estimate is

$$\hat{p} = 11/28 = 0.393$$

A 90% confidence interval for $\pi$ is bounded by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.393 \pm 1.645\sqrt{\frac{0.393(1 - 0.393)}{28}} = 0.393 \pm 1.645(0.092) = 0.393 \pm 0.15$$

We are 90% confident that interest rate increases represent anywhere from 24.3% to 54.3% of the changes.

[iv] Source: Banque du Liban.


EXAMPLE 7.9

Imagine that an airline asked all its passengers boarding a flight out of Egypt to fill in a questionnaire about the company. The number of questionnaires completed was 796. Hafez, a marketing manager, selected a sample of 45 questionnaires and recorded the answers to the question about the quality of food served on board. Nine passengers responded that the food was excellent. Construct a 98% confidence interval for the proportion of passengers that found the food excellent.

Solution

We have $N = 796$, $n = 45$, $x = 9$. The sample proportion estimate is

$$\hat{p} = 9/45 = 0.2$$

Since $n/N = 45/796 = 0.057 > 0.05$, we must use the correction factor to get the 98% confidence interval for $\pi$:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\sqrt{\frac{N - n}{N - 1}} = 0.2 \pm 2.33\sqrt{\frac{0.2(1 - 0.2)}{45}}\sqrt{\frac{796 - 45}{796 - 1}} = 0.2 \pm 0.135$$

Hafez can be 98% confident that the proportion of passengers that found the food served on board excellent lies in the range from 6.5% to 33.5%.
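A sketch of the finite-population version used in Example 7.9 follows. It is written in Python with the standard library only; the function name `proportion_interval_finite` is hypothetical, and the cut-off value 2.33 is passed in as quoted in the solution.

```python
from math import sqrt


def proportion_interval_finite(x, n, N, z):
    """Proportion interval with the finite correction factor when n/N > 5%."""
    p_hat = x / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    if n / N > 0.05:
        # The sample is more than 5% of the population: apply the correction.
        standard_error *= sqrt((N - n) / (N - 1))
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Example 7.9: 9 of 45 sampled questionnaires out of 796, z = 2.33 for 98%.
lower, upper = proportion_interval_finite(9, 45, 796, 2.33)
print(f"{lower:.1%} to {upper:.1%}")
```

When the sample is 5% of the population or less, the function leaves the standard error unchanged, which mirrors the rule stated in the text.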

TECHNOLOGY

CONFIDENCE INTERVALS FOR PROPORTIONS

TEMPLATE 7.2

The sample size and sample proportion are entered in cells C4 and C5 respectively. The confidence intervals are given in cells C9:D12 for different confidence levels. This template has provision for the finite correction factor too.


CONFIDENCE INTERVAL FOR THE VARIANCE

In this section, we explain how to obtain a confidence interval for the variance and the standard deviation. For example, when items that fit together, such as pipes, are manufactured, it is essential to keep the variations in the diameters as small as possible; otherwise the items will not fit correctly and will be scrapped. In Chapter 2, we used Formula 2.12 to compute the sample variance, namely

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

where the sum of squared deviations from the mean, $\sum (X - \bar{X})^2$, is divided by (n − 1) rather than by n. The reason concerns the degrees of freedom for the deviations.

In the previous section, the t random variable offered us a method of constructing confidence intervals for the mean of a normal population when the standard deviation is unknown and is replaced by its estimate $S$. Another such continuous distribution that allows us to find the confidence interval of the variance and standard deviation of a normal distribution is the chi-square distribution, $\chi^2$ (pronounced 'kai square').

CHI-SQUARE DISTRIBUTION

A skewed continuous distribution whose shape depends on the number of degrees of freedom.

The chi-square distribution with various degrees of freedom is displayed in Figure 7.6. There is a theorem in statistics, the proof of which is beyond this text, that states that if $S^2$ is the variance of a random sample of size n drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, then

$$\frac{(n - 1)S^2}{\sigma^2}$$

has a chi-square distribution with (n − 1) degrees of freedom. The chi-square distribution's shape varies according to the number of degrees of freedom, as illustrated in Figure 7.6.

FIGURE 7.6 Chi-Square Distributions (curves for df = 1, df = 5, and df = 15; horizontal axis: $\chi^2$)


USING THE $\chi^2$ TABLE

Appendix E gives values of the chi-square statistic $\chi^2$ with different degrees of freedom, for given tail probabilities. A truncated version of the table is shown in Table 7.2 for different tail values. The table provides $\chi^2$ for different right tails (probabilities) and for given degrees of freedom (df).

TABLE 7.2 A portion of the $\chi^2$ distribution

      Right-tail area
df    0.995     0.99     0.975    0.95     0.9      0.1      0.05     0.025    0.01     0.005
 1    0.00005   0.0002   0.001    0.0039   0.0158   2.7055   3.8415   5.0239   6.6349   7.8794
 2    0.0100    0.0201   0.0506   0.1026   0.2107   4.6052   5.9915   7.3778   9.2104   10.597
 3    0.0717    0.1148   0.2158   0.3518   0.5844   6.2514   7.8147   9.3484   11.345   12.838
 4    0.207     0.2971   0.4844   0.7107   1.0636   7.7794   9.4877   11.143   13.277   14.86
 5    0.4118    0.5543   0.8312   1.1455   1.6103   9.2363   11.07    12.832   15.086   16.75
 6    0.6757    0.8721   1.2373   1.6354   2.2041   10.645   12.592   14.449   16.812   18.548
 7    0.9893    1.239    1.6899   2.1673   2.8331   12.017   14.067   16.013   18.475   20.278
 8    1.3444    1.6465   2.1797   2.7326   3.4895   13.362   15.507   17.535   20.09    21.955
 9    1.7349    2.0879   2.7004   3.3251   4.1682   14.684   16.919   19.023   21.666   23.589
10    2.1558    2.5582   3.247    3.9403   4.8652   15.987   18.307   20.483   23.209   25.188

EXAMPLE 7.10

Using the chi-square distribution with 10 degrees of freedom, find $P(\chi^2 > 18.307)$ and $P(\chi^2 < 3.247)$.

Solution

Table 7.2 shows the right-tail area (probability).

$$P(\chi^2 > 18.307) = 0.05$$

This probability is shown in Figure 7.7. This can be written as

$$\chi^2_{0.05,\,10} = 18.307$$

FIGURE 7.7 $P(\chi^2 > 18.307) = 0.05$ (df = 10; right-tail area $\alpha = 0.05$ beyond 18.307)

For $\chi^2 = 3.247$, Table 7.2 tells us that the area to the right of 3.247 is 0.975. Because the total area is 1, the area to the left of 3.247 is 1 − 0.975 = 0.025 and so $P(\chi^2 < 3.247) = 0.025$. This probability is shown in Figure 7.8.

FIGURE 7.8 $P(\chi^2 < 3.247)$ (df = 10; left-tail area $\alpha = 0.025$ below 3.247, area 0.975 to its right)

Next, we will use the chi-square distribution to construct confidence intervals for variances and standard deviations.

CONFIDENCE INTERVALS WITH THE $\chi^2$ DISTRIBUTION

To derive a confidence interval for $\sigma^2$, we use the fact that $(n - 1)S^2/\sigma^2$ follows a chi-square distribution with (n − 1) degrees of freedom; therefore, the sampling distribution for $S^2$ can be defined using the chi-square distribution:

$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2} \qquad (7.7)$$

We can also write

$$\sigma^2 = \frac{(n - 1)S^2}{\chi^2}$$

A $(1 - \alpha)100\%$ confidence interval for the population variance $\sigma^2$ (where the population is normally distributed) is

$$\frac{(n - 1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}} \qquad (7.8)$$

where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with (n − 1) degrees of freedom such that the area under the curve to its right is $\alpha/2$, and $\chi^2_{1-\alpha/2}$ is the value of the chi-square distribution with (n − 1) degrees of freedom such that the area under the curve to its left is $\alpha/2$ (or the area to its right is $1 - \alpha/2$).
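The step from the chi-square random variable to an interval for the variance can be written out explicitly. In the sketch below, $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ denote the chi-square values with (n − 1) degrees of freedom that have areas $\alpha/2$ and $1 - \alpha/2$ to their right, so the random variable $(n-1)S^2/\sigma^2$ falls between them with probability $1 - \alpha$:

```latex
\begin{align*}
P\left(\chi^2_{1-\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) &= 1-\alpha \\
P\left(\frac{1}{\chi^2_{\alpha/2}} \le \frac{\sigma^2}{(n-1)S^2} \le \frac{1}{\chi^2_{1-\alpha/2}}\right) &= 1-\alpha \\
P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right) &= 1-\alpha
\end{align*}
```

Taking reciprocals in the second line reverses the inequalities, which is why the larger chi-square value ends up in the lower limit of the interval.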


EXAMPLE 7.11Flour comes in 50-kg bags. Suppose the packaging unit of Union Mills Co., based in Aleppo, Syria, is concerned about the variation in weight of 50-kg bags since it acquired new packaging equipment. Suppose that a random sample of 15 bags yields the following weights:

51.2 47.5 50.8 51.5 49.5 51.1 51.3 50.7 46.7 49.2 52.1 48.3 51.6 49.2 51.5

Find a 90% confidence interval for �2 and for �. Assume that the bag weights are normally distributed.

A previous study made by a Turkish company shows that a 95% confi-dence interval for their population standard deviation is [1.35, 2.78]. How do you compare the two companies?

Solution

The mean and standard deviation for these data are

X = 50.15, S = 1.65

The corresponding 90% confidence interval for �2 is

(15 - 1)(1.65)2

�0.052 … �2 …

(15 - 1)(1.65)2

�0.952

(14)(1.65)2

23.685 … �2 …

(14)(1.65)2

6.571

1.61 … �2 … 5.80

The corresponding 90% confidence interval for � is

1.27 … � … 2.41

The packaging unit can estimate with 90% confidence that the population stan-dard deviation of the weight of flour bags is between 1.27 and 2.41 kg.

We cannot compare the two companies’ results because the confidence levels are different: the first is 90% and the second is 95%. To make comparisons, we must use a common confidence level. The 95% confidence level for Union Mills Co. is [1.21, 2.60], which is smaller than the Turkish company’s result These results suggest that Union Mills Co. is performing bet-ter than the Turkish company.

A higher confidence level results in a wider confidence interval.

Caution: The confidence intervals for the variance and standard deviation require that the population be normally distributed. We cannot use the central limit theorem for \(S^2\). The confidence intervals will not be accurate if the population is not normal.


TECHNOLOGY

CONFIDENCE INTERVALS FOR VARIANCES

The sample size and sample variance are entered in cells C4 and C5 respectively. Confidence intervals are shown in cells C9:D12 for different confidence levels. This template has provision for calculating confidence intervals for standard deviations; see cells H9:I12.

CHECK YOUR UNDERSTANDING

REVIEW PROBLEMS

7.11 Use the chi-square table to determine the following values:

a) \(\chi^2_{0.1,\,10}\)

b) \(\chi^2_{0.025,\,30}\)

c) \(\chi^2_{0.95,\,15}\)

d) \(\chi^2_{0.01,\,26}\)

7.12 For each of the following cases, find \(\hat{p}\) and construct a confidence interval.

a) n = 60, x = 35, confidence level 90%

b) n = 150, x = 85, confidence level 95%

c) n = 120, x = 72, confidence level 98%

d) n = 90, x = 42, confidence level 99%

7.13 For each of the following situations, check if the sample size is large enough to use the normal distribution to make a confidence interval for \(\pi\):

a) n = 50, \(\hat{p} = 0.35\)

b) n = 150, \(\hat{p} = 0.06\)

c) n = 350, \(\hat{p} = 0.45\)

d) n = 70, \(\hat{p} = 0.08\)

7.14 Suppose Al-Baghdadia TV has recently introduced a new television series for its afternoon programming. The program manager wants to know how the audience likes the series. She randomly selects 100 people who watched this series during the previous week. Of the sample, 71 people liked the series. Construct a 95% confidence interval for the proportion of all people who like this series.

7.15 A sample of 20 observations selected from a normal distribution yields a sample variance of 33. Construct a confidence interval for \(\sigma^2\) for each of the confidence levels in a) to c):

a) 99%

b) 95%

c) 90%

d) What happens to the confidence interval of \(\sigma^2\) as the confidence level decreases?

[TEMPLATE 7.3: Confidence intervals for variances]

7.16 A nationwide study claimed that 21% of the 13- to 15-year-old age group in Jordan are smokers. A sample of 120 teenagers selected from a large school revealed that 19% were smokers. Find a 95% confidence interval for the proportion and compare this with the results of the study.

Source: Jordan Times

7.17 Faisal, the quality engineer of a manufacturer of light bulbs, periodically takes a random sample of 25 light bulbs for testing the lifetime. A sample yields a variance of 4,600 hours. Assume that the life of light bulbs is normally distributed. Construct a 95% confidence interval for the variance of lifetimes of all light bulbs of this manufacturer.

7.18 Professor Mazbout's one-hour lectures vary in length. A sample of 20 of these lectures yields a standard deviation of 2.5 minutes. Assume that the lengths of Professor Mazbout's lectures are normally distributed. Construct and interpret a 98% confidence interval for the population variance and standard deviation of the lengths of all one-hour lectures by Professor Mazbout.

7.19 The sugar contents (in grams) of a random sample of 25 cl containers of orange juice are the following.

3.6 4.5 6.2 4.3 6 5.3 4.6 5.7 3.9 3.7

7.1 4.7 5.8 3.9 5.7 6.6 4.5 5.1 5.3 4.9

Construct a 99% confidence interval for the population standard deviation. Assume a normal distribution of sugar content.

ESTIMATION OF THE SAMPLE SIZE

A large sample generally provides a better representation of the population than does a smaller one. But acquiring a large sample can be costly and time consuming: why obtain a sample of size n = 500 if a sample of size n = 200 will provide sufficient accuracy in estimating a population parameter such as the mean, proportion, or variance? This section demonstrates how to determine what sample size is necessary for estimating the different parameters.

SAMPLE SIZE FOR ESTIMATING \(\mu\) WHEN \(\sigma\) IS KNOWN

When we want to estimate the mean of a population, the sample size can be determined by using the Z-formula for sample means to solve for n:

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (6.4) \]

How close do we want our sample estimate \(\bar{X}\) to be to the unknown parameter \(\mu\)? The answer lies in the error of estimation that results from the sampling process.

ERROR OF ESTIMATION

The difference between the sample mean \(\bar{X}\) and the population mean \(\mu\).

Let us define \(E = |\bar{X} - \mu|\) as the error of estimation. Substituting E into Formula 6.4 yields

\[ Z = \frac{E}{\sigma/\sqrt{n}} \]

If we know the critical value \(z_{\alpha/2}\) for a given level of confidence, we can solve for the sample size:

\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \qquad (7.9) \]

EXAMPLE 7.12

Suppose that a Jeddah-based market research firm wants to determine the sample size required to estimate household spending on grocery products. The company specified that any estimate must be based on a 95% confidence level. Further, suppose that the margin of error must not exceed ±$8. Given these requirements, what sample size is needed if the standard deviation of spending is $35?

Solution

We use Formula 7.9 to estimate the sample size n. For this, we need \(z_{\alpha/2}\) for a confidence level of 95%; this corresponds to \(z_{0.025} = 1.96\). Also,

\[ E = 8, \qquad \sigma = 35 \]

Substituting these values into Formula 7.9, we obtain

\[ n = \left(\frac{1.96(35)}{8}\right)^2 = 73.5 \approx 74 \]

Thus, to meet the requirement, a sample of 74 households should be selected. The result is rounded up because n must be a whole number that is at least as large as the computed value.

The more we increase the confidence level, the larger the sample will need to be, as can be seen from Formula 7.9.

SAMPLE SIZE FOR ESTIMATING \(\mu\) WHEN \(\sigma\) IS UNKNOWN

Formula 7.9 assumes we know the population standard deviation \(\sigma\). Most of the time, the population standard deviation \(\sigma\) is unknown and must be estimated. There are different approaches to estimating the population standard deviation.

• We can select a pilot sample from the population of interest of a smaller size than the anticipated sample size. This pilot sample should provide us with an estimate of the population standard deviation. Then, the pilot sample standard deviation S is used in Formula 7.9 to obtain the sample size n.

• The second option is to use the fact that about 95% of the values of a normal population are located within two standard deviations of the mean (refer to Figure 7.2). In this case, the upper bound is estimated at \(\mu + 2\sigma\) and the lower bound at \(\mu - 2\sigma\). Thus, the range is \(4\sigma\). An estimate for \(\sigma\) is range/4. For example, if the maximum value is 35 and the minimum value is 15, an estimate for \(\sigma\) is (35 - 15)/4 = 5. Some statisticians use the fact that about 99.7% of values of a normal population fall within three standard deviations of the mean. In this case, the range is divided by 6 to obtain an estimate of \(\sigma\). For our example, we would have \(\sigma\) = (35 - 15)/6 = 3.33.

• Suppose we are dealing with a Poisson distribution, such as the number of arrivals at a bank; in this case we know that the standard deviation \(\sigma\) is \(\sqrt{\lambda}\), where \(\lambda\) is the mean arrival rate (Formula 4.17 on p. 197). If the arrival rate is 25 customers per hour, then an estimate for the standard deviation is \(\sigma = \sqrt{25} = 5\).
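Formula 7.9 and the approaches to estimating \(\sigma\) listed above can be sketched in Python as follows. This is a minimal illustration, not part of the book's templates: all function names are hypothetical, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, error, confidence):
    """Formula 7.9, rounded up to the next whole observation."""
    z = z_critical(confidence)
    return math.ceil((z * sigma / error) ** 2)


def sigma_from_range(minimum, maximum, divisor=4):
    """Range-based estimate of sigma: range/4 or range/6."""
    return (maximum - minimum) / divisor


def sigma_poisson(rate):
    """For Poisson counts the standard deviation is sqrt(lambda)."""
    return math.sqrt(rate)


# sigma = 35, E = 8, 95% confidence
print(sample_size_mean(sigma=35, error=8, confidence=0.95))
# Range-based estimates with minimum 15 and maximum 35
print(sigma_from_range(15, 35, divisor=4), sigma_from_range(15, 35, divisor=6))
# Poisson arrivals at 25 customers per hour
print(sigma_poisson(25))
```

Rounding up with `math.ceil` reflects the requirement that the error must not exceed E; rounding down would give a slightly larger error than specified.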

EXAMPLE 7.13

Ramzi, the owner of a Lebanese food company, wants to estimate the mean weight content of one-kilogram tomato cans that are filled by a machine, with 95% confidence and an error of 8 grams. Assume that a pilot sample of 30 cans shows a sample standard deviation of 36 grams. What should the sample size be to estimate the mean weight content?

Solution

For a 95% confidence level we set z = 1.96. We use S = 36 in place of \(\sigma\) and set the desired error E to 8 to obtain the required sample size:

\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left[\frac{(1.96)(36)}{8}\right]^2 = 77.8 \text{ or 78 cans.} \]

EXAMPLE 7.14

In Example 7.13, assume that the largest weight content is 1,085 g and the smallest is 890 g. Use a range of \(6\sigma\) to obtain a sample size with a 95% confidence level.

Solution

For a 95% confidence level, we set z = 1.96.

S = range/6 = (1,085 - 890)/6 = 32.5

E = 8

The required sample size is

\[ n = \left[\frac{(1.96)(32.5)}{8}\right]^2 = 63.4 \text{ or 64 cans} \]

Note that if we had used a range of \(4\sigma\), the estimated standard deviation would have been S = (1,085 - 890)/4 = 48.75 and the sample size 143. Therefore, a larger estimate of the standard deviation leads to a larger sample size.

Caution: E, the allowable error, must be expressed in the same units as \(\bar{X}\) or \(\sigma\).

SAMPLE SIZE WHEN ESTIMATING THE POPULATION PROPORTION

Suppose we want to estimate a population proportion with an error of estimation within ±E. What sample size should we require? To find the sample size needed, we start from the confidence interval in Formula 7.6.

The error of estimation is

\[ E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]

Solving for n, the sample size is

\[ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1 - \hat{p}) \qquad (7.10) \]

where \(\hat{p} = x/n\) is an estimate of the population proportion.

EXAMPLE 7.15

What sample size would be needed to estimate the true proportion of households in Muscat that own a plasma TV, with 90% confidence and an error of ±2%, when a previous sample gave a proportion of 0.25?

Solution

We set E = 0.02; a 90% confidence level corresponds to z = 1.645.

The sample proportion is \(\hat{p} = 0.25\). Applying Formula 7.10, we obtain

\[ n = \left(\frac{1.645}{0.02}\right)^2 0.25(1 - 0.25) = 1268.4 \]

The sample size should be 1,269.

Note: The unit used for E is not a percentage but a proportion: use 0.02 instead of 2%.
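Formula 7.10 can be sketched in Python in the same spirit. The function name `sample_size_proportion` is hypothetical, and the critical value is computed from the standard normal distribution, so intermediate values may differ slightly from those obtained with a rounded table value such as 1.645.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p_hat, error, confidence):
    """Formula 7.10, rounded up to the next whole observation.

    p_hat and error are proportions (0.02, not 2%).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / error) ** 2 * p_hat * (1 - p_hat))


# p_hat = 0.25, E = 0.02, 90% confidence
print(sample_size_proportion(p_hat=0.25, error=0.02, confidence=0.90))
```

Because the error enters the formula squared, halving E roughly quadruples the required sample size for the same \(\hat{p}\) and confidence level.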

TECHNOLOGY

SAMPLE SIZE DETERMINATION

This spreadsheet can be used in conjunction with Excel’s Goal Seek to find the required confidence level for a given sample size n and a specified error E. For example, suppose we want to know the confidence level for a sample of size 100. We call up Goal Seek and define the ‘Set cell’ as C9; the value of the sample size is entered in ‘To value’: 100. The ‘By changing cell’ is B9. Once this is done we obtain a confidence level of 98%.

Suppose we want to know the error that corresponds to a given confidence level (say 95%) and sample size (say 100). We again use Goal Seek. The ‘Set cell’ is C11 because it corresponds to the 95% confidence level. Next, we set ‘To value’ to 100. The ‘By changing cell’ corresponds to the error E: cell C6. The result is an error of 6.86.*

*You should enter the values for each example to see how Goal Seek works.

TEMPLATE 7.4
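What Goal Seek does numerically here can also be obtained in closed form by rearranging Formula 7.9. The sketch below is an illustration only: the function names are hypothetical, and it assumes the template holds the inputs of Example 7.12 (\(\sigma = 35\), E = 8), an assumption that is consistent with the two results quoted above but is not stated in the text.

```python
import math
from statistics import NormalDist


def confidence_for(n, sigma, error):
    """Confidence level implied by sample size n and error E.

    From Formula 7.9: z = E * sqrt(n) / sigma, confidence = 2 * Phi(z) - 1.
    """
    z = error * math.sqrt(n) / sigma
    return 2 * NormalDist().cdf(z) - 1


def error_for(n, sigma, confidence):
    """Error E implied by sample size n and a confidence level.

    From Formula 7.9: E = z * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


# Assumed template inputs: sigma = 35, E = 8 (as in Example 7.12).
print(round(confidence_for(n=100, sigma=35, error=8), 2))
print(round(error_for(n=100, sigma=35, confidence=0.95), 2))
```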

CHECK YOUR UNDERSTANDING

REVIEW PROBLEMS

7.20 Find the sample size necessary to estimate \(\mu\) when

a) \(\sigma = 25\) and E = 6 at a 95% confidence level,

b) \(\sigma = 3.8\) and E = 1.6 at a 90% confidence level.

7.21 Find the sample size necessary to estimate \(\pi\) when

a) \(\hat{p} = 0.35\) and E = 0.04 at a 98% confidence level,

b) \(\hat{p} = 0.72\) and E = 0.03 at a 90% confidence level.

7.22 You want to estimate the mean spending of households in Aden. A pilot sample of 20 families yields a standard deviation of $65. You want to be 98% confident and you want your estimate to be within $13. How many households should you interview?

7.23 Amin, a quality inspector, wants to estimate the percentage of defects from a production process. He wants to be 95% confident about the result, with an error margin of 0.03. A previous sample of 60 products yielded two defects. How large should the sample be?

7.24 Ghada, a registrar at the University of Basrah, wants to determine the size of the sample needed to estimate the proportion of students who register in the College of Business. Of last year’s sample of 48 students admitted to the university, 18 elected to study business. She wants to be 90% confident with an error of 0.1. Calculate the size of the sample for Ghada.

7.25 The mean arrival rate of customers at a certain bank last year was 25 customers per hour on Sunday mornings. How large a sample would be required to estimate this year’s mean arrival rate with a 90% confidence interval and an error of 2? Explain your assumption about \(\sigma\).

7.26 Fatima wants to invest in a school. She wants to be 95% confident about the estimate of the mean number of admissions. She decides to take a sample of seven schools that opened in the last three years for her pilot study. The data on admissions are as follows.

284 326 290 352 315 298 274

How large should the sample be if she wants the error to be no more than 12 admissions?

7.27 Consider Problem 7.24. What could be the maximum error Ghada makes if she wanted to take a sample of 52 students at the same confidence level (90%)?

7.28 Consider Problem 7.22. What is the maximum error you can make if you take a sample of only 70 at a confidence level of 90%?

CHAPTER SUMMARY

• An estimator is a statistic obtained from a sample to infer the value of a population parameter. A good estimator has the following characteristics.

■ Unbiasedness: its expected value is equal to the value of the population parameter.

■ Efficiency: it has a small variance.

■ Consistency: its probability of being close to the population parameter increases as n increases.

■ Sufficiency: it utilizes all the information contained in a sample.

• We developed confidence intervals for three parameters: \(\mu\), \(\pi\) and \(\sigma\). When the sample size is at least 30, we applied the normal approximation (central limit theorem) to get the confidence interval for \(\mu\) when the distribution is not normal (Formulas 7.1 and 7.2).

• If the sample size is less than 30 and the sample is taken from a normal distribution with unknown variance, we used the t-distribution (with n - 1 degrees of freedom) to obtain the confidence interval for the parameter \(\mu\) (Formula 7.5).

• If the sample is large, we use the central limit theorem to get a confidence interval for the proportion (Formula 7.6). If the sample is taken from a finite population and the sample size is greater than 5% of the population, we incorporate the finite correction factor \(\sqrt{\frac{N - n}{N - 1}}\) into Formula 7.6 as we did in Formula 7.3.

• To construct a confidence interval for variances, we use the chi-square distribution (with n - 1 degrees of freedom) if the sample comes from a normal distribution (Formula 7.8).

• The minimum sample size n is computed based on a given confidence level (percentage of time that the confidence interval contains the parameter) and the error of estimation E, defined as \(|\bar{X} - \mu|\) (Formula 7.9) or \(|\hat{p} - \pi|\) (Formula 7.10).

KEY TERMS

Chi-square distribution 302
Confidence interval 286
Confidence level 286
Consistency 285
Degrees of freedom 291
Efficiency 285
Error of estimation 307
Estimation 284
Estimate 284
Point estimate 284
Sufficiency 286
t-distribution 291
Unbiasedness 285

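KEY FORMULAS

Confidence interval for \(\mu\) when \(\sigma\) is known has the limits

\[ \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad (7.1) \]

Confidence interval for \(\mu\) when \(\sigma\) is unknown and the sample size is large (\(n \ge 30\)) has the limits

\[ \bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}} \qquad (7.2) \]

Confidence interval for \(\mu\) when \(\sigma\) is known and the sample is taken from a finite population has the limits

\[ \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} \qquad (7.3) \]

t-statistic

\[ t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \qquad (7.4) \]

Confidence interval for \(\mu\) when \(\sigma\) is unknown and the sample size is small (\(n < 30\)) has the limits

\[ \bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \qquad (7.5) \]

Confidence interval for the population proportion \(\pi\) has the limits

\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \qquad (7.6) \]

Chi-square statistic

\[ \chi^2 = \frac{(n - 1)S^2}{\sigma^2} \qquad (7.7) \]

Confidence interval for the population variance \(\sigma^2\)

\[ \frac{(n - 1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}} \qquad (7.8) \]

Sample size when estimating \(\mu\)

\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \qquad (7.9) \]

Sample size when estimating \(\pi\)

\[ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1 - \hat{p}) \qquad (7.10) \]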


SOLVED PROBLEMS

PROBLEM A

To attract candidates to its MBA program, a university claims to be accepting 35% of candidates. During the previous year, of the 245 candidates who applied, 70 were accepted. Construct a 95% confidence interval for the proportion of acceptances to this program. What do you think of the university’s claim?

SOLUTION

\[ \hat{p} = x/n = 70/245 = 0.286 \]

We use Formula 7.6 to obtain the confidence interval for the population proportion. A 95% confidence interval for \(\pi\) is given by:

\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.286 \pm 1.96\sqrt{\frac{0.286(1 - 0.286)}{245}} = 0.286 \pm 1.96(0.029) \]

confidence interval = [0.2294, 0.3426]

The 35% acceptance claim is outside the confidence interval.
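The interval in Problem A can be reproduced with a short Python sketch of Formula 7.6. The function name `proportion_ci` is hypothetical; because the sketch keeps full precision for \(\hat{p}\) and z, its limits agree with the worked solution only to about three decimal places.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Large-sample confidence interval for a proportion (Formula 7.6)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(x=70, n=245, confidence=0.95)
print(round(low, 3), round(high, 3))
print("claim of 0.35 inside interval:", low <= 0.35 <= high)
```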

PROBLEM B

A Geant manager in Tunis wants to estimate the average amount of money that customers spend in the mall. Suppose that 144 customers are randomly selected and the sample results yield an average of $125 and a standard deviation of $29. Use the appropriate template to answer the following questions.

a) Construct a 90% confidence interval for the mean spending in the mall.

b) If the manager wanted to reduce the margin of error in part a), what options exist to do so?

SOLUTION

We use Template 7.1B.

a) The 90% confidence interval is [121, 129].

b) To reduce the margin of error, the manager can reduce the confidence level to, say, 80%, and this will narrow the confidence interval to [121.9, 128.1]; or the manager can increase the sample size to, say, 200, which would shrink the confidence interval to [121.6, 128.4], assuming that the sample mean and standard deviation remain the same.
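For readers without the template, the three intervals in Problem B can be reproduced with a minimal Python sketch of Formula 7.2. The function name `mean_ci` is hypothetical and is not meant to describe how Template 7.1B is built.

```python
import math
from statistics import NormalDist


def mean_ci(mean, s, n, confidence):
    """Large-sample confidence interval for the mean (Formula 7.2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin


base = mean_ci(mean=125, s=29, n=144, confidence=0.90)
lower_level = mean_ci(mean=125, s=29, n=144, confidence=0.80)
larger_sample = mean_ci(mean=125, s=29, n=200, confidence=0.90)

for label, (low, high) in [("90%, n=144", base),
                           ("80%, n=144", lower_level),
                           ("90%, n=200", larger_sample)]:
    print(label, round(low, 1), round(high, 1))
```

Both options shrink the margin \(z_{\alpha/2} S/\sqrt{n}\): lowering the confidence level reduces z, while a larger sample increases \(\sqrt{n}\).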

PROBLEM C

In the automotive industry, an ‘early car replacement’ is defined as replacing a car within the first three years of its life and a ‘late car replacement’ as replacing a car after at least seven years. A Gulf consumer agency surveyed 250 early replacement buyers and obtained an average of 2.7 years and a standard deviation of 0.55 years. Another sample of 220 late replacement buyers yielded a mean of 8.3 years and a standard deviation of 1.2 years.

a) Compute a 95% confidence interval for early car replacements.

b) Compute a 90% confidence interval for late car replacements.

c) How large a sample of early car replacement buyers is required to be 90% confident that the sample mean \(\bar{X}\) is within 0.07 of \(\mu\)?

d) How large a sample of late car replacement buyers is required to be 95% confident that the sample mean \(\bar{X}\) is within 0.1 of \(\mu\)?

SOLUTION

a) A 95% confidence interval for early car replacements is given by

\[ P(\bar{X} - 1.96\,S/\sqrt{n} \le \mu \le \bar{X} + 1.96\,S/\sqrt{n}) = 0.95 \]

\[ P(2.7 - 1.96 \times 0.55/\sqrt{250} \le \mu \le 2.7 + 1.96 \times 0.55/\sqrt{250}) = 0.95 \]

confidence interval = [2.63, 2.77]

b) A 90% confidence interval for late car replacements is given by

\[ P(\bar{X} - 1.64\,S/\sqrt{n} \le \mu \le \bar{X} + 1.64\,S/\sqrt{n}) = 0.90 \]

\[ P(8.3 - 1.64 \times 1.2/\sqrt{220} \le \mu \le 8.3 + 1.64 \times 1.2/\sqrt{220}) = 0.90 \]

confidence interval = [8.17, 8.43]

c) \( n = \left(\frac{1.64(0.55)}{0.07}\right)^2 = 166.04 \), so the sample size should be at least 167.

d) \( n = \left(\frac{1.96(1.2)}{0.1}\right)^2 = 553.2 \), so the sample size should be at least 554.

CHAPTER REVIEW PROBLEMS

7.29 Salalah College of Business wants to install a photocopy machine for staff. From experience at other colleges, the dean believes the number of documents is normally distributed with a daily standard deviation of 44 copies. The machine is tested for five days and the resulting daily mean is 345 copies.

a) Give a 99% confidence interval for the mean number of pages copied per day.

b) Suppose the dean will install the copier if she can be confident that the daily average number of copies will exceed 290. Does the result of a) justify purchasing a copier? Explain.

7.30 Professor Bin Tifor gave three tests last week in a large class. The standard deviation was \(\sigma = 6\) for all three tests and the scores were normally distributed. Below are 10 randomly selected scores on each test. Find a 95% confidence interval for the mean score on each exam. Do the confidence intervals overlap? If so, what does this suggest?

Test 1: 76, 69, 78, 71, 80, 72, 76, 82, 76, 70

Test 2: 73, 94, 85, 83, 72, 89, 80, 77, 66, 71

Test 3: 65, 64, 69, 67, 72, 64, 59, 56, 70, 64

7.31 Suppose that the finance department of Orascom group, a large Egyptian corporation with several branches, conducted a survey to determine the mean travel spending of its salespeople. If a sample of 64 travel expenses yields a weekly average of $256 and a standard deviation of $82:

a) What is the estimate of the population mean?

b) Determine a 95% confidence interval for \(\mu\). Explain what it indicates.

c) Assume the finance department selects another sample of 81 travel expenses. If the sample mean and sample standard deviation remain the same, what is the 95% confidence interval for the population mean? Explain why this confidence interval is narrower.

7.32 A sample of observations selected from a normal distribution yields a sample variance of 55. Construct a 95% confidence interval for \(\sigma^2\) in each of the cases a) to c):

a) n = 12,

b) n = 16,

c) n = 25.

d) What happens to the confidence interval of \(\sigma^2\) as the sample size increases?

7.33 The new manager of the Al Buhaira Bowling Club would like to know how long current members have been members of the club. He selects a sample of 45 current members. The mean length of membership of the sample is 6.38 years and the sample standard deviation is 1.85 years.

a) What is the point estimate of the population mean?

b) Construct a 90% confidence interval for the population mean.

c) The former manager reported a mean of about 7.5 years. Does the sample information support this claim? Explain.

7.34 Suppose there are 2,500 students who are eligible to vote for the students’ council at the University of Sharjah College of Business. A sample of 150 students revealed that 92 planned to vote for the current president of the students’ council. Construct a 99% confidence interval for the proportion of eligible voters who plan to vote for the current president. From this sample information, can you confirm that the current president will be re-elected?

7.35 Hessa is interested in estimating the average purchase amount at convenience stores in the city of Doha. She selected a random sample of 36 purchases from several convenience stores. Use the following data to construct and interpret a 90% confidence interval for the population mean purchase amount.

6 33 24 21 15 12 7 3 42 21 18 12

13 4 9 17 27 11 9 16 30 24 14 19

15 8 19 15 23 31 7 26 14 35 5 28

7.36 Leila wants to estimate the average time taken to travel to work in the city of Cairo. Using a confidence level of 95%, what kind of confidence intervals, based on the following random sample of commuters, can she construct?

24 29 15 19 36 29 44 28 19 47 51 37 35

20 54 12 17 33 39 18 36 49 55 24 20 17

8 21 36 29 11 65 39 28 17 12 25 35 50

7.37 Suppose Azur Airlines wants to estimate the proportion of business people traveling from Paris to Beirut, a new route. A sample of 196 passengers revealed that 128 were on a business trip. Construct a 90% confidence interval for the proportion of business travelers on this new route.

7.38 Assume that there are 2,320 students at an Islamic university. Currently, classrooms are segregated. To cut costs, the university management is considering offering nonsegregated classes at senior level. A survey of 340 students yields 124 students who favor no segregation in classrooms. Develop a 95% confidence interval for the proportion of students who favor no segregation. Management claims that the proportion of students who favor no segregation is at most 25%. What do you think of this claim?

7.39 A machine produces chips for alarms. A quality control inspector checks samples of the chips produced by the machine. If too many chips are defective, the production process is stopped to readjust the machine. If a random sample of 58 chips results in 5 defects, give a 98% confidence interval for the population proportion of defective chips made by this machine.

7.40 A hospital has just been informed by a major pharmaceutical group that its new drug may have some undesirable side effects, specifically an increase in heart rate. Twenty patients were selected who were prescribed the drug. All patients in the sample showed a heart rate of 58 prior to taking the medication. The following heart rates were recorded after taking the drug for a week.

52 72 62 72 92 78 74 54 82 87

57 70 74 83 44 56 68 80 60 74

a) Based on the sample data, construct a 90% confidence interval to estimate the mean heart rate, assuming a normal population.

b) Referring to your answer in a), can the estimate be applied to all potential patients taking the drug?

c) Referring to your computations in part a), if the average heart rate increased, determine the probability that a sample mean would be at least as large as the one obtained from the sample, assuming that the beginning mean rate is 62.

7.41 The bad debt ratio for a bank is defined as the ratio of the amount of loans defaulted on to the total amount of loans. Suppose that a random sample of nine banks in a certain city yields the following bad debt ratios expressed in percent.

4 3 5 4 2 5 4 3 5

a) Assuming that the bad debt ratios are normally distributed, determine a 95% confidence interval for the mean bad debt ratio.

b) The bank association claims that the average bad debt ratio for all banks of the country is 2.5 and that the mean bad debt ratio for this city is higher. Using a 95% confidence interval, can we be 95% confident that this claim is true? Using a 99% confidence interval, can we be 99% confident that this claim is true?

7.42 Musabah, a chemical engineer, would like to determine whether a new catalyst increases the output of a chemical process, which is currently 450 kg per day. To test this new catalyst, nine trials are made with the following results.

475 523 489 512 548 464 471 498 510

a) Assuming the output is normally distributed, determine a 95% confidence interval for the mean output obtained using this new catalyst.

b) Based on the confidence interval of part a), can we be 95% confident that the mean output obtained with the new catalyst exceeds 450?

c) Construct a 99% confidence interval for the population variance.

7.43 The following data are a sample of the number of liters of gasoline purchased at a gas station.

40 51 43 48 44 57 54 39 42 48 45 39 43

a) Assuming this sample is taken from a normal population, construct a 95% confidence interval to estimate the mean value of the population.

b) What is the point estimate for the mean?

c) Construct a 90% confidence interval for the population variance.

7.44 Suppose the marketing department of the Marrakech Chamber of Commerce wants to estimate the average number of customers who enter Semmarine Souq every 10 minutes. Aliya, a research assistant, is tasked with selecting 10-minute intervals and counting the number of arrivals at the souq during each interval. Assume that she obtains the following data.

68 42 51 57 66 90 55 39 42 88

a) The analyst assumes the number of arrivals is normally distributed. Compute a 90% confidence interval for the mean arrivals for all 10-minute intervals.

b) Construct a 95% confidence interval for the population variance.

7.45 During the last two weeks of Ramadan, a fashion department store in Lebanon ran a promotion campaign to boost its sales during the pre-Eid period. The results were impressive on the first few days. Arwa, the manager, wants to estimate the average amount customers spent during this two-week period. Assume that she randomly selects a sample of 24 bills, which yields the following customer spending:

321 546 449 540 125 987 519 350 764 467 582 762

420 328 557 865 408 326 547 910 373 502 842 586

a) Assume that the data are normally distributed. Construct a 98% confidence interval for the mean spending of all customers during this two-week period.

b) Construct a 98% confidence interval for the population variance.

7.46 Suppose that Cirta Engineering, a small water pump company, produced 2,000 water pumps in 1995. In an effort to promote the quality of its product, the company decided to conduct a multiyear study of its 1995 water pumps. A sample of 250 owners of these water pumps was selected randomly. The owners were asked to contact an 800 number when the first major repair was required for their water pump. After several years, 214 water pump owners had reported. The other 36 were disqualified because they no longer owned the 1995 water pump. The average number of years before the first major repair occurred was 6.2 years with a standard deviation of 1.53 years for the 214 owners who reported. If the company wants to advertise the average number of repair-free years of life expectancy for its water pumps, what is the point estimate? Construct a 90% confidence interval for the average number of years until the first major repair.

7.47 A sample of size 14 from a normally distributed population yields the following sample statistic:

\[ \sum (X - \bar{X})^2 = 135.8 \]

a) Construct a 95% confidence interval for the population variance.

b) Construct a 95% confidence interval for the population standard deviation.

7.48 A sample of size six from a normally distributed population yields the following sample statistics:

\[ \sum X^2 = 326 \quad \text{and} \quad \sum X = 42 \]

a) Find a 90% confidence interval for the variance.

b) Find a 90% confidence interval for the standard deviation.

7.49 Let \(\mu\) be the mean weekly wage for workers in a large construction company. A random sample of such workers (with n > 30) yielded a 95% confidence interval for \(\mu\) of $258 to $326 using a normal distribution with a known population standard deviation.

a) What is the sample mean \(\bar{X}\)?

b) Construct a 99% confidence interval for \(\mu\) based on this sample.

7.50 Hamed selected a first sample of 25 observations and obtained a 90% confidence interval of [148, 175]. Then, he selected another independent sample of 38 observations and obtained, at the same confidence level, a confidence interval of [156, 181]. What is the probability that neither interval includes the population mean?

7.51 Last semester, the minimum and the maximum times to complete an exam in finance were 39 and 62 minutes respectively.

a) Based on this information, how large a sample would be needed to estimate this semester’s mean time to complete the finance exam with 95% confidence and an error of 3 minutes? Explain your assumption about σ.

b) What would be your answer for a 99% confidence interval?
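When only the minimum and maximum are known, σ has to be approximated before the sample-size formula n = (zσ/E)² can be used. The sketch below assumes the common rule of thumb σ ≈ range/4; this is an assumption, and other choices (such as range/6) would give different answers. The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(sigma, error, level):
    """Smallest n such that z * sigma / sqrt(n) does not exceed error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / error) ** 2)


# Assumption: sigma is approximated by one quarter of the range.
sigma_guess = (62 - 39) / 4

n95 = sample_size(sigma_guess, 3, 0.95)
n99 = sample_size(sigma_guess, 3, 0.99)
print(f"sigma estimate = {sigma_guess}")
print(f"n for 95% confidence = {n95}")
print(f"n for 99% confidence = {n99}")
```

The result is always rounded up, since rounding down would leave the margin of error slightly above the target.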

7.52 Suppose you want to estimate the average age of all Toyota Corolla cars still on the road in your city. You want to be 95% confident and you want your estimate to be accurate to within 1.5 years. The Corolla was first sold in your country 35 years ago and you believe that there are no cars older than 24 years on the road. How large should your sample be?

7.53 A survey conducted among 900 two-child families revealed that 75% of them own family-size cars.

a) Use this information to determine a 90% confidence interval for the true proportion of two-child families who own family-size cars.

b) Compute the largest margin of error that could occur when estimating this proportion.
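For a proportion, the margin of error depends on the product of the sample proportion and its complement, which is largest when the proportion equals 0.5. The sketch below works through the figures of Problem 7.53 under the usual large-sample normal approximation. The function name `proportion_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(p_hat, n, level):
    """Large-sample (normal approximation) interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lo, hi = proportion_interval(0.75, 900, 0.90)

# The product p(1 - p) peaks at p = 0.5, which gives the widest
# possible margin of error for a sample of 900.
z90 = NormalDist().inv_cdf(0.95)
worst_margin = z90 * math.sqrt(0.5 * 0.5 / 900)

print(f"90% interval = [{lo:.4f}, {hi:.4f}]")
print(f"largest margin of error = {worst_margin:.4f}")
```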



7.54 The following data represent the daily price index changes of real estate companies listed on the Amman stock market between January 2, 2010 and March 22, 2011.

 –0.1   –0.9    6.8   15.0  –29.1   59.2  –18.5
 30.6   –1.3  –39.1  –41.2  –57.4  –28      8.4
–29.5    2.3   13.6   14.6  –22.2  –47.5  –45.8
  0.6   19.9  –60.5  –38.8  –26.4   26.6  –12.1
 12.5  –15.1  –42.1  –17.9    4.7   –7.0    8.5
–32.4  –45.6  –10.2  –16.6   14.5    3.1   31.6
–16.5   14.7   48.0  –41.8   42.5   –2.5   26.9
 10.1  –22.9   –2.2  –66.0  –17.3    5.2  –25.7

Source: www.ase.com.jo

Assume that the population is normally distributed. Use this information to determine a 90% confidence interval for the true mean price index change.
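With 56 observations and an unknown population standard deviation, the interval uses the t-distribution with 55 degrees of freedom. The sketch below shows how the calculation could be organized. The critical value 1.673 is an assumption taken from a standard t table (upper-tail area 0.05, 55 degrees of freedom); Python's standard library does not provide t quantiles.

```python
import math
import statistics

# Daily price index changes from Problem 7.54 (56 observations).
changes = [
    -0.1, -0.9, 6.8, 15.0, -29.1, 59.2, -18.5,
    30.6, -1.3, -39.1, -41.2, -57.4, -28, 8.4,
    -29.5, 2.3, 13.6, 14.6, -22.2, -47.5, -45.8,
    0.6, 19.9, -60.5, -38.8, -26.4, 26.6, -12.1,
    12.5, -15.1, -42.1, -17.9, 4.7, -7.0, 8.5,
    -32.4, -45.6, -10.2, -16.6, 14.5, 3.1, 31.6,
    -16.5, 14.7, 48.0, -41.8, 42.5, -2.5, 26.9,
    10.1, -22.9, -2.2, -66.0, -17.3, 5.2, -25.7,
]

# Assumption: t critical value for a 90% interval with 55 degrees
# of freedom, read from a standard t table.
T_CRIT = 1.673

n = len(changes)
mean = statistics.mean(changes)
s = statistics.stdev(changes)          # sample standard deviation (n - 1)
margin = T_CRIT * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"n = {n}, mean = {mean:.2f}, s = {s:.2f}")
print(f"90% interval = [{lower:.2f}, {upper:.2f}]")
```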

7.55 Albustan Safety supplies electronic devices for alarm systems in Bahrain. Imagine that, as part of the company’s quality control efforts, it wishes to estimate the mean number of days a particular electronic device is used before repair is needed. Suppose that a pilot sample of 40 electronic devices indicates a sample standard deviation of 200 days. The company wishes its estimates to have a margin of error of no more than 50 days and the confidence level must be 95%.

a) Given this information, how many additional devices should be sampled?

b) The pilot study was initiated because of the costs involved in sampling. Each sampled observation costs approximately $3.20 to obtain. Originally, it was thought that the population’s standard deviation might be as large as 300. Determine the amount of money saved by obtaining the pilot sample. (Hint: figure the total cost of obtaining the required sample for each method).
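The comparison in Problem 7.55 rests on applying n = (zσ/E)² twice, once with the pilot estimate of the standard deviation and once with the original guess. The sketch below is one way to lay out the arithmetic. It assumes the pilot standard deviation may be used in place of σ with a z value, and that the 40 pilot observations count toward the required total; the names `sample_size`, `COST_PER_OBSERVATION`, and `PILOT_SIZE` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(sigma, error, level):
    """Smallest n such that z * sigma / sqrt(n) does not exceed error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / error) ** 2)


COST_PER_OBSERVATION = 3.20
PILOT_SIZE = 40

# Required sample size using the pilot standard deviation (200 days).
n_pilot_sigma = sample_size(200, 50, 0.95)
additional = n_pilot_sigma - PILOT_SIZE   # pilot units count toward n

# Required sample size using the original guess (300 days).
n_original_sigma = sample_size(300, 50, 0.95)

savings = (n_original_sigma - n_pilot_sigma) * COST_PER_OBSERVATION

print(f"required with s = 200: {n_pilot_sigma} ({additional} additional)")
print(f"required with sigma = 300: {n_original_sigma}")
print(f"savings = ${savings:.2f}")
```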

7.56 Suppose that Naftal, an Algerian petroleum company, is considering building a gas station at a given intersection. The company would like to estimate the average number of cars that go past this location per hour in the afternoon. The company thinks that the number of cars passing this intersection per hour has a population standard deviation of 90 during the afternoon.

a) On how many randomly selected afternoons should the number of cars passing the inter-section be observed so that the company can be 95% confident that the estimate will be within 50 cars of the true average?

b) Suppose the company finds out that the population standard deviation of the number of cars passing the location per hour is not 90 but 143. If the company has already taken the sample of size calculated in part a), what confidence can the company have that the point estimate is within 50 cars of the true average?

c) If the company has already taken the sample of size calculated in part a) and later finds out that the population standard deviation of the number of cars passing the intersection per hour is actually 112, the company can be 95% confident that the point estimate is within how much of the true average?
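The three parts of Problem 7.56 all rearrange the same relationship, E = zσ/√n, solving in turn for n, for the confidence level, and for E. The sketch below shows the three rearrangements side by side; variable names are illustrative only.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
z95 = std_normal.inv_cdf(0.975)

# Part a): solve E = z * sigma / sqrt(n) for n, rounding up.
n = math.ceil((z95 * 90 / 50) ** 2)

# Part b): with n fixed and sigma = 143, solve for z, then convert
# z into a two-sided confidence level.
z_actual = 50 * math.sqrt(n) / 143
confidence = 2 * std_normal.cdf(z_actual) - 1

# Part c): with n fixed and sigma = 112, solve for the margin E
# at 95% confidence.
margin = z95 * 112 / math.sqrt(n)

print(f"a) n = {n}")
print(f"b) confidence = {confidence:.3f}")
print(f"c) margin = {margin:.1f} cars")
```

A larger standard deviation than planned therefore costs either confidence (part b) or precision (part c) once the sample size is fixed.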

MINIPROJECTS

MINIPROJECT 7.1

Consider the data file Profits of Top 500 Companies for 2006.

a) Randomly sample 20 companies and find the 95% confidence interval for μ, the mean profit. Assume that the population is normally distributed.

b) Repeat question a) for samples of 35 and 50 companies.

c) Compare the widths of your three confidence intervals.

d) Compute the mean of the population of all companies. Which confidence intervals contain the population mean μ?

MINIPROJECT 7.2

A manufacturer of mixed (roasted and salted) nuts claims that the proportion of cashews in a bag is 18%.

Any bag of mixed nuts contains six types of nuts: cashews, almonds, peanuts, pecans, Brazil nuts, and hazelnuts. To verify the claim, you buy a 2.5 kg bag with a couple of friends and perform the following experiment. You select 30 samples of 20 nuts with replacement and compute the number of cashews in each sample. (Note: avoid eating nuts before finishing sampling.) Use the sample proportion of all the samples to compute a 95% confidence interval for the proportion of cashews in all 2.5 kg bags of mixed nuts produced by this manufacturer.

a) What percentage of confidence intervals contain the population proportion 0.18?

b) Is this percentage close to the confidence level of 95%?

c) What happens if your sample size increases to 40 and then to 80?
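Before opening a real bag, the experiment can be rehearsed on a computer. The sketch below is a simulation under stated assumptions, not a substitute for the physical sampling: it assumes the manufacturer's claim of 0.18 is true, draws each nut independently, and uses the large-sample interval for a proportion. The function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Large-sample interval for a proportion from a count of successes."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, samples, rng):
    """Share of simulated intervals that contain true_p."""
    hits = 0
    for _ in range(samples):
        # Each draw is a cashew with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_interval(successes, n)
        hits += lo <= true_p <= hi
    return hits / samples


rng = random.Random(7)  # arbitrary seed so the run can be repeated
for n in (20, 40, 80):
    share = coverage(0.18, n, samples=30, rng=rng)
    print(f"sample size {n}: {share:.0%} of intervals contain 0.18")
```

With only 30 intervals per sample size, the observed percentage can differ noticeably from 95% by chance alone; a sample with no cashews at all yields an interval of zero width that cannot contain 0.18.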


MINIPROJECT 7.3

For this project, select a football league of your choice (in your country or a European one). Obtain data on the heights and ages of all players in a given year.

a) Take a random sample of 20 players and find a 95% confidence interval for μ, the population mean age. Assume that the ages of these players are normally distributed.

b) Redo part a) for samples of size 30 and 50 respectively.

c) Compare the widths of these three confidence intervals.

d) Compute the mean of the population of ages of all players. Which confidence intervals contain μ?

MINIPROJECT 7.4

Use the data you obtained for Miniproject 7.3. Take 15 random samples of 20 players.

a) Compute a 95% confidence interval for the proportion of players who scored five goals during the season for each sample.

b) Compute the population proportion π of all players who scored five goals during the season.

c) What percentage of confidence intervals contain the population proportion π computed in b)?

d) Is this percentage close to the confidence level of 95%?

MINIPROJECT 7.5

Use the data file Brazilian Football Players. Select the variables height and age of players.

a) Take a random sample of 20 players and find a 95% confidence interval for μ, the population mean age. Assume that the ages of these players are normally distributed.

b) Redo part a) for samples of size 30 and 50 respectively.

c) Compare the widths of these three confidence intervals.

d) Compute the mean of the population of ages of all players. Which confidence intervals contain μ?

MINIPROJECT 7.6

Use the data file Brazilian Football Players. Take 15 random samples of 20 players. Select the variable that indicates which foot a player prefers to use: left or right.

a) Compute a 95% confidence interval for the proportion of players who are left-footed for each sample.

b) Compute the population proportion π of all players who are left-footed.

c) What percentage of confidence intervals contain the population proportion π computed in b)?

d) Is this percentage close to the confidence level of 95%?

MINIPROJECT 7.7

For this project use either the ADCB Trading file or the Kuwait Stock Market file. Define a random variable as (Close Price – Open Price) for ADCB and (Close Market Index – Open Market Index) for the Kuwait Stock Market. The difference will be positive if an increase occurs, zero if there is no change, or negative if a decrease occurs. Take a sample (using random numbers) of 35 values of the random variable.

a) Find a 95% confidence interval for μ.

b) Find a 90% confidence interval for σ.

c) What is the number of observations that you must collect if you want to be 95% confident that your error of estimation does not exceed 0.02 for ADCB or 17 for the Kuwait Stock Market? What assumption did you make?

d) Define a new random variable: increase or decrease (disregard no change); let π be the proportion of increases. Find a 90% confidence interval for this proportion.

MINIPROJECT 7.8

For this project, you need to interview at least 50 customers doing their weekly shopping in a large supermarket. To conduct these interviews, form a team of three or four students. Prepare a questionnaire about weekly spending on groceries that contains a list of questions (no more than six, so that your interview will not take more than three minutes). Among the questions ask the following:

• Does this shopping correspond to your weekly budget for groceries?

• If yes, what is the amount of the budget?

• What is the size of the family you are shopping for?

• What is the family income? (Do not insist if the customer does not want to answer.)

The data collected may be used for future miniprojects; hence it is important that you execute this task well! Note that you may be required to get the approval of the supermarket manager.

Once you have your survey results, work out the following:

a) Are the data collected a population or a sample? How many observations did you collect?

b) Graph the data and discuss the shape of the distribution.

c) What is the mean?

d) What is the standard deviation?

e) Find a 90% and a 95% confidence interval for the average spending and state what assumption you are making.

f) Find a 90% and a 95% confidence interval for the variance and state what assumption you are making.

g) What is the number of observations that you must collect if you want to be 95% confident that your error of estimation does not exceed $15? What assumption did you make?


ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS

5.15 a) .0945. b) .7147. c) .4766.

5.17 a) .9525. b) .7967. c) .4595.

5.19 366.

5.21 .0436.

5.23 a) .6814. b) .1497. c) .4835.

CHAPTER 6

6.1 16 samples.

6.3 μ = 89.34, σ = 182.55.

6.5 a) μ_X̄ = 20, σ²_X̄ = 0.36, σ_X̄ = 0.6.

b) μ_X̄ = 70, σ²_X̄ = 0.64, σ_X̄ = 0.8.

c) μ_X̄ = 5, σ²_X̄ = 0.0225, σ_X̄ = 0.15.

d) μ_X̄ = 45, σ²_X̄ = 0.0156, σ_X̄ = 0.125.

6.7 Normal distribution (n ≥ 30). a) μ_X̄ = 25, σ_X̄ = 6/7. b) .0099. c) .0516.

6.9 a) 25. b) 100.

6.11 a) Yes. b) No. c) Yes.

6.13 a) .84. b) -.56. c) 1.13. d) -1,141.

6.15 .0005.

6.17 .0382.

6.19 .0148.

CHAPTER 7

7.1 a) [10.49, 17.51]. b) [19.15, 24.85]. c) [51.06, 60.94].

7.3 [14.80, 15.88].

7.5 [119.55, 136.45].

7.7 [45.96, 61.24].

7.9 [2091.73, 3352.61].

7.11 a) 15.987. b) 46.979. c) 7.261. d) 45.642.

7.13 a) Yes. b) Yes. c) Yes. d) Yes.

7.15 a) 99% CI = [16.25, 91.61]. b) 95% CI = [19.09, 70.40]. c) 90% CI = [20.80, 61.97]. d) As the level of confidence decreases, the confidence interval becomes narrower.

7.17 [2804.59, 8902.40].

7.19 [0.6914, 1.6415].

7.21 a) 770. b) 606.

7.23 138.

7.25 17, Poisson.

7.27 Goal seek, .110.

CHAPTER 8

8.1 a) Reject in left tail. b) Reject in both tails. c) Reject in right tail.

8.3 a) Type II error. b) Type I error.

8.5 a) H0: μ ≤ 8, H1: μ > 8 (claim), right-tailed test. b) H0: μ ≤ 5 (claim), H1: μ > 5, right-tailed test. c) H0: μ ≥ 400, H1: μ < 400 (claim), left-tailed test.

8.7 H0: μ ≤ 25, H1: μ > 25 (claim), t = 3.254, reject H0.

8.9 H0: μ ≥ 4 (claim), H1: μ < 4, t = 1.688, do not reject H0.

8.11 H0: μ ≤ 3000 (claim), H1: μ > 3000, t = -.791, do not reject H0.

8.13 a) -1.645. b) 1.282. c) ±2.575.

8.15 z = 1.371, do not reject H0.

8.17 z = -1.28, do not reject H0.

8.19 χ² = 36.1909.

8.21 a) 4.5748. b) 24.769. c) 6.5706 and 23.6848.

8.23 c) χ² = 31.043, do not reject.

8.25 b) χ² = 24.5, do not reject.

8.27 a) False. b) False.

8.29 � = 0.1251.

CHAPTER 9

9.1 a) Independent. b) Paired samples. c) Independent.

9.3 a) Two-tailed test. c) z = -2.92. e) p-value = .0036.

9.5 z = .69, do not reject H0.

9.7 t = -2.113, do not reject H0.

9.9 a) [11.85, 23.15]. b) [50.08, 61.72]. c) [25.66, 32.94].

9.11 [57.52, 330.48].

9.13 c) p-value = .0537, do not reject H0.

9.15 a) p-value = .407, do not reject H0.

9.17 [-0.1899, -0.1101]; π1 is less than π2.

9.19 p-value = .163, do not reject H0.

9.21 a) 3.49. b) 2.61.

9.23 F = 1.8025, do not reject H0.

9.25 F = 6.84, do not reject H0.


9.27 p-value = 0.1188.

9.29 a) t = -8.297 is outside the interval [-2.724, 2.724], reject H0. b) t = -2.64, do not reject H0.

CHAPTER 10

10.1 χ² = 2.810, do not reject H0.

10.3 χ² = 8.26, do not reject H0.

10.5 χ² = 122, reject H0 Type I.

10.7 χ² = 3.519, do not reject H0.

10.9 χ² = 19.104, reject H0.

10.11 χ² = 1.573, do not reject H0.

10.13 χ² = 2.11, do not reject H0.

10.15 χ² = 12.76, do not reject H0.

10.17 χ² = 25.3, not independent.

10.19 χ² = 5.08, homogeneous.

10.21 χ² = 10542, reject H0. and Group 3.

CHAPTER 11

11.1

Source of variation    Sum of squares    Degrees of freedom    Mean square    F-statistic
Between treatments     352               2                     176            9.17
Within treatments      748               39                    19.18
Total                  1100              41

11.3 F = 3.592, do not reject H0.

11.5 F = 5.44, reject H0.

11.7 a) Reject H0. b) p-value = .0000. c) Means differ significantly. d)s.

11.9 b) F = .482, do not reject H0. c) p-value = .621. d) Reject H0.

11.11

Source of variation    Sum of squares    Degrees of freedom    Mean square    F-statistic
Between blocks         223.5             5                     44.7           1.4
Between treatments     75.6              3                     25.2           .79
Within treatments      478.3             15                    31.89
Total                  777.4             23

11.13

Source of variation      Sum of squares    Degrees of freedom    Mean square    F-statistic
Factor A                 279.626           2                     139.813        17.6309
Factor B                 409.77            3                     136.59         17.205
AB interaction           33.2              6                     5.533          0.697
Within-treatments SSE    952.7             120                   7.939
Total SST                1675.3            131

11.15 F(SSBL) = 6.056, F(SSB) = 5.444, reject H0.

11.17 a) 60 observations. b) 3 levels for factor A and 4 levels for factor B. c) 5 replications. d) Reject H0.

CHAPTER 12

12.1 a) x = number of hours. b) y = 25 + 12x. c) $85.

12.3 ŷ = -3.514 + 0.264x.

12.4 ŷ = 3091.91 + 0.103x, where y is the food spending.

12.7 a) ŷ = 3.77 + .27x, where y is the number of injuries. b) The slope is not meaningful.

12.9 ŷ = -375.126 + 301, where y is the purchase amount.

12.11 a)

12.13 a) r = .875. b) t = 5.116, reject H0.

12.15 a) S = +1559.7. b) r² = .404, 59.6% not explained.

12.17 a) r = -.996. b) t = -36.57, reject H0.

12.19 a) Yes. b) r = .636. c) t = 2.018, do not reject H0.

12.21 a) [0.257, 0.284]. b) p-value = .00009, reject H0. c) r² = .995.

12.23 a) Yes. b) r = .938. c) t = 7.643, reject H0.

12.25 a) [107.6, 118.8]. b) [92.7, 133.7].

12.27 a) t = 59.8, reject H0. b) [84.8, 87.2]. c) [80.9, 91.1]. d) DW = 2.262, negative autocorrelation.

CHAPTER 13

13.1 a) Reject if T ≥ 11. b) Reject if T ≤ 5. c) Reject if T ≤ 2 or T ≥ 10.

13.3 Reject if T ≤ 1 or T ≥ 8. If T = 6, we do not reject H0.


APPENDICES

APPENDIX A Binomial Distribution

APPENDIX B Poisson Distribution

APPENDIX C Areas for the Standard Normal Distribution

APPENDIX D The t-Distribution

APPENDIX E The χ²-Distribution

APPENDIX F The F-Distribution

APPENDIX G Critical Values of the Studentized Range Distribution

APPENDIX H Critical Values of Hartley’s Fmax Test

APPENDIX I Distribution Function of the Number of Runs

APPENDIX J Critical Values of T for the Wilcoxon Signed-Rank Test

APPENDIX K Cumulative Distribution of the Mann-Whitney U-Statistic

APPENDIX L Critical Values of Spearman’s Rank Correlation Coefficient

APPENDIX M Matrix Approach to Multiple Regression

APPENDIX N Random Number Table

APPENDIX O Table of Factors for Control Limits

APPENDIX P Stepwise Regression


GLOSSARY

Adjusted coefficient of determination: a modified value of the coefficient of determination that avoids inflating R² when new independent variables are included in the regression model.

Aggregate price index: an index used to express the relative change from a base year for a set of at least two items.

Alternative hypothesis (H1): a claim (or a statement) that will be true if the null hypothesis is false.

Analysis of variance (ANOVA): a procedure for comparing three or more means simultaneously in a single test.

Approximation to the binomial distribution: for large values of n, the approximation works well and in general offers a good estimate when both nπ ≥ 5 and n(1 − π) ≥ 5.

Attribute: a piece of data that is counted.

Autocorrelation: occurs when successive error terms, or residuals, are correlated in regression analysis.

Backward elimination: a procedure in regression modeling where all variables are entered at once. Nonsignificant independent variables are deleted one at a time from the model. The procedure ends when all independent variables left have significant t values.

Bar chart: a graphical display of categorical data represented by bars on the horizontal and vertical axes.

Bayes’ theorem: a theorem used to compute posterior probabilities by revising prior probabilities.

Beta (β): the probability of committing a type II error.

Binary variable: a variable that is assigned a value equal to either one or zero, depending on whether the observation possesses a given characteristic.

Binomial distribution: the probability distribution of r successes in n independent trials.

Blocks: a set of units having similar characteristics in terms of the blocking variables.

Box plot or box and whiskers plot: a diagram that incorporates the quartiles Q1, Q3, the median, and the two extreme values to graphically display quantitative data.

c chart: used to control the number of times a particular characteristic appears in a sampling unit.

Causal forecasting model: a model that considers several variables that are related to the variable being predicted.

Cause and effect diagram (or fishbone diagram or Ishikawa diagram): the diagram looks like a fish skeleton, with the problem being the head of the fish, major causes being the “ribs” of the fish, and subcauses forming smaller bones off the ribs.

Centered moving average: the average of two consecutive moving averages.

Central limit theorem: if samples of size n are drawn randomly from a population with mean μ and standard deviation σ, the sample means X̄ are approximately normally distributed with mean μ and standard deviation σ/√n for sufficiently large samples (n ≥ 30) regardless of the shape of the population distribution.

Chebyshev’s theorem: regardless of how the data are distributed, at least (1 − 1/k²) of the values will fall within k standard deviations of the mean, where k is a number greater than 1.

Check sheet: data gathering tools that can be used for problem identification in quality control.

Chi-square distribution: a skewed continuous distribution whose shape depends on the number of degrees of freedom.

Chi-square goodness-of-fit test

Chi-square test of homogeneity: a test of equality of proportions across several populations.

Chi-square test of independence: a test applied to analyze the frequencies of two variables having several characteristics to check whether the two variables are independent.

Classical approach to probability: the probability of an event is equal to the number of outcomes where the event occurs divided by the total number of possible outcomes.

Cluster sampling: a method by which the population is divided into groups, or clusters, that are considered as mini populations. A random sample of m clusters is selected and a sample is collected by randomly selecting from each cluster.

Coefficient of determination (r²): the proportion of variability of the dependent variable that is explained by the independent variable.

Coefficient of variation (CV): the ratio of the standard deviation to the mean, expressed as a percentage, in a set of observations.

Combinations: the possible selections of k items from a group of n items regardless of the order of selection.


Complement: the complement of an event E, denoted Ē (read as “E bar”), is the event that includes all the outcomes of an experiment that are not in E.

Conditional probability: the probability that an event will occur given that another event has already occurred.

Confidence interval: a range of values within which we can declare with some confidence that the population parameter lies.

Confidence level: a degree of certainty, expressed as a percentage, that an interval would include the population parameter.

Consistency: a property of an estimator whose probability of being close to the parameter it estimates increases as the sample size increases.

Consumer’s risk: the probability that a nonconforming product will be available for sale.

Contingency table: a cross-tabulation of frequencies into rows and columns.

Continuous random variable: a variable that can take on any value in an interval of numbers.

Continuous variables: variables that result from measuring (weight, length, etc.) and assume all values between any two specific values.

Control chart: a graphical display of measurements (generally means of many samples of measurements) over time through repeated observation.

Control chart (in forecasting): a chart that sets lower and upper limits for individual forecast errors using multiples of the square root of MSE.

Correction factor or continuity correction: a correction made when a binomial distribution (discrete variable) problem is approximated by the normal distribution (continuous variable).

Correlation analysis: the process of determining a measure of the strength of the linear relationship of the variables.

Correlation coefficient: a measure of the strength of the linear relationship that exists within a sample of n pairs of data.

Critical value: the value that separates the region where the null hypothesis is rejected from the rest of the distribution.

Critical value approach: a method of hypothesis testing in which the sample statistic is compared to a critical value in order to reach a conclusion about rejecting or failing to reject the null hypothesis.

Cumulative frequency: the sum of all frequencies up to and including a given value (or class or category).

Cutoff point: the separation between rejecting and not rejecting the null hypothesis.

Cyclical component: the component that describes patterns in the data that occur every several years and are related to business cycles.

Data set: a collection of observations on one or more variables.

Degrees of freedom (df): the number of observations in a sample (n) minus the number of parameters being estimated.

Delphi method: a method that allows each member to benefit from the experience and knowledge of other members without meeting face-to-face or knowing the other members’ identities; personality conflicts are ignored.

Dependent variable: a variable being predicted in regression analysis.

Dependent samples: two samples drawn from two populations where the selection of one sample from one population influences the selection of the second sample from the second population.

Descriptive statistics: methods of organizing, summarizing, and presenting data in an informative way by using tables, graphs, and summary measures.

Deseasonalizing (the data): the process of removing seasonal effects from the actual data.

Discrete probability distribution: a listing of all outcomes of an experiment and the probability associated with each outcome.

Discrete variables: variables that result from counting and can be assigned values such as 0, 1, 2, 3, and so on.

Discrete random variable: a random variable that can take on only integer values.

Double exponential smoothing: a smoothing model that incorporates a second smoothing constant to account for the trend in a time series.

Durbin-Watson test: a statistical test for determining whether significant correlation is present when the regression analysis uses a sample of time series data.

Efficiency: a property of an estimator that has a relatively small variance.

Empirical approach: an approach based on defining probabilities from statistical data collected from historical occurrences.

Error of estimation: the difference between the sample mean X̄ and the population mean μ.