Important Terminologies In Statistical Inference I I

Lessons in Business Statistics “Quantitative Analysis”
Date posted: 20-Oct-2014
Category: Technology


Chapter 2: Classifying Data to Convey Meaning


Introduction

When managers are bewildered by a plethora of data that do not make any sense on the surface, they look for methods to classify the data so that it conveys meaning. The idea here is to help them draw the right conclusion. This chapter provides the nitty-gritty of arranging data into information.


1) Meaning and Example of Raw Data

Meaning of Raw Data:
Raw data represent numbers and facts in the original format in which the data have been collected. You need to convert the raw data into information for managerial decision making.

Example of Raw Data:
Assume that the weekly sales of a product in a region over the past year are as follows (figures in '000 units):

52 61 59 55 63 70 59 77 81
83 69 91 73 83 90 81 77 77
74 65 56 77 64 49 60 52 50
45 42 46 39 29 38 41 43 23
26 27 22 29 31 29 31 30 30
29 40 44 45 46 52 53

Suppose you present this set of data as it is to the General Manager (Sales). At best it will be boring to him.


Information is Key

Large and massive raw data tend to bewilder you so much that the overall patterns are obscured. You cannot see the wood for the trees. This implies that the raw data must be processed to give you useful information.

[Diagram: Raw Data -> Process -> Information]


2) Frequency Distribution

In simple terms, a frequency distribution is a summarized table in which raw data are arranged into classes and frequencies. Classes represent categories or groupings, each with a lower limit and an upper limit. Classes are formed conveniently, following certain guidelines. Against each class, you count and then place the number of observations that fall into it. When you do this for all classes in a given data analysis problem, the result is a frequency distribution.

A frequency distribution focuses on classifying raw data into information. It is the most widely used data reduction technique in descriptive statistics. When you are looking for a pattern that would help you understand the characteristic you measure in a problem situation, the frequency distribution comes to your rescue.


Guidelines for Constructing a Frequency Distribution Table

1) Identify the Minimum Value (Min) and Maximum Value (Max) in the given data set. Calculate Range = Max - Min.

2) Decide on the Number of Classes you would like to have. The number of classes can be determined as the square root of the number of observations in the data set. Also, for any problem it is recommended that you have not less than 5 classes and not more than 15 classes.

3) Determine the Width of the Class Interval = Range / Number of Classes.

4) Formulate the Boundaries of the Classes in such a manner that they include all the observations in the data set. Avoid overlapping of classes. Once the boundaries of each class are ready, all you need to do is tally the number of observations in each class.
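The four guidelines can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `class_plan` is hypothetical, and the guidelines do not say how to round, so rounding the square root to the nearest whole number, rounding the width up, and treating each class as "lower limit included, upper limit excluded" are choices made here. The data are the 52 weekly sales figures from the raw-data example.

```python
import math

# Weekly sales ('000 units) from the raw-data example.
weekly_sales = [
    52, 61, 59, 55, 63, 70, 59, 77, 81,
    83, 69, 91, 73, 83, 90, 81, 77, 77,
    74, 65, 56, 77, 64, 49, 60, 52, 50,
    45, 42, 46, 39, 29, 38, 41, 43, 23,
    26, 27, 22, 29, 31, 29, 31, 30, 30,
    29, 40, 44, 45, 46, 52, 53,
]

def class_plan(data):
    """Return (range, number of classes, class width, class boundaries)."""
    low, high = min(data), max(data)
    data_range = high - low                              # guideline 1
    n_classes = round(math.sqrt(len(data)))              # guideline 2 (rounding is an assumption)
    n_classes = max(5, min(15, n_classes))               # keep between 5 and 15 classes
    width = math.ceil(data_range / n_classes)            # guideline 3 (rounding up is an assumption)
    # Guideline 4: start at the minimum and step by the width.
    boundaries = [low + i * width for i in range(n_classes + 1)]
    # With upper limits excluded, add one class if the maximum sits on the last boundary.
    if boundaries[-1] <= high:
        boundaries.append(boundaries[-1] + width)
        n_classes += 1
    return data_range, n_classes, width, boundaries

data_range, n_classes, width, boundaries = class_plan(weekly_sales)
print(data_range, n_classes, width, boundaries)
```

Under these assumptions the sales data give a range of 69, 7 classes and a class width of 10, with boundaries running from 22 to 92.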


3) HISTOGRAM

A histogram (also known as a frequency histogram) is a snapshot of the frequency distribution. It is a graphical representation of the frequency distribution in which the X-axis represents the classes and the Y-axis represents the frequencies. Rectangular bars are constructed at the boundaries of each class with heights proportional to the frequency.

A histogram depicts the pattern of the distribution emerging from the characteristic being measured. If the pattern is symmetrical and bell shaped, then it reflects the normal distribution curve. In quality control parlance, the system is stable; only chance causes are present and the assignable causes are absent.


Role of Histogram in Practice


Histogram Example

The inspection records of a hose assembly operation revealed a high level of rejection. An analysis of the records showed that "leaks" were a major contributing factor to the problem. It was decided to investigate the hose clamping operation. The hose clamping force (torque) was measured on twenty-five assemblies (figures in foot-pounds). The data are given below. Draw the frequency histogram and comment.

 8 13 15 10 16
11 14 11 14 20
15 16 12 15 13
12 13 16 17 17
14 14 14 18 15


Histogram Example Solution

You will notice that the Range is 20 - 8 = 12. Take the number of classes as 5 (note that the square root of the number of observations is √25 = 5). The width of the class is Range / Number of classes = 12/5 = 2.4. Round it up to 3. You can now form the boundaries of the classes by starting with 8 and then incrementing the lower limit of each class by 3 successively until all the classes are formed. Tally the number of observations under each class. This gives you the following frequency distribution table.

Class    Frequency
8-11      2
11-14     7
14-17    12
17-20     3
20-23     1
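The tally behind this table can be reproduced with a short Python sketch. The helper name `tally` is hypothetical, and the slide does not say which class a boundary value belongs to; counting each class as "lower limit included, upper limit excluded" is an assumption, chosen because it reproduces the frequencies in the table.

```python
# Hose clamping torque (foot-pounds) for the 25 assemblies in the example.
torque = [
    8, 13, 15, 10, 16,
    11, 14, 11, 14, 20,
    15, 16, 12, 15, 13,
    12, 13, 16, 17, 17,
    14, 14, 14, 18, 15,
]

def tally(data, start, width, n_classes):
    """Count observations per class, treating each class as [lower, upper)."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)   # which class the value falls into
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

frequencies = tally(torque, start=8, width=3, n_classes=5)
for i, frequency in enumerate(frequencies):
    lower = 8 + 3 * i
    print(f"{lower}-{lower + 3}: {frequency}")
```

With this convention a torque of 14 is counted in class 14-17, not in 11-14, which avoids the overlap that the guidelines warn against.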

Looking at the histogram, it is easy for you to see that the pattern does not show a bell-shaped curve. The bars adjacent to the class 14-17 cause some distortion to normality. It is also evident that the average is in the range 14 to 17. Corrective action is needed. However, before taking any action, you must be cautious about the fact that the sample size here is only 25 observations. Take more measurements and draw the histogram again before taking corrective steps.

[Figure: Histogram for the Example. X-axis: Classes (8-11, 11-14, 14-17, 17-20, 20-23). Y-axis: Frequency. Bar heights: 2, 7, 12, 3, 1.]
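Where the chart image is not available, the same picture can be approximated in plain text. The following is a rough Python sketch, not the Excel output the slides used; the function name `text_histogram` is hypothetical. It also computes the average of the 25 torque readings as a check on the comment that the average lies between 14 and 17.

```python
from statistics import mean

classes = ["8-11", "11-14", "14-17", "17-20", "20-23"]
frequencies = [2, 7, 12, 3, 1]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class with a bar whose length equals the frequency."""
    label_width = max(len(label) for label in labels)
    return [
        f"{label:>{label_width}} | {symbol * count} {count}"
        for label, count in zip(labels, counts)
    ]

for line in text_histogram(classes, frequencies):
    print(line)

# Torque readings (foot-pounds) from the example.
torque = [
    8, 13, 15, 10, 16, 11, 14, 11, 14, 20, 15, 16, 12,
    15, 13, 12, 13, 16, 17, 17, 14, 14, 14, 18, 15,
]
average = mean(torque)
print(f"Average torque: {average:.2f}")   # 353 / 25 = 14.12
```

The average works out to 14.12 foot-pounds, which falls in the 14-17 class as the commentary states.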


Microsoft Excel and Histogram

The Microsoft Excel Chart Wizard allows you to create a variety of charts for numerical as well as categorical data. The histogram pictured in the previous slide is an output from Chart Wizard.

Also, there is a powerful utility supplied as an add-in by Microsoft Excel called "Data Analysis" in the Tools menu. This has a variety of analysis tools, which include Histogram, Cumulative Distribution, Frequency Distribution, Descriptive Statistics, Pareto Chart and many others. Please familiarize yourself with these in Excel at the earliest so that you can function as a manager taking information-based decisions. The power of Excel spreadsheet software is amazing.


4) Cumulative Frequency Distribution

A cumulative frequency distribution is a type of frequency distribution that shows how many observations are above or below the lower boundaries of the classes. You can formulate the following from the previous example of hose clamping force (torque):

Class    Frequency    Relative     Cumulative    Cumulative Relative
                      Frequency    Frequency     Frequency
8-11      2           0.08          2            0.08
11-14     7           0.28          9            0.36
14-17    12           0.48         21            0.84
17-20     3           0.12         24            0.96
20-23     1           0.04         25            1.00
Total    25           1.00
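The columns of this table follow directly from the class frequencies. A minimal Python sketch, with variable names chosen here for illustration:

```python
from itertools import accumulate

classes = ["8-11", "11-14", "14-17", "17-20", "20-23"]
frequencies = [2, 7, 12, 3, 1]

total = sum(frequencies)                                  # 25 observations
relative = [f / total for f in frequencies]               # share of each class
cumulative = list(accumulate(frequencies))                # running total of frequencies
cumulative_relative = [c / total for c in cumulative]     # running share

for row in zip(classes, frequencies, relative, cumulative, cumulative_relative):
    print("{:<6} {:>3} {:>6.2f} {:>4} {:>6.2f}".format(*row))
```

Each cumulative figure is the number of observations below the upper boundary of its class, so the last one always equals the total (25, or 1.00 in relative terms).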


Ogive Curve

The Ogive curve is a graphical representation of the cumulative frequency distribution using numbers or percentages. In this pictorial representation, the less-than values are on the X-axis and the cumulative frequencies, in numbers or percentages, are on the Y-axis. A line graph in the form of a curve is plotted connecting the cumulative frequencies corresponding to the upper boundaries of the classes. Today, this ogive graph is elegantly and efficiently obtained as output from Chart Wizard or Data Analysis in the Toolbox of Microsoft Excel. The Ogive graph for the present torque example, obtained from Microsoft Excel, is shown below:

[Figure: Cumulative Distribution (Ogive Curve) for the Example. X-axis: Torque (less-than value): 11, 14, 17, 20, 23. Y-axis: Cumulative Frequency. Plotted points: 2, 9, 21, 24, 25.]
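The points of this ogive, and one common way of reading it, can be sketched in Python. Two things here are assumptions and not part of the slide: the extra starting point (8, 0) at the lowest class boundary, and the hypothetical helper `count_below`, which reads between plotted points by straight-line interpolation.

```python
upper_boundaries = [11, 14, 17, 20, 23]
cumulative = [2, 9, 21, 24, 25]

# Ogive points: (less-than value, cumulative frequency).
# The starting point (8, 0) is an assumption so the curve begins at zero.
ogive_points = [(8, 0)] + list(zip(upper_boundaries, cumulative))

def count_below(value, points):
    """Estimate how many observations lie below `value` by linear interpolation."""
    if value <= points[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    return float(points[-1][1])   # beyond the last boundary: all observations

for x, y in ogive_points:
    print(f"less than {x}: {y}")
print(count_below(14, ogive_points))
```

At a plotted boundary the estimate matches the table exactly (9 observations below 14 foot-pounds); between boundaries it is only an approximation, since the grouped table no longer holds the individual readings.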