Basics of Stats

  • 8/4/2019 Basics of Stats

    1/47

    Business Statistics

  • 8/4/2019 Basics of Stats

    2/47

    Definition

    Statistics is a standard method for collecting, organizing, summarizing, presenting, analyzing, and interpreting data in order to draw conclusions and make decisions based on the analysis of these data.

    Statistics are used extensively by engineers, managers, governments, businesspeople, etc. throughout the world.

  • 8/4/2019 Basics of Stats

    3/47

    Collection of data

    Types of data

      Secondary data
        Whether data are suitable?
        Whether data are adequate?
        Whether data are reliable?

      Primary data
        Questioning
        Observation

  • 8/4/2019 Basics of Stats

    4/47

    Presentation of data

      Classification
        Geographical
        Chronological
        Quantitative
        Qualitative

      Frequency distribution
        Classification according to class interval
        Class limits
          Exclusive method
          Inclusive method
        Class intervals
        Class frequency

  • 8/4/2019 Basics of Stats

    5/47

    Tabulation of data
      Parts of table

    Charting of data
      Bar diagrams
      Pie diagrams
      Line graphs
      Histograms
      Frequency polygon

  • 8/4/2019 Basics of Stats

    6/47

    Functions of Statistics

      Presents facts in a definite form
      Simplifies mass of figures
      Facilitates comparison
      Helps in formulating and testing hypotheses
      Helps in prediction
      Helps in the formulation of suitable policies

  • 8/4/2019 Basics of Stats

    7/47

    Populations and Samples

    A population is a complete set of all of the possible instances of a particular object, for example, students in this College.

    A sample is a subset of the population, for example, any one of the classes.

    We use samples to draw conclusions about the parent population.

  • 8/4/2019 Basics of Stats

    8/47

    Measures of Central Tendency

    If you have to declare a single value to represent a population or a sample, what do you use?

    The most common value is the mean, also called the average or the expected value.

    Another common value is the mode, the most likely (most common) value.

    Another value is the median, the middle of the data set.

  • 8/4/2019 Basics of Stats

    9/47

    Measures of Central Tendency (ungrouped)

    Mean: This is the mathematical average of a set of numbers.

    Median: This is the middle value of a set of data that has been arranged from lowest to highest.

    Mode: The value that occurs the most in a set of data.

    We can use expenditure as a good way of discussing these three measures. Suppose we want to know the average expenditure of NIFT students. Let's take a random sample of the monthly expenditure of NIFT students.

  • 8/4/2019 Basics of Stats

    10/47

    What is the Mean?

    The mean is the sum of all of the values in the data set divided by the number of values.

    The equation for calculating the mean is the same for both samples and populations.

    Mean: $\bar{x} = \frac{\sum x}{n}$

  • 8/4/2019 Basics of Stats

    11/47

    Sample Mean

    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

    Where:

    $\bar{x}$ (x-bar) is the mean

    $x_i$ are the data points

    $n$ is the sample size

  • 8/4/2019 Basics of Stats

    12/47

    Population Mean

    $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

    Where:

    $\mu$ is the population mean

    $x_i$ are the data points

    $N$ is the total number of observations in the population

  • 8/4/2019 Basics of Stats

    13/47

    Measures of Central Tendency (ungrouped)

    The sample gives these values:

    5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

    The Mean

    This is the average.

    Sum of values = 271500

    Total N = 15

    Mean = 271500 / 15 = 18100
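    The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the values are assumed to be monthly expenditure in rupees.

```python
# Monthly expenditure sample from the slide (unit assumed to be rupees).
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(expenditure)   # sum of all values in the data set
n = len(expenditure)       # number of values
mean = total / n           # arithmetic mean = sum / count

print(total, n, mean)
```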

  • 8/4/2019 Basics of Stats

    14/47

    What is the Median?

    If the data has been sorted (ascending or descending), the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points).

    The median is used to characterize data sets with a few extreme values that distort the relevance of the mean, such as house values or family incomes.

    Median = $\left(\frac{n + 1}{2}\right)$th item in the data array

  • 8/4/2019 Basics of Stats

    15/47

    Measures of Central Tendency (ungrouped)

    The sample gives these values:

    5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

    The Median

    This is the middle value of the sorted values:

    5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

    The median here is 11500.

    In cases where there are two middle values, we average the two.
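    The sort-and-pick-the-middle procedure can be sketched in Python as follows. The helper name `median_of` is an assumption made for illustration; the standard library's `statistics.median` gives the same result.

```python
def median_of(values):
    """Return the median: the middle value of the sorted data,
    or the average of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the ((n + 1) / 2)-th item, i.e. index n // 2 (0-based).
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

median = median_of(expenditure)
print(median)
```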

  • 8/4/2019 Basics of Stats

    16/47

    What is the Mode?

    If the data is discrete, or has been grouped into discrete intervals, the mode is the value that occurs most often.

    In other words, it is the value most likely to occur.

  • 8/4/2019 Basics of Stats

    17/47

    Measures of Central Tendency (ungrouped)

    The sample gives these values:

    5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

    The Mode

    This is the most numerous value:

    5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

    The mode here is 6000.

    Sometimes there is no mode, or even two modes!
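    A minimal sketch of counting the most frequent value, including the no-mode and two-mode cases mentioned above. The function name `find_modes` and the convention of returning an empty list when no value repeats are assumptions made for this example.

```python
from collections import Counter

def find_modes(values):
    """Return a sorted list of the most frequent value(s).
    An empty list means no value repeats, i.e. there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

modes = find_modes(expenditure)
print(modes)
```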

  • 8/4/2019 Basics of Stats

    18/47

    Measures of Central Tendency (ungrouped)

    So given these values:

    5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

    what is the best measure of central tendency for this random sample of NIFT students?

    Mean? ... 18100

    Median? ... 11500

    Mode? ... 6000

  • 8/4/2019 Basics of Stats

    19/47

    What Is the Range?

    Range: the distance between the lowest and the highest values in the set.

    For example, the time to drive to Churchgate is 2 hours plus or minus 15 minutes, or 105 to 135 minutes. Thus the range is 30 minutes.

  • 8/4/2019 Basics of Stats

    20/47

    Measures of Dispersion or Spread (ungrouped)

    Range

    The highest value minus the lowest value.

    From our last example, the range would be: 110000 - 5000 = 105000
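    As a quick check, the range of the same sample can be computed like this (a sketch; the name `value_range` is illustrative, chosen to avoid shadowing Python's built-in `range`).

```python
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Range = highest value minus lowest value.
value_range = max(expenditure) - min(expenditure)
print(value_range)
```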

  • 8/4/2019 Basics of Stats

    21/47

    What is the Variance?

    The variance of a population is the sum of the squares of the differences between the mean and the individual data points, divided by the number of data points.

    The variance of a sample is the sum of the squared differences divided by the number of data points less one.

  • 8/4/2019 Basics of Stats

    22/47

    What is the Standard Deviation?

    Standard Deviation

    This is, roughly, the typical distance of your values from the mean score.

    The standard deviation is the square root of the variance.

  • 8/4/2019 Basics of Stats

    23/47

    Computing Standard Deviation

    Population: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$

    Sample: $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

    It is important that you recognize the difference between these two equations!

    The expression under the square root sign is the variance.

  • 8/4/2019 Basics of Stats

    24/47

    Measures of Dispersion or Spread (ungrouped): Standard Deviation

    Let's return to our NIFT random sample:

    5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

    Follow these steps while we calculate the standard deviation as a class on the board:

    1. Calculate the mean, which is 18100
    2. Find the distance that each value has from the mean
    3. Square each distance
    4. Add up these squared distances and divide by the sample size minus 1
    5. Take the square root of this number

  • 8/4/2019 Basics of Stats

    25/47

    Standard Deviation

    X         Mean (x-bar)   X - x-bar   (X - x-bar)^2
    5000      18100          -13100      17161 x 10^4
    6000      18100          -12100      14641 x 10^4
    6000      18100          -12100      14641 x 10^4
    6000      18100          -12100      14641 x 10^4
    6000      18100          -12100      14641 x 10^4
    8000      18100          -10100      10201 x 10^4
    11000     18100          -7100       5041 x 10^4
    11500     18100          -6600       4356 x 10^4
    12000     18100          -6100       3721 x 10^4
    13000     18100          -5100       2601 x 10^4
    15000     18100          -3100       961 x 10^4
    15000     18100          -3100       961 x 10^4
    17000     18100          -1100       121 x 10^4
    30000     18100          11900       14161 x 10^4
    110000    18100          91900       844561 x 10^4

  • 8/4/2019 Basics of Stats

    26/47

    Standard Deviation

    We sum (x - x-bar)^2, divide the sum by n - 1 = 14, and take the square root of the result. This is the standard deviation.

    What is the square root? Approx. 26,219
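    The five steps can be sketched in Python as below. The variable names are illustrative; the standard library's `statistics.stdev` uses the same n - 1 divisor and is included only as a cross-check.

```python
import math
import statistics

values = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000,
          13000, 15000, 15000, 17000, 30000, 110000]

n = len(values)
mean = sum(values) / n                               # step 1: the mean
deviations = [x - mean for x in values]              # step 2: distance from the mean
squared = [d ** 2 for d in deviations]               # step 3: square each distance
sum_of_squares = sum(squared)
sample_variance = sum_of_squares / (n - 1)           # step 4: divide by n - 1
sample_sd = math.sqrt(sample_variance)               # step 5: square root

# Population formula for comparison: divide by N instead of n - 1.
population_sd = math.sqrt(sum_of_squares / n)

print(round(sample_sd), round(population_sd))
print(math.isclose(sample_sd, statistics.stdev(values)))
```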

  • 8/4/2019 Basics of Stats

    27/47

    The Subtle Difference Between s and σ

    The difference in the divisors (N versus n - 1) results in s being slightly larger than σ.

    This is to account for the fact that s (from a sample) is an estimate of σ (of a population), and this adds a degree of error to the value.

    Note: for large n the difference is trivial.
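    To see how quickly the difference shrinks, note that when both formulas are applied to the same data (with the same mean), the two results differ only by the factor sqrt(n / (n - 1)). The sketch below tabulates that factor; the function name `sd_ratio` is an assumption made for illustration.

```python
import math

def sd_ratio(n):
    """Ratio of the n - 1 formula to the N formula on the same n data points."""
    return math.sqrt(n / (n - 1))

for n in (5, 15, 100, 10000):
    print(n, round(sd_ratio(n), 5))
```

    For the 15-value sample used earlier the factor is about 1.035, which matches the gap between roughly 26,219 and 25,330.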

  • 8/4/2019 Basics of Stats

    28/47

    A Valuable Tool

    The standard deviation is a rather recent invention and was originally devised by Gauss to explain the error observed in measured star positions.

    Today it is used in everything from quality control to measuring risk in financial investments.

  • 8/4/2019 Basics of Stats

    29/47

    Measures of Central Tendency and Dispersion (Grouped Data)

    Remember that grouped data is a collection of data that has been placed into categories.

    Thus we need to calculate the mean and standard deviation differently, but the idea is the same.

  • 8/4/2019 Basics of Stats

    30/47

    A.M. for Grouped Data

    The following is the frequency distribution of 500 workers according to their weekly income (in Rs.). Find the average income.

    Income (Rs.)   Persons
    0 - 50         90
    50 - 100       150
    100 - 150      100
    150 - 200      80
    200 - 250      70
    250 - 300      10

  • 8/4/2019 Basics of Stats

    31/47

    A.M. for Grouped Data

    Income (Rs.)   Persons (f)   Mid value (x)   Deviation (d)   f x d
    0 - 50         90            25              -2              -180
    50 - 100       150           75              -1              -150
    100 - 150      100           125             0               0
    150 - 200      80            175             1               80
    200 - 250      70            225             2               140
    250 - 300      10            275             3               30
    Total          500                                           -80

  • 8/4/2019 Basics of Stats

    32/47

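    A.M. for Grouped Data

    $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 125 + \frac{-80}{500} \times 50 = 125 - 8 = \text{Rs. } 117$

    Here A = 125 is the assumed mean (the mid value of the 100 - 150 class) and h = 50 is the class width.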
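    A sketch of the step-deviation calculation for the income table, with a direct computation from the mid values as a cross-check. The data layout and variable names are illustrative; A = 125 and h = 50 are the assumed mean and class width read off the table. Both routes give Rs. 117.

```python
# (lower limit, upper limit, frequency) for each income class.
classes = [(0, 50, 90), (50, 100, 150), (100, 150, 100),
           (150, 200, 80), (200, 250, 70), (250, 300, 10)]

A = 125   # assumed mean: mid value of the 100 - 150 class
h = 50    # class width

total_f = sum(f for _, _, f in classes)
sum_fd = 0.0
sum_fx = 0.0
for lower, upper, f in classes:
    mid = (lower + upper) / 2     # mid value x of the class
    d = (mid - A) / h             # step deviation d = (x - A) / h
    sum_fd += f * d
    sum_fx += f * mid

mean_step = A + sum_fd * h / total_f   # step-deviation method
mean_direct = sum_fx / total_f         # direct method, for comparison

print(total_f, sum_fd, mean_step, mean_direct)
```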

  • 8/4/2019 Basics of Stats

    33/47

    Advantages / Disadvantages of the Arithmetic Mean

    Advantages:

    1) Familiar and intuitively clear to most people

    2) Every data set has one and only one mean

    3) Useful for performing statistical procedures

    Disadvantages:

    1) May be affected by extreme values

    2) Tedious to compute

    3) Difficult to compute for data sets with open-ended classes

  • 8/4/2019 Basics of Stats

    34/47

    Computation of Mean, Median, and Mode for Grouped Data

    Age (yrs)   Mid value (x)   d = (x - A)/h   No. of pts. (f)   f x d   Cumulative frequency
    10 - 20     15              -3              5                 -15     5
    20 - 30     25              -2              19                -38     24
    30 - 40     35              -1              26                -26     50
    40 - 50     45              0               35                0       85
    50 - 60     55              1               15                15      100
    60 - 70     65              2               3                 6       103
    Total                                       103               -58

    Arithmetic mean = 45 + (-58 / 103) x 10 = 39.4

  • 8/4/2019 Basics of Stats

    35/47

    Computation of Mean, Median, and Mode for Grouped Data

    $\text{Median} = L + \frac{N/2 - \text{C.F.}}{F} \times h$

    where L is the lower limit of the median class, N is the total frequency, C.F. is the cumulative frequency of the class preceding the median class, F is the frequency of the median class, and h is the class width.

    N/2 = 103/2 = 51.5. This value lies in the class interval 40 - 50 (seen from the cumulative frequency column). Hence L = 40, C.F. = 50, F = 35, and h = 10.

    Median = 40 + ((51.5 - 50) / 35) x 10 = 40.43
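    The median-class lookup and interpolation can be sketched as follows. The tuple layout and variable names are illustrative, and the code assumes the classes are listed in ascending order.

```python
# (lower limit, upper limit, frequency) for each age class, in ascending order.
classes = [(10, 20, 5), (20, 30, 19), (30, 40, 26),
           (40, 50, 35), (50, 60, 15), (60, 70, 3)]

N = sum(f for _, _, f in classes)   # total frequency
half = N / 2

cumulative = 0                      # cumulative frequency before the current class
for lower, upper, f in classes:
    if cumulative + f >= half:
        # First class whose cumulative frequency reaches N/2 is the median class.
        L, F, CF, h = lower, f, cumulative, upper - lower
        break
    cumulative += f

median = L + (half - CF) / F * h
print(L, CF, F, h, round(median, 2))
```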

  • 8/4/2019 Basics of Stats

    36/47

  • 8/4/2019 Basics of Stats

    37/47

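    Comparing the Mean, Median, and Mode

    [Figure: two distribution curves with the positions of the mean, median, and mode marked. Labels on the first curve: Mean, Median, Mode; labels on the second curve: Mode, Median, Mean.]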

  • 8/4/2019 Basics of Stats

    38/47

    Summary of Central Tendency Measures

    Measure   Equation                          Description
    Mean      $\sum x / n$                      Balance point
    Median    $\frac{n + 1}{2}$th item in array   Middle value in ordered array
    Mode      none                              Most frequent

  • 8/4/2019 Basics of Stats

    39/47

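    Standard Deviation (Grouped data)

    $\text{S.D.} = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2} \times h$

    where f is the frequency, h is the class width, and d is the deviation from the assumed mean A, computed as

    $d_i = \frac{x_i - A}{h}$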

  • 8/4/2019 Basics of Stats

    40/47

    S.D. for Grouped Data

    The following data provides the chest measurement (in cm) of 50 MBBS students. Find the mean and S.D.

    Chest measurement (cm)   No. of students
    61 - 70                  2
    71 - 80                  10
    81 - 90                  20
    91 - 100                 17
    101 - 110                1

  • 8/4/2019 Basics of Stats

    41/47

    S.D. for Grouped Data

    C.I.        Mid value (x)   Fr. (f)   d = (x - A)/h   f x d   f x d^2
    61 - 70     65.5            2         -2              -4      8
    71 - 80     75.5            10        -1              -10     10
    81 - 90     85.5            20        0               0       0
    91 - 100    95.5            17        1               17      17
    101 - 110   105.5           1         2               2       4
    Total                       50                        5       39

  • 8/4/2019 Basics of Stats

    42/47

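    S.D. for Grouped Data

    $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 85.5 + \frac{5}{50} \times 10 = 86.5$

    $\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}\right)^2} \times h = \sqrt{\frac{39}{50} - \left(\frac{5}{50}\right)^2} \times 10 \approx 8.77$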
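    A sketch of the grouped mean and standard deviation for the chest-measurement table, using the step-deviation formula from the earlier slide. The names are illustrative; A = 85.5 and h = 10 are taken from the table, and the formula divides by the total frequency (the population form), not by n - 1. Computed this way the mean is 86.5 cm and the S.D. is about 8.77 cm.

```python
import math

mid_values = [65.5, 75.5, 85.5, 95.5, 105.5]   # class mid values x
frequencies = [2, 10, 20, 17, 1]               # class frequencies f

A = 85.5   # assumed mean: mid value of the 81 - 90 class
h = 10     # class width

deviations = [(x - A) / h for x in mid_values]             # d = (x - A) / h
sum_f = sum(frequencies)
sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
sum_fd2 = sum(f * d ** 2 for f, d in zip(frequencies, deviations))

mean = A + sum_fd / sum_f * h
sd = math.sqrt(sum_fd2 / sum_f - (sum_fd / sum_f) ** 2) * h

print(sum_f, sum_fd, sum_fd2, round(mean, 2), round(sd, 2))
```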

  • 8/4/2019 Basics of Stats

    43/47

    Uses of Standard Deviation

    Aside from measure of dispersion...

    Determines where values of a frequency distribution are in relation to the mean (standard scores)

    Measures percentage of items within specific ranges
      Chebyshev's Theorem
      Normal distribution

  • 8/4/2019 Basics of Stats

    44/47

    Coefficient of Variation

    1. Measure of relative dispersion
    2. Always a %
    3. Shows variation relative to mean
    4. Used to compare 2 or more groups

    Sample: $CV = \frac{s}{\bar{x}}(100)$

    Population: $CV = \frac{\sigma}{\mu}(100)$

  • 8/4/2019 Basics of Stats

    45/47

    Coefficient of Variation

    Example: Which technician shows more variability?

    Technician A: $\mu_a = 40$, $\sigma_a = 5$

    Technician B: $\mu_b = 160$, $\sigma_b = 15$

  • 8/4/2019 Basics of Stats

    46/47

    Solution

    $CV = \frac{\sigma}{\mu}(100)$

    Technician A: CV = (5 / 40)(100) = 12.5%

    Technician B: CV = (15 / 160)(100) = 9.4%

    Technician A shows more relative variability.
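    The comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative; the inputs are the means and standard deviations given in the example.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(sd=5, mean=40)     # Technician A
cv_b = coefficient_of_variation(sd=15, mean=160)   # Technician B

print(round(cv_a, 1), round(cv_b, 1))
print("A" if cv_a > cv_b else "B", "shows more relative variability")
```

    Although Technician B has the larger standard deviation in absolute terms, it is smaller relative to that technician's mean.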

  • 8/4/2019 Basics of Stats

    47/47

    Summary of Variation Measures

    Measure                            Equation                                          Description
    Range                              $x_{largest} - x_{smallest}$                      Total spread
    Interquartile Range                $Q_3 - Q_1$                                       Spread of middle 50%
    Standard Deviation (Sample)        $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$       Dispersion about sample mean
    Standard Deviation (Population)    $\sqrt{\frac{\sum (x - \mu)^2}{N}}$               Dispersion about population mean
    Variance (Sample)                  $\frac{\sum (x - \bar{x})^2}{n - 1}$              Squared dispersion about sample mean
    Coeff. of Variation                $\frac{s}{\bar{x}}(100)$                          Relative variation