Business Statistics L1

  • 7/27/2019 Business Statistics L1

    1/34

    Business Statistics

    Fall, 2013

    Introduction

    Lecture outline

    What is statistics?

    Summarizing the distribution of data

    - capturing the central tendency

    - spread

    The word statistics originally meant the

    collection of information about and for the

    state.

    It is now a scientific method of collecting and

    analyzing data (making sense of

    numerical/quantitative information) to assist

    in making more effective decisions.

    In statistics, we deal with uncertainty. We don't deal with "what is" but with "what probably is."

    But what do we mean by "it is probably that"?

    Language alone is inadequate to express the degree of uncertainty; we need a more formal structure for this purpose.

    The language of probability will be the focus of the first part of this course.

    Also, in statistics, we deal with samples. We

    make statements about a population based on

    the results of a sample. This is the focus of the

    second part of this course.

    Beware, some uncertainty will always remain.

    Then, in future econometrics courses, you will

    learn to use statistical tools to

    - analyze relationships among variables in an economic context

    - do forecasting

    Caveat:

    - Statistics provides useful tools to help managers in decision making.

    - However, these tools are not intended as substitutes for the familiarity with the business environment that develops through years of study and accumulated experience.

    - It is in alliance with other relevant expertise in the business environment that statistical methods have proved most valuable as management tools.

    Statistics are everywhere. Wherever they are

    used, those who use them use them to speak

    authoritatively.

    It is quite important to use the right statistic for the job!

    Data point, data point, data point,

    Distribution of the data points

    Characterize the shape of the distribution

    - the center, usually the mean

    - the spread, the variance

    - the lopsidedness, the skewness

    - (the "peakedness", the kurtosis)

    Capturing the central tendency

    After every exam, you will receive your own score, and I will give you the average score of the class. Why do I assume that you are interested in that average score?

    Capturing the central tendency

    Now suppose I ask you to poll your classmates about their opinions on making the market economy the core of any country's development process. Answers are on a scale of 1 to 5, with 1 being strongly in favor of it and 5 being strongly against it.

    It turns out that half of the class answered 1 and the other half answered 5. When I ask you to tell me the result, that is, to summarize the class opinion for me, would you add 1 and 5 and then divide the answer by 2? That's how you calculate the average. If you do so, you will get 3, which indicates indifference. Would you report to me that the general opinion in the class on this point is actually quite indifferent?

    1 = strongly agree
    2 = agree
    3 = indifferent
    4 = disagree
    5 = strongly disagree
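
    As a rough illustration of the poll described here, the sketch below uses a hypothetical class of 40 students (the class size is an assumption, not a figure from the lecture). It shows that the mean lands on a score that nobody actually chose, while a simple count of answers reveals the split.

```python
from collections import Counter
from statistics import mean

# Hypothetical class of 40 students: half answer 1, half answer 5.
responses = [1] * 20 + [5] * 20

average = mean(responses)    # the "indifferent" score on the scale
counts = Counter(responses)  # how many students chose each answer

print("mean response:", average)
for score in range(1, 6):
    # Scores that nobody chose are reported with a count of 0.
    print(score, counts.get(score, 0))
```

    Reporting the counts per answer, rather than the single number 3, is one way to summarize such a polarized class.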

    Capturing the central tendency

    Suppose you are dealing with manufacturers who produce clothing in various sizes. Is it useful to know that the mean shirt size of European men is 41.3, or that the average shoe size of American women is 8.24?

    Capturing the central tendency

    Let's consider the incomes or wealth of households in a city. Usually, a large proportion of the population has relatively modest incomes, but the incomes of, say, the highest 10% of all earners can be very large.

    In such a case, would you use mean income to present the view of economic well-being in the city?

    Capturing the central tendency

    The average, or mean, is generally appropriate for summarizing the data's central tendency when we have numerical data.

    But with categorical data, such as opinion scales, the mean is meaningless.

    What is valuable for inventory decisions is not the mean size but the modal size: the size of the items sold most often, that is, the size in heaviest demand.
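
    The inventory point can be sketched with a small, made-up sales record. The sizes below are hypothetical and were chosen only so that their mean comes out near the 41.3 mentioned earlier; they are not data from the lecture.

```python
from statistics import mean, mode

# Hypothetical shirt sizes sold in one week (not real data).
sizes_sold = [39, 40, 40, 41, 41, 41, 41, 42, 43, 45]

mean_size = mean(sizes_sold)   # a value that is not an actual size on the shelf
modal_size = mode(sizes_sold)  # the size sold most often

print("mean size:", mean_size)
print("modal size:", modal_size)
```

    For a stocking decision, the modal size points at an item that can actually be ordered; the mean size does not.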

    Capturing the central tendency

    But even with numerical data, the mean can sometimes give misleading information about the center.

    In the case of income distribution, the mean income can be inflated by the very wealthy. The very wealthy are also an illustration of outliers: numbers that lie far from the rest of the data.

    Positive outliers tend to increase the mean but barely affect the median. In such a case, the median is preferred to the mean to describe the center of the income distribution.
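
    A minimal sketch of this effect, using invented household incomes (in thousands; the figures are assumptions for illustration only): adding one very large income moves the mean a long way and the median only slightly.

```python
from statistics import mean, median

# Hypothetical household incomes, in thousands per year.
incomes = [20, 22, 25, 27, 30, 32, 35]
with_outlier = incomes + [1000]  # add one very wealthy household

mean_shift = mean(with_outlier) - mean(incomes)
median_shift = median(with_outlier) - median(incomes)

print("mean before/after:  ", mean(incomes), mean(with_outlier))
print("median before/after:", median(incomes), median(with_outlier))
```

    Comparing the two shifts is also a quick check for outlier influence: a large gap between mean and median suggests the mean is being pulled by a few extreme values.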

    Please calculate the mean, the median, and the mode, and tell us which statistic most reasonably captures the central tendency of this dataset.

    How can we determine whether the mean is being heavily influenced by outliers?

    The simple answer is: don't just look at the mean; look at more statistics. If the mean and the median are not close together, then the mean may be affected by outliers, as is the case in this example.

    Even if the mean and the median are equal in

    a dataset, does it mean that we can use either

    one to adequately capture the central

    tendency?

    [Figure: frequency distribution of salaries; x-axis: Salary, y-axis: Frequency; annotation: Mean salary = Median salary]

    The mean and median salaries are some of the least frequently

    reported values. These salaries appear to be bimodal. Perhaps in

    this case both staff and executive salaries have been collected.

    Because there are two frequently occurring values, the modal salary values may be the best way to summarize the dataset.
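
    The bimodal situation can be imitated with a small invented salary list (in thousands; the numbers are hypothetical and are not the data behind the slide's chart). The mean and the median coincide, yet both sit in the sparse middle between the two clusters.

```python
from statistics import mean, median, multimode

# Hypothetical salaries: a staff cluster near 30 and an executive cluster near 90.
salaries = [30, 30, 30, 35, 60, 85, 90, 90, 90]

center_mean = mean(salaries)
center_median = median(salaries)
modes = multimode(salaries)  # every value tied for most frequent (Python 3.8+)

print("mean:", center_mean, "median:", center_median, "modes:", modes)
```

    Here `multimode` is used instead of `mode` because it returns all of the most frequent values, which is what a two-peaked dataset calls for.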

    Running the numbers to get mean, median,

    and mode is simply not sufficient.

    Graph the data before deciding how best to

    summarize a dataset.

    Zhijiang's migrant income in 2007

    The intervals into which the data are broken down are called bins (or classes).

    The numbers of observations in each class are called frequencies.

    A histogram is a representation of the tabulated frequencies over specified bins.
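
    To make the terms concrete, here is a minimal sketch that tabulates frequencies over equal-width bins and prints a crude text histogram. The function name `tabulate_frequencies`, the bin width of 500, and the income values are all assumptions made for this illustration; they are not the survey data shown on the slide.

```python
def tabulate_frequencies(values, bin_width, start=0):
    """Count observations in each bin [lower, lower + bin_width)."""
    frequencies = {}
    for v in values:
        index = int((v - start) // bin_width)  # which bin the value falls in
        lower = start + index * bin_width      # lower edge of that bin
        frequencies[lower] = frequencies.get(lower, 0) + 1
    return dict(sorted(frequencies.items()))

# Hypothetical monthly incomes in yuan.
incomes = [600, 800, 1000, 1000, 1000, 1200, 1500, 2400, 3100]
width = 500
table = tabulate_frequencies(incomes, bin_width=width)

for lower, count in table.items():
    # One '#' per observation in the bin.
    print(f"[{lower}, {lower + width}): {'#' * count}")
```

    Empty bins are simply omitted from the table in this sketch; a plotting library would normally draw them as gaps.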

    You can tell quite a bit about a variable by looking at a chart of its frequency distribution.

    The migrant income histogram clearly stretches out to the right; we call this positively skewed. We can tell that the mean is greater than the median in this case.

    Mean: 1234 yuan

    Median: 1000 yuan

    Mode: 1000 yuan

    A word on skewness

    Skewness describes the direction and relative magnitude in which the mean is pulled, which is also the direction in which the tail of the graphed dataset extends.

    When the mean is pulled to higher values, we say there is positive skewness, or right-skewness.

    When the mean is pulled to lower values, we say there is negative skewness, or left-skewness.

    One type of distribution has zero skewness: the symmetric distribution. In a symmetric distribution, the mean is equal to the median.
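
    The lecture does not give a formula for skewness. One common choice, assumed here purely for illustration, is the moment coefficient: the average of the cubed standardized deviations from the mean. The sketch below checks that its sign matches the three cases described above, using small invented datasets.

```python
from statistics import fmean, pstdev

def skewness(data):
    """Moment coefficient of skewness (an assumed definition, not from the slides)."""
    mu = fmean(data)
    sigma = pstdev(data, mu=mu)  # population standard deviation
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]    # long tail toward high values
left_tailed = [-x for x in right_tailed]    # mirror image: tail toward low values
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3))
```

    In the right-tailed list the mean (3.5) also exceeds the median (3), in line with the income example.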

    The variability or spread

    of a distribution

    When we have two datasets with the same mean, how can we tell which dataset is more variable?

    - more volatile

    - less precise

    - less predictable

    The variability or spread

    of a distribution

    The easiest way to think about the volatility of

    a dataset:

    Range of the dataset
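
    The range is the largest value minus the smallest value. As a small sketch with two invented datasets that share the same mean (the numbers and the helper name `data_range` are assumptions for illustration):

```python
from statistics import mean

def data_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

# Two hypothetical datasets with the same mean of 50.
steady = [48, 49, 50, 51, 52]
volatile = [10, 30, 50, 70, 90]

print("means: ", mean(steady), mean(volatile))
print("ranges:", data_range(steady), data_range(volatile))
```

    Because the range depends only on the two extreme points, a single unusual observation can change it completely.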

    The variability or spread

    of a distribution

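    What about how far each point is from the mean?

    Should the dataset with the higher average distance from the mean be the more spread out, or variable, one?

    We can express this idea using the following formula, where $\mu$ is the mean of the data and $N$ is the number of data points:

    $$\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)$$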

    The variability or spread

    of a distribution

    But this formula always equals zero!

    (TA session)

    We must improve this formula slightly so that deviations on either side of the mean don't offset each other in the aggregate. To get rid of the offsets, we could either use absolute distances or square the distances.

    We choose to square the distances, which is easier to deal with mathematically in many applications.
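
    A quick numerical check of this argument, on a small invented dataset (the values are assumptions chosen so the arithmetic is easy to follow): the signed deviations cancel exactly, while the absolute and squared versions do not.

```python
from statistics import fmean

data = [2, 4, 4, 5, 10]  # hypothetical observations
mu = fmean(data)         # mean of the data

deviations = [x - mu for x in data]  # signed distances from the mean

mean_deviation = sum(deviations) / len(data)                      # offsets cancel
mean_abs_deviation = sum(abs(d) for d in deviations) / len(data)  # absolute distances
mean_sq_deviation = sum(d ** 2 for d in deviations) / len(data)   # squared distances

print(mean_deviation, mean_abs_deviation, mean_sq_deviation)
```

    The cancellation is not special to this dataset: the sum of deviations from the mean is zero by the definition of the mean.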

    The variability or spread

    of a distribution

    Now we create mean squared deviations from

    the mean.

    We call the mean squared deviations from the

    mean the statistical variance.

    $$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

    The variability or spread

    of a distribution

    But along comes another problem. Variance is measured in the units of the data, squared.

    Wouldn't it be better to use a spread statistic that is expressed in the same units as the data being studied?

    So we take the square root of the variance.

    And this is called the standard deviation.

    $$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
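
    The two definitions can be written out directly and compared against Python's standard library. This is only a sketch: the yearly returns below are invented, and the helper names `variance` and `standard_deviation` are chosen for this example. The library functions `pvariance` and `pstdev` are the population versions, which divide by N as the formulas here do.

```python
import math
from statistics import fmean, pstdev, pvariance

def variance(data):
    """Mean squared deviation from the mean (divides by N)."""
    mu = fmean(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def standard_deviation(data):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(data))

# Hypothetical yearly returns, in percent.
returns = [-5, 0, 5, 10, 15]

print("variance (percent squared):", variance(returns))
print("standard deviation (percent):", standard_deviation(returns))
print("library check:", pvariance(returns), pstdev(returns))
```

    The variance comes out in "percent squared", which is hard to interpret; the standard deviation is back in percent and can be compared directly with the returns themselves.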

    Look at two investment funds below:

                                    MBA Student Fund A    MBA Student Fund B
    Average return over 10 years    5%                    5%
    Median return                   7%                    2%
    Standard deviation              10%                   1%

    In which fund would you invest?