histograms, mean, standard...

38
© Florian Hollenbach, Texas A&M University histograms, mean, standard deviation how to describe single variables

Transcript of histograms, mean, standard...

Page 1: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

histograms, mean, standard deviation

how to describe single variables

Page 2: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Histogram

• Histogram describes the distribution of a single variable

• How many observations are in each block (class intervals) of data?

Page 3: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Histograms

• We can also use histograms to compare distribution of a variable for subgroups

• Area of each block represents the percentage of that group in the data

• Size of intervals often differ, so height of blocks not necessarily indicative of size of group

Page 4: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Page 5: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

How to draw histogram by hand

1. Draw distribution table -> percentage of group in each interval

2. Label axis, make sure breaks are equidistant on x-axis

3. Draw blocks, but height is not equal to their percentage!

4. Height of block is the percentage divided by interval length

Page 6: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Average and Standard Deviation Practice

• Which of these has the larger SD? Guess!

• 9, 9, 10, 10, 10, 12

• 7, 8, 10, 11, 11, 13

Page 7: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Average and Standard Deviation Practice

• Which of these has the larger SD? Now calculate the SD for both

• 9, 9, 10, 10, 10, 12

• 7, 8, 10, 11, 11, 13

Page 8: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Average and Standard Deviation Practice

• Calculate the average, deviation from average, and SD for each of the two lists:

• 5, -4, 3, -1, 7

• -5, 4, -3, 1, -7

Page 9: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Normal Approximation

Page 10: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

The normal (Gaussian) distribution (curve)

0.0

0.1

0.2

0.3

0.4

−5.0 −2.5 0.0 2.5 5.0x

density

Formula?

Page 11: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Normal curve

• Symmetric around 0

• Total area under the curve is 100%

• Area between -1 and 1 is ~68%

• Area between -2 and 2 is ~95%

• Area between -3 and 3 is ~99.7%

Page 12: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• We can match up other data that is in similar shape to the normal curve

• To do some histogram has to be on same scale, this is often done by using standard units on the x-axis

• One Standard Unit is the difference to mean, divided by one standard deviation

Page 13: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• Average score on exam was 50 and standard deviation was 10

• What are these scores in standard units: 60, 45, 75

• What are the scores for the following standard units: 0, +1.5, -2.8

Page 14: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• It is quite easy to figure out areas under the normal curve, the important thins is to remember symmetry!

Page 15: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Find the area: 1. To the right of 1.25 2. To the left of 0.80 3. To the left of -0.4 4. Between 0.4 and 1.30 5. Between -0.3 and 0.9 6. Outside of -1.5 to 1.5

Page 16: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Page 17: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• Much of the data we observe in the real world actually follows the normal distribution

• So often histograms when overlaid with the normal curve follow it quite nicely and we can approximate it with the normal curve

• If the histogram is not symmetric, the IQR may be preferred over the standard deviation as a measure of spread

Page 18: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Percentiles

• Percentiles: x percentile meaning x percent below this threshold

• E.G. 99th percentile in a test was a score of 95 means that 99 percent of students scores less or equal to 95

Page 19: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Percentile Rank

• Percentile rank is the percentile associated with a given score

• E.g. in the previous example the percentile rank for a score of 95 is 99

Page 20: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Interquartile Range (IQR)

• 75th percentile - 25th percentile

• Better measure of spread if distribution is not symmetric

Page 21: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurement

Page 22: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurement

• One of (maybe) the most important prerequisite to good science is good measurement

• Without measurement how can we tell whether X or Y changes and by how much?

Page 23: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurement

• Once we have a theory, we have to figure out how to measure our main concepts

• How do we measure our dependent and independent variables?

Page 24: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• Even when measurement seems straightforward, in the social sciences it is often not

• Political science is not like physics where we can easily measure some physical quantity

• E.G. Measuring law maker ideology is really hard and complicated

Page 25: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• Researchers have to make many decisions on how to measure the concepts they are interested in

• Recall example about measuring income — what do we include in income?

• At a minimum these decisions need to be transparent

Page 26: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

If the concept of interest is “economic output” (or “Gross Domestic Product”), which is commonly defined as the total sum of goods and services produced by labor and property in a given time period, then it is a relatively straightforward

matter to obtain an empirical observation that is consistent with the concept of interest. (Kellstedt & Whitten p.95)

Hold on!

GDP

Page 27: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

• “The central issue in national income accounting is deciding which economic activities and actors should and can be included in the official accounts” (Morten Jerven)

Page 28: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

National Accounting in Zambia in 2010:

• now only one man responsible for data collection on national accounts (GDP)

• also had to do statistics for industrial stats and public finances

• Data is “awful"

Page 29: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Reliability

• “repeatable or consistent”

• If we repeat the measurement on identical objects, we ought to get the same results

• If we have two people independently measure something, they should derive at the same result

Page 30: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Validity• “A valid measure accurately represents the concept that it

is supposed to measure” (Kellstedt & Whitten, page 101)

• face validity - on its face, the measure appears to be measuring what it purports to be measuring

• content validity - What is the concept to be measured? What are all of the essential elements to that concept and the features that define it? And have you excluded all of the things that are not it?

• construct validity - the degree to which the measure is related to other measures that theory requires them to be related to

Page 31: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measuring DemocracyAdam Przeworski:

• Democracy is a system in which parties lose elections.

Juan Linz:

• Democracy is government pro tempore (for time being)

Kenneth A. Bollen:

• the extent to which the political power of the elite is minimized and that of the nonelite is maximized. Political rights and political liberties are two major dimensions of the concept and these encompass most of the traits usually attributed to democratic systems

Page 32: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measuring Democracy

• Binary or Continuous?

• Countries are democratic or not!

• Or can we have more or less democratic countries?

Page 33: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

0.00

0.25

0.50

0.75

1.00

1800 1850 1900 1950 2000Years

Shar

e of

Dem

ocra

cies

Page 34: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurements can be wrong

BiasSystematic error

Page 35: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurements can be wrong

BiasSystematic error

i.e. the measurement is always wrong in the same way

Page 36: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurements can be wrong

Chance errornon-systematic error

Page 37: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Measurements can be wrong

Chance errornon-systematic error

i.e. the measurement is wrong but randomly in all directions

Page 38: histograms, mean, standard deviationfhollenbach.org/OLD_polisci209_DONOTUSE/img/images/slidesHisto… · © Florian Hollenbach, Texas A&M University histograms, mean, standard deviation

© Florian Hollenbach, Texas A&M University

Examples

• Bias

• Chance error