Applied Statistics - Introduction

Applied Statistics for Economics 1. Introduction SFC - [email protected] Spring 2012

description

Slides for the introductory section of my introductory course in applied statistics.

Transcript of Applied Statistics - Introduction

Page 1: Applied Statistics - Introduction

Outline Econometrics Illustrations On method

Applied Statistics for Economics 1. Introduction

SFC - [email protected]

Spring 2012

SFC - [email protected] Applied Statistics for Economics 1. Introduction

Page 2: Applied Statistics - Introduction


Econometrics

Illustrations

On method

Page 3: Applied Statistics - Introduction

Definitions

Statistics or statistical inference is the set of methods used in science, technology, and industry to extract information from data.

Data is a set of records drawn from observations of the world.

When used in economics (and also business management, finance, and a number of social sciences) and in policymaking,[1] statistical methods are often called econometrics. We will see that there is a good reason for the terminological distinction. We will follow this convention and refer to our course as introductory econometrics.

[1] Policymaking means choosing rules of behavior (‘policies’). We usually think of governments making (and implementing) policies, but this also applies to any other organization (business, household, nonprofit, club) or individual. In this sense, business managers or heads of household are “policymakers.”

Page 4: Applied Statistics - Introduction

Econometric applications

In practice, econometrics:

- tests empirically whether theories[2] about social or economic behavior match observed facts,

- forecasts the future values of economic variables of interest,

- fits economic models to real-world data, and

- uses historical data to make quantitative policy recommendations to policymakers.

[2] By theory (or model), I mean a clear statement about the relationship between at least two variables of interest. In very general terms, a theory is a statement of the following type: “If x, then y.” Often, x is called the ‘premises’ and y the ‘conclusions.’ More specifically, a simple theory about cigarette consumption would be a statement like this: “Other things equal, if cigarette prices increase, the consumption of cigarettes will decline.”

Page 5: Applied Statistics - Introduction

The econometrics approach

Ideally, as a scientific discipline, econometrics uses (1) statistics (a branch of deductive mathematics), (2) probability theory (a theory of uncertainty in the world), and (3) economics (a theory about how economic variables are related) in response to the practical concerns of policymakers.

Ultimately, it is the practical needs of policymakers that dictate which theories to test empirically, which relationships to estimate, and which variables to forecast.

Page 6: Applied Statistics - Introduction

Illustrations

To illustrate the use of econometrics (and the reason why we call it ‘econometrics’ rather than just ‘statistics’), consider the following examples:

Page 7: Applied Statistics - Introduction

Class size and grades

Does reducing class size improve elementary school education?

The question cannot be answered well by looking at the data casually. Suppose we do and note that smaller classes and higher grades go together. This may be due to other advantages that students in small classes have over students in bigger classes. E.g., students in smaller classes may have richer parents, greater access to libraries, etc.

The data available don’t come from an experiment where otherwise identical students are placed in classes of different sizes and then tested on their academic performance.[3]

Hence, we need special tricks to examine this kind of data and try to answer the question.

[3] In Latin, the word “data” is the plural of the singular “datum.” However, we may subsequently say “data is . . . ” rather than – awkwardly – “data are . . . .”
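
The class-size problem can be sketched with a small simulation. Everything here is an assumption made for illustration, not real data: the function names, the 5-point "true" effect of a small class, the 10-point advantage of richer students, and the 80%/20% chances of landing in a small class. Under these assumptions the casual comparison should come out well above 5 points, while comparing students within the same wealth group (that is, holding wealth constant) should land close to 5.

```python
import random
import statistics


def simulate_students(n=20000, true_effect=5.0, seed=0):
    """Return (rich, small_class, score) tuples from a made-up observational world."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rich = rng.random() < 0.5
        # Confounding (assumed): rich students are more likely to sit in small classes.
        small_class = rng.random() < (0.8 if rich else 0.2)
        score = (60.0 + true_effect * small_class + 10.0 * rich
                 + rng.gauss(0.0, 5.0))
        rows.append((rich, small_class, score))
    return rows


def score_gap(rows):
    """Mean score in small classes minus mean score in large classes."""
    small = [score for _, small_class, score in rows if small_class]
    large = [score for _, small_class, score in rows if not small_class]
    return statistics.mean(small) - statistics.mean(large)


rows = simulate_students()
naive_gap = score_gap(rows)                          # mixes class size with wealth
rich_gap = score_gap([r for r in rows if r[0]])      # wealth held constant (rich only)
poor_gap = score_gap([r for r in rows if not r[0]])  # wealth held constant (poor only)
print(f"naive: {naive_gap:.2f}  rich only: {rich_gap:.2f}  poor only: {poor_gap:.2f}")
```

Splitting the sample by wealth only works here because wealth is the single confounder and it is observed; with many characteristics to hold constant, a regression does the same job more compactly.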

Page 8: Applied Statistics - Introduction

Racial discrimination in mortgage lending

Is there racial discrimination in the market for home loans?

Again, a casual look at the data won’t do. If after looking at the data, we say that black applicants are denied loans more often than white applicants and the issue is race, a critic may object that the correlation between race and mortgage approvals may be due to other reasons. For instance, black people may be poorer and have less property to use as collateral. Then the issue is not race, but income or wealth.

Page 9: Applied Statistics - Introduction

Racial discrimination in mortgage lending

Again, the data don’t come from black and white people who are otherwise similar. We need econometrics (not just statistics) to get around the deficiency of the data. We need to isolate the race effect from other effects. One cause doesn’t exclude the other. Moreover, the causes may interact. Discrimination may result not from being black alone or from being poor alone, but from being both black and poor!
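
One common way to write down "separate effects plus an interaction" is a regression with an interaction term. The equation below is a sketch, not a model taken from these slides; the variable names (deny, black, poor) and the 0/1 coding are assumptions chosen for illustration.

```latex
\text{deny}_i = \beta_0 + \beta_1\,\text{black}_i + \beta_2\,\text{poor}_i
              + \beta_3\,(\text{black}_i \times \text{poor}_i) + u_i
```

Here $\beta_1$ would capture the race effect among applicants who are not poor, $\beta_2$ the poverty effect among white applicants, and $\beta_3$ the additional effect of being both black and poor.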

Page 10: Applied Statistics - Introduction

Racial discrimination in mortgage lending

Notice how important a test like this can be for policy recommendations:

If the main reason why black people are more often denied loans than whites is that they are black, then we need mainly the enforcement of civil rights laws. But if the main reason is that they are poor, then we mainly need actions and resources to fight poverty, joblessness, etc. If the reason is the interaction between race and economic condition, then the combination of policies required to address the problem will also be different. The recommended courses of action depend on the diagnosis. And since the resources of a community to deal with its problems are finite, you want to spend those limited resources in their most effective uses.

Page 11: Applied Statistics - Introduction

Taxes and cigarette smoking

How much do cigarette taxes reduce smoking?

Suppose you look at data on cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s, and note that states with low taxes and low prices have higher smoking rates, and vice versa.

A problem here is double causality. Presumably, low taxes lead to high demand. But also, because of high demand, there will be many voters who smoke, and politicians may try to keep cigarette taxes low to get reelected.

Econometric methods, as opposed to regular statistical inference that relies on experimental data, have ways to get around this double causality problem.
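
A minimal simulation can show why double causality spoils a casual fit. All numbers and names below are hypothetical: demand is assumed to follow q = 10 - b*p + u with b = 1, and politicians are assumed to set the price (through taxes) as p = 8 - d*q + v with d = 0.5. Solving the two equations together and then fitting a line of q on p gives a slope whose theoretical value under these assumptions is -6/5 = -1.2, not the true demand slope of -1.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_market(n=20000, b=1.0, d=0.5, seed=2):
    """Draw (price, quantity) pairs where each variable affects the other."""
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # demand shock
        v = rng.gauss(0.0, 1.0)  # political (tax) shock
        # Solve q = 10 - b*p + u and p = 8 - d*q + v simultaneously.
        q = (10.0 - b * (8.0 + v) + u) / (1.0 - b * d)
        p = 8.0 - d * q + v
        prices.append(p)
        quantities.append(q)
    return prices, quantities


prices, quantities = simulate_market()
fitted_slope = ols_slope(prices, quantities)
print(f"true demand slope: -1.00  fitted slope: {fitted_slope:.2f}")
```

The fitted line blends the demand response with the political response, so it does not recover either one on its own.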

Page 12: Applied Statistics - Introduction

Forecasting future inflation

What will the inflation rate be next year?

Nowadays, most central banks think of their mission as controlling inflation (they used to think their mission was to help the economy reach full employment). They set interest rates based on their outlook for future inflation.

If they think inflation will increase, they may want to slow down the economy by raising rates. Or vice versa. If they guess wrong, they can cause an unnecessary recession or they may enable inflation to spin out of control.

Page 13: Applied Statistics - Introduction

Required answers

To give quantitative answers to these questions, we use data. If we use different data sets, then we may get a different answer. In a way, our answer to the question is uncertain: it will depend on the data we use. What kind of quantitative answers do we need?

Does reducing class size improve elementary school education? If class size is reduced by 10%, holding constant other student characteristics, the test scores of students increase by x%.

Page 14: Applied Statistics - Introduction

Required answers

Is there racial discrimination in the market for home loans? Holding constant all other characteristics of loan applicants and possible applicants,[4] being black reduces your chances of getting a loan by x%.

How much do cigarette taxes reduce smoking? If the price of cigarettes increases by 1%, holding constant the income of smokers and possible smokers[5] and all other variables, the smoking rate declines by x%.

[4] Potential applicants must be included in the data sample because it may well be that some blacks don’t apply for loans because they believe they’ll be denied loans. And loan discrimination is what we’re trying to measure.

[5] Again, we include potential smokers who don’t currently smoke because a hefty tax may discourage them from joining the smoking club and vice versa.
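
An answer of the form "a 1% price increase lowers smoking by x%" is a price elasticity. One standard way to estimate it is a regression in logarithms. The specification below is a sketch under assumed variable names (smoking, price, income), not one given in these slides.

```latex
\ln(\text{smoking}_i) = \beta_0 + \beta_1 \ln(\text{price}_i)
                      + \beta_2 \ln(\text{income}_i) + u_i
```

In this form, $\beta_1$ is approximately the percentage change in smoking associated with a 1% change in price, holding income constant, so the x in the answer above would correspond to $-\beta_1$.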

Page 15: Applied Statistics - Introduction

Required answers

To answer these questions, we need the multiple regression model that we’ll introduce by the end of the course. However, because this is an introductory course, we may not be able to get to the topics where we can actually learn the tricks to get around all the data deficiencies indicated above. Some, perhaps, but not all. But at least we will know that these issues exist.

Page 16: Applied Statistics - Introduction

Required answers

What will the inflation rate be next year? Here the type of answer is obvious: The inflation rate next year will be x%.

In this course, we will not be able to study the econometric methods required to answer this type of question. These methods are called time-series econometrics, and they are heavily used in macroeconomics and finance.

Page 17: Applied Statistics - Introduction

Causality

An action causes an outcome if the outcome is the immediate result or consequence of that action. Causality means that a specific action (fertilizing tomatoes) leads to a specific measurable consequence (more tomatoes).

How do we measure whether a specific action is the cause of certain effects? We can run an experiment. For that we need many plots with tomato plants. They must be, as far as possible, identical except in the amount of fertilizer applied.

Moreover, the decision whether a plot should be fertilized or not must be random to make sure that the only systematic difference between the plots is whether they are fertilized or not. We record the amount of fertilizer and count the tomatoes at the end of the cycle.

Page 18: Applied Statistics - Introduction

Causality

That’s a randomized controlled experiment. The non-fertilized plots are called the control group. The others are the treatment group. It is randomized because the treatment is assigned randomly to eliminate the possibility of other systematic differences between the control and treatment groups. If the experiment is conducted on a sufficiently large scale, then we may be able to estimate the causal effect of fertilizing on tomato production.
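
The tomato experiment can be sketched as a simulation. The function name, the number of plots, the assumed true effect of 3 extra tomatoes, and the "soil quality" variable are all made up for illustration. Because a coin flip decides which plots get fertilizer, soil quality should be balanced across the two groups on average, and the simple difference in mean yields should land near the assumed effect.

```python
import random
import statistics


def run_experiment(n_plots=2000, true_effect=3.0, seed=1):
    """Randomly fertilize plots and return treatment mean minus control mean."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n_plots):
        soil_quality = rng.gauss(0.0, 2.0)  # plot differences we do not observe
        fertilized = rng.random() < 0.5     # random assignment by coin flip
        tomatoes = (20.0 + soil_quality
                    + (true_effect if fertilized else 0.0)
                    + rng.gauss(0.0, 1.0))
        (treatment if fertilized else control).append(tomatoes)
    return statistics.mean(treatment) - statistics.mean(control)


estimated_effect = run_experiment()
print(f"assumed true effect: 3.00  estimated effect: {estimated_effect:.2f}")
```

The estimate gets closer to the assumed effect as the number of plots grows, which is the sense in which the experiment must be conducted on a sufficiently large scale.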

Page 19: Applied Statistics - Introduction

Causality

Our definition of causal effect: The effect on an outcome of a given action or treatment as measured in a randomized controlled experiment. The only systematic reason for differences in outcomes between the control and treatment groups is the treatment itself.

We cannot always conduct experiments in economic life. They’d be too costly, unethical, or practically impossible. So a randomized controlled experiment will be only a theoretical benchmark for us.

Page 20: Applied Statistics - Introduction

Causality

Note that, to answer the fourth question, we do not need to know the causes of inflation. All we need to know is how to make a reliable forecast. We can forecast rain if we look through a window and see people carrying their umbrellas, relying on the fact that people tend to carry their umbrellas along when they expect rain. But the use of umbrellas is not the cause of rain.[6]

[6] Advanced time-series econometrics also has methods to estimate causes: these methods fall under the rubric of ‘structural models.’
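
The umbrella point can be illustrated with a toy simulation. The probabilities and names below are invented for the sketch: cloudy skies are assumed to make both rain and umbrellas more likely. In the observational run, seeing umbrellas should be a good predictor of rain. In the forced runs, where everyone is made to carry (or leave behind) an umbrella regardless of the sky, the rain rate does not respond at all.

```python
import random


def simulate_days(n=10000, force_umbrellas=None, seed=3):
    """Return (umbrella, rain) pairs; clouds drive both unless umbrellas are forced."""
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        cloudy = rng.random() < 0.4                       # common cause
        rain = rng.random() < (0.8 if cloudy else 0.1)
        if force_umbrellas is None:
            umbrella = rng.random() < (0.9 if cloudy else 0.05)
        else:
            umbrella = force_umbrellas                    # intervention
        days.append((umbrella, rain))
    return days


def rain_rate(days, umbrella):
    """Share of rainy days among days with the given umbrella status."""
    subset = [rain for carried, rain in days if carried == umbrella]
    return sum(subset) / len(subset)


observed = simulate_days()
predictive_gap = rain_rate(observed, True) - rain_rate(observed, False)

rain_if_forced_on = rain_rate(simulate_days(force_umbrellas=True), True)
rain_if_forced_off = rain_rate(simulate_days(force_umbrellas=False), False)
causal_gap = rain_if_forced_on - rain_if_forced_off
print(f"predictive gap: {predictive_gap:.2f}  causal gap: {causal_gap:.2f}")
```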

Page 21: Applied Statistics - Introduction

Data sources

According to its origin or source, there are two basic types of data:

1. experimental data and

2. observational data

In economics (and to a large extent in business) we use observational data. We need to use econometric tricks to estimate causal effects from observational data. In the real world, the levels of “treatment” are not assigned at random and it is therefore hard to disentangle the effect of the “treatment” from the effects of other causes.

That’s what econometrics is for. That’s why econometrics exists, as opposed to mere statistical inference of the type used in the physical and natural sciences.

Page 22: Applied Statistics - Introduction

Data types

Types of data:

1. Cross-sectional data: Data on different entities (individuals, firms, states, countries, etc.) for a single period of time.

2. Time-series data: Data for a single entity (individual, firm, state, country, etc.) from different periods of time or at different points in time.[7]

3. Longitudinal or panel data: Data for more than one entity in which each entity is observed at two or more periods of time.

[7] For more on this difference, see my review slides on flows, stocks, and accounting.
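
The three data types differ only in how many entities and how many periods they cover, which a short sketch can make concrete. The records below are made-up placeholder values, and the field names (`entity`, `year`, `cigarette_tax`) and the `classify` helper are hypothetical.

```python
def classify(records):
    """Label a data set by counting its distinct entities and periods."""
    entities = {r["entity"] for r in records}
    periods = {r["year"] for r in records}
    if len(entities) > 1 and len(periods) > 1:
        return "panel"
    if len(entities) > 1:
        return "cross-sectional"
    if len(periods) > 1:
        return "time-series"
    return "single observation"


# Illustrative values only, not real tax data.
cross_section = [
    {"entity": "State A", "year": 1995, "cigarette_tax": 0.30},
    {"entity": "State B", "year": 1995, "cigarette_tax": 0.55},
]
time_series = [
    {"entity": "State A", "year": 1994, "cigarette_tax": 0.25},
    {"entity": "State A", "year": 1995, "cigarette_tax": 0.30},
]
panel = cross_section + [
    {"entity": "State A", "year": 1996, "cigarette_tax": 0.35},
    {"entity": "State B", "year": 1996, "cigarette_tax": 0.60},
]

for name, data in [("cross_section", cross_section),
                   ("time_series", time_series),
                   ("panel", panel)]:
    print(name, "->", classify(data))
```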

Page 23: Applied Statistics - Introduction

Wrap-up

1. Why do we need to give quantitative answers to some questions?

2. What’s a causal effect?

3. What is a randomized controlled experiment?

4. What’s econometrics for?

5. Why do we need techniques different from those used in the physical and natural sciences?

6. What is the difference between cross-sectional, time-series, and panel data?
