Quiz 1. Name:___________________________________________________________________ No books, notes, or electronic devices.

1.(10) What is the slope of the linear approximation to a function f(x)? A. The integral of f(x) B. The derivative of f(x) C. Always 0 D. Always 1/10

2.(10) In the classic regression model, what distribution is assumed for p(y|x)? A. Normal B. Generic C. Bernoulli D. Poisson

3.(10) What happens when you select a smaller "smoothing" parameter (f) in LOWESS? A. The curve is more smooth B. The curve is less smooth C. The curve is flat D. The curve is exponential

4.(30) In the taxpayer document, Y = charitable contributions and X = income. State the correct meaning of the expression E( Y | X=100,000) in terms of charitable contributions and income. One sentence is enough; no more than two.

5. (10) What is another name for the "R-squared" statistic? A. Correlation coefficient B. Residual variance C. t-statistic D. Coefficient of determination

6. (10) The Gauss-Markov Theorem states that certain estimators of the regression coefficients are "good." Which ones are "good"? A. Least squares B. Maximum likelihood C. Method of moments D. Bayesian posterior means

Quiz 2. Name:_____________________________________________ Closed books, notes, and no electronic devices.

1. (20) Suppose you test H0: β1 = 0 and get a p-value p = .30. Which is the confidence interval for β1? A. (.10, .50) B. (.20, .40) C. (.30, .30) D. (-3.4, 5.6)

2. (20) Why is it wrong to report p ≤ 0.034 rather than p = 0.034? A. because the p-value is not 0.001 B. because p ≤ 0.034 indicates a one-sided test C. because p ≤ 0.034 indicates significance while p = 0.034 indicates insignificance D. because the results are not easily explained by chance alone

3. (40) Suppose that, in the Toluca case, E(Workhours|Lotsize = 80) = 200. Explain this result in terms of the production process and the Law of Large Numbers.
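One way to picture the conditional mean in Question 3 is as a long-run average over many production runs at the same lot size. The sketch below simulates such runs. The normal shape and the standard deviation of 50 hours are assumptions made only for illustration; they are not values from the Toluca case.

```python
import random

random.seed(1)

TRUE_MEAN = 200.0   # E(Workhours | Lotsize = 80), as given in the question
ASSUMED_SD = 50.0   # hypothetical spread, chosen only for illustration


def average_workhours(n_runs):
    """Average work hours over n_runs simulated production runs at lot size 80."""
    runs = [random.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(n_runs)]
    return sum(runs) / n_runs


# The average settles near 200 as the number of runs grows.
for n in (10, 1_000, 100_000):
    print(n, round(average_workhours(n), 2))
```

Individual runs vary widely, but the average of many runs at lot size 80 drifts toward 200 as the number of runs increases.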

Quiz 3. Name:_____________________________________________ Closed books, notes, and no electronic devices.

1. (40) I gave four concerns (problems) with using testing methods to evaluate regression assumptions. State two of them.

2. (40) I gave five reasons that you might want to use transformations. State two of them.

Quiz 4. Name:_____________________________________________ Closed books, notes, and no electronic devices.

1. (40) The first equation was Price = 24723 – 0.17 Mileage. Does the fact that b1 is close to zero (here b1 = -0.17) mean mileage is not very important? Do not refer to p-values, tests, correlations, or R-squared statistics in your answer.

2. (40) Draw a typical scatterplot of residuals versus fitted values that clearly shows heteroscedasticity. (It is not necessary to reproduce the specific graph shown in the reading.)

Quiz 5. Name:_________________________________________ Closed notes, books, and no electronic devices.

1. (40) Perform the following matrix multiplication:

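[Matrix product whose layout was lost in extraction. Surviving entries: one matrix with the six entries 6, 3, 5, 2, 4, 1 (in extraction order) and the 3 × 3 identity matrix, followed by "=".]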

2.(20) Which matrix function can tell you that a matrix is not invertible? A. cofactor B. eigenvector C. determinant D. linear combination

3. (20) The partial regression plot is a scatterplot showing __________ on one axis and ___________ on the other axis. A. residuals, residuals B. residuals, fitted values C. fitted values, slopes D. slopes, intercepts
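For practice with the kind of product asked for in Question 1, here is a minimal sketch of row-by-column matrix multiplication. The 2 × 3 matrix `example` is hypothetical and is not claimed to be the matrix from the quiz; the helper name `matmul` is also just an illustration.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows; a is m x n, b is n x p."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


example = [[1, 2, 3], [4, 5, 6]]              # hypothetical 2 x 3 matrix
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 3 x 3 identity matrix

# Multiplying by the identity returns the original matrix unchanged.
print(matmul(example, identity))
```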

Quiz 6. Name:_________________________________________ Closed notes, books, and no electronic devices.

In the Carvalho document, he reported the regression equation Sales = 116 – 97.7P1 + 109P2, where

Sales = Your sales of a product

P1 = Your price for the product

P2 = Competitors’ price for the same product

Give the correct interpretation of the coefficient

b1 = -97.7

in a sentence or two, as was done in the document. Make sure your interpretation correctly addresses the fact that this coefficient is a negative number.

Quiz 7. Name: _______________________________________ Closed notes, no electronic devices.

1.(20) Inference between observational units is called _______________; inference within observational units is called _______________. A. predictive, causal B. correlational, causal C. causal, correlational D. correlational, predictive

2.(20) In causal inference, you need to consider outcomes Y that would have happened, had you set the treatment variable to some different number. These outcomes are called A. predictions B. residuals C. slopes D. counterfactuals

3.(20) The true regression model is Y = β0 + β1T + β2X + ε. The confounder X is related to the treatment via X = γ0 + γ1T + ν. Give the slope of the regression of Y on T alone, in terms of these models’ parameters.

4.(20) The book describes three methods for estimating causal effects. Name one of these methods. (Two or three words only. Do not describe the method).
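The setup in Question 3 can be checked by simulation. Substituting X = γ0 + γ1T + ν into the model for Y gives Y = (β0 + β2γ0) + (β1 + β2γ1)T + (β2ν + ε), so the slope on T alone should be close to β1 + β2γ1. The sketch below uses hypothetical parameter values and standard normal errors, both chosen only for illustration.

```python
import random

random.seed(7)

# Hypothetical parameter values, chosen only for illustration.
b0, b1, b2 = 1.0, 2.0, 3.0   # Y = b0 + b1*T + b2*X + eps
g0, g1 = 0.5, 0.4            # X = g0 + g1*T + nu


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


n = 100_000
T = [random.gauss(0, 1) for _ in range(n)]
X = [g0 + g1 * t + random.gauss(0, 1) for t in T]
Y = [b0 + b1 * t + b2 * x + random.gauss(0, 1) for t, x in zip(T, X)]

# Regress Y on T alone, leaving the confounder X out.
slope_T_alone = ols_slope(T, Y)
print(round(slope_T_alone, 3), "vs", b1 + b2 * g1)
```

With these values the estimated slope lands near 3.2 rather than near the causal coefficient 2.0, which is the bias that omitting the confounder introduces.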

Quiz 8. Name: _______________________________________ Closed notes, no electronic devices.

Define multicollinearity in one or two sentences. (Don’t tell me what it does, just tell me what it is.)

Quiz 9. Name:_______________________________

Here is a model equation:

Y = α + β1*dose + β2*sex + β3*dose*sex

Assume Males are coded as 1 and Females as 0.

Using the model equation, show why the effect of dose on Y among males is β1 + β3.
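A numerical check can accompany the algebra. The sketch below codes the model mean as α + β1·dose + β2·sex + β3·dose·sex and compares the change in Y for one extra unit of dose within each sex. The coefficient values are hypothetical.

```python
def mean_y(dose, sex, alpha, b1, b2, b3):
    """Model mean: alpha + b1*dose + b2*sex + b3*dose*sex (sex: 1 = male, 0 = female)."""
    return alpha + b1 * dose + b2 * sex + b3 * dose * sex


# Hypothetical coefficient values, for illustration only.
alpha, b1, b2, b3 = 10.0, 2.0, -1.0, 0.5

# Effect of one extra unit of dose, holding sex fixed.
effect_males = mean_y(4, 1, alpha, b1, b2, b3) - mean_y(3, 1, alpha, b1, b2, b3)
effect_females = mean_y(4, 0, alpha, b1, b2, b3) - mean_y(3, 0, alpha, b1, b2, b3)

print(effect_males, effect_females)
```

Setting sex = 1 collapses the equation to (α + β2) + (β1 + β3)·dose, which is why the male difference matches β1 + β3 while the female difference matches β1.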

Quiz 10. Name:_______________________________

Closed notes, no electronic devices.

The variance-bias trade-off refers to the estimated regression function f̂(x).

Variance refers to the variability (i.e., multiple possible values) of f̂(x).

Why is there variability (i.e., multiple possible values) of f̂(x)?
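The idea of "multiple possible values" can be made concrete by refitting the same model to several fresh samples. In the sketch below the data-generating process (a line with intercept 1.0, slope 0.5, and standard normal noise) and the sample size of 30 are hypothetical choices for illustration.

```python
import random

random.seed(3)


def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def fitted_value_at(x0, n=30):
    """Draw one fresh sample from an assumed process and return the fitted value at x0."""
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]  # hypothetical true process
    intercept, slope = fit_line(x, y)
    return intercept + slope * x0


# Five different samples give five different values of the estimate at x = 5.
estimates = [fitted_value_at(5.0) for _ in range(5)]
print([round(e, 3) for e in estimates])
```

Each sample is a different random draw from the same process, so each produces a different fitted function and therefore a different value at x = 5.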

Quiz 11. Name:_______________________________

Closed notes, no electronic devices.

True regression functions are never straight lines. Instead they are always curved, to some degree. Explain why this is true in an example from your field of study as follows:

1. Name your Y and X variables from your field of study. Y = ____________________________________________________

X = _____________________________________________________

2. Define the true regression function in terms of your Y and X variables.

3. Explain, in terms of your Y and X variables, why the true regression function is not precisely linear.

Quiz 12. Name: _______________________________________________

Here is a model.

Pricei = β0 + β1dn1i + β2dn2i + εi

Recall: There are three neighborhoods, and i = 1,…, 128 identifies a house in the data set.

dn1 = the dummy variable for neighborhood 1

dn2 = the dummy variable for neighborhood 2.

What does the model say about the distribution of house prices in neighborhood 2? (One or two sentences maximum.)
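To see what the dummy variables do to the mean of the price distribution, the sketch below evaluates β0 + β1·dn1 + β2·dn2 for each neighborhood. It assumes the usual 0/1 coding (dn1 = 1 only in neighborhood 1, dn2 = 1 only in neighborhood 2) and uses hypothetical coefficient values.

```python
def mean_price(neighborhood, b0, b1, b2):
    """Mean of Price under Price = b0 + b1*dn1 + b2*dn2 + error, with error mean 0."""
    dn1 = 1 if neighborhood == 1 else 0  # assumed 0/1 dummy coding
    dn2 = 1 if neighborhood == 2 else 0
    return b0 + b1 * dn1 + b2 * dn2


# Hypothetical coefficient values, for illustration only.
b0, b1, b2 = 150_000, -20_000, 10_000

for nbhd in (1, 2, 3):
    print(nbhd, mean_price(nbhd, b0, b1, b2))
```

Under this coding, neighborhood 3 is the reference group with mean β0, and the neighborhood 2 mean is β0 + β2. The error term εi then describes how individual prices scatter around that mean.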

Quiz 14. Name: ____________________________________

Suppose there are outliers in Y|X space. What does this tell you about p(y|x)?

Quiz 15. Name: ___________________________________________________ Closed notes, no electronic devices.

1. Which assumptions are needed for typical quantile regression? (Select all that apply, 8 points per correct selection/non-selection) A. A model for the data-generating process B. Linearity C. Constant variance D. Normality E. Ordinary least squares

2. In the reading, the variable τ was used to indicate a (pick one) A. mean B. probability C. variance D. slope

3. In the food expenditure / income example, the slope of the 0.1 quantile regression function was ________________ the slope of the 0.9 quantile regression function. (pick one) A. Greater than B. Less than C. Approximately equal to

Quiz 16. Name:__________________________________________________

1. If a process is stationary, then the means and variances are identical for all time points t. What else is identical, for all time points t, when the process is stationary?

2. Several stationary processes were discussed in the article. Name one.

Quiz 17. Name:__________________________________________________

The document states that the covariance matrix of the error terms (the εi terms) is Σ.

1. What do different diagonal elements of Σ tell you?

2. What do non-zero off-diagonal elements of Σ tell you?

Quiz 18. Name:__________________________________________________

The document discusses “pitch.” What is “pitch”?

Quiz 19. Name:__________________________________________________

Answer on the lines provided only. (Be brief).

Give an example of panel data as follows, either from the reading or of your own choosing.

Y = ____________________________________________________________________________

X = _____________________________________________________________________________

The variables Y and X are subscripted by “i” and “t” as follows: Yit, Xit.

In your chosen example,

i refers to ____________________________________________________________________________

t refers to ____________________________________________________________________________

Quiz 20. Name:__________________________________________________

Suppose you can observe data in five groups as follows:

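Group 1: data Y11, Y12, …, Y1n1; true (“population”) mean µ1

Group 2: data Y21, Y22, …, Y2n2; true (“population”) mean µ2

Group 3: data Y31, Y32, …, Y3n3; true (“population”) mean µ3

Group 4: data Y41, Y42, …, Y4n4; true (“population”) mean µ4

Group 5: data Y51, Y52, …, Y5n5; true (“population”) mean µ5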

According to the article, you should not use the ordinary average of the data in Group 1 to estimate µ1; instead, you should use a “shrinkage” estimate. Briefly describe what the “shrinkage estimate” of µ1 is. Do not use formulas.

Quiz 21. Name:__________________________________________________

Write down the equation of a simple multilevel model. It should be clear from your equation why it is called “multilevel.”

Quiz 22. Name:__________________________________________________

Suppose your level-one model is

Yij = β0j + β1jXij + εij,

for j = 1, …, J level-2 observational units, and i = 1, …, nj level-1 observational units within level-2 unit j.

Your theory states that the level-two terms β0j and β1j have the following models:

β0j = γ00 + u0j and β1j = γ10 + γ11 Zj + u1j

where Zj is a level-2 predictor variable.

Write the single equation that is your multilevel model. Identify which terms are fixed effects and which terms are random effects in that single-equation model.
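A substitution of this kind can be checked numerically. The sketch below computes Y in two ways: first through the level-two models and then the level-one model, and second through one combined expression obtained by substituting β0j and β1j into the level-one equation. All numeric values are hypothetical, and the function names are illustrative only.

```python
def y_two_stage(x, z, g00, g10, g11, u0, u1, eps):
    """Compute Y from the level-two models first, then the level-one model."""
    beta0 = g00 + u0
    beta1 = g10 + g11 * z + u1
    return beta0 + beta1 * x + eps


def y_single_equation(x, z, g00, g10, g11, u0, u1, eps):
    """Compute Y from the combined equation obtained by substitution."""
    fixed_part = g00 + g10 * x + g11 * z * x   # terms with gamma coefficients
    random_part = u0 + u1 * x + eps            # terms with random components
    return fixed_part + random_part


# Hypothetical values, for illustration only.
args = dict(x=2.0, z=3.0, g00=1.0, g10=0.5, g11=0.25, u0=0.1, u1=-0.2, eps=0.3)
print(y_two_stage(**args), y_single_equation(**args))
```

The two results agree up to floating-point rounding, which is a quick way to confirm that no term was dropped or duplicated during the substitution.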

Quiz 24. Name:__________________________________________

Regression models assume that Y is randomly sampled from (or produced by) distributions p(y|x).

Classic regression models assume that these distributions p(y|x) are normal distributions.

What are the distributions p(y|x) when Y is a binary random variable?

Quiz 25. Name:__________________________________________

Regression models assume that Y is randomly sampled from (or produced by) distributions p(y|x).

Classic regression models assume that these distributions p(y|x) are normal distributions.

What are the distributions p(y|x) when Y is a nominal random variable?

Quiz 26. Name:__________________________________________

Regression models assume that Y is randomly sampled from (or produced by) distributions p(y|x).

Classic regression models assume that these distributions p(y|x) are normal distributions.

Suppose your dependent variable is Y = count of financial advisors a person has used in their life. The first 10 Y observations in your data set are: 0 0 0 3 0 1 1 1 0 1.

What distributions p(y|x) will you assume to produce Y in this case?
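Counts like these are often modeled with a Poisson distribution, although other count distributions (for example, the negative binomial) are also possible. As one hedged illustration, the sketch below compares the observed proportions in the ten listed values with Poisson probabilities. It uses the sample mean as the Poisson mean and ignores X entirely, which is a simplification.

```python
import math

y = [0, 0, 0, 3, 0, 1, 1, 1, 0, 1]  # the ten observations listed in the question


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


sample_mean = sum(y) / len(y)  # stand-in for the mean, ignoring X

# Observed proportion next to the Poisson probability for each count 0..3.
for k in range(4):
    observed = y.count(k) / len(y)
    print(k, observed, round(poisson_pmf(k, sample_mean), 3))
```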

Quiz 27. Name:__________________________________________

Regression models assume that Y is randomly sampled from (or produced by) distributions p(y|x).

Classic regression models assume that these distributions p(y|x) are normal distributions.

Suppose your dependent variable is

Y = time waiting on the phone for customer service.

You will analyze these data using parametric survival analysis methods.

What distributions p(y|x) will you assume to produce Y in this case?

Quiz 28. Name:__________________________________________

Regression models assume that Y is randomly sampled from (or produced by) distributions p(y|x).

Classic regression models assume that these distributions p(y|x) are normal distributions.

What distributions p(y|x) will you assume to produce Y when you use a sample selection model?