AGENDA 1.Homework 2.Quiz 3 3.Dummy Variables 4.Forecasting Autoregressive Model Time Series.


AGENDA

1. Homework

2. Quiz 3

3. Dummy Variables

4. Forecasting

• Autoregressive Model

• Time Series


I. HOMEWORK

Posted under the LESSONS tab in ANGEL

DUE: Friday 4/23/10


DUMMY VARIABLES

Initially we said that all dependent and independent variables in the regression model should be continuous variables (interval or ratio scale).

Sometimes it is necessary to include a categorical (or qualitative) variable in the regression equation, because it may play a significant role in predicting or explaining the dependent variable.


DUMMY VARIABLES (cont.)

Whether someone has a college degree or not may influence the salary the person gets.

The location of a house (rural, urban) can change its value.

The season (winter, fall, summer, spring) can be used when predicting the number of flights arriving in Hawaii.

Whether a company is public or private may help explain its level of exports.


DUMMY VARIABLES (cont.)

An easy way of incorporating a qualitative variable into a regression model is to represent it with a special type of variable called a dummy variable.

A dummy variable is a variable that indicates the presence or absence of some characteristic or attribute. The dummy variable assumes the value of 1 if the attribute is present, and 0 if the attribute is absent.

Dummy variables are also called indicator variables, categorical variables, or binary variables.


DUMMY VARIABLES

Let’s say that we want to predict the salary a customer service agent gets. We think that years of experience is one of the variables (X1).

We would also like to include whether the person is a college graduate or not. We will use a dummy variable to include this information. Therefore X2 will be

X2 = 0, if the person is not a college graduate.

X2 = 1, if the person is a college graduate.


DUMMY VARIABLES

A dummy variable can only take on 2 values (0 or 1). We call the condition in which the dummy variable is 0 the base condition.

The coefficient of the dummy variable represents the difference between being in the base condition and not being in the base condition.


INTERPRETING DUMMY VARIABLES

For the continuous variable coefficient: b1 is interpreted as “the change in the predicted value of annual salary (Y) with a one-unit change in years of experience (X1)”.

For the dummy variable coefficient: b2 is interpreted as “the change in the predicted value of annual salary (Y) when the person is a college graduate versus when he/she is not”, NOT as the change with a one-unit change in X2.


INTERPRETING DUMMY VARIABLES

The dummy variable affects the intercept of the regression model, not the slope.

[Figure: Y (annual salary) plotted against X1 (years of experience), with two lines labeled "No college degree" and "With college degree".]
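One way to see why the dummy variable shifts only the intercept is to write the fitted model separately for each group. The sketch below uses b1 and b2 as on the previous slide; the symbol b0 for the intercept is notation assumed here.

```latex
\begin{align*}
\hat{y} &= b_0 + b_1 X_1 + b_2 X_2 \\
X_2 = 0 \ (\text{no college degree}): \quad \hat{y} &= b_0 + b_1 X_1 \\
X_2 = 1 \ (\text{with college degree}): \quad \hat{y} &= (b_0 + b_2) + b_1 X_1
\end{align*}
```

Both lines have the same slope b1 and differ only by the vertical shift b2.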


DUMMY VARIABLE EXAMPLE

Y: annual salary

X1: years of experience

X2: 1 if the person has a college degree, 0 otherwise.

Assume that the person has 5 years of experience. What would his salary be if he is not a college graduate? What would his salary be if he is a college graduate?

ŷ = 25 + 2.5X1 + 8X2
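The two questions can be worked through with a short Python sketch. It assumes the fitted equation on this slide reads ŷ = 25 + 2.5X1 + 8X2; those coefficient values are a reading of the slide's equation and should be checked against the original deck. The function name `predict_salary` is hypothetical.

```python
def predict_salary(years_experience, college_graduate, b0=25.0, b1=2.5, b2=8.0):
    """Predicted annual salary from y-hat = b0 + b1*X1 + b2*X2.

    The default coefficients are an assumed reading of the slide's equation.
    """
    x2 = 1 if college_graduate else 0  # dummy variable: 1 = college graduate
    return b0 + b1 * years_experience + b2 * x2


no_degree = predict_salary(5, college_graduate=False)
with_degree = predict_salary(5, college_graduate=True)

print("5 years, no degree:  ", no_degree)
print("5 years, with degree:", with_degree)
# The gap between the two predictions is the dummy coefficient b2.
print("difference:          ", with_degree - no_degree)
```

The difference between the two predictions equals b2 regardless of the number of years of experience, which is the intercept shift described earlier.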


TESTING A MODEL WITH DUMMY VARIABLES

The F-test for the overall model includes the dummy variables as well, and it is interpreted the same way as before.

The t-test for testing the significance of the coefficient (H0: β2 = 0) is also identical to how we did it previously. It tells us whether the use of the variable is justified for the regression model.


MULTI-CATEGORY DUMMY VARIABLES

What if we want to use a categorical variable that has more than 2 levels? For example, how do we use dummy variables for a “season” variable?

We cannot assign numbers 1, 2, 3, 4… because a dummy variable can only take on values 0 and 1.

Instead we use multiple dummy variables to code the multi-category variable.

When a categorical variable has d levels, d - 1 dummy variables are used to code it.

You take one level to be the base condition where all of the dummy variables are 0.


MULTI-CATEGORY DUMMY VARIABLES

For example, to code seasons we need 4 - 1 = 3 dummy variables (X1, X2, X3).

Let’s take winter as our base case. We designate X1 to represent Spring, X2 to represent Summer, and X3 to represent Fall. Only one of the dummy variables can be 1 at a time.

Season   X1  X2  X3
Winter    0   0   0   (base case)
Spring    1   0   0   X1=1 when spring
Summer    0   1   0   X2=1 when summer
Fall      0   0   1   X3=1 when fall
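The season coding can be sketched in Python as follows. The names `SEASON_DUMMIES` and `encode_season` are hypothetical; the mapping follows the slide's choice of winter as the base case.

```python
# Levels that get their own dummy variable, in the order X1, X2, X3.
# Winter is the base case and therefore has no dummy of its own.
SEASON_DUMMIES = ("spring", "summer", "fall")


def encode_season(season):
    """Return (X1, X2, X3) for a season; winter maps to (0, 0, 0)."""
    season = season.lower()
    if season != "winter" and season not in SEASON_DUMMIES:
        raise ValueError(f"unknown season: {season!r}")
    return tuple(1 if season == level else 0 for level in SEASON_DUMMIES)


for name in ("Winter", "Spring", "Summer", "Fall"):
    print(name, encode_season(name))
```

Four levels produce three dummy columns, and at most one of them is 1 for any observation.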


Dummy Variable Example

GPA is a function of class standing:

GPA = 2.65 + 0.5X1 + 0.67X2 - 0.18X3

Freshman is the base case.

X1 = 1 if sophomore

X2 = 1 if junior

X3 = 1 if senior

How many dummy variables are needed?

What is the predicted GPA of a freshman? Of a senior?
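A minimal Python sketch of how the GPA equation is evaluated for each class standing. The names `INTERCEPT`, `COEFFICIENTS` and `predict_gpa` are hypothetical; the numbers are the ones in the equation above.

```python
INTERCEPT = 2.65  # predicted GPA for the base case (freshman)

# One coefficient per dummy variable: X1 (sophomore), X2 (junior), X3 (senior).
COEFFICIENTS = {"sophomore": 0.5, "junior": 0.67, "senior": -0.18}


def predict_gpa(standing):
    """Predicted GPA; for the base case every dummy is 0, leaving the intercept."""
    standing = standing.lower()
    if standing == "freshman":
        return INTERCEPT
    return INTERCEPT + COEFFICIENTS[standing]


print("dummy variables needed:", len(COEFFICIENTS))
for level in ("freshman", "sophomore", "junior", "senior"):
    print(f"{level}: {predict_gpa(level):.2f}")
```

With four class standings, three dummy variables are needed; the freshman prediction is the intercept alone, and the senior prediction adds the (negative) X3 coefficient to it.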


FORECASTING

One of the most important applications of regression analysis is developing forecasts.

What we are trying to do is to develop forecasts of future values based on an examination of the variable in past time periods.

Typical business examples:

• Forecasting sales of a product so that you can plan your inventory levels.

• Forecasting profits and income so you can determine whether or not you will need a bank loan.


FORECASTING

Typically the best forecast that you will have for the future is based on actual results and trends of the recent past.

We will use a quantitative forecasting method based on this logic called time series analysis.

Time series data are data collected at regular intervals over a period of time.

Time series analysis is a set of quantitative methods for determining patterns in time series data.


TIME SERIES ANALYSIS

So, forecasting is the extrapolation of series values beyond the region of the estimation data.

In regression, however, a model should not normally be used outside the range of data on which it was estimated. To forecast, we therefore need to make the following basic assumption:

Those factors that have influenced patterns of activity in the past and present will continue to do so in more or less the same manner in the near future.


COMPONENTS OF TIME SERIES

Time-series data are usually affected by four factors:

1. TREND: steady tendency of increase or decrease over time.

Possible Causes: changes in technology, culture, population, popularity…

Duration: many years – systematic.

Example: the number of internet users is steadily increasing year after year.


COMPONENTS OF TIME SERIES (cont.)

2. SEASONAL VARIATION: Regular fluctuations or periodic changes that repeat year after year.

Possible Causes: weather, social and religious customs…

Duration: repeats every year (4 seasons, 12 months, or 52 weeks depending on periods being analyzed) – systematic.

Example: sales of snow blowers, suntan lotions, barbeque grills, Christmas shopping etc.


COMPONENTS OF TIME SERIES (cont.)

3. CYCLICAL VARIATION: Repetitive fluctuations or swings of varying length and intensity in the long term.

Possible Causes: business or economic conditions.

Duration: periods longer than one year – systematic.

Example: Economic cycles of growth or contraction, inflation, recession.


COMPONENTS OF TIME SERIES (cont.)

4. RANDOM OR IRREGULAR VARIATION: Unpredictable random variations in the time series that the above three components fail to account for.

Possible Causes: Unforeseen events such as catastrophes, strikes, etc.

Duration: short, unrepeating – unsystematic, random.

Example: Sales of building supplies after a hurricane, loss of customers of an airline due to a strike.


TIME SERIES EXAMPLE

You are asked to prepare a forecast of sales based on previous years’ sales.

[Chart: "Sales". Y axis: Sales (in thousand units), 0 to 200. X axis: Years, 1986 to 1998.]

Years   Sales
1988    100.60
1989    102.90
1990    108.70
1991    128.40
1992    150.70
1993    149.60
1994    166.00
1995    161.60
1996    150.60
1997    174.00


ESTIMATING A LINEAR TREND

There is nothing specific about the “years” besides the fact that they are ranked with equal distances between observations.

Therefore, it is common practice to recode the years into simpler numbers.

To do this, we take the first year for which data is available as the “base year” by setting t=0.

Then for each consecutive period, t increases by one.


TIME SERIES EXAMPLE (cont.)

Years   Time (t)   Sales
1988    0          100.60
1989    1          102.90
1990    2          108.70
1991    3          128.40
1992    4          150.70
1993    5          149.60
1994    6          166.00
1995    7          161.60
1996    8          150.60
1997    9          174.00

1988 is the base year.

[Chart: "Sales". Y axis: Sales (thousand dollars), 0 to 200. X axis: Years, 1986 to 1998.]

Regression results:

ŷt = 101.93 + 8.31t

R² = 0.8554
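The regression results for this example can be reproduced with ordinary least squares on the recoded time variable. The sketch below is a plain-Python illustration, not the tool used in class; the function name `fit_linear_trend` is hypothetical, and the data are the ten sales values from the table.

```python
years = [1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997]
sales = [100.60, 102.90, 108.70, 128.40, 150.70,
         149.60, 166.00, 161.60, 150.60, 174.00]


def fit_linear_trend(years, values):
    """Least-squares fit of values on t, where t = 0 for the first (base) year."""
    base_year = years[0]
    t = [year - base_year for year in years]  # recode years as 0, 1, 2, ...
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b1 = sxy / sxx                # slope: change in sales per period
    b0 = y_mean - b1 * t_mean     # intercept: predicted sales at t = 0
    sst = sum((yi - y_mean) ** 2 for yi in values)
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, values))
    r2 = 1 - sse / sst            # share of variation explained by the trend
    return b0, b1, r2


b0, b1, r2 = fit_linear_trend(years, sales)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, R^2 = {r2:.4f}")
```

Because the years are equally spaced, recoding them to t = 0, 1, 2, ... changes only the intercept, which then reads as the predicted value in the base year.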


INTERPRETING TIME SERIES RESULTS

ŷt = 101.93 + 8.31t

b0 = 101.93: predicted sales for t=0 (year 1988)

b1 = 8.31: change in predicted sales in one period (one year)

Forecast for sales in 1998:

Base year is 1988, so t = 1998 - 1988 = 10

Predicted Sales (1998): ŷ10 = 101.93 + 8.31(10) = 185.03


TIME SERIES EXAMPLE

A time series analysis conducted using data between years 1992 and 2002 yielded the following results:

Yt = 1850 + 35t

where Yt is the yearly profit of a computer company in thousands of dollars and t is measured in years (t = 0 for 1992).

a. What is the estimated yearly profit of this company for the year 2004?

b. How much would you expect the profit level of this company to change in 6 years?
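Both questions come down to plugging a recoded year into the trend equation. A minimal Python sketch, with hypothetical names `BASE_YEAR` and `predicted_profit`:

```python
BASE_YEAR = 1992  # t = 0 for 1992


def predicted_profit(year, b0=1850.0, b1=35.0):
    """Predicted yearly profit in thousands of dollars from Yt = 1850 + 35t."""
    t = year - BASE_YEAR
    return b0 + b1 * t


# (a) 2004 corresponds to t = 2004 - 1992 = 12.
profit_2004 = predicted_profit(2004)

# (b) The change over any 6-year span is 6 times the slope.
change_over_6_years = predicted_profit(1998) - predicted_profit(1992)

print("predicted profit for 2004:", profit_2004)
print("expected change in 6 years:", change_over_6_years)
```

Note that 2004 lies beyond the 1992 to 2002 estimation data, so the forecast relies on the assumption that past patterns continue in the near future.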


AUTOREGRESSIVE MODELS

So far, our predictions of future values were based on the relationship between time and the existing values.

Another type of model that can be used for estimating future values is the autoregressive model, in which the values are predicted based directly on past observations.


AUTOREGRESSIVE MODEL

An autoregressive model uses the past observations as independent variables in the model as predictors for future values.

Autoregressive models are also called lagged models.

Ŷt = b0 + b1 Y(t-k) + b2 Y(t-(k+1)) + ...


AUTOREGRESSIVE EXAMPLE

Sales for a women's perfume brand are given for each quarter of 3 consecutive years.

Predict the sales for the first and second quarters of 2001.

Year   Quarter   Sales
1998   1         147.6
       2         251.8
       3         273.1
       4         249.1
1999   1         139.3
       2         221.2
       3         260.2
       4         259.5
2000   1         140.5
       2         245.5
       3         298.8
       4         287.0
2001   1

Ŷt = -11.76 + 1.08 Y(t-4)

Lags marked on the slide: t-4 and t-8.
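A lagged model of this kind can be sketched in plain Python by regressing each quarter's sales on the sales four quarters earlier. The single-lag form Ŷt = b0 + b1·Y(t-4) is an assumption about how the slide's example was set up, and the function name `fit_lagged_model` is hypothetical.

```python
sales = [147.6, 251.8, 273.1, 249.1,   # 1998 Q1-Q4
         139.3, 221.2, 260.2, 259.5,   # 1999 Q1-Q4
         140.5, 245.5, 298.8, 287.0]   # 2000 Q1-Q4
LAG = 4  # same quarter, one year earlier


def fit_lagged_model(series, lag):
    """Least-squares fit of Y(t) on Y(t-lag); returns (b0, b1)."""
    x = series[:-lag]   # lagged values Y(t-lag)
    y = series[lag:]    # current values Y(t)
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


b0, b1 = fit_lagged_model(sales, LAG)

# Forecasts for 2001 use the same quarter of 2000 as the predictor.
q1_2001 = b0 + b1 * sales[-4]   # based on 2000 Q1
q2_2001 = b0 + b1 * sales[-3]   # based on 2000 Q2

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"2001 Q1 forecast: {q1_2001:.1f}")
print(f"2001 Q2 forecast: {q2_2001:.1f}")
```

A lag of four quarters costs the first four observations, so only eight of the twelve data points enter the fit. On this data the slope comes out near 1.08 and the intercept near -11.77.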


Seasonal Index

[Chart: Total Revenue by Quarter (Q1 to Q4), with one series each for 2006, 2007 and 2008. Y axis: Total Revenue, 0 to 3000.]


Seasonal Index

To calculate the seasonal index you need to:

1. Calculate the Moving Averages

2. Calculate the Centered Moving Averages

3. Calculate the Ratio to Centered Moving Averages

4. Average the Ratio to Centered Moving Averages


Seasonal Index

To use the seasonal index:

Multiply the forecasted sales value by the seasonal index.


Seasonal Index

TIME     SALES (1,000s)   MOVING AVERAGE   CENTERED MOVING AVERAGE   RATIO TO CENTERED MOVING AVERAGE
2001-1   4.8
2001-2   4.1
                          5.35
2001-3   6.0                               5.475                     1.10
                          5.6
2001-4   6.5                               5.7375                    1.13
                          5.875
2002-1   5.8                               5.975                     0.97
                          6.075
2002-2   5.2                               6.1875                    0.84
                          6.3
2002-3   6.8
2002-4   7.4
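The four steps for the seasonal index can be traced on this table with a short Python sketch. The helper names are hypothetical. With only two years of data there is a single ratio per quarter, so the averaging in step 4 is trivial here.

```python
sales = [4.8, 4.1, 6.0, 6.5, 5.8, 5.2, 6.8, 7.4]  # 2001 Q1 .. 2002 Q4, in 1,000s
PERIODS = 4  # quarterly data


def moving_averages(series, window):
    """Step 1: average of every run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def centered_moving_averages(ma):
    """Step 2: average each pair of adjacent moving averages."""
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


ma = moving_averages(sales, PERIODS)
cma = centered_moving_averages(ma)

# Step 3: the first centered average lines up with the third observation.
offset = PERIODS // 2
ratios = [sales[i + offset] / c for i, c in enumerate(cma)]

# Step 4: average the ratios that fall in the same quarter.
by_quarter = {}
for i, ratio in enumerate(ratios):
    quarter = (i + offset) % PERIODS + 1
    by_quarter.setdefault(quarter, []).append(ratio)
seasonal_index = {q: sum(r) / len(r) for q, r in sorted(by_quarter.items())}

print("moving averages:", [round(m, 4) for m in ma])
print("centered:       ", [round(c, 4) for c in cma])
print("ratios:         ", [round(r, 2) for r in ratios])
print("seasonal index: ", {q: round(v, 2) for q, v in seasonal_index.items()})
```

To apply an index, a forecasted sales value for a given quarter is multiplied by that quarter's entry in `seasonal_index`.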