Looking Ahead of the Curve: an ARIMA Modeling Approach to Enrollment Forecasting


John G. Zhang, Ph.D.

Harper College

[email protected]

47th AIR Annual Forum 2

Topics

• Why forecast

• How to forecast

• Why ARIMA

• What is ARIMA

• How to ARIMA

• How ARIMA did

• Discussion

Why Forecast

• Queries and Reports: what was

• Dashboard: what is

• Forecasts: what will be

• Enrollment forecasts: more valuable for resource planning

How to forecast

• Naïve forecast: random walk, moving average

• Exponential smoothing

• Markov chain

• Regression

• ARIMA

• Others

• Combining methods
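The two naïve methods named above can be sketched in a few lines. This is only an illustration: the function names and the FTE values are hypothetical and are not taken from the presentation.

```python
def random_walk_forecast(series, horizon):
    """Naive random-walk forecast: carry the last observation forward."""
    if not series:
        raise ValueError("series must contain at least one observation")
    return [series[-1]] * horizon


def moving_average_forecast(series, window, horizon):
    """Forecast each future period with the mean of the last `window` values."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    level = sum(series[-window:]) / window
    return [level] * horizon


# Hypothetical FTE values for four consecutive terms (not Harper College data).
fte = [5200.0, 5350.0, 5400.0, 5600.0]
print(random_walk_forecast(fte, horizon=2))                # [5600.0, 5600.0]
print(moving_average_forecast(fte, window=2, horizon=1))   # [5500.0]
```

Both methods return a flat forecast, which is why they serve mainly as a baseline when the series shows no pattern.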

Why ARIMA

• Naïve forecast: best guess if no patterns

• Exponential smoothing: usually designed for one-step-ahead forecasts

• Markov chain: see reference

• Regression: frequently violates the assumption of uncorrelated errors

• ARIMA: worked well, more later

• Others: see reference

• Combining methods: non-directional
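To make the regression point concrete, the sketch below fits a straight trend line to a series that also contains a smooth wave and then computes the Durbin-Watson statistic of the residuals. The data are synthetic and the helper names are hypothetical. The Durbin-Watson statistic is one common check for correlated errors and is not something the slides themselves mention.

```python
import math


def ols_line(y):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated errors,
    near 0 for strong positive lag-1 autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Synthetic series (an assumption): linear growth plus a 12-period wave.
y = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(36)]
a, b = ols_line(y)
residuals = [v - (a + b * t) for t, v in enumerate(y)]
dw = durbin_watson(residuals)
print(round(dw, 2))  # far below 2, so the errors are not uncorrelated
```

The trend line cannot absorb the wave, so neighboring residuals stay close to each other and the statistic falls well under 2.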

What is ARIMA

• AutoRegressive Integrated Moving Average

• Generally, the model is given by

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d \,(X_t - \theta_0) = \left(1 - \sum_{i=1}^{q} \theta_i B^i\right)\varepsilon_t$$

• where $X_t$ is the time series value at time t and $\theta_0$ is a constant,

• B is the backshift or lag operator ($B X_t = X_{t-1}$),

• i is the number of lags or spans,

• $\varepsilon_t$ is the error term at time t,

• $\phi_i$ and $\theta_i$ are the AR and MA parameters, and

• p, d, and q are the orders of AR, I, and MA

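• If p = 1, d = 0, q = 1, ARMA(1, 1):

$(1 - \phi_1 B)(X_t - \theta_0) = (1 - \theta_1 B)\,\varepsilon_t$

• If p = 1, d = 0, $\theta_1 = 0$, AR(1) model:

$(1 - \phi_1 B)(X_t - \theta_0) = \varepsilon_t$

• If p = 1, $\phi_1 = 1$, d = 0, $\theta_1 = 0$, random walk:

$(1 - B)(X_t - \theta_0) = \varepsilon_t$

• If $\phi_1 = 0$, d = 0, $\theta_1 = 0$, constant:

$(X_t - \theta_0) = \varepsilon_t$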
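Expanding the backshift operator in the ARMA(1, 1) case gives the recursion X_t = θ0 + φ1 (X_{t-1} - θ0) + ε_t - θ1 ε_{t-1}. The sketch below applies that recursion to a fixed list of shocks so the special cases above can be compared side by side. The function name, the starting convention (previous shock taken as zero), and the numbers are assumptions made for illustration.

```python
def arma11_path(phi1, theta1, theta0, shocks, x0=None):
    """Generate X_t from (1 - phi1*B)(X_t - theta0) = (1 - theta1*B) e_t.

    Assumptions for this sketch: the shock before the first period is 0,
    and the series starts at theta0 unless x0 is given.
    """
    prev_x = theta0 if x0 is None else x0
    prev_e = 0.0
    path = []
    for e in shocks:
        x = theta0 + phi1 * (prev_x - theta0) + e - theta1 * prev_e
        path.append(x)
        prev_x, prev_e = x, e
    return path


shocks = [1.0, -2.0, 0.5]  # hypothetical error terms

# Constant: phi1 = 0, theta1 = 0, so X_t = theta0 + e_t.
print(arma11_path(0.0, 0.0, 10.0, shocks))   # [11.0, 8.0, 10.5]

# AR(1): part of each deviation from theta0 carries over to the next period.
print(arma11_path(0.5, 0.0, 10.0, shocks))   # [11.0, 8.5, 9.75]

# Random walk: phi1 = 1, so each shock shifts the level permanently.
print(arma11_path(1.0, 0.0, 10.0, shocks))   # [11.0, 9.0, 9.5]
```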

How to ARIMA

• Box and Jenkins (1976) notation: (p, d, q)(P, D, Q)s, with the non-seasonal orders first and the seasonal orders at span s second

• Four stages:

Identification

Estimation

Validation

Forecasting
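As an illustration of the seasonal notation above, and not a model taken from the slides, an ARIMA(0, 1, 1)(0, 1, 1)s model combines one regular difference, one seasonal difference at span s, and one MA term of each kind. The symbol Θ1 for the seasonal MA parameter is introduced here only for this example.

```latex
(1 - B)(1 - B^{s})\, X_t = (1 - \theta_1 B)(1 - \Theta_1 B^{s})\, \varepsilon_t
```

The first pair of parentheses in the notation maps to the factors in B, the second pair to the factors in B^s.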

How to ARIMA

• SPSS Trends module:

version 12 worked well

versions 13 and 14: algorithms changed, so the same data and the same program produced different forecasts

• SAS ETS module:

ARIMA procedure more flexible

forecasts consistent

automation possible thanks to macros

Identification

• Series Plot

• Autocorrelation plot

• Dickey-Fuller test of unit root hypothesis

• AR models to compare the log likelihood values for a series and its transformed series

Identification

• Degree of differencing

• Order of AR

• Order of MA

• Seasonality if any

Estimation

• Q statistics

• Goodness-of-fit criteria:

variance estimate

Akaike information criterion

Schwarz Bayesian criterion

• Significance of parameters

• Residuals analysis

• Mean Absolute Percent Error
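Three of the criteria listed above reduce to short formulas. The sketch below uses the usual textbook definitions, AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n, which are assumed rather than quoted from the slides. The function names and sample numbers are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: penalizes extra parameters more than AIC
    once n_obs exceeds about 7 (ln n > 2)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def mape(actual, forecast):
    """Mean Absolute Percent Error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical numbers, for illustration only.
print(aic(-10.0, 2))                                  # 24.0
print(round(mape([100.0, 200.0], [110.0, 190.0]), 2)) # 7.5
```

AIC and SBC compare candidate models fitted to the same series, while MAPE measures how far forecasts land from the values actually observed.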

Data

• Time series data

• Date variable: year, quarter, month, week, day, hour, minute, second

• Enrollment data: FTE, headcount, seatcount

• Data points

• Nature of the series determines the forecast

Patterns of Data

• Trend: steady increase or decrease in the values of a time series

• Cycle: long-term patterns of rising and falling data

• Seasonality: regular change in the data values that occurs at the same time in a given period

FTE

FTE Pattern

• Trend: FTE increased from 1998 to 2006, suggesting that the series is non-stationary and that differencing is necessary

• Seasonal: higher in the Fall and Spring and lower in the Summer each and every year, implying that a seasonal factor should be part of the model-building process
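Differencing is the usual way to handle both patterns. The sketch below builds a synthetic series with a linear trend and a repeating three-term pattern, then applies a seasonal difference followed by a regular difference. The span of 3 and all values are assumptions for illustration and are not the Harper College series.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic data: level 100, growth of 2 per term, fixed seasonal effects.
seasonal_effect = [-30, 20, 10]   # hypothetical Summer, Fall, Spring effects
series = [100 + 2 * t + seasonal_effect[t % 3] for t in range(9)]

seasonal_diff = difference(series, lag=3)   # removes the repeating pattern
both = difference(seasonal_diff, lag=1)     # removes what is left of the trend

print(seasonal_diff)  # [6, 6, 6, 6, 6, 6]
print(both)           # [0, 0, 0, 0, 0]
```

Each difference shortens the series by its lag, one reason the number of data points matters when higher orders of differencing are considered.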

Autocorrelations and Partial Autocorrelations (ACF and PACF)

ACF

Lag  Correlation
0    1.00000
1    0.64901
2    0.29267
3    -0.06855
4    -0.42111
5    -0.42944
6    -0.43520
7    -0.40880
8    -0.38067
9    -0.06784
10   0.25681
11   0.55983
12   0.85774
13   0.55625
14   0.24975
15   -0.06186
16   -0.36715
17   -0.37708
18   -0.38454
19   -0.36197
20   -0.33780
21   -0.07144
22   0.20576
23   0.46222

PACF

Lag  Correlation
1    0.64901
2    -0.22210
3    -0.28449
4    -0.37073
5    0.18006
6    -0.26468
7    -0.29117
8    -0.45581
9    0.72564
10   0.06626
11   0.26005
12   0.18460
13   -0.22575
14   0.14806
15   0.10247
16   0.16423
17   -0.18254
18   0.15059
19   -0.04279
20   0.11045
21   -0.18268
22   0.08106
23   -0.06703
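The two listings are linked: the partial autocorrelations can be derived from the autocorrelations with the Durbin-Levinson recursion. The sketch below is a plain-Python version with hypothetical function names. It is not the SAS implementation, and small differences in the last decimal are expected because the printed ACF values are rounded.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def pacf_from_acf(rho):
    """Partial autocorrelations via Durbin-Levinson.

    rho[k - 1] is the autocorrelation at lag k.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            current = [phi_kk]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# First three ACF values as printed in the ACF listing above.
slide_acf = [0.64901, 0.29267, -0.06855]
print([round(v, 3) for v in pacf_from_acf(slide_acf)])  # [0.649, -0.222, -0.284]
```

The result agrees with the first three PACF entries listed above to about three decimals.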

Q Statistics

• Q statistics show that the autocorrelations at the various lags are highly statistically significant

• Autocorrelations were very high

• Further action needed

Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
6       385.69      6   <.0001      0.937  0.874  0.808  0.743  0.727  0.711
12      777.02      12  <.0001      0.709  0.707  0.752  0.799  0.833  0.866
18      1107.12     18  <.0001      0.811  0.755  0.697  0.640  0.624  0.608
24      1436.47     24  <.0001      0.605  0.603  0.640  0.679  0.706  0.732
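A Q statistic of this kind can be computed directly from the autocorrelations. The sketch below assumes the Ljung-Box form, Q = n(n + 2) Σ r_k² / (n - k). The slides do not state which form was used or how many observations the series has, so the sample size passed in below is a placeholder.

```python
def ljung_box_q(autocorrelations, n_obs):
    """Ljung-Box Q for lags 1..m, where autocorrelations[k - 1] is lag k."""
    if n_obs <= len(autocorrelations):
        raise ValueError("n_obs must exceed the number of lags")
    weighted = sum(r * r / (n_obs - k)
                   for k, r in enumerate(autocorrelations, start=1))
    return n_obs * (n_obs + 2) * weighted


# First six autocorrelations from the table above.
first_six = [0.937, 0.874, 0.808, 0.743, 0.727, 0.711]

# n_obs=100 is a hypothetical sample size, so this value is not expected
# to match the chi-square figure in the table.
q = ljung_box_q(first_six, n_obs=100)
print(round(q, 2))
```

Under the null hypothesis of no autocorrelation, Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags, less any estimated ARMA parameters when the check is run on model residuals.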

FTE Forecast

[Figure: FTE by term (Summer, Fall, Spring) for fiscal years 1999 to 2010. Y-axis: FTE, 0 to 12,000. X-axis: Fiscal Year. Series: FTE, LCL, UCL.]

How ARIMA Did

• Accuracy: what matters most

• 2-period ahead: 0.74% (FTE) 0.50% (HC)

• 6-period ahead: 1.43% (FTE) 1.65% (HC)

• 10-period ahead: 1.40% (FTE) 2.52% (HC)

• Forecast error tends to grow further into the future

• Eleanor S. Fox (2005) 1.2% (4) 4.1% (8)

• NCES (2003) 1.9% (2) 3.6% (6)

Discussion

• In theory, other factors can be included along with the time series itself, as in regression:

• Unemployment rate

• Consumer Price Index (CPI)

• High school student population

• District population

• Tuition

• Forecasts used for forecasting?

Discussion

• Stationarity and homogeneity

• Scarcity and spuriousness

• Seasonality and outliers

• Raw or cooked data

• Data mining and stepwise

• Fit and accuracy

• Additive or multiplicative (subset/factored)

Discussion

• Science and art

• Objective and Subjective

• Quantitative and qualitative

• Over-differencing and over-fitting

• Parsimony and uncertainty

• Simple or complex