Weighing Individual Observations for Time Series Forecasting (2017. 4. 7.)

  • Weighing Individual Observations for Time Series Forecasting

    Victor Hoornweg & Philip Hans Franses

    Erasmus University Rotterdam, Tinbergen Institute, Econometric Institute

    Rotterdam, July 1, 2014


  • Introduction

    Issue:

    • How to deal with structural breaks or outliers?

    → Weigh individual observations

    • Model: 𝑦𝑡 = 𝜇 + 𝜂𝑡

    • DGP: 𝑦𝑡 = 3 − 2 ∙ 𝟏𝑡>80 + 2 ∙ 𝟏𝑡>120 + 𝜀𝑡,

    – where 𝑡 = 1, 2, … , 170 and 𝜀𝑡~𝑁(0, 1)

    Figure 1. Simulated series
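The simulated series can be reproduced along the following lines. This is a minimal sketch of the stated DGP; the function name `simulate_series` and the seed are illustrative assumptions, not part of the slides.

```python
import numpy as np

def simulate_series(T=170, seed=0):
    """Simulate y_t = 3 - 2*1[t>80] + 2*1[t>120] + eps_t with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)        # seed is an arbitrary choice
    t = np.arange(1, T + 1)                  # t = 1, ..., T
    level = 3.0 - 2.0 * (t > 80) + 2.0 * (t > 120)
    y = level + rng.standard_normal(T)
    return t, level, y

t, level, y = simulate_series()
mu_hat_equal = y.mean()   # estimate of mu when every observation has weight 1/T
print(mu_hat_equal)
```

With equal weights the estimate of 𝜇 averages over all three regimes, which is the problem the weighting scheme is meant to address.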

    Figure 2. Individual weights assigned to observations across time

  • Introduction

    Issue:

    • How to deal with structural breaks or outliers?

    Proposed solution:

    • Assign robust weights to observations based on pseudo out-of-sample forecasts (posf):

    – $y_{w,t} = w_t y_t$

    – $X_{w,t} = w_t X_t$

    – $\sum_{t=1}^{T} w_t = 1$

    • Use discrete, exponential, and/or equal weights ($w_t = 1/T \ \forall t$)

    • Exponential posf

    Relevance:

    • Interpretation: which period in the past is akin to the present period

    • Forecasting accuracy: focus on relevant data

    • Robust: shrink towards equal weights with penalty for unequal weights

    • Easy to apply to many types of datasets (high/low-frequency, many/few variables) and models


  • Introduction

    Overview:

    • Literature

    • Innovations

    • Simulations

    – Forecasting accuracy

    – Influence of statistical decisions on forecasts

    • Practical application

    • Discussion


  • Literature on weighing observations

    Select optimal starting point (Pesaran & Timmermann 2007):

    • Compute posf for different starting points

    – Select best starting point

    – Take a weighted combination of starting points

    Exponential smoothing (Holt 1957, Brown 1959):

    – Basic model: $\hat{y}_{T+1} = \sum_{i=1}^{T} w_i(\gamma)\, y_i$
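As a rough illustration of the basic model, the sketch below forecasts with geometrically declining weights. The slide does not spell out the form of the weights, so the choice w_i(γ) = γ(1 − γ)^(T−i), normalized to sum to one, is an assumption, as are the function names.

```python
import numpy as np

def exp_smoothing_weights(T, gamma):
    """Geometric weights for i = 1..T, largest for the most recent observation."""
    i = np.arange(1, T + 1)
    w = gamma * (1.0 - gamma) ** (T - i)   # assumed functional form of w_i(gamma)
    return w / w.sum()                     # normalization is also an assumption

def smoothing_forecast(y, gamma):
    """One-step-ahead forecast: weighted sum of all past observations."""
    y = np.asarray(y, dtype=float)
    return float(exp_smoothing_weights(len(y), gamma) @ y)

print(smoothing_forecast([3.0, 3.0, 1.0, 1.0], gamma=0.5))
```

A larger γ concentrates the weight on recent observations, so the forecast adapts faster after a break but becomes noisier.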

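    Discrete and exponential weights (Pesaran, Pick & Pranovich, PPP, 2013):

    • $\hat{\beta}_T(\mathbf{w}) = \left( \sum_{t=1}^{T} w_t \mathbf{x}_t \mathbf{x}_t' \right)^{-1} \sum_{t=1}^{T} w_t \mathbf{x}_t y_t$, with $\sum_{t=1}^{T} w_t = 1$ and $h = 1$

    • Choose weights so that pMSFE of $\hat{y}_{T+1} = \hat{\beta}_T' \mathbf{x}_{T+1}$ is minimized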

    – Discrete breaks: analytic expression of optimal weights for multiple IVs

    • Determine breakpoints by considering all possible combinations between two breakpoints with certain limits for 𝑏1 and 𝑏2

    – Continuous breaks: exponential smoothing
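The weighted estimator and the resulting one-step forecast can be sketched as follows. The helper names and the toy data are illustrative assumptions; the code only evaluates the estimator for given weights and does not derive the optimal weights of PPP.

```python
import numpy as np

def weighted_beta(X, y, w):
    """beta_hat(w) = (sum_t w_t x_t x_t')^{-1} sum_t w_t x_t y_t."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to one")
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)

def one_step_forecast(X, y, w, x_next):
    """y_hat_{T+1} = beta_hat(w)' x_{T+1}."""
    return float(np.asarray(x_next, dtype=float) @ weighted_beta(X, y, w))

# Mean model (x_t = 1) with discrete weights that drop the first two observations.
X = np.ones((4, 1))
y = np.array([3.0, 3.0, 1.0, 1.0])
w = np.array([0.0, 0.0, 0.5, 0.5])
print(one_step_forecast(X, y, w, x_next=[1.0]))
```

For the mean model the estimator reduces to the weighted average of the observations, so zero weights simply remove the pre-break data from the forecast.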


  • Innovations

    Example

    • DGP: 𝑦𝑡 = 3 − 2 ∙ 𝟏𝑡>80 + 2 ∙ 𝟏𝑡>120 + 𝜀𝑡 , where 𝑡 = 1, 2, … , 170 and 𝜀𝑡~𝑁(0, 1)

    • Model: 𝑦𝑡 = 𝜇 + 𝜂𝑡

    – Computation time: 2.35 sec


    Figure 3. Individual weights assigned to observations across time

  • Innovations

    • Exponentially weighted posf

    • Steps:

    1. Determine breakpoints

    2. Assign discrete weights to observations

    3. Shrink discrete weights towards equal or exponential weights

    • Use penalty for deviating from equally weighted observations

    Figure 4. Individual weights assigned to observations at T=120


  • 1. Determine breakpoints

    Known methods to identify breakpoints or outliers

    • CUSUM(SQ)

    • Chow break test

    • Quandt-Andrews Sup F test

    • Studentized residuals / DFBETAS / DFFITS

    – $y = X\beta + D_j\gamma + \varepsilon$,

    where $D_j$ is an (n × 1) indicator vector whose j-th element equals 1 and whose other elements equal 0

    Motivation for new method:

    • Determine multiple breakpoints

    • Applicable to various statistical models


  • Determine breakpoints

    Figure 5. Finding breakpoints at T=120

    Model: 𝑦𝑡 = 𝜇 + 𝜂𝑡

    DGP: 𝑦𝑡 = 3 − 2 ∙ 𝟏𝑡>80 + 2 ∙ 𝟏𝑡>120 + 𝜀𝑡, where 𝑡 = 1, 2, … , 170 and 𝜀𝑡~𝑁(0, 1)

  • 1. Determine breakpoints

    $S_t = \frac{1}{WIN} \sum_{p=T-WIN+1}^{T} \left| \hat{y}_p^{\neg t} - \hat{y}_p^{\neg t-1} \right|$

    • Largest values of 𝑆 are breakpoints

    – Contiguous high values of 𝑆 form a ‘breakperiod’

    – Quick way to find many candidate breakpoints in real-time

    • Combination of test for outlier identification (‘leave-one-out’) and analyzing influence of configurations on posf (Hoornweg & Franses, 2013)


  • 1. Determine breakpoints

    Alternative

    • Equally distribute breaks over treatment sample.

    • Adjust each breakpoint and select the adjustment that leads to the biggest increase in forecasting accuracy of the posf. Continue until no further improvement is made (an adaptation of the Patient Rule Induction Method, PRIM, algorithm).

    • Computation time: 7.61 seconds instead of 2.35.

    Figure 6. Adjusting equally distributed breakpoints at T=120


  • 2. Discrete weights

    1. Determine pMSFE of each period.

    – Periods with too few observations receive an average weight.

    2. Consider all possible combinations of leaving out periods.

    – Periods left in receive equal weights or inverse pMSFE weights

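    $w_t^{i} = \dfrac{\left( \frac{1}{v} \sum_{\tau=T-v+1}^{T} e_{\tau,i}^{2} \right)^{-1}}{\sum_{j=1}^{N} \left( \frac{1}{v} \sum_{\tau=T-v+1}^{T} e_{\tau,j}^{2} \right)^{-1}}$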

    3. Select discrete weights with highest accuracy of posf

    Figure 6. Assigning weights to periods at T=120
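The inverse pMSFE weights can be computed as in the sketch below. It assumes the last v pseudo out-of-sample forecast errors of each of the N periods are stored column by column in an array; the array layout and the function name are illustrative. How a period's weight is spread over the observations inside that period is not shown on the slide and is left out here.

```python
import numpy as np

def inverse_msfe_weights(errors):
    """Weights proportional to 1 / pMSFE.

    errors: array of shape (v, N) holding the last v forecast errors
            e_{tau,i} for each of the N periods (assumed layout).
    """
    errors = np.asarray(errors, dtype=float)
    pmsfe = np.mean(errors ** 2, axis=0)   # (1/v) * sum of squared errors per period
    inv = 1.0 / pmsfe
    return inv / inv.sum()                 # normalize over the N periods

# Two periods: the second has forecast errors twice as large as the first.
errors = np.array([[1.0, 2.0],
                   [-1.0, -2.0]])
weights = inverse_msfe_weights(errors)
print(weights)
```

Periods whose data produce poor pseudo out-of-sample forecasts are down-weighted rather than discarded outright.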


  • 3. Shrink

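    • $w_t^{EXP} = \frac{-\log(1 - t/T)}{T-1}$, for $t = 1, 2, \ldots, T-1$, and $w_T^{EXP} = \frac{\log(T)}{T-1}$ (PPP, p. 144)

    • $w_t^{EQUAL} = \frac{1}{T}$

    • $w_t^{shrink}(\varphi) = (1 - \varphi)\, w_t^{discrete} + \varphi\, w_t^{\{EXP,\,EQUAL\}}$, $\varphi \in \{0, 0.1, 0.2, \ldots, 1\}$

    • $RMSFE(\lambda) = MSFE_W + \lambda \cdot \dfrac{\sum_{i=1}^{T} \left| w_i - \frac{1}{T} \right|}{\sum_{j=1}^{T} \left| w_j^{min} - \frac{1}{T} \right|} \cdot MSFE_{EQUAL}$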

    Figure 7. Shrinking to exponential weights at T=120

    Figure 8. Shrinking to equal weights at T=120
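The shrinkage step can be sketched as follows. Two points are assumptions rather than facts from the slides: the deviation from equal weights is measured with absolute values, and the reference vector `w_min` (the weights entering the denominator of the penalty) is supplied by the caller because its definition is not shown. All function names are illustrative.

```python
import numpy as np

def exp_weights(T):
    """Exponential weights as given on the slide (PPP)."""
    t = np.arange(1, T)
    w = np.empty(T)
    w[:-1] = -np.log(1.0 - t / T) / (T - 1)
    w[-1] = np.log(T) / (T - 1)
    return w

def equal_weights(T):
    return np.full(T, 1.0 / T)

def shrink(w_discrete, w_target, phi):
    """Convex combination of discrete weights and a target (EXP or EQUAL)."""
    return (1.0 - phi) * np.asarray(w_discrete, float) + phi * np.asarray(w_target, float)

def penalized_msfe(msfe_w, msfe_equal, w, w_min, lam=0.5):
    """MSFE_W plus a penalty for deviating from equal weights (absolute deviations assumed)."""
    w = np.asarray(w, float)
    w_min = np.asarray(w_min, float)
    T = len(w)
    deviation = np.abs(w - 1.0 / T).sum()
    max_deviation = np.abs(w_min - 1.0 / T).sum()
    return msfe_w + lam * deviation / max_deviation * msfe_equal

T = 120
# Hypothetical discrete weights: drop t <= 80, weigh the remaining 40 observations equally.
w_disc = np.where(np.arange(1, T + 1) > 80, 1.0 / 40, 0.0)
for phi in np.linspace(0.0, 1.0, 11):
    w_phi = shrink(w_disc, equal_weights(T), phi)
    # MSFE values of 1.0 and 1.2 are placeholders, not results from the slides.
    print(round(phi, 1), penalized_msfe(1.0, 1.2, w_phi, w_disc))
```

For small T the exponential weights above do not sum exactly to one (for T = 2 they sum to 2 log 2 ≈ 1.39), so a normalization step may be needed; the slides do not say whether one is applied.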


  • HF-weight

    • Exponentially weighted posf

    • Steps:

    1. Determine breakpoints ($S_t$)

    2. Assign discrete weights to observations

    3. Shrink discrete weights towards equal or exponential weights

    • Ad hoc decisions:

    – posf:

    • Number of posf: 20

    • Weighting of posf: exponential

    – Maximum number of periods: 4

    – minOBS = 20

    • minimum number of observations for a period to get an individual weight

    • minimum number of observations in the treatment sample

    – 𝜆 = 0.5: penalty for deviating from equally weighted observations

    Figure 9. Individual weights assigned to observations at T=120

  • Simulation study

    • 𝑦𝑡 = 𝑋𝑡𝛽𝑡 + 𝜀𝑡, 𝜀𝑡~𝑁(0, 1), 𝑣𝑡~𝑁(0, 1), number of simulations: 1000

    • Score: % better (-) or worse (+) 𝑀𝑆𝐹𝐸 in comparison to $MSFE(\hat{y}_{EW})$

    DGP            Mean1   Mean2   Random walk         Regressor
    𝑋𝑡             1       1       𝑋𝑡−1 + 0.5 ∙ 𝑣𝑡     ~𝑁(0,1)
    𝛽1≤𝑡≤70        3       3       1                   3
    𝛽71≤𝑡≤120      4       5       1                   4
    𝛽121≤𝑡≤170     3       3       1                   3

    Model
    HF-Weight      -9      -36     -224                -35
    Exponential    -11     -43     -252                -40

    Discrete -14 -53 -27