
Forecasting values of commercial and residential property

using non-linear mathematical and statistical techniques

Chris Satchwell, Mandy Bradley

Technical Forecasts Ltd

Commercial House, 19 Station Road

Bognor Regis PO21 1QD

Phone / fax 01243-861110 / 861113

http://www.tfl.biz

The need for property forecasts

• Quantification of market direction gives best appraisal for most profitable asset

• Facilitates planning for acquisition or disposal further in advance of market turning points

• Saves valuable research time by presenting forecasts in easily analysable format

• Mass forecasting capability aids portfolio analysis by providing most current forecasts for all portfolio and candidate properties

Information in historical data

“Historical market performance is not a reliable indicator of future market behaviour”

Yet .. would anyone present disagree with the following statement:

“Historical data contains some information about future movement”

Relationships in other data series

• Leading indicators

  • Sought after, and implemented for many years, by property professionals

  • Traditionally strong leading indicators have shown diminished correlations over the past few years, and so a diminished ability to provide information

• Parallel series

  • Similar to leading indicators, with recognised mathematical relationships to the Target series

  • Improve the forecastability of a Target series

Overview Of Twin-Series Forecasting

Window of 'Target' Time Series Values

Window of 'Associated' Time Series Values

Choice of 'Associated' Series to Aid Forecast of Target Series

Multiple Models of Relationships between Target & Associated Series

Forecasts Derived From Models

Measuring associations between time series

• Form a 1D histogram of ‘target’ time series values

• The data’s disorder (or entropy) is found by summing -p.ln(p), where ‘p’ is the probability of a ‘bin’ of the histogram. Call this E1.

• Form a 2D histogram of values of both target and associated series and sum -p.ln(p) for each 2D ‘bin’. Call this E2.

• The REDUCTION in disorder from including the new series is (E1 - E2).

• The greater the reduction in disorder (Mutual Information), the stronger the association between the series.
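The histogram steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the textbook definition of mutual information, I = H(target) + H(associated) - H(joint), which matches the "reduction in disorder" of the target if E2 is read as the joint entropy less the entropy of the associated series. That reading, the bin count, the function names and the synthetic series are all assumptions.

```python
import numpy as np

def entropy(counts):
    """Disorder -sum p.ln(p) of a histogram given as bin counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(target, associated, bins=8):
    """Histogram estimate (in nats) of the information two series share."""
    joint, _, _ = np.histogram2d(target, associated, bins=bins)
    h_target = entropy(joint.sum(axis=1))   # 1D histogram of the target series
    h_assoc = entropy(joint.sum(axis=0))    # 1D histogram of the associated series
    h_joint = entropy(joint.ravel())        # 2D histogram of both series
    return h_target + h_assoc - h_joint

# Synthetic example: one series that tracks the target, one that does not.
rng = np.random.default_rng(0)
target = rng.normal(size=2000)
related = target + 0.3 * rng.normal(size=2000)
unrelated = rng.normal(size=2000)
mi_related = mutual_information(target, related)
mi_unrelated = mutual_information(target, unrelated)
```

The series that tracks the target should score well above the unrelated one, which is the basis for ranking candidate associated series.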

Design issues

Requirements:

1. Capable of generating thousands of forecasts every month from committees of non-linear models

2. Robust

3. As accurate as possible

4. Models need to be complexity-optimised to avoid instabilities

5. Ideally, models should be uncorrelated

6. Capable of getting the best results possible from limited data

Possible network solutions

1. MLPs (multi-layer perceptrons): accurate but quirky; they do not lend themselves to automation.

2. RBFs (radial basis function networks): less forgiving of irrelevant inputs, but can be made accurate and robust. These are used. Unsupervised clustering gives the centres of the RBFs.
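One way unsupervised clustering can supply RBF centres is sketched below. The slides do not say which clustering algorithm is used, so plain k-means on one-dimensional inputs is an assumption here, as are the function name and the quantile-based starting centres.

```python
import numpy as np

def kmeans_centres(x, k, iterations=20):
    """Plain k-means on 1D inputs; returns k cluster centres."""
    # Start with centres spread evenly through the data (an assumed choice).
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        # Move each centre to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean()
    return centres

x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
centres = kmeans_centres(x, k=2)
```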

Possible Complexity Optimisation Solutions

1. Cross Validation: difficult with little data, and difficult to automate interpretation of Error v. Complexity graphs

2. MAP: requires multi-dimensional integration capabilities that limit the dimensionality of the problem to which it can be applied. Not robust.

3. Evidence Approximation: issues on robustness, but this was used.

What they don’t tell you about the Evidence Approximation

• Implicit assumption is that a multi-dimensional surface can be fitted through data, such that the likelihood of any ‘noise’ data decays as you move away from the surface.

• It is a technique for multi-dimensional signal extraction from noisy data.

• If the data does not comply with these assumptions, or is pure noise, the method may fail.

• It finds an appropriate amount of regularisation to generalise an over-complicated model, and will not work if the initial model is too simple.

• It is easier to apply to RBFs than to MLPs.

• If it fails, it is probably an indication that the data cannot be sensibly modelled, which is useful to know.

What they do tell you about the Evidence Approximation

• Of all the possible weight-dependent models that could describe the data, a set of weights exists (w_MP) that produces a unique maximum for the probability of the model correctly representing the data.

• As the values of the weights diverge from w_MP, the probability of the model being correct decays.

• The two previous points imply that we expect the variation of the probability of a model being correct with the weights to be expressible as a multi-dimensional Gaussian surface.

• When the maths is worked through (Bishop Ch. 10), this is equivalent to adding a ‘sum of weights squared’ term to a least squares error function.

• A minimum of this function gives the weights that maximise the chances of the model being correct.

Basic formulae

$y = \sum_j w_j \phi_j(x)$ (1) Formula for RBF

$E_D = 0.5 \sum_n \left( t_n - \sum_j w_j \phi_j(x_n) \right)^2$ (2) L/S Err. Func.

$E_W = 0.5 \sum_j w_j^2$ (3) Wt. Comp. of Err. Func.

$\min_w \left[ \alpha E_W + \beta E_D \right]$ (4) Err. Func. to be minimised
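Formulae (1) to (4) can be illustrated with a small sketch. Gaussian basis functions, one-dimensional inputs, the width, the values of alpha and beta, and all function names are assumptions chosen for illustration; the slides do not specify them.

```python
import numpy as np

def design_matrix(x, centres, width):
    """phi_j(x_n): Gaussian basis functions, one row per input, one column per centre."""
    d2 = (x[:, None] - centres[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_weights(phi, t, alpha, beta):
    """Weights minimising alpha*E_W + beta*E_D, as in formulae (2) to (4)."""
    a = beta * phi.T @ phi + alpha * np.eye(phi.shape[1])
    return np.linalg.solve(a, beta * phi.T @ t)

x = np.linspace(0.0, 1.0, 40)
t = np.sin(2.0 * np.pi * x)                    # noise-free toy target
centres = np.linspace(0.0, 1.0, 10)
phi = design_matrix(x, centres, width=0.1)
w = fit_weights(phi, t, alpha=1e-3, beta=1.0)
y = phi @ w                                    # formula (1)
rms = float(np.sqrt(np.mean((y - t) ** 2)))
# Gradient of alpha*E_W + beta*E_D with respect to w; it vanishes at the minimum.
grad = 1e-3 * w - 1.0 * phi.T @ (t - y)
```

Setting the gradient of (4) to zero gives a linear system in the weights, which is why a single call to a linear solver suffices for fixed alpha and beta.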

Solution formulae

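$\left[ \sum_n \phi_i(x_n) \phi_j(x_n) \right] \{w\} = \left\{ \sum_n \phi_i(x_n) t_n \right\}$ (5) Soln. to (1)

$[H] \{w\} = \left\{ \sum_n \phi_i(x_n) t_n \right\}$ (6) ..

$\lambda_i$ (7) Eigenvalues of [H]

$W$ (8) Number of Weights

$2 \alpha E_W^{MP} = W - \sum_i \alpha / (\lambda_i + \alpha)$ (9) Condition for most probable model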
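A hedged sketch of how condition (9) can be reached in practice: alternate between solving for the weights and re-estimating alpha and beta, in the style of the re-estimation formulae in Bishop Ch. 10. The eigenvalues are taken here to be those of beta times the matrix in (5), and the data, basis width, starting values and iteration count are assumptions; this is not presented as the authors' implementation.

```python
import numpy as np

def evidence_fit(phi, t, alpha=1.0, beta=1.0, iterations=200):
    """Alternate weight fitting with re-estimation of alpha and beta."""
    n_points, n_weights = phi.shape
    gram = phi.T @ phi
    # Clip guards against tiny negative eigenvalues caused by round-off.
    eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    for _ in range(iterations):
        w = np.linalg.solve(beta * gram + alpha * np.eye(n_weights),
                            beta * phi.T @ t)
        lam = beta * eig                             # assumed scaling of the eigenvalues
        gamma = float(np.sum(lam / (lam + alpha)))   # equals W - sum_i alpha/(lam_i + alpha)
        e_w = 0.5 * float(w @ w)
        e_d = 0.5 * float(np.sum((t - phi @ w) ** 2))
        alpha = gamma / (2.0 * e_w)                  # at a fixed point, 2*alpha*E_W = gamma
        beta = (n_points - gamma) / (2.0 * e_d)
    return w, alpha, beta, gamma

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 60)
t = np.sin(2.0 * np.pi * x) + 0.2 * rng.normal(size=x.size)
centres = np.linspace(0.0, 1.0, 12)
phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * 0.1 ** 2))
w, alpha, beta, gamma = evidence_fit(phi, t)
```

Here gamma plays the role of the right-hand side of (9): the effective number of well-determined weights, which always lies between 0 and W.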

Solutions

• The end product of this process (3,000,000 RBFs/month) is either (1) a failure to achieve an answer or (2) a set of weights for the most probable RBF model fitting the data.

• In the event of a failure, it is possible to reduce the width of receptive fields or increase the number of basis functions to try to achieve success. In extremes, success is achieved with ‘spiky’ basis functions that probably offer a worse solution than one arrived at by a combination of eye and cross validation, but which is too simple to allow an evidence approximation solution.

• Overall conclusion: do not assume a solution works better in practice just because it has been derived using Bayesian methods.

Committee issues

• Models with uncorrelated errors can be combined to produce an overall expected squared error inversely proportional to their number.

• In practice, most models are correlated.

• We use models with different inputs in an attempt to reduce correlations between model errors.

• We have experimented with the covariance method and quadratic programming ($\min_z | z^T C z |$ s.t. $\sum_i z_i = 1$ and $0 \le z_i \le 1$), but currently use straight averaging of model outputs for our forecasts.
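The effect of error correlation on a committee can be shown with a few lines of arithmetic. In this sketch C is the covariance matrix of the model errors and z the committee weights, as in the quadratic programme above; the example matrices and function names are invented for illustration, and the covariance weights are computed without the bounds 0 <= z_i <= 1.

```python
import numpy as np

def committee_error(cov, z):
    """Expected squared error z^T C z of a committee with weights z."""
    return float(z @ cov @ z)

def covariance_weights(cov):
    """Weights minimising z^T C z subject only to sum(z) = 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

n_models = 4
average = np.full(n_models, 1.0 / n_models)
uncorrelated = np.eye(n_models)                                          # unit variance, no correlation
correlated = 0.2 * np.eye(n_models) + 0.8 * np.ones((n_models, n_models))  # correlation 0.8

err_uncorrelated = committee_error(uncorrelated, average)   # 1/N of a single model's error
err_correlated = committee_error(correlated, average)

# Two uncorrelated models of unequal quality: weighting beats straight averaging.
unequal = np.diag([1.0, 4.0])
z = covariance_weights(unequal)
err_weighted = committee_error(unequal, z)
err_averaged = committee_error(unequal, np.array([0.5, 0.5]))
```

With four uncorrelated unit-variance models the averaged error falls to a quarter; with a correlation of 0.8 it only falls to 0.85, which is why reducing correlation between model errors matters.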

Forecasting issues

• The basis of a forecast is to sense if a relationship exists (Mutual Information), model it (RBF/Evidence Approximation), assume it continues into the future, and use it to generate results.

• Where the relationships are strong and consistent, the answers tend to be good. Where they are weak or inconsistent, they may not be so good.

• This means that quality can never be guaranteed. What can be offered is the ability to see how well the method would have performed had it been used on historic data, producing a forecast that can be compared with more recent data.

The Forecasting Process

Forecast each of 200 (approx.) econometric and social series one step ahead

Far enough ahead?

n: Store forecasts

y: Forecast each of 30,000 (approx.) postcode sector house price series, using econometric & social forecasts as associated series
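The loop in this flowchart, repeating a one-step forecast until the horizon is reached, can be sketched as below. The one-step model here is a deliberately trivial drift rule standing in for the RBF committees; it and both function names are hypothetical.

```python
def drift_one_step(history):
    """Hypothetical stand-in for a one-step model: last value plus average past change."""
    changes = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + sum(changes) / len(changes)

def forecast_ahead(history, steps, one_step=drift_one_step):
    """Repeat a one-step forecast, storing each result and feeding it back as input."""
    extended = list(history)
    stored = []
    while len(stored) < steps:      # 'Far enough ahead?'
        nxt = one_step(extended)
        stored.append(nxt)          # 'Store forecasts'
        extended.append(nxt)
    return stored

forecasts = forecast_ahead([100.0, 102.0, 104.0], steps=3)
```

Each stored forecast becomes an input to the next step, so errors can compound as the horizon lengthens.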

Summary of forecasting

1. Determine the relationship between target series (eg Land Registry house price data) and economic and financial indicators (eg employment rates, construction indices, lending rates…)

2. Pick out the most significant series that share information with the target series

3. Forecast the target series alongside each of the parallel series

4. Fuse all this data to produce a single forecast that has the highest probability of replicating future movement

About Land Registry data

• Postcode: EX4 4QJ

• Postcode area: EX

  • 104 in England & Wales

• Postcode district: EX4

  • Around 2500 in England & Wales

  • Average around 20000 addresses in each

• Postcode sector: EX4 4

  • Average around 3000 addresses in each

  • but varying from under 500 to over 8000
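As a small illustration of the three levels, the sketch below splits a full postcode into area, district and sector. It assumes the postcode is written with a single space between its two halves; the function name is invented.

```python
from itertools import takewhile

def postcode_levels(postcode):
    """Split a full postcode into (area, district, sector)."""
    outward, inward = postcode.upper().split()
    area = "".join(takewhile(str.isalpha, outward))   # leading letters, e.g. 'EX'
    district = outward                                # e.g. 'EX4'
    sector = f"{outward} {inward[0]}"                 # e.g. 'EX4 4'
    return area, district, sector

levels = postcode_levels("EX4 4QJ")
```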

• Detached

• Semi-detached

• Terraced

• Flats/Maisonettes

… at least 3 sales per type per quarter

Building the data set

LandReg data gives 1 quarter’s average prices to postcode sector level for each property type

… SO just add it to the previous quarter’s data!?

• Historic data updates

• Missing data

• Errors in data

• New postcodes / old postcodes

• … and so on

What sort of accuracy does this deliver?

Average deviation from actual, Sep00-Sep02

                        Most forecastable districts   Most forecastable sectors
Detached houses         -8.8%                         -7.9%
Semi-detached houses    -11.1%                        -10.5%
Terraced houses         -10.6%                        -9.9%
Flats / Maisonettes     -8.7%                         -8.3%

What sort of accuracy does this deliver?

Within 15% of LandReg actual, Sep00-Sep02

                        Most forecastable districts   Most forecastable sectors
Detached houses         77%                           76%
Semi-detached houses    62%                           69%
Terraced houses         68%                           73%
Flats / Maisonettes     75%                           72%
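The two accuracy measures in these tables could be computed along the following lines. How the authors actually calculated their figures is not stated, so the sign convention (forecast minus actual, relative to actual), the function name and the sample numbers are assumptions.

```python
def accuracy_summary(forecast, actual, tolerance=0.15):
    """Return (mean relative deviation, share of forecasts within the tolerance)."""
    deviations = [(f - a) / a for f, a in zip(forecast, actual)]
    mean_deviation = sum(deviations) / len(deviations)
    share_within = sum(abs(d) <= tolerance for d in deviations) / len(deviations)
    return mean_deviation, share_within

# Invented figures: four forecasts against an actual value of 100 each.
mean_deviation, share_within = accuracy_summary(
    [90.0, 100.0, 120.0, 80.0], [100.0, 100.0, 100.0, 100.0])
```

Under this convention a negative mean deviation means the forecasts were, on average, below the actual values.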

Assessing accuracy

Residential data is sparse, and often highly volatile, e.g. detached houses in London N6:

[Chart: quarterly values, Dec-99 to Dec-02, for postcode sectors N6 4 and N6 6; vertical axis 0 to 3,500,000]

Accuracy at a point in time?

• Accuracy at one specific period may be misleading as an overall measure

[Chart: series N6 4, N6 6 and N6 4 ma; vertical axis 0 to 3,500,000]

Volatility measure, where

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (u_i - \bar{u})^2}$ (SD of avge hse price) and

$u_i = \ln\left( \frac{S_i}{S_{i-1}} \right)$ (ln(return on hse value))
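The volatility measure can be computed directly from a price series. A minimal sketch, assuming the standard form of the formula (sample standard deviation of log returns) and using invented prices:

```python
import math

def volatility(prices):
    """Sample standard deviation of log returns u_i = ln(S_i / S_(i-1))."""
    u = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(u)
    mean_u = sum(u) / n
    return math.sqrt(sum((x - mean_u) ** 2 for x in u) / (n - 1))

steady = volatility([100.0, 110.0, 121.0, 133.1])         # constant 10% growth
choppy = volatility([100.0, 110.0, 100.0, 110.0, 100.0])  # alternating rises and falls
```

A series growing at a constant rate has essentially zero volatility on this measure, however fast it grows; only variation in the returns counts.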

Combined approach to forecastability

• RMS error, where = /

• If >> 1, Model may be too simple for data

• If << 1, Model might be trying to model noise

• …. except where << 1

Historic forecastability classification

• IF ( <0.1) AND ((0.9< <1.1) OR ( <0.05)) THEN = A

• IF ( <0.2) AND ((0.8< <1.25) OR ( <0.08)) THEN = B

• IF ( <0.35) AND ((0.6< <1.5) OR ( <0.12)) THEN = C

• ELSE = U

A (genuine) volatile property series:

… where next, for next 3 years?

[Chart: single series, Feb-87 to Feb-99; vertical axis 60 to 120]

Same series, with associated series:

Any easier?

[Chart: Feb-87 to Feb-99, index base '87; vertical axis -50 to 200; series GDQL, BCIS, FBYN, GMPag, CKYW, FTAP]

How TFL managed:

[Chart: Actual index and Forecast from Mar 99, Jan-98 to Jan-02; vertical axis 70 to 100]

How TFL managed:

[Chart: Actual index, Forecast from Mar 99 and Forecast shifted by 10, Jan-98 to Jan-02; vertical axis 70 to 100]

Semi-detached houses in GU2 – forecasts vs actual from Dec99-Dec02

[Chart: GU2 forecast and GU2 act, quarterly Dec-99 to Dec-02; vertical axis 90 to 135]

Terraced houses in UB6 - forecast vs actual, Dec99 to Dec02

[Chart: TFL forecast and Actual vals, quarterly Dec-99 to Dec-02; vertical axis 90 to 160]

Central London office rental values forecast vs. actual from March 1999

[Chart: TFL forecast and Actual; vertical axis 100 to 160]

Forecast from IPD’s RegionalPages series

Uses to date include:

1. Investment decisions involving properties

2. Newspapers wanting content

3. Web sites seeking to increase ‘Stickiness’.

4. Future crime rates (for a police force)

5. Government decisions involving land, acquisition of computer/office equipment & other sundries that need forecasting.

…Questions??


…. Yes, Dr Nabney???