LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time...

10
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/326838388 LSTM network time series predicts high-risk tenants Presentation · July 2018 DOI: 10.13140/RG.2.2.19849.95849 CITATIONS 0 READS 5 6 authors, including: Some of the authors of this publication are also working on these related projects: One Small Step for Machine, One Giant Leap for the Market: Lessons Learnt from a Real World Machine Learning Project View project Bus Service Optimisations View project Wolfgang Garn University of Surrey 17 PUBLICATIONS 22 CITATIONS SEE PROFILE All content following this page was uploaded by Wolfgang Garn on 06 August 2018. The user has requested enhancement of the downloaded file.

Transcript of LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time...

Page 1: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/326838388

LSTM network time series predicts high-risk tenants

Presentation · July 2018

DOI: 10.13140/RG.2.2.19849.95849

CITATIONS

0READS

5

6 authors, including:

Some of the authors of this publication are also working on these related projects:

One Small Step for Machine, One Giant Leap for the Market: Lessons Learnt from a Real World Machine Learning Project View project

Bus Service Optimisations View project

Wolfgang Garn

University of Surrey

17 PUBLICATIONS   22 CITATIONS   

SEE PROFILE

All content following this page was uploaded by Wolfgang Garn on 06 August 2018.

The user has requested enhancement of the downloaded file.

Page 2: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

1

LSTM network time series

predicts high-risk tenants

WOLFGANG GARN, Y IN HU, PAUL NICHOLSON, BEVAN JONES,

HONGYING TANG, WILLIAM WRIGHT

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

OverviewBenefits to the real estate sector decreasing lost revenue, increasing efficiency providing more help to tenants before their debt becomes

unmanageable

How? Measure arrears risk for each individual tenant differentiate between short-term and long-term arrears risk predict trajectory of arrears Identify factors that can be operationally used to assist tenants

arrears management

Structure Background Data - arrears Risk & Factors Method - LSTM – network Results & Insights

AbstractIn the United Kingdom, local councils and housingassociations provide social housing at secure, low-renthousing options to those most in need. Occasionally manytenants have difficulties in paying their rent on time and fallinto arrears. The lost revenue can cause substantial financialburden for these agents, while falling into arrears can causestress to tenants. An efficient arrears management scheme isto target those who are more at risk of falling into long-termarrears so that interventions can be used for persistent loss ofrevenue. In our research, a Long Short-Term Memory Network(LSTM) based time series prediction model is implemented todifferentiate the high-risk tenants from relatively temporaryones. This model measures the arrears risk to differentiatebetween short-term and long-term arrears risk and predictsthe trajectory of arrears for each individual tenant. Moreover,further arrears analysis is conducted to investigate whichfactors could provide more assistance for tenants before theirdebt becomes unmanageable. A five-year rent arrears datasetis used to train and evaluate the proposed model. The rootmean squared error (RMSE) is used to punish large errors bymeasuring of the differences between the arrears actuallyobserved and arrears predicted by a model. Hence, this modelbrings benefits to the real estate sector by decreasing lostrevenue, increasing efficiency and providing more help totenants before their debt becomes unmanageable.

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

BackgroundHastoe Housing Association Creating opportunities to innovate

Sustainable Homes Ltd Influencing policy and practice

University of Surrey - Computing and Business School

The associate

Innovate UK funding for 3 years

Knowledge Transfer Partnership

Page 3: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

2

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Content

BackgroundData – rent arrears & factorsMethod - LSTM – network Results & Insights

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Data sourcesHeterogeneous sets Excel maps, databases, …

Mosaic UK Geodemographic classification of households

Geographic's of vacant homes Deprived areas more likely to be in rent arrears

Finances The rich, the poor and the rest

Rent arrears (2555 tenants)

Data: five years John Naisbitt wrote “We are drowning in information but starved for knowledge.”

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Scatter plots & Correlations

Correlation analysis of housing beneficiaries does not support associations between factors

Factors◦ Energy Efficiency (ree)

◦ Housing Benefits (rhb)

◦ Rent in arrears (rma)

ree -0.014 -0.004

-0.014 rhb -0.122

-0.004 -0.122 rma

r values

10/07/2018 6

SA

P r

ating

Housin

g b

enefit

Month

in a

rrears

SAP rating Housing benefit Month in arrears

0.00

0.02

0.04

0.06

Corr:

-0.0139

Corr:

-0.00416

0

50

100

150

200

Corr:

-0.122

-10

0

10

40 60 80 0 50 100 150 200 -10 0 10

Page 4: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

3

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Indirect factorsBenefits >> Energy efficiency >> rent arrears

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Rent arrears & Energy efficiencyA detour … the whole housing market

SAP = Standard Assessment Procedure for energy efficiency

Surprisingly much rent arrears

Lowest SAP rating –> highest rent arrears

Highest SAP rating -> lowest rent arrears

Other SAP ratings debatable

F E D C B A

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Chase for featuresRent arrears and average SAP rating

Average at a certain level>>

potential for systematic feature engineering?

From a subject matter expert’s view:Rent arrears drops as the SAP rating increases

s

Page 5: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

4

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Averages are better!?Correlations become visible

More importantly patterns become visibleFeature engineering “can start”

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Rent arrears & SAP “again”The better the property the less rent arrears

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Feature list78 features were considered

Feature List (ranked by importance to management cost)

SAP 2015 rating

Year of Built

Max Occupants

Number of Bedrooms

Urban (Urban to Rural)

Average of Outdoors Sub-domain Decile (where 1 is most deprived 10% of LSOAs)

Bedrooms (Many to Few)

Environment (Care to Don't Care)

Age (Young to Old)

Technology (Adopt to Reject)

Residency (Long to Short)

Facebook Usage (High to Low)

Income (High to Low)

Children (High to Low)

Smartphone (Yes to No)

Rented (High to Low)

Property (???? to ?)

Mortgage Debt (High to Low)

Twitter Usage (High to Low)

Health (Good to Bad)

Page 6: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

5

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Content

BackgroundData - arrears Risk & FactorsMethod - LSTM – network Results & Insights

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

LSTM network - intro

Source: Cloah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

memory adaptation

forg

et

Forget gate

input gate

candidates

New “memory” cell state

input

output

output

memory

output gate

outputLong Short-Term Memory (see Hochreiter & Schmidhuber, 1997)

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Content

BackgroundData - arrears Risk & FactorsMethod - LSTM – network Results & Insights

Page 7: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

6

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Arrears prediction

Time-series forecasting

85.4% accuracy over 18mths

Tenant Clusters – for policies

Towards automated processes

Human intervention where most needed

Time [month]

Arr

ears

[m

on

th] f

or

ten

ant

x

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Quality - Confusion matrixMajority of arrears and non-arrears are identified correctly.

Assuming difference between actual arrears and predicted arrears <£50 is acceptable for certain month

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Arrears per clusterSix types of tenants were identified

All levels are “pretty” stable Four distinctive levels of avg. rent arrears

Three types of tenants with similar levels oscillate around the zero level

One type has “negative” rent arrears i.e. pay rent in advance

~£400/month in arrears tenants Tenant’s benefit: comparable to a £400 free credit

Association’s cost: £146.8k x 30% = £44.0k per year (for cluster) 30% = administrative cost related to reminders & help strategies

~£1,000/month in arrears tenants Tenant’s benefit: comparable to a £1,000 free credit

Association’s cost: £159k x 30% = £47.4k per year (for cluster)Time [month]av

g. r

ent

arre

ars

per

clu

ster

Page 8: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

7

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Arrears – regional distributionGood predictions for east & west

Prediction for south “interesting”

Geography matters ->

model can be improved using this

East (201x) arrears: 34.8%• Actual - Predicted: 4.93%

South (201x) arrears: 35.7%• Actual - Predicted: -23.8%

West (201x) arrears: 29.5%• Actual – Predicted: 4.37%

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Results & Insights - summaryBenefits to the real estate sector decreasing lost revenue,

“initial customer selection”, Tackle lowest SAP properties

increasing efficiency “credit” acceptance rather than “chasing” (or automatization)

providing more help to tenants before their debt becomes unmanageable

How? Measure arrears risk for each individual tenant

85.4% accuracy

differentiate between short-term and long-term arrears risk

predict trajectory of arrears

Identify factors that can be operationally used to assist tenants arrears management £rent arrears, rel. rent arrears, region, SAP rating, benefits

Time [month]Arr

ears

[m

on

th]

for

ten

ant

x

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

AppendixoBlogs: Sustainable Homes (Hastoe) - KTP project - machine learning techniques to gain insights into the sustainability of homes o The rise of the machines - learning for environmental good and cost savings (Blog, May 24, 2017)

o Are we “boiled” for choice? (Blog, August 3, 2017)

o AI detects roof status using Drones (BA MSc dissertation summary, October 10, 2017)

oHastoe Analytics, OR & Data Science applications

oNeural Networks

oReal estate valuation using regression models and artificial neural networks: An applied study in Thessaloniki by Alexios Georgiadis

Page 9: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

8

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Five years – long term

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Machine Learning in HastoeAn overview

Machine Learning

in Hastoe

Improve service

Reduce cost

Optimise decision

Increase income

Arrear collection

Warning prediction

Investment decision

Arrear collection

startgy

Trickle sales

Proactive and targeted

service

Tenant classification

Repairs analysis

Component

replacement prediction

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

Data Science in HastoeOverview

Page 10: LSTM network time series predicts high-risk tenantsepubs.surrey.ac.uk/849291/1/LSTM network time series predicts hig… · LSTM network time series predicts high-risk tenants Wolfgang

10/07/2018

9

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

http://www.asimovinstitute.org/neural-network-zoo/

LSTM network time series predicts high-risk tenantsWolfgang Garn, Yin Hu, Paul Nicholson, Bevan Jones, Hongying Tang

View publication statsView publication stats