Network Complexity and Spatio-Temporal Data …data is the most difficult challenge in...

Post on 22-Jul-2020



Network Complexity and Spatio-Temporal Data Mining (STDM)

Dr Tao Cheng + STANDARD team {tao.cheng@ucl.ac.uk} Senior Lecturer in GeoInformatics Department of Civil, Environmental and Geomatic Engineering (CEGE) University College London

Outline

• Nature of network complexity
• Its challenges for STDM
• Case studies from the STANDARD project
• Future directions for NC and STDM

Challenges - Network Complexity
1) Heterogeneity (structure & performance)
- nonlinearity
- nonstationarity (MAUP problem in GIS)

Great progress has been made in describing the structure of ‘what is’ (e.g. power laws), but how can nonlinear and nonstationary performance be modelled and predicted?

Challenges - Network Complexity
2) Dynamics
- changes in physical structure (nodes & links), with implications for supply/capacity
- changes in movement patterns on the network (density/flow/speed; behaviour), which lead to changes in demand

Much progress has been made in modelling supply-demand interactions at the macroscopic level, but:
- there is a lack of clarity about the implications for individual behaviours and their collective effects;
- there are no readily available tools to demonstrate or capture the transition from free flow to congestion.

Challenges - Network Complexity
3) Interactions & Associations
- spatial (upstream/downstream)
- temporal (past/present/future)
- spatio-temporal
- multiple factors (incidents, weather, big events, ...)
- multiple networks

We accommodate spatial or temporal associations (autocorrelations), but:
- fail to treat spatio-temporal autocorrelation in an integrated, simultaneous way;
- fail to consider multiple networks.
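One established way to measure spatial and temporal dependence jointly, rather than one at a time, is the space-time autocorrelation function (ST-ACF) of Pfeifer and Deutsch, which correlates the spatially lagged series at time t with the series itself at time t + s. The sketch below is a simplified first-order version in plain Python; the weight matrix `W` (for example built from upstream/downstream road links), the data layout and the function names are assumptions for illustration, not the STANDARD implementation.

```python
from math import sqrt


def spatial_lag(W, z_t):
    """Weighted average of each location's neighbours (W is row-standardised)."""
    return [sum(w * z for w, z in zip(row, z_t)) for row in W]


def st_acf(Z, W, s):
    """Sample space-time autocorrelation between the first-order spatial lag
    at time t and the series itself at time t + s.

    Z: list of T time steps, each a list of N values (one per location).
    W: N x N row-standardised spatial weight matrix (hypothetical input).
    s: temporal lag, 0 <= s < T.
    """
    T, N = len(Z), len(Z[0])
    # Centre on the global mean so the products measure co-variation.
    mean = sum(map(sum, Z)) / (T * N)
    Zc = [[z - mean for z in row] for row in Z]
    lagged = [spatial_lag(W, row) for row in Zc]

    # Cross-covariance: neighbours at time t against each location at t + s.
    cross = sum(lagged[t][i] * Zc[t + s][i]
                for t in range(T - s) for i in range(N)) / (N * (T - s))
    # Variances of the spatially lagged series and of the series itself.
    var_lag = sum(v * v for row in lagged for v in row) / (N * T)
    var_z = sum(v * v for row in Zc for v in row) / (N * T)
    return cross / sqrt(var_lag * var_z)
```

Evaluating `st_acf` for s = 0, 1, 2, ... gives a space-time correlogram, which can guide the choice of spatial and temporal orders in a STARIMA-type model.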

Research Frontiers in Network Complexity
1) Forecasting and prediction
- nonlinearity & nonstationarity
2) Tools to capture/illustrate the processes
- emergence and tipping points
- simulating behaviour (macroscopic properties change because of accumulated microscopic changes)
3) Spatio-temporal dependence and interactions
- impact of activities on the network
- interactions between networks

Big Data – empirical theory and testing

•  Short-term and long-term journey time prediction –  STARIMA; ANN; Kernel-based approach

•  Early detection of traffic congestion –  clustering: STC; STSS

•  Interactive visualization of journey time reliability and traffic congestion –  2D (hotspot); 3D (wall map; isosurface)

•  Simulation of non-recurrent congestion –  Agent-based simulation

•  Intervention Analysis (weather, tube strike, road works) –  regression
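Intervention analysis by regression can be sketched with an indicator (dummy) variable that is 1 while the intervention, e.g. a tube strike, is in force and 0 otherwise; its coefficient estimates the shift in journey time. This is a minimal ordinary-least-squares sketch with a single hypothetical indicator and a hypothetical function name; a fuller analysis would add covariates such as weather and account for autocorrelated errors.

```python
def fit_intervention(journey_times, intervention):
    """OLS fit of y = a + b * d, where d is a 0/1 intervention indicator.

    Returns (a, b): a is the baseline journey time and b the estimated
    shift while the intervention is in force.
    """
    n = len(journey_times)
    if n != len(intervention):
        raise ValueError("series must have equal length")
    d_bar = sum(intervention) / n
    y_bar = sum(journey_times) / n
    sxy = sum((d - d_bar) * (y - y_bar) for d, y in zip(intervention, journey_times))
    sxx = sum((d - d_bar) ** 2 for d in intervention)
    if sxx == 0:
        raise ValueError("indicator must contain both 0s and 1s")
    b = sxy / sxx
    a = y_bar - b * d_bar
    return a, b
```

With a single binary regressor, `b` equals the difference between the mean journey time during and outside the intervention.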

STANDARD – Spatio-Temporal Analysis of Network Data and Route Dynamics: understanding traffic congestion in space-time

Space-time prediction & forecasting: the challenge lies in the non-stationarity (heterogeneity) and non-linearity of space-time data.

Statistical Approaches
•  STARIMA models
•  space-time geostatistical models
•  spatial panel data models
•  space-time GWR

How to calibrate the spatio-temporal autocorrelations is the bottleneck.
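STARIMA models extend ARIMA by adding spatially lagged terms through a weight matrix W. As a minimal sketch, assuming mean-removed (or differenced) data and a purely autoregressive model with one temporal and one spatial lag, z_i(t) = φ10 z_i(t-1) + φ11 [W z(t-1)]_i + e_i(t), the two coefficients can be estimated by pooled least squares. The function names are hypothetical; full STARIMA calibration also involves moving-average terms and order selection.

```python
def spatial_lag(W, z_t):
    """Weighted average of each location's neighbours (W is row-standardised)."""
    return [sum(w * z for w, z in zip(row, z_t)) for row in W]


def fit_star11(Z, W):
    """Pooled least-squares estimate of (phi10, phi11) in
    z_i(t) = phi10 * z_i(t-1) + phi11 * [W z(t-1)]_i + e_i(t).

    Z: list of T time steps, each a list of N mean-removed values.
    W: N x N row-standardised spatial weight matrix.
    """
    x1, x2, y = [], [], []
    for t in range(1, len(Z)):
        x1.extend(Z[t - 1])                  # temporal lag
        x2.extend(spatial_lag(W, Z[t - 1]))  # space-time lag
        y.extend(Z[t])
    # Normal equations for two regressors and no intercept.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("temporal and spatial lags are collinear")
    phi10 = (s22 * s1y - s12 * s2y) / det
    phi11 = (s11 * s2y - s12 * s1y) / det
    return phi10, phi11


def forecast_star11(z_now, W, phi10, phi11):
    """One-step-ahead forecast for every location."""
    lag = spatial_lag(W, z_now)
    return [phi10 * z + phi11 * l for z, l in zip(z_now, lag)]
```

A single pair of coefficients shared by all locations is exactly the global, stationary assumption that the slides flag as problematic for heterogeneous networks.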

Machine Learning Approaches
•  artificial neural networks (ANNs)
•  self-organized maps
•  genetic algorithms
•  support vector machines (SVMs)
•  kernel-based approach

The interpretability of machine learning models is low.
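The kernel-based approach can be illustrated with kernel ridge regression, one common kernel method; the slides do not say which kernel method was used, so this choice is an assumption. Each input could be a vector of recent travel times on a link and its neighbours (hypothetical features), and a Gaussian (RBF) kernel captures nonlinear relationships. It is written in plain Python, with a small linear solver, so that it is self-contained.

```python
from math import exp


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X, y, gamma=0.5, lam=1e-3):
    """Kernel ridge regression: solve (K + lam * I) alpha = y and
    return a predictor f(x) = sum_i alpha_i * k(x_i, x)."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)

    def predict(x_new):
        return sum(a * rbf(x, x_new, gamma) for a, x in zip(alpha, X))

    return predict
```

The fitted model is a weight per training example rather than a set of named coefficients, which illustrates why the interpretability of such methods is low.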

Real-time traffic forecasting


James Haworth  


Results – Root mean squared error (seconds/kilometre)

Interval     Naïve   ARIMA   STARIMA   LSTARIMA
5 minutes    49.4    47.4    55.9      46
15 minutes   74.7    68.7    89.1      67.3
30 minutes   93.2    82.1    109       80

James Haworth & Jaiqiu Wang: Space-Time Modelling and Prediction  
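The table reports root mean squared error (RMSE). A minimal sketch of the metric and of a persistence forecast follows; it assumes that "Naïve" means "the next value equals the last observed value", which the slides do not define, and the function names are hypothetical. With 5-minute data, a 15-minute horizon corresponds to `horizon=3`.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data (e.g. s/km)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def persistence_rmse(series, horizon):
    """RMSE of the forecast z(t + horizon) = z(t) over a single series."""
    if not 1 <= horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return rmse(series[horizon:], series[:-horizon])
```

Because RMSE squares the errors, it penalises the occasional large miss (e.g. an unforecast congestion onset) more than mean absolute error would.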

Space-time clustering
•  To extract meaningful patterns (clusters)

•  To detect outliers or emerging phenomena (epidemic outbreaks or traffic congestion)

•  Considering the spatial, temporal and thematic attributes seamlessly and simultaneously, together with the dynamics of the data, is the most difficult challenge in spatio-temporal clustering

•  Spatio-temporal scan statistics (STSS) shed light on this aspect

•  Efforts are needed to improve the computational efficiency of STSS and to reduce its false alarm rate
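A minimal sketch of one well-known form of STSS, the space-time permutation scan statistic of Kulldorff et al. (2005), applied to a hypothetical matrix of counts (e.g. the number of congested links per area and interval). Candidate cylinders combine a spatial zone with a time window; expected counts assume no space-time interaction, and a Monte Carlo permutation test guards against false alarms. The zone list, the count data and all names are assumptions, not the formulation used in the STANDARD work.

```python
import random
from math import log


def log_lr(c, mu, total):
    """Poisson log-likelihood ratio for c observed vs mu expected counts;
    only excesses (c > mu) count as clusters."""
    if c <= mu:
        return 0.0
    score = c * log(c / mu)
    if total > c:
        score += (total - c) * log((total - c) / (total - mu))
    return score


def best_cylinder(counts, zones, max_len):
    """Return (score, zone_index, start, end) of the most unusual cylinder.

    counts: counts[z][d] for location z and time step d.
    zones: candidate spatial zones, each a list of location indices.
    max_len: longest time window considered, in time steps.
    """
    n_loc, n_t = len(counts), len(counts[0])
    total = sum(map(sum, counts))
    loc_tot = [sum(row) for row in counts]
    t_tot = [sum(counts[z][d] for z in range(n_loc)) for d in range(n_t)]
    best = (0.0, None, None, None)
    for k, zone in enumerate(zones):
        for start in range(n_t):
            for end in range(start + 1, min(n_t, start + max_len) + 1):
                window = range(start, end)
                c = sum(counts[z][d] for z in zone for d in window)
                # Expected count if location and time are independent.
                mu = sum(loc_tot[z] * t_tot[d] for z in zone for d in window) / total
                score = log_lr(c, mu, total)
                if score > best[0]:
                    best = (score, k, start, end)
    return best


def permutation_p_value(counts, zones, max_len, n_perm=99, seed=0):
    """Monte Carlo p-value: shuffle the time stamps of individual cases."""
    rng = random.Random(seed)
    observed = best_cylinder(counts, zones, max_len)[0]
    cases = [(z, d) for z, row in enumerate(counts)
             for d, c in enumerate(row) for _ in range(c)]
    locs = [z for z, _ in cases]
    days = [d for _, d in cases]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(days)
        sim = [[0] * len(counts[0]) for _ in counts]
        for z, d in zip(locs, days):
            sim[z][d] += 1
        if best_cylinder(sim, zones, max_len)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

The triple loop over zones and windows is what makes STSS expensive on large networks, which is the computational-efficiency issue noted above.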

Clusters of Congestion 25 May 2010 – State Opening of Parliament

Berk Anbaroglu - STSS for early detection of non-recurrent traffic congestion

Space-time visualisation: explores the patterns hidden in large data sets

•  using advanced (analytical) visualization and animation
   –  static 2D maps
   –  3D wall maps and isosurfaces (hotspots in space-time)

•  Tools: “Visual Analytics” and “Geovisual Analytics”

•  Still, real-time visualization of dynamic processes remains very challenging due to the large volume and high dimensionality of the data.

•  Methods are needed to show evolution and dissipation in space and time simultaneously (e.g. crime or traffic congestion)

Space-Time Visualisation: data -> process, story

Visualization of traffic congestion in space-time (1)

Cheng, Emmonds, Tanaksaranond, Sonoiki (2010), Multi-Scale Visualisation of Inbound and Outbound Traffic Delays in London, The Cartographic Journal, 47: 323–329.

Visualization of traffic congestion in space-time (2)

3D wall maps of inbound roads on 6th – 7th September 2010 (panels: top view, side view, isosurface)

Visualising Congestion Build-up in London: 3D Wall Map Travel Time Interactive Visualization Tool

Garavig Tanaksaranond – Space-Time Visualisation of Traffic Congestion

•  Understanding the formation of congestion through the behaviour of individual drivers

•  How do drivers react when faced with a road closure?

•  It depends on the urban environment, individual knowledge of the network and conditions, and the behaviour of others

•  The behaviour of individuals (microscopic behaviour) influences the formation and movement of congestion (macroscopic phenomena)

(Manley & Cheng, 2010)
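The micro-to-macro link can be illustrated with a toy day-to-day route-choice model: each driver moves to yesterday's fastest open route with some probability, and link travel time rises with saturation (volume/capacity) through the standard Bureau of Public Roads (BPR) function. Everything here (two routes, the switching rule, the parameter values, the function names) is a hypothetical simplification, not the Manley & Cheng model; a road closure is represented by setting a route's capacity to 0.

```python
import random


def bpr_time(free_flow_time, volume, capacity, alpha=0.15, beta=4):
    """BPR link performance function: travel time grows with saturation."""
    return free_flow_time * (1 + alpha * (volume / capacity) ** beta)


def volumes(choice, open_routes):
    """Count how many drivers chose each open route."""
    vol = {r: 0 for r in open_routes}
    for r in choice:
        vol[r] += 1
    return vol


def simulate(n_drivers, routes, days=30, switch_prob=0.2, seed=1):
    """Toy day-to-day route choice.

    routes: name -> (free_flow_time, capacity); capacity 0 means closed.
    Each day, a driver not on yesterday's fastest open route switches to it
    with probability switch_prob. Returns final saturation per open route.
    """
    rng = random.Random(seed)
    open_routes = [r for r, (_, cap) in routes.items() if cap > 0]
    choice = [rng.choice(open_routes) for _ in range(n_drivers)]
    for _ in range(days):
        vol = volumes(choice, open_routes)
        times = {r: bpr_time(routes[r][0], vol[r], routes[r][1]) for r in open_routes}
        fastest = min(open_routes, key=times.get)
        # Individual (microscopic) decisions: some drivers switch routes.
        choice = [fastest if c != fastest and rng.random() < switch_prob else c
                  for c in choice]
    vol = volumes(choice, open_routes)
    return {r: vol[r] / routes[r][1] for r in open_routes}
```

For example, `simulate(150, {"A": (10.0, 100), "B": (12.0, 0)})` closes route B and pushes route A to a saturation of 1.5, whereas with both routes open the same demand is shared between them.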

Space-Time Multi-Agent Simulation

Spread of congestion (map around Regent’s Park and Hyde Park; saturation classes: 0 – 0.2, 0.2 – 0.4, 0.4 – 0.5, 0.5 – 0.6, 0.6 – 0.7, 0.7 – 0.8, 0.8 – 0.9, 0.9 – 1.0, 1.0 – 1.2, 1.2 – 1.5, > 1.5)

Ed Manley – Agent-based Simulation

Diagram: GPS; location information; machine learning; mode of transport & stops

http://www.homepages.ucl.ac.uk/~ucesadb/video.html

GPS testing data: 110 participants, 2 months per participant, 20-second collection rate; all participants based in Greater London

Adel Bolbol Fernandez - Understanding Travel Behaviours from GPS Data Logs
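Before a machine-learning classifier can infer mode of transport and stops, raw fixes are usually turned into features such as speed. A minimal sketch, assuming fixes given as (timestamp in seconds, latitude, longitude) at roughly the 20-second rate above: speeds come from haversine distances, and a stop is flagged when speed stays below a threshold for a minimum duration. The thresholds and function names are hypothetical, not the values or code used in the study.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000


def haversine_m(p, q):
    """Great-circle distance in metres between (lat, lon) points p and q."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def speeds_mps(fixes):
    """Speed (m/s) between each pair of consecutive (t, lat, lon) fixes."""
    return [haversine_m(a[1:], b[1:]) / (b[0] - a[0]) for a, b in zip(fixes, fixes[1:])]


def detect_stops(fixes, max_speed=0.5, min_duration=120):
    """Return (start_time, end_time) for stretches where speed stays below
    max_speed (m/s) for at least min_duration seconds."""
    stops, start, end = [], None, None
    for (a, b), v in zip(zip(fixes, fixes[1:]), speeds_mps(fixes)):
        if v < max_speed:
            if start is None:
                start = a[0]
            end = b[0]
        else:
            if start is not None and end - start >= min_duration:
                stops.append((start, end))
            start = None
    if start is not None and end - start >= min_duration:
        stops.append((start, end))
    return stops
```

Per-segment speeds (and derived statistics such as mean or maximum speed between stops) are the kind of features a classifier could then use to separate walking, bus, car and rail.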

Future Directions of STDM/NC (1)
•  New methods and theory are needed for mining crowd-sourced data contributed by citizens and volunteers, including social media data
   –  often extremely noisy, biased and nonstationary, e.g. trajectory data
   –  methods are needed to combine text mining with STDM
   –  this area is relevant to the recent development of citizen science, and VGI in particular.

•  Theory and methods need to be developed to extract meaningful patterns from these individual sensors and to place them within the framework of networks and network complexity, such as the transport and social networks made up of those individuals.

•  Within networks, interactions and dynamic flows should be considered when mining spatio-temporal patterns.

•  This aspect is relevant to complexity theory, and to network dynamics in particular.

Future Directions (cont.)
•  STDM for emergence and tipping points, i.e. how to generate actionable knowledge by finding the emergent patterns and tipping points of, for example, economic systems and epidemics.

•  It is important to find outliers, but it is more important to find the critical points before the system breaks down, so that mitigating action can be taken to avoid the worst scenarios, such as traffic congestion and epidemic transmission.

•  Another challenge of STDM is how to calibrate, explain and validate the extracted knowledge.

•  A good example of this is the calibration of spatial (or spatio-temporal) autocorrelation. Higher-order spatial autocorrelation models have been developed, but pitfalls have also been found (LeSage and Pace 2011).

•  This makes machine learning more promising for future STDM.
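For finding critical points before a system breaks down, one technique from the critical-transitions literature (not from these slides) looks for "critical slowing down": rising variance and lag-1 autocorrelation in a rolling window, with the trend summarised by Kendall's rank correlation against time. A minimal sketch with hypothetical names; applying it to, say, link journey times is an assumption.

```python
def variance(x):
    """Population variance of a window."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a window (0.0 for a constant window)."""
    m = sum(x) / len(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den if den else 0.0


def rolling_indicators(series, window):
    """(variance, lag-1 autocorrelation) for every full rolling window."""
    return [(variance(series[e - window:e]), lag1_autocorr(series[e - window:e]))
            for e in range(window, len(series) + 1)]


def kendall_tau(values):
    """Kendall rank correlation between values and time (ties count as 0)."""
    n = len(values)
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i in range(n) for j in range(i + 1, n))
    return 2 * s / (n * (n - 1))
```

A sustained positive trend in both indicators is a warning sign rather than proof; in practice the series is usually detrended first and the trend is tested against surrogate data.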

Future Directions (cont.)
•  Grid computation and cloud computation
   –  key for scaling the algorithms to large networks
•  Open sources (data + software + algorithms)
•  Online computation
•  Real-time computation
•  More systematic applications
   –  CPC
•  …

Acknowledgements

http://standard.cege.ucl.ac.uk

+ Dr Andy Chow
+ Colleagues in TfL