A. Candelieri, F. Archetti, E. Messina

Department of Computer Science, Systems and Communications, University of Milano-Bicocca, Italy

Consorzio Milano Ricerche

1

Outline

Leakage management in Water Distribution Network

Water Analytics: Machine Learning & Hydraulic Simulation

Results from a real case study

Extending the approach according to newly available ICT solutions
• Extended Sensor Network & Automated Meter Reading (AMR)
• Complex (Social) Network Analysis approaches

2

(Traditional) Leakage Management

Mass Balance
OUTPUT: Evaluating the amount of non-revenue water

Mass Balance and/or Minimum Night Flow (MNF) on Zones/Subnetworks
OUTPUT: Identifying a zone/sub-network (e.g., DMA)

Different approaches are possible, based on: analysis of acoustic signals (e.g., vibration sensors and hydrophones), ground penetrating radar (GPR), leak noise correlators, gas injection
OUTPUT: Identifying a leaky pipe & severity

3

Our proposal…

Leakage assessment

Leakage detection

Analytical Leakage localization and characterization

Leakage localization

Rehabilitation

The goal: Identifying a restricted set of pipelines «probably» affected by a leak in order to reduce time for (physical) localization and rehabilitation

The proposed workflow for performing analytical leakage localization and characterization:

Simulating virtual leaks of different severity level on each pipe, in turn (“leakage scenarios”)

Storing simulated pressure and flow values (in correspondence of the monitoring points in the real network)

Clustering scenarios according to the «similarity» of flow and pressure values

Identifying the group most similar to the measured pressure and flow values

Ranking the pipelines belonging to that group, also by the estimated leak severity

4

Water distribution network with 5 districts

A monitoring device at the entry point of each district

Toy-benchmark water supply network

5

For each pipe L and discharge coefficient C in {C0, …, Cm}
  1. Place a leak on L with discharge coefficient C in the original network model (q = C·p^γ)
  2. Simulate the leakage scenario
  3. Store pressure Pi and flow Fi values at the monitoring points
end

L    C     P0    F0    …    …    PN    FM
14   0.1   0.4   1.3   …    …    0.3   0.5
14   0.2   0.5   1.4   …    …    0.1   0.9
…    …     …     …     …    …    …     …
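The generation loop and the resulting table can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: `simulate_scenario` is a hypothetical stand-in for a run of a hydraulic simulator, and the toy formulas inside it, the exponent `GAMMA = 0.5`, and the pipe and coefficient lists are illustrative assumptions.

```python
from itertools import product

GAMMA = 0.5  # assumed pressure exponent in q = C * p**GAMMA (illustrative value)

def leak_outflow(c, pressure, gamma=GAMMA):
    """Leak discharge q = C * p**gamma for discharge coefficient C."""
    return c * pressure ** gamma

def simulate_scenario(pipe_id, c, n_pressure_points=2):
    """Hypothetical stand-in for a hydraulic simulation run.

    A real workflow would place the leak in the network model and run a
    solver; toy formulas are used here only to keep the sketch runnable.
    """
    base_pressure = 30.0 + pipe_id % 5  # toy nominal pressure
    q = leak_outflow(c, base_pressure)
    pressures = [base_pressure - q * (i + 1) / 10 for i in range(n_pressure_points)]
    flows = [10.0 + q]  # toy inflow at the entry point
    return pressures, flows

def build_dataset(pipes, coefficients):
    """One row per (pipe, coefficient): [L, C, pressures..., flows...]."""
    rows = []
    for pipe_id, c in product(pipes, coefficients):
        pressures, flows = simulate_scenario(pipe_id, c)
        rows.append([pipe_id, c, *pressures, *flows])
    return rows

dataset = build_dataset(pipes=[14, 20, 333], coefficients=[0.1, 0.2, 0.3])
```

Each row keeps L and C as labels only; the remaining columns are the features used for clustering and regression.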

Leakage Localization: clustering, ignoring the L and C features

Estimation of Leakage Severity: regression on C based on the values of the other features (L is ignored)

The Machine Learning & Simulation approach

Leakage scenarios generation

Leakage scenarios simulation and dataset building

6

Algorithms (all based on Euclidean distance):
• Farthest First (a variant of k-means)
• Agglomerative
• Induced Bisecting
• In-deep Bisecting (a variant of Induced Bisecting)
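As an illustration of one of the listed algorithms, the sketch below implements the standard farthest-first traversal heuristic with Euclidean distance. It is an assumption that this matches the variant used in the study; the toy points and the choice of the first center are illustrative.

```python
import math

def farthest_first_centers(points, k, first=0):
    """Pick k centers: start from points[first], then repeatedly add the
    point whose Euclidean distance to its nearest chosen center is largest."""
    centers = [points[first]]
    # Distance from every point to its nearest center chosen so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centers

def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
centers = farthest_first_centers(points, k=3)
labels = assign(points, centers)
```

In the leakage setting each point would be one simulated scenario, described only by its pressure and flow values.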

Clustering fitness evaluation:
• Highly Localizing clusters: fewer than 25% of the pipelines of the network
• Average Localizing clusters: between 25% and 50% of the pipelines
• Poorly Localizing clusters: between 50% and 75% of the pipelines
• Non-Localizing clusters: more than 75% of the pipelines
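The four categories can be expressed as a small function. This is a sketch under two assumptions not stated on the slide: the share is computed on distinct pipe IDs, and a share of exactly 50% or 75% falls into the lower category.

```python
def localization_category(cluster_pipes, n_network_pipes):
    """Classify a cluster by the share of network pipes it contains."""
    share = len(set(cluster_pipes)) / n_network_pipes
    if share < 0.25:
        return "highly localizing"
    if share <= 0.50:
        return "average localizing"
    if share <= 0.75:
        return "poorly localizing"
    return "not localizing"

# Toy cluster with scenarios from three distinct pipes in a 991-pipe network.
category = localization_category([14, 14, 20, 333], n_network_pipes=991)
```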

The Clustering Step

Legend: High localizing cluster, Medium localizing cluster, Low localizing cluster

7

Real water supply network – Results

Number of clusters: 50, 70, 100, 150

991 pipes, 30 discharge coefficients

991 × 30 = 29,730 simulated leaks (number of scenarios) to be clustered in R^7 according to 7 attributes (pressure values at 6 monitoring devices and the flow value at the pump)

8

Results

9

10

Using Clustering for Analytical Localization of Leaks

A new vector of observed pressure and flow values is assigned to a cluster according to some heuristic (e.g., closest centroid or k-NN)

Then, all the pipes belonging to that cluster are considered probably leaky
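A minimal sketch of the closest-centroid heuristic follows. The centroids, scenario labels, and observed vector are toy values, and all function names are hypothetical.

```python
import math

def closest_cluster(observation, centroids):
    """Index of the centroid nearest (Euclidean) to the observed vector."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(observation, centroids[j]))

def suspect_pipes(observation, centroids, scenario_labels, scenario_pipes):
    """Distinct pipes whose simulated scenarios fall in the matched cluster."""
    target = closest_cluster(observation, centroids)
    return sorted({pipe for pipe, label in zip(scenario_pipes, scenario_labels)
                   if label == target})

centroids = [(0.4, 1.3), (0.1, 0.9)]   # toy (pressure, flow) centroids
scenario_pipes = [14, 14, 20, 333, 7]  # pipe ID of each simulated scenario
scenario_labels = [0, 0, 0, 1, 1]      # cluster index of each scenario
suspects = suspect_pipes((0.38, 1.25), centroids, scenario_labels, scenario_pipes)
```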

11

Merging Clustering and results of MNF analysis on DMAs

Probably leaky pipes identified through clustering may belong to different DMAs; one can then physically check only those in the DMA(s) suggested by MNF analysis. This step gives a significant reduction in the number of “suspect” pipes to be checked

Leaky DMA according to Mass Balance/MNF
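The merge with MNF results amounts to a filter on the suspect list. In this sketch the pipe-to-DMA map and the DMA names are hypothetical.

```python
def filter_by_dma(suspects, pipe_to_dma, leaky_dmas):
    """Keep only suspect pipes located in a DMA flagged by MNF analysis."""
    return [p for p in suspects if pipe_to_dma.get(p) in leaky_dmas]

# Hypothetical mapping from pipe ID to the DMA that contains it.
pipe_to_dma = {14: "DMA-1", 20: "DMA-2", 333: "DMA-1", 7: "DMA-3"}
to_check = filter_by_dma([14, 20, 333], pipe_to_dma, leaky_dmas={"DMA-1"})
```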

The regression model is a Least Median of Squares linear regression

The pipes in the identified cluster (or in a specific DMA) are ranked/selected first by the agreement between their simulated discharge coefficient and the predicted one and then by their frequency in the cluster.

Improving localization through Regression

L     C     P0   F0   …    …    PN   FM
14    0.1   …    …    …    …    …    …
…     …     …    …    …    …    …    …
…     …     …    …    …    …    …    …
14    0.2   …    …    …    …    …    …
14    0.3   …    …    …    …    …    …
…     …     …    …    …    …    …    …
20    0.2   …    …    …    …    …    …
…     …     …    …    …    …    …    …
…     …     …    …    …    …    …    …
333   0.1   …    …    …    …    …    …
20    0.3   …    …    …    …    …    …

Scenarios cluster identified

Predicted C = 0.1

Pipes to check: 1) #14 2) #333 3) #20 4) …
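The ranking in this example can be reproduced with a short sketch. It assumes that "agreement" means the smallest absolute gap between a pipe's simulated C values in the cluster and the predicted C, with ties broken by how often the pipe appears in the cluster; the slides do not define it more precisely.

```python
from collections import defaultdict

def rank_pipes(cluster_rows, predicted_c):
    """Rank pipes by closeness of simulated C to the predicted C, then by
    how many scenarios of that pipe fall in the cluster (more first)."""
    gaps = defaultdict(list)
    for pipe, c in cluster_rows:
        gaps[pipe].append(abs(c - predicted_c))
    return sorted(gaps, key=lambda p: (min(gaps[p]), -len(gaps[p])))

# (L, C) pairs of the scenarios in the identified cluster, as in the table.
cluster_rows = [(14, 0.1), (14, 0.2), (14, 0.3), (20, 0.2), (333, 0.1), (20, 0.3)]
ranking = rank_pipes(cluster_rows, predicted_c=0.1)
```

With these rows, pipes #14 and #333 both match the predicted coefficient and #14 occurs more often, so the order is #14, #333, #20.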

12

Exploiting technological advances...

Wireless Sensors Networks

13

Exploiting technological advances...

An example of daily consumption patterns acquired through AMR (one reading every 30 minutes)

Hydraulic simulation – and therefore leakage scenarios generation – can be more accurate.

HOWEVER…

14

Pressure and flow time series at the monitoring points have to be stored and analyzed rather than single values!

Exploiting technological advances...

L C P0 F0 … … PN FM

14 0.1 … … … … … …

.. … … … … … … …

Each feature is a numerical value (without AMR)

Each feature is a time series (with AMR)

Possible solutions (work in progress)

Feature extraction (on each time series)

Distance measure (overall difference between simulated and real time series)

The approach is the same but works on a wider set of features

Other approaches (e.g., local clustering and k-NN) could be more suitable than clustering
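The two options listed as work in progress can be sketched as follows. The chosen summary features (mean, standard deviation, minimum, maximum) and the Euclidean distance between series are illustrative assumptions, not the authors' final choice.

```python
import math
from statistics import mean, pstdev

def extract_features(series):
    """Summarize one time series with a few scalar features."""
    return [mean(series), pstdev(series), min(series), max(series)]

def series_distance(simulated, measured):
    """Euclidean distance between two equally sampled time series."""
    if len(simulated) != len(measured):
        raise ValueError("series must have the same number of samples")
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)))

simulated = [3.0, 3.2, 2.9, 3.1]  # toy simulated pressure samples
measured = [3.0, 3.0, 3.0, 3.0]   # toy measured pressure samples
features = extract_features(simulated)
distance = series_distance(simulated, measured)
```

Feature extraction keeps the tabular layout (more columns per monitoring point), while a series distance compares each simulated scenario directly with the measurements.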

15

Exploiting mathematical advances...

Using complex (social) network analysis approaches for:

• Markov and Spectral graph clustering
• Assessment of the overall reliability of the WDN
• Graph-based representation of data points by introducing «richer» relations between instances (e.g., for leakage localization)
• Statistical properties, such as centrality, betweenness, degree distribution, etc.
• Identification of structurally relevant components of the WDN (junctions or pipes)
• Influence Diffusion Models
• Water Quality-related network vulnerability

16

17

From Sensors Space to a Scenarios Network

Spectral clustering to enrich analysis with the relationship between scenarios

Data → Affinity Graph → Laplacian, Spectrum & Eigenvalues

Eigenvector of the 2nd smallest eigenvalue (bi-partitioning)

Eigenvectors of the m smallest eigenvalues (k-means in R^m)

Post Processing, Clustering Evaluation

Physical Space, Sensors Space, Scenarios Network Space

Traditional Clustering vs. Spectral Clustering
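The bi-partitioning step can be illustrated with NumPy. This is a minimal sketch under stated assumptions: a Gaussian affinity with width `sigma`, the unnormalized Laplacian L = D - W, and a split on the sign of the eigenvector of the 2nd smallest eigenvalue. The slides do not say which affinity or Laplacian variant was used.

```python
import numpy as np

def spectral_bipartition(points, sigma=2.0):
    """Split points in two using the eigenvector of the 2nd smallest
    eigenvalue of the unnormalized graph Laplacian."""
    x = np.asarray(points, dtype=float)
    # Gaussian affinity between every pair of points (assumed kernel).
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vectors = np.linalg.eigh(laplacian)
    second = vectors[:, 1]
    return (second > 0).astype(int).tolist()

# Toy scenarios in a 2-D sensors space forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
labels = spectral_bipartition(points)
```

For more than two groups, the rows of the matrix formed by the eigenvectors of the m smallest eigenvalues would be clustered with k-means instead of taking a sign split.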

18

Network Coverage Function – Spectral vs Traditional Clustering: Preliminary Results

NCF(C) (%) = (number of pipe IDs in cluster C / overall number of pipe IDs) × 100
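The coverage formula translates directly into code. The sketch assumes that pipe IDs are counted once per cluster, even when several scenarios of the same pipe fall in it.

```python
def network_coverage(cluster_pipe_ids, all_pipe_ids):
    """NCF (%) = distinct pipe IDs in the cluster / all pipe IDs * 100."""
    return 100.0 * len(set(cluster_pipe_ids)) / len(set(all_pipe_ids))

# Toy cluster covering three distinct pipes of a 991-pipe network.
ncf = network_coverage([14, 14, 20, 333], all_pipe_ids=range(1, 992))
```

A lower NCF means the cluster narrows the search to a smaller share of the network.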
