Applied Soft Computing xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: www.elsevier.com/locate/asoc
Unsupervised feature selection using swarm intelligence and consensus clustering for automatic fault detection and diagnosis in Heating Ventilation and Air Conditioning systems

Mitchell Yuwono a,∗, Ying Guo b, Josh Wall c, Jiaming Li b, Sam West c, Glenn Platt c, Steven W. Su a

a Faculty of Engineering and Information Technology, University of Technology, Sydney (UTS), 15 Broadway, Ultimo, NSW 2007, Australia
b The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Computational Informatics, Marsfield, NSW 2122, Australia
c The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Energy Technology, Mayfield West, NSW 2304, Australia
Article info

Article history:
Received 4 May 2014
Received in revised form 12 February 2015
Accepted 17 May 2015
Available online xxx

Keywords:
Data clustering
Consensus clustering
Feature selection
Ensemble Rapid Centroid Estimation (ERCE)
Particle Swarm Optimization
Fault detection and diagnosis
Heating Ventilation and Air Conditioning (HVAC) system
Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)
Hidden Markov Model

Abstract

Various sensory and control signals in a Heating Ventilation and Air Conditioning (HVAC) system are closely interrelated, giving rise to severe redundancies between the original signals. These redundancies may cripple the generalization capability of an automatic fault detection and diagnosis (AFDD) algorithm. This paper proposes an unsupervised feature selection approach and its application to AFDD in an HVAC system. Using Ensemble Rapid Centroid Estimation (ERCE), the important features are automatically selected from the original measurements based on the relative entropy between the low- and high-frequency features. The material used is the experimental HVAC fault data from the ASHRAE-1312-RP datasets, containing a total of 49 days of various fault types and corresponding severities. The features selected using ERCE (median normalized mutual information (NMI) = 0.019) achieved the least redundancy compared to those selected using manual selection (median NMI = 0.0199), Complete Linkage (median NMI = 0.1305), Evidence Accumulation K-means (median NMI = 0.04) and Weighted Evidence Accumulation K-means (median NMI = 0.048). The effectiveness of the feature selection method is further investigated using two well-established time-sequence classification algorithms: (a) the Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN); and (b) Hidden Markov Models (HMM), where weighted average sensitivities and specificities of: (a) higher than 99% and 96% for NARX-TDNN; and (b) higher than 98% and 86% for HMM are observed. The proposed feature selection algorithm could potentially be applied to other model-based systems to improve the fault detection performance.

© 2015 Published by Elsevier B.V.
∗ Corresponding author. Tel.: +61 430731938.
E-mail addresses: [email protected] (M. Yuwono), [email protected] (Y. Guo), [email protected] (J. Wall), [email protected] (J. Li), [email protected] (S. West), [email protected] (G. Platt), [email protected] (S.W. Su).
http://dx.doi.org/10.1016/j.asoc.2015.05.030
1568-4946/© 2015 Published by Elsevier B.V.

1. Introduction

Heating Ventilation and Air Conditioning (HVAC) systems are important for maintaining the thermal comfort and indoor air quality at places such as offices, shopping malls, warehouses, schools, and homes [1,2]. According to the report by CSIRO [3], 25% of energy consumption in Australia is accounted for by commercial buildings [3]. Moreover, HVAC systems represent 40–50% of energy use in these buildings [4]. In the United States (US), HVAC systems account for almost 31% of the electricity consumed by households [1]. Operational problems in HVAC systems can cause excess energy consumption. Regular checks and maintenance are therefore crucial to prevent unnecessary consumption. However, due to the high cost of reactionary maintenance, preventive or predictive maintenance practices are usually preferred.

Discriminating a normally behaving HVAC system from a fault condition is a relatively well researched area. A variety of automatic fault detection and diagnosis (AFDD) techniques provide a number of benefits to HVAC systems [5–7]. The current AFDD techniques available in the market for HVAC systems are mainly rule-based approaches [8–10], which obtain prior knowledge to derive a set of if-then-else rules and an inference mechanism that searches through the rule-space to draw conclusions. The rule-based systems can be based solely on expert knowledge (inferred from experience) or can be based on prior knowledge of a specific
system. Being one of the very first methods used in HVAC fault detection problems, the rule-based approaches have been the most popularly used over the last decades.

Indeed, the rule-based approaches come with advantages including ease of development, transparent reasoning, the ability to reason even under uncertainty, and the ability to provide explanations for the conclusions reached. However, one must realize that most HVAC systems are installed in different buildings/environments. This generally means that rules or analytical models developed for a particular system cannot be easily applied to an alternative system. As such, the difficult process of determining and setting rules or generating analytical mathematical models must be tailored to each individual building/environment. The threshold method utilized in the rule-based system is prone to producing false alarms. Moreover, building conditions such as the structure of the internal architecture design and even external factors (such as shading and the growth of plant life) often change after the installation/initialization of a fault detection system, which can require rules/models that were originally appropriate to be revisited and updated. A number of weaknesses associated with this type of approach therefore include the requirement of specific tailoring to a system, potential failure of the AFDD system due to its limited knowledge boundaries, and difficulty in updating the model when the AFDD system is installed in a different HVAC system. The aforementioned complications with the rule-based approach give rise to data-driven methods for AFDD in HVAC systems.

Regardless of the approach, the performance of an AFDD algorithm generally depends on the quality of the features. At CSIRO, we are developing a novel data-driven machine learning technique for AFDD in HVAC systems [4,11–14]. Preliminary results were presented in [11–14], showing the superior performance of the machine learning-based technique in detecting air-handling unit (AHU) faults compared to rule-based methods, with up to 90% accuracy on fault data obtained from ASHRAE Project 1312-RP [13]. However, one limitation of the AFDD systems described in [11–13] is that they rely on features provided by field experts. As with rules, features that are particularly effective for a particular system may not guarantee equivalent performance when utilized in an alternative system.

Selecting the appropriate features is essential in any model-based framework. Feature selection aims to minimize redundancies/mutual information between features such that the more important 'characteristic' features are not undermined. Specific faults exhibit specific symptoms which are observable only in certain clusters of features that behave differently from the others. The difficulty is that these clusters of features need to be constantly monitored as they may change dynamically depending on the condition of the HVAC system under investigation. Moreover, incorrect selections of these characteristic features are dangerous as they may adversely affect the final classifier to an extent that some obvious faults are overlooked. The motivation of this paper is therefore to design a reliable method for feature selection that can be used to augment the effectiveness of AFDD frameworks in general. The unsupervised data-driven feature selection algorithm is designed for HVAC systems operating under varying seasonal dynamics.

Evolutionary algorithms are particularly powerful for solving complex optimization problems with multiple local minima. For example, Differential Evolution (DE) has been used for the optimization of pressure vessel structure design [15] and a joint
replenishment and distribution model [16]. Although the methods outlined in [15,16] are powerful for general-purpose optimization, a major algorithmic restructuring would be required to implement these algorithms for cluster optimization. Instead, our paper is interested in exploiting a lightweight evolutionary algorithm designed specifically for clustering purposes, the Rapid Centroid Estimation (RCE) [17].
Unsupervised feature selection based on data clustering is inherently an ill-posed problem where the goal is to group redundant features into some unknown number of clusters based on intrinsic information alone. For this paper, we utilize the Ensemble Rapid Centroid Estimation (ERCE) [17,18], a semi-stochastic multi-swarm clustering algorithm inspired by Particle Swarm Optimization (PSO [19]), to determine the characteristic features for the specific season. The method is designed to automate the selection of characteristic features in each season. The block diagram of the proposed method is shown in Fig. 1.
The performance of the proposed feature selection algorithm was tested using two well-established time-sequence classifiers: (a) Nonlinear Auto-Regressive Time Delay Neural Networks with eXogenous inputs (NARX-TDNN); and (b) Hidden Markov Models (HMM) [13]. A comprehensive comparison is also given with regard to other feature selection methods, including Li's manual selection [20], Complete Linkage (CL), Ensemble Evidence Accumulation K-means (EAC K-means) and Weighted Evidence Accumulation K-means (WEAC K-means).
The paper is structured as follows: Section 2 presents the overview of the proposed method as well as the materials used to examine its performance. Section 3 presents the detailed description of each component, including feature extraction, feature selection, and the classifier used in the experiment. Section 4 describes the theoretical foundations of the consensus clustering algorithm that we utilize for performing the feature selection. Section 5 describes the data utilized in the experiments. Section 6 presents comprehensive experimental results of the proposed method and a comparative analysis with other conventional feature selection and classification algorithms. Section 7 presents in-depth analyses and discussion regarding the results. Finally, Section 8 presents the conclusion and future direction of the research.
2. General overview on HVAC systems
HVAC systems are configured and used to control the environment of a building or a zone including one or several rooms. The environmental variables may, for example, include temperature, air-flow, and humidity. The desired values/set-points of the environmental variables will depend on the intended use of the HVAC system. If the HVAC system is being used in an office building, the environmental variables will be set to make the building/rooms therein comfortable to humans. An HVAC system typically services a number of zones within a building. The system normally includes a central plant which includes:
• a hydronic heater and chiller;
• a pump system, which may include dedicated heated and chilled water pumps, and which circulates heated and chilled water from the heater and chiller through a circuit of interconnected pipes; and
• a valve system, which may include dedicated heated and chilled water valves, and which controls the flow of water into a heat exchange system (which may include dedicated heated and chilled water coils).
The heated and/or chilled water circulates through the heat exchange system before being returned to the central plant, where the process repeats (i.e. the water is heated or chilled and recirculated). In the heat exchange system, energy from the heated/chilled water is exchanged with air being circulated through an air distribution system.
The HVAC system also includes a sensing system which typically includes a number of sensors located throughout the system, such
Fig. 1. Block diagram of the proposed method.
as temperature, humidity, air velocity, volumetric flow, pressure, gas, position, and occupancy detection sensors. The HVAC system is controlled by a control system that may be a stand-alone system, or may form part of a building automation system (BAS) or building management and control system (BMCS). The control system includes a computing system which is in communication with the various components of the HVAC system. The control system controls and/or receives feedback from the various components of the HVAC system in order to regulate environmental conditions for the inhabitancy or functional purpose of the building.

In an AFDD process, data from the components of the HVAC system is received. This data may, for example, include sensed data from various sensors within the system and feedback data from various components of the system. Additional data from external data sources can also be received, such as external weather data. Consequently, the dimensionality and volume of these data are enormous.

In order to ensure proper identification of faults, an AFDD algorithm requires redundancies in the selected sensory and control signal sources to be minimized. Additional information given by redundant features is irrelevant, provides no useful information in describing the type of fault, and will ultimately cripple the generalization capability of the fault detector. Insufficient features are equally dangerous, as they may lead to misdiagnoses due to incomplete information.

The method presented in this paper offers an unsupervised approach for feature selection using ERCE. The system is summarized in the block diagram in Fig. 1. A sample feature extraction and feature selection result using our proposed approach can be seen in Fig. 2.

The experimental materials in this paper are the experimental fault data from the ASHRAE-1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008 from the ASHRAE Project 1312-RP. In each season, different faults were generated, recorded and reported for experimental use.
3. Methods

Selecting important features in an HVAC system is challenging due to the excessive interrelations between signals. This section overviews our contribution on feature selection using consensus clustering and how it is applied to the HVAC system in particular. The section is subdivided into five subsections:

• Section 3.1 outlines the general model that we use for extracting magnitude and oscillation (spectral centroid) features from a raw signal.
• Section 3.2 outlines our proposed polar approach for visualizing multi-dimensional patterns.
• Section 3.3 defines the measure that we use for quantifying the degree of dissimilarity between features.
• Section 3.4 provides the general overview of our main contribution, a method for feature selection using semi-stochastic swarm-based consensus clustering, which will be further detailed in Section 4.
• Section 3.5 shows the architecture of the neural networks that we use to benchmark the efficiency of the proposed feature selection method.
3.1. Extracting time signal features: magnitude and spectral centroid

Sensory signals from an HVAC system are streamed in the form of sampled time signals. From each time signal, HVAC engineers mainly observe two features for deciding the condition of the system:

1. Whether the average magnitude of a sensory reading is inside the typical condition for the specific season.
2. Whether there is any excessive oscillation in the sensory readings compared to the typical condition for the specific season.

For example, a fault type classified as Sequence of Heating and Cooling Unstable (HCSF0517) can be identified by observing the excessive oscillation of the Chilled Water Coil control signal (CHWC GPM). The phenomenon can be seen in Fig. 3, where it is easy to observe that the moving average magnitude of the CHWC GPM during HCSF0517 is considerably close to the typical behavior.

We model these two features mathematically as the moving average magnitude and the spectral centroid. For a discrete signal $g_s(n)$, the two features can be measured using a straightforward calculation as follows.

The magnitude characteristic is measured using a simple moving average, calculated as

$\mathrm{MAG}(g_s) = \frac{1}{N}\sum_{n=1}^{N} g_s(n)$,   (1)

where $n$ denotes the sample number and $N$ denotes the length of the
window.

The spectral centroid of a signal describes the center of mass of the spectrum, which can be calculated as follows,

$\bar{g}_s = \mathrm{FFT}(g_s, N_{FFT})$,   (2)
$\mathrm{SC}(g_s) = \frac{\sum_{n=5}^{N_{FFT}} |\bar{g}_s(n)|\, \bar{g}_s(n)}{\sum_{n=5}^{N_{FFT}} |\bar{g}_s(n)|}$,   (3)

where FFT denotes the fast Fourier transform, $N_{FFT}$ indicates the number of bins, and $\bar{g}_s(n)$ and $|\bar{g}_s(n)|$ represent the center frequency and magnitude of the $n$th bin, respectively. Notice that the frequency centroid is calculated from the fifth bin onwards to isolate only the high-frequency oscillation.

Fig. 2. (a) Raw signals for the Spring 2008 dataset; (b) the low and high frequency features are isolated from each signal. Signals 1–160 are moving average magnitude signals while signals 161–320 are spectral centroid signals; (c) characteristic features are selected using ERCE, while (d) classification is done using NARX-TDNN.
A fault can be interpreted as 'how much a signal deviates from its typical characteristic during the specific season'. Incorporating this criterion, each feature vector $q_s$, which includes $\{\mathrm{MAG}(g_s), \mathrm{SC}(g_s)\}$, is normalized with respect to its normal operation. The discrepancy in both direction and magnitude relative to the normal signal is represented as a signed multiple of the signal's standard deviation during typical operation,

$z_s(n) = \frac{q_s(n) - \mu_n(n)}{\sigma_n(n)}$,   (4)

where $\mu_n(n)$ and $\sigma_n(n)$ denote the mean and standard deviation of a feature during its normal operation at a specific sample $n$ taken at a particular time of the day. One can see that the approach simply calculates the cross-sectional z-score of the feature $q_s$.
The hyperbolic tangent kernel is then applied to the z-score, effectively transforming each feature to a continuous measure within $(-1, 1)$ as follows,

$y_s(n) = \tanh(z_s(n))$,   (5)

which has a rather intuitive 'fuzzy' interpretation as follows:
(a) $y_s(n) = 0$: the feature is at a typical level;
(b) $y_s(n) \to -1$: the feature is atypical negative (much smaller than its typical level);
(c) $y_s(n) \to 1$: the feature is atypical positive (much larger than its typical level).
Intuitively, the variability of $y_s$ throughout the season provides a good indicator of its importance. In this paper, we measure the variability of a feature in terms of its entropy as follows,

$H_{y_s} = -\int p_{y_s}(x) \log p_{y_s}(x)\, dx$,   (6)

where $p_{y_s}(x)$ can be approximated empirically from the histogram of $y_s$.
3.2. Feature visualization
Visualization is an important tool to verify the effectiveness of a feature selection algorithm. However, due to the complexity of an HVAC system, simultaneous visualization would easily overwhelm the observer.
In this paper a polar approach for visualizing patterns constituted by multi-dimensional feature cross-sections is proposed. The visualization scheme can be seen in Fig. 4.
Using the proposed visualization scheme, the variable numbers are listed at particular angles in the circle, and the corresponding radius represents the magnitude of $y_s$, as previously detailed in Eq. (5). A normal system oscillates inside the typical region ($y_s = 0$) such that the polar plot shows a circle-like pattern. During a fault condition the sensors behave inside either the positive or negative atypical region, such that the polar plot assumes various shapes other than a circle. For example, Fig. 5 shows that the pattern during normal operations is visually different from the OA Damper Stuck (OADS) fault scenario.
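As a rough illustration of this scheme, the matplotlib sketch below plots one cross-section of the fuzzy features on a polar axis. The shift of $y_s$ from $[-1, 1]$ to a positive plotting radius is our own convention for keeping the radius non-negative, not a detail taken from the paper:

```python
import numpy as np
import matplotlib.pyplot as plt

def polar_snapshot(y_cross_section, ax=None):
    """Plot one cross-section of fuzzy features y_s(n) on a polar axis.

    Each feature index is assigned an angle; a healthy system traces a
    near-circle at the typical ring, while faults distort the shape."""
    y = np.asarray(y_cross_section)
    theta = np.linspace(0.0, 2 * np.pi, len(y), endpoint=False)
    if ax is None:
        ax = plt.subplot(111, projection="polar")
    # Shift ys from [-1, 1] to [0, 2] so the plotted radius stays positive.
    ax.plot(np.append(theta, theta[0]), np.append(y, y[0]) + 1.0)
    ax.set_rlim(0, 2)
    ax.set_rticks([0.0, 1.0, 2.0])
    ax.set_yticklabels(["-1 (atypical -)", "0 (typical)", "+1 (atypical +)"])
    return ax

# Example: 16 features, all typical except feature #3 pushed positive.
snapshot = np.zeros(16)
snapshot[3] = 0.9
polar_snapshot(snapshot)
plt.show()
```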
3.3. Measuring divergence between features

A pair of feature vectors $y_1 \in Y$ and $y_2 \in Y$ calculated from Eq. (5) can be treated as vectors of random numbers generated by the probability distribution functions $P = p(x)$ and $Q = q(x)$, respectively. $y_1$ and $y_2$ can be assumed to be redundant (i.e. generated from the same distribution) when the Kullback–Leibler (KL) divergence between the two approaches zero [21]. A practical illustration of the case can be seen in Fig. 6.
Fig. 3. The magnitude (top) and frequency (bottom) characteristics of the Chilled Water Control signal (CHWC GPM) during fault (HCSF0517) vs. normal (NOR0505). Even though CHWC GPM during HCSF0517 is correlated in terms of magnitude characteristic, the signal is uncorrelated in terms of frequency characteristic.

The KL-divergence measures the relative entropy between two distributions, i.e. the amount of information lost when $Q$ is used to approximate $P$ [21],

$\mathrm{KL}(P\|Q) = \underbrace{-\sum_x p(x)\log q(x)}_{H(P,Q)} + \underbrace{\sum_x p(x)\log p(x)}_{-H(P)}$   (7)

$\phantom{\mathrm{KL}(P\|Q)} = \sum_x p(x)\log\frac{p(x)}{q(x)}$,   (8)

where $H(P,Q)$ denotes the cross entropy between $P$ and $Q$ and $H(P)$ denotes the information entropy of $P$. In this paper we use the symmetrical KL-divergence, as originally proposed in [21], due to its symmetry,

$\mathrm{KL}_s(P\|Q) = \mathrm{KL}(P\|Q) + \mathrm{KL}(Q\|P) = \sum_x \big(p(x) - q(x)\big)\log\frac{p(x)}{q(x)}$.   (9)
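A minimal sketch of Eq. (9), under the assumption that $p(x)$ and $q(x)$ are estimated from histograms of the tanh-transformed features; the epsilon smoothing is an implementation detail we add to avoid taking the log of zero:

```python
import numpy as np

def symmetric_kl(y1, y2, bins=32, eps=1e-10):
    """Eq. (9): symmetric KL divergence between two features, with P and Q
    approximated by histograms over the shared tanh range (-1, 1)."""
    p, edges = np.histogram(y1, bins=bins, range=(-1.0, 1.0))
    q, _ = np.histogram(y2, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    # KLs(P||Q) = sum_x (p - q) * log(p / q) >= 0, and 0 iff P = Q.
    return float(np.sum((p - q) * np.log(p / q)))
```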
3.4. Feature selection using consensus clustering

Performing feature selection using prototype-based algorithms such as K-means, fuzzy C-means, or Self Organizing Maps (SOM) can be difficult because the number of characteristic features K is not initially known. Consensus clustering provides quantitative evidence for determining the number and membership of possible clusters within a dataset (in our case, features). The method has gained popularity in cancer genomics as a powerful tool to extract and visualize the dependencies between genes [22–24].

In this paper we propose an approach for unsupervised feature selection using a swarm-based ensemble algorithm [18]. An advantage of ensemble clustering algorithms over conventional clustering algorithms is that they allow a robust estimation of natural clusters by investigating the consensus strength between multiple clusterings [22,25,26]. Consensus clustering is particularly powerful for identifying strong clusters in the data [22]. This is particularly useful for our application, as can be seen in Section 6, where the features selected using consensus clustering algorithms are generally more compact and less redundant compared to the ones selected using complete linkage.
Fig. 4. The proposed polar visualization scheme. In this illustration, we can see that features other than features #4 and #5 behave atypically.
The feature selection process can be summarized as follows:

1. Determine the feature clusters using consensus clustering.
2. For each cluster, rank each feature according to its entropy and pick the one whose entropy is the highest as the characteristic feature for the cluster.
A sample result of a run of the feature selection process using consensus clustering is shown in Fig. 7. Features in the same cluster are denoted using the same color. The radius of each feature indicates its entropy. The bold circle in each cluster marks the chosen characteristic feature, i.e. the feature with the highest entropy compared to the others in the same cluster.
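Given crisp cluster labels from the consensus step and the per-feature entropies of Eq. (6), step 2 reduces to an argmax per cluster. A sketch (function and argument names are ours):

```python
import numpy as np

def select_characteristic_features(features, labels, entropies):
    """Step 2 of the selection process: within each feature cluster,
    keep the single feature with the highest entropy.

    features  : list of feature names (length F)
    labels    : crisp cluster label per feature, from consensus clustering
    entropies : Eq. (6) entropy per feature
    """
    labels = np.asarray(labels)
    entropies = np.asarray(entropies)
    selected = []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        best = members[np.argmax(entropies[members])]
        selected.append(features[best])
    return selected
```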
Fig. 5. The proposed polar visualization scheme showing the characteristic signals in normal operation scenarios (left) and in the OADS scenario (right) in the Winter 2008 dataset.
3.5. Fault classification using Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN)

The Nonlinear Auto-Regressive with eXogenous inputs (NARX) network architecture [27] is a class of discrete-time nonlinear systems. The NARX architecture can be broadly expressed in the parallel mode,

$\hat{y}(t) = f(u(t - n_u), \ldots, u(t - 1), u(t), \hat{y}(t - n_y), \ldots, \hat{y}(t - 1))$,   (10)

or in the series-parallel mode,

$\hat{y}(t) = f(u(t - n_u), \ldots, u(t - 1), u(t), y(t - n_y), \ldots, y(t - 1))$,   (11)

where $u(t)$, $y(t)$ and $\hat{y}(t)$ denote the input, actual output and estimated output of the network at time $t$, $n_u$ and $n_y$ are the input and output orders, and $f$ denotes a nonlinear function, which can be
Fig. 6. A simplified case of redundancy between features in a HVAC system. How many clusters are there? It can be seen that the divergence between the y_CHWC-VLV and y_CHWC-GPM distributions is intuitively smaller than the divergence between y_CHWC-VLV and y_SA-HUMD. If these four signals were to be clustered, then a possible solution would be to assign them into two clusters, i.e. {{y_CHWC-VLV, y_CHWC-GPM}, {y_SA-HUMD, y_RA-HUMD}}.
approximated using a Multilayer Perceptron (MLP). As opposed to a conventional Recurrent Neural Network (RNN), a NARX network's feedback comes only from the output neurons rather than its hidden states. Using this simplified configuration, it has been argued that NARX networks generalize better than other RNN networks, especially on problems involving long-term dependencies [28].

The configurations described in Eqs. (10) and (11) differ only in their mode of feedback. The configuration described in Eq. (10) is referred to as parallel mode or recurrent NARX (NARX-P), while Eq. (11) is referred to as series-parallel mode NARX (NARX-SP) [29]. NARX-P uses the state estimate as feedback, while NARX-SP uses the actual observable state. Because the actual state of an HVAC system is practically unavailable at all times, the deployment of NARX in an AFDD system is currently limited to the NARX-P configuration.
4. Consensus clustering

This section explains, in detail, the semi-stochastic swarm-based consensus clustering approach to feature selection in an HVAC system. The section is subdivided into six subsections:
• Section 4.1 briefly introduces the consensus clustering paradigm,
• Section 4.2 presents the visual abstract of our proposed feature selection method,
• Section 4.3 overviews Fred and Jain's Evidence Accumulation [25],
• Section 4.4 summarizes our previous work on Swarm Rapid Centroid Estimation (SRCE) [17],
• Section 4.5 introduces the newly proposed 'self-evolution' strategy for the SRCE,
• Section 4.6 outlines the new implementation of ERCE for feature selection purposes.
4.1. Fundamentals of consensus clustering
Consensus clustering infers a consensus matrix from multiple runs of clustering algorithms. This consensus matrix encodes the probability of each pair of observations belonging to the same cluster. It has been argued that the natural, and arguably optimum, clusters can be validated with higher confidence by analyzing the stability of this matrix [22,25].

The consensus matrix $C$ is a positive semidefinite $N \times N$ square matrix of joint probabilities. Each $C_{ij} \in [0, 1]$ represents the probability of data points $i$ and $j$ belonging to the same cluster. For a given
Fig. 7. A result of feature selection using ERCE (Algorithm 4, Section 4) on the Spring 2008 dataset, projected on the first and second principal components for ease of visualization. Each point represents a feature where the radius denotes the corresponding entropy. Each feature cluster is color coded and the characteristic feature of each cluster is annotated accordingly. In this example, ERCE chose 16 characteristic features from the 320 features (160 magnitude features and 160 spectral centroid features). It can be seen that the spectral centroid feature for CHWC-GPM (SC CHWC-GPM) is selected, in line with the observation in Fig. 3. ERCE accurately discovered that Return Fan (RF) and Supply Fan (SF) features are particularly important. This discovery is in line with the existence of Return Fan Failure (RFF) faults (May 12th, 18th, and 19th) observed during the season. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
cluster assignment obtained from the $m$th clustering, we can calculate the $m$th co-association matrix as follows,

$C_m = U_m^T U_m$,   (12)

where each $U_m$ is a $K_m \times N$ matrix which stores the values of $u_{ik,m}$ for $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K_m\}$ obtained from the $m$th
run of any clustering algorithm. Each $u_{ik,m}$ denotes the probability of a data point $y_i$ belonging to the cluster $C_k$. For any $m$, $U_m$ should satisfy the constraints $u_{ik,m} \in [0, 1]$ and $\sum_{k=1}^{K_m} u_{ik,m} = 1$. The matrix multiplication represents a probabilistic 'and' operator, conveniently calculated using the (multiplicative) fuzzy T-norm [30]. The $i$th diagonal component of $C_m$, i.e. $D_{ii,m}$, quantifies the degree of stability of the $i$th data point in the $m$th clustering.

Fig. 8. An illustration describing the architecture of the Parallel Nonlinear Auto-Regressive Time Delay Neural Network with eXogenous inputs (NARX-TDNN).
Fig. 9. Various partitions on the Spring 2008 dataset encoded by 16 subswarms of the Self Evolving Swarm Rapid Centroid Estimation (SE-SRCE, Algorithm 3). The fuzzifier constant ψ is set to 1.2; target entropies τ are uniformly randomized between 0.005 and 0.05. The coordinates are projected to the first and second principal components for ease of visualization. An in-depth explanation of the method can be read in Section 4.4 and Section 4.5.
tability for the ith data in the mth clustering. In this paper weropose normalizing Cm by its diagonal matrix Dm as follows,
m = D−1/2m CmD−1/2
m (13)
he consensus C, or ensemble aggregate, is calculated as theeighted average of the co-association matrices C1, C2, . . ., CM as
ollows,
=∑M
m=1wmCm∑Mm=1wm
, (14)
here wm denotes the weight of the corresponding partition whichan be determined manually or using any cluster validation method31]. wm can also be set to assume equal weighting such that wm = 1or all m [25].
The consensus distance matrix can be defined as follows [22],

$D = 1 - C$,   (15)

which transforms the consensus matrix into a pairwise distance matrix. Fred and Jain [25] propose using a single/average/complete linkage algorithm on the $D$ matrix to recover the natural clusters. In their 2005 paper, a criterion called maximum lifetime is proposed to determine the optimum threshold for cutting the cluster dendrogram [25]. Readers are encouraged to refer to [25] for more details.
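Eqs. (12)–(15) translate directly into a few lines of numpy. The sketch below assumes each membership matrix U_m is stored as a K_m × N array with columns summing to one; variable names are ours:

```python
import numpy as np

def co_association(U):
    """Eqs. (12)-(13): normalized co-association matrix of one clustering.
    U is a K x N (crisp or fuzzy) membership matrix, columns summing to 1."""
    C = U.T @ U                                   # Eq. (12): C_m = U_m^T U_m
    d = np.sqrt(np.clip(np.diag(C), 1e-12, None))
    return C / np.outer(d, d)                     # Eq. (13): D^-1/2 C D^-1/2

def consensus(co_assoc_list, weights=None):
    """Eqs. (14)-(15): weighted average of co-association matrices and the
    corresponding consensus distance matrix."""
    Cs = np.stack(co_assoc_list)
    w = np.ones(len(Cs)) if weights is None else np.asarray(weights, float)
    C = np.tensordot(w, Cs, axes=1) / w.sum()     # Eq. (14)
    return C, 1.0 - C                             # Eq. (15): D = 1 - C
```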
4.2. Visual abstract: feature selection using ERCE
A visual abstract of the proposed swarm-based consensus clustering algorithm can be seen in Figs. 9 and 10. Fig. 10 presents the consensus matrix and hierarchical cluster tree (cluster dendrogram) from the aggregation of the partitions shown in Fig. 9.
4.3. Evidence accumulation
Fred and Jain proposed Evidence Accumulation (EAC) in 2005 as a consensus clustering framework for combining the results of multiple runs of a crisp prototype-based clustering algorithm (e.g. K-means) [25]. Wang proposed a generalization of the algorithm, extending the applicability of EAC to both crisp and fuzzy clusters [30]. He finds that fuzzy partitions are rather advantageous over crisp partitions in Evidence Accumulation, as the degree of overlap in a fuzzy partition encodes to an extent how 'close' together clusters are [30]. The approach can be summarized as a two-step process as follows:

1. Split: Partition the data matrix Y into some number of partitions K_m (may be fixed or randomized within an interval) using any prototype-based clustering algorithm. Repeat this step M times.
Fig. 10. A heat map presenting the consensus matrix resulting from the aggregation of the SE-SRCE swarm shown in Fig. 9 using Algorithm 4 (Section 4.6). The rows and columns indicate individual items (in our case: the 320 features) whose consensus values range from 0 (never clustered together) to 1 (always clustered together), marked by white to dark blue. The complete linkage cluster dendrogram showing the degree of redundancy between features is shown above the consensus matrix. Between the cluster dendrogram and the consensus matrix is the cluster label vector suggested by the maximum lifetime cut. The output of the consensus clustering is as shown in Fig. 7. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
2. Merge: Calculate the consensus matrix C and interpret the ensemble clustering by performing a desired graph algorithm.
Given the data vectors $y_i \in Y$, for each clustering $m$, $K_m$ centroid vectors $x_k \in X_m$ can be obtained using any prototype-based clustering algorithm (e.g. K-means, fuzzy C-means, Gaussian Mixture Models). The degree of membership of $y_i$ w.r.t. $x_k$ is a function of distance, calculated for crisp partitions ($u \in \{0, 1\}$) as

$u_{ik,m} = \begin{cases} 1 & \text{if } x_{k,m} = \operatorname{argmin}_{x_j \in X_m} d(y_i, x_{j,m}) \\ 0 & \text{otherwise}, \end{cases}$   (16)

or for fuzzy partitions ($u \in [0, 1]$) as

$u_{ik,m} = \frac{d(y_i, x_{k,m})^{-1/(\psi - 1)}}{\sum_{j=1}^{K_m} d(y_i, x_{j,m})^{-1/(\psi - 1)}}, \quad \psi > 1$,   (17)

where $\psi$ denotes the fuzzification constant.
Wang argues that using fuzzy partitions in consensus clustering is particularly efficient for suppressing over-segmentation. It is also more tolerant to noisy information than its crisp counterpart [30]. The conventional approaches using Evidence Accumulation (EAC) [25] and Weighted Evidence Accumulation (WEAC) [31] are summarized in Algorithm 1. Notice that the pseudocode is simplified using the fuzzy t-norm approach to EAC as introduced in [30].
Algorithm 1. (Weighted) Ensemble Clustering ((W)EAC Clustering)

Input: dim × N data matrix Y, maximum number of prototypes Kmax, number of repetitions M, prototype-based clustering algorithm Cluster (e.g. K-means, fuzzy C-means), linkage algorithm Linkage.
Output: Crisp ensemble partition L.
1: for m = {1, . . ., M} do
2:   // Partition Y using a random number of clusters.
3:   Krnd ← random({2, . . ., Kmax})
4:   {Um, Xm} ← Cluster(Y, Krnd)
5:   // Calculate the co-association matrix for each clustering.
6:   Cm ← Um^T Um
7:   Cm ← Dm^(−1/2) Cm Dm^(−1/2)
8: end for
9: // Calculate the consensus matrix.
10: C ← (Σ_{m=1}^{M} wm Cm) / (Σ_{m=1}^{M} wm)
11: // Interpret the consensus matrix using the Linkage algorithm.
12: HierarchicalTree ← Linkage(C)
13: th ← MaximumLifetime(HierarchicalTree)
14: L ← Cut(HierarchicalTree, th)

Note that the threshold for cutting the hierarchical tree is determined using the maximum lifetime method [25].
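A compact Python reading of Algorithm 1 with equal weights, using scikit-learn's K-means and SciPy's linkage. The maximum-lifetime cut is simplified here to "cut inside the largest gap between successive merge heights", which is our paraphrase of the criterion in [25], not a verbatim port:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

def eac_kmeans(Y, k_max=30, M=60, rng=None):
    """Algorithm 1 sketch with equal weights: M crisp K-means runs with
    random K, evidence accumulation, complete linkage, lifetime-style cut.
    Y is an N x dim data matrix (rows are the items to cluster)."""
    rng = np.random.default_rng(rng)
    N = len(Y)
    C = np.zeros((N, N))
    for _ in range(M):
        k = int(rng.integers(2, k_max + 1))
        seed = int(rng.integers(2**31 - 1))
        lab = KMeans(n_clusters=k, n_init=3, random_state=seed).fit_predict(Y)
        C += (lab[:, None] == lab[None, :])       # crisp co-association
    C /= M
    D = 1.0 - C                                    # consensus distance, Eq. (15)
    np.fill_diagonal(D, 0.0)
    tree = linkage(squareform(D, checks=False), method="complete")
    # Simplified maximum lifetime: cut just above the lower edge of the
    # largest gap between successive merge heights.
    heights = tree[:, 2]
    gaps = np.diff(heights)
    th = heights[np.argmax(gaps)] + 1e-12 if len(gaps) else heights[-1]
    return fcluster(tree, t=th, criterion="distance")
```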
4.4. Swarm Rapid Centroid Estimation
Yuwono [17] proposed the Swarm Rapid Centroid Estimation (Swarm RCEr+) algorithm in 2011 [32]. This semi-stochastic clustering algorithm efficiently incorporates the paradigms of Particle Swarm Optimization (PSO [19]) into the traditional Expectation Maximization (EM). Statistical validation on benchmark data suggests that Swarm RCEr+ has a reduced risk of converging to local minima and a leaner computational complexity compared to earlier evolutionary-algorithm-based clustering approaches [17]. The algorithm was updated in 2014 to further decrease its memory complexity for use in ensemble clustering applications [18]. The RCE algorithm below follows the 2014 formulation.

A particle in an RCE subswarm stores a tuple consisting of a position vector $x$ and a velocity vector $v$,

$particle_{k,m} = \{x_{k,m}, v_{k,m}\}$.   (18)

The position vector of each particle represents the coordinates of a centroid vector $x \in \mathbb{R}^{dim}$. In RCE a subswarm is a collection of centroid coordinates, encoding a possible solution to the clustering problem. As the RCE swarm consists of $M$ such subswarms, at the end of the optimization as many as $M$ clustering solutions can be obtained.
Each subswarm stores two memory matrices:

1. The self-organizing memory $Y_m$, which is an array of randomly sampled pointers to the data $Y$,

$Y_m = \mathrm{randsample}(Y, \theta\%)$,   (19)

where $\theta\% \in (0, 1]$ denotes the rate of random sampling.

2. The best position memory $X_m^{best}$, which stores the position vectors $X = \{x_1, \ldots, x_{K_m}\}$ that minimize a given objective function $f(Y_m, X_m)$ throughout the search. A typical objective function is usually defined as, but not restricted to, the average distortion,

$f(Y_m, X_m) = \sum_{x_k \in X_m} \frac{\sum_{y_i \in Y_m} u_{ik,m}\, d(x_k, y_i)}{\sum_{y_i \in Y_m} u_{ik,m}}$,   (20)

where $u_{ik,m}$ can be calculated using either Eq. (16) or Eq. (17).

The RCE swarm $X^{best}$ matrix is the union of all $X_m^{best}$ such that

$X^{best} = \bigcup_{m=1}^{M} X_m^{best}$.   (21)
Fig. 11. Trajectory of the Swarm RCE particles recorded after 30 iterations on a toy dataset with numerous random seedings, showing Swarm RCE's robustness and insensitivity to initialization. M = 6, t_max = 30, ε = 0.05, δ_reset = 15.
On each iteration, the velocity and position of a particle are updated as follows,

$v_{k,m}(t+1) = v_{k,m}(t) + \Phi_{k,m}(t)$,   (22)

$x_{k,m}(t+1) = x_{k,m}(t) + v_{k,m}(t+1)$,   (23)

where $\Phi$ denotes the resultant vector, which consists mainly of the self-organizing term and the minimum (best position) term,

$\Phi_{k,m}(t) = \varphi_1 \circ \underbrace{\left(\frac{\sum_{i=1}^{|Y_m|} u_{ik,m}\,(y_i - x_{k,m}(t))}{\sum_{i=1}^{|Y_m|} u_{ik,m}}\right)}_{\text{self-organizing}} + \varphi_2 \circ \underbrace{\left(\frac{\sum_{j=1}^{|X^{best}|} q_{jk,m}\,(x_j^{best}(t) - x_{k,m}(t))}{\sum_{j=1}^{|X^{best}|} q_{jk,m}}\right)}_{\text{minimum (best position)}}$

$\phantom{\Phi_{k,m}(t)} = \varphi_1 \circ (\mathrm{E}[Y_m \mid X_m = x_{k,m}] - x_{k,m}) + \varphi_2 \circ (\mathrm{E}[X^{best} \mid X_m = x_{k,m}] - x_{k,m})$,   (24)

where $\varphi \in [0, 1]^{dim}$ denotes a uniform random vector; $u_{ik,m}$ denotes the cluster membership when $Y_m$ is mapped to $X_m$; while $q_{jk,m}$ denotes the cluster membership when $X^{best}$ is mapped to $X_m$.
Should the self-organizing vector of a particle equal 0, $x_k$ will be directed to $x_{I\,win,m}$, the position of the winning particle. $x_{I\,win,m}$ is the particle in the $m$th subswarm whose cluster has the largest cardinality.

The RCE is equipped with two strategies to cope with suboptimal convergence, namely substitution and particle reset, as follows:
1. Substitution strategy forces particles in a search space to reach alternate equilibrium positions by introducing position instability. After each position update episode for a particle, apply

$\{x_i(t+1), v_i(t+1)\} = \begin{cases} \{x_{I\,win}(t+1) + N(0, \sigma),\ 0\} & \text{if } \varphi < \varepsilon \\ \{x_i(t+1), v_i(t+1)\} & \text{otherwise,} \end{cases}$   (25)

where $\varphi$ is a uniform random number $\varphi \in [0, 1]$, and $N(0, \sigma)$ is a Gaussian random vector with mean $\mu = 0$ and standard deviation $\sigma$ of each dimension of the data being clustered. $\varepsilon$ denotes the substitution probability parameter; a larger $\varepsilon$ increases the substitution frequency. Optimal $\varepsilon$ values lie between $0.01 \leq \varepsilon \leq 0.05$ [17]. RCE with the substitution strategy enabled is denoted with the superscript +.
2. Particle reset strategy is triggered when the fitness of the local minimum $f(Y_m, X_m^{best}(t))$ does not improve after a number of
iterations. Stagnation can be detected using a stagnation counter $\delta$ which is updated as follows:

$\delta(t+1) = \begin{cases} \delta(t) + 1 & \text{if } f(Y_m, X(t)) \geq f(Y_m, X^{best}(t)) \\ 0 & \text{otherwise.} \end{cases}$   (26)

When $\delta(t+1) > \delta_{max}$, this strategy reinitializes all particles in a subswarm without resetting the local minimum position matrix; only $x_k(t)$ and $v_k(t)$ are reinitialized. Swarm convergence is detected when $f(Y_m, X^{best}(t))$ does not improve after a number of resets. RCE with the particle reset strategy enabled is denoted with the superscript r.
The algorithm pseudocode is shown in Algorithm 2. An illustration of the search trajectory of the swarm on a toy example is shown in Fig. 11.
Algorithm 2. Swarm RCEr+

Input: Data points Y = {y1, . . ., yN} ∈ R^dim, # of clusters K.
Output: Swarm centroid vectors X^best = {X1^best, X2^best, . . ., XM^best} ∈ R^dim.
1: Initialize the swarm (randomize(X_{1,...,M}), V_{1,...,M} = 0).
2: For each subswarm m, randomly sample Y and store it in the memory Y_m = randsample(Y, θ%).
3: repeat
4:   for all m ∈ {1, . . ., M} do
5:     Calculate U_m from the pairwise distances between X_m and Y_m,
6:     Calculate Q_m from the pairwise distances between X_m and X^best,
7:     Store X_m^best which minimizes f(Y_m, X_m) throughout the search,
8:     V_m ← V_m + Φ_m,
9:     X_m ← X_m + V_m,
10:    Redirect particles with zero cardinality toward the particle whose cluster has the largest cardinality,
11:    Apply substitution with rate ε,
12:    if f(Y_m, X_m^best) does not improve after δ_reset iterations then
13:      Reinitialize the subswarm (randomize(X_m), V_m = 0)
14:    end if
15:  end for
16: until convergence or maximum iteration reached
17: return X^best = {X1^best, X2^best, . . ., XM^best}.
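For illustration, one iteration of the position/velocity update (Eqs. (22)–(25)) for a single subswarm might look as follows in numpy. The matrix orientations (rows are particles) and the sigma handling are our own conventions, a sketch rather than the authors' implementation:

```python
import numpy as np

def rce_update(X, V, Y, X_best, U, Q, eps=0.03, sigma=None, rng=None):
    """One Swarm RCE velocity/position step for a subswarm.

    X, V   : K x dim particle positions and velocities
    Y      : n x dim sampled data memory Y_m
    U, Q   : K x n and K x B membership matrices (rows: particles),
             B being the number of swarm-best centroids in X_best
    """
    rng = np.random.default_rng(rng)
    K, dim = X.shape
    phi1 = rng.random((K, dim))
    phi2 = rng.random((K, dim))
    # Self-organizing term: pull toward the weighted mean of assigned data.
    self_org = (U @ Y) / (U.sum(axis=1, keepdims=True) + 1e-12) - X
    # Best-position term: pull toward the weighted mean of best centroids.
    best_term = (Q @ X_best) / (Q.sum(axis=1, keepdims=True) + 1e-12) - X
    V = V + phi1 * self_org + phi2 * best_term     # Eqs. (22), (24)
    X = X + V                                      # Eq. (23)
    # Substitution (Eq. (25)): occasionally teleport a particle near the
    # winner (the particle whose cluster has the largest cardinality).
    if sigma is not None:
        winner = X[np.argmax(U.sum(axis=1))]
        for k in range(K):
            if rng.random() < eps:
                X[k] = winner + rng.normal(0.0, sigma, size=dim)
                V[k] = 0.0
    return X, V
```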
4.5. Self Evolving Swarm RCE

In this implementation we introduce a new self-evolution criterion to the RCE which allows each subswarm to summon additional particles at will until the target cluster entropy is satisfied.

The uncertainty of a fuzzy membership value $u_{ik} \in [0, 1]$ [33] can be quantified as follows,
$h_{ik,m} = -u_{ik,m} \log u_{ik,m}$.   (27)

Bezdek argues that a good clustering can be achieved when $h_{ik,m}$ is minimized [33]. The average cluster entropy is then,
$H_m = -\frac{1}{K_m |Y_m|} \sum_{k=1}^{K_m} \sum_{i=1}^{|Y_m|} u_{ik,m} \log u_{ik,m}$,   (28)
where $U_m$ is calculated from $X_m^{best}$. $H_m$ close to 0.5 indicates possible underpartitioning, while $H_m$ very close to 0 may indicate overpartitioning.
$H_m$ is only investigated when there is an update to $X_m^{best}$ in which the number of non-empty clusters is equal to $K_m$, such that $|C_m^{best}| = K_m$. If $H_m$ is larger than the target entropy $\tau_m$, the number of particles is incremented using the following rule,
$K_m(t+1) = \begin{cases} K_m(t) + z_r^+ & \text{if } H_m > \tau_m, \\ K_m(t) & \text{otherwise,} \end{cases}$   (29)

where $K_m(t)$ denotes the number of particles in subswarm $m$ at the current iteration $t$, $z_r^+$ denotes an upper-bounded random integer, $z_r^+ \in \mathbb{Z}^+ = [1, 2, \ldots, z_{max}^+]$, while $\tau_m \in (0, 0.5)$ denotes the target $H_m$. Using this approach each subswarm automatically adjusts $K_m$ until the entropy criterion is satisfied.

The desired granularity and diversity of the swarm can be controlled by setting or randomizing the value of $\tau_m$. The growth speed of the swarm can be controlled by setting $z_r^+$. As the subswarms infer $K_m$ automatically from $H_m$, the need to specify a randomization interval is abolished (recall that in EAC and WEAC K-means, $K_m$ is randomized within a pre-specified upper and lower bound).
The pseudocode of the Self-Evolving Swarm RCEr+ (SE-SRCE) can be seen in Algorithm 3. A typical summary of an execution of SE-SRCE can be seen in Fig. 12.
Algorithm 3. Self-Evolving Swarm RCEr+ (SE-SRCE)

Input: Data points Y = {y1, . . ., yN} ∈ R^dim, # of clusters K.
Output: Swarm centroid vectors X^best = {X1^best, X2^best, . . ., XM^best} ∈ R^dim.
1: Initialize the swarm (randomize(X_{1,...,M}), V_{1,...,M} = 0).
2: For each subswarm m, randomly sample Y and store it in the memory Y_m = randsample(Y, θ%).
3: repeat
4:   for all m ∈ {1, . . ., M} do
5:     Execute Algorithm 2, lines 5–14,
6:     if f(Y_m, X_m) improves then
7:       // Check whether the entropy criterion is satisfied and whether all clusters are nonempty
8:       if |C_m^best| = K_m and H_m > τ_m then
9:         K_m ← K_m + z_r^+
10:      end if
11:    end if
12:  end for
13: until convergence or maximum iteration reached
14: return X^best = {X1^best, X2^best, . . ., XM^best}.
4.6. Ensemble Rapid Centroid Estimation using a Self-Evolving Swarm

Ensemble RCE (ERCE) [18] is an ensemble extension of the Swarm RCEr+. The algorithm has been shown to have relatively leaner complexity compared to conventional ensemble clustering algorithms, achieving up to quasilinear complexity in both time and space [18].

In this application we propose incorporating the proposed SE-SRCE into the ERCE framework. As the size of the evidence accumulation matrix is still relatively manageable (recall that since there are 320 features = 160 magnitude features + 160 spectral centroid features, the size of C is 320 × 320), EAC can be performed without the co-association tree compression process proposed in the original paper [18,34]. However, it should be noted that should the number of features increase to the thousands, it is advisable that the co-association tree compression be utilized. Further information on the co-association tree can be read in Wang's paper [34].

In order to interpret the final clustering, we need to clarify that in our application each cluster represents "a group of more redundant features". For each feature cluster, the feature with the largest entropy is selected as the characteristic feature for the cluster. The pseudocode of ERCE used in our application is shown in Algorithm 4.
Algorithm 4. Ensemble Rapid Centroid Estimation (ERCE)

Input: dim × N data matrix Y, number of subswarms M, fuzzification constant ψ, target entropy for each subswarm {τ1, . . ., τM}, linkage algorithm Linkage.
Output: Crisp ensemble partition L.
1: X^best ← SE-SRCE(Y)
2: for all m ∈ {1, . . ., M} do
3:   Given Y and X_m^best, calculate U_m using Eq. (17).
4:   // Calculate the co-association matrix for each clustering.
5:   C_m ← U_m^T U_m
6:   C_m ← D_m^(−1/2) C_m D_m^(−1/2)
7: end for
8: // Calculate the consensus matrix.
9: C ← (Σ_{m=1}^{M} w_m C_m) / (Σ_{m=1}^{M} w_m)
10: HierarchicalTree ← Linkage(C)
11: th ← MaximumLifetime(HierarchicalTree)
12: L ← Cut(HierarchicalTree, th)
13: // Interpret the final partition.
14: for all C_k ∈ {C_1, . . ., C_Lmax} do
15:   // For each feature cluster, the characteristic feature is the feature with the highest entropy.
16:   y_k^characteristic ← argmax_{y ∈ C_k} −∫ p_y(x) log p_y(x) dx
17: end for
5. Experimental data

The ASHRAE Project 1312-RP modeled and reported a wide variety of faults in three different seasons. The experiments included two HVAC systems running side by side with identical zone loads. Fault tests were conducted in Air Handling Unit (AHU)-A, while normal operation was maintained in AHU-B. By comparing AHU-A and AHU-B, fault characteristics were recorded. The ASHRAE-1312-RP datasets included detailed experimental results from Summer 2007, Spring 2008, and Winter 2008. In each season different types of faults were generated, recorded and reported. Readings from 160 signal sources during normal operation and various fault scenarios were recorded. The data was sampled every minute from 6:00 to 18:00. The faults reported in the ASHRAE-1312-RP datasets, as well as a summary of the behavior of the features proposed by Li [20], are described in Table 1. Note that the features used in this table are not part of our research but rather illustrate how a static model would struggle across varying seasons: features that are important in one season may not be as important in other seasons. The features that we use throughout the paper are determined dynamically using consensus clustering based on the unique behavior in each season.
6. Results

Based on the features in Table 1, we can see that faults such as OASB, MADU and HCSF are particularly difficult to identify using Li's model [20]. In this section we present the experimental results of our proposed unsupervised feature selection method. In particular, we wish to investigate the following:

1. What the characteristic features for each season are, and
2. Whether the selected features improve the generalization capability of an AFDD algorithm in general. In particular, we are interested in whether we can reliably identify OASB, MADU, and HCSF using the features selected by our proposed method.
Our approach is as follows. From each dataset (Summer 2007, Spring 2008, and Winter 2008), as many as 160 time signals and a vector recording the time of the day were reported. Using the method described in Section 3.1, as many as 320 + 1 additional features could be extracted, including:

• Magnitude features from the 160 sensor and control signals,
• Spectral centroid features from the 160 sensor and control signals,
• Time of the day (1 feature).
For clarity, the step-by-step process of the experiment can be summarized as follows:

1. Select a season and get the raw signals during normal operations.
2. For each raw signal, isolate the magnitude and spectral centroid components and calculate the fuzzy feature representation using the method described in Section 3.
3. Find the characteristic features using a consensus clustering algorithm (our approach uses ERCE: Algorithm 4).
4. Append the time-of-the-day feature as an additional feature.
5. Using the selected features, train a model (our approach uses NARX-TDNN) using the data in Table 1. For each type of fault, randomly partition the data as follows (a sketch of this split is given after this list):
   • 15% as training set,
   • 15% as validation set, and
   • 70% as test set.
6. Investigate the results on the test set to see whether using the selected features increases/decreases the classifier's generalization capability.
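The random 15/15/70 partition in step 5 can be sketched as follows; the helper name and its defaults are hypothetical, not taken from the paper:

```python
import numpy as np

def split_indices(n, train=0.15, val=0.15, rng=None):
    """Random 15/15/70 train/validation/test partition, applied per fault
    type. Returns index arrays into the n available samples."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n)
    n_tr = int(round(train * n))
    n_va = int(round(val * n))
    return perm[:n_tr], perm[n_tr:n_tr + n_va], perm[n_tr + n_va:]
```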
6.1. Feature selection result
We wish to keep the number characteristic feature to a reason-able level (e.g. between 4 and 30) to ensure that the generalizationcapability of the classifier is not undermined. The parameters ofboth ERCE, EAC K-means, and WEAC K-means were selected basedon the assumption derived using the method illustrated in Fig. 12.From the average entropy-distortion scatter for each season suchas depicted in Fig. 12, we approximated the number of character-istic features to be around 5–30 or the average cluster entropy of0.005–0.05.
The parameters used for ERCE were as follows. The initial num-ber of particles was set to 2, the number of subswarms was set to60, substitution probability ε was set to 3%, ıreset was set to 15, thedistance metric was set to KL-divergence, fuzzifier � was set to 1.2,the entropy threshold for each subswarm �m was uniformly ran-domized between 0.005 and 0.05, z+max = 2, maximum number ofiterations was set to 100, and the linkage method was set to com-plete linkage. KL-divergence and complete linkage were selectedas the physical model of the HVAC was assumed to be unknownand even a subtle difference in temporal patterns/shapes could bean important predictive component for specific types of fault. Com-plete linkage favors the formation of small spherical clusters whichis particularly useful for capturing these subtle differences. Opti-mum cut was then conventionally calculated using the maximumlifetime criterion [25]. Subswarms were equally weighted duringensemble aggregation such that w1,...,M = 1.
Further investigation was also performed in order to benchmark the quality of the features selected by the method. The benchmark unsupervised feature selection methods include EAC K-means [25], WEAC K-means [31], and traditional complete linkage agglomerative clustering (CL). CL was utilized to verify the advantages of the consensus approaches over a conventional graph-based approach. In this experiment, the CL hierarchical tree was cut using the inconsistency criterion with an inconsistency coefficient of 1, returning as many as 84 clusters, and thus 84 characteristic features.
The parameters for EAC K-means and WEAC K-means were set as follows. The number of repetitions was set to 60, and the number of clusters k was uniformly randomized between 5 and 30. The distance metric was set to KL-divergence. The linkage method was set to complete linkage as per the discussion above. The optimum cut was calculated using the maximum lifetime criterion [25] (see the sketch below). Weights for WEAC K-means were calculated using the average silhouette width criterion [35].
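Both the consensus baselines and ERCE cut the resulting dendrogram with the maximum lifetime criterion of Fred and Jain [25], i.e. at the largest gap between consecutive merge heights. A sketch using SciPy, assuming a precomputed condensed distance vector (this is our reading of the criterion, not the authors' code):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def max_lifetime_cut(condensed_dist):
    """Cut a complete-linkage dendrogram at the level with the longest
    'lifetime', i.e. the largest gap between consecutive merge heights."""
    Z = linkage(condensed_dist, method='complete')
    heights = Z[:, 2]                      # merge heights, monotone for complete linkage
    gaps = np.diff(heights)
    i = int(np.argmax(gaps))               # widest gap lies between merges i and i+1
    threshold = (heights[i] + heights[i + 1]) / 2.0
    return fcluster(Z, t=threshold, criterion='distance')
```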
Table 1
ASHRAE-1312-RP dataset description and symptoms using the features described in Shun Li's model [20]. The 16 symptom entries per row correspond, in order, to: HWC-VLV, P-E-hcoil, CHWC-VLV, P-E-ccoil, SF-SPD, P-E-SF, RF-SPD, P-E-RF, P-SA-CFM, P-RA-CFM, P-OA-CFM, SA-TEMP, MA-TEMP, RA-TEMP, HWC-DAT, CHWC-DAT. The legend appears at the end of the table.

Summer 2007
 1  NOR0819   Normal Operation
 2  NOR0825   Normal Operation
 3  EADS0820  EA Damper Stuck (Fully Open)                      0 0 0 0 + + + + 0 + + 0 0 0 0 0
 4  EADS0821  EA Damper Stuck (Fully Close)                     0 0 0 0 − − − − 0 − − 0 0 0 0 0
 5  RFF0822   Return Fan at fixed speed (30% speed)             0 0 0 0 ++ ++ −− −− 0 −− ++ 0 0 0 0 0
 6  RFF0823   Return Fan complete failure                       0 0 0 0 ++ ++ −− −− 0 −− ++ 0 0 0 0 0
 7  CHWC0824  Cooling Coil Valve Control unstable (reduce PID proportional band by half)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 8  CHWC0903  Cooling Coil Valve Reverse Action                 ++ ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
 9  OADS0826  OA Damper Stuck (Fully Closed)                    0 0 0 0 ++ ++ ++ ++ 0 + − 0 0 0 0 0
10  CHWV0827  Cooling Coil Valve Stuck (Fully Closed)           0 0 −− −− ++ ++ ++ ++ ++ ++ ++ ++ + + + ++
11  CHWV0831  Cooling Coil Valve Stuck (Fully Open)             ++ ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
12  CHWV0901  Cooling Coil Valve Stuck (Partially Open – 15%)   0 0 −− −− ++ ++ ++ ++ ++ ++ ++ ++ + + + ++
13  CHWV0902  Cooling Coil Valve Stuck (Partially Open – 65%)   ++ ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
14  HCL0828   Heating Coil Valve Leaking (Stage 1 – 0.4 GPM)    0 ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
15  HCL0829   Heating Coil Valve Leaking (Stage 2 – 1.0 GPM)    0 ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
16  HCL0830   Heating Coil Valve Leaking (Stage 3 – 2.0 GPM)    0 ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
17  OADL0905  OA Damper Leaking (45% Open)                      0 0 0 0 − − − − 0 0 ++ 0 0 0 0 0
18  OADL0906  OA Damper Leaking (55% Open)                      0 0 0 0 − − − − 0 0 ++ 0 0 0 0 0
19  AHUL0907  AHU Duct Leaking (after SF)                       0 0 + + + + + + + + + 0 0 0 0 0
20  AHUL0908  AHU Duct Leaking (before SF)                      0 0 0 0 −− −− −− −− 0 −− −− 0 0 0 0 0
Table 1 (continued)

Spring 2008
 1  NOR0502   Normal Operation
 2  NOR0503   Normal Operation
 3  NOR0504   Normal Operation
 4  NOR0505   Normal Operation
 5  NOR0509   Normal Operation
 6  OASB0529  OA temperature sensor bias (+3F)                  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 7  OASB0530  OA temperature sensor bias (−3F)                  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 8  OADS0507  OA Damper Stuck (Fully Close)                     0 0 0 0 + + + + − + −− 0 0 0 0 0
 9  OADS0508  OA Damper Stuck (40% open)                        0 0 0 0 + + + + − + −− 0 0 0 0 0
10  EADS0527  EA Damper Stuck (Fully open)                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11  EADS0510  EA Damper Stuck (Fully Close)                     0 0 0 0 0 0 0 − 0 − 0 0 0 0 0 0
12  EADS0511  EA Damper Stuck (40% open)                        0 0 0 0 0 0 0 − 0 − 0 0 0 0 0 0
13  CHW0506   Cooling Coil Valve Stuck (Fully Closed)           0 0 −− −− ++ ++ ++ ++ ++ ++ ++ ++ + + + ++
14  CHW0515   Cooling Coil Valve Stuck (Fully Open)             ++ ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
15  CHW0516   Cooling Coil Valve Stuck (Partially Open – 50%)   ++ ++ ++ ++ 0 0 0 0 0 0 0 0 0 0 ++ 0
16  RFF0512   Return Fan complete failure                       0 0 0 0 0 0 −− −− 0 −− 0 0 0 0 0 0
17  RFF0518   Return Fan at fixed speed (20% speed)             0 0 0 0 0 0 −− −− 0 −− 0 0 0 0 0 0
18  RFF0519   Return Fan at fixed speed (80% speed)             0 0 0 0 0 0 ++ ++ 0 ++ 0 0 0 0 0 0
19  AFAB0522  Air filter area block fault (10%)                 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20  AFAB0525  Air filter area block fault (25%)                 0 0 0 0 + + + + 0 0 0 0 0 0 0 0
21  MADU0513  Mixed air damper unstable                         0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22  MADU0514  Mixed air damper unstable / Cooling coil control unstable   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23  HCSF0517  Sequence of heating and cooling unstable          0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24  HCSF0601  Supply Fan control unstable                       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Table 1 (continued)

Winter 2008
 1  NOR0129   Normal Operation
 2  NOR0216   Normal Operation
 3  NOR0217   Normal Operation
 4  OADS0212  OA Damper Stuck (Fully Close)                     −− −− 0 0 ++ + ++ + −− ++ −− 0 − 0 0 0
 5  OADL0213  OA damper leaking (52% open)                      0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0
 6  OADL0215  OA damper leaking (62% open)                      0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0
 7  EADS0202  EA Damper Stuck (Fully open)                      0 0 0 0 0 0 0 0 0 + + 0 0 0 0 0
 8  EADS0203  EA Damper Stuck (Fully Close)                     − −− 0 0 0 0 0 −− 0 −− −− 0 0 0 0 0
 9  CHW0210   Cooling Coil Valve Stuck (Fully Open)             ++ ++ ++ ++ 0 0 0 0 0 0 0 − 0 0 ++ −
10  CHW0211   Cooling Coil Valve Stuck (Partially Open – 20%)   + + + + 0 0 0 0 0 0 0 0 0 0 ++ 0
11  HCF0205   Heating Coil Fouling Stage 1                      0 −− 0 0 + + + + 0 + − 0 0 0 0 0
12  HCF0206   Heating Coil Fouling Stage 2                      0 −− 0 0 + + + + 0 + − 0 0 0 0 0
13  HCRC0207  Heating coil reduced capacity Stage 1             + − 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14  HCRC0208  Heating coil reduced capacity Stage 2             + − 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15  HCRC0209  Heating coil reduced capacity Stage 3             + − 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Legend: 0 = unchanged (the fault has no effect on the corresponding variable); + = greater than normal; ++ = substantially greater than normal; − = less than normal; −− = substantially less than normal.
[Fig. 12 near here. Top row: average cluster entropy, number of clusters, and average distortion versus iteration. Bottom: cluster entropy versus number of clusters, average distortion versus number of clusters, average distortion versus cluster entropy, and the combined 3-D scatter.]

Fig. 12. The scatter plot of the average distortion with respect to cluster entropy and the number of clusters extracted after a run of SE-SRCE with fuzzifier 1.2. The top graphs show the cross-sectional plots of the three parameters during optimization of SE-SRCE, leading to the creation of the bottom scatter plot. The appropriate entropy/K range can be investigated by observing the Km, Hm, and f(Ym, X) trade-offs so that both distortion and entropy can be minimized while keeping the number of clusters at a reasonable level.
We measured the appropriateness of the feature selection method by investigating the normalized mutual information (NMI) between features [26]. Mutual information examines the dependence between two discrete distributions X and Y. Minimizing mutual information is equal to maximizing the KL-divergence between the cross-entropy H(X, Y) and the marginal entropies (H(X) and H(Y)) as follows,

\[
\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}}
 = \frac{H(X)+H(Y)-H(X,Y)}{\sqrt{H(X)\,H(Y)}}
 = \frac{\sum_{x\in X}\sum_{y\in Y} p(x,y)\,\log\!\big(p(x,y)/(p(x)\,p(y))\big)}
        {\sqrt{\big(\sum_{x\in X} p(x)\log p(x)\big)\big(\sum_{y\in Y} p(y)\log p(y)\big)}}, \tag{30}
\]

where X and Y in our case were a pair of fuzzy feature signals (y1 and y2, calculated using Eq. (5)), rounded to the nearest integer, such that

\[ X(n) = \mathrm{round}(y_1(n)), \quad X(n) \in \{-1, 0, 1\}, \tag{31} \]

and

\[ Y(n) = \mathrm{round}(y_2(n)), \quad Y(n) \in \{-1, 0, 1\}. \tag{32} \]

The NMI is calculated by marginalizing the probability of co-occurrence between these three discrete categories. For a pair of signals, an NMI closer to 1 indicates that the feature pair is redundant. For each feature set, the strictly upper triangle of the pairwise NMI matrix is taken, and the median, 75th percentile, and 95th percentile are averaged over 80 runs. Since we want to minimize redundancies between features, a good feature set is characterized by an average NMI closer to 0. Table 2 summarizes the result of the experiment.

Table 2
The normalized mutual information (NMI) between features selected using various feature selection algorithms on the Spring 2008 dataset. Boldface indicates the lowest NMI (the least redundancy between features).

                 Without feature selection   Manual selection [20]   CL
# of features    320                         16                      84
Median NMI       0.1857                      0.0199                  0.1305
Q75% NMI         0.4110                      0.3014                  0.2227
Q95% NMI         0.8821                      0.4899                  0.4863

                 EAC K-means                 WEAC K-means            ERCE
# of features    15.90 ± 3.86                16.70 ± 4.73            17.20 ± 1.60
Median NMI       0.040 ± 0.011               0.048 ± 0.034           0.019 ± 0.004
Q75% NMI         0.106 ± 0.025               0.131 ± 0.068           0.078 ± 0.013
Q95% NMI         0.404 ± 0.035               0.364 ± 1.600           0.339 ± 0.031
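The redundancy statistics in Table 2 can be approximated with scikit-learn, whose 'geometric' averaging matches the sqrt(H(X)H(Y)) normalization of Eq. (30); the helper below is illustrative, not the authors' code:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def redundancy_stats(Y):
    """Y: (n_samples, n_features) fuzzy feature matrix with values in [-1, 1].
    Rounds each feature to {-1, 0, 1} (Eqs. (31) and (32)) and summarizes the
    strictly upper-triangular pairwise NMI matrix of Eq. (30)."""
    X = np.rint(Y).astype(int)
    d = X.shape[1]
    nmi = np.array([normalized_mutual_info_score(X[:, i], X[:, j],
                                                 average_method='geometric')
                    for i in range(d) for j in range(i + 1, d)])
    return np.median(nmi), np.percentile(nmi, 75), np.percentile(nmi, 95)
```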
The characteristic features in each season were unique from those of the other seasons. In order to analyze the important features for each season, we repeated the clustering process 200 times. From this process, three histograms describing the probability of occurrence of the characteristic features for each season were reported in Fig. 13. The probability of occurrence was calculated as the frequency of appearance divided by the number of trials.
The overall patterns of the fault classes for each season, based on the characteristic features, are presented in Figs. 14–16, respectively. Each circle in these figures shows the condition of the characteristic features during a specific fault in the HVAC system.
6.2. Classification result
The generalization capability of a classifier is a powerful indicator of the quality of the features. Using the characteristic features selected by the proposed method, a classifier can be trained with less computational burden and a lower probability of overfitting (note that in our experiment, 30% of the data was equally divided into training and validation sets; the remaining 70% was used as the test set). The classifiers were trained and tested using the fuzzy features, ys, as shown in Figs. 14–16.
The parameters for NARX-TDNN were set as follows. The number of hidden neurons was set to 10. The input layer, hidden layer, and feedback orders were set to 2. The architecture is illustrated in Fig. 8. The dataset was divided at random into training (15%), validation (15%), and test (70%) sets. The training was done using the Levenberg–Marquardt algorithm. The experiment was repeated 80 times for each season to test the reliability and repeatability of the method. Using the features shown in Figs. 14–16, the average sensitivity and specificity of the proposed method compared to Li's manual feature selection approach is presented in Table 3.
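The paper trains NARX-TDNN with Levenberg–Marquardt; as an illustrative stand-in only (scikit-learn has no LM solver, so L-BFGS is substituted here, and the helper name is ours), the series-parallel regressor construction with delay order 2 can be sketched as:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def narx_regressors(U, y, d=2):
    """Series-parallel (open-loop) NARX design matrix: d delayed copies of the
    exogenous feature matrix U (n_samples x n_features) and of the labels y."""
    X = [np.concatenate([U[n - d:n].ravel(), y[n - d:n]])
         for n in range(d, len(y))]
    return np.array(X), y[d:]

# X, t = narx_regressors(fuzzy_features, fault_labels, d=2)
# clf = MLPClassifier(hidden_layer_sizes=(10,), solver='lbfgs').fit(X, t)
```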
The quality of the feature sets selected by ERCE was benchmarked against the features selected by EAC K-means, WEAC K-means, and Complete Linkage. The features selected by these four competing algorithms were supplied to both NARX-TDNN and Hidden Markov Models (HMM) [11–13], where the training and testing for both classifiers were repeated 100 times for each pair of feature selection and classification algorithms. The weighted average (WA) sensitivity and WA specificity results are reported in Table 4.
The significance of the experimental results was validated using paired t-tests with the following null hypotheses:

1. H*0: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X. A star (*) in Tables 3 and 4 indicates that H*0 should be rejected, whereas no sign indicates otherwise.
2. H†0: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B. A dagger (†) in Table 4 indicates that H†0 should be rejected, whereas no sign indicates otherwise.
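A sketch of how such a one-sided paired t-test could be run over per-repetition specificities (the helper and the one-sided conversion are ours; the paper only states the hypotheses and the 0.001 significance level):

```python
from scipy.stats import ttest_rel

def reject_h0_star(spec_erce, spec_other, alpha=0.001):
    """One-sided paired t-test for H0*: per-run specificity with ERCE features
    is not better than with the competing feature set."""
    t, p_two_sided = ttest_rel(spec_erce, spec_other)
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    return p_one_sided < alpha   # True -> reject H0* (marked '*' in the tables)
```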
7. Discussion
As the proposed feature selection process is strictly unsupervised, analyzing the results leads to a number of interesting observations.
With regard to the redundancies between features, it can be seen in Table 2 that all consensus algorithms (Median NMI_ERCE = 0.019, Median NMI_EAC K-means = 0.040, Median NMI_WEAC K-means = 0.048) in general outperformed CL (Median NMI = 0.1305), manual selection (Median NMI = 0.0199, Q75% NMI = 0.3014), and no selection (Median NMI = 0.1857). The three consensus algorithms reported fewer than 20 characteristic features on average, which is at least four times lower than the number of characteristic features selected using CL. Furthermore, the features selected by ERCE (Median NMI = 0.019 ± 0.004) outperformed those selected by the other consensus algorithms, EAC K-means (Median NMI = 0.040 ± 0.011) and WEAC K-means (Median NMI = 0.048 ± 0.034), as indicated by its low NMI. ERCE also had smaller standard deviations on all performance aspects, especially on the number of features, suggesting the relatively high reliability and repeatability of the proposed swarm-based consensus clustering algorithm.
With regard to the reliability of the feature selection algorithm, ERCE consistently selected features that are unique and relevant to the faults in the corresponding season, as can be seen in Fig. 13. For example, throughout the experiment using the Winter 2008 dataset, ERCE consistently selected HWC-VLV, PLN-TMP, EA-DMPR, HWC-DAT and HWP-GPM, which are among the important features for that specific season. The pattern for the Winter 2008 dataset is shown in Fig. 16. In this figure, the pattern for the Exhaust Air Damper Stuck (EADS) faults can be easily distinguished from the others by observing the conditions of both EA-DMPR and PLN-TMP. Similarly, HCRC faults in this season are characterized by abnormal HWC-VLV and VAV-DMPR signals. CHW faults are also observable from an increase in HWC-DAT as the system compensates for the increased flow of chilled water due to the faulty cooling coil valve. ERCE also appropriately discovers that SC CHWC-GPM is a particularly important feature in Spring 2008 due to HCSF0517, as discussed previously in Section 3. ERCE discovers that the outside air damper (OA-DMPR) is consistently inside the atypical negative region during HCSF faults. This information may be useful for further investigation of the nature of this particular fault.

[Fig. 13 near here.]

Fig. 13. Representative feature occurrence histograms for each season after 200 clustering trials. The x-axis denotes the specific label for each feature; the y-axis denotes the probability of occurrence, calculated as the frequency of appearance divided by the number of trials.
Regarding the effects of the proposed feature selection algorithm on classifier performance, the results of ERCE + NARX-TDNN, particularly in Spring 2008, show a clear advantage of ERCE over the other feature selection approaches. As can be seen in Table 3, when compared to the manually selected features suggested by Li [20], supplying NARX-TDNN with the features selected by ERCE results in consistent specificity improvements in Spring 2008. Moreover, overall statistically significant weighted average performance improvements are also observed throughout Summer 2007, Spring 2008, and Winter 2008 in our experiment. Based on the statistical results in Table 4, using features from Li and from EAC K-means limits NARX-TDNN's specificity to averages of around 91.54% and 91.85%, respectively. The low averages may be attributed to misclassification of a number of the more ambiguous faults such as OASB, MADU, AFAB and HCSF. This is consistent with Li's observation, presented in Table 1, where these faults appear to have no effect on the manually selected features. Similar cases are seen with WEAC K-means and complete linkage. Using features from ERCE allows NARX-TDNN to reach a significantly higher average specificity of 98.37% ± 0.25%. The significance of the results is statistically validated on both the Summer 2007 and Spring 2008 datasets, where the signals exhibit more nonlinearities than those in the Winter 2008 dataset.

Regarding the general performance of the classifiers, the results in Table 4 show the comparative performance between HMM and NARX-TDNN. While HMM shows superior specificity on the Winter 2008 dataset, its specificity in Spring 2008 and Summer 2007 is relatively not as high. This is arguably due to the nonlinearities in the fault patterns in the Spring 2008 and Summer 2007 datasets compared to the Winter 2008 faults. For instance, it can be seen in Fig. 15 that the MADU, AFAB and HCSF faults exhibit visually ambiguous patterns. When dealing with these nonlinear datasets, the NARX-TDNN classifier benefits from its capability in dealing with long-term dependencies. Table 4 shows that NARX-TDNN was capable of distinguishing these faults, achieving a specificity of 98.37% ± 0.25% using the features provided by ERCE.
[Fig. 14 near here. Twenty-one-axis circular plots, one per Summer 2007 dataset (NOR0819/NOR0825 through AHUL0908), each axis scaled from −1.0 to 1.0.]

Fig. 14. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Summer 2007 dataset.
[Fig. 15 near here. Nineteen-axis circular plots, one per Spring 2008 dataset (NOR0502–NOR0509 through HCSF0601), each axis scaled from −1.0 to 1.0.]

Fig. 15. Patterns constituted by the characteristic features for each data in the ASHRAE-1312-RP Spring 2008 dataset.
[Fig. 16 near here. Seven-axis circular plots, one per Winter 2008 dataset (NOR0129–NOR0217 through HCRC0209), each axis scaled from −1.0 to 1.0.]

Fig. 16. Patterns constituted by the characteristic features for each data in the ASHRAE-1312 Winter 2008 dataset.
Table 3
NARX-TDNN classification results. For each season, sensitivity and specificity (mean ± s.d.) are given for manual selection (a) and ERCE (b).

Summer 2007
Fault type          Manual (a) Sensitivity   Manual (a) Specificity   ERCE (b) Sensitivity   ERCE (b) Specificity
NOR                 99.9% ± 0.1%             98.1% ± 1.6%             99.9% ± 0.2%           99.0% ± 2.1%
EADS                99.7% ± 0.5%             99.5% ± 2.7%             99.8% ± 0.3%           98.9% ± 2.5%
RFF                 99.9% ± 0.0%             99.0% ± 2.7%             99.9% ± 0.1%           99.5% ± 1.4%
CHWC                99.9% ± 0.2%             99.0% ± 1.1%             99.8% ± 0.2%           99.0% ± 4.4%
OADS                99.9% ± 0.2%             98.0% ± 2.2%             99.9% ± 0.3%           97.3% ± 3.1%
CHWV                99.8% ± 0.3%             99.0% ± 4.3%             99.7% ± 0.9%           99.2% ± 2.5%
HCL                 99.7% ± 0.4%             98.0% ± 1.0%             99.7% ± 0.3%           98.4% ± 2.4%
OADL                99.7% ± 0.5%             *95.2% ± 7.1%            99.9% ± 0.2%           98.0% ± 1.2%
AHUL                99.8% ± 0.2%             99.8% ± 1.1%             99.9% ± 0.1%           99.5% ± 2.6%
Weighted average    99.8% ± 0.1%             *96.8% ± 2.2%            99.8% ± 0.1%           98.4% ± 0.7%

Spring 2008
NOR                 99.8% ± 0.3%             99.3% ± 2.1%             99.9% ± 0.1%           99.6% ± 0.6%
OASB                99.1% ± 1.5%             *95.0% ± 6.1%            99.7% ± 0.3%           99.5% ± 1.4%
OADS                99.9% ± 0.2%             *98.2% ± 1.7%            99.8% ± 0.1%           99.5% ± 0.9%
EADS                99.9% ± 0.1%             *98.3% ± 0.5%            99.9% ± 0.1%           99.0% ± 2.8%
CHW                 99.7% ± 0.4%             *98.7% ± 0.8%            99.8% ± 0.2%           99.3% ± 0.7%
RFF                 99.9% ± 0.2%             *82.6% ± 33.1%           99.8% ± 0.1%           99.4% ± 0.7%
AFAB                99.7% ± 0.2%             *42.9% ± 17.8%           99.7% ± 0.2%           98.5% ± 4.9%
MADU                98.6% ± 1.6%             *70.4% ± 39.8%           98.9% ± 0.2%           98.0% ± 4.0%
HCSF                99.6% ± 0.6%             *94.7% ± 6.6%            99.9% ± 0.0%           99.5% ± 1.5%
Weighted average    98.9% ± 0.2%             *86.2% ± 5.0%            99.9% ± 0.1%           99.2% ± 0.5%

Winter 2008
NOR                 99.6% ± 0.4%             99.3% ± 1.1%             99.8% ± 0.1%           98.3% ± 2.4%
OADS                99.9% ± 0.1%             *95.6% ± 3.8%            99.8% ± 0.2%           98.7% ± 1.4%
OADL                99.8% ± 0.4%             98.5% ± 3.2%             99.5% ± 0.7%           98.5% ± 1.5%
EADS                99.9% ± 0.4%             97.9% ± 1.3%             99.6% ± 0.3%           97.5% ± 2.5%
CHW                 99.8% ± 0.4%             *97.5% ± 5.2%            99.6% ± 0.3%           99.1% ± 1.2%
HCF                 99.8% ± 0.4%             *95.1% ± 4.5%            99.2% ± 0.7%           97.2% ± 2.9%
HCRC                99.8% ± 0.4%             99.0% ± 2.2%             99.8% ± 0.3%           99.4% ± 1.1%
Weighted average    99.7% ± 0.2%             97.5% ± 0.7%             99.8% ± 0.1%           98.7% ± 0.7%

H*0: The performance of NARX-TDNN using features from ERCE is not significantly better than using manually selected features.
(a) Manual selection utilizes Shun Li's feature set [20].
(b) ERCE features are as shown in Figs. 14–16.
* Reject H*0 (α = 0.001).
Table 4
Performance comparison with competing feature selection methods, tested against two classification methods: NARX-TDNN and HMM. WA = weighted average.

Summer 2007
Feature selection     # of features    HMM WA sensitivity   HMM WA specificity   NARX-TDNN WA sensitivity   NARX-TDNN WA specificity
Manual selection (a)  16 ± 0.00        *98.65% ± 0.34%      89.45% ± 2.48%       †99.59% ± 0.12%            †96.81% ± 1.99%
EAC K-means           29.85 ± 17.26    *98.70% ± 0.50%      *85.01% ± 4.94%      †99.69% ± 0.22%            *,†95.07% ± 3.75%
WEAC K-means          14.14 ± 13.09    *97.69% ± 0.13%      *72.85% ± 1.48%      †99.79% ± 0.08%            *,†96.85% ± 2.31%
Complete linkage      81.00 ± 0.00     98.71% ± 0.98%       90.49% ± 7.52%       †99.51% ± 0.27%            †96.42% ± 1.16%
ERCE                  21.41 ± 4.46     99.15% ± 0.32%       90.85% ± 4.16%       †99.69% ± 0.08%            †97.61% ± 0.85%

Spring 2008
Manual selection (a)  16 ± 0.00        98.90% ± 0.54%       †91.54% ± 2.98%      *98.89% ± 0.23%            *86.17% ± 5.01%
EAC K-means           34.56 ± 9.40     98.55% ± 0.42%       91.85% ± 2.68%       *,†99.02% ± 0.81%          *91.92% ± 6.42%
WEAC K-means          33.52 ± 10.32    98.83% ± 0.40%       93.37% ± 2.38%       †99.20% ± 0.49%            *92.37% ± 6.53%
Complete linkage      84 ± 0.00        98.80% ± 0.46%       94.12% ± 2.61%       †99.62% ± 0.17%            *95.14% ± 1.29%
ERCE                  19.93 ± 5.19     98.84% ± 0.32%       92.68% ± 2.66%       †99.79% ± 0.10%            †98.37% ± 0.25%

Winter 2008
Manual selection (a)  16 ± 0.00        98.81% ± 0.56%       *92.92% ± 0.31%      †99.71% ± 0.15%            †97.51% ± 0.65%
EAC K-means           27.74 ± 7.18     †99.98% ± 0.14%      †99.85% ± 0.85%      99.49% ± 0.50%             97.87% ± 2.06%
WEAC K-means          21.37 ± 11.75    †99.96% ± 0.18%      99.79% ± 1.00%       99.59% ± 0.19%             97.68% ± 0.88%
Complete linkage      95 ± 0.00        99.87% ± 0.40%       99.21% ± 2.37%       99.74% ± 0.13%             98.54% ± 1.01%
ERCE                  7.88 ± 3.02      99.92% ± 0.31%       99.49% ± 1.43%       99.73% ± 0.19%             98.35% ± 1.16%

H*0: The performance of a classifier using features from ERCE is not significantly better than using features from algorithm X. H†0: Given the same feature selection algorithm, a trained classifier A does not exercise significantly better performance compared to classifier B.
(a) Manual selection utilizes Shun Li's feature set [20].
* Reject H*0 (α = 0.001).
† Reject H†0 (α = 0.001).
8. Conclusion
A method for automating feature selection and classification of faults for Heating Ventilation and Air Conditioning (HVAC) systems using a knowledge-discovery and neural-network approach has been proposed. The core of the method is Ensemble Rapid Centroid Estimation (ERCE), which automatically finds characteristic features and discards redundant features. Using these characteristic features, a parallel Nonlinear Auto-Regressive Neural Network with eXogenous inputs and distributed time delays (NARX-TDNN) is then trained to identify the faults described in the ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008 datasets.
The performance of the proposed unsupervised feature selection algorithm (ERCE, Median NMI = 0.019 ± 0.004) generally outperformed the conventional consensus clustering algorithms, including Evidence Accumulation K-means (Median NMI = 0.040 ± 0.011) and Weighted Evidence Accumulation K-means (Median NMI = 0.048 ± 0.034), as well as conventional complete linkage clustering (Median NMI = 0.1305). ERCE also had smaller standard deviations on all performance aspects, especially on the number of features, suggesting the relatively high reliability and repeatability of the proposed swarm-based consensus clustering algorithm.

The proposed feature selection method was tested on the experimental fault data from the ASHRAE-1312-RP datasets, including Summer 2007, Spring 2008, and Winter 2008, using two well-established time-domain classifiers: (a) NARX-TDNN; and (b) Hidden Markov Models (HMM). Satisfactory results were reported and summarized. Our experimental results showed weighted average sensitivity and specificity of: (a) higher than 99% and 96% for NARX-TDNN; and (b) higher than 98% and 86% for HMM on the ASHRAE-1312-RP datasets. The proposed feature selection method appears to have a positive effect in improving the generalization capability of both AFDD algorithms based on our experiment.

Notwithstanding the satisfactory results to date, further work is necessary to investigate the performance of the proposed method on alternative HVAC systems. Future work will incorporate a semi-supervised adaptive learning capability for automatic fault discovery. We are also interested in applying the proposed consensus clustering method to other applications.
Acknowledgements
This research is funded by The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Marsfield, Australia. The ASHRAE-1312-RP Summer 2007, Spring 2008, and Winter 2008 fault data are provided by CSIRO. The research is supervised by CSIRO; the paper writing is supervised specifically by Guo. Automatic Fault Detection and Diagnosis (AFDD) for Heating Ventilation and Air Conditioning (HVAC) research is an ongoing project in CSIRO Energy Technology and Computational Informatics. We acknowledge the inputs of the anonymous reviewers for the time and effort in providing our paper comprehensive quality criticisms. The corresponding author would also like to personally acknowledge Nina Elita for her contribution, especially in proofreading and the provision of sincere moral support during the preparation, writing and submission of this paper.
References
[1] A. Kusiak, M. Li, F. Tang, Modeling and optimization of HVAC energy consumption, Appl. Energy 87 (2010) 3092–3102.
[2] A. Kusiak, F. Tang, G. Xu, Multi-objective optimization of HVAC system with an evolutionary computation algorithm, Energy 36 (2011) 2440–2449.
[3] J. Wall, Automatic Fault Detection and Diagnosis, 2011. http://www.csiro.au/Outcomes/Energy/building-fault-detection.aspx
[4] J. Ward, Opticool, 2013. http://www.csiro.au/Organisation-Structure/Flagships/Energy-Flagship/Opticool.aspx
[5] J. Liang, R. Du, Model-based fault detection and diagnosis of HVAC systems using support vector machine method, Int. J. Refrig. 30 (2007) 1104–1114.
[6] D. Jacob, S. Dietz, S. Komhard, C. Neumann, S. Herkel, Black-box models for fault detection and performance monitoring of buildings, J. Build. Perform. Simul. 3 (2010) 53–62.
[7] C. Lo, P. Chan, Y.-K. Wong, A.B. Rad, K. Cheung, Fuzzy-genetic algorithm for automatic fault detection in HVAC systems, Appl. Soft Comput. 7 (2007) 554–560.
[8] J. Schein, S.T. Bushby, N.S. Castro, J.M. House, A rule-based fault detection method for air handling units, Energy Build. 38 (2006) 1485–1492.
[9] T.M. Rossi, J.E. Braun, A statistical, rule-based fault detection and diagnostic method for vapor compression air conditioners, HVAC&R Res. 3 (1997) 19–37.
[10] J. Schein, Results from Field Testing of Embedded Air Handling Unit and Variable Air Volume Box Fault Detection Tools, U.S. Dept. of Commerce, Technology Administration, National Institute of Standards and Technology, 2006.
[11] J. Wall, Y. Guo, J. Li, S. West, A dynamic machine learning-based technique for automated fault detection in HVAC systems, in: Proceedings of the ASHRAE Annual Conference, Montreal, Quebec, Canada, 2011, pp. 449–456.
[12] Y. Guo, D. Dehestani, J. Li, J. Wall, S. West, S. Su, Intelligent outlier detection for HVAC system fault detection, in: Proceedings of the 10th International Healthy Buildings Conference, Brisbane, Queensland, Australia, 2012.
[13] Y. Guo, J. Wall, J. Li, S. West, Intelligent model based fault detection and diagnosis for HVAC system using statistical machine learning methods, in: Proceedings of the ASHRAE 2013 Winter Conference, Dallas, USA, 2013.
[14] M. Yuwono, S.W. Su, Y. Guo, J. Li, S. West, J. Wall, Automatic feature selection using multiobjective cluster optimization for fault detection in a heating ventilation and air conditioning system, in: Proceedings of the 2013 1st International Conference on Artificial Intelligence, Modelling and Simulation, AIMS '13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 171–176, http://dx.doi.org/10.1109/AIMS.2013.34
[15] W. Deng, X. Yang, L. Zou, M. Wang, Y. Liu, Y. Li, An improved self-adaptive differential evolution algorithm and its application, Chemometr. Intell. Lab. Syst. 128 (2013) 66–76, http://dx.doi.org/10.1016/j.chemolab.2013.07.004
[16] L. Wang, C.-X. Dun, W.-J. Bi, Y.-R. Zeng, An effective and efficient differential evolution algorithm for the integrated stochastic joint replenishment and delivery model, Knowl.-Based Syst. 36 (2012) 104–114, http://dx.doi.org/10.1016/j.knosys.2012.06.007
[17] M. Yuwono, S. Su, B. Moulton, H. Nguyen, Data clustering using variants of rapid centroid estimation, IEEE Trans. Evol. Comput. 18 (2013) 366–377.
[18] M. Yuwono, S. Su, B. Moulton, H. Nguyen, An algorithm for scalable clustering: ensemble rapid centroid estimation, in: Proceedings of the 2014 IEEE Congress on Evolutionary Computation, 2014, pp. 1250–1257.
[19] D.W. van der Merwe, A.P. Engelbrecht, Data clustering using particle swarm optimization, in: Proceedings of the 2003 IEEE Congress on Evolutionary Computation, vol. 1, 2003, pp. 215–220.
[20] S. Li, A Model-Based Fault Detection and Diagnostic Methodology for Secondary HVAC Systems (Ph.D. thesis), Drexel University, 2014.
[21] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86, http://dx.doi.org/10.1214/aoms/1177729694
[22] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data, Mach. Learn. 52 (2003) 91–118, http://dx.doi.org/10.1023/A:1023949509487
[23] M.D. Wilkerson, D.N. Hayes, ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking, Bioinformatics 26 (2010) 1572–1573.
[24] D.N. Hayes, S. Monti, G. Parmigiani, C.B. Gilks, K. Naoki, A. Bhattacharjee, M.A. Socinski, C. Perou, M. Meyerson, Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts, J. Clin. Oncol. 24 (2006) 5079–5090.
[25] A. Fred, A. Jain, Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 835–850, http://dx.doi.org/10.1109/TPAMI.2005.113
[26] A. Strehl, J. Ghosh, Cluster ensembles – a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2003) 583–617, http://dx.doi.org/10.1162/153244303321897735
[27] I.J. Leontaritis, S.A. Billings, Input–output parametric models for non-linear systems. Part I: deterministic non-linear systems, Int. J. Control 41 (1985) 303–328, http://dx.doi.org/10.1080/0020718508961129
[28] H. Siegelmann, B. Horne, C. Giles, Computational capabilities of recurrent NARX neural networks, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27 (1997) 208–215, http://dx.doi.org/10.1109/3477.558801
[29] J.M. Menezes Jr., G. Barreto, A new look at nonlinear time series prediction with NARX recurrent neural network, in: Ninth Brazilian Symposium on Neural Networks, SBRN '06, 2006, pp. 160–165, http://dx.doi.org/10.1109/SBRN.2006.7
[30] T. Wang, Comparing hard and fuzzy C-means for evidence-accumulation clustering, in: Proceedings of the 18th International Conference on Fuzzy Systems, FUZZ-IEEE'09, IEEE Press, Piscataway, NJ, USA, 2009, pp. 468–473.
[31] F. Duarte, A.L.N. Fred, A. Lourenco, M. Rodrigues, Weighting cluster ensembles in evidence accumulation clustering, in: Portuguese Conference on Artificial Intelligence, EPIA 2005, 2005, pp. 159–167, http://dx.doi.org/10.1109/EPIA.2005.341287
[32] M. Yuwono, S.W. Su, B.D. Moulton, H.T. Nguyen, Fast unsupervised learning method for rapid estimation of cluster centroids, in: Proceedings of the 2012 IEEE Congress on Evolutionary Computation, 2012, pp. 889–896.
[33] J.C. Bezdek, Mathematical models for systematics and taxonomy, in: G. Estabrook (Ed.), Proceedings of the 8th International Conference on Numerical Taxonomy, Freeman, San Francisco, CA, 1975, pp. 143–166.
[34] T. Wang, CA-tree: a hierarchical structure for efficient and scalable coassociation-based cluster ensembles, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41 (2011) 686–698, http://dx.doi.org/10.1109/TSMCB.2010.2086059
[35] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.