Model predictive HVAC control with online occupancy model
Energy and Buildings 82 (2014) 675–684
Justin R. Dobbs ∗, Brandon M. Hencey
Department of Mechanical and Aerospace Engineering, Cornell University, Upson Hall, Ithaca, NY 14853, USA
Article info
Article history: Received 3 March 2014; received in revised form 7 July 2014; accepted 22 July 2014; available online 1 August 2014
Abstract
This paper presents an occupancy-predicting control algorithm for heating, ventilation, and air conditioning (HVAC) systems in buildings. It incorporates the building’s thermal properties, local weather predictions, and a self-tuning stochastic occupancy model to reduce energy consumption while maintaining occupant comfort. Contrasting with existing approaches, the occupancy model requires no manual training and adapts to changes in occupancy patterns during operation. A prediction-weighted cost function provides conditioning of thermal zones before occupancy begins and reduces system output before occupancy ends. Simulation results with real-world occupancy data demonstrate the algorithm’s effectiveness.
Keywords: Model predictive control; MPC; Occupancy prediction; On-line training; Markov chains; HVAC
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
The long-term increase in energy prices has driven greater interest in demand-based HVAC control. Fixed temperature setpoint schedules and occupancy-triggered operation are commonly used to trim energy consumption, but these approaches have significant drawbacks. First, fixed schedules become outdated; when occupancy patterns change, early or late occupants are left uncomfortable, or the space is conditioned prematurely or for too long. Second, thermal lag limits response speed and thus precludes aggressive temperature set-back. Addressing both schedule inaccuracy and thermal lag requires a stochastic occupancy model and a control scheme that can use it effectively.
Considerable research effort has been directed toward occupancy detection and modeling. Work on detection has focused on boosting accuracy through sensor fusion using probabilistic, neural, or utility networks [1–4]. Agent-based models have been used to predict movement within buildings [5,6], as have Markov chains [7–9]. Erickson and Dong, for example, considered rooms to be Markov states and movements among them to be transitions in order to predict persons’ behavior, while Dong and Lam [10] used a semi-Markov model to merge multiple sensor streams into an occupant count estimate. The simpler Page model considered Boolean occupancy (occupied or vacant) under a time-heterogeneous Markov chain to generate realistic simulation input data, rather than for on-line forecasting [8].
With the exception of the Page model, the above efforts have found use in heuristic [11–13] or model predictive control (MPC) schemes [10,13,14], but they face barriers to widespread usage. Most notably, where authors have used MPC, they have also used manually-generated thermal models [10,13,14] even though model creation is tedious, time-consuming, and therefore expensive. Eager to demonstrate excellent performance, researchers have favored systems with complex topologies and numerous adjustments that yield “one-off” engineering efforts without a clear path to large-scale adoption. The system outlined in [10], for instance, uses CO2, sound, and light sensors that require carefully set detection thresholds for each room, plus an on-board weather forecasting algorithm in lieu of forecasts already available. We aim, instead, to make occupancy-predicting control accessible to a broader audience by presenting a simple but effective algorithm with a straightforward implementation. For example, we use an automated BIM translation facility outlined in a previous paper [15], and the core algorithm is industry-standard MPC with occupancy weighting in the cost function. Each of the very few adjustments serves a clearly defined purpose, and we have outlined each component’s operation with the practitioner in mind.
Second, recent research has paid little attention to the commissioning and maintenance of occupancy prediction algorithms; model training, if mentioned at all, has been assumed to be done all at one time by someone skilled in the art [8,10,11]. Although most training algorithms could be extended to work on-line, ongoing maintenance remains a source of long-term cost neglected by the literature. An occupancy model invariably becomes out-of-date unless it is periodically retrained or can incrementally refine itself with new observations. Our work uses on-line Bayesian inference for stable performance without ongoing manual effort.
The paper progresses as follows. First, we outline the problem formulation. Second, we describe the stochastic occupancy model and its on-line training algorithm. Third, we discuss its integration with model predictive control. Finally, we present simulation results using real-world occupancy data and compare our method’s performance to a correctly set scheduled controller and to an occupancy-triggered controller. Throughout the discussion, the control scenario is kept deliberately simple to emphasize the contribution of occupancy learning and its use with MPC.¹
Fig. 1. Proposed system architecture. For this study, the building model has been translated automatically from CAD data into a linear, time-invariant network that encompasses the dominant thermal processes. (Model translation may also be performed manually.) [Diagram blocks: BIM, Weather, Thermal Network, MPC Synthesis, HVAC, Zone, Temperature, Occupancy, State Observer, Occupancy Model, Bayesian Training, Prediction.]
∗ Corresponding author. Tel.: +1 607 269 5352. E-mail addresses: [email protected] (J.R. Dobbs), [email protected] (B.M. Hencey).
http://dx.doi.org/10.1016/j.enbuild.2014.07.051
2. Problem statement
We wish to minimize the total energy usage of a building heating (or cooling) system while maintaining occupant comfort. Versus conventional occupancy-triggered or scheduled control, we aim to
• boost comfort by conditioning the space before occupants arrive,
• limit energy consumption by not running the system too early, and
• exploit stored thermal energy by reducing power before occupants leave.
Our approach is based on MPC but uses a cost function weighted by occupancy predictions from a self-training stochastic model (Fig. 1). At each step, the system measures how much of the previous hour the space was occupied, and the expected occupancy is used to find the best sequence of $N$ future heat inputs to the thermal zone that minimizes the expected cost. The optimization is
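$$
\begin{aligned}
&\min_{u_k \cdots u_{k+N-1}} \; \sum_{j=0}^{N-1} E\left[ g(x_{k+j}, u_{k+j}, \sigma, \Gamma_{k+j}) \right] \\
&\text{subject to} \quad x_{i+1} = A x_i + B_u u_i + B_w E[w_i] \quad \forall i \in \mathbb{Z}^+, \\
&\phantom{\text{subject to}} \quad 0 \le u \le u_{\max}
\end{aligned} \tag{1}
$$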
¹ See [13] for a comparison of MPC and heuristic control for a more complex HVAC system.
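Fig. 2. Process flow during operation. [Diagram blocks: Measure Occupancy and Temperature; Update Occupancy Model; Optimize Over Receding Horizon.]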
where
• $k \in \mathbb{Z}^+$ is the current time step, and $j \in [0, N-1]$ is the optimization index over the horizon;
• $A \in \mathbb{R}^{n \times n}$ describes the building’s thermal dynamics;
• $x \in \mathbb{R}^{n \times 1}$ contains the building’s thermal state;
• $u_{k \ldots k+N-1}$ contains the controller output, constrained within the system’s capacity $u_{\max}$;
• $B_u$ is a vector that connects the heat input $u$ to the zone air volume;
• $w_k$ is the current weather observation, and $w_{k+1 \ldots k+N-1}$ contain an up-to-date weather prediction;
• $B_w$ is a vector that connects the weather conditions to the building envelope;
• $\sigma$ is the temperature setpoint, which is constant for this study (but can be varied in practice);
• $\Gamma_k$ is the latest occupancy measurement, and $\Gamma_{k+1 \ldots k+N-1}$ are the predicted occupancies; and
• $g(x, u, \sigma, \Gamma)$ is a cost function that penalizes total energy consumption and penalizes discomfort based on the occupancy $\Gamma$.
The expectation operator $E[g]$ in Eq. (1) reflects that future values of $g$ require predictions of occupancy and of the weather. The optimization yields a sequence of $N$ power commands to the HVAC system, where positive values are heat and negative are cooling; the first command $u_k$ is applied, and the rest are discarded. The previous and current occupancy observations are then used to train the occupancy model, and the entire process repeats the next time step (Fig. 2).
Two assumptions are made in this presentation. First, we treat the weather forecast as accurate so that we can later omit the expectation operator from $w$. Second, we use a very simple cost function with constant efficiency and a single linear actuator. These assumptions improve clarity but are not required in practice. Where available, weather uncertainty data can be rolled into the cost function in order to improve robustness [14]. Multiple actuators (e.g. radiant and forced air with vastly different response times) or nonlinear actuation (e.g. variable air volume damper position) can be pulled into the dynamical model and the cost function without undermining the basic approach [13,14,16]. Finally, the energy penalty gain can be varied over time to reflect, for example, changing system efficiency or electricity cost.
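The measure, optimize, apply, repeat cycle of this section can be illustrated with a toy receding-horizon loop. This is only a sketch under assumptions that are not taken from the paper: a scalar zone model `x_next = A*x + B*u + (1 - A)*w`, a three-level actuator searched exhaustively in place of a real MPC solver, a fixed placeholder `predict_occupancy` standing in for the trained occupancy model, and arbitrary values for every constant.

```python
from itertools import product

# Hypothetical scalar zone model and tuning constants (illustrative values only).
A, B = 0.9, 0.5              # x_next = A*x + B*u + (1 - A)*w
SETPOINT = 21.0              # comfort setpoint, deg C
BETA, R = 1.0, 0.2           # discomfort and energy gains
U_LEVELS = (0.0, 4.0, 8.0)   # coarse grid over 0 <= u <= u_max
N = 4                        # horizon length in steps


def stage_cost(x, u, occupancy):
    """Occupancy-weighted discomfort plus an energy penalty."""
    return occupancy * BETA * (x - SETPOINT) ** 2 + R * abs(u)


def plan(x, weather, occupancy):
    """Exhaustively search every input sequence over the horizon."""
    best_cost, best_seq = float("inf"), None
    for seq in product(U_LEVELS, repeat=N):
        xi, cost = x, 0.0
        for j, u in enumerate(seq):
            cost += stage_cost(xi, u, occupancy[j])
            xi = A * xi + B * u + (1 - A) * weather[j]
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def predict_occupancy(hour):
    """Placeholder for the occupancy model: busy from 9:00 to 17:00."""
    return [0.9 if 9 <= (hour + j) % 24 < 17 else 0.0 for j in range(N)]


def run(hours, x0=15.0, outside=5.0):
    """Receding-horizon loop: plan N steps, apply only the first input."""
    x, log = x0, []
    for hour in range(hours):
        seq = plan(x, [outside] * N, predict_occupancy(hour))
        u = seq[0]                     # the rest of the sequence is discarded
        log.append((hour, round(x, 2), u))
        x = A * x + B * u + (1 - A) * outside
    return log


for hour, temp, u in run(24):
    print(hour, temp, u)
```

In this toy setting the heat input stays at zero while no occupancy falls inside the horizon and switches on before 9:00, which is the preconditioning behavior the cost weighting is meant to produce.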
3. Building thermal model
Thermal model accuracy influences controller performance, so we need a thermal model that closely approximates the dominant dynamics. Here, we outline how the state-space building model is generated, and we validate it against EnergyPlus simulation results.
Fig. 3. The Sustain modeling and simulation environment. (New York Times building shown.)
Thermal model creation has historically been a manual process contributing substantially to MPC implementation cost. Research efforts such as the Sustain platform (Fig. 3) [15,17] and the Building Resistance-Capacitance Modeling Toolbox [18] have arisen to streamline the creation of dynamical equations suitable for MPC. Here we have used a module in Sustain to generate a resistor–capacitor network directly from a CAD model. The thermal model states are the building’s internal temperatures, including zone air plus wall layers and roofing materials that are not normally measured; a state observer can easily estimate these values during operation.² Although not used here, ways to automatically tune the RC network parameters on-line and even estimate disturbances such as solar load have recently been introduced [20].
The model used for this study has 41 states: one for zone air and the rest for building structure.³ It assumes well-mixed air and uses time-invariant convection coefficients. Fixed coefficients imply that the thermal gradients are always in the same direction, whereas EnergyPlus switches coefficients depending on whether the gradient enhances convection [22]. In practice, for improved accuracy, the RC network can be adjusted at each step, or a nonlinear model may be used. We have included limited support for radiant transfer using coefficients from EnergyPlus’ Simple and SimpleCombined convection algorithms [22].
² The observability assumption is valid because of the RC network’s construction; the driving sources (exterior and interior conditions) are themselves measurable and there are no hinges in the network. See [17] for details on the RC network construction and [17,19] Theorem 1 for a proof of observability.
³ The model can include multiple control zones if needed. In practice, one may reduce the model size using balanced truncation, which reconfigures the state space. We have chosen to retain the full-order model to maximize accuracy and preserve physical intuition. See [21] for a survey of methods that reduce state space size while preserving structure.
Fig. 4. RC network (solid) and EnergyPlus (dashed) simulation results for a step change in ambient temperature. [Plot axes: simulation days versus zone air temperature (°C); traces: RC Network, EnergyPlus, Ambient.]
The model accepts the following inputs:
• the outside dry-bulb temperature,
• the ground temperature,⁴ and
• heat injected by the control system into the space (positive or negative).
The state equation of the building is
$$x_{k+1} = A_b x_k + B_w w_k + B_u u_k, \tag{2}$$
where $k$ is the time step (in hours) and $x_k$ is the complete temperature state vector containing the zone temperature $x_k^{\text{zone}}$. The vector $w_k$ contains the weather forecast, and $u_k$ is the heat injected into the room by the HVAC system. The sign of $u_k$ and its constraints can be made negative, or the sign of the vector $B_u$ can be reversed, for cooling.
⁴ Daily ground temperature is available for free through on-line sources such as the U.S. Surface Climate Observing Reference Network [23].
Let us now validate the model by comparing the zone temperature time response of the RC network to EnergyPlus results under simplified conditions. The goal is not to exactly match EnergyPlus, but rather to show that the dominant response is plausibly close. To do this, we have simulated the building using first the RC network and then EnergyPlus under the following set of conditions:
• a step change in air, ground, and sky infrared temperatures from 10 to 20 °C,
• no wind or humidity, and
• EnergyPlus heat transfer algorithms: Simple convection for interior, SimpleCombined for exterior, and CTF (conduction transfer function) for walls.
The RC network implementation lacks support for sky infrared transfer through windows; by matching the sky radiant temperature to the outside air temperature, we have removed this source of discrepancy from the simulation. Under the simplified conditions, very similar response times (Fig. 4) suggest that the RC model is adequate for demonstration.
4. Stochastic occupancy model
The heart of our method is its on-line trained Markov occupancy model that quickly adapts and enables the MPC to predict occupancy. The input is a stream of asynchronous pulses from pyroelectric infrared (PIR) or similar sensors that indicate whether at least one person is in the space. We have chosen the Mitsubishi Electric Research Lab (MERL) motion detector data set [24], which consists of a series of one-second pulses from various motion sensors located throughout hallways and conference rooms in MERL.⁵ The meetings in the Belady conference room show a good balance between repetition and variety to showcase the benefits of on-line learning.
⁵ Although we have used the MERL occupancy data, the thermal model is not one of the MERL building.
4.1. Markov chain formulation
The occupancy model is a periodic Markov chain updated at every observation. The occupancy at time $k$ is either $\gamma_k = 1$ (occupied) or $\gamma_k = 0$ (vacant). The current occupancy state and the time of day determine the probability of future occupancy. We wish to estimate the probabilities
$$
\begin{aligned}
p_k &= P(\gamma_{k+1} = 1 \mid \gamma_k = 1), \\
q_k &= P(\gamma_{k+1} = 1 \mid \gamma_k = 0).
\end{aligned} \tag{3}
$$
Fig. 5. Occupancy model as a time-varying Markov chain (a) and unrolled into a periodic structure (b).
The transition probabilities of this two-state time-varying Markov chain (Fig. 5a) are periodic; we have chosen a period $M = 24$ hours, so $p_{24} \equiv p_0$ and $q_{24} \equiv q_0$. To better visualize the periodicity, we unroll the Markov chain into $2M$ states (Fig. 5b), where each hour has a $1_k$ and $0_k$ state. Although $k$ in general grows without bound, its range is limited to $0 \le k \le M-1$ when dealing with the Markov chain. The choice of $M$ affects how learned patterns relate to subsequent predictions; if space usage patterns vary significantly across the weekdays, one might want to prevent occupancy observations on Monday from influencing control actions on Tuesday, in which case a one-week chain would be more appropriate. For this study, the one-day Markov chain is trained using Monday through Friday occupancy data from the MERL data set, ignoring weekends. In practice, one could switch to a different Markov chain or use occupancy-triggered control over weekends.
4.2. Training
In contrast to batch training, which uses a fixed-size history (Fig. 6a), on-line incremental training proceeds without user intervention. It uses observations to update density functions for each of the transition probabilities; the expected values of these density functions in turn populate the Markov chain’s transition matrix (Fig. 6b).
Boolean occupancy lacks granularity that could otherwise make predictions more accurate. For example, occupancy for the entirety of the previous hour implies different future occupancy compared to just a few minutes. The question we wish to answer is: Given the space was occupied for a certain fraction of the previous hour, for what portion of subsequent hours do we expect occupancy? We approach the problem in three steps. First, we explain the simplest case where Boolean occupancy is directly observed. Second, we augment the Boolean training with forgetting capability. Finally, we refine the approach to use fractional occupancy in order to make predictions more precise.
Fig. 6. Conventional batch training (a) versus the proposed on-line incremental Bayesian training algorithm with forgetting (b).
4.2.1. Boolean observed occupancy
Each state of the unrolled Markov chain (Fig. 5b) has two outgoing transition paths, analogous to a coin toss where the coin’s bias is unknown. The well-known probability function of a biased coin is
$$\mathcal{L}(\theta, N, N_H) = \binom{N}{N_H} \theta^{N_H} (1-\theta)^{N-N_H}, \tag{4}$$
where $\binom{N}{N_H}$ is the number of ways to permute $N_H$ heads in a sequence of $N$ tosses, and $\theta$ is the heads bias (with 0.5 being a fair coin). This function can be parameterized on $\theta$, $N$, or $N_H$ depending on the purpose. With the bias $\theta = \theta_0$ known and the number of tosses $N = N_0$ fixed, the probability of obtaining $N_H$ heads, $\mathcal{L}(\theta = \theta_0, N = N_0, N_H)$, is a discrete binomial distribution over $N_H$. When $N$ and $N_H$ are fixed, $\mathcal{L}(\theta, N = N_0, N_H = N_{H0})$ is, once normalized, the probability density over the bias $\theta$, with $\int_0^1 \mathcal{L}(\theta, N = N_0, N_H = N_{H0})\, d\theta = 1$.⁶
Instead of computing $\mathcal{L}$ using $N$ and $N_H$ all at once, we can obtain it iteratively using Bayes’ rule. Suppose we have a sequence of outcomes $x_j \in \{1, 0\}$ where 1 means heads. The distribution, now parameterized only on $\theta$, is defined recursively as
$$
\begin{aligned}
\mathcal{L}_j(\theta \mid x_{1 \ldots j}) &\sim \mathcal{L}_{j-1}(\theta \mid x_{1 \ldots j-1})\, \phi(\theta, x_j) \\
&= \frac{\mathcal{L}_{j-1}(\theta \mid x_{1 \ldots j-1})\, \phi(\theta, x_j)}{\int_0^1 \mathcal{L}_{j-1}(\theta \mid x_{1 \ldots j-1})\, \phi(\theta, x_j)\, d\theta},
\end{aligned} \tag{5}
$$
where
$$\phi(\theta, x) = \begin{cases} \theta & x = 1 \\ 1 - \theta & x = 0, \end{cases} \tag{6}$$
and $\mathcal{L}_0(\theta) = 1$ is a uniform distribution reflecting no prior knowledge of the bias. The $\sim$ notation means dividing by a constant so that $\int_0^1 \mathcal{L}_j(\theta)\, d\theta = 1$ holds. Our best guess of the bias is $E[\mathcal{L}_j(\theta)] = \int_0^1 \theta\, \mathcal{L}_j(\theta)\, d\theta$.
Now let us apply this analogy to occupancy prediction. Coin toss outcomes are independent, but occupancy transition probabilities depend on the current state. At any given time there are two possible states, so we need to maintain two distributions per time step. Let $\gamma_k \in \{0, 1\}$ be the occupancy. The transition probabilities of interest are
$$
\begin{aligned}
p_k &= P(\gamma_{k+1} = 1 \mid \gamma_k = 1) = E[f_k(p_k)] \\
q_k &= P(\gamma_{k+1} = 1 \mid \gamma_k = 0) = E[g_k(q_k)],
\end{aligned} \tag{7}
$$
where the density functions $f_k(p_k)$ and $g_k(q_k)$ are the latest iterations of $f_{k,j}(p_k)$ and $g_{k,j}(q_k)$, updated each training instance $j$ using
$$
\begin{aligned}
f_{k,j}(p_k \mid \gamma_{1 \ldots k+1}, \gamma_k = 1) &\sim f_{k,j-1}(p_k \mid \gamma_{1 \ldots k})\, \phi(p_k, \gamma_{k+1}) \\
f_{k,j}(p_k \mid \gamma_{1 \ldots k+1}, \gamma_k = 0) &= f_{k,j-1}(p_k \mid \gamma_{1 \ldots k}) \\
g_{k,j}(q_k \mid \gamma_{1 \ldots k+1}, \gamma_k = 0) &\sim g_{k,j-1}(q_k \mid \gamma_{1 \ldots k})\, \phi(q_k, \gamma_{k+1}) \\
g_{k,j}(q_k \mid \gamma_{1 \ldots k+1}, \gamma_k = 1) &= g_{k,j-1}(q_k \mid \gamma_{1 \ldots k}).
\end{aligned} \tag{8}
$$
The $\sim$ indicates normalization, and $f_0(p_k) = 1$ and $g_0(q_k) = 1$ as before. The distribution $f_{k,j}(p_k)$ does not change from $f_{k,j-1}(p_k)$ unless the space was occupied, and $g_{k,j-1}(q_k)$ is also left alone unless the space was vacant. In other words, to update the distributions for a state, a transition out of that state must have been observed.
⁶ The density $\mathcal{L}(\theta)$ is a continuous beta distribution. Once linear forgetting is added, these distributions become prohibitive to maintain analytically because sums of beta distributions are not themselves beta distributions, but rather are complicated piecewise functions [25]. Therefore it is more practical to maintain numerical approximations.
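The recursion of Eqs. (5), (6), and (8) can be sketched numerically by sampling each density on a grid, in the spirit of the numerical approximations mentioned in the footnote. The grid size, the helper names, and the toss sequence below are illustrative assumptions, not details from the paper.

```python
# Densities over the bias are stored as samples at the midpoints of 200 bins.
GRID = [(i + 0.5) / 200 for i in range(200)]
D_THETA = 1.0 / len(GRID)


def phi(theta, x):
    """Likelihood of one outcome x (1 = heads) given bias theta, Eq. (6)."""
    return theta if x == 1 else 1.0 - theta


def normalize(density):
    """Scale a sampled density so that it integrates to one."""
    area = sum(density) * D_THETA
    return [d / area for d in density]


def bayes_update(density, x):
    """One step of the recursion in Eq. (5)."""
    return normalize([d * phi(t, x) for d, t in zip(density, GRID)])


def expected_value(density):
    """Best guess of the bias: the integral of theta times the density."""
    return sum(t * d for t, d in zip(GRID, density)) * D_THETA


def train_boolean(f, g, occ_now, occ_next):
    """Eq. (8): only the density of the observed starting state changes."""
    if occ_now == 1:
        return bayes_update(f, occ_next), g
    return f, bayes_update(g, occ_next)


# Coin-toss analogy: uniform prior, then six heads and two tails.
density = [1.0] * len(GRID)
for outcome in [1, 1, 0, 1, 1, 1, 0, 1]:
    density = bayes_update(density, outcome)
print(round(expected_value(density), 3))

# Occupancy version: one occupied-to-occupied transition trains f only.
f, g = train_boolean([1.0] * len(GRID), [1.0] * len(GRID), 1, 1)
print(round(expected_value(f), 3), round(expected_value(g), 3))
```

After six heads and two tails the expected value lands close to 0.7, the mean of the corresponding Beta(7, 3) density, which gives a quick check that the grid approximation tracks the analytical result.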
4.2.2. Forgetting factor
As training proceeds, the distributions $f_k(p_k)$ and $g_k(q_k)$ become increasingly narrow and converge toward delta functions, the oldest and newest training data exerting equal but ever-decreasing influence on the model; even the newest training data becomes diluted. This is acceptable for batch training, where the history length is chosen explicitly, but not for incremental training, where eventually the distributions cannot change at all. We introduce a forgetting factor $\lambda$ to gradually discount older training data and allow the Markov chain to retain its flexibility. Linear forgetting is implemented using
$$
\begin{aligned}
f'_{k,j}(p_k) &= \lambda f_{k,j}(p_k) + (1-\lambda) f_0(p_k) \\
g'_{k,j}(q_k) &= \lambda g_{k,j}(q_k) + (1-\lambda) g_0(q_k),
\end{aligned} \tag{9}
$$
where $f_0(p_k) = 1$ and $g_0(q_k) = 1$, and $f_{k,j}(p_k)$ and $g_{k,j}(q_k)$ are the posterior distributions that have just been trained before forgetting.⁷
There is no direct equivalence between forgetting factors and batch training history lengths; batch training (Fig. 6a) is analogous to a finite impulse response (FIR) filter with a defined memory length, while incremental training (Fig. 6b) is structurally reminiscent of an infinite impulse response (IIR) filter where the previous output is fed back into the filter. With batch training, the hand-picked data set may not contain all the transitions of interest, so some transitions may not be trained at all. The incremental approach applies training and forgetting simultaneously, retaining infrequently observed transitions longer.
To illustrate the effect of forgetting on the distributions, we have trained a single state of the Markov chain repeatedly using alternating transitions $\gamma_0 = 0 \to \gamma_1 = 1$ and $\gamma_0 = 0 \to \gamma_1 = 0$. This is analogous to flipping an unbiased coin numerous times and observing heads every other flip, from which we expect an increasingly narrow distribution for $p_0$ peaking near 0.5 (Fig. 7a). This result would be preferred if the pattern were never expected to change, but such concentration in the distribution hinders its ability to change and is therefore undesirable. Using 15% forgetting ($\lambda = 0.85$) gives a distinctly broader distribution lifted off the horizontal axis (Fig. 7b). The distribution, and therefore its expected value, shifts laterally with each alternate observation, even after many iterations; this extra mobility reflects greater adaptability. The value $\lambda = 0.85$ is much more forgetful than would be used in practice; Section 6.2 will explore the relationship between $\lambda$ and prediction accuracy.
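The alternating-data experiment can be reproduced in a few lines by combining the Bayesian update with the linear forgetting of Eq. (9). This is a rough sketch rather than the authors’ code; the grid resolution, the iteration count, and the function names are assumptions.

```python
# Densities are sampled at the midpoints of 400 bins on [0, 1].
GRID = [(i + 0.5) / 400 for i in range(400)]
D = 1.0 / len(GRID)


def mean(density):
    """Expected value of the transition probability under the density."""
    return sum(t * d for t, d in zip(GRID, density)) * D


def train(density, outcome, lam):
    """Normalized Bayesian update followed by linear forgetting, Eq. (9)."""
    raw = [d * (t if outcome == 1 else 1.0 - t) for d, t in zip(density, GRID)]
    area = sum(raw) * D
    # Blend the posterior with the uniform prior f0 = 1.
    return [lam * r / area + (1.0 - lam) * 1.0 for r in raw]


def alternate(lam, iterations=200):
    """Train one density on alternating outcomes 1, 0, 1, 0, ..."""
    density, means = [1.0] * len(GRID), []
    for j in range(iterations):
        density = train(density, 1 if j % 2 == 0 else 0, lam)
        means.append(mean(density))
    return density, means


for lam in (1.0, 0.85):
    density, means = alternate(lam)
    shift = abs(means[-1] - means[-2])
    print(lam, round(max(density), 2), round(min(density), 3), round(shift, 4))
```

The printed columns are the forgetting factor, the peak height, the minimum height, and the change in expected value caused by the last observation. With forgetting, the density never falls below $1 - \lambda$ and its expected value keeps moving with each new observation, which is the adaptability described above.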
⁷ There are other ways to implement forgetting; see [26] for a survey that compares linear with multiplicative forgetting.
Fig. 7. A particular occupancy transition probability distribution when trained with alternating data using no forgetting ($\lambda = 1$) (a) and with considerable forgetting ($\lambda = 0.85$) (b). The forgetting factor broadens the distribution and allows it to shift laterally even when extensively trained, reflecting greater ability to adjust to changes in space usage. For each case, the initial distribution is uniform (black horizontal trace). Higher levels of training are shown as brighter color.
4.2.3. Using fractional occupancy
Measuring the percentage of occupancy over each time step makes occupancy predictions more precise. To convert the asynchronous pulses from PIR sensors (Fig. 8a) into a discrete-time sequence of fractional values, we apply a simple two-step heuristic. First, we merge closely-spaced pulses using a minimum dwell time to get a square wave signal $\gamma(t)$ (Fig. 8b). Then we superimpose a fixed time grid over the signal and average it over each step to obtain the discrete sequence
$$\Gamma_k = \frac{1}{t_k - t_{k-1}} \int_{t_{k-1}}^{t_k} \gamma(t)\, dt \equiv P(\gamma_k = 1) \in [0, 1], \tag{10}$$
essentially treating $\Gamma_k$ (Fig. 8c) as the duty cycle sequence of the pulse width modulated signal $\gamma(t)$. We subsequently pretend that $\gamma(t)$ is sampled probabilistically through $\Gamma_k$ with a distribution over the Markov state space $\pi_k \in \mathbb{R}^{1 \times 2M}$. The statements $\Gamma_k = 60\%$ and $P(\gamma_k = 1) = 0.6$ are considered equivalent.
Fig. 8. Asynchronous sensor pulses (a); derived continuous signal using dwell time (b); resulting discrete-time occupancy percentage (c).
From this we estimate the occupancy at time $k+1$ using
$$
\begin{aligned}
P(\gamma_{k+1} = 1 \mid \Gamma_k) &= \Gamma_k P(\gamma_{k+1} = 1 \mid \gamma_k = 1) + (1 - \Gamma_k) P(\gamma_{k+1} = 1 \mid \gamma_k = 0) \\
&= \Gamma_k E[p_k] + (1 - \Gamma_k) E[q_k],
\end{aligned} \tag{11}
$$
where the expectation operator reflects the fact that $p_k$ and $q_k$ are estimated via $f_k(p_k)$ and $g_k(q_k)$. At each step $k$, there are four possible state transitions with associated posterior distributions
$$
\begin{aligned}
\gamma_k = 1 \to \gamma_{k+1} = 1 &: \quad f^{(1)}_{k,j}(p_k) \sim \phi(p_k, 1)\, f_{k,j-1}(p_k), \\
\gamma_k = 1 \to \gamma_{k+1} = 0 &: \quad f^{(0)}_{k,j}(p_k) \sim \phi(p_k, 0)\, f_{k,j-1}(p_k), \\
\gamma_k = 0 \to \gamma_{k+1} = 1 &: \quad g^{(1)}_{k,j}(q_k) \sim \phi(q_k, 1)\, g_{k,j-1}(q_k), \\
\gamma_k = 0 \to \gamma_{k+1} = 0 &: \quad g^{(0)}_{k,j}(q_k) \sim \phi(q_k, 0)\, g_{k,j-1}(q_k),
\end{aligned} \tag{12}
$$
where $f^{(1)}_{k,j}$ is the updated posterior distribution as if $\gamma_k = 1$ and $\gamma_{k+1} = 1$ had been observed, $f^{(0)}_{k,j}$ is similar to $f^{(1)}_{k,j}$ but updated as if $\gamma_{k+1} = 0$ had been observed, and likewise for $g^{(1)}_{k,j}$ and $g^{(0)}_{k,j}$. To obtain $f_{k,j}(p_k)$, we blend $f^{(1)}_{k,j}(p_k)$ and $f^{(0)}_{k,j}(p_k)$ according to the later observation $\Gamma_{k+1}$. We then weight the training according to $\Gamma_k$, which reflects how likely the space was to have started occupied; values of $\Gamma_k$ closer to one apply more training to $f_k(p_k)$, while those closer to zero cause heavier training of $g_k(q_k)$.
$$
\begin{aligned}
f_{k,j}(p_k) &= \Gamma_k \left( \Gamma_{k+1} f^{(1)}_{k,j}(p_k) + (1 - \Gamma_{k+1}) f^{(0)}_{k,j}(p_k) \right) + (1 - \Gamma_k)\, f_{k,j-1}(p_k) \\
g_{k,j}(q_k) &= (1 - \Gamma_k) \left( \Gamma_{k+1} g^{(1)}_{k,j}(q_k) + (1 - \Gamma_{k+1}) g^{(0)}_{k,j}(q_k) \right) + \Gamma_k\, g_{k,j-1}(q_k)
\end{aligned} \tag{13}
$$
Once the new distributions $f_{k,j}(p_k)$ and $g_{k,j}(q_k)$ have been found, forgetting is applied as in Eq. (9), with the distributions from Eq. (13) taking the place of the posterior distributions. The post-forgetting distributions are then stored.
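The two stages of this subsection, turning pulses into fractional occupancy (Eq. (10)) and blending the candidate posteriors (Eqs. (12) and (13)), can be sketched as follows. The dwell-time rule used here (hold the signal high for a fixed time after each pulse and merge overlaps) is one plausible reading of the heuristic, and the pulse times, grid size, and function names are made up for illustration. Forgetting (Eq. (9)) is left out to keep the example short.

```python
GRID = [(i + 0.5) / 200 for i in range(200)]   # samples of p_k or q_k
D = 1.0 / len(GRID)


def pulses_to_intervals(pulse_times, dwell):
    """Hold the signal high for `dwell` seconds per pulse; merge overlaps."""
    intervals = []
    for t in sorted(pulse_times):
        if intervals and t <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], t + dwell)
        else:
            intervals.append([t, t + dwell])
    return intervals


def fractional_occupancy(intervals, step, n_steps):
    """Average the square wave over each grid step, as in Eq. (10)."""
    fractions = []
    for k in range(n_steps):
        lo, hi = k * step, (k + 1) * step
        busy = sum(max(0.0, min(hi, b) - max(lo, a)) for a, b in intervals)
        fractions.append(busy / step)
    return fractions


def mean(density):
    return sum(t * d for t, d in zip(GRID, density)) * D


def posterior(density, outcome):
    """Normalized update for one hypothetical Boolean outcome, Eq. (12)."""
    raw = [d * (t if outcome == 1 else 1.0 - t) for d, t in zip(density, GRID)]
    area = sum(raw) * D
    return [r / area for r in raw]


def train_fractional(f, g, occ_now, occ_next):
    """Blend the four candidate posteriors by fractional occupancy, Eq. (13)."""
    def blend(prior, weight):
        hit, miss = posterior(prior, 1), posterior(prior, 0)
        return [weight * (occ_next * h + (1.0 - occ_next) * m)
                + (1.0 - weight) * p
                for h, m, p in zip(hit, miss, prior)]
    return blend(f, occ_now), blend(g, 1.0 - occ_now)


pulses = [600, 900, 1200, 5000]            # made-up pulse times in seconds
occ = fractional_occupancy(pulses_to_intervals(pulses, dwell=600),
                           step=3600, n_steps=2)
uniform = [1.0] * len(GRID)
f, g = train_fractional(uniform, uniform, occ[0], occ[1])
print([round(o, 3) for o in occ], round(mean(f), 3), round(mean(g), 3))
```

Because the first hour is only one-third occupied in this example, the vacant-start density `g` receives about twice as much training weight as the occupied-start density `f`.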
4.2.4. Effect of training on distribution shape
To illustrate the connection between training data patterns and the shapes of $f_k(p_k)$ and $g_k(q_k)$, we have trained two Markov chains with the MERL Belady conference room data from March 22 to June 9 and sampled the distributions afterward. In Fig. 9, two sets of distributions, one for 2:00 am → 3:00 am (a) and the other for 3:00 pm → 4:00 pm (b), are shown for both strong forgetting ($\lambda = 0.85$, solid) and no forgetting ($\lambda = 1.0$, dashed). In Fig. 9a we see that both occupancy and vacancy at 2:00 am strongly imply vacancy at 3:00 am. In other words, early morning occupancy is very uncommon and usually brief. Because occupancy is rare at 2:00 am, the transition $\gamma_2 = 1_2 \to \gamma_3 = 1_3$ is very weakly trained and has a very flat distribution. In Fig. 9b, we see that occupancy at 3:00 pm is more varied, resulting in more typical bell-shaped distributions. The distributions for 3:00 pm suggest that meetings are likely to continue into the next hour but are unlikely to start the following hour. The distributions for $\lambda = 0.85$ are shaped similarly to those for $\lambda = 1.0$ but are markedly subdued with expected values closer to 0.5.
Fig. 9. Probability densities for the occupied-to-occupied and vacant-to-occupied transitions at 2:00 am (a) and 3:00 pm (b). Two cases are shown: a heavy forgetting factor ($\lambda = 0.85$, solid) and no forgetting ($\lambda = 1.0$, dashed).
4.3. Transition matrix and occupancy prediction
Recall from Section 4.1 and Fig. 5b that the Markov chain has states $1_0 \ldots 1_{23}$ and $0_0 \ldots 0_{23}$. The probability distribution of the current occupancy state $\pi_k \in \mathbb{R}^{1 \times 2M}$ evolves according to $\pi_{k+1} = \pi_k P$. The matrix $P$ can be constructed from four blocks: $P^{(I)}$ for $1_k \to 1_{k+1}$, $P^{(II)}$ for $1_k \to 0_{k+1}$, $P^{(III)}$ for $0_k \to 1_{k+1}$, and $P^{(IV)}$ for $0_k \to 0_{k+1}$ transitions. The entries for $P^{(I)}$ and $P^{(III)}$ are the expected values of $p_{0 \ldots M-1}$ and $q_{0 \ldots M-1}$, and the other two matrices are their complements.
$$
\begin{aligned}
P^{(I)}_{ik} &= \begin{cases} P(\gamma_k = 1 \mid \gamma_i = 1) = E[p_i] & k = i + 1 \pmod{M} \\ 0 & \text{otherwise} \end{cases} \\
P^{(II)}_{ik} &= \begin{cases} P(\gamma_k = 0 \mid \gamma_i = 1) = 1 - E[p_i] & k = i + 1 \pmod{M} \\ 0 & \text{otherwise} \end{cases} \\
P^{(III)}_{ik} &= \begin{cases} P(\gamma_k = 1 \mid \gamma_i = 0) = E[q_i] & k = i + 1 \pmod{M} \\ 0 & \text{otherwise} \end{cases} \\
P^{(IV)}_{ik} &= \begin{cases} P(\gamma_k = 0 \mid \gamma_i = 0) = 1 - E[q_i] & k = i + 1 \pmod{M} \\ 0 & \text{otherwise} \end{cases}
\end{aligned} \tag{14}
$$
For example, $P^{(I)}$ takes the form
$$
P^{(I)} = E \begin{bmatrix}
0 & p_0 & & & \\
 & 0 & p_1 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & p_{M-2} \\
p_{M-1} & & & & 0
\end{bmatrix}. \tag{15}
$$
The complete matrix is
$$P = \begin{bmatrix} P^{(I)} & P^{(II)} \\ P^{(III)} & P^{(IV)} \end{bmatrix}. \tag{16}$$
The expected occupancy $j$ steps in the future given a current estimate $\Gamma_k$ is
$$
E[\Gamma_{k+j} \mid \Gamma_k] = \underbrace{\begin{bmatrix} \Gamma_k \mathbf{1}^k_{1 \times M} & (1 - \Gamma_k) \mathbf{1}^k_{1 \times M} \end{bmatrix}}_{\pi_k} P^j \begin{bmatrix} \mathbf{1}_{M \times 1} \\ \mathbf{0}_{M \times 1} \end{bmatrix} \tag{17}
$$
where $\mathbf{1}^k_{1 \times M}$ is a vector with the $k$th element set to one and all others left zero.
⁸ Because the forecast is updated at each time step, our implementation adjusts the augmented transition matrix $\tilde{A}$ before each MPC synthesis to reflect the latest prediction. This simplifies the cost function and allows the MPC to be formulated in a compact vectorized form as detailed in Equation 3.8 of [27].
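The block structure of Eqs. (14) to (16) and the prediction of Eq. (17) can be sketched with plain lists. The state ordering (all occupied states first, then all vacant states) follows Eq. (16); the values chosen for `p` and `q` are invented stand-ins for the trained expected values.

```python
M = 24  # period of the Markov chain in hours


def build_transition_matrix(p, q):
    """Assemble the 2M x 2M matrix of Eq. (16) from E[p_i] and E[q_i]."""
    P = [[0.0] * (2 * M) for _ in range(2 * M)]
    for i in range(M):
        nxt = (i + 1) % M
        P[i][nxt] = p[i]                   # P(I):   1_i -> 1_{i+1}
        P[i][M + nxt] = 1.0 - p[i]         # P(II):  1_i -> 0_{i+1}
        P[M + i][nxt] = q[i]               # P(III): 0_i -> 1_{i+1}
        P[M + i][M + nxt] = 1.0 - q[i]     # P(IV):  0_i -> 0_{i+1}
    return P


def predict(P, hour, occupancy, steps):
    """Expected occupancy `steps` hours ahead of `hour`, Eq. (17)."""
    pi = [0.0] * (2 * M)
    pi[hour] = occupancy                   # mass on the occupied state 1_hour
    pi[M + hour] = 1.0 - occupancy         # mass on the vacant state 0_hour
    for _ in range(steps):
        pi = [sum(pi[r] * P[r][c] for r in range(2 * M))
              for c in range(2 * M)]
    return sum(pi[:M])                     # total mass on occupied states


# Invented expected values: meetings tend to persist during office hours.
p = [0.8 if 9 <= h < 17 else 0.1 for h in range(M)]
q = [0.3 if 8 <= h < 16 else 0.02 for h in range(M)]
P = build_transition_matrix(p, q)

for j in range(1, 5):
    print(j, round(predict(P, 15, 0.6, j), 3))
```

A one-step prediction from this function reduces to $\Gamma_k E[p_k] + (1 - \Gamma_k) E[q_k]$, which matches Eq. (11).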
5. MPC formulation
To balance competing demands for occupant comfort and low total energy consumption, we need to avoid conditioning the space when vacancy is expected; the level of comfort should scale with occupancy. To simplify the cost function, we have augmented the building’s state space model with the non-changing temperature setpoint and a weather forecast shift-register system, i.e.
$$
\underbrace{\begin{bmatrix} x_{k+1} \\ \sigma_{k+1} \\ \omega_{k+1} \end{bmatrix}}_{\tilde{x}_{k+1}}
= \tilde{A} \underbrace{\begin{bmatrix} x_k \\ \sigma_k \\ \omega_k \end{bmatrix}}_{\tilde{x}_k}
+ \underbrace{\begin{bmatrix} B_u \\ 0 \\ 0 \end{bmatrix}}_{\tilde{B}} u_k, \tag{18}
$$
where $x$ is the building’s thermal state (Eq. (2)), $\sigma$ is the comfort setpoint, and $\omega$ is a shift-register state that iterates through the weather forecast over the MPC horizon. The augmented matrix $\tilde{A}$ connects the weather forecast to the building thermal model internally.⁸ We seek the optimal control law
$$
\begin{aligned}
&u^*_k(\tilde{x}_k, \Gamma_k) = \arg\min_u E\left[ \sum_{j=0}^{N-1} g(\tilde{x}_{k+j}, u_{k+j}, \Gamma_{k+j}) \right] \\
&\text{subject to} \quad \tilde{x}_{i+1} = \tilde{A} \tilde{x}_i + \tilde{B} u_i \quad \forall i \in \mathbb{Z}^+, \\
&\phantom{\text{subject to}} \quad 0 \le u \le u_{\max}
\end{aligned} \tag{19}
$$
where $u_{k+j}$ is an individual control action and $\Gamma_{k+j} \in [0, 1]$ is an occupancy measurement or prediction. This is standard except that the stage cost adjusts the discomfort weight using occupancy, i.e.
$$
g(\tilde{x}, u, \Gamma) = \tilde{x}^\top \Gamma Q \tilde{x} + r|u| = \Gamma \beta (x^{\text{zone}} - \sigma)^2 + r|u|, \tag{20}
$$
where
• $\tilde{x}$ is the augmented system state vector and $x^{\text{zone}}$ is the zone air temperature being controlled,
• $u$ is the heat input to the zone,
• $\Gamma$ is the observed or predicted occupancy,
• $\sigma$ is the comfortable setpoint temperature (constant),
• $Q$ is a matrix that extracts $\beta (x^{\text{zone}} - \sigma)^2$ from $\tilde{x}^\top Q \tilde{x}$, and
• $\beta$ and $r$ are the discomfort and energy cost gains.⁹
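One way to see how a matrix $Q$ can extract $\beta (x^{\text{zone}} - \sigma)^2$ is to build it as $\beta c c^\top$, where $c$ selects the zone temperature with weight $+1$ and the setpoint with weight $-1$. The sketch below checks this on a deliberately tiny augmented state; the three-element layout, the omission of the weather shift-register states, and the gain values are assumptions made for illustration.

```python
BETA, R = 2.0, 0.1   # illustrative discomfort and energy gains

# Hypothetical augmented state layout: [zone air temp, wall temp, setpoint].
C = [1.0, 0.0, -1.0]                        # picks out (x_zone - setpoint)
Q = [[BETA * ci * cj for cj in C] for ci in C]


def quadratic_form(x, matrix):
    """Compute x^T matrix x for a small dense matrix."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


def stage_cost(x_aug, u, occupancy):
    """Eq. (20): occupancy-weighted discomfort plus the energy penalty r|u|."""
    return occupancy * quadratic_form(x_aug, Q) + R * abs(u)


x_aug = [19.0, 15.0, 21.0]                  # zone is 2 degrees below setpoint
for occupancy in (0.0, 0.5, 1.0):
    print(occupancy, round(stage_cost(x_aug, 3.0, occupancy), 3))
```

With zero expected occupancy the cost collapses to the energy term alone, so the optimizer has no incentive to heat; as occupancy rises, the same temperature error is penalized more steeply.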
The many (2N) possible occupancy state trajectories, along with
9 In this simplified formulation, only the ratio of r and matters; together, theyconstitute a single tuning adjustment. The energy cost gain r can be time-varying ifone wishes, for example, to incorporate time-of-day utility pricing.
682 J.R. Dobbs, B.M. Hencey / Energy and Buildings 82 (2014) 675–684
FWs
io
u
t
Ats
iudrqtttp
6
6
c
•
•
•••
ea
Fig. 11. Influence of forgetting factor � on one-hour root-mean-square predictionerror. These results were obtained by training the model incrementally over theMERL Belady conference room occupancy data from February 12 to April 10, 2007and simultaneously comparing each observation to the prediction made in the pre-vious hour.
250
300 PredictiveTriggeredScheduled
0 1 2 3 4 5 6 7 80
50
100
150
Discomfort (occupancy °C)
Num
ber
of O
ccur
renc
es
Fig. 12. Distribution of occupancy-weighted discomfort over a two-monthsimulation. The properly-tuned schedule shows very little discomfort, while
ig. 10. Discomfort cost for high expected occupancy and low expected occupancy.hen high occupancy is predicted, the curve steepens and less deviation from the
etpoint is permitted.
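[...] instead condition all occupancy predictions solely on the present observation, we obtain the approximation
$$
u_k^*(x_k, \gamma_k) \approx \arg\min_{0 \le u \le u_{\max}} \left\{ g(x_k, u_k, \gamma_k) + \sum_{j=1}^{N-1} g\left(x_{k+j}, u_{k+j}, \mathrm{E}[\gamma_{k+j} \mid \gamma_k]\right) \right\}, \tag{21}
$$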
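where $\mathrm{E}[\gamma_{k+j} \mid \gamma_k]$ comes from Eq. (17).¹⁰ The optimization is then
$$
\begin{aligned}
\min_{u_k \cdots u_{k+N-1}}\ & \sum_{j=0}^{N-1} \left( x_{k+j}^\top \, \mathrm{E}[\gamma_{k+j} \mid \gamma_k] \, Q \, x_{k+j} + r|u_{k+j}| \right) \\
\text{subject to}\ & x_{i+1} = A x_i + B u_i \quad \forall i \in \mathbb{Z}^+ \\
& 0 \le u \le u_{\max}
\end{aligned}
\tag{22}
$$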
As with conventional MPC, the controller applies $u_k$ to the system and discards $u_{k+1} \ldots u_{k+N-1}$; the solution is repeated at each subsequent step.
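The receding-horizon procedure around Eq. (22) can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: it assumes a scalar zone model with hypothetical coefficients `a`, `b`, and `t_amb`, illustrative gains, a made-up occupancy forecast, and a plain projected-gradient solver in place of a dedicated QP solver. Only the 23 °C setpoint and the 8.0 kW capacity are taken from the simulation setup.

```python
def next_state(x, u, a=0.9, b=0.5, t_amb=0.0):
    """One hour of a scalar zone model: x+ = a*x + (1 - a)*t_amb + b*u."""
    return a * x + (1.0 - a) * t_amb + b * u


def plan_heating(x0, expected_occ, setpoint=23.0, beta=1.0, r=0.1,
                 a=0.9, b=0.5, t_amb=0.0, u_max=8.0, iterations=5000):
    """Minimize sum_j occ[j]*beta*(x[j]-setpoint)^2 + r*u[j], 0 <= u <= u_max."""
    n = len(expected_occ)
    # The Hessian trace bounds its largest eigenvalue, so 1/bound is a safe step.
    bound = 2.0 * beta * b * b * sum(
        expected_occ[j] * sum(a ** (2 * i) for i in range(j))
        for j in range(1, n))
    step = 1.0 / max(bound, 1e-9)
    u = [0.0] * n
    for _ in range(iterations):
        # Forward pass: x[0..n-1] are the states that appear in the cost.
        x = [x0]
        for ui in u[:-1]:
            x.append(next_state(x[-1], ui, a, b, t_amb))
        # Backward pass: adjoint = d(cost from step j onward) / d x[j].
        grad = [r] * n  # with u >= 0, |u| = u, so the energy term adds r
        adjoint = 0.0
        for j in range(n - 1, 0, -1):
            adjoint = (2.0 * expected_occ[j] * beta * (x[j] - setpoint)
                       + a * adjoint)
            grad[j - 1] += b * adjoint
        # Gradient step, then projection onto the box [0, u_max].
        u = [min(u_max, max(0.0, ui - step * gi)) for ui, gi in zip(u, grad)]
    return u


if __name__ == "__main__":
    forecast = [0.0, 0.1, 0.6, 0.9, 0.9, 0.9, 0.2, 0.0]  # made-up E[occupancy]
    x = 15.0
    for k in range(3):
        u_k = plan_heating(x, forecast[k:k + 6])[0]  # apply only the first action
        print(f"hour {k}: x = {x:.2f} degC, u = {u_k:.2f} kW")
        x = next_state(x, u_k)
```

Because the inputs are constrained to be non-negative, the absolute value reduces to a linear term and the problem stays convex, so a simple projected-gradient iteration is enough for a sketch of this size; a production controller would more likely hand the same problem to a QP or LP solver.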
The controller never reaches the setpoint $\sigma$ for two reasons. First, including energy in the cost function counteracts temperature regulation, with the trade-off tuned through the ratio $\beta/r$. Second, the discomfort cost is weighted by expected occupancy, which never reaches 1.0. We have chosen to penalize $|u|$, rather than $u^2$, because quadratic cost suppresses peaks and spreads control action over time; peak suppression inhibits the full system shutdown necessary to save energy during vacancy. When high occupancy is predicted, the discomfort cost (Fig. 10) becomes steeper and causes the temperature to more closely approach the setpoint.
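To make the first reason concrete, consider a worked one-step case. This is a simplified sketch under stated assumptions, not the paper's full-horizon problem: a scalar zone model $x^+ = a x + b u + d$ with $b > 0$ (the coefficients $a$, $b$, $d$ are hypothetical), a single look-ahead step whose discomfort is charged on $x^+$, setpoint $\sigma$, discomfort gain $\beta$, expected occupancy $\bar\gamma > 0$, and an interior optimum $0 < u < u_{\max}$ so that $|u| = u$.

```latex
\begin{align*}
J(u) &= \bar\gamma \beta \,(x^+ - \sigma)^2 + r u, \qquad x^+ = a x + b u + d \\
\frac{dJ}{du} &= 2 \bar\gamma \beta b \,(x^+ - \sigma) + r = 0 \\
\Rightarrow \quad x^+ &= \sigma - \frac{r}{2 \bar\gamma \beta b}
\end{align*}
```

Under these assumptions the optimal temperature sits below the setpoint by $r / (2 \bar\gamma \beta b)$. The offset shrinks as the expected occupancy or the ratio $\beta/r$ grows, which is consistent with the steeper discomfort curve at high predicted occupancy.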
6. Comparison to conventional control
6.1. Experimental setup
To demonstrate the algorithm’s advantages over conventional control, we have run a simulation under the following conditions:
• MERL occupancy data for the Belady conference room (sensors 452 and 453) from February 12 to April 10, 2007;
• EnergyPlus weather data for Elmira, NY starting March 1 (typical meteorological year) and a three-week warm-up period;
• no un-modeled disturbances;
• one-hour time step;
• system capacity of 8.0 kW.
10 Multi-parametric methods can be used to partition the state space into regions, each with an exact control law parameterized on the entire state, at the expense of a more complex MPC formulation [28].
occupancy-triggered control produces many severe instances. Occupancy-predicting control yields a distribution similar to that of scheduled control but shifted slightly to the right.
The thermal model is the single-zone building RC network discussed previously. To emphasize the benefit of prediction, we have chosen the weather period to just saturate the control output in typical winter conditions. (We could have chosen January and increased the system capacity slightly for the same result.)
6.2. Choosing $\lambda$
Before we run the simulation, we need to choose the forgetting factor. Without forgetting ($\lambda = 1.0$), consistent occupancy patterns allow predictions to asymptotically approach $\gamma = 0$ and $\gamma = 1$, but the ever-lengthening effective history length hinders adaptation and leads to very large prediction error. At the other extreme, high forgetting ($\lambda \ll 1.0$) gives a model easily distracted by irregularities
that consistently predicts occupancy near $\gamma = 0.5$, which again leads to high prediction error. Intuition suggests that a minimum should exist between these limits, and indeed this is the case. Fig. 11 shows the relationship between $\lambda$ and one-hour prediction error using the simulation occupancy data, with $\lambda = 0.974$ giving the best prediction accuracy. Of course, there is no guarantee that the best past value of $\lambda$ will work well in the future; nonetheless, the convexity suggests that $\lambda$ could be calibrated on-line with an extremum-seeking algorithm [29].
6.3. Performance comparison
Fig. 13 shows simulation results for three identically-tuned MPC implementations:
1. a purely occupancy-triggered controller,
2. a scheduled controller supplemented with occupancy triggering, and
3. an on-line trained occupancy-predicting controller with one week of pre-training ($\lambda$ = 97.4%).
The occupancy-triggered controller (green) maintains $\sigma$ = 23 °C during occupied hours and 10 °C during vacant hours. The scheduled controller uses the same setpoint from 5:00 am to 9:00 pm and any time the space is occupied. To simplify the simulation, all three controllers ignore occupancy and control to 10 °C (50 °F) over weekends.
Fig. 13. Simulation results for a single-zone building: occupancy prediction (a), temperature control performance and ambient conditions (b), occupant discomfort (c), and energy consumption (d). The discomfort (deviation from setpoint) is weighted by the occupancy $\gamma_k$.
Table 1
Predictive, triggered, and scheduled control performance summary for two-month simulation. Discomfort columns are in °C×h×occ.
| Controller | Discomfort total | Discomfort peak | Discomfort variance | Energy total (kWh) | Energy savings (%) |
|---|---|---|---|---|---|
| Predictive | 270 | 3.73 | 0.20 | 2493 | 19 |
| Triggered | 396 | 7.67 | 0.72 | 2237 | 27 |
| Scheduled | 108 | 1.69 | 0.04 | 3088 | 0 |
6.3.1. Energy and comfort
The occupancy-triggered controller consumes by far the least energy because it does not account for thermal lag or expected occupancy and therefore runs the least. Not surprisingly, its comfort performance upon occupant arrival is very poor, with large leading spikes on the discomfort trace in Fig. 13c and frequent calls for maximum output power in Fig. 13d. The scheduled controller leaves plenty of margin around the typical occupancy envelope and consequently yields excellent comfort at the expense of energy efficiency. The comfort performance of occupancy-predicting MPC lies between these two methods, with peak discomfort slightly worse than scheduled control but without the severe deviations of triggered control. Table 1 shows up to 19% energy savings compared to the scheduled controller and significantly lower peak discomfort than the occupancy-triggered controller.
Perhaps more interesting than the discomfort peak is its distribution. Fig. 12 shows how many times various occupancy-weighted discomfort levels occur under each control method. It comes as
little surprise that the scheduled controller maintains discomfort within 2 °C at all times. (Clearly, though, an out-of-date schedule would not perform this well, so this is a rather optimistic profile of scheduled control.) The occupancy-predicting controller maintains discomfort less than 2 °C more than 94% of the time with relatively mild outliers. The occupancy-triggered controller trails with 75% incidence of low discomfort and numerous severe violations. In summary, the occupancy-predicting control scheme yields comfort performance that rivals that of a properly tuned schedule.
Energy performance is also as expected. The conservative schedule leaves ample time to pre-condition the space along with some margin in the evening. The cost of this performance is 38% more total energy than the occupancy-triggered controller. Consumption by the occupancy-predicting controller is moderate, at 11% more than the triggered and 19% less than the scheduled control, and there are very few instances where the system needs to run at maximum power to catch up.
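The percentages in this paragraph follow from the total-energy column of Table 1. The short check below recomputes them; the helper name `percent_more` is just for illustration.

```python
# Total energy over the two-month simulation in kWh, copied from Table 1.
ENERGY_KWH = {"predictive": 2493, "triggered": 2237, "scheduled": 3088}


def percent_more(a, b):
    """How much more energy a uses than b, as a percentage of b."""
    return 100.0 * (a - b) / b


if __name__ == "__main__":
    e = ENERGY_KWH
    print(f"scheduled vs triggered:  {percent_more(e['scheduled'], e['triggered']):.1f}% more")
    print(f"predictive vs triggered: {percent_more(e['predictive'], e['triggered']):.1f}% more")
    print(f"predictive vs scheduled: {-percent_more(e['predictive'], e['scheduled']):.1f}% less")
```

After rounding to whole percentages, the three results agree with the 38%, 11%, and 19% quoted in the text.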
7. Conclusion
We have demonstrated the use of model predictive control with a stochastic occupancy model to reduce HVAC energy consumption. Using occupancy predicted by an automatically trained Markov chain, the algorithm is simplified by approximate dynamic programming where occupancy is projected multiple steps into the future using a current observation. We remark that although our method relies on weather forecasts and a dynamical model of the building, on-line data sources and emerging software tools have made these easier to acquire.
We have made some simplifications to improve clarity. First, we have chosen a rather coarse one-hour time step, even though practical controllers normally operate on a much finer time scale to provide adequate bandwidth; the Markov model may, however, operate on an entirely different time scale from the MPC with only minor implementation changes. Second, our hypothetical system has constant efficiency and operates only in heat mode to simplify the cost function and maintain focus on the paper’s contribution. As long as energy consumption can be controlled and room temperature can be measured, the stochastic occupancy model may be applied to arbitrarily complex MPC scenarios. Finally, we have used a certainty-equivalence assumption for weather and occupancy predictions; recent research has introduced ways to incorporate uncertainty into the optimization for added robustness. Demonstrating our algorithm without these simplifications is left to future work.
Acknowledgements
The authors thank Peter Radecki for his constructive feedback.
References
[1] K.P. Lam, M. Höynck, B. Dong, B. Andrews, Y. Chiou, R. Zhang, D. Benitez, J. Choi, et al., Occupancy detection through an extensive environmental sensor network in an open-plan office building, IBPSA Building Simulation 145 (2009) 1452–1459.
[2] R.H. Dodier, G.P. Henze, D.K. Tiller, X. Guo, Building occupancy detection through sensor belief networks, Energy and Buildings 38 (9) (2006) 1033–1043.
[3] S. Meyn, A. Surana, Y. Lin, S.M. Oggianu, S. Narayanan, T.A. Frewen, A sensor-utility-network method for estimation of occupancy in buildings, in: Proceedings of the 48th IEEE Conference on Decision and Control Held Jointly with the 2009 28th Chinese Control Conference, IEEE, Shanghai, 2009, pp. 1494–1500.
[4] J. Hutchins, A. Ihler, P. Smyth, Modeling count data from multiple sensors: a building occupancy model, in: 2nd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing – CAMPSAP 2007, 2007, pp. 241–244.
[5] C. Liao, Y. Lin, P. Barooah, Agent-based and graphical modelling of building occupancy, Journal of Building Performance Simulation 5 (1) (2012) 5–25.
[6] V.L. Erickson, Y. Lin, A. Kamthe, R. Brahme, A. Surana, A.E. Cerpa, M.D. Sohn, S. Narayanan, Energy efficient building environment control strategies using real-time occupancy measurements, in: Proceedings of the First ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, ACM, 2009, pp. 19–24.
[7] V.L. Erickson, A.E. Cerpa, Occupancy based demand response HVAC control strategy, in: Proceedings of the Second ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, ACM, Zürich, 2010, pp. 7–12.
[8] J. Page, D. Robinson, N. Morel, J.-L. Scartezzini, A generalised stochastic model for the simulation of occupant presence, Energy and Buildings 40 (2) (2008) 83–98.
[9] B. Dong, K.P. Lam, C. Neuman, Integrated building control based on occupant behavior pattern detection and local weather forecasting, in: Twelfth International IBPSA Conference, IBPSA, Sydney, Australia, 2011, pp. 14–17.
[10] B. Dong, K.P. Lam, A real-time model predictive control for building heating and cooling systems based on the occupancy behavior pattern detection and local weather forecasting, Building Simulation, vol. 7, Springer, 2014, pp. 89–106.
[11] V.L. Erickson, M.A. Carreira-Perpinán, A.E. Cerpa, OBSERVE: occupancy-based system for efficient reduction of HVAC energy, in: 10th International Conference on Information Processing in Sensor Networks (IPSN), IEEE, 2011, pp. 258–269.
[12] G. Gao, K. Whitehouse, The self-programming thermostat: optimizing setback schedules based on home occupancy patterns, in: Proceedings of the First ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, BuildSys’09, ACM, New York, NY, 2009, pp. 67–72.
[13] S. Goyal, H.A. Ingley, P. Barooah, Occupancy-based zone-climate control for energy-efficient buildings: complexity vs. performance, Applied Energy 106 (2013) 209–221.
[14] F. Oldewurtel, A. Parisio, C.N. Jones, D. Gyalistras, M. Gwerder, V. Stauch, B. Lehmann, M. Morari, Use of model predictive control and weather forecasts for energy efficient building climate control, Energy and Buildings 45 (2012) 15–27.
[15] D. Greenberg, K. Pratt, B. Hencey, N. Jones, L. Schumann, J. Dobbs, Z. Dong, D. Bosworth, B. Walter, Sustain: an experimental test bed for building energy simulation, Energy and Buildings 58 (2013) 44–57.
[16] P. Haves, B. Hencey, F. Borrell, J. Elliot, Y. Ma, B. Coffey, S. Bengea, M. Wetter, Model predictive control of HVAC systems: implementation and testing at the University of California, Merced, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA, USA, 2010 (Technical report).
[17] J.R. Dobbs, B.M. Hencey, Automatic model reduction in architecture: a window into building thermal structure, in: Proceedings of the 5th National Conference of IBPSA-USA, IBPSA, Madison, WI, 2012, pp. 562–568.
[18] D. Sturzenegger, D. Gyalistras, V. Semeraro, M. Morari, R.S. Smith, BRCM Matlab toolbox: model generation for model predictive building control, in: Proceedings of the American Control Conference, IEEE, Portland, OR, 2014, pp. 1063–1069.
[19] K.-S. Lu, K. Lu, Controllability and observability criteria of RLC networks over F(z), International Journal of Circuit Theory and Applications 29 (3) (2001) 337–341.
[20] P. Radecki, B.M. Hencey, Online thermal estimation, control, and self-excitation of buildings, in: Proceedings of the 52nd IEEE Conference on Decision and Control, IEEE, Firenze, Italy, 2013, pp. 4802–4807.
[21] J.R. Dobbs, B.M. Hencey, A comparison of thermal zone aggregation methods, in: Proceedings of the 51st IEEE Conference on Decision and Control, volume 1, IEEE, Maui, HI, 2012, pp. 6938–6944.
[22] United States Department of Energy, EnergyPlus Engineering Document: The Reference to EnergyPlus Calculations, University of Illinois and University of California, 2011.
[23] National Climatic Data Center, U.S. surface climate observing reference network, 2014, https://www.ncdc.noaa.gov/crn/qcdatasets.html
[24] C.R. Wren, Y.A. Ivanov, D. Leigh, J. Westhues, The MERL motion detector dataset, in: Proceedings of the 2007 Workshop on Massive Datasets, MD’07, ACM, New York, NY, 2007, pp. 10–14.
[25] A.K. Gupta, S. Nadarajah, Handbook of Beta Distribution and Its Applications, volume 175, CRC Press, Boca Raton, FL, 2004, pp. 69–84.
[26] R. Kulhavy, M.B. Zarrop, On a general concept of forgetting, International Journal of Control 58 (4) (1993) 905–924.
[27] J.A. Rossiter, Model-Based Predictive Control: A Practical Approach, CRC Press, Boca Raton, FL, 2003 (Chapter 3.2).
[28] E.N. Pistikopoulos, M.C. Georgiadis, V. Dua, Multi-Parametric Programming: Theory, Algorithms, and Applications, vol. 1, Wiley-VCH Verlag, Weinheim, Germany, 2007.
[29] K.B. Ariyur, M. Krstic, Real-Time Optimization by Extremum-Seeking Control, John Wiley & Sons, Hoboken, NJ, 2003.