Hardware Implementation of Low-Complexity Deep Learning ...

6
Hardware Implementation of Low-complexity Deep Learning-based Model Predictive Controller Nirlipta Ranjan Mohanty 1 , Saket Adhau 2 , Deepak Ingole 3 , and Dayaram Sonawane 1 Abstract— Model predictive control (MPC) is an advanced control strategy that predicts the future behavior of the plant while considering the system dynamics and constraints. This optimization-based control algorithm needs huge amount of computational resources as it solves the optimization problem at each sampling time. This computational load demands powerful hardware and light-weight algorithms for implementing MPC on an embedded systems with limited computational resources. Deep neural network (DNN) is a attractive alternative for MPC as it provide the close approximation for various linear/nonlin- ear functions. In this paper, a feed forward neural network (FFNN) and recurrent neural network (RNN)-based linear MPC is developed. These neural network algorithms which are trained offline can efficiently approximate MPC control law which can be easily executed on low-level embedded hardware. The closed-loop performance is verified on a hardware-in- the-loop (HIL) co-simulation on ARM microcontroller. The performance of proposed DNN-MPC is demonstrated with a case study of a 2 degree-of-freedom (DOF) helicopter. Hardware result shows that the DNN-MPC is faster and consumes less memory as compared to MPC while retaining most of the performance indices. Index Terms— Model predictive control, deep neural net- work, approximate control law, linear systems, HIL co- simulation. I. I NTRODUCTION Model predictive control (MPC) is an advanced control technique having excellent control performance for various complex systems. The important reasons for its success includes systematically handling multi-input multi-output systems, handling physical constraints on input and output variables, potential to consider multiple process objectives, and ability to control processes with nonlinear time-varying dynamics [1]. MPC calculations are based on current measurements and predictions of the future values of the controlled variables us- ing a mathematical model of the system. It solves numerical optimization problem at each sampling time that minimizes a given objective function and computes a sequence of optimal control actions. The receding horizon control (RHC) strategy, which is a distinguishing feature of MPC, allows applying the first computed control action, thus utilizing the most recent measurements to optimize the control performance [2]. MPC has been successfully implemented in various industrial 1 College of Engineering Pune, Shivajinagar 411005, India. {nrm18.instru, dns.instru}@coep.ac.in 2 Norwegian University of Science and Technology, Trondheim, Norway. [email protected] 3 KU Leuven, Department of Mechanical Engineering, Leuven, Belgium. [email protected] process applications due to its versatility, robustness, and safety guarantees [3]. Despite having great success of MPC, there are still chal- lenges for implementing MPC in real-time systems. Online implementation of MPC involves a considerable amount of calculations as it repeatedly solves an optimal control problem within the sampling time. The computational burden increases with an increase in the number of constraints, con- trol and state variables, and prediction horizon. For real-time MPC implementation, different approaches are proposed in the literature. One approach is the development of fast solvers includes fast MPC [4], alternating directions method of multipliers (ADMM) [5], and fast gradient method [6] approaches to solving the optimization problem online and are successfully used in implementing MPC. These methods show fast performance when problem dimensions are small, hence limits application to smaller systems. Authors in [7] have proposed a novel technique to use posit number system for efficient embedded implementation of online MPC that reduces the computational burden and memory utilization of MPC. Another approach is to use explicit MPC (EMPC) [8] that solves optimization problem offline and store the solution in a lookup table (LUT), thus increasing the computational speed as online computations is limited to search algorithm. The major drawback of this approach is that the dimension of the LUT increases exponentially with the length of the horizon and optimization variables hence restricted its appli- cation to embedded systems with limited storage capabili- ties and computational power. Another approach for online optimization is the use of learning-based MPC (LBMPC). On this track, DNN based MPC is proposed that learns the MPC policy and approximates the optimal control law to reduce the computational load [9]. The continued growth of computational power and the availability of large labelled data are two important factors for the recent success of DNN. The success of MPC mostly depends on the system model. However, in practice, there exists model mismatch with the plant, which leads to the low performance and offset in the reference tracking. The well-known approach to achieve offset-free tracking is to use a disturbance observer augmented with the process model [10]. Nevertheless, this augmentation increases the computational cost of the opti- mization and limits its applicability to small systems. In this paper, we propose an approximate offset-free linear MPC using a DNN approach to reduce the computational burden of classical offset-free linear MPC. The idea is to design an DNN-MPC based on data obtained by open-

Transcript of Hardware Implementation of Low-Complexity Deep Learning ...

Page 1: Hardware Implementation of Low-Complexity Deep Learning ...

Hardware Implementation of Low-complexity Deep Learning-basedModel Predictive Controller

Nirlipta Ranjan Mohanty1, Saket Adhau2, Deepak Ingole3, and Dayaram Sonawane1

Abstract— Model predictive control (MPC) is an advancedcontrol strategy that predicts the future behavior of the plantwhile considering the system dynamics and constraints. Thisoptimization-based control algorithm needs huge amount ofcomputational resources as it solves the optimization problem ateach sampling time. This computational load demands powerfulhardware and light-weight algorithms for implementing MPCon an embedded systems with limited computational resources.Deep neural network (DNN) is a attractive alternative for MPCas it provide the close approximation for various linear/nonlin-ear functions. In this paper, a feed forward neural network(FFNN) and recurrent neural network (RNN)-based linearMPC is developed. These neural network algorithms which aretrained offline can efficiently approximate MPC control lawwhich can be easily executed on low-level embedded hardware.The closed-loop performance is verified on a hardware-in-the-loop (HIL) co-simulation on ARM microcontroller. Theperformance of proposed DNN-MPC is demonstrated with acase study of a 2 degree-of-freedom (DOF) helicopter. Hardwareresult shows that the DNN-MPC is faster and consumes lessmemory as compared to MPC while retaining most of theperformance indices.

Index Terms— Model predictive control, deep neural net-work, approximate control law, linear systems, HIL co-simulation.

I. INTRODUCTION

Model predictive control (MPC) is an advanced controltechnique having excellent control performance for variouscomplex systems. The important reasons for its successincludes systematically handling multi-input multi-outputsystems, handling physical constraints on input and outputvariables, potential to consider multiple process objectives,and ability to control processes with nonlinear time-varyingdynamics [1].

MPC calculations are based on current measurements andpredictions of the future values of the controlled variables us-ing a mathematical model of the system. It solves numericaloptimization problem at each sampling time that minimizes agiven objective function and computes a sequence of optimalcontrol actions. The receding horizon control (RHC) strategy,which is a distinguishing feature of MPC, allows applying thefirst computed control action, thus utilizing the most recentmeasurements to optimize the control performance [2]. MPChas been successfully implemented in various industrial

1 College of Engineering Pune, Shivajinagar 411005, India.{nrm18.instru, dns.instru}@coep.ac.in

2 Norwegian University of Science and Technology, Trondheim, [email protected]

3 KU Leuven, Department of Mechanical Engineering, Leuven, [email protected]

process applications due to its versatility, robustness, andsafety guarantees [3].

Despite having great success of MPC, there are still chal-lenges for implementing MPC in real-time systems. Onlineimplementation of MPC involves a considerable amountof calculations as it repeatedly solves an optimal controlproblem within the sampling time. The computational burdenincreases with an increase in the number of constraints, con-trol and state variables, and prediction horizon. For real-timeMPC implementation, different approaches are proposed inthe literature. One approach is the development of fastsolvers includes fast MPC [4], alternating directions methodof multipliers (ADMM) [5], and fast gradient method [6]approaches to solving the optimization problem online andare successfully used in implementing MPC. These methodsshow fast performance when problem dimensions are small,hence limits application to smaller systems. Authors in [7]have proposed a novel technique to use posit number systemfor efficient embedded implementation of online MPC thatreduces the computational burden and memory utilization ofMPC.

Another approach is to use explicit MPC (EMPC) [8] thatsolves optimization problem offline and store the solutionin a lookup table (LUT), thus increasing the computationalspeed as online computations is limited to search algorithm.The major drawback of this approach is that the dimensionof the LUT increases exponentially with the length of thehorizon and optimization variables hence restricted its appli-cation to embedded systems with limited storage capabili-ties and computational power. Another approach for onlineoptimization is the use of learning-based MPC (LBMPC).On this track, DNN based MPC is proposed that learns theMPC policy and approximates the optimal control law toreduce the computational load [9]. The continued growth ofcomputational power and the availability of large labelleddata are two important factors for the recent success of DNN.

The success of MPC mostly depends on the systemmodel. However, in practice, there exists model mismatchwith the plant, which leads to the low performance andoffset in the reference tracking. The well-known approach toachieve offset-free tracking is to use a disturbance observeraugmented with the process model [10]. Nevertheless, thisaugmentation increases the computational cost of the opti-mization and limits its applicability to small systems.

In this paper, we propose an approximate offset-free linearMPC using a DNN approach to reduce the computationalburden of classical offset-free linear MPC. The idea is todesign an DNN-MPC based on data obtained by open-

Page 2: Hardware Implementation of Low-Complexity Deep Learning ...

loop offset-free MPC where the pair of state and inputs arestored. The DNN-MPC is designed using two neural networkarchitectures, namely, feedforward neural network (FFNN)and recurrent neural network (RNN). The developed DNN-MPC is implemented on a STM32 Nucleo-144 developmentboard with STM32F429ZI microcontroller. The proposedDNN-MPC is tested in the hardware-in-the-loop (HIL) co-simulation on a case study of a 2 degree-of-freedom (DOF)helicopter. The HIL co-simulation results of DNN-MPCare presented and subsequently compared with the classicaloffset-free MPC. Presented results show that the developedDNN-MPC is simple, faster, and takes less memory ascompared to classical MPC. The performance of DNN-MPCis close to the classical offset-free MPC and successfullyhandle the output disturbances. The main contributions ofthis paper are as follows:

• design of DNN based controller to approximate MPC,• hardware implementation of DNN-MPC with two ar-

chitectures,• HIL co-simulation of DNN-MPC and comparison of

offset-free MPC and DNN-MPC.

II. MODEL PREDICTIVE CONTROL

This section presents a classical offset-free linear MPCformulation using disturbance modeling approach.

A. MPC Formulation

We consider a discrete-time linear time-invariant (LTI)system described as follows:

x(t+ Ts) = Ax(t) +Bu(t), (1a)y(t) = Cx(t) +Du(t), (1b)

where, x(t) ∈ Rnx , u(t) ∈ Rnu and y(t) ∈ Rny , are statevector, control input vector, and output vector of the system,respectively. A ∈ Rnx×nx , B ∈ Rnx×nu , C ∈ Rny×nx ,and D ∈ Rny×nu are the system matrices describing theplant dynamics with (A,B) is controllable and (C,A) isobservable. Here Ts is the sample time of the system.

B. Disturbance Modeling

To achieve offset-free performance and disturbance re-jection, the plant model (1) is augmented with disturbancevector d(t) ∈ Rnp as shown below:[

x(t+ Ts)d(t+ Ts)

]︸ ︷︷ ︸

xe(t+Ts)

=

[A Bd

O I

]︸ ︷︷ ︸

Ae

[x(t)d(t)

]︸ ︷︷ ︸

xe(t)

+

[BO

]︸ ︷︷ ︸

Be

u(t), (2a)

ye(t) =[C Cd

]︸ ︷︷ ︸Ce

[x(t)d(t)

]︸ ︷︷ ︸

xe(t)

+Deu(t), (2b)

where Bd ∈ Rnx×np , Cd ∈ Rny×np are the disturbancemodel matrices.

C. Observer Design

Extended state xe is estimated from the plant measure-ment by designing a Luenberger observer for augmentedsystem (2) as follows [11]:

xe(t+ Ts) = Aexe(t) +Beu(t) + Le(y(t)− ye(t)), (3a)ye(t) = Cexe(t) +Deu(t), (3b)

where Le =[Lx Ld

]is the estimator gain matrix for state

and disturbance with dimension (nx × ny) and (np × ny),respectively.

D. MPC Problem Formulation

Using system model (2) and disturbance observer (3), theprediction for differential state and output in (4) can bewritten as:

minUN

N−1∑k=0

(yk − yr,k)TQ(yk − yr,k) + ∆uTkR∆uk, (4a)

s.t.

xk+Ts= Axk +Buk +Bddk, k = 0, . . . , N − 1, (4b)

dk+Ts= dk, k = 0, . . . , N − 1, (4c)

yk = Cxk +Duk + Cddk, k = 0, . . . , N − 1, (4d)∆uk = uk − uk−1, k = 0, . . . , N − 1, (4e)umin ≤ uk ≤ umax, k = 0, . . . , N − 1, (4f)u−1 = u(t− Ts), (4g)x0 = x(t), (4h)

where Q ∈ Rnx×nx and R ∈ Rnu×nu are the weightingmatrices, with conditions Q � 0 to be positive semi-definite,and R � 0 to be positive definite. We denote N as theprediction horizon, yr is the reference trajectory of the outputand initial conditions are x0 and u−1. Based on the concept(RHC) [11] the solution of (4) generates sequence of optimalcontrol actions (U?

N = u?0, . . . , u?N−1). However, only the

first term of the control sequence (u?0) is feed to the plantand a new sequence is calculated using new measurements atthe next sampling time. The optimization process is repeatedto predict future states xk+1 at time k + 1.

E. Online/Implicit MPC

It is well known that the constrained finite time opti-mal control problem (4) can be transformed in a standardquadratic programming problem as follows:

minU

1

2UTHU + fTU, (5a)

s.t. GU ≤ w, (5b)

where H ∈ RnuN×nuN is the hessian matrix, f ∈Rnx×nuN , G ∈ Rq×nuN , w ∈ Rq , and q is number ofinequalities. The QP problem (5) can be solved using wellknown methods such as active set method and interior pointmethods [12].

Page 3: Hardware Implementation of Low-Complexity Deep Learning ...

III. DEEP NEURAL NETWORK

Artificial neural network (ANN) is a machine learningapproach that mimics the human brain through a set ofalgorithms. ANN consists of an input and output layer withone or more hidden layer within. ANN having one hiddenlayer are called shallow neural networks, and more than onelayer is termed a deep neural networks. DNN have betterapproximation capabilities of complex function compared toshallow neural networks [13].

The neurons of neural networks have weights and biasassociated with them, and their values are updated duringthe training process based on the error at the output usinga backpropagation algorithm. Activation functions used inANN are nonlinear mathematical equations that determinethe output of the learned model, its accuracy, and com-putational efficiency. Popular activation functions used inANN-based controller is sigmoid, hypertangent, rectifiedlinear unit (ReLu). The ANN with an appropriate numberof hidden layers and nodes can match the performance ofMPC. Generally, the selection of hidden layer and nodes aremade by trial and error method [14]. In this work, we haveinvestigated the FFNN and RNN type regression model fordesigning the neural network controller.

A. Architecture of FFNNThe feedforward neural network was the first and simplest

type of ANN devised. In this network, the information movesin only one direction forward from the input nodes, throughthe hidden nodes (if any) and to the output nodes. Thereare no cycles or feedback loops in the network [9]. Due tothe absence of feedback, FFNN provides only a static systemmapping. FFNN is defined as a sequence of layers of neuronswhich determines a function N : Rnx → Rny of the form:

N = fL + 1 ◦ gL ◦ fL ◦ · · · g1 ◦ f1(x), (6)

where x ∈ Rnx , y ∈ Rny are input and output of the network,L is the number of hidden layers with M number of neuronsin each layer. N is described as DNN for L ≥ 2 and shallownetwork for L = 1. The interconnections among nodes areshown in Fig. 1. The output of the network is defined as

x1

x2

x3

xn x

h11

h12

h13

h1M

hL 1

hL 2

hL 3

hL M

y1

y2

y3

ˆyn y

Input

LayerHiddenLayer

HiddenLayer

Output

Layer

Fig. 1. Architecture of feed forward neural network with L hidden layers.

follows:

yt = g

(w0 +

nx∑i=1

xiwi

), (7)

where g is the activation function, wi are weights forcorresponding inputs and w0 is the bias of the neuron.The objective is to find the optimal values of wi and w0

by optimizing some cost function. In this paper hyperbolictangent are considered as activation function as follows:

g(x) = tanh(x) =e2x − 1

e2x + 1, (8)

Equation (7) can be rewrite as follows:

yt = g(w0 +XTW ), (9)

where X =

x1...xnx

and W =

w1

...wnw

.

B. Architecture of RNN

A recurrent neural network is a class of ANNs whereconnections between nodes form a directed graph alonga temporal sequence. This allows it to exhibit temporaldynamic behaviour. It has both feedforward and feedbackconnections between the neurons. Previous states generatedfrom past inputs are feedback into the network, and this givesmemory to it. While in FFNN, only current input states areconsidered, in RNN, both current and previous states arefocused. It makes RNN capture the dynamic behaviour of asystem [15]. RNN structure is generated by adding a loop inthe FFNN network, which allows information to process overtime. At time step t RNN takes input xt, computes outputof the network yt, updates internal state ht, and passes theupdated internal state information to the next time step t+ 1of the network. The structure of RNN is shown in Fig. 2.Internal state ht is defined as follows:

ht = g(ht−1, xt), (10)

where g is the activation function, ht−1 is previous statefrom previous time step, and xt is current input the networkis receiving. Equation (10) can be rewrite as follows:

ht = tanh(WThhht−1 +WT

xhxt), (11)

where tanh is the activation function for hidden nodes,Wxh ∈ Rnh×nx and Whh ∈ Rnh×nh are the input-to-hidden weight matrix and hidden-to-hidden weight matrix,respectively. The output of the network is defined as follows:

yt = WThyht, (12)

where Why ∈ Rny×nh is the hidden-to-output weight matrix.

ht ht ht ht=ht

y0

x0

y1

x1

y2

x2

yt

xt

yt

xt . . .

Fig. 2. A recurrent neural network and its unfolded structure.

Page 4: Hardware Implementation of Low-Complexity Deep Learning ...

IV. CASE STUDY: 2 DEGREE OF FREEDOM HELICOPTER

To show the performance of developed DNN-MPC, weconsider a case study of helicopter control problem. Thestate-space representation of 2 DOF helicopter model canbe expressed as follows:

x(t+ Ts) = Ax(t) +Bu(t), (13a)y(t) = Cx(t) +Du(t), (13b)

where state vector x(t) = [θ, ψ, θ, ψ]T with θ, ψ being theangular positions in pitch axis and yaw axis, respectively,input vector u(t) = [Vm,p, Vm,y]T with Vm,p, Vm,y being thepitch and yaw motor voltage respectively and output vectoris y(t) = [θ, ψ]T . The corresponding system matrices aregiven as follows:

A =

0 0 1 0

0 0 0 1

0 0 − Bp

JTp

0

0 0 0 − By

JTy

, B =

0 0

0 0

Kpp

JTp

Kpy

JTp

Kyp

JTy

Kyy

JTy

,

C =

[1 0 0 00 1 0 0

], and D =

[0],

where JTp= Jeq,p+mhelil

2cm and JTy

= Jeq,y+mhelil2cm.

The values of the parameters used in this helicopter modelare adopted from [16]. The constraints on input are −24V≤ Vm,p ≤ 24V and −15V ≤ Vm,y ≤ 15V.

V. HARDWARE IMPLEMENTATION OF DNN-MPCIn this section, we present the training, hardware setup,

and implementation design flow of DNN-MPC.

A. Design Flowchart of DNN-MPC

Fig. 3 shows the detailed design flowchart of DNN-MPC.There are three main steps in the development of the DNN-MPC. First, the design of classical offset-free MPC, second,the design of DNN-MPC and third is to implement DNN-MPC on hardware.

• MPC: in this step an offset-free MPC is designed forreference tracking with the constraints on input andstates. MPC has simulated in open-loop with randominitial conditions (states) to generate learning and targetdata for DNN training. The pairs of states and inputsare stored for further process.

• DNN: in this step the stored data is divided randomlyfor training, testing, and validation purpose. The numberof hidden layers, hidden neurons, activation function,training algorithm, and loss function is selected as perapplication requirements. Once DNN is trained it istested and validated. For unsatisfactory result DNNis retrained by re-initializing weights and biases orchanging number of hidden layers and hidden neurons.

• HIL co-simulation: the developed DNN-MPC controlleris implemented on a STM32 microcontroller to validateits performance through HIL co-simulations.

MPC

DNN

HIL

System Modeling

Offset-free MPC formulation

Data generation

Extract training, testingand validation data

Selection of hidden layer, neuronsand activation function

Selection of training algorithmand loss function

Train the DNN

Testing and validationof trained DNN

Retrain network for unsatisfactorytesting and validation result

Compile offset-free MPC andDNN-MPC on STM32

HIL co-simulation

Fig. 3. Design flowchart of DNN-MPC in MATLAB/Simulink and STM32MCU.

B. Training of DNN-MPC

To generate data for DNN training, MPC is formulated us-ing a model of the helicopter and constraints of the states andthe control inputs. MPC runs in open-loop and for each sam-ple time initial conditions for states and references are chosenrandomly from operating state-space of the system and areuniformly distributed. MPC solves the quadratic optimizationproblem for the given conditions using mpcqpsolversolver using interior point method [17] and generates optimalcontrol actions. The initial conditions and the optimal actionsare stored for DNN training. If the optimization problem isinfeasible, the data points are discarded. The training datasets have information about the policy implemented by theMPC, while solving the optimization problem. Hence thenumber of generated data pair and their quality has a majorinfluence on the approximation behavior of DNN controller.

The learning data for the FFNN is an array of length sixthat consists of randomly generated system states, referencesfor pitch and yaw tracking. Teaching data is an array oflength two that consists of the first optimal solution of theMPC controller, i.e. pitch and yaw motor voltages. For RNNimplementation two more indexes added to the learning arraythat contains the randomly generated initial condition forcontrol actions. The training data set for neural networktraining is normalized between [−1, 1]. Out of the total data,70% of the generated data is randomly selected for training

Page 5: Hardware Implementation of Low-Complexity Deep Learning ...

the neural network and the remaining 30% for testing andvalidation purpose. Both FFNN and RNN consists of twohidden layers with ten neurons each. All the hidden nodeshave tanh activation function with a linear function forthe output layer. DNN is trained by using MATLAB deeplearning toolbox. feedforwardnet and layrecnet areused to generate FFNN and RNN model respectively. Train-ing the neural network use optimization process to find theoptimal values of the weights and biases of each neuronby minimizing a given loss function. In this work, the lossfunction is chosen as a mean squared error (MSE) whichis the difference between target output and network output.MSE is expressed mathematically as follows:

MSE =1

n

n∑i=1

(yi − yi)2, (14)

where n is the number of data points, yi and yi are targetvalue and network output respectively. The training is doneusing Levenberg Marquardt algorithm [18]. Weights, andbiases are updated using the back propagation of erroralgorithm. Around 100000 data points are generated fortraining. Time required for DNN to train is 8 minutes 20seconds and 25 minutes 40 seconds for FFNN and RNNrespectively. After the training, the neural network is testedand validated with test data and validation data sets. Forunsatisfactory result, the neural network is retrained by re-initializing the weights and biases or altering the hiddenlayers and hidden neurons.

C. Hardware Setup and Configuration

The designed DNN-MPC is implemented on a STMNucleo-144 development board with STM32F746ZGT6MCU. It is an ARM 32-bit Cortex-M7 microcontrollerhaving 1024 Kb of program memory, 320 Kb of SRAMand delivers clock speed up to 180 MHz. The algorithmis developed and run in MATLAB/Simulink environment.Communication between PC and Nucleo board is carryoutthrough UART serial communication with baud rate of115200.

A MATLAB/Simulink scheme is developed which has asystem model to represent the real plant and controller infeedback configuration. The subsytem for the controller isdeveloped and build to generate C/C++ code and deployed onto the hardware. During simulation the controller is loadedinto the STM32 microcontroller to execute the calculation inreal-time and generate control actions that is given to plantmodel running in a Simulink. The hardware sampling time,memory footprint, controller output, and the plant output isstored for performance comparison and validation of DNN-MPC. Fig. 5 shows the HIL co-simulation set-up for DNNbased MPC.

VI. HARDWARE RESULTS

A. Reference Tracking Performance

In this work, prediction horizon (N ), output penalty (Q),and input penalty (R) were set to 10, 0.001×I(nu×nu),and 100×I(ny×ny), respectively. Helicopter sampling time is

chosen as 0.1 s. We demonstrated the performance of DNN-MPC with random disturbances applied to pitch angle.

Performance of DNN-MPC is tested for the referencetracking of pitch and yaw angles for a given trajectoryfor simulation time of 100 s. During HIL co-simulation, aconstant disturbance of magnitude 0.1 is applied to pitchangle between 30 to 70 s of simulation. Fig. 6 showsthe response of classical offset-free MPC and DNN-MPCs(FFNN and RNN). It can be seen from Fig. 6 (a) that theoffset-free MPC tracks pitch angle reference trajectory verywell and mitigate disturbance faster (see zoomed subplot atthe top right). It is interesting to see that DNN-MPC alsotracks the reference trajectory well but shows a little steady-state error and also mitigates disturbances but takes moretime as compared to offset-free MPC. Fig. 6 (b) shows theresponse of yaw angle for all three controllers. It clearlyshows the effect of pitch angle on yaw angle response. Fig. 6(c-d) shows the pitch and yaw motor voltages obtained bythree controllers. It can be seen that the DNN-MPC tookless voltage to track reference trajectories. Which means thatDNN-MPC shows satisfactory performance with less voltage.

B. Performance comparison

The performance of the controller in the presence ofunmeasured disturbance is presented in the Table I. Memoryfootprints, hardware sampling time, MSE, integral squareerror (ISE), integral time square error (ITSE) and integralabsolute error (IAE) are considered as performance indices.The results show that both the DNN controllers reducememory footprints remarkably, and there is a reduction incomputational cost for FFNN. MSE and other key perfor-mance indices are not much affected by the implementationof DNN controller.

TABLE ICOMPARISON OF KEY PERFORMANCE INDICES OF DNN-MPC AND

OFFSET-FREE MPC.

KPI Offset-free MPC FFNN-MPC RNN-MPC

Memory (KB) 155 116 116

Execution time (s) 0.01 0.008 0.013

Pitch Yaw Pitch Yaw Pitch Yaw

MSE 0.7569 5.36E − 4 3.4548 0.0044 3.4993 0.0038

ISE 7.07E5 5.35E2 3.4E6 4.39E3 3.45E6 3.78E3

IAE 8.60E4 2.58E3 3.49E5 2.53E4 3.46E5 2.95E4

ITSE 7.07E8 5.36E5 3.40E9 4.39E6 3.45E9 3.78E6

ITAE 8.60E7 2.58E6 3.40E9 2.53E7 3.46E8 2.95E7

VII. CONCLUSIONS

Model predictive control (MPC) is an advanced controlstrategy that predicts the future behaviour of the plantwhile considering the system dynamics and constraints. Thisoptimization-based control algorithm needs a huge amountof computational resources as it solves the optimizationproblem at each sampling time. This computational loaddemands powerful hardware and lightweight algorithms forimplementing MPC on an embedded system with limited

Page 6: Hardware Implementation of Low-Complexity Deep Learning ...

MPC

Optimization

Modelx0 ∈X

u0 ∈U

u∗yr

2 DOF

Helicopter

yuyr

d

Data acquisition

Model selection

Model training

Model tuning

Open-loop MPC DNN DNN performanceevaluation

Fig. 4. Schematic of the DNN-MPC. From left to right: MPC runs in open-loop with random initial data. DNN trained with the labeled dataset. TrainedDNN model is implemented in closed-loop control to evaluate performance.

STM32F746ZGT6

Host SystemHIL Co-simluation

Results

Fig. 5. Hardware set-up used in the HIL co-simulation of DNN-MPC tocontrol 2DOF helicopter.

0 20 40 60 80 100

Time t [s]

-25

-15

-5

5

15

25

Yaw

Moto

r V

olt

age

Vm

, y(t

)

[v]

(d)

0 20 40 60 80 100

Time t [s]

-25

-15

-5

5

15

25

Pit

ch M

oto

r V

olt

age

Vm

, p(t

)

[v]

(c)

0 20 40 60 80 100

-1.5

-1.05

-0.6

-0.15

0.3

0.75

Yaw

Angle

(t

)

[deg

]

(b)

0 20 40 60 80 100

-12

-6

0

5

11

17

Pit

ch A

ng

le

(t)

[deg

]

(a)

29.5 30 30.5 31 31.510

13

16

29.5 30 30.5 31 31.5

-0.2

-0.1

0

0.1

29.5 30 30.5 31 31.5

-10

0

10

29.5 30 30.5 31 31.5

-2024

Fig. 6. Reference tracking performance of the DNN-MPC and offset-freeMPC implemented on hardware for disturbance applied on pitch axis.

computational resources. A deep neural network (DNN) isan attractive alternative for MPC as it provides a closeapproximation for various linear/nonlinear functions. In thispaper, a feed-forward neural network (FFNN) and recurrentneural network (RNN)-based linear MPC is developed. Theseneural network algorithms, which are trained offline, canefficiently approximate MPC control law which can be easilyexecuted on low-level embedded hardware. The closed-loopperformance is verified on a hardware-in-the-loop (HIL) co-simulation on ARM microcontroller. The performance of theproposed DNN-MPC is demonstrated with a case study of a 2

degree-of-freedom (DOF) helicopter. Hardware result showsthat the DNN-MPC is faster and consumes less memory ascompared to MPC while retaining most of the performanceindices.

REFERENCES

[1] J. M. Maciejowski, Predictive control: with constraints. Pearsoneducation, 2002.

[2] J. A. Rossiter, Model-based predictive control: a practical approach.CRC press, 2003.

[3] J. H. Lee, “Model predictive control: Review of the three decadesof development,” International Journal of Control, Automation andSystems, vol. 9, no. 3, p. 415, 2011.

[4] Y. Wang and S. Boyd, “Fast model predictive control using online op-timization,” IEEE Transactions on control systems technology, vol. 18,no. 2, pp. 267–278, 2009.

[5] S. Boyd, N. Parikh, and E. Chu, Distributed optimization and statisti-cal learning via the alternating direction method of multipliers. NowPublishers Inc, 2011.

[6] S. Richter, C. N. Jones, and M. Morari, “Computational complexitycertification for real-time mpc with input constraints based on the fastgradient method,” IEEE Transactions on Automatic Control, vol. 57,no. 6, pp. 1391–1403, 2011.

[7] C. Jugade, D. Ingole, D. Sonawane, M. Kvasnica, and J. Gustafson,“A framework for embedded model predictive control using posits,” in2020 59th IEEE Conference on Decision and Control (CDC). IEEE,2020, pp. 2509–2514.

[8] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The ex-plicit linear quadratic regulator for constrained systems,” Automatica,vol. 38, no. 1, pp. 3–20, 2002.

[9] B. Karg and S. Lucia, “Efficient representation and approximation ofmodel predictive control laws via deep learning,” IEEE Transactionson Cybernetics, vol. 50, no. 9, pp. 3866–3878, 2020.

[10] U. Maeder and M. Morari, “Offset-free reference tracking with modelpredictive control,” Automatica, vol. 46, no. 9, pp. 1469–1476, 2010.

[11] F. Borrelli, A. Bemporad, and M. Morari, Predictive control for linearand hybrid systems. Cambridge University Press, 2017.

[12] J. Nocedal and S. Wright, Numerical Optimization, 2nd ed. Springer-Verlag, USA, 2006.

[13] I. Safran and O. Shamir, “Depth-width tradeoffs in approximatingnatural functions with neural networks,” in International Conferenceon Machine Learning, 2017, pp. 2979–2987.

[14] E. Asadi, M. G. da Silva, C. H. Antunes, L. Dias, and L. Glicksman,“Multi-objective optimization for building retrofit: A model usinggenetic algorithm and artificial neural network and an application,”Energy and Buildings, vol. 81, pp. 444–456, 2014.

[15] X. Bao, Z. Sun, and N. Sharma, “A recurrent neural network basedmpc for a hybrid neuroprosthesis system,” in 2017 IEEE 56th AnnualConference on Decision and Control (CDC). IEEE, 2017, pp. 4715–4720.

[16] Q. Quanser, “2- dof helicopter user and control manual,” Markham,Ontario, 2006.

[17] “The Mathworks, Inc., Natick, Massachusetts,” Available athttp://www.mathworks.com, 2020.

[18] J. J. More, “The levenberg-marquardt algorithm: implementation andtheory,” in Numerical analysis. Springer, 1978, pp. 105–116.