
Received: 13 November 2018 | Revised: 16 April 2019 | Accepted: 22 May 2019

DOI: 10.1002/acs.3032

SPECIAL ISSUE ARTICLE

An adaptive learning and control architecture for mitigating sensor and actuator attacks in connected autonomous vehicle platoons

Xu Jin1, Wassim M. Haddad1, Zhong-Ping Jiang2, Aris Kanellopoulos1, Kyriakos G. Vamvoudakis1

1 School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, Georgia
2 Control and Networks Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, New York

Correspondence
Wassim M. Haddad, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0150. Email: [email protected]

Funding information
Air Force Office of Scientific Research, Grant/Award Number: FA9550-16-1-0100; National Science Foundation, Grant/Award Number: CPS 1851588 and ECCS-1501044; North Atlantic Treaty Organization, Grant/Award Number: SPS G5176; Office of Naval Research, Grant/Award Number: N00014-18-1-2160

    Summary

In this paper, we develop an adaptive control algorithm for addressing security for a class of networked vehicles that comprise a formation of n̂ human-driven vehicles sharing kinematic data and an autonomous vehicle in the aft of the vehicle formation receiving data from the preceding vehicles through wireless vehicle-to-vehicle communication devices. Specifically, we develop an adaptive controller for mitigating time-invariant state-dependent adversarial sensor and actuator attacks while guaranteeing uniform ultimate boundedness of the closed-loop networked system. Furthermore, an adaptive learning framework is presented for identifying the state space model parameters based on input-output data. This learning technique utilizes previously stored data as well as current data to identify the system parameters using a relaxed persistence of excitation condition. The effectiveness of the proposed approach is demonstrated by an illustrative numerical example involving a platoon of connected vehicles.

KEYWORDS

adaptive control, adaptive learning, connected vehicle formations, relaxed excitation conditions, sensor and actuator attacks, uniform boundedness

1 INTRODUCTION

The problem of control design for vehicle platoons has attracted considerable attention among researchers in the fields of control, optimization, and communication.1-5 Given the increasing traffic congestion and number of accidents worldwide, extensive research efforts have been devoted to increasing the adaptation, autonomy, connectivity, safety, and reliability of vehicular platoon control systems. Connected networks of vehicles often involve distributed decision-making for coordination involving information flow enabling enhanced operational effectiveness via cooperation. It is evident that as the technology and complexity of autonomous vehicles evolve, several grand research challenges need to be addressed. These include securing the autonomous vehicle from malicious cyberattacks that might increase engine revolutions per minute, disable a cylinder or even disengage the engine completely, activate airbags while driving to obscure vision, tamper with the braking system causing skidding or preventing the brakes from being engaged while driving, set the vehicle data display to an erroneous speed so that the driver is unaware that they are violating speed limits, or instigate a malfunction in the vehicle's positioning system.


The design and implementation of a secure control framework for connected autonomous transportation systems is a nontrivial task involving the consideration and operation of computing and communication components (see the work of Antsaklis6 and the references therein) interacting with the physical, cyber, and human-in-the-loop processes. Even though adaptive control can be used to address autonomous networked systems, the pervasive security and safety challenges underlying connected autonomous transportation systems place additional burdens on standard adaptive control methods. Specifically, although adaptive control and learning architectures have been used in numerous applications to achieve stability and improve system performance, their standard architectures are not designed to address adversarial actuator, sensor, and communication attacks.

In recent research, we have developed new adaptive control architectures that can foil malicious sensor and actuator attacks for linear systems.7-10 The proposed adaptive control frameworks provide an integrated alternative to traditional methods inspired by fault detection, isolation, and recovery.11-14 More specifically, the proposed architectures utilize adaptive control theory to effectively address malicious sensor and actuator attacks and enable adaptive autonomy. Unlike other approaches focusing on fault detection, isolation, and recovery,11-13 the proposed frameworks are not only computationally inexpensive but also do not require boundedness of all of the compromised closed-loop system signals. The framework in the works of Jin et al9,10 can account for sensor and actuator attacks that can corrupt all or part of the available sensor and actuator signals simultaneously, that is, we do not assume that only a subset of the sensor and actuator channels are corrupted at any given time as in the works of Fawzi et al15 and Weimer et al.16 Furthermore, we do not assume that the sensor and actuator attacks are constrained to a particular model as in the works of Schenato et al17 and Gupta et al,18 which does not necessarily capture a realistic behavior of an attacker. Finally, unlike the results in other works,19-21 which only consider steady-state operation models, the framework in the works of Jin et al9,10 can address transient performance as well as steady-state system stability and performance.

In this paper, we build on the adaptive control framework of Jin et al10 to develop an adaptive controller for a team of connected vehicles subject to time-invariant state-dependent sensor and actuator attacks. The proposed controller guarantees uniform ultimate boundedness of the closed-loop networked system. The adaptive controller is composed of two components, namely, a nominal controller and an additive corrective signal. It is assumed that the nominal controller has already been designed and implemented to achieve a desired closed-loop nominal performance. Using the nominal controller, an additive adaptive corrective signal is designed and added to the output of the nominal controller in order to suppress the effects of the sensor and actuator attacks. Thus, the proposed controller is modular in the sense that there is no need to redesign the nominal controller in the proposed framework; only the adaptive corrective signal is designed using the available information from the nominal controller and the system.

To account for variability in the system model parameters for different drivers, we additionally present an adaptive learning framework for identifying the state space model using input-output data. Specifically, a learning algorithm based on the concurrent learning and experience replay architectures presented in the works of Chowdhary and Johnson22 and Adam et al23 is employed to identify the model parameters under a relaxed excitation condition rather than the classical persistency of excitation condition.24 Finally, we note that a preliminary conference version of this paper appeared in the work of Jin et al.25 This paper considerably expands on the aforementioned work25 by providing detailed proofs of all the results along with presenting additional results on learning from adaptive control under excitation conditions.

The notation used in this paper is standard. Specifically, R denotes the set of real numbers, Rn denotes the set of n × 1 real column vectors, and Rn×m denotes the set of n × m real matrices. We write ||·|| for the Euclidean vector norm, ||·||1 for the absolute sum norm, AT for the transpose of the matrix A, and In or I for the n × n identity matrix. Furthermore, we write tr(·) for the trace operator, (·)−1 for the inverse operator, and $V'(x) \triangleq \partial V(x)/\partial x$ for the Fréchet derivative of V at x. Finally, we write $\lambda_{\min}(A)$ (respectively, $\lambda_{\max}(A)$) for the minimum (respectively, maximum) eigenvalue of the Hermitian matrix A and $\underline{x}$ (respectively, $\bar{x}$) for the lower bound (respectively, upper bound) of a bounded signal x, that is, for x(t) ∈ Rn, t ≥ 0, $\underline{x} \le \|x(t)\|$, t ≥ 0 (respectively, $\|x(t)\| \le \bar{x}$, t ≥ 0).

    2 VEHICLE PLATOON MODEL

FIGURE 1 Platoon formation of the n̂ + 1 vehicles

In this paper, we consider a platoon of n̂ + 1 automobile vehicles shown in Figure 1 traveling rectilinearly, where hi(t), t ≥ 0, denotes the bumper-to-bumper distance between vehicle i and its preceding vehicle i − 1, and vi(t), t ≥ 0, denotes the velocity of vehicle i. The n̂ forward vehicles, which only transmit position and velocity signals through vehicle-to-vehicle communication, are presumed to be driven by a human. The dynamics of the ith vehicle are given by

$$\dot h_i(t) = v_{i-1}(t) - v_i(t), \quad h_i(0) = h_{i0}, \quad t \ge 0, \qquad (1)$$

$$\dot v_i(t) = \alpha_i\big[f(h_i(t)) - v_i(t)\big] + \beta_i \dot h_i(t), \quad v_i(0) = v_{i0}, \qquad (2)$$

where i = 2, 3, … , n̂, 𝛼i and 𝛽i are human parameters with 𝛼i denoting a headway gain and 𝛽i denoting a relative velocity gain such that 𝛼i > 0 and 𝛼i + 𝛽i > 0, and

$$f(h_i(t)) = \begin{cases} 0, & \text{if } h_i(t) \le h_s, \\[4pt] \dfrac{v_m}{2}\left(1 - \cos\left(\pi\,\dfrac{h_i(t) - h_s}{h_g - h_s}\right)\right), & \text{if } h_s < h_i(t) < h_g, \\[4pt] v_m, & \text{if } h_g \le h_i(t). \end{cases} \qquad (3)$$

Here, f(·) denotes a range policy and implies that vehicle i remains stationary if hi ≤ hs, where hs is the stop headway distance. Moreover, vi(t), t ≥ 0, increases as hi(t), t ≥ 0, increases over the range [hs, hg], where hg is the headway distance for maximum velocity. Additionally, if hi ≥ hg, then vehicle i travels at the maximum velocity vm.
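To make the range policy concrete, the following Python sketch evaluates (3); it is illustrative only, with the stop headway, full-speed headway, and maximum velocity set to the values used later in Table 1.

```python
import numpy as np

def range_policy(h, h_s=5.0, h_g=35.0, v_m=30.0):
    """Range policy f(h) of Eq. (3): desired velocity as a function of headway h.
    h_s, h_g, v_m are the stop headway, full-speed headway, and maximum velocity."""
    if h <= h_s:
        return 0.0
    if h >= h_g:
        return v_m
    # smooth cosine ramp between h_s and h_g
    return 0.5 * v_m * (1.0 - np.cos(np.pi * (h - h_s) / (h_g - h_s)))

# Example: desired velocity at a 20 m headway
print(range_policy(20.0))  # 15.0 m/s for these parameter values
```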

The goal of each driver of the n̂ following vehicles is to actuate the vehicle to the desired velocity of the leading vehicle v1(t) ≡ v1, t ≥ 0, and to the desired headway h∗ = f −1(v1). Without loss of generality, we assume that 0 < v1 < vm. Note that the human parameters 𝛼i and 𝛽i can vary for different drivers.

Defining Δhi(t) ≜ hi(t) − h∗, i = 2, … , n̂ + 1, and Δvi(t) ≜ vi(t) − v1, i = 2, … , n̂ + 1, and linearizing the nonlinear model (1) and (2) about the equilibrium point (h∗, v1), we obtain

$$\Delta\dot h_i(t) = \Delta v_{i-1}(t) - \Delta v_i(t), \quad \Delta h_i(0) = \Delta h_{i0}, \quad t \ge 0, \qquad (4)$$

$$\Delta\dot v_i(t) = \frac{\alpha_i}{\tau_f}\Delta h_i(t) - (\alpha_i + \beta_i)\Delta v_i(t) + \beta_i\Delta v_{i-1}(t), \quad \Delta v_i(0) = \Delta v_{i0}, \qquad (5)$$

where i = 2, … , n̂ and 𝜏f = 1∕f′(h∗). Note that Δv1 = 0 by definition. The dynamics of the autonomous (n̂ + 1)th vehicle receiving kinematic data from the n̂ forward vehicles through vehicle-to-vehicle communication are given by

$$\Delta\dot h_{\hat n+1}(t) = \Delta v_{\hat n}(t) - \Delta v_{\hat n+1}(t), \quad \Delta h_{\hat n+1}(0) = \Delta h_{(\hat n+1)0}, \quad t \ge 0, \qquad (6)$$

$$\Delta\dot v_{\hat n+1}(t) = u(t), \quad \Delta v_{\hat n+1}(0) = \Delta v_{(\hat n+1)0}, \qquad (7)$$

where u(t), t ≥ 0, is the control input.

It is important to note here that our vehicle platoon model involves one autonomous vehicle, and hence, the control input u(t), t ≥ 0, is a scalar. Our framework can be extended to the case where the platoon is composed of several subplatoons for which a distributed control architecture would be required. This extension would involve mixed human-driven and autonomous vehicles with a more complex group-to-group/platoon-to-platoon communication architecture. The simplified platoon architecture involving one autonomous vehicle at the aft of several human operated vehicles is a standard model used in the transportation literature (see, for example, the work of Jin and Orosz26). As shown in the work of Jin and Orosz,26 even for this simplified platoon model, deriving an optimal solution that guarantees string stability and Lyapunov stability for the platoon is a nontrivial task.

    Using (1)-(7), it follows that the dynamics for the connected vehicles are given by

$$\dot x(t) = Ax(t) + Bu(t), \quad x(0) = x_0, \quad t \ge 0, \qquad (8)$$

where $x(t) = [\Delta h_2(t), \Delta v_2(t), \ldots, \Delta h_{\hat n+1}(t), \Delta v_{\hat n+1}(t)]^{\mathrm T}$ and $A \in \mathbb R^{2\hat n\times 2\hat n}$ and $B \in \mathbb R^{2\hat n}$ are given by

$$A = \begin{bmatrix} F_2 & 0 & \cdots & \cdots & 0 \\ G_3 & F_3 & 0 & & \vdots \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & G_{\hat n} & F_{\hat n} & 0 \\ 0 & \cdots & 0 & G_{\hat n+1} & F_{\hat n+1} \end{bmatrix}, \qquad B = [0 \ \cdots \ 0 \ 0 \ 1]^{\mathrm T},$$

and where, for $i = 2, \ldots, \hat n$ and $j = 3, \ldots, \hat n$,

$$F_i = \begin{bmatrix} 0 & -1 \\ \alpha_i/\tau_f & -\alpha_i - \beta_i \end{bmatrix}, \qquad G_j = \begin{bmatrix} 0 & 1 \\ 0 & \beta_j \end{bmatrix},$$

and

$$F_{\hat n+1} = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}, \qquad G_{\hat n+1} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

    The following lemma is needed for the main result of this paper.

    Lemma 1 (Gao et al27). If 𝛼i > 0 and 𝛼i + 𝛽 i > 0, i = 2, … , n̂, then the pair (A,B) is stabilizable.

Next, we assume that the networked vehicle system given by (8) is subject to sensor and actuator attacks so that the compromised system state is given by

x̃(t) = x(t) + 𝛿s(x(t)) (9)

and is available for feedback, where x̃(t) ∈ Rn, t ≥ 0, n = 2n̂, and 𝛿s ∶ Rn → Rn captures sensor attacks. Specifically, if 𝛿s(·) is nonzero, then the uncompromised state vector x(t), t ≥ 0, is corrupted by a faulty (or malicious) signal 𝛿s(·). Alternatively, if 𝛿s(x(t)) = 0, then x̃(t) = x(t), t ≥ 0, and the uncompromised state vector is available for feedback.

Here, we assume that the sensor attack in (9) is parameterized as 𝛿s(x) = qx, where q ∈ R is the sensor uncertainty, and hence, by (9), we obtain x̃ = (1 + q)x. Thus, we assume that q > −1 in order to construct a feasible corrective signal v(t), t ≥ 0, since q = −1 results in x̃(t) = 0, and hence, it is not possible to construct v(t), t ≥ 0, to asymptotically recover the nominal (ie, uncompromised) system performance.

    Furthermore, we assume that the control input is also compromised and is given by

    ũ(t) = u(t) + 𝛿a(x̃(t)), (10)

where ũ(t) ∈ R, t ≥ 0, denotes the compromised control command signal and 𝛿a ∶ Rn → R captures actuator attacks. In particular, if 𝛿a(·) is nonzero, then the uncompromised control signal u(t), t ≥ 0, is corrupted with a faulty (or malicious) signal 𝛿a(·). Alternatively, if 𝛿a(x(t)) = 0, then ũ(t) = u(t), t ≥ 0, and the control signal is uncompromised (see Figure 2).

FIGURE 2 Closed-loop dynamical system in the presence of sensor and actuator attacks

    Here, we assume that the actuator attack in (10) can be parameterized as

    𝛿a(x̃) = W T𝜑(x̃) + 𝜎(x̃), (11)

where W ∈ Rp is an unknown weighting vector, 𝜑(·) ∈ Rp is a known nonlinear function, and 𝜎(x̃) ∈ R is unknown and assumed to be bounded, that is, |𝜎(x̃)| ≤ $\bar\sigma$, x̃ ∈ Rn, and $\bar\sigma > 0$ is unknown. Note that assuming |𝜎(x̃)| ≤ $\bar\sigma$, x̃ ∈ Rn, is without loss of generality since a worst-case actuator attack will lead to actuator amplitude saturation in practice. Alternatively, we can assume that 𝜎(x̃), x̃ ∈ Rn, satisfies a Lipschitz condition. See Remark 4 for further details.

Remark 1. The sensor attack model can capture a multiplicative attack, wherein the attacker can corrupt the sensor measurements in a relative sense. For example, under this multiplicative attack model, a malicious attack on a vehicle speed sensor will display a fraction of the vehicle's speed, resulting in an unintentional increase in the vehicle's regulated velocity. Alternatively, the actuator attack model can also capture an additive state-dependent signal that accounts for a parameterization of the system attack modes as well as any residual signals.

Remark 2. In our formulation, we assume that vehicle communication is instantaneous; we do not consider any network communication attacks, link failures, or packet dropouts. These extensions, along with message transmission and processing delays, will be considered in future research.

    Note that the compromised controlled system is given by

$$\dot x(t) = Ax(t) + B\tilde u(t), \quad x(0) = x_0, \quad t \ge 0. \qquad (12)$$

    For 𝛿s(x(t)) ≠ 0, t ≥ 0, and 𝛿a(x(t)) ≠ 0, t ≥ 0, our objective is to design a feedback controller c of the form

    u(t) = Kx̃(t) + v(t), (13)

where K ∈ R1×n is a feedback gain stabilizing the uncompromised (ie, nominal) system and v(t) ∈ R, t ≥ 0, is a corrective signal that suppresses or counteracts the effect of the state-dependent sensor and actuator attacks 𝛿s(x(t)), t ≥ 0, and 𝛿a(x(t)), t ≥ 0, to approximately recover the nominal system performance achieved when the uncompromised state vector is available for feedback and the control signal is uncompromised.

Depending on the available resources of the attacker and the knowledge of the system, different malicious attacks can be injected into cyber-physical systems.28 These available resources are generally categorized as disclosure and disruption resources. Disclosure resources enable an attacker to gather sensitive information about the system, such as sensor measurements and control input commands, whereas disruption resources enable the attacker to affect and alter the system operation by violating data integrity. Such attacks can include eavesdropping, denial of service, and false data injection.

In our formulation, we are considering data deception (or false data injection) attacks, wherein an attacker modifies the data sent from the sensors to the controller as well as the signal sent from the controller to the actuators. In this type of attack, the attacker's goal is to prevent the sensors and actuators from receiving correct signals and control packets, thereby transmitting erroneous information and commands to the sensors and actuators. The sensor attack is modeled as a multiplicative state-dependent sensor attack, and the actuator attack is modeled as an additive state-dependent signal injected by an attacker at the output of the nominal controller. We emphasize that, in fault detection and accommodation control, the sensor and actuator faults have a time-varying (but not state-dependent) structure, and hence, fault tolerant control techniques are not usually applicable for dealing with state-dependent sensor and actuator attacks.

It is also important to note that the assumptions and approaches of the fault tolerant schemes considered in the literature (see, for example, other works13,29,30) are completely different from the proposed framework. Namely, the main advantages of the proposed approach are that the controller framework is easily implemented for a given closed-loop system without the need to redesign the nominal controller and does not require a separate attack detection module.

3 ADAPTIVE CONTROL FOR STATE-DEPENDENT SENSOR AND ACTUATOR ATTACKS

In this section, we design the corrective signal v(t), t ≥ 0, in (13) to achieve ultimate boundedness in the presence of state-dependent sensor and actuator attacks. First, note that since (A,B) is stabilizable, there exists a feedback gain matrix K ∈ R1×n such that Ar ≜ A + BK is Hurwitz. In this case, it follows from converse Lyapunov theory31 that for every positive definite matrix R ∈ Rn×n, there exists a unique positive-definite matrix P ∈ Rn×n satisfying

$$0 = A_r^{\mathrm T} P + P A_r + R. \qquad (14)$$
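Given a stabilizing gain K, the matrix P in (14) can be computed numerically. The sketch below is a minimal illustration using SciPy; the LQR-based construction of K is our own convenient choice of stabilizing gain, not the paper's design procedure.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def lyapunov_certificate(A, B, R=None):
    """Return a stabilizing K and the P solving 0 = A_r^T P + P A_r + R with A_r = A + B K."""
    n = A.shape[0]
    R = 3.0 * np.eye(n) if R is None else R
    # One convenient stabilizing gain: an LQR design (illustrative choice only).
    S = solve_continuous_are(A, B, np.eye(n), np.eye(B.shape[1]))
    K = -(B.T @ S)                          # u = K x, so A_r = A + B K is Hurwitz
    A_r = A + B @ K
    # solve_continuous_lyapunov solves a X + X a^T = q; take a = A_r^T and q = -R for (14)
    P = solve_continuous_lyapunov(A_r.T, -R)
    return K, P
```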

To achieve ultimate boundedness in the face of state-dependent sensor and actuator attacks, we use the corrective signal given by

$$v(t) = -\hat\mu(t) K \tilde x(t) - \hat W^{\mathrm T}(t)\varphi(\tilde x(t)) - \hat\sigma(t)\,\mathrm{sgn}(B^{\mathrm T} P \tilde x(t)), \qquad (15)$$

where, for $\alpha \in \mathbb R$, $\mathrm{sgn}(\alpha) \triangleq \alpha/|\alpha|$, $\alpha \ne 0$, and $\mathrm{sgn}(0) \triangleq 0$, with update laws

$$\dot{\hat\mu}(t) = \gamma\,\tilde x^{\mathrm T}(t) P B K \tilde x(t) - \xi_1\hat\mu(t), \quad \hat\mu(0) = \hat\mu_0, \quad t \ge 0, \qquad (16)$$

$$\dot{\hat W}(t) = \eta\,\varphi(\tilde x(t))\,\tilde x^{\mathrm T}(t) P B - \xi_2\hat W(t), \quad \hat W(0) = \hat W_0, \qquad (17)$$

$$\dot{\hat\sigma}(t) = \nu\,|\tilde x^{\mathrm T}(t) P B| - \xi_3\hat\sigma(t), \quad \hat\sigma(0) = \hat\sigma_0, \qquad (18)$$

where $\hat\mu(t) \in \mathbb R$, t ≥ 0, is the estimate of $\mu \triangleq q(1 + q)^{-1} \in \mathbb R$ that depends on the sensor uncertainty q, $\hat W(t) \in \mathbb R^p$, t ≥ 0, is the estimate of the parametric uncertainty W, $\hat\sigma(t) \in \mathbb R$, t ≥ 0, is the estimate of the unknown bound $\bar\sigma$, and $\gamma \in \mathbb R$, $\eta \in \mathbb R$, $\nu \in \mathbb R$, $\xi_1 \in \mathbb R$, $\xi_2 \in \mathbb R$, and $\xi_3 \in \mathbb R$ are positive design gains.
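The corrective signal (15) and the update laws (16)-(18) can be stepped forward in time as in the following sketch; the explicit Euler discretization, the function name adaptive_step, and the data-structure choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adaptive_step(x_tilde, K, P, B, phi, est, gains, dt=1e-3):
    """One Euler step of the corrective signal (15) and the update laws (16)-(18).
    est: dict with estimates mu_hat (scalar), W_hat (p,), sigma_hat (scalar).
    gains: dict with gamma, eta, nu, xi1, xi2, xi3 (positive design gains)."""
    s = float(B.T @ P @ x_tilde)            # scalar B^T P x_tilde (= x_tilde^T P B, P symmetric)
    phi_val = phi(x_tilde)                  # known regressor phi(x_tilde), shape (p,)
    v = (-est["mu_hat"] * float(K @ x_tilde)
         - float(est["W_hat"] @ phi_val)
         - est["sigma_hat"] * np.sign(s))   # corrective signal (15)
    # update laws (16)-(18), each with a leakage term -xi_k * estimate
    est["mu_hat"]    += dt * (gains["gamma"] * s * float(K @ x_tilde) - gains["xi1"] * est["mu_hat"])
    est["W_hat"]     += dt * (gains["eta"] * phi_val * s - gains["xi2"] * est["W_hat"])
    est["sigma_hat"] += dt * (gains["nu"] * abs(s) - gains["xi3"] * est["sigma_hat"])
    return v, est
```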

    Next, using (10) and (11), (12) can be equivalently written as

$$\dot x(t) = Ax(t) + B\big[u(t) + W^{\mathrm T}\varphi(\tilde x(t)) + \sigma(\tilde x(t))\big], \quad x(0) = x_0, \quad t \ge 0. \qquad (19)$$

Now, define $\tilde\mu(t) \triangleq \mu - \hat\mu(t)$, t ≥ 0, $\tilde W(t) \triangleq W - \hat W(t)$, t ≥ 0, $\tilde\sigma(t) \triangleq \bar\sigma - \hat\sigma(t)$, t ≥ 0, and $\lambda \triangleq (1 + q)^{-1}$. Since q > −1, note that $\mu$ and $\lambda$ are well defined and $\lambda > 0$. Next, using $qx = \mu\tilde x$, (9), (13), and (15), it follows from (19) that

$$\begin{aligned}
\dot x(t) &= A x(t) + B K\tilde x(t) + B v(t) + B\big[W^{\mathrm T}\varphi(\tilde x(t)) + \sigma(\tilde x(t))\big] \\
&= A x(t) + B K x(t) + B K q x(t) + B v(t) + B\big[W^{\mathrm T}\varphi(\tilde x(t)) + \sigma(\tilde x(t))\big] \\
&= A_r x(t) + B K\mu\tilde x(t) - \hat\mu(t) B K\tilde x(t) - B\hat W^{\mathrm T}(t)\varphi(\tilde x(t)) - B\hat\sigma(t)\,\mathrm{sgn}(B^{\mathrm T} P\tilde x(t)) + B\big[W^{\mathrm T}\varphi(\tilde x(t)) + \sigma(\tilde x(t))\big] \\
&= A_r x(t) + \tilde\mu(t) B K\tilde x(t) + B\tilde W^{\mathrm T}(t)\varphi(\tilde x(t)) + B\big(\sigma(\tilde x(t)) - \hat\sigma(t)\,\mathrm{sgn}(B^{\mathrm T} P\tilde x(t))\big), \quad x(0) = x_0, \quad t \ge 0. \qquad (20)
\end{aligned}$$

Theorem 1. Consider the dynamical system given by (12) with sensor and actuator attacks given by (9) and (10), respectively. Then, with the controller given by (13), the corrective signal v(t), t ≥ 0, given by (15), and the adaptive laws given by (16)-(18), the closed-loop system given by (16)-(18) and (20) satisfies

$$\limsup_{t\to\infty}\|x(t)\|^2 \le \frac{c_0}{c_1\lambda_{\min}(P)}, \qquad (21)$$

where $c_0 \triangleq \gamma^{-1}\xi_1\lambda\mu^2 + \eta^{-1}\xi_2\lambda W^{\mathrm T}W + \nu^{-1}\xi_3\lambda\bar\sigma^2$ and $c_1 \triangleq \min\{\lambda_{\min}(R)/\lambda_{\max}(P), \xi_1, \xi_2, \xi_3\}$. In addition, the adaptive estimates $\hat\mu(t)$, t ≥ 0, $\hat W(t)$, t ≥ 0, and $\hat\sigma(t)$, t ≥ 0, are uniformly ultimately bounded.


    Proof. To show ultimate boundedness of the closed-loop system, consider the Lyapunov-like function given by

$$V(x,\tilde\mu,\tilde W,\tilde\sigma) = x^{\mathrm T} P x + \frac{\lambda}{\gamma}\tilde\mu^2 + \frac{\lambda}{\eta}\tilde W^{\mathrm T}\tilde W + \frac{\lambda}{\nu}\tilde\sigma^2, \qquad (22)$$

    where P satisfies (14). Now, the derivative of (22) along the closed-loop system trajectories of (16)-(18) is given by

$$\begin{aligned}
\dot V(x,\tilde\mu,\tilde W,\tilde\sigma) ={}& x^{\mathrm T} P A_r x + x^{\mathrm T} A_r^{\mathrm T} P x + 2\tilde\mu x^{\mathrm T} P B K\tilde x + 2 x^{\mathrm T} P B\tilde W^{\mathrm T}\varphi(\tilde x) \\
&+ 2 x^{\mathrm T} P B\big(\sigma(\tilde x) - \hat\sigma\,\mathrm{sgn}(B^{\mathrm T} P\tilde x)\big) + \frac{2\lambda}{\gamma}\tilde\mu\big[-\gamma\tilde x^{\mathrm T} P B K\tilde x + \xi_1\hat\mu\big] \\
&+ \frac{2\lambda}{\eta}\tilde W^{\mathrm T}\big(-\eta\varphi(\tilde x)\tilde x^{\mathrm T} P B + \xi_2\hat W\big) + \frac{2\lambda}{\nu}\tilde\sigma\big[-\nu|\tilde x^{\mathrm T} P B| + \xi_3\hat\sigma\big], \\
&\quad (x,\tilde\mu,\tilde W,\tilde\sigma) \in \mathbb R^n \times \mathbb R \times \mathbb R^p \times \mathbb R. \qquad (23)
\end{aligned}$$

    Noting that 2xTPBW̃ T𝜑(x̃) = 2W̃ T𝜑(x̃)xTPB, it follows from (23) that

$$\begin{aligned}
\dot V(x,\tilde\mu,\tilde W,\tilde\sigma) ={}& x^{\mathrm T}\big[A_r^{\mathrm T} P + P A_r\big]x + 2\tilde\mu x^{\mathrm T} P B K\tilde x + \frac{2\lambda}{\gamma}\tilde\mu\big[-\gamma\tilde x^{\mathrm T} P B K\tilde x + \xi_1\hat\mu\big] + 2\tilde W^{\mathrm T}\varphi(\tilde x)x^{\mathrm T} P B \\
&+ \frac{2\lambda}{\eta}\tilde W^{\mathrm T}\big(-\eta\varphi(\tilde x)\tilde x^{\mathrm T} P B + \xi_2\hat W\big) + 2 x^{\mathrm T} P B\big(\sigma(\tilde x) - \hat\sigma\,\mathrm{sgn}(B^{\mathrm T} P\tilde x)\big) + \frac{2\lambda}{\nu}\tilde\sigma\big[-\nu|\tilde x^{\mathrm T} P B| + \xi_3\hat\sigma\big], \\
&\quad (x,\tilde\mu,\tilde W,\tilde\sigma) \in \mathbb R^n \times \mathbb R \times \mathbb R^p \times \mathbb R. \qquad (24)
\end{aligned}$$

Next, using $\tilde x^{\mathrm T} P B\,\sigma(\tilde x) \le |\tilde x^{\mathrm T} P B|\,\bar\sigma$ and $\tilde x^{\mathrm T} P B\,\mathrm{sgn}(B^{\mathrm T} P\tilde x) = |\tilde x^{\mathrm T} P B|$, it follows that

$$2 x^{\mathrm T} P B\big[\sigma(\tilde x) - \hat\sigma\,\mathrm{sgn}(B^{\mathrm T} P\tilde x)\big] \le 2\lambda|\tilde x^{\mathrm T} P B|\bar\sigma - 2\lambda|\tilde x^{\mathrm T} P B|\hat\sigma = 2\lambda|\tilde x^{\mathrm T} P B|\tilde\sigma, \quad x \in \mathbb R^n. \qquad (25)$$

    Now, using (14), it follows from (24) that

$$\begin{aligned}
\dot V(x,\tilde\mu,\tilde W,\tilde\sigma) \le{}& -x^{\mathrm T} R x + 2\tilde\mu\lambda\tilde x^{\mathrm T} P B K\tilde x + \frac{2\lambda}{\gamma}\tilde\mu\big[-\gamma\tilde x^{\mathrm T} P B K\tilde x + \xi_1\hat\mu\big] \\
&+ 2\tilde W^{\mathrm T}\lambda\varphi(\tilde x)\tilde x^{\mathrm T} P B + \frac{2\lambda}{\eta}\tilde W^{\mathrm T}\big(-\eta\varphi(\tilde x)\tilde x^{\mathrm T} P B + \xi_2\hat W\big) + 2|\tilde x^{\mathrm T} P B|\tilde\sigma\lambda + \frac{2\lambda}{\nu}\tilde\sigma\big[-\nu|\tilde x^{\mathrm T} P B| + \xi_3\hat\sigma\big] \\
\le{}& -x^{\mathrm T} R x + \frac{2\lambda}{\gamma}\xi_1\tilde\mu\hat\mu + \frac{2\lambda}{\eta}\xi_2\tilde W^{\mathrm T}\hat W + \frac{2\lambda}{\nu}\xi_3\tilde\sigma\hat\sigma, \quad (x,\tilde\mu,\tilde W,\tilde\sigma) \in \mathbb R^n \times \mathbb R \times \mathbb R^p \times \mathbb R. \qquad (26)
\end{aligned}$$

Next, writing $\frac{2\lambda}{\gamma}\xi_1\tilde\mu\hat\mu$ and $\frac{2\lambda}{\eta}\xi_2\tilde W^{\mathrm T}\hat W$ in (26), respectively, as

$$\frac{2\lambda}{\gamma}\xi_1\tilde\mu\hat\mu = \frac{2\lambda}{\gamma}\xi_1\tilde\mu(\mu - \tilde\mu) = \frac{2\lambda}{\gamma}\xi_1\tilde\mu\mu - \frac{2\lambda}{\gamma}\xi_1\tilde\mu^2 \le \frac{\lambda}{\gamma}\xi_1\tilde\mu^2 + \frac{\lambda}{\gamma}\xi_1\mu^2 - \frac{2\lambda}{\gamma}\xi_1\tilde\mu^2 = -\frac{\lambda}{\gamma}\xi_1\tilde\mu^2 + \frac{\lambda}{\gamma}\xi_1\mu^2, \qquad (27)$$


where $\frac{\lambda}{\gamma}\xi_1\mu^2$ is a constant, and

$$\frac{2\lambda}{\eta}\xi_2\tilde W^{\mathrm T}\hat W \le -\frac{\lambda}{\eta}\xi_2\tilde W^{\mathrm T}\tilde W + \frac{\lambda}{\eta}\xi_2 W^{\mathrm T} W, \qquad (28)$$

$$\frac{2\lambda}{\nu}\xi_3\tilde\sigma\hat\sigma \le -\frac{\lambda}{\nu}\xi_3\tilde\sigma^2 + \frac{\lambda}{\nu}\xi_3\bar\sigma^2, \qquad (29)$$

    it follows from (26)-(29) that

$$\begin{aligned}
\dot V(x,\tilde\mu,\tilde W,\tilde\sigma) \le{}& -x^{\mathrm T} R x - \gamma^{-1}\lambda\xi_1\tilde\mu^2 - \eta^{-1}\lambda\xi_2\tilde W^{\mathrm T}\tilde W - \nu^{-1}\lambda\xi_3\tilde\sigma^2 + \gamma^{-1}\xi_1\lambda\mu^2 + \eta^{-1}\xi_2\lambda W^{\mathrm T} W + \nu^{-1}\xi_3\lambda\bar\sigma^2 \\
\le{}& -\frac{\lambda_{\min}(R)}{\lambda_{\max}(P)}\,x^{\mathrm T} P x - \gamma^{-1}\lambda\xi_1\tilde\mu^2 - \eta^{-1}\lambda\xi_2\tilde W^{\mathrm T}\tilde W - \nu^{-1}\lambda\xi_3\tilde\sigma^2 + c_0 \\
\le{}& -c_1 V(x,\tilde\mu,\tilde W,\tilde\sigma) + c_0, \quad (x,\tilde\mu,\tilde W,\tilde\sigma) \in \mathbb R^n \times \mathbb R \times \mathbb R^p \times \mathbb R. \qquad (30)
\end{aligned}$$

    Thus, it follows from (30) that

$$0 \le V(x(t),\tilde\mu(t),\tilde W(t),\tilde\sigma(t)) \le V(x(0),\tilde\mu(0),\tilde W(0),\tilde\sigma(0))\,e^{-c_1 t} + \frac{c_0}{c_1}, \quad t \ge 0, \qquad (31)$$

and hence, all the signals of the closed-loop system are uniformly ultimately bounded. Furthermore, noting that

$$\limsup_{t\to\infty}\, x^{\mathrm T}(t) P x(t) \le \limsup_{t\to\infty}\, V(x(t),\tilde\mu(t),\tilde W(t),\tilde\sigma(t)) \le \frac{c_0}{c_1}, \qquad (32)$$

it follows that (21) holds, which implies that the trajectory of the closed-loop system associated with the plant dynamics is uniformly ultimately bounded.

Remark 3. Theorem 1 assumes that q > −1. This assumption implies that $\lambda > 0$. As long as the sign of $\lambda$ is known, Theorem 1 can be used to address the case where $\lambda < 0$. The assumption q > −1 can be relaxed by utilizing tools from the work of Lavretsky and Wise,32 which can allow $\lambda$ to have any sign as long as q ≠ −1 under the assumption that its sign is a priori known.

Remark 4. In arriving at Theorem 1, we assumed that $|\sigma(\tilde x)| \le \bar\sigma$, $\tilde x \in \mathbb R^n$, where $\bar\sigma > 0$ is unknown. Alternatively, we can assume that $\sigma(\tilde x)$ satisfies the Lipschitz condition $|\sigma(\tilde x)| \le \bar\sigma\|\tilde x\|$, $\tilde x \in \mathbb R^n$, where $\bar\sigma > 0$ is an unknown Lipschitz constant. In this case, it can be shown that Theorem 1 holds with (15) replaced by

$$v(t) = -\hat\mu(t) K\tilde x(t) - \hat W^{\mathrm T}(t)\varphi(\tilde x(t)) - \hat\sigma(t)\|\tilde x(t)\|\,\mathrm{sgn}(B^{\mathrm T} P\tilde x(t)) \qquad (33)$$

and with (18) replaced by

$$\dot{\hat\sigma}(t) = \nu\,\|\tilde x(t)\|\,|\tilde x^{\mathrm T}(t) P B| - \xi_3\hat\sigma(t), \quad \hat\sigma(0) = \hat\sigma_0, \quad t \ge 0. \qquad (34)$$

Note that the controller u(t), t ≥ 0, given by (13) is discontinuous because of the presence of the signum function sgn(·) in the controller architecture. This discontinuity can lead to a chattering phenomenon, which is undesirable in practice. In order to reduce or eliminate the chattering effect, a smooth function can be implemented instead of the signum function,33 that is, we replace sgn(·) by tanh(·). Note that33

$$0 \le |\alpha| - \alpha\tanh\!\left(\frac{\alpha}{\varepsilon}\right) \le k_0\varepsilon, \quad \alpha \in \mathbb R, \qquad (35)$$

where $\varepsilon > 0$ is a design constant and $k_0$ satisfies $k_0 = e^{-(k_0+1)}$ with $k_0 = 0.2785$. Thus, we modify (15) as

$$v(t) = -\hat\mu(t) K\tilde x(t) - \hat W^{\mathrm T}(t)\varphi(\tilde x(t)) - \hat\sigma(t)\tanh\!\left(\frac{B^{\mathrm T} P\tilde x(t)\,\hat\sigma(t)}{\varepsilon}\right). \qquad (36)$$


    In this case, (25) becomes

$$\begin{aligned}
2x^{\mathrm T} P B\Big[\sigma(\tilde x) - \hat\sigma\tanh\!\Big(\frac{B^{\mathrm T} P\tilde x\,\hat\sigma}{\varepsilon}\Big)\Big]
&\le 2|\tilde x^{\mathrm T} P B|\bar\sigma\lambda - 2\tilde x^{\mathrm T} P B\,\hat\sigma\lambda\tanh\!\Big(\frac{B^{\mathrm T} P\tilde x\,\hat\sigma}{\varepsilon}\Big) \\
&= 2|\tilde x^{\mathrm T} P B|(\hat\sigma + \tilde\sigma)\lambda - 2\tilde x^{\mathrm T} P B\,\hat\sigma\lambda\tanh\!\Big(\frac{B^{\mathrm T} P\tilde x\,\hat\sigma}{\varepsilon}\Big) \\
&\le 2|\tilde x^{\mathrm T} P B|(|\hat\sigma| + \tilde\sigma)\lambda - 2\tilde x^{\mathrm T} P B\,\hat\sigma\lambda\tanh\!\Big(\frac{B^{\mathrm T} P\tilde x\,\hat\sigma}{\varepsilon}\Big) \\
&= 2|\tilde x^{\mathrm T} P B|\tilde\sigma\lambda + 2\lambda\Big[|\tilde x^{\mathrm T} P B\,\hat\sigma| - \tilde x^{\mathrm T} P B\,\hat\sigma\tanh\!\Big(\frac{\tilde x^{\mathrm T} P B\,\hat\sigma}{\varepsilon}\Big)\Big] \\
&\le 2|\tilde x^{\mathrm T} P B|\tilde\sigma\lambda + 2\lambda k_0\varepsilon. \qquad (37)
\end{aligned}$$

    Now, it follows from (30) that

$$\begin{aligned}
\dot V(x,\tilde\mu,\tilde W,\tilde\sigma) \le{}& -x^{\mathrm T} R x - \gamma^{-1}\lambda\xi_1\tilde\mu^2 - \eta^{-1}\lambda\xi_2\tilde W^{\mathrm T}\tilde W - \nu^{-1}\lambda\xi_3\tilde\sigma^2 + \gamma^{-1}\xi_1\lambda\mu^2 + \eta^{-1}\xi_2\lambda W^{\mathrm T} W + \nu^{-1}\xi_3\lambda\bar\sigma^2 + 2\lambda k_0\varepsilon \\
\le{}& -\frac{\lambda_{\min}(R)}{\lambda_{\max}(P)}\,x^{\mathrm T} P x - \gamma^{-1}\lambda\xi_1\tilde\mu^2 - \eta^{-1}\lambda\xi_2\tilde W^{\mathrm T}\tilde W - \nu^{-1}\lambda\xi_3\tilde\sigma^2 + c_2 \\
\le{}& -c_1 V(x,\tilde\mu,\tilde W,\tilde\sigma) + c_2, \quad (x,\tilde\mu,\tilde W,\tilde\sigma) \in \mathbb R^n \times \mathbb R \times \mathbb R^p \times \mathbb R, \qquad (38)
\end{aligned}$$

where $c_2 \triangleq \gamma^{-1}\xi_1\lambda\mu^2 + \eta^{-1}\xi_2\lambda W^{\mathrm T} W + \nu^{-1}\xi_3\lambda\bar\sigma^2 + 2\lambda k_0\varepsilon = c_0 + 2\lambda k_0\varepsilon$ and $c_1 = \min\{\lambda_{\min}(R)/\lambda_{\max}(P), \xi_1, \xi_2, \xi_3\}$. Using similar arguments as in the proof of Theorem 1, it can be shown that

$$\limsup_{t\to\infty}\|x(t)\|^2 \le \frac{c_2}{c_1\lambda_{\min}(P)}, \qquad (39)$$

which is identical to the result of Theorem 1 with the only difference being that c0 is replaced by c2.

In practice, the constant $c_2/(c_1\lambda_{\min}(P))$ in (39) should be made small so that the trajectory of the system can be regulated as close to the equilibrium as possible. This can be achieved by taking a large value for c1 and a small value for c2. Note that since $c_2 = \gamma^{-1}\xi_1\lambda\mu^2 + \eta^{-1}\xi_2\lambda W^{\mathrm T} W + \nu^{-1}\xi_3\lambda\bar\sigma^2 + 2\lambda k_0\varepsilon$ and $c_1 = \min\{\lambda_{\min}(R)/\lambda_{\max}(P), \xi_1, \xi_2, \xi_3\}$, the value of c1 can be made large by choosing large values for $\lambda_{\min}(R)/\lambda_{\max}(P)$, $\xi_1$, $\xi_2$, and $\xi_3$. Alternatively, the value of c2 can be made small by choosing small values for $\gamma^{-1}\xi_1$, $\eta^{-1}\xi_2$, $\nu^{-1}\xi_3$, and $2\lambda k_0\varepsilon$, which can be achieved by choosing large values for $\gamma$, $\eta$, and $\nu$, and a small value for $\varepsilon$. However, since $\gamma$, $\eta$, and $\nu$ are design gain parameters used in the adaptive laws (16)-(18), selecting large values for these parameters can introduce transient oscillations in the update law estimates of $\hat\mu$, $\hat W$, and $\hat\sigma$ and, hence, in the control signal u(t), t ≥ 0. This can be remedied by adding a modification term in the update laws to filter out the high frequency content in the control signal while preserving uniform ultimate boundedness. For details of a similar approach, see the work of Yucelen and Haddad.34

    4 ADAPTIVE LEARNING UNDER RELAXED EXCITATION CONDITIONS

As noted in Section 2, the headway and relative velocity gain parameters 𝛼i and 𝛽i, i = 2, … , n̂ + 1, can vary for different drivers and, hence, are uncertain. The problem of determining a model for a given dynamical system with unknown parameters has been studied in the literature (see the work of Lewis et al35 and the references therein). Specifically, different approaches have been proposed for model identification wherein the identification procedure is performed online or offline. While least squares algorithms have been investigated extensively for offline identification,36 research on online algorithms has predominately focused on adaptive learning methods. In the work of Modares et al,37 a filtered regressor is developed that leverages concurrent learning22 and experience replay techniques23 to relax the persistence of excitation condition that needs to be satisfied during classical learning methods.

In this section, we present an algorithm that allows online identification of the system parameters for our model under the adaptive control signal (13). Specifically, (8) is rewritten in a regressor form and a filter is characterized that compares the current estimate of the system with the actual behavior of the system. Here, we use a gradient descent method to learn the identifier weights while concurrently using past and present data during learning. More specifically, we collect system trajectory data and add them to a history stack while including them in the learning process alongside the current data.

To elucidate our adaptive learning framework from the adaptive control signal (13) under a relaxed excitation condition, note that (8) can be rewritten as

$$\dot x(t) = \Theta^\star\psi(x(t), u(t)), \quad x(0) = x_0, \quad t \ge 0, \qquad (40)$$

where $\Theta^\star = [A\ \ B] \in \mathbb R^{n\times(n+1)}$ is an ideal weighting matrix identifying the unknown plant matrix and input vector parameters, and $\psi(x(t),u(t)) = [x^{\mathrm T}(t)\ \ u^{\mathrm T}(t)]^{\mathrm T} \in \mathbb R^{n+1}$ is a regressor vector. Our goal is to develop an algorithm that will allow us to identify the plant matrix A and input vector B parameters based on the control input u(t), t ≥ 0, and the observed trajectory x(t), t ≥ 0.

Remark 5. Note that since our model is linear, the system representation (40) has no approximation errors for the given regressor vector.

Using a similar framework as in the work of Modares et al,37 we develop a filtered version of the regressor dynamics. Specifically, adding and subtracting $A_f x(t)$, t ≥ 0, where $A_f = \alpha_f I_n$ and $\alpha_f > 0$, to and from (40) yields

$$\dot x(t) = -A_f x(t) + \Theta^\star\psi(x(t),u(t)) + A_f x(t), \quad x(0) = x_0, \quad t \ge 0.$$

    Now, (40) can be rewritten as

$$x(t) = \Theta^\star h(x(t)) + \alpha_f\, l(x(t)) + \varepsilon_f(t), \qquad \dot h(x(t)) = -\alpha_f h(x(t)) + \psi(x(t),u(t)), \quad h(0) = 0, \quad t \ge 0, \qquad (41)$$

$$\dot l(x(t)) = -A_f\, l(x(t)) + x(t), \quad l(0) = 0, \qquad (42)$$

where $h(x(t)) \in \mathbb R^{n+1}$ is the filtered regressor version of $\psi(x(t),u(t))$ for every t ≥ 0, $\varepsilon_f(t) = e^{-A_f t}x_0$, and $l(x(t)) \in \mathbb R^n$, t ≥ 0. To implement a learning technique for adaptation of the regressor weights, define the current approximation of the state by

$$\hat x(t) = \hat\Theta(t) h(x(t)) + \alpha_f\, l(x(t)),$$

where $\hat\Theta(t) \in \mathbb R^{n\times(n+1)}$, t ≥ 0, is the current estimate of the regressor weights. In this case, the state estimation error is given by

$$e(t) = \hat x(t) - x(t).$$

Thus, the learning algorithm identifying the system dynamics is given by

$$\dot{\hat\Theta}(t) = -\Gamma e(t) h^{\mathrm T}(x(t)), \quad \hat\Theta(0) = \hat\Theta_0, \quad t \ge 0,$$

where $\Gamma$ is a positive-definite gain matrix that determines the speed of convergence to the ideal weights and $\hat\Theta_0$ is the initial estimate of the identification weights.
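The filters (41) and (42) and the gradient update for Θ̂ can be implemented as in the following sketch; the Euler discretization and the function signature are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def identifier_step(x, u, h, l, Theta_hat, alpha_f, Gamma, dt=1e-3):
    """One Euler step of the filters (41)-(42) and the gradient update for Theta_hat.
    x: state (n,); u: scalar input; h: filtered regressor (n+1,); l: filtered state (n,);
    Theta_hat: current estimate (n, n+1); Gamma: positive-definite gain (n, n)."""
    psi = np.append(x, u)                           # regressor psi(x, u) = [x^T u]^T
    h = h + dt * (-alpha_f * h + psi)               # filter (41)
    l = l + dt * (-alpha_f * l + x)                 # filter (42) with A_f = alpha_f * I
    x_hat = Theta_hat @ h + alpha_f * l             # current approximation of the state
    e = x_hat - x                                   # state estimation error
    Theta_hat = Theta_hat - dt * Gamma @ np.outer(e, h)   # gradient descent on the weights
    return h, l, Theta_hat, e
```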

    The following definition is needed for the main result of this section.

Definition 1 (Ioannou and Sun24). A vector signal p(t), t ≥ 0, is persistently excited if there exist positive constants $\gamma_1$, $\gamma_2$, and a finite time T > 0 such that $\gamma_1 I \le \int_t^{t+T} p(\tau)p^{\mathrm T}(\tau)\,\mathrm d\tau \le \gamma_2 I$.

Recall that in order for the regressor weights $\hat\Theta(t)$, t ≥ 0, to converge to the ideal weights $\Theta^\star$, h(x(t)) must satisfy a persistency of excitation condition for every t ≥ 0.24 It is a common practice to achieve this by adding a probing noise signal during learning. To relax this excitation condition, we use the framework proposed in the works of Chowdhary and Johnson22 and Adam et al23 and employed in the work of Vamvoudakis et al38 to concurrently collect past and current data to avoid the need for probing and improve convergence of the identification weights. Specifically, we use the update law for the regressor weights $\hat\Theta(t)$, t ≥ 0, given by

$$\dot{\hat\Theta}(t) = -\Gamma e(t) h^{\mathrm T}(x(t)) - \Gamma\sum_{i=1}^{n_p} e(t_i) h^{\mathrm T}(x(t_i)), \quad \hat\Theta(0) = \hat\Theta_0, \quad t \ge 0, \qquad (43)$$

where $e(t_i) = \hat x(t_i) - x(t_i)$ is obtained from the trajectory data collected at times $t_1 < t_2 < \cdots < t_{n_p}$, where $n_p$ is the number of previous data points saved and utilized during learning.
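A sketch of the concurrent-learning law (43) is given next: past pairs (e(t_i), h(x(t_i))) are kept in a history stack and their summed contribution is added to the instantaneous gradient term. The storage rule shown (keep collecting until the stack spans R^(n+1)) is one simple way to target the rank condition of Theorem 2 and is our own illustrative choice.

```python
import numpy as np

class HistoryStack:
    """History stack of filtered regressors and estimation errors for Eq. (43)."""
    def __init__(self):
        self.H, self.E = [], []        # stored h(x(t_i)) and e(t_i)

    def maybe_store(self, h, e, dim):
        # keep collecting until the stored regressors span R^{n+1} (relaxed excitation)
        if len(self.H) < dim or np.linalg.matrix_rank(np.column_stack(self.H)) < dim:
            self.H.append(h.copy())
            self.E.append(e.copy())

    def correction(self):
        # sum_i e(t_i) h^T(x(t_i)) over the stored data
        return sum(np.outer(e, h) for e, h in zip(self.E, self.H)) if self.H else 0.0

def concurrent_update(Theta_hat, e, h, stack, Gamma, dt=1e-3):
    """One Euler step of the concurrent-learning law (43)."""
    return Theta_hat - dt * Gamma @ (np.outer(e, h) + stack.correction())
```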

    The following theorem presents sufficient conditions for convergence of the proposed learning algorithm.

Theorem 2. Consider the dynamical system (8) and its regressor representation (40), and assume a weight tuning law given by (43) with a filtered regressor given by (41) and (42). If there exists $n_p$ such that the matrix of collected past data $\Lambda = [h(x(t_1))\ h(x(t_2))\ \ldots\ h(x(t_{n_p}))]$ contains n + 1 linearly independent vectors, then the persistency of excitation condition is satisfied and $\hat\Theta(t) \to \Theta^\star$ as $t \to \infty$.

Proof. The proof is a direct consequence of Theorem 1 in the work of Modares et al.37

Standard adaptive identification algorithms39 do not include the term $\sum_{i=1}^{n_p} e(t_i)h^{\mathrm T}(x(t_i))$ in the update law (43), and the persistency of excitation condition must be satisfied with knowledge of only the current value of h(x(t)). Moreover, by Definition 1, this property must hold for t ≥ 0. This is equivalent to requiring that the matrix $M(t) = \int_t^{t+T} h(x(\tau))h^{\mathrm T}(x(\tau))\,\mathrm d\tau$ is positive definite for t ≥ 0. This is also equivalent to requiring that the signal h(x(t)), t ≥ 0, contains at least n + 1 spectral lines. Since this is difficult to verify in practice during online learning, a probing noise signal is typically added throughout the whole process, which can degrade system performance.

Alternatively, utilizing the presented learning framework, we can mitigate system performance degradation by collecting data at various time instants $t_i$ and including them in the history stack $\Lambda$ while adding a probing noise signal to the system and updating the weight values as given by (43). It follows from Theorem 2 that the history stack $\Lambda$ attains full rank at time $t = t_{n_p}$, and thus, the matrix $M(t_{n_p})$ will be positive definite. Thus, for $t \ge t_{n_p}$, the persistency of excitation condition is guaranteed, and hence, both the probing noise signal and the collection of data can be discontinued as learning continues. Although there is no specific rule on the choice of time instants $t_i$ at which data are collected, issues of memory, computational complexity, and bandwidth must be accounted for by the designer.

    5 APPLICATION TO CONNECTED VEHICLE PLATOONING

To illustrate the key ideas presented in this paper, we simulate a platoon of 4 + 1 vehicles, where the 4 forward vehicles are human driven with different human parameter values. The system parameters are given in Table 1.

TABLE 1 System parameters

Parameter   Meaning                               Value
vm [m/s]    Maximum velocity                      30
hs [m]      Stop headway                          5
hg [m]      Headway for maximum velocity          35
h* [m]      Desired headway                       20
v1 [m/s]    Desired velocity                      15
α2          Headway gain of vehicle 2             0.15
β2          Relative velocity gain of vehicle 2   0.25
α3          Headway gain of vehicle 3             0.15
β3          Relative velocity gain of vehicle 3   0.2
α4          Headway gain of vehicle 4             0.25
β4          Relative velocity gain of vehicle 4   0.25

To design the proposed adaptive control law and corrective signal given by (16)-(18), we set 𝛾 = 0.8, 𝜉 = 0.8, 𝜂 = 0.8, 𝜈 = 0.8, 𝜉1 = 2, 𝜉2 = 2, 𝜉3 = 2, 𝜀 = 0.001, and R = 3I8. In this case, P satisfying (14) is given by

$$P = \begin{bmatrix}
23.6224 & -6.0540 & 7.2074 & -20.9879 & -1.3591 & -2.6040 & -0.2206 & -1.3545 \\
-6.0540 & 80.3395 & 22.9915 & 12.0654 & 7.7603 & -8.2184 & 4.0323 & -11.8700 \\
7.2074 & 22.9915 & 15.7720 & -4.1640 & 3.8360 & -5.2957 & 2.0694 & -6.7856 \\
-20.9879 & 12.0654 & -4.1640 & 57.3733 & 15.5864 & -0.0798 & 7.7480 & -16.0999 \\
-1.3591 & 7.7603 & 3.8360 & 15.5864 & 11.9348 & -0.8313 & 5.7701 & -12.4968 \\
-2.6040 & -8.2184 & -5.2957 & -0.0798 & -0.8313 & 17.6689 & 6.7693 & -8.4999 \\
-0.2206 & 4.0323 & 2.0694 & 7.7480 & 5.7701 & 6.7693 & 10.4745 & -15.0000 \\
-1.3545 & -11.8700 & -6.7856 & -16.0999 & -12.4968 & -8.4999 & -15.0000 & 36.0060
\end{bmatrix}. \qquad (44)$$

To illustrate the results of Theorem 1 with (15) replaced by (36), consider the state-dependent sensor and actuator attacks given by q = −0.75, W = [4, 3]T, and, for t ≥ 0, 𝜑(x̃(t)) = [4 sin(x1(t)) cos(x2(t)) tanh(x3(t)), 2 sin(x4(t)) cos(x5(t)) tanh(x7(t)x8(t))]T. The system performance of the controller without a corrective signal is depicted in Figures 3 and 4 for position and velocity regulation, respectively. It can be seen that due to the presence of sensor and actuator attacks, the aft vehicle fails to maintain the desired formation.
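The attack signals used in this example can be coded directly from the description above; the sketch below treats the residual term σ(x̃) as zero, which is an assumption made only for illustration.

```python
import numpy as np

q = -0.75                          # multiplicative sensor attack, x_tilde = (1 + q) x
W_attack = np.array([4.0, 3.0])

def phi(x_t):
    """Known attack regressor phi(x_tilde) used in the example (p = 2)."""
    return np.array([
        4.0 * np.sin(x_t[0]) * np.cos(x_t[1]) * np.tanh(x_t[2]),
        2.0 * np.sin(x_t[3]) * np.cos(x_t[4]) * np.tanh(x_t[6] * x_t[7]),
    ])

def sensor_attack(x):
    return (1.0 + q) * x                       # Eq. (9) with delta_s(x) = q x

def actuator_attack(x_tilde):
    return float(W_attack @ phi(x_tilde))      # Eq. (11) with sigma(x_tilde) = 0 (assumption)
```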

FIGURE 3 Relative distance of the connected vehicles in the presence of sensor and actuator attacks without the proposed corrective signal

FIGURE 4 Relative velocity of the connected vehicles in the presence of sensor and actuator attacks without the proposed corrective signal


FIGURE 5 Relative distance of the connected vehicles in the presence of sensor and actuator attacks with the proposed corrective signal given by (36), (16)-(18)

FIGURE 6 Relative velocity of the connected vehicles in the presence of sensor and actuator attacks with the proposed corrective signal given by (36), (16)-(18)

The system performance of the controller given by (13) with the proposed corrective signal is depicted in Figures 5 and 6 for position and velocity regulation, respectively. The proposed adaptive control architecture achieves satisfactory system performance in the face of sensor and actuator attacks.
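For completeness, the following skeleton indicates how the earlier sketches could be combined into a closed-loop simulation of the attacked platoon; every numeric value, helper name, and the use of the signum-based laws (rather than the tanh-smoothed version (36) used for the figures above) is an illustrative assumption.

```python
import numpy as np

# Assumes build_platoon_matrices, lyapunov_certificate, phi, sensor_attack,
# actuator_attack, and adaptive_step from the earlier sketches.
dt, T = 1e-3, 40.0
A, B = build_platoon_matrices(alpha=[0.15, 0.15, 0.25], beta=[0.25, 0.2, 0.25],
                              tau_f=2.0 / np.pi)        # 1/f'(h*) for the Table 1 values
K, P = lyapunov_certificate(A, B)                       # default R = 3*I, as in the example
b = B.ravel()
x = np.random.uniform(-1.0, 1.0, A.shape[0])            # illustrative initial perturbation
est = {"mu_hat": 0.0, "W_hat": np.zeros(2), "sigma_hat": 0.0}
gains = {"gamma": 0.8, "eta": 0.8, "nu": 0.8, "xi1": 2.0, "xi2": 2.0, "xi3": 2.0}

for _ in range(int(T / dt)):
    x_tilde = sensor_attack(x)                          # compromised measurement (9)
    v, est = adaptive_step(x_tilde, K, P, B, phi, est, gains, dt)
    u = float(K @ x_tilde) + v                          # controller (13)
    u_tilde = u + actuator_attack(x_tilde)              # compromised control command (10)
    x = x + dt * (A @ x + b * u_tilde)                  # platoon error dynamics (12)
```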

    6 CONCLUSION

In this paper, we developed an adaptive control framework for a team of connected vehicles subject to time-invariant state-dependent sensor and actuator attacks. The proposed decision-maker is composed of two components, a nominal controller and a corrective signal, and guarantees uniform ultimate boundedness of the closed-loop platoon system. The additive adaptive corrective signal is designed and added to the output of the nominal controller in order to suppress the effects of the sensor and actuator attacks. Furthermore, a learning algorithm using data from the adaptive controller under relaxed excitation conditions was presented. Simulations involving a platoon of vehicles under sensor and actuator attacks are provided to show the efficacy of the proposed approach. Future research will focus on including network communication attacks as well as incorporating learning mechanisms with attackers and drivers having different levels of rationality. Furthermore, to account for human driver delays, car-following models with reaction time delay will also be addressed. Finally, adaptive control and learning architectures for nonlinear models will also be considered.


    ACKNOWLEDGEMENTS

This work was supported in part by the Air Force Office of Scientific Research under Grant FA9550-16-1-0100, NSF under Grant ECCS-1501044, NATO under Grant SPS G5176, ONR under Minerva Grant N00014-18-1-2160, and NSF under Grant CAREER CPS 1851588.

    ORCID

Xu Jin https://orcid.org/0000-0001-9788-2051
Wassim M. Haddad https://orcid.org/0000-0002-4362-3043
Kyriakos G. Vamvoudakis https://orcid.org/0000-0003-1978-4848

REFERENCES

1. Levine WS, Athans M. On the optimal error regulation of a string of moving vehicles. IEEE Trans Autom Control. 1966;11(3):355-361.
2. Chu KC. Decentralized control of high-speed vehicular strings. Transportation Science. 1974;8(4):361-384.
3. Raza H, Ioannou P. Vehicle following control design for automated highway systems. IEEE Control Syst Mag. 1996;16(6):43-60.
4. Stankovic SS, Stanojevic MJ, Siljak DD. Decentralized overlapping control of a platoon of vehicles. IEEE Trans Control Syst Technol. 2000;8(5):816-832.
5. Morbidi F, Colaneri P, Stanger T. Decentralized optimal control of a car platoon with guaranteed string stability. In: Proceedings of the European Control Conference; 2013; Zürich, Switzerland.
6. Antsaklis P. Goals and challenges in cyber-physical systems research. IEEE Trans Autom Control. 2014;59(12):3117-3119.
7. Yucelen T, Haddad WM, Feron E. Adaptive control architectures for mitigating sensor attacks in cyber-physical systems. Cyber-Phys Syst. 2016;2(2):24-52.
8. Arabi E, Yucelen T, Haddad WM. Mitigating the effects of sensor uncertainties in networked multiagent systems. ASME J Dyn Syst Meas Control. 2017;139(4):1-11.
9. Jin X, Haddad WM, Yucelen T. An adaptive control architecture for mitigating sensor and actuator attacks in cyber-physical systems. IEEE Trans Autom Control. 2017;62(11):6058-6064.
10. Jin X, Haddad WM, Hayakawa T. An adaptive control architecture for cyber-physical system security in the face of sensor and actuator attacks and exogenous stochastic disturbances. Cyber-Phys Syst. 2018;4(1):39-56.
11. Massoumnia M-A, Verghese GC, Willsky AS. Failure detection and identification. IEEE Trans Autom Control. 1989;34(3):316-321.
12. Blanke M, Kinnaert M, Lunze J, Staroswiecki M, Schröder J. Diagnosis and Fault-Tolerant Control. Vol 2. Berlin, Germany: Springer; 2006.
13. Hwang I, Kim S, Kim Y, Seah CE. A survey of fault detection, isolation, and reconfiguration methods. IEEE Trans Control Syst Technol. 2010;18(3):636-653.
14. Pasqualetti F, Dorfler F, Bullo F. Attack detection and identification in cyber-physical systems. IEEE Trans Autom Control. 2013;58(11):2715-2729.
15. Fawzi H, Tabuada P, Diggavi S. Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Trans Autom Control. 2014;59(6):1454-1467.
16. Weimer J, Bezzo N, Pajic M, Pappas GJ, Sokolsky O, Lee I. Resilient parameter-invariant control with application to vehicle cruise control. In: Control of Cyber-Physical Systems. Berlin, Germany: Springer; 2013:197-216.
17. Schenato L, Sinopoli B, Franceschetti M, Poolla K, Sastry SS. Foundations of control and estimation over lossy networks. Proc IEEE. 2007;95(1):163-187.
18. Gupta A, Langbort C, Basar T. Optimal control in the presence of an intelligent jammer with limited actions. Paper presented at: 49th IEEE Conference on Decision and Control (CDC); 2010; Atlanta, GA.
19. Sou KC, Sandberg H, Johansson KH. On the exact solution to a smart grid cyber-security analysis problem. IEEE Trans Smart Grid. 2013;4(2):856-865.
20. Kosut O, Jia L, Thomas RJ, Tong L. Malicious data attacks on the smart grid. IEEE Trans Smart Grid. 2011;2(4):645-658.
21. Kim TT, Poor HV. Strategic protection against data injection attacks on power grids. IEEE Trans Smart Grid. 2011;2(2):326-333.
22. Chowdhary G, Johnson E. Concurrent learning for convergence in adaptive control without persistency of excitation. Paper presented at: 49th IEEE Conference on Decision and Control (CDC); 2010; Atlanta, GA.
23. Adam S, Busoniu L, Babuska R. Experience replay for real-time reinforcement learning control. IEEE Trans Syst Man Cybern Part C Appl Rev. 2012;42(2):201-212.
24. Ioannou PA, Sun J. Robust Adaptive Control. Upper Saddle River, NJ: Prentice Hall; 1996.
25. Jin X, Haddad WM, Jiang Z-P, Vamvoudakis KG. Adaptive control for mitigating sensor and actuator attacks in connected autonomous vehicle platoons. Paper presented at: IEEE Conference on Decision and Control (CDC); 2018; Miami Beach, FL.
26. Jin IG, Orosz G. Dynamics of connected vehicle systems with delayed acceleration feedback. Trans Res C Emerg Technol. 2014;46:46-64.
27. Gao W, Jiang ZP, Ozbay K. Data-driven adaptive optimal control of connected vehicles. IEEE Trans Intell Trans Syst. 2017;18(5):1122-1133.
28. Teixeira A, Shames I, Sandberg H, Johansson KH. A secure control framework for resource-limited adversaries. Automatica. 2015;51:135-148.
29. Lucia W, Sinopoli B, Franze G. A set-theoretic approach for secure and resilient control of cyber-physical systems subject to false data injection attacks. Paper presented at: Science of Security for Cyber-Physical Systems Workshop (SOSCYPS); 2016; Vienna, Austria.
30. Manandhar K, Cao X, Hu F, Liu Y. Detection of faults and attacks including false data injection attack in smart grid using Kalman filter. IEEE Trans Control Netw Syst. 2014;1(4):370-379.
31. Haddad WM, Chellaboina V. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton, NJ: Princeton University Press; 2008.
32. Lavretsky E, Wise K. Robust and Adaptive Control With Aerospace Applications. London, UK: Springer; 2012.
33. Polycarpou MM, Ioannou PA. A robust adaptive nonlinear control design. Automatica. 1996;32(3):423-427.
34. Yucelen T, Haddad WM. Low-frequency learning and fast adaptation in model reference adaptive control. IEEE Trans Autom Control. 2013;58(2):1080-1085.
35. Lewis FL, Jagannathan S, Yesildirak A. Neural Network Control of Robot Manipulators and Nonlinear Systems. London, UK: Taylor & Francis; 1999.
36. Boutayeb H, Darouach M. Recursive identification method for MISO Wiener-Hammerstein model. IEEE Trans Autom Control. 1995;40(2):287-291.
37. Modares H, Lewis FL, Naghibi-Sistani M-B. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learn Syst. 2013;24(10):1513-1525.
38. Vamvoudakis KG, Miranda MF, Hespanha JP. Asymptotically stable adaptive-optimal control algorithm with saturating actuators and relaxed persistence of excitation. IEEE Trans Neural Netw Learn Syst. 2016;27(11):2386-2398.
39. Johnson M, Bhasin S, Dixon WE. Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. Paper presented at: 50th IEEE Conference on Decision and Control and European Control Conference; 2011; Orlando, FL.

How to cite this article: Jin X, Haddad WM, Jiang Z-P, Kanellopoulos A, Vamvoudakis KG. An adaptive learning and control architecture for mitigating sensor and actuator attacks in connected autonomous vehicle platoons. Int J Adapt Control Signal Process. 2019;33:1788–1802. https://doi.org/10.1002/acs.3032
