7/24/2019 Fault Tolerance in Automotive Systems_report
Fault Tolerance in Automotive Systems
Adithya Hrudhayan Krishnamurthy, Ramkumar Ravikumar
Department of Electrical and Computer Engineering
University of Wisconsin, Madison
{akrishnamur3, ravikumar}@wisc.edu
Abstract - Design of fault tolerant electronics has become a standard requirement in the automotive sector. These systems increase overall automotive and passenger safety by liberating the driver from routine tasks and by assisting the driver during critical situations. In this paper we present exhaustive information about some of the commonly used fault tolerant design techniques in the automotive domain. We start by analyzing X-by-wire systems, which are fault tolerant distributed systems that are fail-operational and can maintain a reliable state at all times. A case study on Steer-by-wire is included. Fault tolerance techniques used in the design of automotive software, and how they help improve the overall reliability and dependability of the system, are investigated. Common techniques used in the design of fail-safe sensors and actuators are presented. We conclude the paper with a section on the design of automotive communication systems and protocols and their ability to ensure reliable communication between the various ECUs in the vehicle.
1. Introduction
Advancements in the field of automotive electronics
have helped in realizing the potential of sophisticated
vehicular control systems. In addition to liberating the driver from routine tasks, such systems assist the driver during critical situations, thereby enhancing
vehicular safety and performance. Among these
systems, X-by-Wire systems (where driving, steering and braking are electronically controlled) have
provided feasible electronic and electromechanical
solutions resulting in enhanced fault tolerance and
reliability. Traditional mechanical and hydraulic
systems employed in automotive and aviation
systems are being replaced by electronic control
systems such as X-by-Wire systems. A current
premium car, for instance, implements about 270 functions a user interacts with, deployed over about
70 embedded platforms. Altogether, the software
amounts to about 100 MB of binary code. Ensuring
fault tolerance in automotive software is an active area of research. Safety-critical systems such as X-
by-wire systems and most ECUs typically use a lot of
sensors for performing their functions. Hence sensors
and actuators, which form the backbone of most
commonly used electronic systems, need to be fault
tolerant as well. Major automotive subsystems such
as chassis, air-bag, powertrain, body and comfort
electronics, diagnostics, x-by-wire, multimedia and
infotainment, and wireless rely on automotive
communication systems. Fault tolerant communication systems are built so they are tolerant
to defective circuits, line failures etc., and
constructed using redundant hard- and software
architectures to ensure reliable communication
between different sub-systems in a vehicle. In this
paper we present exhaustive information about fault
tolerance techniques used in the automotive industry.
2. X-by-Wire in Automotive Systems
Following the aviation industry, the automotive
industry took to X-by-Wire systems, called Drive-
By-Wire. In the automotive environment, X
denotes the commanded action such as accelerating, braking or steering. Drive-by-Wire systems sense
driver requests and translate them into optimum steer,
brake, and acceleration manoeuvres. X-by-Wire
systems can be classified into Brake-by-Wire,
Throttle-by-Wire and Steer-by-Wire systems
depending upon the commanded action. The
following sections discuss Brake-by-Wire and Throttle-by-Wire briefly. Later, a
section is devoted to the detailed study of Steer-by-Wire systems.
2.1. Brake-by-Wire
Brake-by-Wire (BBW) systems are realized through
electro-mechanical actuators and communication
networks, instead of conventional hydraulic devices.
They offer enhanced safety and cut the costs associated
with the manufacture and maintenance of mechanical
brakes and brake fluids. They also eliminate
environmental concerns caused by hydraulic systems. There are two ways to realize a BBW system. On one
hand, the system can be based on the traditional hydraulic
brake system, with the by-wire function realized through hydraulic pumps and additional electrically
controlled valves; this is the Electric Hydraulic Brake (EHB) [24]. In an EHB system, a hydraulic backup can be
realized with the help of valves: once a fault is
detected, a direct hydraulic brake circuit is
closed. On the other hand, a brake-by-wire system
based on electro-mechanical actuators is called an
Electric Mechanical Brake (EMB). In an EMB system the brake force and brake control are realized
by electric components [11]. Since a hydraulic backup
cannot be realized, the system must be extremely
reliable. BBW systems enable simple integration of
vehicle traction and stability control.
2.2. Throttle-by-Wire
Conventional throttle systems consist of a cable running from the gas pedal into the throttle body. This cable slides within a housing as it winds its way around various components. Such a system is
relatively bulky and prone to wear. Automotive
manufacturers have implemented a new means of
throttle control known as Throttle-by-Wire. Throttle-
by-Wire consists of a sensor providing pedal
position. The data acquired by the sensor is sent to
the Engine Control Module (ECM) that determines
the parameters to change. The ECM coordinates
components such as Anti-lock Braking System
(ABS), gear selection, fuel and air intake, and traction control. This embedded intelligence results in increased fuel efficiency, reduced emissions, improved performance, and reduced frictional losses.
Throttle-By-Wire allows the engine computer to integrate torque management with traction control
and stability control.
3. Constraints for the Design of X-by-Wire Systems
3.1. Power Constraint
The challenge faced during the implementation of X-
by-Wire systems in vehicles has been to
accommodate such systems amidst the 14 V power nets in automobiles. The number of electrical components in cars has steadily increased in recent years. To guarantee the proper functioning of all
electrical components, a stable voltage supply is necessary. The automotive industry considered
migrating to 42 Volt systems. With 42 Volts in the
system, vehicles could use thinner-gauge wires and
smaller motors because higher voltage would mean
reduced amperage (which dictates the size of the
wire).
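The amperage argument follows from I = P / V. The sketch below works it through for a hypothetical 1 kW load; the load figure and the function name `current_draw` are illustrative assumptions, not values from the paper.

```python
def current_draw(power_w: float, voltage_v: float) -> float:
    """Return the current in amperes drawn by a load (I = P / V)."""
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    return power_w / voltage_v


# Hypothetical 1 kW actuator load (illustrative, not from the paper).
LOAD_W = 1000.0
i_14 = current_draw(LOAD_W, 14.0)   # current on a 14 V power net
i_42 = current_draw(LOAD_W, 42.0)   # current on a 42 V power net

# Tripling the voltage cuts the current to one third for the same power,
# so resistive loss in a given wire (I^2 * R) falls by a factor of nine.
current_ratio = i_14 / i_42
loss_ratio = current_ratio ** 2
```

For the same delivered power, the lower current is what allows the thinner-gauge wiring mentioned above.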
3.2. Real Time Constraint
X-by-Wire systems are intrinsically real-time distributed systems. They implement complex multivariable control laws and deliver real-time
information to intelligent devices that are physically
distant (for example, the four wheels). They have
stringent time constraints, such as a sampling period
of only a few milliseconds [26]. Occasional absence
of samples or out-of-bound delays at the controller or
actuator level, for instance, due to frame loss, does
not necessarily lead to vehicle instability, but
degrades steering performance (or Quality of Service). This is because most of the control laws in use are designed with mechanisms that compensate for delayed and/or missing sampling data.
End-to-end response times, such as the time between
a request from the driver and the response of the
physical system, must be bounded, typically lower
than a few tens of milliseconds. An excessive end-to-
end response time of a control loop may not only
induce performance degradation but also cause the
instability of the vehicle.
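One simple way to compensate for a missing sample is to reuse the last valid one for a bounded number of periods. The sketch below assumes that policy purely for illustration; the paper does not say which compensation mechanism is used, and the class name `HoldLastSample` and the `max_missed` limit are hypothetical.

```python
from typing import Optional


class HoldLastSample:
    """Reuse the last good sample when a frame is lost (assumed policy)."""

    def __init__(self, max_missed: int) -> None:
        self.max_missed = max_missed      # tolerated consecutive losses
        self.last: Optional[float] = None
        self.missed = 0

    def update(self, sample: Optional[float]) -> float:
        """Return the value the control law should use this period."""
        if sample is not None:
            self.last = sample
            self.missed = 0
            return sample
        self.missed += 1
        if self.last is None or self.missed > self.max_missed:
            # Degraded performance is no longer acceptable: escalate.
            raise RuntimeError("too many consecutive samples lost")
        return self.last
```

An occasional lost frame then only degrades performance, while a run of losses longer than `max_missed` is reported as a failure.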
3.3. Dependability Constraints
For a critical X-by-Wire system, the following must
be ensured: a system failure does not lead to a state
in which human life, economics, or environment is
endangered; a single failure of one component must
not lead to the failure of the whole X-by-Wire system
[26]. Also, the system must be able to tolerate at least
one major critical fault without loss of functionality
for long enough to reach a safe parking area. The X-
by-Wire system should also offer the same
availability and maintainability as its mechanical/hydraulic counterparts.
4. Case Study: Steer-by-Wire
Traditionally, vehicle wheels have been turned by a direct mechanical linkage between the steering
wheel, steering gears and actual wheel. In such a
system, the driver turns the steering wheel to request
the steering gears to turn the vehicle wheels. The
feedback of the torque encountered by the steering
system as the wheels are turned is provided to the
driver through mechanical linkages. This torque feedback is critical as it provides the driver with a sense of the road conditions, such as the traction experienced by the wheels with the road surface.
Steer-by-Wire systems eliminate mechanical
components between the steering wheel and turning
wheels. The system aims to control the wheel direction according to the driver's request and provide
a mechanical-like force feedback to the hand wheel.
The following sections deal with the functionality of
the architecture, the real-time constraints imposed on
the system, and the protocol used by the nodes to
communicate with each other.
4.1. Functional Description and Operational Architecture of Steer-by-Wire System
The two critical services provided by the Steer-by-
Wire system involve the front axle actuation and the
hand wheel force feedback [26].
Front Axle Control - This function computes the orders that are given to the motor of the front axle,
based on the state of this front axle and the
commands given by the driver through the hand wheel. The driver's requests are translated depending
on the hand wheel angle, torque and speed.
Hand Wheel Force Feedback - This function
computes the orders provided to the hand wheel
motor based on the speed of the vehicle, front axle position and the front tie rod force.
An operational architecture for the functions
mentioned above is realized as follows. The operational
architecture includes four ECUs (micro-controllers)
named, respectively, HW ECU1 (Hand Wheel
ECU1), HW ECU2 (Hand Wheel ECU2), FAA
ECU1 (Front Axle Actuator ECU1), and FAA ECU2
(Front Axle Actuator ECU2). Each node is connected to the two TDMA-based communication channels (BUS1 and BUS2). Three sensors, named as1, as2 and as3, placed near the hand wheel measure the
requests of the driver in a similar way, the latter
being translated into a 3-tuple . Three other
sensors, named rps1, rps2 and rps3, are dedicated to
the measurement of the front axle position. Finally,
two motors (FAA Motor1 and FAA Motor2),
configured in active redundancy, act on the front axle
while two other motors (HW Motor1 and HW
Motor2) realize the force feedback control on the hand wheel. Sensors as1, as2 and as3 (respectively, sensors rps1, rps2 and rps3) are connected by point-to-point links, both to HW ECU1 and HW ECU2
(respectively, FAA ECU1 and FAA ECU2).
Implementation of the Front axle control function -
The requests of the driver are measured by the three
replicated sensors as1, as2 and as3 and sent to both
HW ECU1 and HW ECU2. Each ECU performs a
majority vote on the 3 received values and transmits
the data on both communication channels BUS1 and BUS2. The two ECUs, FAA ECU1 and FAA ECU2,
placed behind the front axle, consume this data, as well as the last wheel position, in order to elaborate
the commands that are to be applied to FAA Motor 1
and 2.
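The vote that each Hand Wheel ECU performs on the three sensor values can be sketched as a mid-value selection with an agreement check. This is only one plausible form of voter: the paper does not specify the algorithm, and the `tolerance` parameter and the function name `vote` are assumptions made for this example.

```python
from typing import Sequence


def vote(readings: Sequence[float], tolerance: float) -> float:
    """Return the median of three readings if at least two agree.

    Two readings "agree" when they lie within `tolerance` of the median.
    A single arbitrarily wrong sensor can never become the median of
    three, so one faulty reading is masked.
    """
    if len(readings) != 3:
        raise ValueError("a triplex voter needs exactly three readings")
    ordered = sorted(readings)
    median = ordered[1]
    agreeing = [r for r in ordered if abs(r - median) <= tolerance]
    if len(agreeing) < 2:
        raise ValueError("no majority among sensor readings")
    return median


# as1 and as2 agree; as3 delivers a wildly wrong hand wheel angle.
angle = vote([10.0, 10.1, 55.0], tolerance=0.5)
```

Both HW ECU1 and HW ECU2 would run the same vote on the same three inputs before transmitting on BUS1 and BUS2.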
Implementation of the Force feedback control function - In a way similar to the previous function,
measurements taken by rps1, rps2 and rps3 are
transmitted both to FAA ECU1 and FAA ECU2.
Each of these ECUs elaborates information
transmitted on the network. The consumers of this
information are both HW ECU1 and HW ECU2, which compute the command transmitted to HW Motor 1 and 2.
4.2. Fault Classification and Redundancy
Fault Classification - A Byzantine fault is a fault
whose effect can be perceived differently by different
observers. A Coherent fault's effect is seen the same by all observers. A Transient fault is a fault whose
duration is such that the system does not reach a not-
safe-state. A Permanent fault is a non-transient one.
The property of fail silence, assumed for some
components, leads to another class of faults. A component is said to be fail silent if, each time it is
not silent, we can conclude that it is functioning properly.
ECU redundancy - Two functions need to be
implemented in ECU: Front Axle Control and Force
Feedback Control. To avoid costly and numerous
wires, ECUs have to be placed close to the sensors,
and communication between ECUs has to be
multiplexed.
Fig 1: Steer-by-Wire operational architecture
Dependability analyses are generally based on a strong hypothesis assuming that, in the whole system,
n simultaneous component failures can never occur
for any set of redundant components. Lamport states that 3n+1 redundant components are necessary to
tolerate n Byzantine faults [29]. In order to tolerate n coherent faults, it is sufficient to have 2n+1
redundant components. According to the rule given
by Lamport, the minimum number of redundant
Hand Wheel ECUs (respectively, Front Axle ECUs)
should be 4. This solution is mainly used in the
aeronautic domain. A classical solution in cost-prohibitive situations is therefore to use Fail-Silent ECUs.
In this case, only two Hand Wheel ECUs
(respectively, Front Axle ECUs) are necessary.
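The redundancy rules above reduce to simple arithmetic. The sketch below encodes them; the n+1 rule for fail-silent components is inferred from the two-ECU example in the text, and the function name `min_replicas` and the class labels are illustrative.

```python
def min_replicas(n_faults: int, fault_class: str) -> int:
    """Minimum number of redundant components to tolerate n_faults faults."""
    if n_faults < 0:
        raise ValueError("n_faults must be non-negative")
    if fault_class == "byzantine":
        return 3 * n_faults + 1      # Lamport's bound, cited as [29]
    if fault_class == "coherent":
        return 2 * n_faults + 1      # a majority must remain correct
    if fault_class == "fail_silent":
        return n_faults + 1          # one survivor suffices (inferred)
    raise ValueError(f"unknown fault class: {fault_class}")


# Tolerating a single fault, as in the Steer-by-Wire case study.
ecus_byzantine = min_replicas(1, "byzantine")      # 4 ECUs
ecus_coherent = min_replicas(1, "coherent")        # 3 ECUs
ecus_fail_silent = min_replicas(1, "fail_silent")  # 2 ECUs
```

This makes the cost argument explicit: assuming fail silence halves the ECU count relative to the Byzantine case for a single fault.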
Hand Wheel Sensor and Actuator redundancy - A
Hand Wheel sensor produces information for two
Hand Wheel ECUs. Three Hand Wheel sensors are necessary for ensuring that each Hand Wheel ECU,
assumed to provide a voting algorithm, is able to
tolerate one Byzantine fault (and subsequently one
coherent fault or one fail-silent sensor). A single
actuator can take charge of piloting the Front Axle and it is assumed that an actuator can never wrongly
apply an order received by a Front Axle ECU.
Considering the fail silence property of a Front Axle ECU, only 2 couples (Front Axle ECU, actuator) are
necessary for the tolerance of, at most, one fault.
If the chosen fault-tolerance strategy is failure
recovery, redundant ECUs will work only in the case
of the primary ECU failing. Failure detection must be
quick and reliable. Otherwise, if the strategy is failure
compensation, redundant ECUs will work in parallel. Because of the stringent real-time constraint, our architecture must provide failure compensation. Redundancy of identical ECUs does not protect the
architecture from common mode failures: the
hardware of redundant ECUs should be furnished by
different suppliers and their software realized by
different teams.
4.3. Communication Protocol (TTP/C)
TTP/C is a protocol based on the TTP (Time-
Triggered Protocol) and implemented so that it meets
the SAE requirements for a class C automotive protocol. Class C protocols are suitable for high speed single failure operational safety critical applications. A principal reason that TTP/C is the
first protocol to qualify as Class C is that the previous protocols are all event triggered. Event-
triggered systems are susceptible to several serious
failure modes such as the babbling idiot failure.
TTP/C is a Time Division Multiple Access (TDMA)
protocol in which periodic time slots are assigned to
individual processing nodes in a system. The braking
unit has its own TTP/C nodes, each of which is replicated as a Fault Tolerant Unit (FTU).
Node Architecture: System nodes in a TTP/C system
consist of a Host, a Controller Network Interface
(CNI), and a TTP/C communications controller. The
Host runs the application software for the relevant system function (for instance, the control software for
the braking system in an automobile). The CNI
stands between the Host and the TTP/C controller,
effectively de-coupling the applications-level
software from the network. Within the CNI is a
Message Descriptor List, which contains information controlling bus access, and a data sharing interface which is typically implemented with dual port RAM (allowing the Host and the TTP/C controller to access
the shared memory independently). The TTP/C
communications controller provides the actual
connection between the TTP/C node and the shared
network. The controller supports the protocol with several essential services such as guaranteed
transmission times with minimal latency and jitter, fault-tolerant clock synchronization, and error detection.
Scheduling and State Messages: The system
scheduling in the TTP/C protocol is static. The points in time at which the various nodes in a system are
authorized to transmit form the lattice points of a TTP/C action lattice. The time difference between
two adjacent points in the action lattice represents the
basic cycle time of the system, and sets a lower
bound on the response time of the system.
Fig 2: Communication Subsystem with FTUs
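A static TDMA schedule of the kind described above can be sketched as a fixed table that maps time to the one node allowed to transmit. The slot length, the slot order and the function names are hypothetical choices for this example; they are not taken from the TTP/C specification.

```python
SLOT_US = 250  # hypothetical slot length in microseconds

# One slot per node per TDMA round, fixed before the system runs.
SCHEDULE = ["HW_ECU1", "HW_ECU2", "FAA_ECU1", "FAA_ECU2"]
ROUND_US = SLOT_US * len(SCHEDULE)


def slot_owner(time_us: int) -> str:
    """Return the node that owns the bus at global time `time_us`."""
    return SCHEDULE[(time_us % ROUND_US) // SLOT_US]


def may_transmit(node: str, time_us: int) -> bool:
    """A node may drive the bus only during its own slot."""
    return slot_owner(time_us) == node
```

Because the table never changes at run time, every node can compute the same answer from the shared global time, which is what makes the worst-case access delay of one round predictable.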
Under TTP/C, each node generates a state message in each
TDMA round, which is posted to the CNI by the TTP/C host for transmission over the system
network. Nodes receiving messages do not respond
with a receipt acknowledgement.
Clock Synchronization: A commonly maintained
keeper of global time is thus critical to the proper
operation of the protocol. Within a cluster of TTP/C
nodes, all nodes are aware of which node has access
to the bus during a specified time slot, given the a
priori scheduling allocation. By comparing the time when
messages are received from other nodes (TTP/C is a broadcast protocol, so all nodes hear all messages)
with the known schedule, a node can calculate the difference between the clock of the sending node and
its own clock.
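The clock comparison described above can be sketched as follows. Measuring the deviation is taken from the text; combining several deviations with a trimmed average that discards the extremes is an assumption added here for illustration, as the text does not state how the correction term is derived. All names are hypothetical.

```python
from typing import Sequence


def clock_deviation(expected_us: int, observed_us: int) -> int:
    """Difference between a sender's clock and the local clock.

    `expected_us` is the arrival time implied by the static schedule,
    `observed_us` is the arrival time measured on the local clock.
    """
    return observed_us - expected_us


def trimmed_average(deviations: Sequence[int], k: int = 1) -> float:
    """Average the deviations after dropping the k largest and k smallest.

    Dropping the extremes keeps up to k badly wrong clocks from
    dominating the correction (an assumed policy, see lead-in).
    """
    if len(deviations) <= 2 * k:
        raise ValueError("not enough measurements to trim")
    kept = sorted(deviations)[k:len(deviations) - k]
    return sum(kept) / len(kept)


# Three plausible deviations and one faulty clock that is 500 us off.
correction = trimmed_average([-2, 1, 3, 500])
```

With the faulty measurement discarded, the local clock is corrected by the average of the remaining values only.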
Composability: TTP/C supports a robust level of
composability, the capability to carry a thoroughly tested subsystem into a larger system, and to be able
to depend on the subsystem retaining the same
characteristics that it demonstrated in isolation [32].
In the auto industry, this feature potentially allows
the rapid integration of components provided by multiple suppliers into a larger framework, without
the need to perform extensive system integration tuning and testing.
Reliability and Fault Tolerance: Several aspects of
the TTP/C protocol, and the way it is implemented in
real-world systems, serve to provide reliability and
fault-tolerant behavior.
a. Membership: The role of the membership service
component of a real-time protocol is to inform all
system nodes of the failure of a node with minimal
delay [32]. Under TTP/C, a node membership field,
maintained as a status register in the CNI, contains a so-called node membership vector. This node
membership vector contains one bit for every node in a TTP/C cluster, with the bit set to True if the node in
question is operating correctly and to False if the
node is not operating or is flawed. The node
membership vector is updated by checking that the
expected messages from other nodes in the cluster are
received, and by analyzing the cyclic-redundancy
check (CRC) fields in the messages received.
b. Fail-Silence: TTP/C nodes are designed to detect
faults in their own operation. The principle of
operation is that each and every node must deliver results which are correct in both the value and the time domain, or no results at all. If a node detects an abnormality in its operation, it switches itself off. At
a software level, TTP/C supports, for example, a
variety of techniques, including double execution,
double execution with reference checks, validity
checks, assertion checking, and signature checks.
c. Bus Guardian: The Bus Guardian (BG) is a
hardware element of the TTP/C Controller which
serves as a portal to the system bus. The key role of
the Bus Guardian is to enable the bus driver only
during the transmission slot for its node, and otherwise to guarantee that the bus driver is in a disabled state. This serves to prevent babbling idiot failures [32].
d. Replication of System Components: The TTP/C protocol supports replication of the hardware
elements of a node, dual system buses and error
detection [32].
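The membership update described in item (a) can be sketched as below: a node's bit stays True only if its expected frame arrived and its CRC checks out. `zlib.crc32` is used as a stand-in checksum, and the frame layout (payload followed by a 4-byte CRC) is an assumption for this example rather than the actual TTP/C frame format.

```python
import zlib
from typing import List, Optional


def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 to the payload (assumed frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Check that the trailing CRC matches the payload."""
    if len(frame) < 4:
        return False
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def update_membership(membership: List[bool], node: int,
                      frame: Optional[bytes]) -> List[bool]:
    """Return a new vector with `node`'s bit set from this slot's result.

    `frame` is None when nothing was received in the node's slot.
    """
    updated = list(membership)
    updated[node] = frame is not None and crc_ok(frame)
    return updated


members = [True, True, True, True]
good = make_frame(b"state")
corrupt = bytes([good[0] ^ 0x01]) + good[1:]   # flip one payload bit

after_good = update_membership(members, 1, good)
after_silent = update_membership(members, 1, None)
after_corrupt = update_membership(members, 1, corrupt)
```

Since every node hears every slot, all correct nodes apply the same rule to the same observations and so arrive at the same membership vector.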
5. Fault tolerance in Automotive Software
Improving software fault tolerance is a common
interest for aeronautics, railway and automotive software-based systems. However, the automotive context meets more stringent economic constraints and resource limitations, due to higher volume of
vehicle production [1]. The amount of software has
evolved from zero to tens of millions of lines of code.
Today, more than 80% of the innovations in a car come from computer systems; software has thus
become a major contributor to the value of
contemporary cars [4]. A study made by Mercer
Management Consulting and Hypovereinsbank in
2001 claims that the total value of software in cars
will rise from 4% to 13% by 2010 [3] and digital hardware and software is expected to account for up to 30% of the overall cost of a car [5].
5.1. Automotive Software Classification
Automotive software is very diverse, ranging from
entertainment and office-related software to safety-
critical real-time control software. It can be clustered according to application area and the associated
nonfunctional requirements. The following five
clusters are usually distinguished [4]: multimedia,
telematics, and HMI software; body/comfort
software; software for safety electronics; power train and chassis control software; and infrastructure
software. Automotive software development poses great challenges to automotive manufacturers since
an automobile is inherently distributed and subject to
fault-tolerance and real-time requirements. Reliability
and robustness are critical requirements of ECU software: fault tolerance
mechanisms should handle detected faults locally
without propagating them to other SW-components [2].
5.2. Current Approaches
A fault-tolerant architecture based on computational reflection is proposed in [1]. The reflection paradigm for fault-tolerance purposes relies on the ability of a
system to check and to correct itself in a separate
abstraction level. The software architecture is divided
into two parts (functional and defense software) that
interact together through an interface. It is assumed
that the defense software has enough knowledge of
the structure and expected behavior of functional
software, to control it. The defense software detects
errors by checking safety properties and performs
recovery using generic instrumentation and infrastructure functionalities. As failures can impact
both data and control, the failure model is structured into two parts: data flow and control flow. Critical
control flow failures can disrupt control events, sequence of execution and execution time. Critical
data flow failures can affect both value and timing in
the system. The runtime behavior of a system is
described as a sequence of scheduled entities that are
triggered by events and generate triggering events. Control flow of a scheduled entity relates to the
control events starting or stopping its execution. These events are produced by the environment or other entities. In parallel, data flow of a scheduled entity corresponds to the input data it consumes and
the output data it produces during its execution.
The Defense software is organized around logging tables and three types of services that control
information logging, error detection and error
recovery. Logging or tracing are mechanisms often
needed for debugging and diagnosis issues. The
logging strategy has to select rigorously the necessary and sufficient critical information to get at runtime, according to fault tolerance concerns. The logging architecture is organized into several bracket-tables that are updated and used at runtime.
Each table is associated with a dedicated logging
routine that uses preferably existing infrastructure
services to get information. Once application-specific
safety properties are specified, the corresponding error detection routine is developed as an executable
assertion. An assertion is verified at runtime within a
corresponding checking routine. When an error is
detected, the checking routine triggers a recovery
routine.
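The chain of executable assertion, checking routine and recovery routine can be sketched as below. The safety property (a bounded torque command), the clamping recovery and all names are hypothetical; the architecture in [1] is described only at the level of the preceding paragraph.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Check = Tuple[str, Callable[[State], bool], Callable[[State], State]]


class DefenseSoftware:
    """Runs checking routines and triggers recovery on a failed assertion."""

    def __init__(self) -> None:
        self.checks: List[Check] = []
        self.log: List[str] = []          # stands in for a logging table

    def register(self, name: str,
                 assertion: Callable[[State], bool],
                 recovery: Callable[[State], State]) -> None:
        self.checks.append((name, assertion, recovery))

    def run_checks(self, state: State) -> State:
        for name, assertion, recovery in self.checks:
            if not assertion(state):       # executable assertion failed
                self.log.append(name)      # record the detected error
                state = recovery(state)    # trigger the recovery routine
        return state


MAX_TORQUE = 10.0  # hypothetical safety bound


def torque_in_range(state: State) -> bool:
    return abs(state["torque_cmd"]) <= MAX_TORQUE


def clamp_torque(state: State) -> State:
    clamped = max(-MAX_TORQUE, min(MAX_TORQUE, state["torque_cmd"]))
    return {**state, "torque_cmd": clamped}


defense = DefenseSoftware()
defense.register("torque_range", torque_in_range, clamp_torque)
recovered = defense.run_checks({"torque_cmd": 25.0})
```

Keeping the checks in a separate object mirrors the split between functional and defense software: the functional code never calls the recovery routine itself.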
At the application level, once an error is detected, the application is turned into a safe state. Degraded data
can recover to their valid values and new
communication requests may arise for missed data
acknowledgement. At the infrastructure level,
recovery actions on control flow include reset,
terminate and restart a task or a set of OS objects. In
the proposed fault-tolerant architecture, each checking routine is associated with one or more recovery routines that call available executive services and update logging tables, if necessary. The recovery action depends on the detected error. Two types of software instrumentation considered are
hooks and basic services. Hooks are the means to tie
up defense software to functional software, and to
insert code. Basic services play the role of software
sensors and actuators. Fault injection techniques are
used to measure fault-tolerance coverage, and to
detect remaining software errors of defense software.
The paper, however, does not mention the
amount of data that can be handled.
An alternate method for ensuring fault tolerance in automotive software is proposed in [6]. Since it might
be difficult for application programmers to
implement fault tolerance features by themselves while designing automotive software, it is desirable
that the fault tolerance is provided by the middleware
and hidden from application programmers.
Middleware is a software layer that connects and
manages application components running on
distributed hosts. It exists between network operating
systems and application components. Middleware hides and abstracts many of the complex details of distributed programming from application developers. Figure 3 shows the structure of the
proposed middleware for automotive systems. At the
top level, there exists the publish/subscribe interface
that is used by application components. Beneath the interface, there are a QoS Configuration module for
specifying real-time requirements, a Resource
Allocation module for guaranteeing timely
operations, and a Clock Synchronization module for
providing a global time base. All incoming and
outgoing messages pass through the Fault-Tolerance Layer in order to guarantee reliability. Finally, messages are transmitted and received by the Transport Layer.
Fig. 3: Structure of proposed middleware
Because software is very flexible and relatively
cheap, it is a very desirable medium for implementing fault tolerance mechanisms [7].
Redundancy in software can be accomplished
through redundant computation and redundant
storage. The most common computational
redundancy technique involves the use of a software
controlled watchdog timer. As the program executes,
the timer is periodically reset by writing a new value
into the timer register. In the event of a failure, the
watchdog will restart the processor from a re-entry point in the code. This technique is particularly useful for avoiding deadlock conditions due to communication failures. A simple technique
for data redundancy is complement data write. Rather
than simply duplicating data, it is complemented first.
This adds some additional fault tolerance by
providing the means to detect stuck-at faults. As
duplicating every piece of data might be expensive,
often only safety-critical data is duplicated [8].
Corruption of the instruction memory can be detected
and tolerated by duplicating the program in memory
and executing each copy in sequence. Redundant Orthogonal Coding (ROC), a technique similar to n-version programming, can be used to increase reliability [8]. The difference is that in ROC,
different algorithms are intentionally used to perform the same calculation. This addresses the problem of
n-version programming where different people tend
to design programs for a given task in very similar
ways and also to make similar mistakes in the design
by explicitly ensuring dissimilarity. This technique
may also catch programming errors. Most of the error
detection techniques focus on memory integrity. A common technique is to use checksums to verify the contents of the memory [8]. Another technique to determine that the system memory is still working
correctly is to test the RAM by writing various
patterns of ones and zeros into every bit in memory.
Another common error detection technique is to use assertion tests inside a program. Using assertion
checks can catch programming errors as well as
errors arising from unusual conditions.
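Complement data write, one of the techniques in the preceding paragraph, can be sketched as follows. The 16-bit word width and the function names `store` and `load` are assumptions for this example; a real ECU would do this on fixed-width memory cells rather than Python integers.

```python
from typing import Tuple

MASK = 0xFFFF  # assumed 16-bit word width


def store(value: int) -> Tuple[int, int]:
    """Store a word together with its bitwise complement."""
    return value & MASK, ~value & MASK


def load(cell: Tuple[int, int]) -> int:
    """Read a word back, checking it against the stored complement.

    For an intact pair every bit differs between the two copies, so
    their XOR is all ones. A bit stuck at the same level in both copies
    breaks that relation, which plain duplication would not reveal.
    """
    value, complement = cell
    if value ^ complement != MASK:
        raise ValueError("memory corruption detected")
    return value


cell = store(0x1234)
readback = load(cell)

# Simulate bit 0 stuck at 1 in both memory words.
stuck = (cell[0] | 0x0001, cell[1] | 0x0001)
```

With plain duplication the stuck bit would corrupt both copies identically and the comparison would still pass; with the complemented copy, `load(stuck)` raises.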
Currently, the amount of error diagnosis and error
recovery in cars is rather lightweight [9]. In the CPUs some error logging takes place, but there is neither consideration nor logging of errors at the level of the network and the functional distribution. There is no
comprehensive error diagnosis and no systematic
error recovery beyond individual CPUs. Failure
logging for the purpose of better diagnosis for
maintenance has emerged as a relevant research problem. There are some fail-safe and graceful
degradation techniques found in cars today, but a
systematic and comprehensive error treatment is
missing. With the upcoming multi-core controllers
for embedded applications, an interesting area for research is how these can also be exploited for
redundancy/recovery strategies. Several key areas of research have been identified in [4]. Reliability and
safety concerns are important for all functions
relevant to driving, from engine control and
passenger safety functions to X-by-wire functions
where mechanical transmission is replaced by
electrical signals.
6. Fault tolerant actuators and sensors
Safety-critical automotive applications, such as steer-by-wire systems, are in most cases control systems with hard real time requirements. Such systems typically have a number of sensors (inputs) [12]
connected to them, whose values are processed in
order to produce the control actions. Consequently
the sensors are the first in the flow of information and
control computations rely on these values. Therefore
it is important that the sensors can be trusted.
Actuators are essential for the reliable operation of
various components in the automobile including
brakes, valves and cylinders.
6.1. Fault tolerant sensors
A fault-tolerant sensor configuration should be at least fail-operational for one sensor fault [11]. This
can be obtained by hardware redundancy with the same type of sensors or by analytical redundancy
with different sensors and process models. Sensor
systems with static redundancy are realized with a
triplex system and a voter. A configuration with
dynamic redundancy needs at least two sensors and
fault detection for each sensor. The fault detection
can be performed by self-tests.
Fig. 4: Triplex system with static redundancy and duplex system with dynamic redundancy
Depending on the importance of the sensed quantity, additional sensors may or may not be needed to
obtain the required dependability at the system-level.
This type of sensors requires some form of sensor-
internal redundancy in the form of a built-in self-test
(BIST) and/or internal replication. If a particular
sensor is not fail-silent, it can be replicated in order to
obtain the fail-silent property at the system level. In
this case, the values of the sensors are collected and
analyzed by an intelligent unit that makes a decision of which value to use in further calculations. The required degree of replication is dependent on the criticality of the sensor. If knowledge of the sensed quantity is critical, sensor triplication is necessary.
However, if a lack of knowledge about the sensed
quantity is acceptable, dual redundancy is sufficient.
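The two sensor configurations described above can be illustrated with a minimal Python sketch. It is not taken from [11] or [12]: the function names, the tolerance parameter and the use of mid-value selection as the voting rule are assumptions made for illustration. The triplex voter returns the middle of three readings and flags any reading that deviates from it by more than the tolerance; the duplex selector relies on each sensor reporting the result of its own self-test.

```python
from statistics import median


def vote_triplex(readings, tolerance):
    """Static redundancy: mid-value select over three redundant readings.

    Returns the selected value and the indices of readings that differ
    from it by more than `tolerance` (hypothetical fault criterion).
    """
    if len(readings) != 3:
        raise ValueError("a triplex voter needs exactly three readings")
    selected = median(readings)  # middle value of the three
    suspects = [i for i, r in enumerate(readings)
                if abs(r - selected) > tolerance]
    return selected, suspects


def select_duplex(primary, standby):
    """Dynamic redundancy: each sensor is a (value, self_test_ok) pair.

    Uses the primary while its self-test passes, then the standby.
    Returns None when both self-tests fail (no trustworthy value).
    """
    for value, self_test_ok in (primary, standby):
        if self_test_ok:
            return value
    return None
```

With readings `[10.1, 10.0, 42.0]` and a tolerance of `0.5`, the voter selects `10.1` and flags the third sensor, so a single faulty sensor is masked without interrupting operation.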
A prototype of a steering angle sensor is
demonstrated in [10]. By extension to a diverse four-
sensor system (there are six pairs of sensor elements),
the steering angle sensor is fault tolerant since it is
able to tolerate the loss of one or two sensor elements
and to diagnose the failed sensor elements. Hardware
and/or software Instrument Fault Detection, Isolation and
Accommodation (IFDIA) schemes are finding more and more applications in
automotive measurement and control systems. A scheme for detection,
location and accommodation of faults is presented in [13]. This scheme
was designed to identify and accommodate some kinds of faults that may
affect the manifold pressure, crankshaft speed and throttle valve angle
position sensors. It is reported that the realized scheme is able to
identify and accommodate even small faults in all considered sensors.
6.2. Fault tolerant actuators
Actuators generally consist of different parts: input transformer,
actuation converter, actuation transformer, and actuation element. The
actuation converter converts one type of energy (e.g., electrical or
pneumatic) into another (e.g., mechanical or hydraulic). Fault-tolerant
actuators can be designed by using multiple complete actuators in
parallel, with either static redundancy or dynamic redundancy with cold
or hot standby [11]. Another possibility is to limit the redundancy to
the parts of the actuator that have the lowest reliability. To achieve
fault-tolerant control, either the actuator must not be a single point
of failure or it has to be fault tolerant. For functions without
inherent redundancy, actuators must be replicated [12]. When only two
actuators are used, the failure of one actuator must not affect
operation of the remaining unit. Sensors that continuously monitor the
actuator behavior (e.g., injector current, motor current, motion,
force, torque) are typically used. When these sensors indicate a
serious actuator failure, the power to the actuator should be switched
off. As cost and weight are generally higher for actuators than for
sensors, actuators with a fail-operational duplex configuration are
preferred [11]. Static redundant structures, where both parts operate
continuously, or dynamic redundant structures with hot or cold standby
can be chosen. For dynamic redundancy, fault-detection methods for the
actuator parts are required.
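A duplex actuator with dynamic redundancy might be supervised along the following lines. This is a hedged sketch rather than a design from [11] or [12]: the class name, the channel names and the simple current-window check are hypothetical, and a real fault detector would use a richer model of the actuator. It shows the two steps named above: a monitored quantity outside its expected range causes the faulty channel to be powered off, and the standby channel takes over.

```python
from dataclasses import dataclass, field


@dataclass
class DuplexActuator:
    """Two actuator channels, one active and one in hot standby."""
    min_current: float  # assumed lower bound of a healthy motor current
    max_current: float  # assumed upper bound of a healthy motor current
    active: str = "primary"
    powered: dict = field(
        default_factory=lambda: {"primary": True, "standby": True})

    def monitor(self, measured_current):
        """Check the active channel and switch over on a detected fault.

        Returns the name of the channel in control afterwards, or None
        when no healthy channel remains.
        """
        if self.active is None:
            return None
        if self.min_current <= measured_current <= self.max_current:
            return self.active
        # Serious failure indicated: remove power from the faulty channel.
        self.powered[self.active] = False
        other = "standby" if self.active == "primary" else "primary"
        self.active = other if self.powered[other] else None
        return self.active
```

Because the standby channel is already powered, the switch-over needs no start-up phase, which is the main argument for hot rather than cold standby in this sketch.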
A prototype of an actuator is demonstrated in [10]. Typical
electromagnetic faults, such as a winding open circuit or a winding
short circuit, may occur within an actuator. To develop a single drive
that can continue to operate with any one of these faults, the most
successful design approach was found to be a multiple-phase drive in
which each phase may be regarded as a single module. The operation of
any one module must have minimal impact upon the others, so that if
that module fails the others can continue to operate unaffected. When
both sensor and actuator failures
occur at the same time, their mutual effects on residuals make fault
isolation difficult [14]. A hexadecimal decision table that relates all
possible failure patterns to the residual code has been proposed in
[14], achieving detection and isolation of multiple sensor and actuator
failures in automotive engines. Simulation and experimental results
indicate that the proposed diagnostic system not only can be applied to
cases where all failures occur in the same sector, but is also
appropriate for isolating multiple failures occurring simultaneously in
sensors and actuators.
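The idea of a decision table indexed by a residual code can be sketched as follows. The fault names, the residual signatures and the rule that simultaneous faults combine by bitwise OR are all assumptions made for illustration; they are not the table or the residuals of [14]. Each bit of the code stands for one residual that has exceeded its threshold, and the table lists every failure pattern that would produce a given code.

```python
from itertools import combinations

# Hypothetical signatures: bit i is set when residual r_i fires.
SIGNATURES = {
    "map_sensor": 0b0011,
    "speed_sensor": 0b0110,
    "throttle_actuator": 0b1100,
}


def build_decision_table(signatures):
    """Map each residual code to the failure patterns that produce it."""
    table = {}
    names = sorted(signatures)
    for k in range(len(names) + 1):
        for pattern in combinations(names, k):
            code = 0
            for name in pattern:
                code |= signatures[name]  # assumed: effects superimpose
            table.setdefault(code, []).append(pattern)
    return table


def diagnose(table, residual_code):
    """Return the candidate failure patterns for an observed code."""
    return table.get(residual_code, [])


TABLE = build_decision_table(SIGNATURES)
for code in sorted(TABLE):
    print(f"0x{code:X}: {TABLE[code]}")
```

In this toy table the code 0x7 identifies a simultaneous manifold pressure and speed sensor failure uniquely, whereas 0xF is shared by two patterns. That ambiguity is an instance of the mutual effects on residuals mentioned above, and it is why the choice of residuals matters for isolating multiple failures.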
7. Fault tolerant Automotive Communication Systems
The specific requirements of the different car domains have led to the
development of a large number of automotive networks such as LIN,
J1850, CAN, TTP/C, FlexRay, Media Oriented Systems Transport (MOST),
IDB-1394, etc. One of the important requirements of an automotive
communication system is fault tolerance [16]. Fault-tolerant (typically
safety-critical) communication systems are built to tolerate defective
circuits, line failures, etc., and are constructed using redundant
hardware and software architectures.
7.1. Event triggered vs. Time triggered Systems
There are two main paradigms for communications in automotive systems
[15]: time triggered and event triggered. Event triggered means that
messages are transmitted to signal the occurrence of significant events
(e.g., a door has been closed). In this case, the system can take into
account, as quickly as possible, any asynchronous event such as an
alarm. Event-triggered communication is very efficient in terms of
bandwidth usage since only necessary messages are transmitted. In
time-triggered systems, frames are transmitted at predetermined points
in time, which is well suited for the periodic transmission of messages
as required in distributed control loops. Each frame is scheduled for
transmission at one predefined interval of time, usually termed a slot,
and the schedule repeats itself indefinitely. As the frame scheduling
is statically defined, the temporal behavior is fully predictable;
thus, it is easy to check whether the timing constraints expressed on
data exchanges are met.
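The predictability argument for time-triggered communication can be made concrete with a small sketch. The frame names, the slot length and the deadlines are invented for illustration, and the worst-case bound assumes one slot per frame per cycle with data that may become ready just after the frame's slot has started. Under those assumptions the schedule and the timing check reduce to a few lines of arithmetic.

```python
def build_schedule(frames, slot_us):
    """Assign one slot per frame, in order; returns offsets and cycle length."""
    offsets = {name: i * slot_us for i, name in enumerate(frames)}
    cycle_us = len(frames) * slot_us
    return offsets, cycle_us


def slot_owner(frames, slot_us, t_us):
    """Frame that owns the bus at time t_us (the schedule repeats forever)."""
    cycle_us = len(frames) * slot_us
    return frames[(t_us % cycle_us) // slot_us]


def worst_case_latency_us(cycle_us, slot_us):
    """Data that just missed its slot waits one full cycle, then is sent."""
    return cycle_us + slot_us


def missed_deadlines(frames, slot_us, deadlines_us):
    """Names of frames whose deadline is shorter than the worst case."""
    _, cycle_us = build_schedule(frames, slot_us)
    bound = worst_case_latency_us(cycle_us, slot_us)
    return sorted(n for n, d in deadlines_us.items() if d < bound)


# Hypothetical frame set: four frames in 500 microsecond slots.
FRAMES = ["steering_angle", "wheel_speed", "brake_cmd", "yaw_rate"]
```

With these four frames the cycle is 2000 microseconds and the worst-case latency is 2500 microseconds, so a deadline of 2000 microseconds on `brake_cmd` would be reported as violated before the system is ever run. An event-triggered bus offers no comparably simple offline check, since the latency of a message depends on what else happens to be queued.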
7.2. Controller Area Network (CAN)
CAN (Controller Area Network) is the most widely used in-vehicle
network. It was designed by Bosch in the mid-1980s for multiplexing
communication between ECUs in vehicles and thus for reducing the
overall wire harness: the length of wires and the number of dedicated
wires. CAN on a twisted pair of copper wires became an ISO standard in
1994 and is now a de facto standard in Europe for data transmission in
automotive applications, due to its low cost, its robustness and its
bounded communication delays. CAN has several mechanisms for error
detection [17]. For instance, it is checked that the CRC transmitted in
the frame is identical to the CRC computed at the receiver end, that
the structure of the frame is valid and that no bit-stuffing error
occurred. Each station that detects an error sends an "error flag", a
particular type of frame composed of 6 consecutive dominant bits, which
makes all the stations on the bus aware of the transmission error. CAN
possesses some fault-confinement mechanisms aimed at identifying
permanent failures due to hardware malfunction at the level of the
microcontroller, communication controller or physical layer. The scheme
is based on error counters that are increased and decreased according
to particular events. The main drawback is that a node has to diagnose
itself, which can lead to the non-detection of some critical errors.
Without additional fault-tolerance facilities, CAN is not suited for
safety-critical applications such as future X-by-Wire systems [17]. For
instance, a single node can perturb the functioning of the whole
network by sending messages outside their specification (i.e., length
and period of the frames). A framework to provide selective fault
tolerance for messages with various fault-tolerance requirements
scheduled on CAN is proposed in [19]. The set of messages is analyzed
off-line, and scheduling attributes are provided that ensure feasible
transmission of messages as well as retransmissions upon error
occurrences that satisfy the fault-tolerance requirements.
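The counter-based fault confinement described above can be sketched in a few lines. The sketch is deliberately simplified: it uses the basic CAN rules (the transmit error counter grows by 8 on a transmit error, the receive error counter by 1 on a receive error, both shrink by 1 on success, with state changes above 127 and above 255) and leaves out the special cases of the full protocol. The class and method names are hypothetical.

```python
class CanErrorCounters:
    """Simplified model of a CAN node's fault-confinement counters."""

    def __init__(self):
        self.tec = 0  # transmit error counter
        self.rec = 0  # receive error counter

    def on_transmit_error(self):
        self.tec += 8

    def on_transmit_success(self):
        self.tec = max(0, self.tec - 1)

    def on_receive_error(self):
        self.rec += 1

    def on_receive_success(self):
        self.rec = max(0, self.rec - 1)

    @property
    def state(self):
        """Node state derived from the counters alone."""
        if self.tec > 255:
            return "bus-off"        # node no longer takes part in traffic
        if self.tec > 127 or self.rec > 127:
            return "error-passive"  # node may no longer send active flags
        return "error-active"
```

Since errors weigh more than successes, a node with a permanent transmit fault becomes error-passive after 16 consecutive failed transmissions and goes bus-off after 32, whereas occasional transient errors are worked off by successful traffic. The sketch also makes the stated drawback visible: the counters live inside the node being judged, so a node whose controller itself misbehaves cannot be relied upon to remove itself from the bus.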
7.3. Time-Triggered CAN (TTCAN)
TTCAN uses the CAN standard but, in addition, requires that the
controllers be able to disable automatic retransmission of frames upon
transmission errors and to provide the upper layers with the point in
time at which the first bit of a frame was sent or received. The key
idea is to propose a flexible time-triggered/event-triggered protocol.
TTCAN defines a basic cycle as the concatenation of one or several
time-triggered (exclusive) windows and one event-triggered
(arbitrating) window. Although TTCAN is built on a well-mastered and
low-cost technology, CAN, it does not provide important dependability
services such as a bus guardian, membership service and reliable
acknowledgment [17]. It does not provide the same level of fault
tolerance as TTP and FlexRay, which are the other two candidates for
X-by-wire [16]. Strong points of TTCAN are its support of coexisting
event- and time-triggered traffic and the fact that it is standardized
by ISO. It is also built on top of standard CAN, which allows for an
easy transition from CAN to TTCAN.
7.4. FlexRay
The FlexRay network is very flexible with regard to topology and
transmission support redundancy. It can be configured as a bus, a star
or a multistar. It is not mandatory for each station to possess
replicated channels or a bus guardian, even though this should be the
case for critical functions such as Steer-by-Wire [15]. FlexRay also
provides fault tolerance through distributed time-triggered clock
synchronization and error containment on the physical layer through an
independent bus guardian. FlexRay allows both time-triggered and
event-triggered communication by means of a communication cycle, in
which a time-triggered (static) window and an event-triggered (dynamic)
window are concatenated. The time-triggered window uses TDMA like TTP,
but unlike TTP, a given node may be able to access the bus multiple
times before all remaining nodes access it. The event-triggered window
uses a technique called Flexible TDMA (FTDMA) to provide
event-triggered behavior without collisions. According to the FlexRay
specification [21], a frame contains a 24-bit CRC checksum to ensure
the integrity of the frame transmission. The probability of undetected
network errors is less than 6 x 10^-8. Adequately addressing fault
tolerance was one of the key aspects that needed to be considered
during the design of FlexRay. To allow a single communications system
to support the diverse needs of automotive applications across
different application domains, the consortium decided to introduce a
concept of scalable fault tolerance. Scalable fault tolerance aims at
allowing FlexRay to be used economically in distributed
non-fault-tolerant systems as well as in distributed fault-tolerant
systems.
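How FTDMA avoids collisions in the dynamic window can be illustrated with a simplified model. It assumes that the window is divided into minislots, that every node keeps the same slot counter, that a frame may only start when the counter equals its frame identifier, and that an unused identifier costs exactly one minislot. The function name, the data layout and the handling of frames that no longer fit are illustrative simplifications and are not taken from the specification [21].

```python
def run_dynamic_segment(pending, num_minislots):
    """Simulate one dynamic window.

    pending: dict mapping frame identifier -> frame length in minislots.
    Returns (sent, deferred), where sent is a list of
    (frame_id, start_minislot) pairs and deferred lists the identifiers
    that must wait for the next communication cycle.
    """
    sent = []
    minislot = 0
    slot_counter = 1
    max_id = max(pending, default=0)
    while minislot < num_minislots and slot_counter <= max_id:
        length = pending.get(slot_counter)
        if length is not None and minislot + length <= num_minislots:
            # Only the owner of this identifier may transmit now.
            sent.append((slot_counter, minislot))
            minislot += length
        else:
            # Nothing to send, or the frame no longer fits: idle minislot.
            minislot += 1
        slot_counter += 1
    sent_ids = {frame_id for frame_id, _ in sent}
    deferred = sorted(fid for fid in pending if fid not in sent_ids)
    return sent, deferred
```

With ten minislots and pending frames `{1: 4, 3: 5, 5: 3}`, frames 1 and 3 are transmitted and frame 5 is deferred. Lower identifiers are therefore served first, which gives the dynamic window a priority order without any arbitration on the wire.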
In addition, FlexRay can be deployed using optional local or remote
channel guardians that protect the communications channels from
transmission faults that violate the TDMA scheme. The clock
synchronization algorithm supports fault-tolerant as well as
non-fault-tolerant synchronization. For fault-tolerant synchronization,
the synchronization algorithm considers the transient/permanent fault
class as well as the symmetric/asymmetric fault class [22]. In this
protocol, the synchronization of the global time happens at the
macrotick level, with the use of a cluster-wide clock synchronization
algorithm. This clock synchronization algorithm continues to operate
even in the event of an ECU failure in the system, unlike a
master-slave synchronization algorithm. Table 1 summarizes the key
differences between the automotive protocols discussed so far.
Table 1: Summary of Automotive Protocols

USAGE             CAN     TTCAN   FlexRay
Chassis           YES     YES     NO
Airbags           YES     NO      NO
Powertrain        YES     YES     SOME
X-by-wire         SOME    YES     YES
Multimedia        NO      NO      NO
Telematics        NO      NO      NO
Diagnostics       YES     SOME    SOME

REQUIREMENTS      CAN     TTCAN   FlexRay
Fault tolerance   SOME    SOME    YES
Determinism       YES     YES     YES
Bandwidth         SOME    SOME    YES
Flexibility       YES     YES     YES
Security          NO      NO      NO

7.5. Recent Work
A simulation study of fault-tolerant sensor networks for on-board car
control is presented in [18]. The on-board communication and control
networks are built using Gigabit Ethernet. Sensors are smart and are
the sources of traffic; actuators are smart and are the sinks of
traffic. The controller is a personal computer. The sources of
real-time traffic (sensors) are tripled while the number of sink nodes
(actuators) is not increased. This increase in the number of sensors is
made to test the possibility of building triple modular redundancy
(TMR) at the sensor level for fault tolerance. The disadvantage of TMR
is obviously cost, since three sensors are required to produce an
output that could be generated by just one sensor. On the other hand,
the advantage of TMR is an increase in reliability. The outputs have to
go through a voter. Voting can also be done in software, which is the
approach used in this study. The controller, after reading the outputs
of the three sensors, executes a routine that compares these three
outputs. A major problem here is that the three outputs may not
completely agree because the sensors, while identical, may not produce
exactly the same output. The first solution to this problem is the
mid-value select technique. The second technique is to ignore the least
significant bits of the data. The number of significant bits that have
to agree depends on the application. With TMR, the data from the three
copies of the sensor is compared and voted upon. As long as there is
only one failed sensor, the controller will know which of the three
packets to discard and the system will remain operational. When a
second sensor fails, the entire system will fail. A methodology for
interconnecting automotive bus networks in a fault-tolerant way is
proposed in [20].
8. Conclusion
In this paper, we have provided a survey of fault tolerant design
techniques and methodologies used in the automotive industry. X-by-wire
systems, which are discussed at length, are expected to be integrated
into most automobiles in the future. Automobile manufacturers such as
Toyota, Nissan and BMW have already introduced Brake-by-wire technology
in some of their recent models. As the amount of software that goes
into a modern car increases steadily with the introduction of
navigation systems and instrument clusters, software fault tolerance
techniques will become a mandatory requirement in the automotive
industry. FlexRay is expected to be the network of choice in future
X-by-wire designs due to its high bandwidth. We feel a key challenge in
the area of automotive software will be handling the huge amount of
data that is to be processed (such as map data in a navigation system)
while still providing fault tolerance without hindering the user
experience. As modern cars slowly transition to functioning as
information hubs by providing connectivity to laptops, PDAs and cell
phones, another challenge will be to introduce fault-tolerant services
in protocols such as Bluetooth, ZigBee and MOST. This represents an
opportunity for research for engineers from all backgrounds.
References
[1] C. Lu, Jean-Charles Fabre, Marc-Olivier Killijian, An approach for
     improving Fault-Tolerance in Automotive Modular Embedded Software,
     Proc. of the 17th International Conference on Real-Time and
     Network Systems, 2009.
[2] Xi Chen, Requirements and concepts for future automotive electronic
     architectures from the view of integrated safety, PhD Thesis,
     Universitätsverlag Karlsruhe, 2008.
[3] H. Gustavsson, J. Sterner, An industrial case study of Design
     Methodology and Decision Making for Automotive Electronics, Proc.
     of the ASME International Design Engineering Technical Conferences
     & Computers and Information in Engineering Conference, 2008.
[4] M. Broy, I.H. Kruger, A. Pretschner, C. Salzmann, Engineering
     Automotive Software, Proc. of IEEE, Vol. 95, Issue 2, pp. 356-373,
     2007.
[5] M. Broy, Automotive Software and Systems Engineering, Proc. of the
     2nd ACM/IEEE Conference on Formal Methods and Models for
     Co-Design, 2005.
[6] J. Park, S. Kim, W. Yoo, S. Hong, Designing Real-Time and
     Fault-Tolerant Middleware for Automotive Software, Proc. of
     International Joint Conference SICE-ICASE, pp. 4409-4413, 2006.
[7] D. Palsetia, S. Pieper, Fault Tolerance in Automotive X-by-Wire,
     Project Report, Department of Electrical and Computer Engineering,
     UW-Madison, 2005.
[8] E.G. Leaphart, B.J. Czerny, J.G. D'Ambrosio, B.T. Murray, C.L.
     Denlinger, D. Littlejohn, Survey of Software Failsafe Techniques
     for Safety-Critical Automotive Applications, SAE 2005-01-0779, SAE
     World Congress, 2005.
[9] A. Pretschner, M. Broy, I.H. Kruger, T. Stauner, Software
     Engineering for Automotive Systems: A Roadmap, Proc. of Future of
     Software Engineering, pp. 55-71, 2007.
[10] E. Digler, R. Karrelmeyer, B. Straube, Fault Tolerant
     Mechatronics, Proc. of 10th International On-Line Testing
     Symposium, 2004.
[11] R. Isermann, R. Schwarz, S. Stolzl, Fault-Tolerant Drive-by-wire
     Systems, IEEE Control Systems Magazine, Vol. 22, Issue 5, pp.
     64-81, 2002.
[12] A. Manzone, A. Pincetti, D. De Costantini, Fault Tolerant
     Automotive Systems: An Overview, Proc. of the 7th International
     On-Line Testing Workshop, pp. 117-121, 2001.
[13] D. Capriglione, C. Liguori, C. Pianese, A. Pietrosanto, On-Line
     Sensor Fault Detection, Isolation, and Accommodation in Automotive
     Engines, IEEE Transactions on Instrumentation and Measurement,
     Vol. 52, Issue 4, pp. 182-189, 2003.
[14] P. Hsu, K. Lin, L. Shen, Diagnosis of Multiple Sensor and Actuator
     Failures in Automotive Engines, IEEE Transactions on Vehicular
     Technology, Vol. 44, Issue 4, pp. 779-789, 1995.
[15] N. Navet, Y. Song, F. Simonot-Lion, C. Wilwert, Trends in
     automotive communication systems, Proc. of IEEE, Vol. 93, Issue 6,
     pp. 1204-1223, 2005.
[16] T. Nolte, H. Hansson, L.L. Bello, Automotive Communications -
     Past, Current and Future, Proc. of 10th IEEE Conference on
     Emerging Technologies and Factory Automation, Vol. 1, pp. 985-992,
     2005.
[17] N. Navet, F. Simonot-Lion, A Review of Embedded Automotive
     Protocols, Technical Report, Nancy Université, 2008.
[18] R.M. Daoud, H.H. Amer, H.M. Elsayed, Y. Sallez, Fault-Tolerant
     Ethernet-Based Vehicle On-Board Networks, Proc. of 32nd Annual
     Conference on Industrial Electronics, pp. 4662-4665, 2006.
[19] H. Aysan, A. Thekkilakattil, R. Dobrin, S. Punnekkat, Fault
     Tolerant Scheduling on Controller Area Network (CAN), Proc. of
     Emerging Technologies and Factory Automation Conference, pp. 1-8,
     2010.
[20] H. Kimm, Ho-Sang Ham, Integrated Fault Tolerant System for
     Automotive Bus Networks, Proc. of 2nd International Conference on
     Computer Engineering and Applications, pp. 486-490, 2010.
[21] FlexRay Consortium. (2004, June) FlexRay Communication System,
     Protocol Specification, Version 2.0. [Online]. Available:
     http://www.flexray.com
[22] C. Temple, Networking the FlexRay Way - An overview of the FlexRay
     Communications System, Technical Report, Freescale Semiconductor.
[23] R. Garbenfeldt, X-by-wire: Driving Your Car and the Semiconductor
     Industry, Technical Report, 2005.
[24] F. Seidel, X-by-wire, Technical Report, Chemnitz University of
     Technology, 2009.
[25] B. Selic, Fault tolerance techniques for Distributed Systems,
     [Online]. Available:
     http://www.ibm.com/developerworks/rational/library/114.html#N101B5
[26] C. Wilwert, N. Navet, Y. Song, F. Simonot-Lion, Design of
     automotive X-by-wire systems, Technical Report.
[27] L. He, Z. Yu, C. Zong, H. Zhao, The Dual-core Fault-tolerant
     control for Electronic Control Unit of Steer-By-wire System, Proc.
     of International Conference on Computer, Mechatronics, Control and
     Electronic Engineering, pp. 436-439, 2010.
[28] E. Touloupis, J.A. Flint, V.A. Chouliaras, A Fault-Tolerant
     Architecture For Automotive Applications, Technical Report,
     Loughborough University.
[29] L. Lamport, R. Shostak, M. Pease, The Byzantine Generals Problem,
     ACM Transactions on Programming Languages and Systems, Vol. 4,
     No. 3, pp. 382-401, 1982.
[30] IEC61508-1, Functional Safety of Electrical/Electronic/
     Programmable Electronic Safety-related Systems - Part 1: General
     requirements, IEC/SC65A, 1998.
[31] D. Jhalani, S. Dhir, Survey of Fault Tolerant Techniques in
     Automotives, University of Wisconsin Madison.
[32] H. Curtis, R. France, Time Triggered Protocol (TTP/C): A
     Safety-Critical System Protocol, Literature Survey, University of
     Texas-Austin, 1999.