
Mälardalen University, School of Innovation, Design and Engineering

Västerås, Sweden

Thesis for the Degree of Master of Science in Intelligent Embedded System - 30.0 credits

DECISION-MAKING FOR AUTONOMOUS CONSTRUCTION VEHICLES

Marielle Gallardo, [email protected]

Sweta Chakraborty, [email protected]

Examiner: Daniel Sundmark, Mälardalen University, Västerås, Sweden

Supervisors: Saad Mubeen, Ning Xiong, Mälardalen University, Västerås, Sweden

Company supervisor: Torbjörn Martinsson, Volvo Construction Equipment, Eskilstuna, Sweden

June 18, 2019


Abstract

Autonomous driving requires tactical decision-making while navigating in a dynamic shared-space environment. The complexity and uncertainty in this process arise from the unknown and tightly coupled interactions among traffic users. This thesis work formulates such a navigation problem as a Markov decision process (MDP), supported by models of the traffic participants and the user space. Instead of solving a traditional MDP, this work formulates multi-policy decision making (MPDM) [1] in a shared-space scenario with pedestrians and vehicles. The employed model enables unified and robust self-driving of the ego vehicle by selecting a desired policy along the pre-planned path. Obstacle avoidance is coupled within the navigation module, performing a detour off the planned path, obtaining a reward on task completion and penalizing collisions with others. In addition, the thesis work is further extended by analyzing the real-time constraints of the proposed model. The performance of the implemented framework is evaluated in a simulation environment on a typical construction (quarry) scenario. The effectiveness and efficiency of the selected policy verify the desired behavior of the autonomous vehicle.

Keywords: shared-space users, MPDM, timing analysis, planning and decision-making, autonomous vehicle, MDP, reinforcement learning, social force model


Acknowledgements

The authors would like to express sincere gratitude to the academic supervisor of this thesis, Ning Xiong, co-supervisor Saad Mubeen and thesis examiner Daniel Sundmark for their continuous support, motivation and immense knowledge. The valuable guidance and feedback improved the quality of the thesis work by steering it in the right direction.

Besides our advisors, we would also like to express great gratitude towards our industrial supervisor, Torbjörn Martinsson, for providing such an interesting thesis opportunity and continuous assistance throughout the entire process.

This thesis work would not have been achievable without the co-operation and skills of the aforementioned persons. The authors would like to express their deep appreciation and will be forever grateful.


Abbreviations

ALF     ARTIST2 language for WCET flow analysis
CFG     Control flow graph
CNN     Convolutional neural network
DARPA   Defence advanced research projects agency
E2EDA   End-to-end delay analysis
GPU     Graphics processing unit
HRTA    Holistic response time analysis
IDE     Integrated development environment
ILP     Integer linear programming
LIDAR   Light detection and ranging
MBD     Model based development
MCDM    Multi criteria decision making
MDP     Markov decision process
MPDM    Multi-policy decision making
POMDP   Partially observable Markov decision process
RADAR   Radio detection and ranging
RCM     Rubus component model
RTA     Response time analysis
SFM     Social force model
SWC     Software circuit or software component
SWEET   Swedish execution time tool
TP      Timed path
WCET    Worst-case execution time
WCRT    Worst-case response time


Table of Contents

1. Introduction
   1.1. Industrial use case scenario
   1.2. Motivation and problem formulation
   1.3. Initial assumptions
   1.4. Thesis outline
2. Background
   2.1. Rational agents and environments
   2.2. Modeling the environment
        2.2.1 Markov decision process
        2.2.2 Partially observable Markov decision process
   2.3. Modeling the agent
        2.3.1 Creating a belief state
        2.3.2 The decision process
   2.4. Timing analysis of real-time systems
        2.4.1 Real-time characteristics
        2.4.2 Task chain
        2.4.3 End-to-end timing analysis
        2.4.4 The Rubus tool
   2.5. Control components
        2.5.1 Systems required for autonomous navigation
3. Related work
   3.1. Decision-making algorithm
        3.1.1 Trajectory optimization
        3.1.2 Rule-based approach
        3.1.3 Machine learning approach
        3.1.4 Probabilistic approach
   3.2. Worst-case execution time analysis
        3.2.1 Static WCET analysis
        3.2.2 Measurement-based WCET analysis
        3.2.3 Hybrid WCET analysis
   3.3. End-to-end timing analysis
   3.4. Discussion
4. Research method
   4.1. System development research method
   4.2. Application of the research method
   4.3. Discussion
5. Proposed solution
   5.1. Specification of the decision-making system
   5.2. Software architecture
   5.3. Specification of timing properties and requirements
   5.4. The behavior of software components
   5.5. WCET estimation
   5.6. End-to-end timing analysis
6. Implementation
   6.1. Autonomous driving in shared space formulated as a MDP
        6.1.1 States
        6.1.2 Transition function
        6.1.3 Action space and policies
        6.1.4 Reward model
   6.2. Social force model for crowd dynamics
        6.2.1 Pedestrian forces
        6.2.2 Vehicle forces
        6.2.3 SFM based policies
   6.3. Solving MDP with multipolicy decision-making
   6.4. Simulation environment
   6.5. WCET test scenario
7. Results
   7.1. How multiple forces influence the motion of traffic participants
        7.1.1 Motion actions of traffic participants towards goal
        7.1.2 Avoidance of obstacles
        7.1.3 Interaction force among traffic users
        7.1.4 Selection of an optimal policy in a typical construction terrain
        7.1.5 WCET estimation
8. Discussion
   8.1. Analysis of the results
        8.1.1 Motion actions of traffic participants towards goal
        8.1.2 Obstacle avoidance
        8.1.3 Interaction force among traffic users
        8.1.4 Selection of an optimal policy in a typical construction terrain
        8.1.5 WCET estimation and end-to-end timing analysis
   8.2. Simulation environment and parameters
9. Conclusion and future work
   9.1. Formulating a tactical decision making system as a MDP framework
   9.2. Solving a MDP prototype by selecting an optimal policy for the system
   9.3. Effect of system load on real-time characteristics
   9.4. Future work
References


List of Figures

1  Illustrates the difference between the urban traffic scenario (A) and the construction site environment (B)
2  Control System for Autonomous Vehicle
3  A visualization of the agent interacting with the environment
4  Vehicle B must predict the maneuver of the vehicle A before proceeding
5  Grid-world example where the agent aims to find the diamond while avoiding the rocks
6  Example of how the policy favors complete safety over being fast
7  Presenting the state transitioning in an MDP
8  Task Model
9  Single-Rate System
10 Multi-Rate System
11 Data Propagation Delays: Age Delay
12 Data Propagation Delays: Reaction Delay
13 Periodic Task Chain
14 Modelling of software architecture in RCM
15 A flow diagram of the components required for autonomous navigation
16 A Flow Diagram of Multi-methodological Research Approach
17 Process of System Development Approach
18 A Flow Diagram of the Approach Applied For System Design
19 Overview of Software Architecture
20 The different Move Crowd SWCs
21 Overview of the forces exerted on pedestrian i
22 Example code to demonstrate path clustering
23 Software Architecture Modelled in Rubus ICE
24 Demonstrates the angle ϕyδ
25 Effective field of view for human drivers
26 A presentation of the simulated environment where the black dots represent pedestrians and the blue circles represent vehicles
27 Shows the behavior of the agent while navigating towards the goal without disturbance
28 The autonomous agent adapts its velocity and direction towards the left as one of the vehicles is located in its path
29 Shows how the autonomous vehicle avoids static obstacles detected on its path
30 Shows how agents interact with each other; the autonomous vehicle stops in captures 3 and 4 when pedestrians move towards it
31 The vehicle in front is estimated to navigate to the right and as such the agent continues on its path
32 The agent stops when vehicles are heading towards it
33 The vehicle in front is heading in the same direction; as a result the agent chooses to follow the leading vehicle
34 The agent, though they are parallel and not in a straight line, still considers to observe the agent and follow it as they have the same direction and orientation

List of Tables

1  Analysis Statics
2  Analysis Result
3  WCET Measurement for 10 pedestrians, 10 obstacles and 20 vehicles


1. Introduction

In recent years, there has been an increased interest in the research and development of autonomous vehicles. The potential benefits of self-driving vehicles on public roads are well recognized within the automotive industry [2]. Every year approximately 1.35 million people are killed in car accidents, and the main cause is directly linked to human error [3]. The increased deployment of autonomous systems in vehicles could provide long-term benefits in terms of greater traffic safety, increased efficiency and lower fuel consumption [4]. Today, automation technology is already being incorporated in modern cars, such as dynamic steering response, adaptive cruise control, and automatic braking systems.

The drive for automation does not only apply to the automotive sector. The effective use of automation in the construction industry is viewed as one of its greatest opportunities [5]. Accidents with heavy equipment vehicles are common [6], and automating machinery strives to provide a safer work site. Employing fully autonomous heavy equipment vehicles could also potentially increase productivity and efficiency [7]. Such machines are capable of dealing with work tasks without halt and provide quality work in a fast-paced manner, thus effectively speeding up the process of finishing the overall work operation.

Despite the benefits of automation, there are several problems to overcome in the process of automating these vehicles [8]. Constructing reliable and safe autonomous systems requires modeling and handling all the complex scenarios that the vehicle might encounter in the environment. Automation control systems rely on robust predictions and sensor measurements to reliably adapt to different scenarios. In such situations, the world model must account for any uncertainty in sensor measurements, and consider the behavior of different traffic participants to make appropriate risk assessments for future planning and motion [9].

In automation, planning and decision-making methods provide a framework for modeling complex scenarios and automatically determining an appropriate maneuver for the situation. Considering the complexity of the real world and the limited precision of sensors, a probabilistic decision model is often employed [10] [11]. With a probabilistic model, the vehicle makes robust decisions by predicting and assessing future consequences. While proper theoretical frameworks for planning under uncertainty exist, namely the Markov decision process (MDP) and the Partially observable Markov decision process (POMDP), tractable solutions require choosing a suitable architecture for handling the different models of the environment [12].

In the architecture of an autonomous vehicle, motion planning is a fundamental part that affects the performance of the autonomous vehicle [13]. Motion planning concerns the high-level goal of transporting the vehicle from point A to point B in a collision-free manner. The task of moving the vehicle from one point to another alone involves controlling several maneuvers, e.g. providing an appropriate vehicle speed and wheel steering angle. Along with this, in order for the autonomous system to simultaneously provide the ability to avoid any obstacle that could result in an accident or collision, the system has to re-plan the original maneuver tactics. For re-planning, it is extremely important that a feasible solution is calculated in a timely manner [14]. Preventing planning latency is therefore of utmost importance. Managing all subsystems simultaneously becomes a highly complex problem. A hierarchical software architecture, allowing for different planning methods in different scenarios, is therefore considered a key solution for solving large planning problems [15].

This thesis work aims to investigate how a decision-making module can be designed to navigate in a dynamic, unstructured construction environment. The environment constrains the system to estimate the behavior of traffic participants in the space to account for uncertain encounters. To ensure the decision-making system is capable of performing safe online trajectory planning, a set of well-defined maneuvering tactics needs to be established. Furthermore, the evaluation of decisions must be made within a strict time constraint to meet productivity goals and ensure safety.

1.1. Industrial use case scenario

The complexity of the environment among different driving scenarios differs greatly. Urban or city driving is viewed as one of the most difficult environments to model since it involves a variety of complex driving situations. There are a lot of different traffic participants, traffic rules and


road types to consider. A lot of research regarding autonomous driving has therefore mainly focused on driving in highway scenarios, since these are considered most tractable [14]. In these cases, the decision system is often modeled as taking tactical high-level decisions, such as choosing between lane-changing or lane-staying while avoiding collisions. These decisions are then sent to the trajectory generator that provides low-level lateral and longitudinal control.

The environment at the construction site differs from urban and highway driving scenarios, as depicted in Figure 1. Unlike on public roads, there are no lanes and space is less limited. Furthermore, there are no predefined traffic rules. Humans operating the vehicles are instructed to perform specific tasks, e.g. going towards a pile and picking up gravel. At the same time, the work of all vehicles at the site is administratively controlled by humans, remotely or at the site, to guarantee an efficient workflow.


Figure 1: Illustrates the difference between the urban traffic scenario (A) and the construction site environment (B)

The main tasks and goals likewise differ for construction vehicles as opposed to regular cars. For self-driving cars, the main objective is to travel from point A to B in a safe manner. Construction vehicles are not only concerned with providing a safe trajectory. Other tasks, such as digging and loading material efficiently, are equally important.

The main purpose of this work is to model a generic decision system that can handle different scenarios while considering high-level goals. To provide a proof of concept, the scenario is required to be explicitly defined. In this work, the system will consider the autonomous vehicle to be a wheel loader traveling towards a pile. Any other vehicles in its surroundings are considered to be human operated. The high-level goal is to reach the pile efficiently and safely. Along the way, the vehicle may encounter different static or dynamic objects.

In motion planning, objects in the vehicle's path are treated as obstacles and thus avoided [16]. In real situations, such as at the construction site, different objects should be treated differently depending on the circumstances and established rules. For instance, if some particular vehicle is crossing the path of the autonomous wheel loader, it might be required to always give way to that vehicle according to company rules. Or, in the presence of humans, it might be ordered to perform an emergency stop, signal and wait until the human is completely out of sight before continuing. Considering that the wheel loader is the only autonomous vehicle at the site, it should furthermore not behave abruptly or scare others.

The purpose of the hierarchical decision system is to provide rules, similar to traffic rules, that obey company requirements. The system takes behavior decisions on how the different objects will be handled, while still considering the main goals. The decision system then sends commands to the motion planner and controllers (e.g. adaptive cruise control) to adapt lateral and longitudinal velocity.

This thesis work is a collaboration with Volvo Construction Equipment with the purpose of creating a decision-making system for autonomous construction vehicles. Productivity and safety


are important aspects, and as such the main objectives of the decision engine are to make safe and robust decisions while considering timing requirements. In this work, the aimed level of automation is level 5 according to SAE International [17], which means fully automated under all conditions.

1.2. Motivation and problem formulation

Modeling a hierarchical decision engine reduces the probability of acquiring planning latency and execution errors that occur due to complex decision making [14]. However, there is no standard hierarchical solution, and finding an appropriate hierarchical structure is a challenge [15]. At the same time, for the decision-making algorithm to provide the ability to perform real-time decisions, an algorithm that can evaluate decisions within a strict time constraint is required.

A hierarchical control model for driver behavior further divides the decision-making in autonomous vehicles into three categories [18], since decision-making is required for different areas and applications. The model is based on how humans select tactics to achieve a goal and comprises a strategic level developed for long-term goals, a tactical level selecting high-level maneuvers to react to the current situation, and a control level handling low-level decisions, such as the appropriate steering angle for adapting to the high-level maneuvers.

The proposed system operates at the tactical level. A key challenge in autonomous cars is how to model decision systems that operate at this level, since it is difficult to determine the appropriate behavior when there are many external and uncertain factors that affect the proper maneuver. Thus, in order to provide a framework for planning under uncertainty, the problem is modeled as an MDP or POMDP.

As can be observed, the complexity of designing a tactical decision system is large. The environment, possible scenarios, initial assumptions, and rules are required to be well established and limited to provide a solution. With these guidelines, the work in this thesis aims to answer the following questions whilst maintaining the focus on the construction vehicle domain:

• Research question 1: How can a tactical decision engine for construction vehicles be formulated as a partially observable- or Markov decision process?

• Research question 2: How can real-time characteristics be taken into account while modeling a tactical decision engine?

• Research question 3: How can we solve the POMDP/MDP formulation and obtain an optimal policy¹ considering real-time constraints?

1.3. Initial assumptions

Limiting the scope of this thesis work is very important considering the complexity of the problem. Therefore, several assumptions have to be made beforehand. Discussing the limitations requires understanding the structure of a typical control system for autonomous vehicles [19]. Figure 2 illustrates such a system, which can be further divided into separate modules.

The perception subsystem corresponds to the extraction of available information from the vehicle's prevailing surrounding environment. The data is obtained from both onboard and external sensors. The abstract data (e.g. vehicle position, static or dynamic objects in the surroundings) gathered from these sensors is then processed and subsequently transferred in an adequate form to the following rules interpreter and behavior module. This work assumes the collection of all raw data from the perception layer to be performed exclusively through onboard sensors. Additionally, it is also presumed that the construction site may be populated by both static and dynamic objects. For example, at the construction site, static obstacles can be piles of material, barrels or other parked vehicles, whereas dynamic obstacles are working vehicles or humans.

Based on the information provided by the perception subsystem, the traffic rules interpreter module checks the current rule and points at the most suitable reactions. Unlike urban or highway traffic conditions, at a construction site there are no predefined traffic rules. The authors of this thesis have defined possible rules considering a typical construction environment at the behavior controller module.

¹An optimal policy lets the agent know the most favorable action to take in each state, such that it maximizes the total reward from the environment with respect to a given reward criterion.



Figure 2: Control System for Autonomous Vehicle

Furthermore, for the sake of simplicity, an even terrain is considered prevailing at the assumed construction site.

The behavior controller or decision-making subsystem comprises a set of closed-loop control algorithms and a certain number of user-defined rules. Each algorithm is able to maneuver the vehicle under a given environmental situation (e.g. following or avoiding other vehicles, or braking). The role of the behavior controller module is to decide upon which specific algorithm the system should perform in any distinct scenario. Finally, the output decision is forwarded to the behavior interpreter and low-level controller module, which then plans and executes the designated control action, such as planning the best route and modulating the vehicle's rotational or translational velocity, steering angle and other actuators. This thesis mainly focuses on the design of a decision-making module, assuming other models are already provided as given inputs to the system.

1.4. Thesis outline

The rest of the thesis work is organized as follows: Section 2. explores relevant background for understanding the topic of the thesis. In the background, two different approaches are discussed for modeling the decision-making environment. First, the Markov decision process (MDP) is discussed and then the Partially observable Markov decision process (POMDP) is presented. Section 3. presents and discusses various related work. Continuing this, a scientific approach for research methodology is introduced together with its application; this is described in Section 4. Section 5. discusses how to formulate the decision-making system as an MDP. Later, the software architecture and timing properties of the system are presented. Lastly, how to select a suitable timing analysis technique for the designed system is addressed. In Section 6. various implementation concepts and formulas are described in detail. Section 7. evaluates the performance of the implemented system with different test-case scenarios, and Section 8. further discusses the results along with the limitations of the system. Finally, the thesis is concluded in Section 9. following a short proposal on the further development of the designed system.


2. Background

This chapter addresses the theoretical foundations for understanding the concept of real-time decision-making in intelligent agents. Firstly, the concepts of rational agents and environments are discussed. An overview of the theory and algorithms behind Partially observable- and Markov decision processes is then presented. Following this, a general overview of real-time schedulability analysis is briefly discussed. The chapter ends with presenting the components and algorithms required for the vehicle system to sense, react and navigate.

2.1. Rational agents and environments

In artificial intelligence, the study of the thought processing, reasoning, and behavior of rational agents is of interest. An agent is any system that can acquire information from the environment, make decisions and take actions. The agent can be anything from a human or robot to a nonphysical entity such as a software program. The fundamental part is that the agent entity has a way of perceiving its environment and acting upon the environment. Consider us humans, who have eyes and ears to orientate ourselves in the world, or a robot system that has different vision sensors and various motors operating as actuators.

An agent is said to be rational if it can take the ideal action, given the information it knows. The study of finding the optimal action is what the field of decision making concerns itself with. To solve this problem, the decision-making system is composed of two distinct components, the agent and its environment, as depicted in Figure 3.


Figure 3: A visualization of the agent interacting with the environment

The environment provides a way to model the agent's world and how the agent interacts with it. Rewarding certain actions in the environment provides a way to make the agent behave in a certain manner. The agent component can further be divided into the interpreter and the decision solver. The interpreter is the part of the agent responsible for modeling its sensors. The interpreter is responsible for observing and creating a belief state of where the agent is believed to be in the environment. The solver is the brain of the agent. It considers the environment, the rewards given for certain actions, and the observations made by the interpreter to decide upon what action to take. The components of the system are further discussed in detail below.

2.2. Modeling the environment

The properties of the environment determine how the environment is modeled. An environment is said to be fully observable if the agent, by its sensors, has a complete awareness of the state of the world at every point in time. The benefit of such a world model is that the agent does not need to create a memory database of past observations to keep track of the environment. Consider the checkers game. The environment is fully observable in the sense that the agent has full state awareness of the game by knowing the game rules and the number of pieces the agent and the


opponent have access to. Contrarily, the environment is partially observable if the agent only has limited awareness of the world state. The limited awareness is either a result of incorrect sensor measurements from the agent or because the information is missing from the environment. For autonomous vehicles navigating in traffic, the scenario deals with partial information. The vehicle does not fully know the intentions of other traffic participants and as such must make predictions of future states.

For autonomous vehicles, the environment is stochastic since the outcome of a specific state is uncertain. This is opposed to when the environment is deterministic, meaning the next state is completely determined by the current state and the action performed by the agent. However, most real-world environments are not deterministic, as can be illustrated with the example of navigation for autonomous vehicles represented in Figure 4. A mathematical framework is required for modeling a stochastic world environment. The following sections discuss the widely employed Markov decision process framework for modeling stochastic decision making where the outcome is partly uncertain.


Figure 4: Vehicle B must predict the maneuver of the vehicle A before proceeding

2.2.1 Markov decision process

A Markov decision process (MDP) is a 5-tuple 〈S, A, T, R, γ〉 where every element is defined as follows:

• S: a finite set of states

• A: a finite set of actions for each state

• T: a transition model T(s, a, s′) = P(s′|s, a) giving the probability to go to another state s′ given the current state s and executing the action a

• R: a reward function R(s, a) providing a scalar value as a reward for being in a particular state s and executing action a

• γ: a discount factor γ ∈ [0, 1) for adjusting the preference for immediate rewards or larger rewards in the future

The MDP framework aims to be a straightforward model for achieving goals that require learning from interacting with the environment. The agent learns that certain actions in a particular state are more favorable than others. The interaction is continuous as the agent at every point in time


has to select an action, change to a different state and react to the new situation. The transition model governs the probability that the agent actually changes to the desired state. For MDPs, the environment is stochastic and fully observable [20]. The stochasticity is due to the agent not being completely certain of what state it will be placed in after taking a particular action, but the environment is fully observable as the agent's observations of the environment are assumed to be correct. Consider the following grid-world example where the goal of the agent is to find the diamond and avoid the rocks:

Figure 5: Grid-world example where the agent aims to find the diamond while avoiding the rocks

• States are the different grid blocks

• Actions are up, down, left, right

• The transition function states that there is a probability of 30% that an incorrect action is taken: if the agent wants to go up (or down), the probability of actually going up (or down) is 70%, with a 15% chance each that the agent goes left or right. If the agent goes left (or right), the probability of actually going left (or right) is 70%, with a 15% chance each that the agent goes up or down. Once the agent is in a particular state it is assumed to fully know its position in the grid-world.

• The reward model gives the agent -10 points when it hits the rocks, +20 points when finding the diamond and -0.2 points for transitioning to any other block. The negative reward for moving into an empty block defines a way to prioritize reaching the diamond in as few steps as possible.
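To make the formulation concrete, the sketch below encodes a grid world of this kind in Python. It is only an illustration: the 3x4 layout and the rock and diamond positions are hypothetical placeholders (they are not fixed by the description above), while the slip probabilities (70%/15%/15%) and the rewards (-10, +20, -0.2) follow the example.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]            # (row, col) grid cell

ROWS, COLS = 3, 4                  # hypothetical grid size
ROCKS = {(1, 1)}                   # hypothetical rock cell (reward -10)
DIAMOND = (0, 3)                   # hypothetical goal cell (reward +20)
GAMMA = 0.95                       # assumed discount factor in [0, 1)

ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
# Slipping happens perpendicular to the intended direction of motion.
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}

def step(s: State, move: str) -> State:
    """Deterministic move; the agent stays in place when it hits the grid border."""
    r, c = s[0] + ACTIONS[move][0], s[1] + ACTIONS[move][1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else s

def transition(s: State, a: str) -> List[Tuple[float, State]]:
    """T(s, a, s'): 70% intended direction, 15% for each perpendicular slip."""
    slip1, slip2 = PERPENDICULAR[a]
    return [(0.70, step(s, a)), (0.15, step(s, slip1)), (0.15, step(s, slip2))]

def reward(s_next: State) -> float:
    """Simplified reward on the resulting cell: -10 rock, +20 diamond, -0.2 otherwise."""
    if s_next in ROCKS:
        return -10.0
    return 20.0 if s_next == DIAMOND else -0.2
```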

The MDP has the Markov property, stating that once in the current state, the necessary information of the history of past states is accessible without the need to explicitly memorize or save past events. In mathematical expressions, this means that state s′ = s_{t+1} has the Markov property only if:

P[s_{t+1} | s_t] = P[s_{t+1} | s_1, s_2, ..., s_t],    (1)

where the state can access relevant information from the history.

As can be observed in the same grid-world example, the agent can take different routes for arriving


at the main goal. The main objective of an MDP is to find a policy π(s) → a that determines the optimal action for every state. The optimal policy π*(s) → a is a policy that maximizes the expected reward R(s, a). Any arbitrary policy π(s) is defined as follows:

π = E_π[ Σ_{t=0}^∞ γ^t R(s_t, π(s_t)) ],    (2)

where the policy maps every state to a particular action, and the expected accumulated return starting from state s, taking action a, and following policy π tells us the greatness of the policy. The expected return is guaranteed to converge over an infinite time horizon since γ ∈ [0, 1).

The optimal policy π∗(s) can be defined as follows:

π* = argmax_π E[ Σ_{t=0}^∞ γ^t R(s_t, π(s_t)) ],    (3)

where the only difference is the maximized accumulated reward for sampling all actions from a particular state defined by the policy.

Consider the grid-world example again. Once the agent is in a particular state it can take different routes for achieving the main goal of getting the diamond, as previously mentioned. The agent could take a safer route away from the rocks or take a more risky but shorter path alongside the rocks, as depicted in Figure 5. Finding the optimal policy is deciding which strategy is most optimal.

The transition and reward functions alter which parameters are of most importance and therefore affect what defines the optimal policy. In the grid-world case, the reward function might favor safety over being fast. Consider the same reward and transition model example presented before. The highlighted state in Figure 6 might seem unreasonable. However, the transition model states that when the agent goes down there is no probability of going up. Thus, the agent can completely avoid the rock. The agent might be stuck in the same state for a while, but because there is a 15% probability of going right the agent will eventually reach the goal. This is fine because the reward model only gives the agent a penalty of -0.2 for transitioning to other states. Hitting the rock is worse as it gives the agent -10 points. The optimal policy has therefore determined that the best way to act in states near the rock is to completely avoid it.

Figure 6: Example of how the policy favors complete safety over being fast

There are different methods for finding the optimal policy, such as dynamic programming and different tree search methods. The solver component in the agent has the role of finding the most optimal policy to execute. A more descriptive role of the solver part of the agent is presented in Section 2.3.2.

2.2.2 Partially observable Markov decision process

Partially observable Markov decision processes, as can be observed by the name, are an extension of MDPs for modeling partly observable stochastic decision making to account for sensor and


environmental uncertainties. Contrary to the case of MDPs, the agent is not capable of observing the full state. Instead, the agent has to make observations to hint about the true state. This is the case for many real systems that rely on uncertain or subjective sensors to model the environment. Suppose the agent fails to perform an action or the environment changes due to outside forces. POMDPs provide a way to handle these situations.

In a POMDP, the set of observations can be probabilistic to model the probability of each observation for each state in the model. This is referred to as the belief state and creates the notion of possible states the agent might be in. Comprising knowledge about the current state in a POMDP would generally require keeping track of the entire history of past states, making the POMDP a non-Markovian process; however, maintaining the belief state (i.e., a probability distribution) over possible states provides the information needed to maintain the history [20]. Thus POMDPs are Markovian processes.

POMDPs are a 7-tuple 〈S, A, T, R, γ, Ω, O〉 where the first five elements are those of a regular MDP and the last two elements are defined as follows:

• Ω: a set of observations

• O: a set of observation probabilities O(s′, a, o) = P(o|s′, a) to specify the probability of receiving the observation o ∈ Ω given that the system ended up in state s′ after taking action a

The belief state B is the key component for the interpreter part of the agent. As new observations are made, the belief state is updated to get more accurate knowledge about the possible state. The interpreter and belief states are discussed below.

2.3. Modeling the agent

This section first discusses the agent component required for interpreting sensor data to model a belief state of the environment. The actual decision process in the agent is then described.

2.3.1 Creating a belief state

The creation of the belief state is important for modeling sensor uncertainty. In the process, the agent starts with a prior belief distribution. As new observations are made, the belief distribution is then updated to create a posterior belief distribution. The process of updating the belief state is referred to as filtering. Filtering allows the agent to refine its belief of possible states.

The Markovian property states that the new belief state over states only requires the notion of the previous belief distribution, the current observation and the action taken. In mathematical terms, this is expressed as follows:

b′ = τ(b, a, o) (4)

where b′ is the posterior belief state and b is the prior belief distribution.

The posterior, updated, belief distribution can be computed as follows:

b′(s′) = α P(o|s′, a) Σ_s P(s′|s, a) b(s),    (5)

where b′(s′) is the probability of state s′ and α is a normalizing constant that makes the belief state sum to 1. The probability of state s′ is dependent on the probability of making observation o ∈ Ω given state s′ and action a, considering the transition probability to be in state s′ and the previous belief state b(s).
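As a concrete illustration of Equation 5, the small sketch below performs one belief update step. It assumes, purely for illustration, that the transition model P(s′|s, a) and the observation model P(o|s′, a) are available as nested dictionaries; it is not the thesis implementation.

```python
from typing import Dict, Hashable

Belief = Dict[Hashable, float]

def update_belief(belief: Belief, action, observation,
                  trans: Dict, obs: Dict) -> Belief:
    """One filtering step: b'(s') = alpha * P(o | s', a) * sum_s P(s' | s, a) * b(s).

    Assumed data layout: trans[s][a][s'] = P(s' | s, a) and
    obs[s'][a][o] = P(o | s', a); missing entries count as probability 0.
    """
    states = list(belief.keys())
    new_belief: Belief = {}
    for s_next in states:
        prior = sum(trans[s][action].get(s_next, 0.0) * belief[s] for s in states)
        new_belief[s_next] = obs[s_next][action].get(observation, 0.0) * prior
    total = sum(new_belief.values())          # 1/alpha, the normalizing constant
    return {s: p / total for s, p in new_belief.items()} if total > 0 else new_belief
```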

For updating the belief distribution, different filtering methods exist, such as Bayesian filtering and Monte Carlo particle filters. Due to the Markovian property of the belief state, a POMDP can be formulated as an MDP where the belief states are the states. The problem is that the belief-state space is continuous, considering the possibility of infinite probability distributions of


the belief state. Finding the optimal policy π*(b), therefore, requires other solutions or adapting the standard methods employed for solving classical discrete MDPs [20].

2.3.2 The decision process

The actual decision procedure happens when the agent, after making observations, decides what action to execute considering its beliefs. The process is narrowed down to finding the optimal policy in MDPs and POMDPs. There are several methods and algorithms for finding an optimal policy. The methods are further categorized as either online or offline approaches.

The offline approaches assume the solutions can be calculated beforehand for every state (e.g. supervised learning used to equip the agent with knowledge beforehand to choose good actions), whereas online methods find the optimal policy directly from the current state only (e.g. Monte Carlo tree search simulating the best action to take in every state). For problems with a large state space, calculating the policy beforehand for every state might be impossible [21]. The often employed value iteration and policy iteration methods are presented below to describe how an optimal policy can be found for MDPs.

Example of how to find an optimal policy

Value iteration and policy iteration are dynamic programming methods. Both these methods assume that the agent knows the MDP model of the environment. This refers to the agent having knowledge about the transition and reward functions. Assuming this knowledge, the agent can plan what action to take beforehand (offline) [21].

In an MDP there is the notion of states and q-states. States are regular states, while q-states are chance nodes. When an agent is in a particular state it decides to take a certain action. It then transitions to a q-state, where the transition model determines the probability of landing in the next state. This process repeats itself. The reward function gives a value when the transition has happened. The actual process is illustrated in Figure 7.


Figure 7: Presenting the state transitioning in an MDP

The utility is the sum of discounted rewards. Maximizing long-term discounted rewards requires finding a policy π that maximizes the expected future utility of each state s, and of each q-state (s, a):

V^π(s) = E[ R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | s_t = s, π ],    (6)

Q^π(s, a) = E[ R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | s_t = s, a_t = a, π ],    (7)


where γ is the discount factor.

Both these functions are referred to as value functions. The value function V^π(s) specifies how good a state s is if following the policy π. In other words, the value function is equal to the expected future utility starting from state s and executing the actions specified by the policy. With this value function, the policy is evaluated depending on the long-term value the agent is expected to get from following the policy. The action-value function Q^π(s, a) specifies how good a particular action is from a particular state while following policy π. It specifies the expected future utility from a q-state following the actions defined by the policy.

The optimal value function V*(s) is the value function that has a higher value than any other value function. It is the expected utility starting in s and acting optimally. The optimal action-value function Q*(s, a) means the highest value for having taken action a from state s and then behaving optimally. Since the optimal value function V*(s) defines acting optimally from state s, and Q*(s, a) defines taking an action a from state s and then acting optimally, V*(s) equals the maximum of Q*(s, a) over all possible actions, defined as follows:

V*(s) = max_a Q*(s, a)    (8)

The definition of the optimal Q-function is the following:

Q*(s, a) = R(s, a) + γ Σ_{s′} P(s′|s, a) V*(s′),    (9)

where R(s, a) is the immediate reward after performing the action a while in state s. Since there is a distribution over outcomes, the sum over all possible next states s′ is averaged over the probability of landing in those states when taking action a, times the discounted expected future utility starting from s′, which is given by V*(s′). These are recursive Bellman equations that can be solved.
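Equations 8 and 9 also suggest a direct way to recover a greedy action once the optimal value function is known. The short sketch below assumes a dictionary-based MDP encoding, where T maps (state, action) to a list of (probability, next state) pairs and R maps (state, action) to a scalar; the encoding is an assumption for illustration, not the thesis code.

```python
def q_value(V, s, a, T, R, gamma=0.95):
    """Equation 9: Q*(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) * V*(s')."""
    return R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in T[(s, a)])

def greedy_action(V, s, actions, T, R, gamma=0.95):
    """Equation 8: the action attaining V*(s) = max_a Q*(s, a)."""
    return max(actions, key=lambda a: q_value(V, s, a, T, R, gamma))
```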

Value iteration initializes the value function V(s) to arbitrary random values. The Q(s, a) and V(s) values are updated iteratively in a bottom-up way to compute the optimal value functions. The idea is to set time-limited values. At every time step, when computing V_{k+1}(s), the value of V_k(s) is known and thus both values are recursively updated. This process is repeated until the values converge. The optimal policy is given once the optimal values of every state are set. The following equation calculates the optimal value of s for a finite horizon:

V_{k+1}(s) = max_a [ R(s, a) + γ Σ_{s′} P(s′|s, a) V_k(s′) ]    (10)
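The update in Equation 10 translates almost directly into code. The sketch below is a generic value iteration loop over a finite MDP, using the same dictionary-based encoding assumed in the previous sketch; the convergence threshold and the toy example at the end are illustrative choices only.

```python
def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-6):
    """Iterate V_{k+1}(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V_k(s') ]."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in T[(s, a)])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:                       # stop once values have converged
            return V

# Illustrative use on a made-up two-state MDP:
S, A = ["s0", "s1"], ["go", "stay"]
T = {("s0", "go"): [(0.8, "s1"), (0.2, "s0")], ("s0", "stay"): [(1.0, "s0")],
     ("s1", "go"): [(1.0, "s1")], ("s1", "stay"): [(1.0, "s1")]}
R = {("s0", "go"): -0.2, ("s0", "stay"): -0.2, ("s1", "go"): 0.0, ("s1", "stay"): 0.0}
print(value_iteration(S, A, T, R))
```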

The complexity of value iteration is as high as O(S²A) since the equation needs to look at each state, action and next state per iteration. The algorithm is impractical over large state and action spaces. Policy iteration reduces the complexity to O(S²) per iteration by calculating the value function for a fixed policy. The action space is reduced considering that the policy maps every state to one action. The process first evaluates an arbitrary policy π_i in a similar manner as value iteration until convergence:

V^{π_i}_{k+1}(s) = R(s, π_i(s)) + γ Σ_{s′} P(s′|s, π_i(s)) V^{π_i}_k(s′)    (11)

Then a new, better policy is created by comparing the evaluated policy to the actions available. The action with the best value is chosen and the process is repeated for every new state. Convergence is reached once the evaluated policy is equal to the new optimal policy. Instead of returning the value of the best action, the action is returned. In mathematical form this is expressed as follows:

π_{i+1}(s) = argmax_a [ R(s, a) + γ Σ_{s′} P(s′|s, a) V^{π_i}(s′) ]    (12)
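Equations 11 and 12 correspond to the two alternating phases sketched below: evaluating the current policy until its value function converges, then improving it greedily. The MDP encoding (T and R dictionaries) follows the same illustrative convention as the value iteration sketch above and is likewise only an assumption, not the thesis implementation.

```python
def policy_iteration(states, actions, T, R, gamma=0.95, eps=1e-6):
    policy = {s: actions[0] for s in states}      # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation (Equation 11): sweep until V^{pi_i} stops changing.
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                v = R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in T[(s, a)])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                break
        # Policy improvement (Equation 12): pick the greedy action per state.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: R[(s, a)] +
                       gamma * sum(p * V[s2] for p, s2 in T[(s, a)]))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:                                # evaluated policy equals the improved one
            return policy, V
```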


2.4. Timing analysis of real-time systems

This section first addresses the general concept of schedulability analysis, which verifies whether the timing constraints specified on the system are satisfied or not. Following this, the end-to-end timing constraints are discussed. At the end of this section, the utilized industrial tool suite for timing analysis is presented.

2.4.1 Real-time characteristics

A real-time system is required to interact and communicate with its environment in a timely manner. Such a system must process information and deliver a logically correct output within a strict timing constraint. Regardless of the system load, failure of such a system may in many applications result in catastrophic consequences, such as endangering human life or the environment. Hence, the temporal aspects of these systems must be predictable. When considering the real-time characteristics of the embedded system, each software component of the system is viewed as a set of tasks. The three fundamental characteristics of these task sets in a periodic task model are the worst-case execution time (C), deadline (D) and period (T). A brief description of each concept is presented below and shown in Figures 8a and 8b.

• Task: A sequence of a program executed on a processor aimed to perform a certain computation.

• Time period: The time interval between the activations of two consecutive task instances.

• Deadline: The latest time within which the execution of the task has to be finished after its release.

• Worst-case execution time (WCET): The maximum execution time of a program on specific hardware without any interruption. Estimation of a safe and tight WCET of a program is crucial for reliable and accurate behavior of the real-time system.

• Worst-case response time (WCRT): The longest possible time required by a task to finish its execution while considering worst-case interference (preemption and blocking) from other tasks in the model.

(a) Without interference. (b) With interference.

Figure 8: Task Model

The response time of a task is bounded by interference from other tasks in the model. More precisely, a task's execution may be preempted by higher-priority tasks or blocked by lower-priority tasks. As depicted in Figure 8a, the execution of a task is not interfered with by any other tasks in the model; hence, the task's response time and WCET are equal. On the other hand (see Figure 8b), due to interference by another task in the model, the response time of the task is higher than the WCET.


The real-time requirement of each software component primarily depends on two major constraints. First, the WCET of each task must be less than or equal to its deadline. Moreover, the deadline for each task should be less than or equal to its period, as shown in Equation 13.

C ≤ D ≤ T (13)

Another requirement is the schedulability of the task sets. This implies that each task is required to have a bounded maximum response time, i.e., the time elapsed between the moment the task becomes ready to execute and the moment it finishes its execution must be less than or equal to its deadline [22]. Schedulability analysis methods verify that the timing requirements of the system are met. One such method is response time analysis (RTA), which checks the schedulability of a system through both necessary and sufficient conditions [23]. In real-time systems and networks, the upper bounds on the response times of tasks or messages are calculated by the RTA method.
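As an illustration of the idea, the sketch below implements the classical RTA fixed-point recurrence for independent periodic tasks under fixed-priority preemptive scheduling, R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ C_j, and compares the result against the deadline. This is a textbook formulation under simplifying assumptions (no blocking, jitter or offsets), not the holistic analysis performed with the Rubus tooling used in the thesis, and the task set is made up.

```python
import math
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    C: float   # worst-case execution time
    D: float   # relative deadline
    T: float   # period

def response_time(task: Task, higher_priority: list) -> float:
    """Iterate R = C + sum_j ceil(R / T_j) * C_j until a fixed point (or R > D)."""
    R = task.C
    while True:
        interference = sum(math.ceil(R / hp.T) * hp.C for hp in higher_priority)
        R_next = task.C + interference
        if R_next == R:
            return R                    # converged: worst-case response time
        if R_next > task.D:
            return R_next               # already exceeds the deadline: unschedulable
        R = R_next

# Hypothetical task set, ordered from highest to lowest priority.
tasks = [Task("perception", 1, 5, 5), Task("decision", 2, 10, 10), Task("actuation", 3, 20, 20)]
for i, t in enumerate(tasks):
    R = response_time(t, tasks[:i])
    print(f"{t.name}: WCRT = {R}, schedulable = {R <= t.D}")
```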

2.4.2 Task chain

A task chain contains a sequence of tasks and can be classified as a trigger or data chain depending on the activation of tasks in the chain. Therefore, the analysis of the task activation pattern is of significance to the performance of the real-time system. The classified task chains are presented and described in the list below:

• Trigger chain: The first task in the chain is initially activated by a triggering source (e.g., an event, clock or interrupt) as shown in Figure 9. The rest of the tasks in the trigger chain are activated by their predecessor tasks. In such a chain, there is an implicit precedence relation between any two neighboring tasks, i.e., a task can only be triggered for execution after its predecessor task has finished its execution. If a system is modeled solely with trigger chains, then the system is said to be single-rate.

Figure 9: Single-Rate System (a trigger chain from sensor data input to data sink)

• Data chain: Each task in the data chain is activated individually by an independent clock, often with a distinct period (see Figure 10). Hence, all tasks are independent of each other. Signals from peripheral devices or messages from the network interfaces provide data to the first task in the chain, while the rest of the tasks receive data from their predecessors. If a system contains a single data chain with different clocks, then it is called a multi-rate system.

Figure 10: Multi-Rate System (a data chain from sensor data input to data sink)


2.4.3 End-to-end timing analysis

In the case of a pure trigger chain or single-rate system, not every task in the chain is independently activated. In fact, only the first task in the chain is activated by an event, while precedence constraints exist among the other tasks. In this work, the analysis of the end-to-end data propagation delay is discussed exclusively. In order to verify the end-to-end delay constraints, especially in the automotive domain, the age delay and reaction delay are expected to be calculated. To explain the analysis method, a few relevant concepts need to be briefly described. The following list presents the concepts required for understanding end-to-end data propagation delay:

• Timed path (TP): A sequence of all task instances along a task chain in the system.

• Reachable timed path: A path where data can actually propagate from the first task to the last task in the chain.

• Age delay: The maximum delay in reading the output among all reachable timed paths in the chain, as shown in Figure 11. The age delay can be calculated manually, either using a mathematical equation or by creating a graphical representation. The mathematical equation for age delay estimation is expressed in Equation 14.

$$\mathit{AgeDelay} = \max\{\mathit{Delay}(TP^1), \ldots, \mathit{Delay}(TP^M)\} \qquad (14)$$

$\mathit{Delay}(TP^M)$ is the data propagation delay in each timed path and can be calculated by Equation 15.

$$\mathit{Delay}(TP^M) = \alpha_n(TP^M_n) + R_n(TP^M_n) - \alpha_1(TP^M_1) \qquad (15)$$

$\alpha_1(TP^M_1)$ is the activation time of the instance of the 1st task in path $TP^M$.
$\alpha_n(TP^M_n)$ is the activation time of the instance of the $n$th task (the last task in the chain) in path $TP^M$.
$R_n(TP^M_n)$ is the response time of the instance of the $n$th task in path $TP^M$.

Figure 11: Data Propagation Delays: Age Delay (reachable timed paths TP1 and TP2 of a periodic task within one hyperperiod)

• Reaction delay: In order to calculate the reaction delay in a task chain, it is required to consider the effect of just missing new data as input to the first instance of a task. The reaction delay is defined as the maximum delay among all reachable timed paths in a single hyperperiod that have a non-duplicate or ”first” output of the chain. The mathematical formula to calculate the reaction delay is expressed in Equation 16, and for the graphical representation see Figure 12.

$$\mathit{ReactionDelay} = \max\{\mathit{Delay}(TP^1), \ldots, \mathit{Delay}(TP^M)\} \qquad (16)$$

$\mathit{Delay}(TP^M)$ is the delay in each such path and can be calculated by Equation 17.

$$\mathit{Delay}(TP^M) = \alpha_n(TP^M_n) + R_n(TP^M_n) - \alpha_1(\mathit{Pred}(TP^M_1)) \qquad (17)$$


Figure 12: Data Propagation Delays: Reaction Delay (the reachable timed path starts from the first-task instance that just missed the new data, which is read by the next instance of the first task, within one hyperperiod)

$\alpha_1(\mathit{Pred}(TP^M_1))$ is the activation time of the predecessor instance of the 1st task in path $TP^M$.

Example of an age and reaction delay calculation

Consider a task chain (see Figure 13) where all tasks are periodic and activated independently. The tasks communicate with each other through registers. The timing diagram of the task chain is depicted in Figures 11 and 12.

Figure 13: Periodic Task Chain (a high-priority task with period 8, a medium-priority task with period 2, and a low-priority task with period 4, communicating via Register 1 and Register 2)

Age delay calculation for the task chain:

$\mathit{Delay}(TP^1) = \alpha_3(TP^1_3) + R_3(TP^1_3) - \alpha_1(TP^1_1) = \alpha_3(1) + R_3(1) - \alpha_1(1) = 0 + 1.5 - 0 = 1.5$

$\mathit{Delay}(TP^2) = \alpha_3(TP^2_3) + R_3(TP^2_3) - \alpha_1(TP^2_1) = \alpha_3(2) + R_3(2) - \alpha_1(1) = 4 + 1 - 0 = 5.0$

$\mathit{AgeDelay} = \max\{\mathit{Delay}(TP^1), \mathit{Delay}(TP^2)\} = \max\{1.5, 5\} = 5$

Reaction delay calculation for the task chain:

$\mathit{Delay}(TP) = \alpha_3(TP_3) + R_3(TP_3) - \alpha_1(\mathit{Pred}(TP_1)) = \alpha_3(3) + R_3(3) - \alpha_1(1) = 8 + 1.5 - 0 = 9.5$

$\mathit{ReactionDelay} = \max\{\mathit{Delay}(TP)\} = 9.5$
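As an illustration of Equations 14–17, the sketch below computes the age and reaction delays from the activation and response times of the first and last task instances of each reachable timed path. The data structure and function names are hypothetical and only mirror the quantities used in the example above.

#include <algorithm>
#include <vector>

// One reachable timed path, reduced to the terms used in Equations 14-17.
struct TimedPath {
    double alphaFirst;      // alpha_1: activation time of the first task instance
    double alphaPredFirst;  // alpha_1(Pred): activation of the predecessor of that instance
    double alphaLast;       // alpha_n: activation time of the last task instance
    double responseLast;    // R_n: response time of the last task instance
};

// Age delay (Eq. 14-15): max over all reachable paths of alpha_n + R_n - alpha_1.
double ageDelay(const std::vector<TimedPath>& paths)
{
    double worst = 0.0;
    for (const TimedPath& p : paths)
        worst = std::max(worst, p.alphaLast + p.responseLast - p.alphaFirst);
    return worst;
}

// Reaction delay (Eq. 16-17): the same maximum, but measured from the activation
// of the predecessor instance of the first task (the instance that just missed
// the new data).
double reactionDelay(const std::vector<TimedPath>& paths)
{
    double worst = 0.0;
    for (const TimedPath& p : paths)
        worst = std::max(worst, p.alphaLast + p.responseLast - p.alphaPredFirst);
    return worst;
}

With the values from the example above ($\alpha_1 = 0$, $\alpha_3 = 0$, $R_3 = 1.5$ for $TP^1$ and $\alpha_1 = 0$, $\alpha_3 = 4$, $R_3 = 1$ for $TP^2$), ageDelay returns 5, matching the manual calculation.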

2.4.4 The Rubus tool

In order to develop a model- and component-based resource-constrained embedded system, the Rubus tool [24] is utilized. The overall goal of this tool-set is to form a resource-efficient system. The tool is developed by Arcticus Systems [24] in close collaboration with research institutes and industrial partners [25], [26]. The Rubus concept is based around the Rubus component model (RCM) [27] and its integrated component development environment. The environment is called Rubus-ICE and is a commercial tool for modeling an application, including its control flows, data flows, and constraints.



Figure 14: Modelling of software architecture in RCM

The Rubus component model (RCM)

The software architecture is referred to as the fundamental structure of a software system. It comprises software components and their interactions in terms of data and control flow. The control flow is triggered by either a periodic clock, interrupts, or internal and external events. The purpose of the Rubus component model is to express the infrastructure of the software architecture and analyze the components' timing constraints. The basic software component in RCM is called a software circuit (SWC) and its purpose is to encapsulate functional code. The trigger and data flow in SWCs are realized by trigger and data ports, respectively. In RCM the functional code is separated from the framework implementing the execution model. Hence, a SWC has no knowledge regarding its connection to other components, which benefits the reuse of components in different contexts [28]. Figure 14 illustrates the interaction of the components with regard to both data and triggering.

The Rubus analysis framework

In the component-based design of the software architecture, the real-time properties and characteristics are specified at the architectural level. These real-time characteristics are WCET and stack usage. The designer must express these properties while designing the model of a system. Later, the scheduler considers these real-time constraints while generating the schedule of the overall software architecture.

2.5. Control components

The initial intention of this section is to give a conceptual and theoretical representation of how the environment and the agent can be modeled with MDP and POMDP. Different methods for solving MDP and POMDP are presented as they provide a way to make accurate predictions about the future state and decide the optimal actions. Later, concepts for understanding and analyzing real-time systems are presented. Various tools and methods for creating real-time constrained systems are discussed, as they are required for designing the initial structure of the decision-making system.

The next section intends to give a more concrete representation of the different control components required for autonomous vehicle navigation. It aims to provide the reader with an understanding of what role the different subsystems have in relation to the decision-making module. This is important since the subsystems are all interconnected.

2.5.1 Systems required for autonomous navigation

In order to enable the construction vehicle to navigate autonomously and safely in a real-world cluttered construction site, several technical requirements need to be taken into consideration.


The system requires algorithms for path planning, path tracking, local navigation, and decision-making. All these components are independent and directly impact autonomous navigation, treated as a fundamental skill of the agent. The different components for autonomous navigation and decision-making are shown in Figure 15.


Figure 15: A flow diagram of the components required for autonomous navigation

• Sensor data technology: Sensor technology merges data coming in from multiple different sensors (e.g. LIDAR, RADAR, stereo camera, GPS, odometry, as specified by the company) and delivers accurate and reliable information regarding the vehicle's immediate surrounding environment.

• World model: Software components collect sensor data and model a static view of the vehicle's environment.

• Localization: This system tracks the vehicle’s current position in a static map.

• Path planning: Path planning is further divided into global and local path planning. The global path planning algorithm deliberately determines an optimal path in the configuration space, thus enabling the vehicle to navigate from a start location to a goal location while avoiding static obstacles present in the configuration space. The local path planning system utilizes real-time sensor data and provides a fast and reactive algorithm. The algorithm enables the avoidance of dynamic obstacles emerging in the environment in a minimal amount of time while proceeding towards the overall goal location.

• Object perception: Detects objects and obstacles via incoming real-time sensor data, which provides various information about the detected objects such as their type, size, and color.

• Low-level actuator: Utilizes algorithms such as path tracking, which enables the vehicle to follow a predefined trajectory accurately, as provided by the path planner. A path tracking algorithm utilizes the positional or localization information to control the vehicle's speed and steering angle.

• Decision making: The role of the decision-making system is choosing from competing priorities to select the most suitable action based on the current scenario. This module must take into account the uncertainties in the environment and other model imperfections. The output from the decision-making system is combined with the output from the path tracker module and thereby translated into an intelligent selection of the vehicle's velocity and steering angle.


3. Related work

In this chapter, an overview of the state-of-the-art research conducted in the domain of autonomous navigation is given. In this thesis work, different fields of scientific approaches are considered. Consequently, the related work is divided into various subsections. Primarily, different approaches for solving decision-making problems for autonomous driving are discussed. Following this, methods for WCET estimation of real-time systems are presented. Afterward, the end-to-end timing analysis constraints are discussed briefly. At the end of this chapter, a motivation for the chosen methods is presented.

3.1. Decision-making algorithm

There are several decision-making algorithms which have been designed and developed for autonomous navigation. However, the selection of one such method is highly dependent upon the prevailing uncertainty in the environment, the efficiency in producing optimal behaviour and the complexity of the implementation. A short overview of different algorithms is presented below.

3.1.1 Trajectory optimization

Over the past decade, a number of investigators have proposed decision-making solutions for autonomous vehicles through the lens of safe and efficient trajectory optimization algorithms. In a simulated environment, Gu et al. [29] use dynamic programming and generate an efficient trajectory planner in an uncertain environment. In their work, a generate-and-test approach is used to select the best trajectory solution, which encodes the suitable driving maneuvers. Although their simulation results were promising, their method was never implemented in a real vehicle scenario.

A novel collision-free motion algorithm among multiple agents moving in a common workspace is proposed by Berg et al. [30]. In their work, each agent is completely independent and does not communicate with other agents. A velocity obstacles approach is used for obstacle avoidance, considering a known simulated environment. The main drawback of this method is the strong assumptions, which make the approach less applicable in a real-world scenario. Another trajectory optimization method is proposed in [31]. The system generates an intelligent driving maneuver while accounting for both static and dynamic obstacles in the environment. The authors also compared the performance of the motion planner before and after trajectory optimization. Both simulation and experimental results revealed a reduction of the computational run-time of the motion planner by 52% with improved trajectory quality. However, their method was primarily implemented for a simple passing maneuver. Another drawback of their experimental methodology was the fact that it was limited to static obstacles. A similar approach is also used by Kuwata et al. [32], Tumova et al. [33] and Vasile et al. [34].

In a complex, cluttered traffic environment, modeling interactions with the surrounding vehicles is necessary. The key problem with trajectory optimization methods is that they can only predict other vehicles' current positions in the traffic, but not their future reactions in response to actions from the ego vehicle. However, in a real traffic scenario, closed-loop coupled interactions among other vehicles must also be accounted for. Instead of being provided one or several collision-free paths, the ego vehicle requires a decision-making system which suggests when and how to navigate in a dynamic environment.

3.1.2 Rule-based approach

As of today, the most common approach to decision making in autonomous navigation is to manually model reactions to situations [35]. State machines are commonly used to assess the situations and take decisions in a single framework. During the 2007 DARPA Urban Challenge, the first notable approaches for an autonomous vehicle's decision-making architecture in an urban driving environment were proposed [36]. Due to the controlled traffic situation, the design of the decision-making system was more inclined towards a rule-based expert system. A concurrent hierarchical state machine is used by Ziegler et al. [37] to model the basic behaviors in accordance with the traffic rules. In their work, the trajectory planner controls the driving maneuver only for merging the ego vehicle into traffic. However, several strong assumptions have been made, such as that other cars


in the traffic maintain the right-bound distance and do not accelerate while driving. The uncertainty arising in common traffic situations was also not considered. Team Junior [38] in the DARPA challenge finds critical zones where the ego vehicle has to wait in different traffic situations. Based upon the ego vehicle's speed and proximity relative to other vehicles, they conducted a threshold test to identify such territory. The concept of a rule-based decision-making system is implemented in a wide variety of solutions, ranging from decision trees [39], behavior networks [40], heuristic approaches [41] and distributed voting-based behavior architectures [42] to hybrids of rule-based decision-making and behavior models [43].

In order to adapt these approaches to real-world scenarios, more and more states and transitions need to be considered for a complex and cluttered scenario. For example, with the state machine approach, individual complex tasks, like following a leader vehicle or merging into traffic, usually need tailored solutions. This process is tedious, error-prone and not robust considering the increasing complexity of driving situations.

3.1.3 Machine learning approach

To make decisions for autonomous navigation, various learning-based techniques have been endorsed in the automotive industry. Muller et al. [44] proposed a vision-based obstacle avoidance system for an autonomous mobile robot. The system is trained end to end to map the raw input images to steering angles. The learning system is a CNN model which predicts the relation between input images and the driver's steering angle. Experimental results show high-speed performance and real-time obstacle detection and path navigation. A real-time learning-based trajectory generator approach is proposed by Guo et al. [45] for implementing an advanced driver assistance system with lane-following and adaptive cruise control functions. In particular, in their work the ego vehicle learned local trajectory generation from an existing leader vehicle in the host lane. Experimental results depict the effectiveness of the proposed system in a typical but challenging urban environment. Other machine-learning techniques such as Gaussian mixture models [46], Gaussian process regression [47] and inverse reinforcement learning [48] also provide good solutions by predicting uncertain conditions in the environment and planning a course through them.

With these approaches, the generation of optimal decisions is limited by the extensive training of the system, which in itself is a complex and time-consuming endeavor. Additionally, it is difficult to reproduce multiple training scenarios which can represent the complex combinations of diverse situations arising in the real world.

3.1.4 Probabilistic approach

Human behavior is highly specific and complex; hence, probabilistic prediction is used by several researchers as an input to the motion planner [49]. Damerow et al. [50] proposed a behavior generation approach for advanced driver-assistance systems, considering multiple traffic scenarios with different probabilities of occurrence. They created several cost maps for each traffic situation to solve the global planning problem. Although they managed to generate different behaviors applicable to different situations, they made unrealistic initial assumptions which may not hold true in real traffic conditions.

Zhan et al. [51] presented a non-conservative defensive strategy for urban driving. They considered two deterministic driving maneuvers (passing and yielding) for other vehicles in traffic. The simulation results demonstrated the realistic behavior of the system under uncertainty. In probabilistic approaches, the problem is often formulated as a POMDP where the intention and re-planning procedure of other agents are not directly observable. Different navigation problems are formulated and solved by general POMDP models, as observed in [52] and [53]. However, the main drawback of this method is the requirement of high computational run-time even for state, action and observation spaces that are small compared to a real-world scenario. To overcome the problem with traditional POMDP, a group of researchers provided a set of pre-computed initial policies to the POMDP model. The POMDP solver then refines these policies and switches them on and off over time to select the best policy. However, with these methods, significant resources are required to compute a set of policies, which in turn limits the planning horizon and the state, action, and observation spaces.


A more practical approach towards the decision-making problem for autonomous driving is observed in [54]. The authors proposed a Multiple Criteria Decision Making (MCDM) theory to generate a high-level decision process in the navigation system. However, they do not predict the future intentions of other vehicles participating in the traffic. In these approaches, motion planning algorithms determine various discrete behaviors of other vehicles and their probabilities. The motion actions of other vehicles and the ego vehicle are not modeled independently (non-interactive). Hence, the planning and decision-making modules are not integrated into the system. The robotics community has stated that such systems result in an uncertainty explosion in future states and thus in the freezing robot problem [55]. However, within this class, interaction may occur implicitly by re-planning, but it is not explicitly planned.

More recently, in a highway entry scenario involving merging of the ego vehicle into moving traffic, Wei et al. presented a set of possible high-level policies in an MDP. The authors performed a forward simulation to find the best policy in the most likely traffic scenario. Then they scored every policy against the ego vehicle's cost function and the best policy was executed [56]. Guo et al. [57] generate a hybrid potential map which accounts for detecting obstacles and predicting risks in the environment. Their MDP model is used to generate candidate actions/policies and is based on machine learning tuning. Although both authors were able to obtain promising results in a simulated environment, their work still lacks real-world experimentation. Cunningham et al. solved the complex POMDP model by formulating the highway lane-changing problem as a multi-policy decision making (MPDM) process. The authors modeled a high-level decision process as a set of policies that encode closed-loop behavior and use manual tuning to select the best policy for vehicle control. They modeled the high-level behaviors of all agents in the system and thus the principal decision-making process is robust in scenarios with extensive coupled interaction among agents [58]. An extension of this work [1] was carried out in an indoor setting, where the number and complexity of the candidate policies are much higher.

3.2. Worst case execution time analysis

There are three main methodologies to analyze the WCET of real-time applications in order to determine the maximum amount of time an application requires to execute. The comparison metrics for different WCET analysis methods arise from restrictions such as the non-availability of the source code, the properties of the hardware and the prohibition of reverse engineering on the binary [59]. Below, a short review of different WCET approaches is given. The key differences in estimating the WCET are also presented and the state-of-the-art method is highlighted.

3.2.1 Static WCET analysis

Static analysis methods have been around for a long time and they generate strong theoretical results by performing compile-time optimization of object code, as observed in [60] and [61]. Wilhelm et al. presented an abstract interpretation method alongside static analysis to determine the bounds on the execution time of a program [62]. These bounds represent the timing constraints of a hard real-time system. In their method, both cache and pipeline analysis results are integrated to provide safe and tight bounds on the execution time. However, the experimental outcomes are limited to cache analysis and an extremely simple pipeline. Alur et al. proposed the Timed Automata method to model the timing constraints of a real-time system and the required time bounds [63]. The timing analysis method is similar to a finite-state machine where every edge corresponds to a matrix of intervals restricting various delays. This method provides upper and lower bounds on the execution time for a finite-state system. The main drawback of this method is that verification of the analysis result against an industrial benchmark was not shown.

The timing analysis of assembler programs using symbolic simulation is observed in [64], [65] and [66]. However, these works remain at a machine-independent level and make the major assumption of a unit-time-based system. Furthermore, optimal results were only obtained for a highly simplified architecture due to the complexity of the method.

A few commercial static analysis tools are also available on the market, such as aiT [67] and Bound-T [68]. In a case study, Andreas Ermedahl et al. presented their view of using the aiT WCET analysis tool from AbsInt GmbH [69] in the papers [70] and [71]. In their work,


the analysis of time-critical code finds the upper time bounds for an embedded product. However, the primary goal of the study was not to obtain an accurate WCET estimation, but rather to investigate the practical difficulties in current state-of-the-art approaches.

Jan Gustafsson et al. developed an open-source static analysis tool called the Swedish Execution Time tool (SWEET) [72]. The toolset derives the timing bound for a program based on flow facts, automatically obtained from the analysis of the control flow graph. The ARTIST2 language for WCET flow analysis (ALF) is an open-source intermediate-level language of source code designed as an input to the SWEET tool for flow analysis. However, the conversion of source code to ALF is challenging and error-prone.

Static WCET analysis guarantees safe and tight upper bounds by analyzing the source and object code of a program without performing the actual execution of the task. However, commercial tools available on the market are highly expensive, while other methods and open-source tools are a complex, error-prone and time-consuming endeavor. Consequently, reports on industrial experience of static WCET analysis have so far been rather limited [73].

3.2.2 Measurement-based WCET analysis

With a small variation from the traditional approach, various measurement-based methods have been developed to estimate the WCET of the software architecture [74]. The WCET estimation of a program based on end-to-end measurements is observed in [75] and [76]. In this method, the control flow graph (CFG) of the program is analyzed and a number of run-time measurements are taken. From the number of executions of each basic block and the measurement of the total execution time, a set of linear equations is made and later solved using an Integer Linear Programming (ILP) solver. However, this method suffers from uncertainty due to the dynamic behavior of the processor. Williams [77] proposed a path testing method for WCET estimation by considering limited path coverage of the program using the PathCrawler tool [78]. The method automatically generates test input data for all feasible paths in the source code. In this approach, the final outcomes suffer from a potential lack of scalability and hence, the analysis of the complete path of the whole program is unachievable in practice. Moreover, the number of feasible paths can grow towards infinity even for a small program. Another measurement-based approach, called probabilistic measurement-based WCET analysis, is implemented in [79]. The approach generates probability distributions of the execution time of program segments and combines them to measure the overall WCET of the program. Wenzel et al. [80] clustered the program into small segments and considered measurements only from feasible path coverage. The highest observed execution time of a segment provides the overall WCET of the program. However, this method is time-expensive and the analysis result is also restricted to programs without loops. A similar approach to generate safe WCET estimations is implemented by Deverge et al. [81], where loop constraints are considered. In this method, several observation points are placed in the form of time-stamps at cluster boundaries, where measurement data on smaller segments of the program are collected. However, in their work, several strong assumptions have been made, such as that measuring the execution time of the same program path with different input data yields the same results.

With these approaches, the WCET estimation of the program is performed on the target hardware and the actual execution times are measured. These methods do not consider the cache state of the processor and its effects on the result. Thus, the generated outcomes are always lower than the true worst-case result. Although these methods may not be suitable for a hard real-time system, they are sufficient for a soft real-time system.

3.2.3 Hybrid WCET analysis

Modern high-performance processors contain unpredictable and undocumented components that influence the timing behavior of the program [82]. Thus, hybrid WCET analysis methods have been developed to address this problem. A very successful commercial tool named RapiTime from Rapita Systems [83] estimates the worst-case behavior of each software component using a hybrid method. In a study conducted by Dreyer et al. [84], the measurement of data is combined with a static analysis to estimate the worst-case behavior of the program. In their work, a meaningful estimation of the WCET of a modern processor under realistic conditions is briefly demonstrated.


Although the final outcomes of this method are highly realistic, the generalization of the method to multiple hardware platforms is compromised. Furthermore, this method does not address the challenges in the systematic generation of trace data for the measurement.

The hybrid method of estimating WCET is a relatively new method in this domain. The timing constraints of the thesis work and the complexity of the analysis approach restrict the implementation of such a method.

3.3. End-to-end timing analysis

The state-of-the-art timing analysis results, i.e., holistic response time analysis and end-to-end delay (latency) analysis in an industrial tool suite, are discussed in [85]. Rajeev et al. proposed a scalable model-checking-based technique to compute the worst-case response time and end-to-end delay timing in an automotive system [86]. In their work, a formal model for creating a software architecture is generated and then analyzed. However, their work cannot proceed before the implementation phase, as this method requires system-level information. A framework [87] for the computation of end-to-end delay for multi-rate, register-based automotive systems is presented by Feiertag et al. [88]. The authors highlight that the ”maximum age of data” and ”first reaction” are two distinct ”meanings” of end-to-end delay analysis, relevant to control systems and body electronics respectively. Becker et al. [89] presented a method for computing end-to-end delay analysis applied on four distinct system knowledge levels. The implementation of the holistic response time analysis (HRTA) and end-to-end delay analysis (E2EDA) as plug-ins for the Rubus-ICE tool suite was done by Mubeen et al., which can be observed in [90] and [91].

3.4. Discussion

An overview of a wide range of state-of-the-art methods and practices has been presented, which later helps in formulating the methodology to be followed in this thesis. The desired goal of conducting such an extensive literature study is to limit the scope of this thesis.

In summary, the approach to solving the decision-making problem is in line with the systems proposed in [57], [58] and [1]. However, this work is distinguished from the previous methods in the aspect of the use case scenario (e.g. the construction site), where the working environment is comprised of static obstacles, humans as well as vehicles. Additionally, no predefined traffic rules are provided for such an environment. This contributes towards the novelty and uniqueness of this work.

The study of the WCET analysis of the software components is highly relevant for this thesis work. Estimating the timing constraints of the software architecture is beneficial to ensure functional safety and optimal allocation of resources. Therefore, the selection and adoption of a suitable WCET analysis approach is highly significant. Some of the key features highlighted in the state-of-the-art papers that hold relevance are decisions regarding the implementation language of the source code, the dynamic behavior of the program and also the target hardware platform. This thesis work is aligned with the measurement-based WCET estimation described in [81].

An extensive literature survey in the domain of timing analysis of real-time systems is highly valuable. The outcomes of the timing analysis significantly influence the precise temporal behavior of an automotive system. Therefore, apart from the measurement of worst-case execution times, the computation of the end-to-end data propagation delay analysis is equally prioritized. Although this method implicitly performs holistic response time analysis, this thesis work is not interested in finding the response time of individual software components. Instead, this thesis work is primarily focused on calculating the worst-case age and reaction delays of the integrated software architecture, based on the paper by Mubeen et al. [85].


4. Research method

This chapter primarily addresses three major concepts. Firstly, the scientific research method which is followed in this work is discussed. Secondly, a comparison with another suitable methodology and advocacy for the selected method is described. Finally, an overview of how the thesis work is designed and solved with this research method is presented.

4.1. System development research method

Following a good research methodology for system development leads to the generation of fruitful results. Nunamaker et al. [92] proposed a framework which explains the development of a system formulated as a research methodology. The authors presented an integrated multi-dimensional and multi-methodological approach, described in Figure 16. The integrated approach incorporates

Figure 16: A Flow Diagram of the Multi-methodological Research Approach (Theory Building: conceptual frameworks, mathematical models, methods; Observation: case studies, survey studies, field studies; Experimentation: computer simulations, field experiments, lab experiments; System Development: prototyping, product development, technology transfer)

four building blocks: theory building, observation, experimentation, and system development. The system development block is the heart of this approach, and the other blocks are interconnected with it. The idea of this approach is to conduct the research in an iterative way of the form Concept → Development → Impact, allowing transitions from one phase to another. Hence, the issues which are generated during any stage of the development process can further be assessed and refined based on the results of the succeeding step. This methodology is mostly applicable to research in engineering and science, where the initial formulation of research questions is difficult. This research approach validates the system by conducting performance testing of the system under development. The main advantage of the system development approach is that the developed system serves as both a proof of concept for the fundamental research and the focus of extended and continuing research.

4.2. Application of the research method

The application of the scientific method in this work is as follows (see Figure 17):

• Development of the software architecture: This is the first step of this thesis work. It defines what software components or which functions will later be required to be implemented.

• Timing Analysis: Evaluates whether the timing constraints for the software architecture are met or not.


• Implementation: Software functionalities are implemented in this step.

• Simulation: Simulation tests are conducted to evaluate the performance of the implemented functions.

• Evaluation: Evaluates the system performance.

Figure 17: Process of System Development Approach (development of software architecture → timing analysis → implementation → simulation → evaluation)

4.3. Discussion

In this thesis, the development of a functional decision-making system is of utmost importance. The development of such a system has also considered several assumptions. Considering the aforementioned limitations, the thesis is more inclined towards this methodology, as the system development block is the most crucial among the others. With this approach, continuous improvement of the system development is possible due to the flexibility of the research method. The methodology is iterative, and therefore backtracking of the system development is also applicable. This is necessary, as it makes it possible to address issues that may have been discovered during the process.


5. Proposed solution

This chapter discusses the employed method for modeling the decision-making module. It provides an overview of the different fields of work required for creating such a system. Firstly, the requirements for developing the software system are considered. The software architecture and the specification of timing properties are discussed in detail. Then, the implementation and behavior of each software component are described. The chapter ends with a presentation of the timing analysis techniques deployed on the designed system.

5.1. Specification of the decision-making system

To realize the objective of designing a tactical decision-making system, the characteristics of the system need to be clearly specified. The requirements contain the behaviors, attributes, and properties of the system. The main purpose of the module is to evaluate tactical decisions to navigate in dynamic, unstructured environments while considering high-level goals. For this aim, the evaluation of decisions must be made within a strict time constraint to meet productivity goals and ensure safety. Furthermore, the system should comprise a set of well-defined rules of behavior to manage the interaction with other traffic participants in the shared space. This further ensures that the decision-making system performs deterministic planning and operations, which is essential for obtaining a more natural social interaction with other traffic participants. A summarized list of the required behaviors and characteristics of the module is presented below.

Required characteristics of the system:

• Behave in a socially intelligent way in interaction with other humans in the shared space

• Determine tactical decisions from a set of well-defined rules to ensure unpredictable behavior is eliminated

• Value high productivity and, at the same time, consider the safety of other traffic participants

• Perform real-time decisions

We deploy a systematic approach to guarantee that all requirements are built into the solution. Once the system specifications are defined, we analyze and extract existing software models to build a software architecture. Following the completion of the architecture, the timing properties and requirements are identified for each SWC. This ensures all future deadlines will be met. We then implement the behavior of each SWC considering the timing requirements and the specifications of the system. Lastly, the WCET of every task is calculated in order to validate the timing requirements. This is achieved by performing end-to-end timing analysis on the entire system. An overview of the approach is presented in Figure 18. The individual modules are discussed in more depth in the following sections.

5.2. Software architecture

Modeling of software architecture using the principles of model- and component-based software development [93], [94] is the process of developing the software architecture with the help of software components and their interactions. It provides a higher level of abstraction of software functions. Modeling of systems provides the benefit of capturing software functionality requirements, such as resource deployment, SWC behavior, reliability, timing, and safety [95]. The first step involves analyzing existing software designs and implementations to build an initial architecture model. SWCs relevant to the architecture are then extracted. This includes the structure and sequence of the software components and their communication via ports. Once the data and activation flows are identified, the interfaces of the individual SWCs, e.g. trigger ports and data ports, are analyzed to verify the consistency between them. In this work, we utilize the Rubus-ICE development tool suite for modeling.

In the Rubus development process, the end-to-end timing analysis is performed at an early stage to verify that the system meets its real-time constraints before the actual implementation of the


Figure 18: A Flow Diagram of the Approach Applied For System Design (system specification → software architecture modeling → timing properties & requirements → SWC behavior → WCET estimation → end-to-end timing analysis)

software. Then synthesis is performed to generate code. However, the software architecture in this work leverages different state-of-the-art algorithms and techniques to provide autonomous navigation and decision-making. These components are complex and require partial manual effort in coding from the design. We therefore reverse the process by performing the timing analysis after code generation to verify the correct behavior of each SWC. The advantage is that the complete timing information is then available, which is only possible at the implementation stage.

The task chain defined in the architecture requires the real-time requirements to be considered. This means timing characteristics such as periods and deadlines are still considered before implementing the SWCs. The architecture is depicted in Figure 19. The architecture hides the real structure of the Move Crowd SWC, as it behaves differently depending on which policy is being evaluated by the MPDM algorithm. The distinct structures of the SWCs are depicted in Figure 20.

Figure 19: Overview of Software Architecture (the Sim_Crowd, Move_Crowd_ξ, MPDM and Optimal_Policy SWCs connected through trigger and data ports carrying walls, timestep, pedestrian, vehicle, crowd, force-value, cost, reward, policy and environment data)

To provide the system with tactical decision reasoning, we model and test the multi-policy decision-making (MPDM) algorithm designed by Cunningham et al. [58]. The algorithm provides a multi-policy strategy for autonomous navigation, in which the behavior of the system is determined from a finite policy set. The algorithm simulates individual policies to evaluate their performance. The algorithm is discussed in more detail in Sections 5.4. and 6.3. We extend MPDM to navigate in dynamic, shared-space construction environments. In this context, spontaneous interactions


Figure 20: The different Move Crowd SWCs (Move_Crowd_Navigate, Move_Crowd_Stop and Move_Crowd_Follow, each containing MovePedestrian and MoveVehicle sub-components with ports for objects, goals, pedestrians and vehicles, and outputs for the driving, pedestrian, vehicle, object, stop and following forces)

between different traffic participants are designed according to group dynamics. Shared-space modeling is discussed further in Sections 5.4. and 6.2. At this stage, the behavior of each SWC is still not implemented; only the high-level structure is designed. The behavior of each SWC is explained in the following sections.

5.3. Specification of timing properties and requirements

The real-time constraints of a system determine the time at which the system is required to respond and communicate with its environment. For a real-time system, the correct time of reaction is equally important as the logical correctness of the response. The reaction time of the system depends on the timing properties of the architecture. As can be observed in Figures 19 and 20, the software architecture is complex in its design but simple in the activation of tasks. There is only one source that triggers the first task in the chain. Thus, the tasks have the same period and are activated by their predecessors. The period is set to 300 ms based on the work of Mehta et al. [1].

A safe value of the age and reaction constraints in a trigger chain is equal to two times the value of the triggering clock period [85]; with the 300 ms clock used here, this corresponds to constraints of 600 ms. To support end-to-end timing analysis, the WCET of each SWC needs to be estimated. The WCET estimation is performed after the actual implementation of the software. Section 5.5. further discusses the applied method for calculating WCET estimates.


5.4. The behavior of software components

In order to extend the MPDM algorithm to navigate in shared-space construction environments, the behavior of different traffic users needs to be simulated. These simulations are based on empirical data of numerous real scenarios from shared-space schemes [96]. In these schemes, the movement of the users is not governed by traffic rules. Instead, interactions between the users are negotiated by social rules. Considering there are no lanes in the scene, the users are forced to dynamically adapt to each other's behavioral changes. Behavioral models or behavioral probabilities for different situations are therefore developed for a larger population, such as a group of individuals. In this work, the implementation of group dynamics is based on a multi-layer social force model (SFM) approach.

The multi-layer SFM describes the motion of pedestrians and vehicles in pedestrian-vehicular mixed traffic by resorting to a physical analogy. In the SFM, a road user's decision for movement is influenced by various forces. The forces model the user's desire to reach a certain location while keeping a certain distance from other traffic participants, as depicted in Figure 21. The change of motion for participants also depends on the type of interaction between users: a vehicle's behavior towards a pedestrian is different from a pedestrian-pedestrian encounter.

Figure 21: Overview of the forces exerted on pedestrian i: the interaction forces f_ij (pedestrian j), f_iy (vehicle y) and f_io (obstacle/wall), and the driving force f_g towards the goal
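For illustration, a minimal sketch of how the force terms in Figure 21 could be aggregated for one road user is given below. The driving-force relaxation and exponential repulsion used here are the classical social-force forms and are placeholder assumptions only; the force laws actually employed in this work are specified in Section 6.2, and all names and constants (A, B, relaxation time) are illustrative.

#include <cmath>
#include <vector>

struct Vec2 {
    double x, y;
    Vec2 operator+(const Vec2& o) const { return {x + o.x, y + o.y}; }
    Vec2 operator-(const Vec2& o) const { return {x - o.x, y - o.y}; }
    Vec2 operator*(double k)      const { return {x * k, y * k}; }
    double norm() const { return std::sqrt(x * x + y * y); }
};

// Aggregates the forces of Figure 21 acting on road user i: the driving force
// f_g towards the goal plus repulsive terms f_ij, f_iy, f_io from other
// pedestrians, vehicles and obstacle points (all collected in repulsionPoints).
Vec2 totalForce(const Vec2& position, const Vec2& velocity, const Vec2& goal,
                const std::vector<Vec2>& repulsionPoints,
                double desiredSpeed, double relaxationTime, double A, double B)
{
    Vec2 toGoal = goal - position;
    double dist = toGoal.norm();
    Vec2 desiredVel = dist > 1e-9 ? toGoal * (desiredSpeed / dist) : Vec2{0.0, 0.0};
    Vec2 force = (desiredVel - velocity) * (1.0 / relaxationTime);   // f_g
    for (const Vec2& other : repulsionPoints) {                      // f_ij, f_iy, f_io
        Vec2 away = position - other;
        double d = away.norm();
        if (d > 1e-9)
            force = force + away * (A * std::exp(-d / B) / d);       // exponential repulsion
    }
    return force;   // interpreted as acceleration for a unit-mass point agent
}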

These microscopic models are captured in the SWCs within the Move Crowd container of the software model. With these models, we can simulate the movement of an entire crowd in the environment. Distinct policies that manage the behavior and movement of the decision system are then created. These policies are partially based on how vehicles usually behave in a shared space. It is observed that vehicles in these environments adapt to other vehicles heading in the same direction and consequently form a type of lane [96]. Vehicles also give priority to pedestrians to ensure safety. The decision system will give priority to all road users crossing its path and stop whenever necessary. It will also navigate autonomously or follow and adapt to leading vehicles whenever appropriate.

We formulate an MDP based on these specifications, which is solved by the MPDM algorithm. The algorithm evaluates the performance of each policy by simulating the effect the corresponding movement has on the crowd for a specific time horizon. Prior information about the current status of the environment, including the placement of the traffic participants, is gathered in the Crowd Sim SWC. Simulation is then performed by propagating the movement of every road user over the specified time horizon. The evaluation is dependent on the reward model defined in the MDP. The SWC labeled Cost calculates the reward each policy produces, which is sent to the Optimal Policy SWC. Once the optimal policy is determined, the decision system acts according to the selected policy. A sketch of this election cycle is given below.
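The following sketch summarizes the election cycle described above. The CrowdState type and the captureCurrentCrowd, simulateCrowd and reward functions are hypothetical stand-ins for the Crowd Sim, Move Crowd, Cost and Optimal Policy SWCs modelled in Rubus-ICE; the stub bodies only make the sketch self-contained.

#include <limits>
#include <vector>

// Hypothetical stand-ins for the information exchanged between the SWCs.
struct CrowdState { /* positions and velocities of all simulated agents */ };
enum class Policy { Navigate, Stop, Follow };

// Trivial stubs so the sketch compiles; in the real system these roles are
// played by the Crowd Sim, Move Crowd and Cost SWCs.
CrowdState captureCurrentCrowd() { return {}; }
CrowdState simulateCrowd(CrowdState start, Policy, double) { return start; }
double reward(const CrowdState&) { return 0.0; }

// One MPDM election cycle: forward-simulate every candidate policy over the
// planning horizon and return the policy with the highest reward.
Policy electPolicy(const std::vector<Policy>& policies, double horizon)
{
    CrowdState now = captureCurrentCrowd();
    Policy best = policies.front();
    double bestReward = -std::numeric_limits<double>::infinity();
    for (Policy p : policies) {
        double r = reward(simulateCrowd(now, p, horizon));
        if (r > bestReward) { bestReward = r; best = p; }
    }
    return best;   // forwarded to the Optimal Policy SWC for execution
}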

This section aims to provide a high-level overview of the functionality of the SWCs. Section 6. presents a technical description of each module.


5.5. WCET estimation

As the availability of commercial tools for performing static WCET analysis is rather limited, a measurement-based approach for estimating the WCET was deployed. The method generates test data through splitting paths into segments. This process is called program clustering. It involves placing time-stamps, called observation points, at smaller parts of the program. This process is continued until the exhaustive path enumeration inside segments is tractable. The WCET is then measured for each segment and considered in the overall WCET estimation. Consider the following program fragment extracted from this work:

float D(SocialForce* simulation)
{
    // Op2 -- observation point at the start of cluster B (whole function body)
    vector<Pedestrian*> simulatedPedestrians = simulation->getPedestrianCrowd();
    Vector3f distance_yq;
    float distance = 0;
    for (Pedestrian* pedestrian_a : simulatedPedestrians)
    {
        // Op6 -- observation point at the start of cluster A (loop body)
        distance_yq = (pedestrian_a->getPosition() - vehiclePosAfterSim);
        if ((distance_yq.lengthSquared() < (3.0 * 3.0)))
        {
            distance = distance + exp(-(distance_yq.length()));
        }
        // Op9 -- observation point at the end of cluster A
    }
    return -distance;
}   // Op12 -- observation point at the end of cluster B

Figure 22: Example code to demonstrate path clustering

In this example, cluster A has two paths and cluster B has only one single path. Performing measurements of segment A might, for instance, yield values of 50 µs and 225 µs for the two different paths. The larger value is the WCET of segment A. When estimating cluster B we might obtain a WCET of 4030 µs. This is not the WCET of the program, since the run could have taken the shorter path of A. Segment A will also be executed several times inside B, depending on the size of the loop. The difference between the WCET of A and each observed value of A is added to the WCET of B to get an upper bound on the global WCET.

Consider the case where the loop runs 4 times and the observed values of A are 50 µs, 225 µs, 55 µs and 200 µs. If the measurement of B yields 4030 µs, then the WCET of B is equal to 4030 + (225−50) + (225−225) + (225−55) + (225−200) = 4030 + 175 + 0 + 170 + 25 = 4400 µs. We deploy this iterative process for all programs. 100,000 samples are then collected in order to estimate the average WCET of each SWC.
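The combination rule used in the example can be written as a small helper. The function below is an illustrative sketch, not taken from the thesis code: it pads every observed invocation of the inner segment up to that segment's WCET.

#include <vector>

// Upper-bounds the WCET of an enclosing cluster (B) from one measured run,
// given the execution times observed for an inner segment (A) during that
// run and the WCET previously established for A.
double clusterUpperBound(double measuredB,
                         const std::vector<double>& observedA,
                         double wcetA)
{
    double bound = measuredB;
    for (double a : observedA)
        bound += (wcetA - a);   // pad each invocation of A up to its worst case
    return bound;
}

// Example from the text:
//   clusterUpperBound(4030.0, {50.0, 225.0, 55.0, 200.0}, 225.0) == 4400.0 (µs)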

5.6. End-to-end timing analysis

Once the WCET is calculated for every SWC, the end-to-end timing analysis is possible. These values are added to the software architecture created with the Rubus-ICE development tool. The tool then calculates the theoretical worst-case end-to-end delays, with all constraints formally analyzed and verified. Figure 23 depicts the software architecture modelled in Rubus-ICE. It is important to note that the data ports for the different forces are not propagated to other SWCs, since they are the ”actuators” of the simulated traffic participants. With this last step, we verify that the timing requirements of the system are met.


Figure 23: Software Architecture Modelled in Rubus ICE


6. Implementation

The previous chapter briefly discussed the employed method for designing the decision-making module and the approach for testing the timing behavior of the system. In this chapter, the individual frameworks and algorithms employed to realize intelligent autonomous navigation in a shared space are presented in detail. The chapter begins by presenting the formulated MDP for the industrial use case scenario. Then the MPDM algorithm and how it solves the formulated MDP are explained. The applied process for modelling and simulating crowd behavior is discussed later. The chapter ends with a description of the simulation environment and the worst-case scenario created for evaluating the decision system's ability to meet real-time constraints.

6.1. Autonomous driving in shared space formulated as an MDP

The formulated MDP can be summarized as containing the following components:

• A state space presenting the relevant physical properties of all dynamic agents, including itself, observed in the environment.

• An action space, defining the acceleration of the autonomous agent in each time frame.

• A set of policies governing the force acted on the autonomous agent, and ultimately the acceleration of the autonomous agent in each time frame.

• A transition function determining how the agents change state in time, which is affected by the force acted on them.

• A reward model determining the optimal policy based on the expected reward at a given time horizon.

Each component will be discussed in the following sections.

6.1.1 States

The model in this work consists of static obstacles and moving road users assumed to be pedestrians and vehicles. The pedestrians and vehicles move freely in the shared space and interact according to social group dynamics. The state space is the observed state of the shared space at a given point in time. It includes the physical properties of the autonomous agent and all the dynamic road agents in the observed space. The state s_i ∈ S_i for agent i, including our autonomous vehicle, consists of a measure of the position p_i = (x_i, y_i) and velocity v_i = (ẋ_i, ẏ_i), where

s_i = [p_i, v_i] (18)

The state space is the collective state s(t) ∈ S of all agents visible to our autonomous vehicle at time t. Let n be the number of agents including our autonomous vehicle; then the state space is the joint space S = S_1, S_2, ..., S_n.

6.1.2 Transition function

The transition function maps a current state s_i and an action a_i to a new state. The transition function is based on equations of motion. Vehicles and pedestrians are modelled as point masses to reduce complexity. As such, acceleration, integrated over time, results in a velocity. The transition function can be expressed as:

x_{t+1} = x_t + \dot{x}_t \Delta t + \frac{\ddot{x}_t \Delta t^2}{2}
\dot{x}_{t+1} = \dot{x}_t + \ddot{x}_t \Delta t
y_{t+1} = y_t + \dot{y}_t \Delta t + \frac{\ddot{y}_t \Delta t^2}{2}
\dot{y}_{t+1} = \dot{y}_t + \ddot{y}_t \Delta t        (19)


The transition function can be translated to the following linear system:

s_{t+1} =
\begin{bmatrix} x_{t+1} \\ \dot{x}_{t+1} \\ y_{t+1} \\ \dot{y}_{t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_t \\ \dot{x}_t \\ y_t \\ \dot{y}_t \end{bmatrix}
+
\begin{bmatrix} \Delta t^2/2 & 0 \\ \Delta t & 0 \\ 0 & \Delta t^2/2 \\ 0 & \Delta t \end{bmatrix}
\begin{bmatrix} \ddot{x}_t(a_t) \\ \ddot{y}_t(a_t) \end{bmatrix}
= T(s, a_t),        (20)

where (ẍ_i, ÿ_i) is the acceleration of the agent. The force, and therefore the acceleration, acting on the agent is calculated utilizing a social potential force method. The method is explained in more detail in Section 6.2.
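
A minimal sketch of how the transition T(s, a_t) of equation 20 can be evaluated for a single agent is given below; the state struct and function names are illustrative and not taken from the thesis code:

// State of one agent as in equation 18: position and velocity.
struct AgentState { float x, vx, y, vy; };

// Constant-acceleration kinematic update of equation 20 over one time step dt,
// where (ax, ay) is the acceleration produced by the selected policy/social force.
AgentState transition(const AgentState& s, float ax, float ay, float dt)
{
    AgentState next;
    next.x  = s.x  + s.vx * dt + 0.5f * ax * dt * dt;
    next.vx = s.vx + ax * dt;
    next.y  = s.y  + s.vy * dt + 0.5f * ay * dt * dt;
    next.vy = s.vy + ay * dt;
    return next;
}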

6.1.3 Action space and policies

The autonomous agent can select between a set of policies that govern the action a_i of the autonomous agent. The action a_i ∈ A_i is the acceleration determined by policy ξ_i. A policy is an artificial force, imposed by other agents and obstacles via the social potential force method, that controls the individual agent's motion. The policy set ξ, with elements ξ_i ∈ ξ, contains 3 distinct policies, expressed as:

ξ = {navigate, stop, follow} (21)

The navigate policy navigates the autonomous agent in the environment while avoiding static obstacles and other agents. The stop policy decelerates until a full stop is reached. The follow policy follows a leader vehicle placed in front of its path that is heading in the same direction as the agent. The individual policies are discussed in more detail in Section 6.2. and Section 6.2.3.

6.1.4 Reward model

The reward model captures the autonomous agent's desire to maintain high productivity and, at the same time, consider the safety of other traffic participants. This is possible by defining a function PG(S(ξ)) that calculates the agent's progress towards its destination and a function DT(S(ξ)) that gives a penalty for agents that are located within a certain distance from the agent.

The progress is the distance made towards the goal during the simulation horizon t_H with policy ξ_i (see Section 6.3. for more about the simulation process). The more progress the agent makes with a specific selected policy during the simulation, the more reward the selected policy gets. The function is mathematically expressed as:

PG(S(ξ)) = (p_v(t_H, ξ_i) − p_v(t)) · e_{v→g_v}, (22)

where p_v(t_H, ξ_i) is the position of our autonomous agent v after simulation with policy ξ_i, p_v(t) is the current position of the agent and e_{v→g_v} is the unit vector pointing from the current position of v towards its goal.

The DT(S(ξ)) function gives an accumulated penalty for agents located close to the autonomous vehicle during each time step t when simulating policy ξ_i. However, the agent should not get a penalty when it is close to the leading vehicle while utilizing the follow policy. Mathematically this function is expressed as follows:

DT(S(\xi)) =
\begin{cases}
\sum_{t'=t}^{t_H} -\left( \sum_{i=0,\, i \neq v}^{n} e^{-(|p_v - p_i|)} \right) & |p_v - p_i| < C \text{ and } \xi_i \neq follow \\
0 & |p_v - p_i| > C \\
\sum_{t'=t}^{t_H} -\left( \sum_{i=0,\, i \neq v,l}^{n} e^{-(|p_v - p_i|)} \right) & |p_v - p_i| < C \text{ and } \xi_i = follow
\end{cases}        (23)


where C is the specified minimum distance agents should keep from our autonomous vehicle v to avoid a penalty, n is the number of agents observed in the state space, and l is the identified leading vehicle.

The reward function is a linear combination of both functions:

R(s, ξi) = PG(S(ξ)) + αDT (S(ξ)) (24)

where α is a weighting factor. The agent will ultimately select the policy ξ_i that accumulates the highest reward.
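
To make the reward terms concrete, a small sketch of PG (equation 22) and of a single time step of DT (equation 23) follows; the vector type and helper names are illustrative assumptions, not the thesis implementation:

#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
static float dist(const Vec2& a, const Vec2& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// PG of equation 22: displacement over the horizon projected onto the unit
// vector pointing from the current position towards the goal.
float progressReward(const Vec2& posBefore, const Vec2& posAfter, const Vec2& goal)
{
    float gx = goal.x - posBefore.x, gy = goal.y - posBefore.y;
    float len = std::hypot(gx, gy);
    if (len == 0.0f) return 0.0f;   // already at the goal
    return ((posAfter.x - posBefore.x) * gx + (posAfter.y - posBefore.y) * gy) / len;
}

// One time step of DT in equation 23: every agent closer than C contributes
// exp(-distance); the leading vehicle is skipped when the follow policy is active.
float distancePenalty(const Vec2& egoPos, const std::vector<Vec2>& others,
                      float C, bool followPolicy, int leaderIndex)
{
    float penalty = 0.0f;
    for (int i = 0; i < static_cast<int>(others.size()); ++i) {
        if (followPolicy && i == leaderIndex) continue;
        float d = dist(egoPos, others[i]);
        if (d < C) penalty -= std::exp(-d);
    }
    return penalty;
}

// The total reward of equation 24 is then R = progressReward(...) + alpha * (accumulated penalty).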

6.2. Social force model for crowd dynamics

Autonomous navigation in a shared open space requires the system to be capable of estimating the behavior and movement of dynamic road users. Recent research has observed that the collective behaviors of individuals in certain situations follow precise patterns [97]. Many models of these patterns have been suggested to describe the rules of pedestrian and vehicle behavior in a collective composition. In particular, the SFM describing pedestrian crowd dynamics has gained a lot of attention [98]. The SFM has been extended in various ways to account for different scenarios such as mixed traffic. In this work, a multi-layer SFM is implemented to represent the operation of both vehicles and pedestrians in shared space.

6.2.1 Pedestrian forces

The SFM [97] considers the motion of a pedestrian to be the result of three main forces: an attractive force (i) f_i^{attr} that models a pedestrian's desire to reach a certain destination at a particular speed, a repulsive force (ii) f_{i,o}^{obs} that reflects the pedestrian's desire to keep a certain distance from obstacles, and an interaction force (iii) f_{i,j}^{int} that captures the interaction between pedestrians i and j. A fourth interaction force (iv) f_{i,y}^{int} that captures the interaction between pedestrian i and vehicle y is also added to the SFM to allow for mixed traffic situations.

The concept of the SFM is inspired by the theory of potential-based fields. The potential field method is based on a simple principle, i.e., the dynamic motion of traffic participants can be described by a field of forces. For this, each traffic participant is considered as an electrically-charged particle. The desired position acts as an attractive pole for the user. The obstacles in traffic are surrounded by a repulsive field in order to push away the road users. These two forces, together with the interaction forces, are added together to generate the complete social force in a shared space layout. The change of velocity v_i of pedestrian i at a given point in time is then given by the resultant equation when assuming a mass equal to the unit mass, m = 1 kg:

g_i(x) = \frac{dv_i}{dt} = f_i^{attr} + \sum_{o} f_{i,o}^{obs} + \sum_{j \neq i} f_{i,j}^{int} + \sum_{y} f_{i,y}^{int}        (25)

The separate forces are mathematically formulated below.

Attraction force

(i) \quad f_i^{attr} = \frac{dv_i}{dt} = \frac{v_i^0 e_i^0 - v_i(t)}{\tau},        (26)

demonstrates how the actual velocity v_i(t) of pedestrian i changes to the desired velocity v_i^0 and desired direction e_i^0 within a certain relaxation time τ.

Obstacle force

(ii) \quad f_{i,o}^{obs} = a\, e^{-d_o/b},        (27)


describes how the distance d perpendicular to the object influences the repulsion strength. Both a and b are constants.
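
As a small illustration, the attraction force of equation 26 and the obstacle force of equation 27 can be written as follows; the vector type, the operator helpers and the choice of repulsion direction (equation 27 only specifies a magnitude) are illustrative assumptions:

#include <cmath>

struct Vec2 { float x, y; };
static Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static Vec2 operator*(float s, Vec2 v) { return {s * v.x, s * v.y}; }

// Attraction force (equation 26): relax the current velocity towards the
// desired speed v0 along the desired direction e0 within the relaxation time tau.
Vec2 attractionForce(Vec2 velocity, Vec2 e0, float v0, float tau)
{
    return (1.0f / tau) * (v0 * e0 - velocity);
}

// Obstacle force (equation 27): repulsion strength a * exp(-d/b), decaying with
// the distance d to the obstacle, applied along a unit vector pointing away
// from the obstacle (the direction is an assumption made for this sketch).
Vec2 obstacleForce(float d, Vec2 awayFromObstacle, float a, float b)
{
    return (a * std::exp(-d / b)) * awayFromObstacle;
}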

Interaction forces

(iii) \quad f_{i,j}^{int} = f_\theta(d, \theta)\, t_{ij} + f_v(d, \theta)\, n_{ij},        (28)

where the equation is composed of two distinct forces describing pedestrian i's desire to mainly change direction when encounters occur towards the side and to decelerate faster when frontal encounters occur.

The component t_{ij} = D_{ij}/||D_{ij}|| specifies the interaction direction and n_{ij} is the normalized vector of t_{ij} oriented to the left. The interaction vector is equal to D_{ij} = λ(v_i − v_j)e_{ij}, where e_{ij} is the direction of pedestrian i from j, v_i − v_j is the relative motion, and λ is a weighting factor.

The two forces fθ(d, θ) and fv(d, θ) are expressed with the following mathematical functions:

(iii a) \quad f_\theta(d, \theta) = -A K\, e^{-d/B - (nB\theta)^2}        (29)

(iii b) \quad f_v(d, \theta) = -A\, e^{-d/B - (n'B\theta)^2},        (30)

where d_{ij} is the distance between pedestrians i and j, n′ and n are constants reflecting the angular interaction, A is a constant model parameter, B = γ||D_{ij}|| and θ is the angle between t_{ij} and e_{ij}. The factor K = θ/|θ| reflects the sign of θ to model the binary decision to avoid pedestrian j to the left or to the right.
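
The angular interaction terms can be transcribed almost directly from equations 29 and 30; the sketch below follows the stated formulas (parameter names as in the text, with the sign factor K handled explicitly):

#include <cmath>

// f_theta of equation 29: sideways evasion term, scaled by the sign K of theta.
float interactionFTheta(float d, float theta, float A, float B, float n)
{
    float K = (theta >= 0.0f) ? 1.0f : -1.0f;   // K = theta/|theta|
    return -A * K * std::exp(-d / B - std::pow(n * B * theta, 2.0f));
}

// f_v of equation 30: deceleration term for (near-)frontal encounters.
float interactionFV(float d, float theta, float A, float B, float nPrime)
{
    return -A * std::exp(-d / B - std::pow(nPrime * B * theta, 2.0f));
}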

(iv) \quad f_{i,y}^{int} = A\, e^{(r_{iy} - d_{iy})/B}\, n_{iy},        (31)

demonstrates that pedestrian i wants to keep a certain distance from vehicle y, which is influenced by the distance between them. A decreasing distance yields a more repelling force exerted from vehicle y onto pedestrian i to prevent close interactions. The components A and B are constants reflecting the strength and reaction of the force, r_{iy} is the sum of their radii and n_{iy} is the normalized vector pointing from vehicle y to pedestrian i.

6.2.2 Vehicle forces

For vehicles, the microscopic model presented in this section is based on the works [99] [96] [100]. Similar to pedestrians, the model consists of four distinct forces that govern the dynamics of the vehicle: an attractive force (i) f_y^{attr} that models a vehicle's desire to reach a certain destination at a particular speed, a repulsive force (ii) f_{y,o}^{obs} that reflects the vehicle's desire to keep a certain distance from obstacles, and the interaction forces (iii) f_{y,i}^{int} and (iv) f_{y,δ}^{int} that capture the vehicle-pedestrian interaction and vehicle-vehicle interaction respectively. Summarizing the forces yields the resultant equation that describes the change of velocity v_y of vehicle y at a given point in time:

f_y(x) = \frac{dv_y}{dt} = f_y^{attr} + \sum_{o} f_{y,o}^{obs} + \sum_{i} f_{y,i}^{int} + \sum_{\delta \neq y} f_{y,\delta}^{int}        (32)

Summarizing all forces

The attractive force and obstacle force are similar to the ones applied for pedestrians in the original SFM. A fundamental feature of vehicles is the effective field of view of the driver, as depicted in Figure 25. The effective field of view restricts the vehicle to directional changes and prevents lateral motion. The interaction forces will therefore only be applied when a road user is located in the forward field of view or the rear-view field of view of vehicle y. In this work, pedestrians are given priority when conflicts occur, which is modelled in force (iii). The interaction forces are mathematically expressed as follows:


(iii) \quad f_{y,i}^{int} = \frac{dv_y}{dt} =
\begin{cases}
\dfrac{(-v_y^0 e_y^0) - v_y(t)}{\tau} & -50 \leq \varphi_{y\delta} \leq 50 \\
0 & \varphi_{y\delta} < -50 \text{ or } \varphi_{y\delta} > 50
\end{cases}        (33)

where the force applies a braking force when a pedestrian is encountered in the forward field of view −50 ≤ φ_{yδ} ≤ 50. The angle φ_{yδ} is the angle between the vehicles, as depicted in Figure 24. In this way, the vehicle behavior is to avoid collisions with pedestrians. The model is similar to the driving force except that a force is applied in the opposite direction of e_y^0, which is the unit vector in the direction of the vehicle's velocity.

Figure 24: Demonstrates the angle φ_{yδ} between the moving directions of vehicle y and vehicle δ

Figure 25: Effective field of view (forward vision and rear-view vision) for human drivers

(iv) \quad f_{y,\delta}^{int} =
\begin{cases}
A\, e^{(r_{y\delta} - d_{y\delta})/B}\, n_{y\delta} & -50 \leq \varphi_{y\delta} \leq 50 \text{ and } ||\delta_p(t+\Delta t)|| \geq ||\delta_p|| \\
f_\theta(d, \theta)\, t_{y\delta} + f_v(d, \theta)\, n_{y\delta} & -50 \leq \varphi_{y\delta} \leq 50 \text{ and } ||\delta_p(t+\Delta t)|| < ||\delta_p|| \\
A\, e^{(r_{y\delta} - d_{y\delta})/B}\, n_{y\delta} & (180 - 50) \leq \varphi_{y\delta} \leq (180 + 50) \\
0 & \text{else}
\end{cases}        (34)

where the force applies a repelling force when vehicle δ is moving away from vehicle y (||δ_p(t + Δt)|| ≥ ||δ_p||) and is observed in the forward field of view. The vehicle will then decelerate and adapt to the vehicle in front. In case the other vehicle is heading closer (opposite direction), a frontal collision is about to happen. The vehicle must therefore compute directional changes to avoid a collision. A repelling force is also applied when vehicles are observed in the rear-view field to capture the desire to avoid rear-end collisions. The vehicle accelerates when encountering a vehicle near its back.
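
A minimal sketch of the field-of-view gating used in equations 33 and 34, together with the braking term of equation 33 reduced to a scalar along the heading (a simplification assumed here for brevity):

// Field-of-view checks (angles in degrees): forward view |phi| <= 50,
// rear view 180 +/- 50, as used in equations 33-34.
bool inForwardView(float phiDeg) { return phiDeg >= -50.0f && phiDeg <= 50.0f; }
bool inRearView(float phiDeg)    { return phiDeg >= 130.0f && phiDeg <= 230.0f; }

// Braking term of equation 33 projected onto the heading e0: applied only when
// a pedestrian is inside the forward field of view; the result is negative,
// i.e. a deceleration.
float pedestrianBrakingTerm(float desiredSpeed, float currentSpeed, float tau, float phiDeg)
{
    if (!inForwardView(phiDeg)) return 0.0f;
    return (-desiredSpeed - currentSpeed) / tau;
}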

6.2.3 SFM-based policies

The navigate policy moves the autonomous agent v according to equation 32:


f_v^{navigate}(x) = \frac{dv_y}{dt} = f_y^{attr} + \sum_{o} f_{y,o}^{obs} + \sum_{i} f_{y,i}^{int} + \sum_{\delta \neq y} f_{y,\delta}^{int}        (35)

Non-autonomous vehicles are assumed to always move according to the same navigation model, while pedestrians are simulated with the model in equation 25. Unlike the other road users in the environment, the autonomous vehicle is the only one capable of switching between policies for adapting to different scenarios. The stop policy applies a braking force similar to the force described in equation 33:

f_v^{stop}(x) = \frac{dv_y}{dt} = \frac{(-v_{max} e_y^0) - v_y(t)}{\tau}        (36)

The difference is the constant value v_{max} that captures the maximum deceleration force applicable to the autonomous vehicle.

The follow policy is a modification of the navigate policy. It appends an extra force (v) f_{y,δ=l}^{fol} to the model, resulting in the following equation:

f_v^{follow}(x) = \frac{dv_y}{dt} = f_y^{attr} + \sum_{o} f_{y,o}^{obs} + \sum_{i} f_{y,i}^{int} + \sum_{\delta \neq y} f_{y,\delta}^{int} + f_{y,\delta=l}^{fol}        (37)

The following force captures the vehicle-following feature observed in the work of Helbing et al. [99]. The feature is only applied when a vehicle in front is heading in the same direction as the autonomous agent. The autonomous agent adapts its velocity according to the leading vehicle in front to avoid a collision. When this force is applied, the interaction force f_{y,δ=l}^{int} between vehicle y and the leading vehicle δ = l is temporarily set to zero. These features are captured with the following equations:

f_{y,\delta=l}^{fol} = \left( -\frac{v_y^0 e_y}{\tau_y}\, e^{\frac{d(v_{y\delta}) - d_{y\delta}}{B'_{y\delta}}} - \frac{\Delta v_{y\delta}}{\tau'_y}\, e^{\frac{d(v_{y\delta}) - d_{y\delta}}{B''_{y\delta}}}\, \Theta(\Delta v_y) \right) p        (38)

\text{and} \quad \Theta(\Delta v_y) =
\begin{cases}
1 & \Delta v_y > 0 \\
0 & \text{else}
\end{cases}        (39)

where the braking force for adapting to the leading vehicle should only be applied when the following vehicle has a larger relative velocity Δv_y than the leading vehicle. The braking also depends on the safe distance d(v_{yδ}) = d_y + T_y v_y, which is a function of T_y, the safe time headway, and d_y, the minimum distance between the vehicles. τ_y and τ'_y are the acceleration time and braking time respectively, while B'_{yδ} and B''_{yδ} are the ranges of the acceleration interaction and braking interaction respectively. To find whether the vehicles are moving somewhat confluently (p = 1), the following equation must hold:

p =
\begin{cases}
1 & |\varphi_{y\delta}| > 20 \text{ or } |\varphi_{y\delta} + \varphi_{\delta y} - 180| < 20 \\
0 & \text{else}
\end{cases}        (40)

When p = 0, the following force is not applied and the interaction force takes over the management of the interaction between the vehicles.
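
The two gating terms can be transcribed directly; the sketch below mirrors the conditions exactly as stated in equations 39 and 40 (angles in degrees):

#include <cmath>

// Theta of equation 39: the braking term is active only for a positive relative velocity.
int thetaGate(float deltaV) { return deltaV > 0.0f ? 1 : 0; }

// Confluence flag p of equation 40, as stated in the text.
int confluenceFlag(float phiYDelta, float phiDeltaY)
{
    return (std::fabs(phiYDelta) > 20.0f ||
            std::fabs(phiYDelta + phiDeltaY - 180.0f) < 20.0f) ? 1 : 0;
}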

6.3. Solving the MDP with multi-policy decision-making

The decision-making system solves the MDP by simulating the overall effect each discrete policy has on the environment during a time horizon t_H. The system first obtains information about the current state of the environment to prepare for simulation. Once data has been gathered, the movement of the simulated agents is propagated according to the social forces specified in Section 6.2. for every time step t. The algorithm then evaluates each policy according to equation 23 during the whole time horizon by penalizing the agent for any close encounters made with road users at each


time step. After the simulation, a second evaluation is made according to equation 22, where the agent is evaluated based on the overall progression made towards its goal. The policy that obtains the maximum reward is ultimately selected. The agent then moves according to the selected policy. The algorithm is presented below.

Algorithm 1 MPDM(S, t_H, ξ, N_v, N_p)

1:  for ξ_i ∈ ξ do
2:      X = {}, R = {}
3:      for t' = t, t + Δt, ..., t_H do
4:          a_v = f_v^{ξ_i}(S(t'))
5:          V_v(t' + Δt) = T(S_v, a_v(t'))            \\ Propagate the autonomous vehicle
6:          for i ∈ 1...v−1, v+1...N_v do             \\ For number of vehicles
7:              a_i = f_i(S(t'))
8:              V_i(t' + Δt) = T(S_i, a_i(t'))        \\ Propagate vehicles
9:          end for
10:         delete a
11:         for i ∈ 1...N_p do                        \\ For number of pedestrians
12:             a_i = g_i(S(t'))
13:             P_i(t' + Δt) = T(S_{i+N_v}, a_i(t'))  \\ Propagate pedestrians
14:         end for
15:         X = V_i(t'+Δt)...V_{N_v}(t'+Δt) + P_i(t'+Δt)...P_{N_p}(t'+Δt)   \\ The state of all agents
16:         Z = Z + DT(X)                             \\ Give penalty in every time step
17:     end for
18:     R.add(PG(X) + αZ)
19: end for
20: return ξ* = arg max_ξ(R)
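
A hypothetical C++ rendering of Algorithm 1 is sketched below (this is not the thesis code); World and the three helper functions are placeholders standing in for the SFM propagation and the reward terms of equations 22-24:

#include <vector>

enum class Policy { Navigate, Stop, Follow };
struct World { /* positions and velocities of all agents */ };

static World  simulateOneStep(const World& w, Policy) { return w; }   // SFM propagation (placeholder)
static double distancePenaltyDT(const World&)         { return 0.0; } // DT, equation 23 (placeholder)
static double progressPG(const World&, const World&)  { return 0.0; } // PG, equation 22 (placeholder)

// Forward-simulate every candidate policy over the horizon, accumulate the
// penalty at each step, score with equation 24 and return the best policy.
Policy mpdm(const World& current, int steps, double alpha, const std::vector<Policy>& policies)
{
    Policy best = policies.front();            // assumes a non-empty policy set
    double bestReward = -1e30;
    for (Policy policy : policies) {
        World world = current;
        double penalty = 0.0;
        for (int k = 0; k < steps; ++k) {      // simulate until the horizon tH
            world = simulateOneStep(world, policy);
            penalty += distancePenaltyDT(world);
        }
        double reward = progressPG(current, world) + alpha * penalty;   // equation 24
        if (reward > bestReward) { bestReward = reward; best = policy; }
    }
    return best;                               // arg max over the policy set
}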

6.4. Simulation environment

The proposed model has been implemented in the Visual Studio IDE and the application is written in C++. The FREEGLUT library is integrated in order to create and manage the window system, initialize the OpenGL context and handle input events through the keyboard. A mathematical library for graphical programming is also integrated. A typical but challenging quarry scenario has been created which further evaluates the presented approach, as shown in Figure 26. The desired speeds of the traffic participants are Gaussian distributed with a mean value of approximately 1.29 and a standard deviation of about 0.19 [97].

The simulation environment examines the behavior of the decision system in a scenario where the agent has to navigate in a populated construction environment to reach diverse destinations. Several simulations with different constellations for the movement of the dynamic agents were evaluated. The value for the step time Δt was set to 0.1 ms and the horizon to t_H = 1000 ms. The agent takes decisions every 300 ms based on the timer trigger for the MPDM algorithm. For the reward equation 24 we set α = 14. Any value under this makes the agent favor progression over safety, which is not desired. We assume the positions of the agents, the goals and the obstacles are provided to the system before simulation.
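
For reference, the simulation constants stated above can be gathered in one configuration record; the struct and field names below are illustrative, while the values are those used in this work (C is the distance threshold discussed in Section 8.1.4):

// Parameter values reported in this section (units as given in the text).
struct SimulationConfig {
    float stepTime       = 0.1f;     // simulation step time, delta t
    float horizon        = 1000.0f;  // forward-simulation horizon tH
    float decisionPeriod = 300.0f;   // timer trigger of the MPDM algorithm (ms)
    float alpha          = 14.0f;    // weighting factor in equation 24
    float minDistanceC   = 2.5f;     // C in equation 23
};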

6.5. WCET test scenario

The overall performance of the system is dependent on the number of obstacles and agents observed before simulation. A scenario capturing the worst-case situation is therefore implemented in order to estimate the WCET of the individual SWCs.

The worst-case scenario is based on the absolute maximum number of dynamic users and objects the vision system is likely to observe in a 500 m² area. The distribution between the different traffic participants and objects is 10 pedestrians, 10 static obstacles and 20 vehicles. In this way, the calculated WCET is guaranteed to provide an upper bound for the execution time of the system. We simulate the scenario with these restrictions to estimate the WCET of the independent tasks. Once


the WCET is collected for the individual SWCs, the WCET can be calculated for any other desired scenario without the requirement to simulate further. The structure of the software architecture captures the WCET for simulating a single agent in the SWCs Move Vehicle and Move Pedestrian depicted in Figure 20. These SWCs are in reality called once for every single detected pedestrian or vehicle in the environment to simulate their individual movement. This allows the WCET estimation for scenes with other distributions to be a matter of simply adding or subtracting the WCET for individual agents.
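
The scaling described above amounts to adding or subtracting per-agent contributions; a small sketch (illustrative names, with the per-agent WCET values taken from measurements such as those in Table 3):

// WCET estimate for a scene with a different agent mix than the measured one.
double sceneWcet(double baselineWcet, int extraVehicles, int extraPedestrians,
                 double wcetPerVehicle, double wcetPerPedestrian)
{
    return baselineWcet + extraVehicles * wcetPerVehicle
                        + extraPedestrians * wcetPerPedestrian;
}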

Figure 26: A presentation of the simulated environment where the black dots represent pedestrians and the blue circles represent vehicles


7. Results

In this chapter, a detailed discussion of the overall outcomes from the system, based upon the previously discussed implementation, is presented. First, the reaction of multiple forces on different traffic participants is analyzed. Following this, the selection of an optimal policy and how it substantiates the effectiveness and robustness of the proposed approach is discussed. The chapter ends with presenting the results of the WCET estimations and the end-to-end analysis.

7.1. How multiple forces influence the motion of traffic participants

In order to evaluate the behavior of the implemented system, a set of experiments have been performed in a controlled simulated environment. The final outcomes from the simulation evidence how well the proposed algorithm performs while the traffic is in constant motion. Although we have modelled the behavior of both pedestrians and vehicles, we are mostly interested in analyzing the performance of the autonomous vehicle alone. In particular, we analyze the following scenarios:

• Motion of the autonomous vehicle with and without any obstacles.

• In response to static obstacles alone in the scenario.

• In response to interaction of multiple traffic participants.

The observations of the associated forces on each traffic participant moving in the simulated quarry are presented below.

7.1.1 Motion actions of traffic participants towards goal

In the experimental setup (see Figure 27), the green circle, representing the autonomous vehicle, is moving towards its goal position without the intervention of any traffic obstacles. The calculated path is the shortest straight path towards the goal.

Figure 27: Shows the behavior of the agent while navigating towards the goal without disturbance

Another experiment conducted (see Figure 28) illustrates the change in the trajectory of the autonomous vehicle due to the presence of other users located on its path to the desired destination.


All dynamic traffic participants were randomly set to travel to their respective destination locations. The change in maneuver results in a deviation of the vehicle's velocity and direction, which is later corrected once no obstacles are observed in close proximity.

Figure 28: The autonomous agent adapts its velocity and direction towards the left as one of the vehicles is located in its path

7.1.2 Avoidance of obstacles

All traffic users always maintain a safe distance from the boundary as well as from other static obstacles within a certain vicinity. As depicted in Figure 29, the autonomous vehicle takes the passing maneuver action when it is in close proximity to the obstacle. The closer to the boundary or static obstacle, the more the autonomous vehicle decelerates and aims to change its rate of turn. It is observed from the simulation result that the velocity of the autonomous vehicle decreases rapidly as the distance between the user and the closest vertex of the static object decreases. The overall result of the force is the formation of a conflict-free passing action.

Figure 29: Shows how the autonomous vehicle avoids static obstacles detected on its path


7.1.3 Interaction force among traffic users

A constant traffic flow and the avoidance of stop-and-go behavior are observed between traffic users in the shared space. Figure 30 illustrates the mutual adjustment in the direction of various interacting traffic users. It is observed that all the road users maintain a minimal safe distance from each other while in motion. The interaction force is in effect when the traffic users are in close vicinity to each other. However, the deceleration effect is strongest when they confront other traffic participants from the front and is reduced towards the side. For such an encounter, road users are bound to select a side trajectory to pass the conflict. In this work, the resulting choice of passing maneuver is moderately biased.

Figure 30: Shows how agents interact with each other; the autonomous vehicle stops in captures 3 and 4 when pedestrians move towards it

7.1.4 Selection of an optimal policy in a typical construction terrain

The decision system is observed to take tactical decisions in order to handle different scenarios. The interaction between the autonomous vehicle and other traffic participants depends upon the estimated behavior of the other users. The autonomous vehicle's selection of behavior can be represented with the following scenarios and observations:

• Figure 31 shows the approaching direction of vehicle 1 in a traffic intersection. The vehicle is estimated to turn right and hence not affect the trajectory of the autonomous vehicle. The autonomous vehicle will in this situation continue to move towards its goal by choosing the navigate policy.

• Figure 32 captures vehicle 1 moving straight towards the autonomous vehicle. Therefore, the appropriate behavior of the autonomous vehicle is to choose the stop policy in order to avoid a conflict. The motion of the autonomous vehicle resumes once vehicle 1 has moved away from the autonomous vehicle. The autonomous vehicle is observed to always stop when frontal encounters are estimated.

• Figure 33 demonstrates the agent's action when vehicle 1 is proceeding towards the same destination as the autonomous agent. As both vehicles have the same orientation and the


position of vehicle 1 is closer to the destination, the autonomous vehicle decides to select the follow policy in order to adapt its velocity to the leading vehicle. The agent can also follow other vehicles that are heading in the same direction but moving in parallel, as depicted in Figure 34.

Figure 31: The vehicle in front is estimated to navigate to the right and as such the agent continues on its path

Figure 32: The agent stops when vehicles are heading towards it

Figure 33: The vehicle in front is heading in the same direction; as a result, the agent chooses to follow the leading vehicle


Figure 34: Although the vehicles are parallel and not in a straight line, the agent still considers the leading vehicle and follows it, as they have the same direction and orientation

7.1.5 WCET estimation

In order to evaluate the real-time characteristics of the deployed system, timing analysis was performed in Rubus-ICE with a total analysis time of 00 minutes and 07.098 seconds. Table 1 illustrates the statistics of the analysis tool. The measured WCET of each software component is listed in Table 3. In Rubus-ICE we provided the age and reaction constraints as 600 ms. As Table 2 depicts, the calculated age and reaction delays meet their deadlines; hence, the employed system satisfies the required timing constraints.

Table 1: Analysis statistics

Name                          Value
Iterations                    1
Number of mode combinations   1

Table 2: Analysis result

Constraint     Maximum delay   Calculated delay   Mode combination
AgeNew         600 ms          279909 µs          TBD
ReactionNew    600 ms          579909 µs          TBD

Table 3: WCET measurement for 10 pedestrians, 10 obstacles and 20 vehicles

Software Component    WCET
MPDM                  0.0022 ms
Sim Crowd             0.000102 ms
Move Vehicle          0.012 ms
Vehicle Force         0.1691 ms
Move Pedestrian       0.0033 ms
Pedestrian Force      0.0855 ms
Follow Vehicle        0.00010 ms
Follow Forces         0.0064 ms
Stop Vehicle          0.000028 ms
Cost                  0.00117 ms
Optimal Policy        0.0000010 ms


8. Discussion

The previous chapter presents the results from the simulations and visualizes the overall performance of the implemented system. This chapter analyses these results and also discusses the implementation of the proposed solution.

8.1. Analysis of the results

Over the next subsections, the results from the previous chapter are discussed. The discussion is performed in the same order as the results were presented.

8.1.1 Motion actions of traffic participants towards goal

The movement of traffic participants towards their respective goals is caused by the implemented driving force. However, with the purpose of achieving a collision-free driving maneuver in a mixed traffic scenario, consideration of the driving force alone is not sufficient. For example, if a static obstacle lies between the current position of a user and its respective goal, a passing maneuver is necessary to avoid the conflict. For this purpose, the implemented repulsive force exerted by objects will keep a certain offset from the obstacle in close proximity.

At present, the road users do not consider the location of any static obstacle while planning their path towards the destination. In reality, once a static obstacle is located, a diverse route for avoiding a close encounter with the obstacle is planned. However, consideration of this principle may differ between traffic participants due to their diverse fields of sight. Therefore, a future implementation of a global planner together with a mapping and localization algorithm would overcome such problems and certainly improve the calculation of the shortest path towards a goal.

The driving force is further dependent on knowing the desired goal of the road users. In this work, the goals of the agents are provided to the system. Such information is rarely known beforehand as it requires knowing the momentary intention of the agent. Thus, estimating the heading of users in the shared space requires the decision system to be provided with data on how these users normally behave when certain characteristics about their state are found. A classification and analysis of possible intentions based on real test data can consequently make the system more robust for motion prediction.

8.1.2 Obstacle avoidance

The traffic users constantly avoid physical contact with any static obstacles or boundaries through repulsive forces. The strength of this repulsive force is based upon the distance between the position of the traffic participant and the nearest point of the static obstacle. As the user moves closer to an obstacle, the repulsive force increases consistently, which simultaneously decreases the acceleration of motion. The concept of a repulsion force is in close correspondence to the working principle of ultrasonic sensors. Therefore, the effect of this force is valid within a certain enclosure distance of dynamic traffic participants.

As can be observed from the results, the overall effect of the repulsion force is that agents are unaffected by obstacles at far distances, but as they move closer, a large deceleration is performed to avoid a collision. To prevent such behavior, and provide for a smoother traffic flow, the system is required to consider the obstacle at earlier distances or beforehand for re-planning an intermediate route. This is usually provided by path planning algorithms. In this work, we assume it is provided to the system. However, the repulsion force provides a mechanism for avoiding static obstacles where the environment is only partially known to the planning algorithm and new information is frequently being uncovered.

8.1.3 Interaction force among traffic users

In a model of mixed traffic schemes, social behavior and interaction between cars and pedestrians are two major factors to take into consideration. The implementation of the interaction forces is one of the leading challenges in a shared space system. The accuracy of the system to predict their movement


lies in the credibility of the models describing these forces. Hence, the models are required to be validated and calibrated on empirical data.

The pedestrian interaction force is based on the work of Moussaid et al. [97], where agents are observed to change direction mainly when a potential conflict is seen. Only during direct encounters do the agents decelerate profoundly. The simulation captures this behavior. As of now, the system assumes pedestrians desire to move at a constant convenient velocity. This is true for most cases, but sudden changes in behavior must still be accounted for to provide a completely reliable and adaptable system. The system further assumes the pedestrians desire to prevent collisions with vehicles. The overall result observed is that pedestrians quickly accelerate to move away from vehicles if they are too close.

Vehicles, on the other hand, decelerate once a pedestrian is found. They give priority to ensure safe interaction. The interaction among vehicles is much harder to model in a space without traffic rules. It requires the drivers to act according to social rules, such as eye contact, to ensure no accidents occur. This means the system is required to have a way of signaling its own intention to other drivers to make an agreement on how different situations should be handled.

The results show that the system stops and gives way to all vehicles. During direct encounters, the autonomous vehicle stops and awaits the involved vehicle to move away before continuing to move. When a vehicle is spotted behind and in close proximity, the autonomous vehicle accelerates to prevent rear-end collisions. However, for vehicles placed in front, the autonomous vehicle adapts to their speed. As of now, the system is not capable of overtaking any vehicles, as it would require an agreement with the vehicles in front before acting.

The behavior of vehicles in shared space needs to be more closely investigated to model these complex interactions [100]. Prevention of head-on encounters is solved by applying a force similar to the pedestrian interaction force. Generally, lanes make sure vehicles heading in opposite directions are separated. However, in a shared space frontal encounters are possible; as such, the vehicles are required to change the angle of direction to prevent a collision. In reality, the change of angle should be calculated at a far distance to keep a certain offset between vehicles. In narrow spaces, the situation looks different and requires close cooperation between drivers to decide who should give way to whom. Models that capture these behaviors are important to provide an accurate prediction of traffic participants in shared space.

8.1.4 Selection of an optimal policy in a typical construction terrain

The selection of policy is mainly dependent on the implemented cost function. There is no standard metric or method for defining reward models [1]. As designers, we are required to explicitly define what the agent should aspire to. Evaluation of the cost functions is made by simulating the decision system in several situations to ensure the appropriate behavior is achieved. The results clearly show this is the case.

The autonomous vehicle will stop at a certain distance whenever agents are moving towards it. The distance is defined by the C parameter in equation 23. The parameter is set to C = 2.5 as it was observed that a higher value made the agent overly cautious, decelerating a lot in order to keep a safe distance to other agents. The same holds true for values α > 15. The agent then has a very low progression due to the penalty gained from being close to other agents. However, any value of α < 12 has the opposite effect. The agent mainly favors progression and the avoidance of collisions will mainly be handled by the navigation force alone, which allows considerably closer encounters. A balanced behavior is observed when α = 14; the system will then maintain high progression without the need to stop too frequently while still keeping a safe distance to users.

The desire for high progression motivates the autonomous vehicle to always select the follow policy whenever a vehicle is in front of it and heading towards the same destination. However, the leading vehicle and the autonomous vehicle are not required to form a straight line for the follow policy to be selected. Furthermore, the autonomous vehicle will not change position to form a line whenever the policy is selected. This is not required in an open space environment and it might further frighten other drivers if employed. Instead, the autonomous vehicle will observe the leading vehicle from a distance and follow as long as they are close enough and heading in the same direction.


8.1.5 WCET estimation and end-to-end timing analysis

The results presented clearly provide evidence that the system is capable of performing real-time decisions, as the age and reaction constraints are met. The WCETs of the individual components are relatively small but highly dependent on the number of agents observed in the space. From the results, it is evident that the system is scalable when it comes to adding more policies to the system. However, to further validate the reliability of the decision-making module, the module needs to be tested on a real system. In such a context the tasks will be required to communicate with other systems to perform all expected operations for navigation, including collecting and mapping data regarding the state of the environment with the various vision and localization algorithms, timing and monitoring of the current state of the agent from other control systems, and acquiring information about intermediate destinations from planning algorithms for obstacle avoidance. In this multi-rate real-time system, it is especially important to calculate the end-to-end delays in the data chain to predict the complete timing behavior of the independent tasks. We deploy this practice for the proposed system to demonstrate the analysis process and verify the current deployment.

8.2. Simulation environment and parameters

The deployed simulations were created to represent a typical construction scenario that requires the system to take tactical decisions to avoid collisions while still considering high-level goals. The implementation of the simulation required determining many parameters, such as the parameters for the various social forces and the constant values for the reward model. The accuracy of the system is dependent on the choice of parameters. These values are required to be calibrated according to real empirical data of shared-space schemes. Most values are therefore set according to the works [97], [100], [99] and [96]. However, the desired velocity of the vehicles in the simulation was set to a profoundly lower value, since construction vehicles operate at lower speeds compared with regular cars. This made some interaction parameters calibrated for regular cars too high. These values, together with the parameters for the reward model, were determined experimentally by trial and error until appropriate behavior was observed. Typically these values need to be properly tuned and calibrated based on real empirical experiments to draw definite conclusions from the results. As such, further investigation to validate the parameter settings needs to be performed in future work.


9. Conclusion and future work

This chapter first summarizes the results and discussions of the previous chapters together with explaining how the proposed thesis objectives were met. At the end of the chapter, a few suggestions on how the implemented method could be further improved are discussed.

9.1. Formulating a tactical decision-making system as an MDP framework

In this thesis work, we have implemented a joint behavior generation and trajectory planning algorithm for maneuvering in dynamic environments. The solution integrates the strengths of the SFM and the MDP for modelling the complex environment and defining various methods for tackling difficult situations, including estimating the movement of road users. Estimation is based on the concept that individuals act according to stochastic social forces (e.g., attraction, repulsion and interaction) in certain situations. Once data is gathered about the position and velocity of traffic participants, the system is able to evaluate possible future behaviors. A decision on how to handle potential conflicts is then selected from a set of allowed behaviors (discrete policies) imposed on the system.

We identified a key uncertainty regarding the interaction between the autonomous vehicle and other traffic participants, which is later demonstrated in our model. In particular, the framework captures the complex interaction among different traffic participants (vehicles and pedestrians) in a shared space compared to traditional street layouts. Furthermore, the proposed model is capable of solving multiple conflict strategies, i.e., pedestrian interactions, vehicle interactions as well as pedestrian-vehicle interactions. The MDP prototype is implemented in a simulated platform capable of modelling the behavior of traffic users, collision detection with resolution, and visualization.

9.2. Solving an MDP prototype by selecting an optimal policy for the system

One of the objectives of this thesis work is to select an optimal policy in a dynamic traffic scenario. As shown in the simulation results (see Section 7.), the automated driving system obtains a feasible and safe driving maneuver based on the selected policy. In the proposed solution, the reaction of the autonomous vehicle alters with different traffic participants, i.e., the autonomous vehicle will always select the stop policy when it is close to pedestrians. On the other hand, choosing a policy with respect to other vehicles in the traffic differs for the variety of situations arising in social dynamic environments. The behavior of such a system is not only reliable and efficient but also makes the pedestrians feel safe. Switching between the proposed set of policies is realized by a utility function. The autonomous vehicle gets a reward for progress made towards the goal, and is punished with a penalty for creating inconvenience to the other traffic participants.

9.3. Effect of system load on real-time characteristics

Model checking is a widely used industrial technique for formal verification of embedded systems. The autonomous vehicle must take decisions for navigation in real time to ensure safety and maintain its performance. For that, the real-time characteristics (WCET) of each software component are measured and later evaluated. The timing analysis results show that all deadlines of the deployed software architecture are met.

9.4. Future work

With reference to existing solutions, which were however intended for simplified traffic conditions (rule-based traffic) and participants (either vehicles or pedestrians), the application of the proposed system is the basis for further new research in this domain. The application offers a variety of benefits related to the problem specification, decision flexibility, and resource optimization.

However, the framework can be further improved by testing the algorithm with realistic dynamic models of vehicles and pedestrians in the traffic. Following this, future work will focus on refining the vehicle model with varying sizes and types together with different velocities for pedestrians and vehicles.


The simulated solution for navigation is implemented as a unified model which obviates the requirement of any other external module to perform the tasks of mapping, localization, obstacle detection, and planning. However, when validating the performance of the predictive navigation experimentally, these external modules are rather important. Future work would include conducting performance testing of the methods on an autonomous vehicle and shall abundantly demonstrate the generality of the approach by applying it to other scenarios.

The reliability of a decision-making system strongly depends upon tracking the unknown intentions of other traffic participants and filtering the noise present in any sensor data. This thesis work did not include these uncertainty constraints. Formulating the solution in a POMDP framework together with the integration of an advanced belief state update algorithm would certainly improve the performance of the estimator (see Section 2.2.2).

The primary objective of the proposed algorithm is to provide a prototype for a decision-making system navigating in a cluttered dynamic traffic scenario. Therefore, to carry out the task on a real platform, a thorough optimization of the algorithm is of utmost importance. In addition to this, future work may seek validation of real-time constraints on WCET estimations incorporating a standardized industrial benchmark method.

Another interesting topic of future work for VCE would be to incorporate explicit and implicit communication between the autonomous vehicle and the control room. This facilitates the gathering of multiple traffic data before the construction work starts. The implementation of such a communication mechanism may be complex, but it will certainly improve the overall performance of the system.


References

[1] D. Mehta, G. Ferrer, and E. Olson, "Autonomous navigation in dynamic social environments using multi-policy decision making," in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 1190–1197.

[2] T. Okuyama, T. Gonsalves, and J. Upadhay, "Autonomous driving system based on deep q learnig," in 2018 International Conference on Intelligent Autonomous Systems (ICoIAS), March 2018, pp. 201–205.

[3] J. Rowley, A. Liu, S. Sandry, J. Gross, M. Salvador, C. Anton, and C. Fleming, "Examining the driverless future: An analysis of human-caused vehicle accidents and development of an autonomous vehicle communication testbed," in 2018 Systems and Information Engineering Design Symposium (SIEDS), April 2018, pp. 58–63.

[4] X. Chen and Y. Miao, "Driving decision-making analysis of car-following for autonomous vehicle under complex urban environment," in 2016 9th International Symposium on Computational Intelligence and Design (ISCID), vol. 1, Dec 2016, pp. 315–319.

[5] I. Naskoudakis and K. Petroutsatou, "A thematic review of main researches on construction equipment over the recent years," Procedia Engineering, vol. 164, pp. 206–213, 2016, selected papers from Creative Construction Conference 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877705816339522

[6] X. Su, J. Pan, and M. Grinter, "Improving construction equipment operation safety from a human-centered perspective," Procedia Engineering, vol. 118, pp. 290–295, 2015, defining the future of sustainability and resilience in design, engineering and construction. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877705815020846

[7] D. Schmidt and K. Berns, "Construction site navigation for the autonomous excavator thor," in 2015 6th International Conference on Automation, Robotics and Applications (ICARA), Feb 2015, pp. 90–97.

[8] W. Schwarting, J. Alonso-Mora, and D. Rus, "Planning and decision-making for autonomous vehicles," Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 187–210, 2018. [Online]. Available: https://doi.org/10.1146/annurev-control-060117-105157

[9] C. Hubmann, M. Becker, D. Althoff, D. Lenz, and C. Stiller, "Decision making for autonomous driving considering interaction and uncertain prediction of surrounding vehicles," in 2017 IEEE Intelligent Vehicles Symposium (IV), June 2017, pp. 1671–1678.

[10] B. Wu and Y. Feng, "Policy reuse for learning and planning in partially observable markov decision processes," in 2017 4th International Conference on Information Science and Control Engineering (ICISCE), July 2017, pp. 549–552.

[11] S. Chauvin, "Hierarchical decision-making for autonomous driving," in ESR Labs AG, August 2018.

[12] K. H. Wray, S. J. Witwicki, and S. Zilberstein, "Online decision-making for scalable autonomous systems," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, 2017, pp. 4768–4774. [Online]. Available: https://doi.org/10.24963/ijcai.2017/664

[13] J. Wei, J. M. Snider, T. Gu, J. M. Dolan, and B. Litkouhi, "A behavioral planning framework for autonomous driving," in 2014 IEEE Intelligent Vehicles Symposium Proceedings, June 2014, pp. 458–464.

[14] S. Zhou, Y. Wang, M. Zheng, and M. Tomizuka, "A hierarchical planning and control framework for structured highway driving," IFAC-PapersOnLine, vol. 50, no. 1, pp. 9101–9107, 2017, 20th IFAC World Congress. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2405896317323121


[15] J. Barry, L. P. Kaelbling, and T. Lozano-Perez, "Hierarchical solution of large markov decision processes," in ICAPS-10 Workshop on Planning and Scheduling Under Uncertainty, 2010, pp. 12–16. [Online]. Available: http://hdl.handle.net/1721.1/61387

[16] X. Hu, L. Chen, B. Tang, D. Cao, and H. He, "Dynamic path planning for autonomous driving on various roads with avoidance of static and moving obstacles," Mechanical Systems and Signal Processing, vol. 100, pp. 482–500, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0888327017303825

[17] "Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles," SAE International, 2018, Stand. J3016, SAE Intl., Warrendale, PA. [Online]. Available: https://doi.org/10.4271/J3016_201806

[18] R. Evertsz, J. Thangarajah, N. Yadav, and T. Ly, "A framework for modelling tactical decision-making in autonomous systems," Journal of Systems and Software, vol. 110, pp. 222–238, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0164121215001892

[19] M. Czubenko, Z. Kowalczuk, and A. Ordys, "Autonomous driver based on an intelligent system of decision-making," Cognitive Computation, vol. 7, no. 5, pp. 569–581, 2015.

[20] S. J. Russell and P. Norvig, "Artificial intelligence," in A Modern Approach, 3rd ed. Pearson Education, Inc., 2010.

[21] J. Pinto and A. Fern, "Learning partial policies to speedup mdp tree search via reduction to i.i.d. learning," Journal of Machine Learning Research, vol. 18, no. 65, pp. 1–35, 2017. [Online]. Available: http://jmlr.org/papers/v18/15-251.html

[22] L. Sha, T. Abdelzaher, K.-E. Arzen, A. Cervin, T. Baker, A. Burns, G. Buttazzo, M. Caccamo, J. Lehoczky, and A. K. Mok, "Real time scheduling theory: A historical perspective," Real-Time Systems, vol. 28, no. 2-3, pp. 101–155, 2004.

[23] M. Joseph and P. Pandya, "Finding response times in a real-time system," The Computer Journal, vol. 29, no. 5, pp. 390–395, 1986.

[24] "Rubus concepts, methods and tools." [Online]. Available: https://www.arcticus-systems.com/

[25] "Volvo construction equipment." [Online]. Available: http://www.volvoce.com

[26] "Bae systems, hagglunds." [Online]. Available: http://www.baesystems.com/hagglunds

[27] K. Hanninen, J. Maki-Turja, M. Nolin, M. Lindberg, J. Lundback, and K.-L. Lundback, "The rubus component model for resource constrained real-time systems," in 2008 International Symposium on Industrial Embedded Systems. IEEE, 2008, pp. 177–183.

[28] S. Mubeen, J. Maki-Turja, and M. Sjodin, "Communications-oriented development of component-based vehicular distributed real-time embedded systems," Journal of Systems Architecture, vol. 60, no. 2, pp. 207–220, 2014.

[29] T. Gu and J. M. Dolan, "On-road motion planning for autonomous vehicles," in International Conference on Intelligent Robotics and Applications. Springer, 2012, pp. 588–597.

[30] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, "Reciprocal n-body collision avoidance," in Robotics Research. Springer, 2011, pp. 3–19.

[31] W. Xu, J. Wei, J. M. Dolan, H. Zhao, and H. Zha, "A real-time motion planner with trajectory optimization for autonomous vehicles," in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 2061–2067.

[32] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. P. How, "Real-time motion planning with applications to autonomous urban driving," IEEE Transactions on Control Systems Technology, vol. 17, no. 5, pp. 1105–1118, 2009.


[33] J. Tumova, G. C. Hall, S. Karaman, E. Frazzoli, and D. Rus, "Least-violating control strategy synthesis with safety rules," in Proceedings of the 16th International Conference on Hybrid Systems: Computation and Control. ACM, 2013, pp. 1–10.

[34] C.-I. Vasile, J. Tumova, S. Karaman, C. Belta, and D. Rus, "Minimum-violation scltl motion planning for mobility-on-demand," in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 1481–1488.

[35] S. Brechtel, T. Gindele, and R. Dillmann, "Probabilistic decision-making under uncertainty for autonomous driving using continuous pomdps," in Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on. IEEE, 2014, pp. 392–399.

[36] "Darpa (defense advanced research projects agency) darpa urban challenge, u.s. department of defense," 2007. [Online]. Available: http://archive.darpa.mil/grandchallenge/(2007).

[37] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller, T. Dang, U. Franke, N. Appenrodt, C. G. Keller et al., "Making bertha drive - an autonomous journey on a historic route," IEEE Intell. Transport. Syst. Mag., vol. 6, no. 2, pp. 8–20, 2014.

[38] M. Montemerlo, J. Becker, S. Bhat, H. Dahlkamp, D. Dolgov, S. Ettinger, D. Haehnel, T. Hilden, G. Hoffmann, B. Huhnke et al., "Junior: The stanford entry in the urban challenge," in The DARPA Urban Challenge. Springer, 2009, pp. 91–123.

[39] N. J. Goodall, "Ethical decision making during automated vehicle crashes," Transportation Research Record, vol. 2424, no. 1, pp. 58–65, 2014.

[40] D. Ferguson, C. Baker, M. Likhachev, and J. Dolan, "A reasoning framework for autonomous urban driving," in Intelligent Vehicles Symposium, 2008 IEEE. IEEE, 2008, pp. 775–780.

[41] M. Kapadia and N. I. Badler, "Navigation and steering for autonomous virtual humans," Wiley Interdisciplinary Reviews: Cognitive Science, vol. 4, no. 3, pp. 263–272, 2013.

[42] D. Langer, J. Rosenblatt, and M. Hebert, "A behavior-based system for off-road navigation," IEEE Transactions on Robotics and Automation, vol. 10, no. 6, pp. 776–783, 1994.

[43] E. Tunstel Jr, T. Lippincott, and M. Jamshidi, "Behavior hierarchy for autonomous mobile robots: Fuzzy-behavior modulation and evolution," Intelligent Automation & Soft Computing, vol. 3, no. 1, pp. 37–49, 1997.

[44] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, 2006, pp. 739–746.

[45] C. Guo, K. Kidono, and M. Ogawa, "Learning-based trajectory generation for intelligent vehicles in urban environment," in 2016 IEEE Intelligent Vehicles Symposium (IV), June 2016, pp. 1236–1241.

[46] F. Havlak and M. Campbell, "Discrete and continuous, probabilistic anticipation for autonomous robots in urban environments," IEEE Transactions on Robotics, vol. 30, no. 2, pp. 461–474, 2014.

[47] Q. Tran and J. Firl, "Modelling of traffic situations at urban intersections with probabilistic non-parametric regression," in Intelligent Vehicles Symposium (IV), 2013 IEEE. IEEE, 2013, pp. 334–339.

[48] S.-H. Lee and S.-W. Seo, "A learning-based framework for handling dilemmas in urban automated driving," in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 1436–1442.

[49] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller, "Automated driving in uncertain environments: Planning with interaction and uncertain maneuver prediction," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 5–17, 2018.








