Software Architecture Wang hunan [email protected].


6: Case Study: Air Traffic Control, a High-Availability Architecture Solution

Topics

The background of ATC

Relationship to the Architecture Business Cycle

Requirements and Qualities

Architectural Solution

Summary of the Quality Goals of the ATC System

Discussion

1. The background of ATC

Air traffic control (ATC) is a service provided by ground-based controllers who direct aircraft on the ground and in the air.

The primary purpose of ATC systems worldwide is to separate aircraft to prevent collisions, to organize and expedite the flow of traffic, and to provide information and other support for pilots when able.

ATC system demo

Pictures of ATC

1.1 ATC Training system

From procedural control to radar control

1.2 Radar system for ATC

Primary radar

Secondary radar

1.3 The procedure of ATC

Tower Control

Ground Control

En route control

1.4 En route centers in the USA

1.5 En route centers in China

North China: ZBAA

Southwest China: ZUUU

Central South China: ZGGG

East China: ZSSS

Northwest China: ZLXY

Northeast China: ZYTX

1.6 Advanced Automation System

Advanced Automation System (AAS)

The constituents of AAS:

Initial Sector Suite System (ISSS)

radio systems

flight plan databases

Initial Sector Suite System

The Initial Sector Suite System (ISSS) was intended to be an upgraded hardware and software system for the 22 en route centers in the United States.

1.7 Sector suite

[Figure: the concept of a sector suite. Panels: Sector suite; ATC console; Whole display (sectors S1, S2, S3)]

2. Relationship to the Architecture Business Cycle

3. Requirements and Qualities (1)

1. Ultrahigh availability, meaning that the system is absolutely prohibited from being inoperative for longer than very short periods (less than 5 minutes a year).

2. High performance, meaning that the system has to be able to process large numbers of aircraft (as many as 2,440) without "losing" any of them.

3.1 High Availability System Classes

Goal: Build Class 6 Systems

Availability | System Type | Unavailable (min/year) | Availability Class
90% | Unmanaged | 50,000 | 1
99% | Managed | 5,000 | 2
99.9% | Well Managed | 500 | 3
99.99% | Fault Tolerant | 50 | 4
99.999% | High-Availability | 5 | 5
99.9999% | Very-High-Availability | 0.5 | 6
99.99999% | Ultra-Availability | 0.05 | 7
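The minutes column follows from simple arithmetic: unavailable time per year is (1 - availability) × minutes per year. The Python sketch below assumes a 365-day year (525,600 minutes); the function names are illustrative and not part of ISSS. It also shows that the table rounds its figures (90% availability is really about 52,560 minutes of downtime a year) and that the ISSS budget of less than 5 minutes a year implies an availability of at least 99.999%, i.e. class 5.

```python
import math

# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Unavailable minutes per year implied by an availability fraction (e.g. 0.99999)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def required_availability(max_downtime_minutes: float) -> float:
    """Smallest availability fraction that keeps yearly downtime within the budget."""
    return 1.0 - max_downtime_minutes / MINUTES_PER_YEAR


def availability_class(availability: float) -> int:
    """Class = number of leading nines (0.9 -> 1, 0.999 -> 3, 0.99999 -> 5).

    Rounding before floor() absorbs floating-point error such as
    1 - 0.99999 being slightly larger than 1e-05.
    """
    return math.floor(round(-math.log10(1.0 - availability), 6))


if __name__ == "__main__":
    for nines in range(1, 8):
        a = 1.0 - 10.0 ** -nines
        print(f"class {availability_class(a)}: {a:.7f} -> "
              f"{downtime_minutes_per_year(a):,.2f} min/year")
    # ISSS budget from Section 3: less than 5 minutes of downtime per year.
    a = required_availability(5)
    print(f"ISSS needs availability >= {a:.7f} (class {availability_class(a)})")
```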

3.2 Requirements and Qualities (2)

3. Openness, meaning that the system has to be able to incorporate commercially developed software components.

4. The ability to field subsets of the system

5. The ability to make modifications to the functionality and handle upgrades in hardware and software

6. The ability to operate with and interface to a bewildering set of external systems

3.3 The scale of ISSS

1. ISSS is designed to support up to 210 consoles per en route center.

2. ISSS requirements call for a center to control from 400 to 2,440 aircraft tracks simultaneously.

3. There may be 16 to 40 radars to support a single facility.

4. A center may have from 60 to 90 control positions.

5. The code to implement ISSS contains about 1 million lines of Ada.

3.4 The function of ISSS(1)

1. Acquire radar target reports that are stored in an existing ATC system called the Host Computer System.

2. Convert the radar reports for display and broadcast them to all of the consoles.

3. Handle conflict alerts or other data transmitted by the host computer.

4. Interface to the Host for input and retrieval of flight plans.

3.4.1 The function of ISSS(2)

5. Provide extensive monitoring and control information.

6. Provide a recording capability for later playback.

7. Provide graphical user interface facilities, such as windowing, on the consoles.

8. Provide reduced backup capability in the event of failure of the Host, the primary communications network, or the primary radar sensors.

4. Architectural Solution

Just as an architecture affects behavior, performance, fault tolerance, and maintainability, so it is shaped by stringent requirements in any of these areas.

4.1 Reviews of general software structures

Common software architecture structures

Module structures: Decomposition, Class, Uses, Layered

Component-and-connector structures: Client/Server, Shared Data, Concurrency, Process

Allocation structures: Work Assignment, Implementation, Deployment

4.2 Physical view(1)

ISSS is a distributed system, consisting of a number of elements connected by local area networks:

1. The Host Computer System is the heart of the en route automation system. It processes the radar and flight plan data that ISSS uses.

2. Common consoles are the air traffic controller's workstations. They provide displays of aircraft position information.

4.2.1 Physical view(2)

3. The common consoles are connected to the Host computers by means of the Local Communications Network (LCN).

The LCN is composed of four parallel token ring networks for redundancy: (1) one network supports the broadcast of surveillance data to all processors; (2) one is used for point-to-point communications between pairs of processors; (3) one provides a channel for display data to be sent from the common consoles to recording units for later playback; and (4) one is a spare.
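To make the role of the spare concrete, here is a minimal Python sketch of moving a failed ring's traffic onto the spare ring. The class `LocalCommunicationsNetwork`, the ring numbers, and the role names are hypothetical; the slides say only that one of the four rings is a spare, not how ISSS performs the failover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LocalCommunicationsNetwork:
    """Hypothetical model of the four-ring LCN: three rings carry traffic, one is a spare."""

    # Role -> ring number. Ring numbers 1-4 are illustrative only.
    assignment: Dict[str, int] = field(default_factory=lambda: {
        "surveillance broadcast": 1,
        "point-to-point": 2,
        "display recording": 3,
    })
    spares: List[int] = field(default_factory=lambda: [4])
    failed: Set[int] = field(default_factory=set)

    def ring_for(self, role: str) -> int:
        """Ring currently carrying the given kind of traffic."""
        return self.assignment[role]

    def ring_failed(self, ring: int) -> None:
        """Mark a ring as failed and move its traffic, if any, onto a spare ring."""
        self.failed.add(ring)
        if ring in self.spares:          # losing an idle spare moves no traffic
            self.spares.remove(ring)
            return
        for role, current in list(self.assignment.items()):
            if current == ring:
                if not self.spares:
                    raise RuntimeError(f"no spare ring left for {role!r} traffic")
                self.assignment[role] = self.spares.pop(0)
```

For example, after `ring_failed(2)` the point-to-point traffic moves to ring 4; a second traffic-ring failure raises `RuntimeError`, which is roughly where the BCN's role as a backup in some LCN failure conditions would come in.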


4. The Enhanced Direct Access Radar Channel (EDARC) provides a backup display of aircraft position.

5. The Backup Communications Network (BCN) is an Ethernet network using TCP/IP protocols. It is used as a backup network in some LCN failure conditions.

6. Monitor-and-Control (M&C) consoles give system maintenance personnel an overview of the state of the system and allow them to control its operation.

7. The Test and Training subsystem provides the capability to test new hardware and software and to train users without interfering with the ATC mission.

8. The central processors are mainframe-class processors that provide the data recording and playback functions for the system in an early version of ISSS.

Tactics:

Redundancy

Spare

Data display backup

4.3 Module Decomposition View

The module elements of the ISSS operational software are called Computer Software Configuration Items (CSCIs), defined in the government software development standard.

ISSS System

Display Management

Common System Services

National Airspace System Modification

Recording, Analysis, and Playback

The IBM AIX operating system

4.4 Process View

ISSS is constructed to operate on a plurality of processors. Processors are logically combined to form a processor group, the purpose of which is to host separate copies of one or more applications. This concept is critical to fault tolerance and (therefore) availability.

4.4.1 Operational Unit and Functional Group

The different application copies are referred to as primary address space (PAS) or standby address space (SAS). The collection of one primary address space and its attendant standby address spaces is called an operational unit.

Other applications have no standby copies; their copies simply run independently on different processors. These are called functional groups.

4.4.2 Difference between Operational Unit and Functional Group

Applications interact in a client-server fashion. In summary, an application may be either an operational unit or a functional group. The two differ in whether the application's functionality is backed up by one or more secondary copies, which keep up with the state and data of the primary copy and wait to take over in case the primary copy fails.

Process View

Tactics:

OU (operational unit)

4.4.4 How does the SAS take over the PAS

1. A SAS is promoted to the new PAS.

2. The new PAS re-establishes communication with the clients of that operational unit by sending them a message.

3. A new SAS is started to replace the previous PAS.

4. The newly started SAS announces itself to the new PAS, which starts sending it messages as appropriate to keep it up to date.
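The four switchover steps can be simulated in a few lines. This is a minimal Python sketch under stated assumptions: the `Copy` and `OperationalUnit` classes, the client names, and representing messages as strings in an `outbox` list are all hypothetical. In ISSS the messages travel over the network and the new SAS is kept up to date by a stream of update messages, which the sketch collapses into a single state copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Copy:
    """One copy (address space) of an application on some processor. Hypothetical model."""
    name: str
    processor: str
    state: Dict[str, object] = field(default_factory=dict)
    alive: bool = True


@dataclass
class OperationalUnit:
    """One PAS plus its standby SASes and a fixed list of clients."""
    pas: Copy
    sas: List[Copy]
    clients: List[str]
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # (client, message) sent

    def update(self, key: str, value: object) -> None:
        """Normal operation: the PAS changes its state and mirrors the change to every SAS."""
        self.pas.state[key] = value
        for standby in self.sas:
            standby.state[key] = value

    def switchover(self, spare_processor: str) -> Copy:
        """Replace a failed PAS following steps 1-4. Raises IndexError if no SAS exists."""
        failed = self.pas
        failed.alive = False
        # 1. Promote a SAS to be the new PAS.
        self.pas = self.sas.pop(0)
        # 2. The new PAS contacts every client of the operational unit.
        for client in self.clients:
            self.outbox.append(
                (client, f"{self.pas.name} replaced {failed.name}; resend pending requests"))
        # 3. Start a new SAS to replace the previous PAS.
        replacement = Copy(name=f"{failed.name}-new", processor=spare_processor)
        # 4. The new SAS announces itself and the PAS brings it up to date
        #    (collapsed here into a single copy of the PAS state).
        replacement.state = dict(self.pas.state)
        self.sas.append(replacement)
        return self.pas
```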

4.4.3 Add a new operational unit(1)

1. Identify the necessary input data and where it resides.

2. Identify which operational units require output data from the new operational unit.

3. Fit this operational unit's communication patterns into a systemwide acyclic graph.

4. Design the messages to achieve the required data flows.

5. Identify internal state data that must be used for checkpointing and the state data that must be included in the update communication from PAS to SAS.

4.4.3 Add a new operational unit(2)

6. Partition the state data into messages that fit well on the networks.

7. Define the necessary message types.

8. Plan for switchover in case of failure: plan updates to ensure complete state.

9. Ensure consistent data in case of switchover.

10. Ensure that individual processing steps are completed in less time than a system "heartbeat."

11. Plan data-sharing and data-locking protocols with other operational units.
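Step 3 requires the systemwide communication graph to stay acyclic. A minimal Python sketch of that check, assuming the graph is kept as a mapping from each operational unit to the units it sends data to (the unit names are hypothetical): it uses Kahn's topological sort, which orders every node exactly when the graph has no cycle.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]  # unit -> units it sends data to


def topological_order(graph: Graph) -> Optional[List[str]]:
    """Kahn's algorithm: an order of all units, or None if the graph has a cycle."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    indegree = {n: 0 for n in nodes}
    for targets in graph.values():
        for d in targets:
            indegree[d] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in sorted(graph.get(n, ())):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order if len(order) == len(nodes) else None


def can_add_unit(graph: Graph, new_unit: str,
                 inputs_from: Iterable[str], outputs_to: Iterable[str]) -> bool:
    """Steps 1-3: wire the new unit in (on a copy) and check the graph stays acyclic."""
    candidate = {u: set(t) for u, t in graph.items()}
    candidate.setdefault(new_unit, set()).update(outputs_to)
    for src in inputs_from:
        candidate.setdefault(src, set()).add(new_unit)
    return topological_order(candidate) is not None
```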

4.5 Client-Server View

The clients and servers were carefully designed to have consistent interfaces. This was facilitated by using simple message-passing protocols for interaction.
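Consistent interfaces through simple message passing can be sketched as a server whose whole interface is a table of message kinds, with every request and reply using one message format. The `Message` fields, the message kinds, and the flight plan data are hypothetical; the slides do not describe the actual ISSS message formats.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Message:
    """Hypothetical message format shared by every client and server."""
    kind: str                       # what is being asked or answered
    sender: str
    body: Dict[str, str] = field(default_factory=dict)


Handler = Callable[[Message], Message]


class Server:
    """A server whose whole interface is a table of message kinds it accepts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.handlers: Dict[str, Handler] = {}

    def on(self, kind: str, handler: Handler) -> None:
        """Register the handler for one message kind."""
        self.handlers[kind] = handler

    def receive(self, msg: Message) -> Message:
        """Dispatch by kind; unknown kinds get an error reply instead of an exception."""
        handler = self.handlers.get(msg.kind)
        if handler is None:
            return Message("error", self.name, {"reason": f"unsupported kind {msg.kind!r}"})
        return handler(msg)
```

A client only needs to know the message kinds a server accepts, so a server can be replaced by any other that honors the same table.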

4.6 Code View (Component View)

A code view shows how functionality is mapped to code units.

In ISSS, an Ada (main) program is created from one or more source files; it typically comprises a number of subprograms, some of which are gathered into separately compilable packages.

An Ada program may contain one or more tasks, which are Ada entities capable of executing concurrently with each other.

4.6.1 Code View(1)

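[Figure: code view. File map relating source files (File 1, File 2) and Ada programs (Ada 1, Ada 2, Ada 3); process map relating Ada programs (Ada 1 to Ada n) and AIX processes (Process 1 to Process n)]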

4.7 Layered View

Underlying the operation of the ATC application programs on the ISSS processors is a commercial UNIX operating system, AIX. However, UNIX does not provide all the services necessary to support a fault-tolerant distributed system such as ISSS. Therefore, additional system services software was added.

[Figure: layered view. Labels: Application, L/G, SMMM; Ada program; C program; extensions in the AIX kernel's address space]

4.8 Fault Tolerance View

Fault tolerance plays an important role in the design of the system:

1. The fault-tolerance structure describes how faults are detected and isolated and how the system recovers.

2. The PAS/SAS scheme traps and recovers from errors that are confined within a single application.

3. The fault-tolerant hierarchy is designed to trap and recover from errors that are the result of cross-application interaction.

4.8.1 Various levels of fault detection

The ISSS fault-tolerant hierarchy provides various levels of fault detection and recovery. Each level asynchronously:

1. Detects errors in self, peers, and lower levels.

2. Handles exceptions from lower levels.

3. Diagnoses, recovers, reports, or raises exceptions.

4.8.2 Levels of the fault-tolerant hierarchy

1. Physical (network, processor, and I/O devices)

2. Operating system

3. Runtime environment

4. Application

5. Local availability

6. Group availability

7. Global availability

8. System monitor and control
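The escalation idea behind the hierarchy (recover at the lowest level that can, otherwise raise the problem to the next level) can be sketched in Python. This is a sequential simplification under assumptions: in ISSS each level runs asynchronously and also watches its peers, and the `can_recover` policy and fault names used here are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# The eight levels of the ISSS fault-tolerant hierarchy, lowest first.
LEVELS = [
    "Physical",
    "Operating system",
    "Runtime environment",
    "Application",
    "Local availability",
    "Group availability",
    "Global availability",
    "System monitor and control",
]


def escalate(fault: str, start: int,
             can_recover: Callable[[str, str], bool]) -> Tuple[Optional[str], List[str]]:
    """Raise a fault level by level, starting at LEVELS[start], until one recovers it.

    Returns the recovering level (None if no level could) and a report trail.
    """
    trail: List[str] = []
    for level in LEVELS[start:]:
        if can_recover(level, fault):
            trail.append(f"{level}: recovered from {fault}")
            return level, trail
        trail.append(f"{level}: cannot recover, raising {fault}")
    return None, trail
```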

Tactics:

Redundancy

Heartbeat

Ping/Echo

[Figure: fault-tolerant hierarchy. Labels: Physical layer, OS, Runtime, Application, LAM, GAM, M&C]
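The heartbeat tactic can be sketched as a monitor that records the last heartbeat from each component and suspects any component that has been silent longer than a timeout. Time is passed in explicitly so the example is deterministic; the component names and the timeout value are hypothetical. Ping/echo differs only in who initiates: the monitor sends the ping and waits for the echo.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Suspect a component when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_beat: Dict[str, float] = {}

    def beat(self, component: str, now: float) -> None:
        """Record a heartbeat sent by `component` at time `now`."""
        self.last_beat[component] = now

    def suspected_failures(self, now: float) -> List[str]:
        """Components silent for longer than the timeout, in name order."""
        return sorted(c for c, t in self.last_beat.items() if now - t > self.timeout)
```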

4.8.3 Fault recovery

The type of recovery used depends on the current operational state, as follows:

1. In a switchover, the SAS takes over almost immediately from its PAS.

2. A warm restart uses checkpoint data (written to nonvolatile memory).

3. A cold restart uses default data and loses state history.

4. A cutover is used to transition to new (or old) logic or adaptation data.
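One way to read the four recovery types is as a preference order from least to most state lost. The decision function below is a hypothetical Python sketch of that reading, not the documented ISSS logic; the three boolean inputs are assumptions about what a recovery manager would know.

```python
from enum import Enum


class Recovery(Enum):
    SWITCHOVER = "switchover"
    WARM_RESTART = "warm restart"
    COLD_RESTART = "cold restart"
    CUTOVER = "cutover"


def choose_recovery(planned_change: bool, sas_ready: bool, checkpoint_saved: bool) -> Recovery:
    """Hypothetical rule: prefer the option that loses the least state."""
    if planned_change:        # moving to new (or old) logic or adaptation data
        return Recovery.CUTOVER
    if sas_ready:             # a SAS is already in step with the failed PAS
        return Recovery.SWITCHOVER
    if checkpoint_saved:      # state can be reloaded from nonvolatile memory
        return Recovery.WARM_RESTART
    return Recovery.COLD_RESTART  # default data only; state history is lost
```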

4.9 Adaptation Data

ISSS makes extensive use of the modifiability tactic of "configuration files," which it calls adaptation data.

Site-specific adaptation data tailors the ISSS system across the 22 en route centers in which it was planned to be deployed.

Adaptation data represents an elegant and crucial shortcut to modifying the system in the face of site-specific requirements.
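A minimal Python sketch of adaptation data as a configuration file: per-center JSON values are overlaid on shared defaults. The key names are hypothetical and the default values simply borrow the upper limits from Section 3.3. Rejecting unknown keys is one simple guard against the kind of unintended interaction listed among the drawbacks of adaptation data.

```python
import json
from typing import Any, Dict, Mapping

# Hypothetical defaults; key names are invented, values are the upper limits from Section 3.3.
DEFAULTS: Dict[str, Any] = {
    "max_tracks": 2440,
    "radars": 40,
    "control_positions": 90,
}


def load_adaptation_data(site_json: str,
                         defaults: Mapping[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Overlay one center's adaptation data on the defaults; unknown keys are rejected."""
    site = json.loads(site_json)
    unknown = set(site) - set(defaults)
    if unknown:
        raise ValueError(f"unknown adaptation keys: {sorted(unknown)}")
    merged = dict(defaults)   # copy, so the shared defaults are never modified
    merged.update(site)
    return merged
```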

4.9.1 The negative side of adaptation data

1. It presents a complicated mechanism to maintainers.

2. Complicated interactions may occur between various pieces of adaptation data, which could affect correctness.

3. Finally, adaptation data significantly increases the state space within which the operational software must correctly perform, and this has broad implications for system testing.

4.10 Code template for application

Each application has a primary (PAS) and a standby (SAS) copy, and the implementation plan called for both copies to come from true copies of the same source code.

The structure is a continuous loop that services incoming events.

Summary of Code structure template

Loop

1. Receive and process normal events (PAS)

2. Update status information (PAS)

3. Receive status and data updates (SAS)

4. SAS takes over from PAS

5. Complete the original request

End loop
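The template can be sketched as one class whose behavior depends only on its role, so the PAS and SAS really do share the same source code. This minimal Python sketch assumes an in-memory event queue and hypothetical event kinds (`request`, `update`, `switchover`); the real loop runs forever, whereas `run` here returns when the inbox is empty so it can be tested.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple

Event = Tuple[str, str, int]  # (kind, key, value); the kinds are hypothetical


@dataclass
class AppCopy:
    """Same source for PAS and SAS; only `role` decides which branch runs."""
    role: str                                          # "PAS" or "SAS"
    state: Dict[str, int] = field(default_factory=dict)
    standby: Optional["AppCopy"] = None                # set on the PAS only
    inbox: Deque[Event] = field(default_factory=deque)

    def run(self) -> None:
        """Service incoming events; returns when the inbox is empty instead of looping forever."""
        while self.inbox:
            kind, key, value = self.inbox.popleft()
            if kind == "request" and self.role == "PAS":
                self.state[key] = value                    # 1. process a normal event
                if self.standby is not None:               # 2. send a status update to the SAS
                    self.standby.inbox.append(("update", key, value))
            elif kind == "update" and self.role == "SAS":
                self.state[key] = value                    # 3. SAS receives status and data
            elif kind == "switchover" and self.role == "SAS":
                self.role = "PAS"                          # 4. SAS takes over as PAS
            # 5. Requests arriving after a switchover take the PAS branch above.
```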

5. Quality Goals of the ATC System (1)

Goal | How Achieved | Tactic(s) Used
High Availability | Hardware redundancy; software redundancy (layered fault detection and recovery) | State resynchronization; shadowing; active redundancy; removal from service; limit exposure; ping/echo; heartbeat; exception; spare
High Performance | Distributed multiprocessors; front-end schedulability analysis and network modeling | Introduce concurrency
Openness | Interface wrapping and layering | Abstract common services; maintain interface stability

5.1 Quality Goals of the ATC System (2)

Goal | How Achieved | Tactic(s) Used
Modifiability | Templates and table-driven adaptation data; careful assignment of module responsibilities; strict use of specified interfaces | Abstract common services; semantic coherence; maintain interface stability; anticipate expected changes; generalize the module; component replacement; adherence to defined protocols; configuration files
Ability to Field Subsets | Appropriate separation of concerns | Abstract common services
Interoperability | Client-server division of functionality and message-based communications | Adherence to defined protocols; maintain interface stability

Discussion

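In the architectural design of ISSS, which tactics were adopted to achieve high availability? How do they affect the performance quality attribute?

Submit the results of your discussion at the end of class.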
