Software Architecture Wang hunan [email protected].
6: Case Study: Air Traffic Control High Availability architecture solution
Topics
The background of ATC
Relationship to the Architecture Business Cycle
Requirements and Qualities
Architectural Solution
Summary: the Quality Goals of the ATC System
Discussion
1. The background of ATC
Air traffic control (ATC) is a service provided by ground-based controllers who direct aircraft on the ground and in the air.
The primary purpose of ATC systems worldwide is to separate aircraft to prevent collisions, to organize and expedite the flow of traffic, and, when able, to provide information and other support for pilots.
ATC system demo
The picture about ATC
1.1 ATC Training system
From Procedure control to Radar control
1.2 Radar system for ATC
Primary Radar
Secondary Radar
1.3 The procedure of ATC
Tower Control
Ground Control
Route Control
1.4 En route centers in USA
1.5 En route centers in China
North China: ZBAA
Southwest China: ZUUU
Central and Southern China: ZGGG
East China: ZSSS
Northwest China: ZLXY
Northeast China: ZYTX
1.6 Advanced Automation System
Advanced Automation System (AAS)
The constituents of AAS
Initial Sector Suite System (ISSS)
radio systems
flight plan databases
Initial Sector Suite System
The Initial Sector Suite System (ISSS) was intended to be an upgraded hardware and software system for the 22 en route centers in the United States.
1.7 Sector suite
Illustrates the concept of a sector suite
[Figure: a sector suite and an ATC console, each with positions labeled 1 to 5; sectors S1, S2, and S3 on the whole display]
2. Relationship to the Architecture Business Cycle
3. Requirements and Qualities (1)
1. Ultrahigh availability, meaning that the system is absolutely prohibited from being inoperative for longer than very short periods (less than 5 minutes a year).
2. High performance, meaning that the system has to be able to process large numbers of aircraft—as many as 2,440—without "losing" any of them.
3.1 High Availability System Classes
Goal: Build Class 6 Systems
System Type              Unavailable (min/year)   Availability   Availability Class
Unmanaged                50,000                   90%            1
Managed                  5,000                    99%            2
Well Managed             500                      99.9%          3
Fault Tolerant           50                       99.99%         4
High-Availability        5                        99.999%        5
Very-High-Availability   0.5                      99.9999%       6
Ultra-Availability       0.05                     99.99999%      7
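The unavailability figures follow directly from the availability percentage. As a minimal sketch in Python (the function name is illustrative and not taken from the slides), the conversion is a single multiplication. The slide rounds a year to about 500,000 minutes; the exact figure for a 365-day year is 525,600.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the expected unavailable minutes per year for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for percent in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        print(f"{percent}% -> {downtime_minutes_per_year(percent):.3f} min/year")
```

With this arithmetic, 99.999% availability works out to roughly 5.26 minutes of downtime a year, which is the order of magnitude of the "less than 5 minutes a year" requirement stated for ISSS.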
3.2 Requirements and Qualities (2)
3. Openness, meaning that the system has to be able to incorporate commercially developed software components.
4. The ability to field subsets of the system
5. The ability to make modifications to the functionality and handle upgrades in hardware and software
6. The ability to operate with and interface to a bewildering set of external systems
3.3 The scale of ISSS
1. ISSS is designed to support up to 210 consoles per en route center.
2. ISSS requirements call for a center to control from 400 to 2,440 aircraft tracks simultaneously.
3. There may be 16 to 40 radars to support a single facility.
4. A center may have from 60 to 90 control positions
5. The code to implement ISSS contains about 1 million lines of Ada
3.4 The function of ISSS(1)
1. Acquire radar target reports that are stored in an existing ATC system called the Host Computer System
2. Convert the radar reports for display and broadcast them to all of the consoles
3. Handle conflict or other data transmitted by the host computer
4. Interface to the Host for input and retrieval of flight plans
3.4.1 The function of ISSS(2)
5. Provide extensive monitoring and control information
6. Provide a recording capability for later playback.
7. Provide graphical user interface facilities, such as windowing, on the consoles.
8. Provide reduced backup capability in the event of failure of the Host, the primary communications network, or the primary radar sensors.
4. Architectural Solution
Just as an architecture affects behavior, performance, fault tolerance, and maintainability, so it is shaped by stringent requirements in any of these areas
4.1 Reviews of general software structures
Common software architecture structures
Module: Decomposition, Class, Uses, Layered
Component-and-connector: Client/Server, Shared Data, Concurrency, Process
Allocation: Work Assignment, Implementation, Deployment
4.2 Physical view(1)
ISSS is a distributed system, consisting of a number of elements connected by local area networks :
1. The Host Computer System is the heart of the en route automation system. It is used to process data.
2. Common consoles are the air traffic controller's workstations. They provide displays of aircraft position information
4.2.1 Physical view(2)
3. The common consoles are connected to the Host computers by means of the Local Communications Network (LCN).
The LCN is composed of four parallel token ring networks for redundancy: (1) one network supports the broadcast of surveillance data to all processors; (2) one is used for point-to-point communications between pairs of processors; (3) one provides a channel for display data to be sent from the common consoles to recording units for later playback; and (4) one is a spare.
4.2.2 Physical view(3)
4. The Enhanced Direct Access Radar Channel (EDARC) provides a backup display of aircraft position
5. The Backup Communications Network (BCN) is an Ethernet network using TCP/IP protocols. It is used as a backup network in some LCN failure conditions
6. Monitor-and-Control (M&C) consoles give system maintenance personnel an overview of the state of the system and allow them to control its operation
4.2.3 Physical view(4)
7. The Test and Training subsystem provides the capability to test new hardware and software and to train users without interfering with the ATC mission
8. The central processors are mainframe-class processors that provide the data recording and playback functions for the system in an early version of ISSS.
Tactics:
Redundancy
Spare
Data Display Backup
4.3 Module Decomposition View
The module elements of the ISSS operational software are called Computer Software Configuration Items (CSCIs), defined in the government software development standard
ISSS System
Display Management
Common System Services
National Airspace System Modification
Recording, Analysis, and Playback
The IBM AIX operating system
4.4 Process View
ISSS is constructed to operate on a plurality of processors. Processors are logically combined to form a processor group, the purpose of which is to host separate copies of one or more applications. This concept is critical to fault tolerance and (therefore) availability.
4.4.1 Operational Unit and Functional Group
The different application copies are referred to as the primary address space (PAS) or a standby address space (SAS). The collection of one primary address space and its attendant standby address spaces is called an operational unit.
Applications that are not backed up by standby copies simply run independently on different processors. These are called functional groups.
4.4.2 Difference between an Operational Unit and a Functional Group
Applications interact in a client-server fashion. In summary, an application may be either an operational unit or a functional group. The two differ in whether the application's functionality is backed up by one or more secondary copies, which keep up with the state and data of the primary copy and wait to take over in case the primary copy fails.
Process View
Tactics:
OU
4.4.4 How does the SAS take over the PAS
1. A SAS is promoted to the new PAS.
2. The new PAS reconstitutes with the clients of that operational unit by sending them a message.
3. A new SAS is started to replace the previous PAS.
4. The newly started SAS announces itself to the new PAS, which starts sending it messages as appropriate to keep it up to date.
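The four takeover steps above can be sketched in Python. This is an illustration only, not ISSS code (which was written in Ada): the class names, the `log` list standing in for real messages, and the policy of promoting the first standby are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    name: str
    role: str  # "PAS" or "SAS"
    state: dict = field(default_factory=dict)


@dataclass
class OperationalUnit:
    primary: AddressSpace
    standbys: list
    clients: list  # names of client applications (hypothetical)
    log: list = field(default_factory=list)  # stands in for real message traffic

    def update(self, key, value):
        """The PAS applies a change and forwards it to every SAS."""
        self.primary.state[key] = value
        for sas in self.standbys:
            sas.state[key] = value

    def switchover(self, new_sas_name):
        """Replace a failed PAS, following the four steps in the text."""
        if not self.standbys:
            raise RuntimeError("no standby address space available")
        # Step 1: a SAS is promoted to the new PAS (first standby: an assumption).
        new_primary = self.standbys.pop(0)
        new_primary.role = "PAS"
        self.primary = new_primary
        # Step 2: the new PAS reconstitutes with its clients by messaging them.
        for client in self.clients:
            self.log.append(f"notify {client}: {new_primary.name} is now PAS")
        # Step 3: a new SAS is started to replace the previous PAS.
        new_sas = AddressSpace(new_sas_name, "SAS")
        self.standbys.append(new_sas)
        # Step 4: the new SAS announces itself and is brought up to date.
        self.log.append(f"{new_sas.name} announces itself to {new_primary.name}")
        new_sas.state = dict(new_primary.state)
        return new_primary
```

Because every update is forwarded to the standbys before a failure occurs, the promoted SAS already holds the state it needs, which is what lets the takeover happen quickly.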
4.4.3 Add a new operational unit(1)
1. Identify the necessary input data and where it resides.
2. Identify which operational units require output data from the new operational unit.
3. Fit this operational unit's communication patterns into a systemwide acyclic graph.
4. Design the messages to achieve the required data flows.
5. Identify internal state data that must be used for check pointing and the state data that must be included in the update communication from PAS to SAS.
4.4.3 Add a new operational unit(2)
6. Partition the state data into messages that fit well on the networks.
7. Define the necessary message types.
8. Plan for switchover in case of failure: plan updates to ensure complete state.
9. Ensure consistent data in case of switchover.
10. Ensure that individual processing steps are completed in less time than a system "heartbeat."
11. Plan data-sharing and data-locking protocols with other operational units.
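Step 3 of this checklist, fitting the new unit's communication patterns into a systemwide acyclic graph, lends itself to an automated check. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the unit names and the `sends_to` mapping are hypothetical, and the slides do not say how the ISSS team actually verified this property.

```python
from graphlib import CycleError, TopologicalSorter


def check_acyclic(sends_to):
    """Return one valid data-flow order, or raise ValueError if there is a cycle.

    sends_to maps each operational unit to the set of units it sends data to.
    """
    sorter = TopologicalSorter()
    for source, destinations in sends_to.items():
        sorter.add(source)
        for destination in destinations:
            # TopologicalSorter takes (node, *predecessors): the receiver
            # depends on the sender.
            sorter.add(destination, source)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError(f"communication cycle detected: {exc.args[1]}") from exc


if __name__ == "__main__":
    flows = {"radar_input": {"tracker"}, "tracker": {"display"}}
    print(check_acyclic(flows))
```

Running such a check whenever a unit is added would flag a proposed message flow that closes a loop before any message formats are designed (step 4).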
4.5 Client-Server View
The clients and servers were carefully designed to have consistent interfaces. This was facilitated by using simple message-passing protocols for interaction
4.6 Code View( Component View)
A code view shows how functionality is mapped to code units
In ISSS, an Ada (main) program is created from one or more source files; it typically comprises a number of subprograms, some of which are gathered into separately compilable packages
An Ada program may contain one or more tasks, which are Ada entities capable of executing concurrently with each other.
4.6.1 Code View(1)
[Figure: file map relating Ada programs (Ada 1, Ada 2, Ada 3) to source files (File 1, File 2); process map relating Ada programs (Ada 1 to Ada n) to AIX processes (Process 1 to Process n)]
4.7 Layered View
Underlying the operation of the ATC application programs on the ISSS processors is a commercial UNIX operating system, AIX. However, UNIX does not provide all the services necessary to support a fault-tolerant distributed system such as ISSS. Therefore, additional system services software was added.
Layered view
Extensions In AIX kernel's address space
C Program
Ada Program
ApplicationL/G SMMM
4.8 Fault Tolerance View
Fault tolerance plays an important role in the design of the system:
1. The fault-tolerance structure describes how faults are detected and isolated and how the system recovers.
2. The PAS/SAS scheme traps and recovers from errors that are confined within a single application.
3. The fault-tolerant hierarchy is designed to trap and recover from errors that are the result of cross-application interaction.
4.8.1 Various levels of fault detection
The ISSS fault-tolerant hierarchy provides various levels of fault detection and recovery. Each level asynchronously :
1. Detects errors in self, peers, and lower levels.
2. Handles exceptions from lower levels.
3. Diagnoses, recovers, reports, or raises exceptions.
4.8.2 Various levels of system
1. Physical (network, processor, and I/O devices)
2. Operating system
3. Runtime environment
4. Application
5. Local availability
6. Group availability
7. Global availability
8. System monitor and control
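The idea that each level either recovers from a fault or raises it to the level above can be sketched as a chain of handlers. The Python below is a simplified, synchronous illustration (the slides say each level works asynchronously); the fault names and the table of which level recovers from what are invented for the example.

```python
LEVELS = [
    "Physical",
    "Operating system",
    "Runtime environment",
    "Application",
    "Local availability",
    "Group availability",
    "Global availability",
    "System monitor and control",
]


class Level:
    def __init__(self, name, recoverable, parent=None):
        self.name = name
        self.recoverable = set(recoverable)  # fault kinds handled here (hypothetical)
        self.parent = parent

    def handle(self, fault, path=None):
        """Recover here if possible; otherwise raise the fault to the parent level."""
        path = (path or []) + [self.name]
        if fault in self.recoverable:
            return self.name, path
        if self.parent is None:
            raise RuntimeError(f"unrecovered fault: {fault!r}")
        return self.parent.handle(fault, path)


def build_hierarchy(recoverable_by_level):
    """Link the eight levels bottom-up; return them in a dict keyed by name."""
    parent = None
    levels = {}
    for name in reversed(LEVELS):  # build the top level first
        parent = Level(name, recoverable_by_level.get(name, ()), parent)
        levels[name] = parent
    return levels
```

In this sketch a fault detected in the runtime environment that only the group availability level knows how to handle passes through the application and local availability levels on its way up, and the returned path records that escalation.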
Tactics :
Redundancy
Heartbeat
Ping/Echo
Physical layer
Runtime
OS
Application
LAM
GAM
M&C
4.8.3 Fault recovery
The type of recovery used depends on the current operational state, as follows:
1. In a switchover, the SAS takes over almost immediately from its PAS.
2. A warm restart uses checkpoint data (written to nonvolatile memory).
3. A cold restart uses default data and loses state history.
4. A cutover is used to transition to new (or old) logic or adaptation data.
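The first three recovery types differ mainly in where the restarted application gets its state. The following Python sketch makes that concrete under one assumed policy: prefer a switchover when a standby exists, then a warm restart when a checkpoint exists, and fall back to a cold restart. That preference order, the function name, and the shape of the state are assumptions; the slides only list the recovery types. A cutover is left out because it is a planned transition rather than a reaction to a fault.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    SWITCHOVER = "switchover"
    WARM_RESTART = "warm restart"
    COLD_RESTART = "cold restart"


def recover(standby_state: Optional[dict], checkpoint: Optional[dict]):
    """Pick a recovery type and return it with the state the application resumes from."""
    if standby_state is not None:
        # Switchover: the SAS already mirrors the PAS, so it takes over at once.
        return Recovery.SWITCHOVER, standby_state
    if checkpoint is not None:
        # Warm restart: reload the last checkpoint from nonvolatile memory.
        return Recovery.WARM_RESTART, dict(checkpoint)
    # Cold restart: start from default data; state history is lost.
    return Recovery.COLD_RESTART, {"tracks": {}}
```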
4.9 Adaptation Data
ISSS makes extensive use of the modifiability tactic of "configuration files," which it calls adaptation data.
Site-specific adaptation data tailors the ISSS system across the 22 en route centers in which it was planned to be deployed.
Adaptation data represents an elegant and crucial shortcut to modifying the system in the face of site-specific requirements.
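To make the "configuration files" idea concrete, here is a small Python sketch of loading and checking site-specific data. The JSON layout and field names are hypothetical (the slides do not describe the real adaptation data format); the limits are the scale figures quoted earlier for ISSS: up to 210 consoles, 16 to 40 radars, and 400 to 2,440 aircraft tracks per center.

```python
import json

MAX_CONSOLES = 210
RADAR_RANGE = (16, 40)
TRACK_RANGE = (400, 2440)


def load_adaptation_data(text: str) -> dict:
    """Parse one site's adaptation data and reject values outside the stated scale."""
    data = json.loads(text)
    errors = []
    if not 1 <= data["consoles"] <= MAX_CONSOLES:
        errors.append(f"consoles must be 1..{MAX_CONSOLES}")
    if not RADAR_RANGE[0] <= data["radars"] <= RADAR_RANGE[1]:
        errors.append(f"radars must be {RADAR_RANGE[0]}..{RADAR_RANGE[1]}")
    if not TRACK_RANGE[0] <= data["max_tracks"] <= TRACK_RANGE[1]:
        errors.append(f"max_tracks must be {TRACK_RANGE[0]}..{TRACK_RANGE[1]}")
    if errors:
        raise ValueError("; ".join(errors))
    return data


if __name__ == "__main__":
    site = '{"site": "example-center", "consoles": 120, "radars": 24, "max_tracks": 1800}'
    print(load_adaptation_data(site))
```

The same operational code then serves every center, with only the data file changing from site to site.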
4.9 The negative side of adaptation data
1. Presents a complicated mechanism to maintainers.
2. Complicated interactions may occur between various pieces of adaptation data, which could affect correctness
3. Finally, adaptation data significantly increases the state space within which the operational software must correctly perform, and this has broad implications for system testing.
4.10 Code template for application
The implementation plan called for the primary and standby copies of an application to come from true copies of the same source code.
The structure is a continuous loop that services incoming events
Summary of Code structure template
Loop
1. Receive and process normal events (PAS)
2. Update state information (PAS)
3. Receive state and data updates (SAS)
4. Take over from the PAS (SAS)
5. Complete the work the original PAS left unfinished
End loop
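The template summarized above can be sketched as a single event loop whose behavior depends on the copy's current role, so that primary and standby really do run the same code. The Python below is an illustration rather than the ISSS template itself (which was Ada): the event names and the in-memory queues are assumptions, and the loop drains a queue instead of running forever so that it can be exercised in a test.

```python
from collections import deque


class ApplicationCopy:
    def __init__(self, role, peer_inbox=None):
        self.role = role              # "PAS" or "SAS"
        self.state = {}
        self.inbox = deque()
        self.peer_inbox = peer_inbox  # where a PAS sends state updates, if any

    def run(self):
        """Service queued events until the inbox is empty or 'terminate' arrives."""
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "normal" and self.role == "PAS":
                # Steps 1 and 2: process the event, then push the update to the SAS.
                key, value = payload
                self.state[key] = value
                if self.peer_inbox is not None:
                    self.peer_inbox.append(("state_update", (key, value)))
            elif kind == "state_update" and self.role == "SAS":
                # Step 3: the standby mirrors the primary's state.
                key, value = payload
                self.state[key] = value
            elif kind == "take_over" and self.role == "SAS":
                # Step 4: the standby becomes the primary.
                self.role = "PAS"
            elif kind == "terminate":
                break
```

A short usage example: create a standby, create a primary that forwards to the standby's inbox, feed the primary a normal event, then send the standby a `take_over` event followed by further normal events. The standby ends up as a PAS holding both the mirrored state and the new updates.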
5. Quality Goals of the ATC System (1)
Goal: High Availability
How Achieved: Hardware redundancy; software redundancy (layered fault detection and recovery)
Tactic(s) Used: State resynchronization; shadowing; active redundancy; removal from service; limit exposure; ping/echo; heartbeat; exception; spare
Goal: High Performance
How Achieved: Distributed multiprocessors; front-end schedulability analysis, and network modeling
Tactic(s) Used: Introduce concurrency
Goal: Openness
How Achieved: Interface wrapping and layering
Tactic(s) Used: Abstract common services; maintain interface stability
5.1 Quality Goals of the ATC System(2)
Goal: Modifiability
How Achieved: Templates and table-driven adaptation data; careful assignment of module responsibilities; strict use of specified interfaces
Tactic(s) Used: Abstract common services; semantic coherence; maintain interface stability; anticipate expected changes; generalize the module; component replacement; adherence to defined protocols; configuration files
Goal: Ability to Field Subsets
How Achieved: Appropriate separation of concerns
Tactic(s) Used: Abstract common services
Goal: Interoperability
How Achieved: Client-server division of functionality and message-based communications
Tactic(s) Used: Adherence to defined protocols; maintain interface stability
Discussion
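Which tactics for achieving high availability were adopted in the ISSS architecture design? How do they affect the performance quality attribute?
Submit your discussion results at the end of class.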