Comments on Software Systems

HATC Corporation, Beijing
December 6 2005

Geoffrey Fox
CTO Anabas Corporation and
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org

Design, analysis, and management of a BIG software project I

General Principles
• Quality control in software development
• Documentation/archives
• Codes
Design of the architecture of a large-scale or complicated system
How to start
• Methodology
• Decomposition
• Subtask and goal
How to choose programming language and development environment
• Trend of programming language (C, C++, and Java)
• Platforms (Windows, Unix, and Linux)
• Is there any de facto programming language(s) for a certain type of applications (e.g. C and C++ used to be popular in real-time systems)

Design, analysis, and management of a BIG software project II

How to design a client-side (stand alone) air traffic control – a real-time client-side monitor system
• Principles
• Reliability
• Performance
• Interface between subsystem and main framework
How to design a large-scale distributed air traffic control system
• Architecture
• Modularity
• Reusability (difficulty for us)
• Design model (two-tier or three-tier)
Algorithm and performance of air traffic flow control
Training of senior system architect

Overall Remarks

Talk based on my experience, which is very different from that of your company
I have developed software in a small company and in a university setting with a mix of students and staff
I watched other large software activities including Apache and other open source
Preferred software model changes faster than software engineering techniques
• C++
• CORBA
• Java
• Web Services
Maybe some software engineering

General Principles I

Have a clear management structure with one person in charge of important decisions
• Decisions can and should be debated
Communicate electronically and preserve records in a searchable fashion
• Email is possible if there is a clean master list, but Wikis and Blogs are probably better
• Equip with search: Google web or desktop search is better than most built-in search capabilities
Obviously use CVS or equivalent for version control
Document all actions in Wiki/Blog/email

General Principles II

Computers are getting faster, which implies we do not have to worry about efficiency as much
Build smaller modules
• As modules decrease in size, the overhead of interacting with them increases
• But smaller modules with simple functionality are much easier to build and test
So avoid pointers even more, and prefer to communicate data, not pointers thereto, when communicating between modules
Use databases, not ad-hoc storage mechanisms, wherever the performance cost can be tolerated

General Principles III

Test as much as you can by having others (Q/A) exercise code, especially where you need to evaluate system results (output) to see if they are correct
Use tools like JUnit to provide automated repeatable tests
The harder tests are “where you don’t know the answer”
Then I used to prepare two codes
• One was the “production system” with all the bells and whistles
• The other had few options and just did the main problem
Always test incrementally
• Each module separately
• Full system as it builds up
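The "two codes" idea above can be turned into an automated, repeatable check. What follows is a minimal sketch in Python rather than JUnit. The problem chosen (smallest separation between aircraft positions) and the names `min_separation_reference`, `min_separation_production` and `cross_check` are illustrative assumptions, not part of the talk.

```python
import itertools
import math
import random


def min_separation_reference(points):
    """Simple code: few options, brute force over every pair of points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def min_separation_production(points):
    """'Production' code: sorts by x and skips pairs that cannot improve the result."""
    pts = sorted(points)
    best = math.inf
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            if q[0] - p[0] >= best:
                break  # every later point is at least this far away in x alone
            best = min(best, math.dist(p, q))
    return best


def cross_check(trials=200, seed=0):
    """Repeatable test: both codes must agree on seeded random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 30)
        points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
        expected = min_separation_reference(points)
        actual = min_separation_production(points)
        assert math.isclose(actual, expected), (points, actual, expected)
    return trials
```

Running `cross_check()` after every change gives a repeatable test even though the right answer for a given random input is not known in advance.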

General Principles IV

Minimize configuration variables that must be changed for each installation
Rather provide a message-based and user-based interface that the system can use to set operating parameters
Make each module as independent as possible; build together
• Module
• Documentation
• User interface (portlets are an example)
• Configuration interface
Store configuration data in a database that is independent of the system
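As a rough illustration of keeping configuration in a database rather than in per-installation files, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the module name `radar_display` and the parameter `refresh_ms` are hypothetical. A real system would probably use a shared database server rather than a local SQLite file.

```python
import sqlite3


def open_config_store(path=":memory:"):
    """Open (or create) the configuration database, separate from any module."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config ("
        " module TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (module, name))"
    )
    return conn


def set_param(conn, module, name, value):
    """Set an operating parameter; values are stored as text."""
    conn.execute(
        "INSERT OR REPLACE INTO config (module, name, value) VALUES (?, ?, ?)",
        (module, name, str(value)),
    )
    conn.commit()


def get_param(conn, module, name, default=None):
    """Read a parameter, falling back to a default when it was never set."""
    row = conn.execute(
        "SELECT value FROM config WHERE module = ? AND name = ?",
        (module, name),
    ).fetchone()
    return row[0] if row else default
```

A configuration interface (message-based or user-facing) would call `set_param`, while each module only calls `get_param` at start-up, which is not a time-critical path.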

Web services

Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format.
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.

[Figure: resources (databases, humans, programs, computational resources, devices) sit behind service logic (BPEL, Java, .NET) and message processing (SOAP and WSDL); services exchange SOAP messages of the form <env:Envelope> <env:Header> ... </env:Header> <env:Body> ... </env:Body> </env:Envelope>]
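To make the header/body structure of a SOAP message concrete, here is a small sketch that builds and reads an envelope with Python's standard `xml.etree.ElementTree`. It uses the SOAP 1.2 envelope namespace. The application namespace `urn:example:atc` and the `MessageId`, `PositionReport` and `AltitudeFt` elements are invented for illustration and do not come from any WSDL in the talk.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:atc"  # hypothetical application namespace

ET.register_namespace("env", SOAP_ENV)


def build_envelope(message_id, flight, altitude_ft):
    """Return a SOAP envelope (as text) with one header entry and one body entry."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}MessageId").text = message_id
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    report = ET.SubElement(body, f"{{{APP_NS}}}PositionReport")
    report.set("flight", flight)
    ET.SubElement(report, f"{{{APP_NS}}}AltitudeFt").text = str(altitude_ft)
    return ET.tostring(envelope, encoding="unicode")


def read_envelope(xml_text):
    """Parse the envelope and pull out the header and body values."""
    ns = {"env": SOAP_ENV, "atc": APP_NS}
    root = ET.fromstring(xml_text)
    message_id = root.findtext("env:Header/atc:MessageId", namespaces=ns)
    report = root.find("env:Body/atc:PositionReport", ns)
    altitude_ft = int(report.findtext("atc:AltitudeFt", namespaces=ns))
    return message_id, report.get("flight"), altitude_ft
```

In practice a Web Service toolkit would generate this plumbing from the WSDL; the sketch only shows what travels on the wire.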

A typical Web Service

In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: Web Services labelled Security, Catalog, Portal Service, Payment (Credit Card) and Warehouse (Shipping control), connected through WSDL interfaces]

Messaging Structure

Web Service communication is messaging (transport protocol, routing) using the SOAP protocol

[Figure: on each side of the Messaging layer, a customizable handler chain processes the SOAP Header and the service itself processes the SOAP Body; other services can be invoked from Header or Body]
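The handler-chain idea can be sketched as a list of functions that each look at the header before the service itself sees the body. This is a simplified illustration in Python with messages represented as dictionaries. The handler names, the `token` and `route` header fields and the demo token value are all assumptions, not part of any SOAP toolkit.

```python
def check_security(header, body):
    """Header handler: reject messages without the (hypothetical) demo token."""
    if header.get("token") != "demo-token":
        raise PermissionError("missing or wrong security token")
    return body


def add_routing_note(header, body):
    """Header handler: copy a routing hint from the header into the data passed on."""
    return {**body, "via": header.get("route", "direct")}


def position_service(body):
    """The service itself: it only ever sees the processed body."""
    return f"{body['flight']} at {body['altitude_ft']} ft via {body['via']}"


def process(message, handlers, service):
    """Run every handler in order over the header, then hand the body to the service."""
    header, body = message["header"], message["body"]
    for handler in handlers:
        body = handler(header, body)
    return service(body)
```

Because the chain is just a list, handlers for security, reliability or logging can be added or reordered without touching the service code.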

Merging the OSI Levels

All messages pass through multiple operating systems and each O/S thinks of a message as a header and a body
Important message processing is done at
• Network
• Client (UNIX, Windows, J2ME etc)
• Web Service Header
• Application
EACH is < 1 ms (except for small sensor clients and except for complex security)
But network transmission time is often 100 ms or worse
Thus there is no performance reason not to mix up the places where processing is done

[Figure: protocol stack IP, TCP, SOAP, App]

Linking Modules

From method based to RPC to message based to event-based publish-subscribe Message Oriented Middleware

[Figure, three panels]
• Closely coupled Java/Python …: Module A and Module B linked by method calls, 0.001 to 1 millisecond
• Coarse grain service model: Service A and Service B linked by messages, 0.1 to 1000 millisecond latency
• Publish-subscribe: Service A and Service B linked through a “Message Queue in the Sky”; the publisher posts events and the “listener” subscribes to events
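The publish-subscribe style of linking can be sketched with an in-process stand-in for the "Message Queue in the Sky". This is only an illustration under stated assumptions: `MessageQueueBroker` is a hypothetical class, not the API of NaradaBrokering or any other middleware, and a real broker would run as a separate networked service.

```python
from collections import defaultdict, deque


class MessageQueueBroker:
    """Toy broker: publishers post events, listeners subscribe by topic."""

    def __init__(self):
        self._listeners = defaultdict(list)
        self._pending = deque()

    def subscribe(self, topic, listener):
        """Register a callable that will receive events posted to the topic."""
        self._listeners[topic].append(listener)

    def post(self, topic, event):
        """Queue a copy of the event: data is communicated, not a pointer to it."""
        self._pending.append((topic, dict(event)))

    def dispatch(self):
        """Deliver queued events in order and return how many deliveries were made."""
        delivered = 0
        while self._pending:
            topic, event = self._pending.popleft()
            for listener in self._listeners.get(topic, ()):
                listener(dict(event))
                delivered += 1
        return delivered
```

The publisher never holds a reference to the listener, so either side can be replaced, restarted or moved to another machine without changing the other.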

OGCE Consortium

[Figure: portal screenshot with the following annotations]
• Individual portlet for the Proxy Manager
• Use tabs or choose different portlets to navigate through interfaces to different services
• 2 other portlets
• Each service has its own portlet

Portal Architecture

[Figure: hierarchical arrangement with columns labelled Clients, Portal, Portlets, Libraries, Services, Resources. Clients (pure HTML, Java applet ..); aggregation and rendering; portal internal services; local portlets and remote or proxy portlets (Portlet Class, Portlet Class: WebForm); SERVOGrid (IU), GridPort etc., (Java) COG Kit; Web/Grid services; computing, data stores, instruments]

General Principles V

Do not spend too long documenting, and prefer methods like javadoc that again are naturally associated with code
Do describe actions (as opposed to code functionality) in your Wiki/Blog/email
The quality and speed of different people varies a lot
• Evaluate this and assign responsibilities accordingly
Do not let anybody take decisions into their own hands
Debate goals and processes, but once a decision is made all must adhere to it
• Decisions can be changed and should be if needed

General Principles VI

Evaluate timing constraints carefully
Use the simplest, most robust approach that satisfies time constraints
• That’s why I recommend databases for configuration, as this is not a time-critical part of the system
Note a computer does one instruction in 10^-6 milliseconds but a network communication takes 1-100 milliseconds
• Invoking a process has about 1 millisecond overhead
• Method calls 0.001 to 0.01 milliseconds
• Using a database a few milliseconds
• People only notice delays of about 30 milliseconds or more

Consequences of Rule of the Millisecond

Useful to remember critical time scales
• 1) 0.000001 ms – CPU does a calculation
• 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency
• 2b) 0.001 to 0.01 ms – Overhead of a Method Call
• 3) 1 ms – wake-up a thread or process either?
• 4) 10 to 1000 ms – Internet delay: Workflow
So use pointers and the computer memory system when latencies of ≤ 1 millisecond are needed, but use a URI looked up in a context store when longer delays are allowed
Transfer data when read-only and long latency is allowed
Always choose the slowest allowed methodology and remember, when in doubt, that Moore’s law favors computer performance while systems always get more complex and harder to maintain.

[Figure label: Classic Programming]
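The rule can be read as a small decision procedure: given the latency a link must achieve, pick the slowest mechanism that still fits. The sketch below encodes the time scales listed above, using the upper end of each range as the worst case (that choice, and the names `TIME_SCALES_MS`, `slowest_allowed` and `link_style`, are assumptions made for illustration).

```python
# (mechanism, assumed worst-case latency in milliseconds)
TIME_SCALES_MS = [
    ("CPU calculation", 0.000001),
    ("method call", 0.01),
    ("thread or process wake-up", 1.0),
    ("Internet message or workflow step", 1000.0),
]


def slowest_allowed(latency_budget_ms):
    """Return the slowest mechanism whose worst-case latency fits the budget."""
    allowed = [(label, ms) for label, ms in TIME_SCALES_MS if ms <= latency_budget_ms]
    if not allowed:
        raise ValueError(f"no mechanism fits a budget of {latency_budget_ms} ms")
    return max(allowed, key=lambda item: item[1])[0]


def link_style(latency_budget_ms):
    """Pointers for budgets of 1 ms or less, URI lookup in a context store otherwise."""
    return "pointer" if latency_budget_ms <= 1.0 else "URI lookup"
```

For example, a display update that must complete within 0.5 ms stays a method call with pointers, while a workflow step that can take seconds becomes a message carrying URIs.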

Architecture of a large System

Divide system hierarchically into parts
• Interaction between parts will be messages with no conventional pointers
• Can have URIs that need to be looked up in a database (essentially)
Keep doing this until overhead is prohibitive
• Overhead is “surface”/“volume” for ALL systems (people, software …) and always decreases in relative importance as the system gets bigger
Remember computers are going to get faster rather than slower, so err on the side of modularity versus performance
It is rarely worth optimizing performance; rather make a good design that has no bad aspects making performance unnecessarily bad
Specify data structures in XML, NOT Java or C++, first
• Design ATCML first, specifying data structures needed in Air Traffic Control
• Map to SQL for databases (don’t use XML databases)
• Map to C++ or Java for programming
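The "XML first, then map to SQL and to a programming language" sequence might look like the following sketch. The ATCML fragment is entirely invented for illustration (the talk proposes designing ATCML but gives no schema), and Python with `sqlite3` stands in here for the Java/C++ and database mappings.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import astuple, dataclass

# Hypothetical ATCML fragment: the data structure is specified in XML first.
ATCML_SAMPLE = """
<flight callsign="CA981">
  <latitude>40.08</latitude>
  <longitude>116.58</longitude>
  <altitudeFt>35000</altitudeFt>
</flight>
"""


@dataclass
class Flight:
    """Programming-language mapping of the XML structure."""
    callsign: str
    latitude: float
    longitude: float
    altitude_ft: int


def flight_from_xml(xml_text):
    """Map the XML message onto the in-memory class."""
    node = ET.fromstring(xml_text.strip())
    return Flight(
        callsign=node.get("callsign"),
        latitude=float(node.findtext("latitude")),
        longitude=float(node.findtext("longitude")),
        altitude_ft=int(node.findtext("altitudeFt")),
    )


def store_flight(conn, flight):
    """SQL mapping: one relational row per flight, not an XML database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flight ("
        " callsign TEXT PRIMARY KEY, latitude REAL,"
        " longitude REAL, altitude_ft INTEGER)"
    )
    conn.execute("INSERT OR REPLACE INTO flight VALUES (?, ?, ?, ?)", astuple(flight))
    conn.commit()
```

The XML definition stays the single source of truth; the class and the table are both derived from it, so messages, storage and code agree on field names and types.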

Philosophy of Web Service Grids

Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
This leads to the distributed object (DO) model represented by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
• Distributed Objects replaced by Services
Note CORBA was considered too complicated in both organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages

What is a Simple Service?

Take any system: it has multiple functionalities
• We can implement each functionality as an independent distributed service
• Or we can bundle multiple functionalities in a single service
Whether a functionality is an independent service or one of many method calls into a “glob of software”, we can always make it a Web service by converting the interface to WSDL
Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of the millisecond”
• Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds to use a message rather than a method call
• Use scripting or compiled integration of functionalities ONLY when you require <1 millisecond interaction latency
The Apache web site has many (pre Web Service) projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services
• This makes it hard to integrate them while sharing common security, user profile, file access .. services

Grids of Grids of Simple Services

• Link via methods, messages, streams
• Services and Grids are linked by messages
• Internally to a service, functionalities are linked by methods
• A simple service is the smallest Grid
• We are familiar with the method-linked hierarchy: Lines of Code, Methods, Objects, Programs, Packages

[Figure: overlay and compose Grids of Grids. Methods, Services, Component Grids; CPUs, Clusters, MPPs, Compute Resource Grids; Databases, Federated Databases, Sensor, Sensor Nets, Data Resource Grids]

Choice of languages

One needs to evaluate the real-time version, but I would prefer Java to C++ or C
Java has good software development tools and the current generation of programmers is well trained in it
C++ allows higher performance, but find out if you need this
Prefer the Web Service model if performance allows
• Use message-based interaction, not method-based, where possible
• Use Web services if you require messages and interoperability with the outside world
• JDBC is message-based interaction with an external database
Aim at supporting both Windows and Linux platforms if possible

Client Side Air Traffic Control

Analyze all performance requirements
Remember life cycle costs are larger than build costs
• Difficult consequences if the contract is just to build, not to maintain
Use Model View Controller architecture and separate Model and View
• Control is often the interaction between Model and View
• So client is not the same as user module; always separate business logic from user interface
Use GIS!

Web Services and M-MVC

Web Services are naturally M-MVC (Message-based Model View Controller) with
• Model is Web Service
• Controller is Messages (NaradaBrokering)
• View is rendering

[Figure: a Web Service (application or Model) with resource-facing ports and user-facing ports (input port, output port; labelled R F I O); WSRP and JSR168 portlets; a portal aggregates WS user-facing fragments into the View on desktop, handheld or phone]

[Figure: explicit message-based Publish/Subscribe MVC model with a Broker as Controller between Model and View; arrows labelled subscribe UI event, publish UI event, subscribe rendering, publish rendering]
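A toy version of message-based MVC, with the broker playing the controller role, might look like the sketch below. It assumes the natural reading of the diagram labels: the View publishes UI events and subscribes to renderings, and the Model does the reverse. All class names, topic names and message fields are hypothetical, and the in-process `Broker` only stands in for a real messaging system such as NaradaBrokering.

```python
class Broker:
    """Controller role: routes messages between Model and View by topic."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers.get(topic, []):
            handler(message)


class Model:
    """Business logic: subscribes to UI events, publishes renderings."""

    def __init__(self, broker):
        self.altitude_ft = 30000
        self._broker = broker
        broker.subscribe("ui-event", self.on_ui_event)

    def on_ui_event(self, message):
        if message["action"] == "climb":
            self.altitude_ft += message["feet"]
        self._broker.publish("rendering", {"text": f"ALT {self.altitude_ft} ft"})


class View:
    """User interface: publishes UI events, subscribes to renderings."""

    def __init__(self, broker):
        self.screen = []
        self._broker = broker
        broker.subscribe("rendering", self.on_rendering)

    def on_rendering(self, message):
        self.screen.append(message["text"])

    def click_climb(self, feet):
        self._broker.publish("ui-event", {"action": "climb", "feet": feet})
```

Model and View never reference each other, so the same Model could feed a desktop, handheld or phone View simply by subscribing another renderer.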

I: Data Mining and GIS Grid

[Figure: WMS clients; a WMS handling client requests; UDDI; WFS1, WFS2, WFS3; NASA WMS; databases with NASA, USGS features; SERVOGrid faults; Data Mining Grid; links labelled HTTP and SOAP]

Typical use of Grid Messaging in NASA

[Figure: Datamining Grid, Sensor Grid, GIS Grid, Grid Eventing]

Typical use of Grid Messaging

[Figure: HPSearch manages; NaradaBrokering; Sensor Grid; WS-Context stores dynamic data; filter or datamining with post before processing and post after processing; notify; subscribe; Grid database archives; WFS (GIS data), Web Feature Service; GIS Grid, Geographical Information System]

I: Data Mining Grid

[Figure: HPSearch workflow; system services (UDDI, WS-Context); NaradaBrokering pipeline with filters and PI data mining; SOAP; GIS Grid with WFS3, WFS4, databases with NASA, USGS features and SERVOGrid faults]

Architecture

Consider requirements of the application alongside performance of computers and networks
• Remember performance of hardware will increase, as will cost of people
Don’t fix the number of tiers, but rather build the system from entities linked by messages, such as services linked by SOAP
• Messaging is good even if not SOAP
• SOAP has “container overhead”
Build a data architecture in XML for all information that will be in messages
Use pointers internally to entities
Things in messages use system metadata to look up references
• i.e. database lookup, not hardware memory model
As before, use the slowest, most general method possible
• Avoid unnecessary performance
Build a fault tolerance model into the initial architecture

ATC Performance and Algorithm

Find size (in latency, bandwidth) of critical requirements
Use publish-subscribe technology to support the link between data sources and programs
• Introduces a delay of a few (1-5) milliseconds but is much easier to build and more fault tolerant
• Prefer asynchronous links, as this makes the system more modular and more robust
Performance requirements drive architecture
Build a hierarchical algorithm to match the hierarchical architecture

How to become a Software Architect

Work hard!
Understand modern technologies and their trends so the future enhances design choices
Be able to understand the system (requirements) in a clear fashion
Be able to decompose systems in a clear, methodical fashion
Isolate detail into modules and use a two or three level programming model

Two-level Programming I

• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++, Java or Fortran Monte Carlo module
– Data streaming from a sensor or satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services, assuming we have solved the problem of programming the service

[Figure: Service, Data]

Two-level Programming II

The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
Such interpretative environments are the single processor analog of Grid Programming
Some projects like GrADS from Rice University are looking at integration between service and composition levels, but the dominant effort looks at each level separately

[Figure: Service1, Service2, Service3, Service4]
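The single-processor analog mentioned above can be sketched directly: each "service" is written with conventional code (level 1), and a separate composition step wires them together much like a shell pipeline (level 2). The service names, the sample flight data and the `compose` helper are illustrative assumptions.

```python
from functools import reduce


# Level 1: programming inside each service with conventional code.
def sensor_service():
    """Hypothetical sensor feed producing position reports."""
    yield {"flight": "CA981", "altitude_ft": 35000}
    yield {"flight": "MU5101", "altitude_ft": 900}
    yield {"flight": "CZ3102", "altitude_ft": 28000}


def filter_service(reports, floor_ft=1000):
    """Keep only reports at or above an assumed altitude floor."""
    return (r for r in reports if r["altitude_ft"] >= floor_ft)


def format_service(reports):
    """Turn each report into a short text record."""
    return (f"{r['flight']}:{r['altitude_ft']}" for r in reports)


# Level 2: composing services, like `sensor | filter | format` in a shell.
def compose(source, *stages):
    """Feed the source stream through each stage in order."""
    return reduce(lambda stream, stage: stage(stream), stages, source)
```

Calling `list(compose(sensor_service(), filter_service, format_service))` runs the whole pipeline. In a Grid the stages would be remote services and the composition would be a workflow description rather than a function call.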

3 Layer Programming Model

Level 1 Programming inside services: application expressed in Java, Fortran, C++, MPI etc.
Level 2 Programming choosing services by virtualization: Application Semantics (Metadata, Ontology), Semantic Grid
Level 3 Grid Programming composing multiple services: Service Workflow, Transactions, Mediation
WS-* Infrastructure
Substantial work in UK e-Science program, international semantic web community

[Figure: Web Service 1, WS 2, WS N-1, Web Service N]

Plethora of Standards

Java is very powerful partly due to its many “frameworks” that generalize libraries, e.g.
• Java Media Framework
• Java Database Connectivity JDBC
Web Services have a corresponding collection of specifications that represent critical features of the distributed operating system for “Grids of Simple Services”
• About 60 WS-* specifications introduced in last 2-3 years
• These are low level, with higher level standards such as access database (OGSA-DAI) or “Submit a job” built on top of these
Many battles both between standards bodies and between companies as each tries to set the standards it considers best; thus there are multiple standards for many key Web Service functionalities
Microsoft is a key player and stands to benefit as Web Services open up the enterprise software space to all participants
• e.g. MQSeries (IBM) and Tibco have to change their messaging systems to support new open standards

The WS-* Infrastructure

Core Grid Services build on and/or extend the 60 or so WS-* Infrastructure specifications which define
• Container Model, XML, WSDL …
• Service Internet ((Reliable) Messaging, Addressing), including extensions for high performance transport and representation. This is a natural basis for streaming applications
• Service Discovery
• Workflow and Transactions
• Security
• Metadata and State including lifetime
• Notification
• Policy, Agreements
• Management (service interactions)
• Portals and User Interfaces