
Workstation and System Configurations

Brian Bramer,  Faculty of Computing and Engineering Sciences 

De Montfort University, Leicester, UK


Contents

1 Introduction
2 Performance requirements due to system and application software
2.1 Outline of a typical small to medium sized configuration
2.2 Operating system and system software requirements
2.2.1 Support for a Multi-Programming Environment
2.2.2 Support for Virtual Memory
2.2.3 Main Memory Requirements
2.2.4 Disk Requirements
2.3 Application Dependent Performance Factors
3 Important Factors in System Performance
3.1 Factors Influencing Overall System Performance
3.2 Factors Influencing Processor Performance
3.2.1 Internal Processor Architecture
3.2.2 Clock Speed
3.2.3 Memory Speed
3.2.4 Address Bus Size
3.2.5 Data Bus Size
4 Processor Performance Enhancement Techniques
4.1 Prefetch and Pipelining
4.2 Cache Memory
4.3 Example Processor Evolution: Intel and Motorola Microprocessors
4.4 CISC and RISC Processors
4.5 Special Purpose Processors, Multi-processors, etc.
4.5.1 Special Purpose Processors
4.5.2 Multi-Processors and Parallel Processors
4.5.2.1 Data Parallel Processing
4.5.2.2 Control Parallel Processing
5 Integrated Circuits and Performance Enhancement
6 System Configurations
6.1 Personal Computers, Workstations, Minis, Distributed, etc.
6.2 Performance Factors in a Distributed Environment
7 General requirements, disk backup, disk viruses, etc.
8 Conclusions
9 References

1  Introduction

When considering the acquisition of a computer system the first task undertaken is to carry out a feasibility study.  The concept of installing a new or upgrading an existing system is analysed to determine cost effectiveness in terms of end-user requirements and advantages gained, e.g. increased productivity of skilled staff, reduced product development times, a more viable product, etc.  The result of the feasibility study will be a report submitted to senior management requesting funds to implement the proposed system.

The feasibility study will generate system requirements not only in terms of software (to solve the end-users' problems) but also the hardware to support that software.  The hardware requirements will be in terms of computer processor power (do you need a £1000 office PC or a £20000 professional workstation with real-time 3D graphics capability?), memory size (do you need 32Mbytes or 256Mbytes of RAM?), disk space (even individual PC based packages often need 1Gbyte each), network support (to communicate with servers or other users), etc.  In addition, many end-users often forget the requirements of the system software (operating system, compilers, etc.).  These notes consider hardware requirements to support software and discuss what factors affect overall system performance.


2 Performance requirements due to system and application software

For further information check the following links

The WWW Virtual Library on computing - http://src.doc.ic.ac.uk/bySubject/Computing/Overview.html
CPU Information centre - http://bwrc.eecs.berkeley.edu/CIC/
Intel's developer site - http://developer.intel.com/
Intel PC technology discussion - http://developer.intel.com/technology/
PC reference information - http://www.pcguide.com/index.htm
IBM PC compatible FAQ - http://www.undcom.com/compfaq.html
History of CPUs - http://bwrc.eecs.berkeley.edu/CIC/archive/cpu_history.html
CPU Information & System Performance Summary - http://bwrc.eecs.berkeley.edu/CIC/summary/
Chronology of Events in the History of Microcomputers - http://www.islandnet.com/~kpolsson/comphist/

2.1 Outline of a typical small to medium sized configuration

Fig 1 Typical microcomputer configuration using a common bus system

Fig 1 is a representation of the hardware (physical components) of a simple single processor computer system comprising:

1. CPU and associated circuits, e.g. microprocessor integrated circuit chip - see http://www.mkdata.dk/click/module3a.htm

2. Co-processor(s), e.g. for real number floating point calculations and/or graphics.

3. Main or primary memory, i.e. RAM (Random Access read/write Memory) and ROM (Read Only Memory) - see http://www.cms.dmu.ac.uk/~cph/Teaching/CSYS1001/lec15/c1001l15.html

4. Disk interfaces for floppy/hard disks as secondary memory for saving programs and data - see http://www.cse.dmu.ac.uk/~cph/Teaching/CSYS1001/lec17/c1001l17.html and http://www.pcguide.com/ref/hdd/index.htm

5. User I/O interface which controls the display screen and the keyboard.

6. Input/output interface devices (for connecting external devices such as printers), e.g. serial or parallel I/O interfaces.

In Fig 1 an information highway or bus system connects the various components of the system:

Address Bus which carries the address of the memory location or I/O device being accessed.

Data Bus which carries the data signals.

Control Bus which carries the control signals between the CPU and the other components of the system, e.g. signals to indicate when a valid address is on the address bus and if data is to be read or written.

See http://www.intel.com/network/performance_brief/pc_bus.htm and  http://www.pcguide.com/ref/mbsys/buses/func.htm for a discussion of PC busses and http://agpforum.org/ and http://www.pcguide.com/ref/mbsys/buses/types/agp.htm for a discussion on the AGP (Accelerated Graphics Port).

A sophisticated system may be much more complex than Fig. 1 with multiple processors, cache memories (see below), separate bus systems for main memory, fast and slow I/O devices, etc.

When attempting to estimate the requirements of a proposed system in terms of processor performance, main memory and disk size, etc., attention must be paid to the needs of both system and user software in terms of:

1. supporting the operating system and other general software, e.g. editors, compilers, window manager, network manager, etc.;

2. supporting user application software, e.g. CAD packages, databases, word processors, etc.

2.2 Operating system and system software requirements

An early unsophisticated command line PC operating system such as MS-DOS 6.2 can run on an IBM/PC compatible microcomputer with 640Kbytes of RAM and a relatively small disk (e.g. 20Mbytes; MS-DOS itself needs approximately 6Mbytes of disk space). A more sophisticated operating system would require much more RAM and disk space. For example, Windows 98, which provides a windowed environment with multitasking/virtual memory capabilities, needs a minimum of 32Mbytes of RAM and takes approximately 200Mbytes of disk space.

2.2.1 Support for a multiprogramming environment.

In a multiprogramming (or multiprocessing or multitasking) environment the user or users can be running more than one program concurrently, with certain programs being executed while others are waiting for I/O from disk or terminals. In a single processor system only one program can be executing at any instant and the operating system schedules programs for execution. It is therefore possible to have more programs currently available for execution than there is room for in main memory, and it is then necessary for some programs to be swapped out to disk (into a reserved swap area). A sophisticated environment where many programs may be concurrent could well require a large portion of disk to be set aside for the swap area. For example, a typical professional workstation running UNIX could require a swap area of between 200 and 500Mbytes depending upon application, and allowance must be made for this. In addition, modern multiprogramming environments also support virtual memory.

2.2.2 Support for virtual memory

Over the past 40 years sophisticated large scale computer based applications (e.g. engineering CAD) have always required more main memory than was physically available (or affordable) on the computers of the time. To overcome this problem virtual memory techniques evolved in the late 1960's (Denning 1970).

Virtual memory makes use of a phenomenon known as locality of reference in which memory references of both instructions and data tend to cluster. Over short periods of time a significant amount of:

(a) instruction execution is localized either within loops or heavily used subroutines, and

(b) data manipulation is on local variables or upon tables or arrays of information.

Most virtual memory systems use a technique called paging in which the program and data are broken down into 'pages' (typical size 4Kbytes) which are held on disk. Pages are then brought into main memory as required and 'swapped' out when main memory is full. This technique allows program size to be much larger than the physical main memory size (typically a modern professional workstation may have 64 to 512Mbytes of main memory but a virtual memory size of 4Gbyte). As the number and/or size of concurrent programs increases a phenomenon known as thrashing can occur, in which the system spends all its time swapping pages to and from disk and doing nothing else. It is therefore important to configure sufficient physical memory even under a virtual memory environment. This problem often becomes apparent over a period of time as new releases of software (including the operating system) are mounted on a system. New versions of software are always larger (sometimes two or three times) and users experience a sudden degradation in response times and extended program run times. This often necessitates the upgrading of main memory on existing systems every year or two.
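
To make the paging mechanism concrete, the C sketch below splits a virtual address into a page number and an offset and 'faults' the page in when it is not resident. It is a simplified illustration only - the 4Kbyte page size matches the figure quoted above, but the page table, frame allocation and sizes are invented, not those of any particular operating system.

/* Minimal sketch of virtual-to-physical address translation with 4Kbyte pages.
   The page table, frame choice and sizes are illustrative only.              */
#include <stdio.h>

#define PAGE_SIZE 4096u                 /* page size quoted in the notes      */
#define NUM_PAGES 1024u                 /* hypothetical 4Mbyte virtual space  */

typedef struct {
    int      resident;                  /* 1 if the page is in main memory    */
    unsigned frame;                     /* physical frame number if resident  */
} PageTableEntry;

static PageTableEntry page_table[NUM_PAGES];

/* Translate a virtual address; simulate a page fault if the page is on disk. */
static unsigned translate(unsigned virtual_address)
{
    unsigned page   = virtual_address / PAGE_SIZE;   /* which page            */
    unsigned offset = virtual_address % PAGE_SIZE;   /* position in the page  */

    if (!page_table[page].resident) {
        /* page fault: a real OS would read the page from the swap area on
           disk and possibly evict another page to make room for it           */
        page_table[page].resident = 1;
        page_table[page].frame    = page % 256;      /* invented frame choice */
        printf("page fault on page %u\n", page);
    }
    return page_table[page].frame * PAGE_SIZE + offset;
}

int main(void)
{
    printf("physical address = %u\n", translate(0x3A7F));  /* page 3, offset 0xA7F */
    return 0;
}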

2.2.3 Main memory requirements

Sufficient main memory is required to hold the operating system kernel (those functions permanently in main memory) and those functions which will be loaded as required. If window managers and/or network managers are also being used allowance should be made for their requirements. Typically on a PC a simple command line operating system (e.g. MS-DOS) required between 80 and 200Kbytes depending upon functions loaded, and a more sophisticated environment such as UNIX or Windows 2000 would require between 8 and 32Mbytes.  The following are minimum recommendations for the size of RAM memory for IBM PC compatible microcomputer operating systems (large scale applications such as a large database could require more):

Windows 3.1      minimum 4Mbytes    preferred 8Mbytes
Windows 95       minimum 16Mbytes   preferred 32Mbytes
Windows 98       minimum 32Mbytes   preferred 64/128Mbytes
Windows NT/2000  minimum 64Mbytes   preferred 128/256Mbytes
LINUX            minimum 16Mbytes   preferred 64/128Mbytes
SCO UNIX         minimum 64Mbytes   preferred 256Mbytes

If the main memory is too small there will be insufficient space for user programs and data or, in a multiprogramming/virtual memory environment, excessive swapping and paging between main memory and disk will occur.  

2.2.4 Disk requirements

In addition to the disk space required to support the user application programs and data, sufficient disk space is required to hold the operating system, utilities, compilers, help system, etc. This can range from 500Kbytes on a small PC running MS-DOS to 350Mbytes on a professional workstation running UNIX (where the help files alone can be 100 to 150Mbytes). In addition, space must be reserved for the swap space in a multiprogramming/virtual memory environment. For example, the following figures show how the disk space taken by the operating system on a mid 1990's IBM PC increased as more sophisticated software was installed:

MS-DOS 6.2 5.8 Mbytes

plus CD-ROM driver  6.9 Mbytes

plus Windows 3.1  16.3 Mbytes

plus  Win32S 18.5 Mbytes

plus Windows 95 41 Mbytes

One would then need to allow another 20 to 200Mbytes for swap space (depending upon application). Other examples of PC operating system requirements are:

OS/2 40 Mbytes plus swap space 

Windows 98 100/150Mbytes plus swap space

Windows NT/2000 200/300Mbytes plus swap space

LINUX (a free PC version of UNIX) 200 Mbytes plus swap space

LINUX plus X-windows 350 Mbytes plus swap space

Some operating systems (e.g. certain versions of Linux) require swap space to be allocated when the disk is initialized (by setting up a swap partition).  Others (e.g. Windows 95/98) have a swap file which extends and contracts as required (which will cause problems if the disk fills up!).

2.3 Application dependent performance factors

The importance of particular processor performance factors can depend upon the application, for example:

Processor dependent:

the performance of applications in this category is largely dependent on instruction execution speed and the performance of the ALU (arithmetic/logic unit used to manipulate integer data), e.g. AI (artificial intelligence) applications such as Lisp and Prolog programs, simulating neural networks, etc.

Floating point dependent:

many mathematical/scientific applications will require a good real number calculation performance, e.g. the analysis of the structure of a bridge using finite element mesh techniques.

I/O (input/output) dependent applications:

applications which extensively manipulate disk file based information will require a good I/O bandwidth, e.g. a large database holding details of clients' orders which may be simultaneously accessed by staff in various departments (production, sales, accounting, etc.).

In practice one of the above factors may predominate in a particular application (e.g. I/O bandwidth is critical in database applications) or a broader overall system performance may be required.

Sufficient main memory and disk space must be provided to support the executable code and user data sets. Examples of IBM PC compatible software disk requirements are:  


Wordstar 7 6 Mbytes minimum, 17 Mbytes maximum

Turbo C++ 3.1 8.5 Mbytes typical

Borland C++ 5 170 Mbytes typical (depends on libraries installed)

Visual C++ 2 68 Mbytes minimum, 104 Mbytes typical

Oracle running under SCO UNIX may require 256Mbytes of RAM to support a sophisticated database system.

Java JDK1.2.2 150Mbytes plus more for extra APIs

Viewlogic CAD 800/1000 Mbytes

It is worth noting that although Java is not particularly large in disk requirements it needs powerful processors and lots of memory to run complex Java applications using sophisticated APIs, e.g. minimum Pentium 400 with 64/128Mbytes of memory.  In a recent experiment Sun's Java IDE Forte was mounted on a 5 year old DEC Alpha with 64Mbytes of memory and took 15 minutes to load!

Generally software houses or package sales documentation will provide guidance on processor and memory requirements, e.g. so much memory and disk space for the base system plus so much per user giving an immediate guide to the size of system required (one then needs to add operating system requirements).
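
As a worked illustration of this 'base plus per-user' sizing rule, the short C sketch below adds the operating system, the package base and a per-user allowance together. All of the figures are invented examples, not vendor quotes.

/* Illustrative system sizing: OS + package base + per-user allowance.
   Every figure here is a hypothetical example, not taken from any vendor. */
#include <stdio.h>

int main(void)
{
    int users            = 10;
    int os_ram_mb        = 64,  os_disk_mb       = 300;  /* e.g. a UNIX server  */
    int base_ram_mb      = 128, base_disk_mb     = 500;  /* package base system */
    int per_user_ram_mb  = 8,   per_user_disk_mb = 50;   /* per concurrent user */

    printf("RAM needed:  %d Mbytes\n",
           os_ram_mb  + base_ram_mb  + users * per_user_ram_mb);
    printf("Disk needed: %d Mbytes\n",
           os_disk_mb + base_disk_mb + users * per_user_disk_mb);
    return 0;
}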


3 Important factors in system performance

3.1 Factors influencing overall system performance

Processor performance

(see next sub-section for a detailed discussion) determines instruction execution speed, arithmetic performance, interrupt handling capability, etc.

Main or primary memory size

is critical in system performance. In a single user system it limits the size of program and/or data set which can be processed. In a multiprogramming/virtual memory environment it affects the number of concurrent processes which can be held without swapping to and from disk. In general the more powerful the processor the larger the memory, i.e. a general rule is that as processor power increases so do the user requirements, and this leads to larger and more complex programs. When determining the main memory size required for a system allowance must be made for the operating system, e.g. a sophisticated operating system such as UNIX or Windows 98 typically requires 8 to 32Mbyte for resident components and work area.

Secondary memory (disk) size

determines the number of programs and data sets which can be accessed on-line at any instant. For example, in a single user word processing environment only one or two documents will be accessed at a time, which could be held on a small floppy disk. On the other hand, a large multi-user minicomputer could have 50 simultaneous users running large programs with large data sets requiring 10000Mbytes or more of disk space. Again when estimating disk requirements allowance has to be made for the operating system, e.g. UNIX typically requires of the order of 300Mbytes if all utilities and help files are on-line.

Input/output bandwidth

is a measure of how fast information can be transferred between the processor, memory and I/O devices (see data bus size in next sub-section).

Network capability

is important in a distributed environment where a number of separate systems are connected via a network, e.g. personal workstations accessing a shared central database.

3.2 Factors influencing processor performance

The performance of the processor in terms of program size and execution speed is determined by a number of factors.  


3.2.1 Internal processor architecture

which determines:

a. The number of processor registers (high speed memory within the CPU) used for the storage of temporary information and intermediate results. For example, holding local variables in CPU registers reduces traffic to/from main memory and hence overall program execution time.

b. The number of instructions available: a statement in a high level language is mapped into a sequence of processor instructions. The approach taken in what has become known as CISC architecture (complex instruction set computer) was that increasing the number of instructions shortened the executable code and the program executed faster (see the discussion on CISC/RISC machines below).

c. The number of addressing modes: the processor uses addressing modes to access the operands (data to be operated on) of instructions. The approach in CISC architectures was to increase the number of addressing modes to allow direct manipulation of more and more complex data structures, e.g. records and arrays of records.

d. The data size of the ALU (Arithmetic/Logic Unit). The ALU can directly manipulate integer data of a specific size or sizes, e.g. 8, 16, 32 or 64 bit numeric values. For example, a 32-bit ALU can add a pair of 32-bit numbers with one instruction whereas a 16-bit ALU would require two instructions.

The control unit of first (valve) and second (transistor) generation computer systems was 'hardwired' in that physical circuitry fetched, decoded and executed instructions. The major problem with very complex 'hardwired' circuits is that modifications are difficult and expensive. The advent of integrated circuits (used in third and fourth generation computers) enabled the building of ROMs on the processor chip which then allowed practical microprogramming (Stallings 2000). In a microprogrammed control unit the fetch, decode and execute of instructions are controlled by a ROM based 'microprogram' in the control unit which 'executes' the instructions received by the processor as a succession of simple microinstructions. The advantage of using ROM based microcode is that it is 'easier' to modify than an equivalent 'hardwired' circuit.

Over the past twenty years more and more instructions have been added making the microprogram of typical CISC computers (e.g. Intel 8086 and Motorola 68000 families) very complex and difficult to debug (see the discussion on CISC and RISC machines below).

See CPU Information & System Performance Summary -  http://bwrc.eecs.berkeley.edu/CIC/summary/ and CPU Information centre - http://bwrc.eecs.berkeley.edu/CIC/

3.2.2 Clock Speed

Events within the system are synchronized by a clock which controls the basic timing of instructions or parts of instructions. A particular microprocessor may be available in a range of clock speeds. For example, Table 1 presents a summary of the relative performance of the Motorola MC68000 family against clock speed (the performance in Mips is a guide and will be affected by factors such as cache hit rate, etc.). All things being equal, a 25MHz MC68020 will execute instructions twice as fast as a 12.5MHz version, but costs more.

clock MHz    68008   68000   68010   68020   68030   68040

8            0.5     0.6     0.65
10                   0.8     0.8
12.5                 1.3     1.1     1.7
16.65                                2.2     5.0
25                                   3.0     12.0    22.0
33                                   6.0             29.0
50

Table 1 Relative performance (in Mips) of the Motorola MC68000 family against clock speed (figures are a guide - results depend on clock speed, memory access time, cache hit rate, etc.)

The Intel 80486DX2, 80486DX4 and Pentium processors have on-chip clock multipliers which typically multiply the clock by two, three or four times, i.e. on-chip operations are performed at two, three or four times the external clock speed making a particular improvement in processor bound jobs. This has little effect on I/O bound jobs (e.g. a database server or a file server) where a large data bus and fast I/O devices are more important.  

3.2.3 Memory speed

Main memory speed should match the speed of the processor. A 25MHz MC68020 requires faster (hence more expensive) memory than a 12.5MHz version. If necessary, memory attached to a MC68020 can delay the processor on a memory read/write by using WAIT states, which makes the processor idle for one or more clock periods and hence slows the overall execution speed. A common tactic in the early 1990's was to build machines with a fast processor and clock but with slow (and cheap) memory, e.g. the unwary could be caught by a machine advertised as having a 25MHz CPU but which could execute programs slower than a 12.5MHz machine.  

3.2.4 Address Bus size

The number of address lines determines the memory address space of a processor, i.e. both the maximum amount of physical main memory which can be accessed (if fitted) and the maximum logical memory size in a virtual memory environment. Therefore the address bus size affects maximum program/data size and/or the amount of swapping and paging in a multiprogramming/virtual memory environment. For example, 16 address lines can access a maximum of 64Kbytes, 20 lines 1Mbyte, 24 lines 16Mbyte and 32 lines 4Gbyte.
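
The figures quoted above are simply 2 raised to the power of the number of address lines; a few lines of C confirm them (purely an arithmetic illustration):

/* Address space = 2^n bytes for an n-bit address bus. */
#include <stdio.h>

int main(void)
{
    int lines[] = { 16, 20, 24, 32 };

    for (int i = 0; i < 4; i++) {
        unsigned long long bytes = 1ULL << lines[i];      /* 2^n bytes */
        printf("%2d address lines -> %llu bytes (%llu Kbytes)\n",
               lines[i], bytes, bytes / 1024);
    }
    return 0;
}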

It must be noted that even though a processor has a particular address space this does not mean that a computer system will be or can be fitted with the maximum amount. For example, a processor with 32 address lines has an address space of 4Gbyte but typical 32-bit machines are fitted with anything between 4Mbyte and 256Mbyte of physical memory. The 4Gbyte address space becomes important under a virtual memory environment where very large programs can be executed on machines with much smaller physical memory. In practice there is a maximum amount of memory which can be fitted to a particular model of machine (determined by the layout of the machine in terms of bus slots, physical space available, etc.). One of the major differences between personal workstations and mini/mainframe computer systems is that the latter can generally be fitted with much larger physical memory.

3.2.5 Data bus size

The width of the data bus determines how many memory read/write cycles are required to access instructions/data and has a major effect on I/O bandwidth, e.g. if a processor has a 16-bit data bus it will require two memory accesses to read a 32-bit number while a processor with a 32-bit data bus would require a single access. A question often asked is why a multi-user minicomputer can be up to ten times the cost of a personal workstation with similar processor performance. The answer is that when purchasing minicomputers and mainframe systems one is buying, to a large extent, I/O bandwidth and physical memory capacity. An example (from the mid 1980's) is the comparison between an Apollo DN3000 workstation (based on a 12MHz MC68020 microprocessor) and the DEC VAX 8200 minicomputer:

                 processor rating   I/O bandwidth    typical cost (1987)

Apollo DN3000    1.2 Mips           1 Mbyte/sec      £20,000
DEC VAX 8200     1.2 Mips           13 Mbytes/sec    £200,000

The figures are order of magnitude guides but do give an indication of the different areas of application of the systems. The Apollo was a single user workstation used for highly interactive computational tasks and the VAX would typically be used by a number of concurrent users (e.g. five to ten) to run tasks which are not heavy in computational terms but which require a system capable of supporting the I/O of a number of users (e.g. multi-user databases, sales/stock control packages, accounting packages, etc.).


Microprocessor            address bus    maximum        data bus
manufacturer & type       size in bits   memory bytes   size in bits

Intel 8080                16             64K            8
Zilog Z80                 16             64K            8
Motorola 6800             16             64K            8
Intel 8088 (IBM/PC)       20             1M             8
Intel 8086 (IBM/PC XT)    20             1M             16
Motorola 68008            20             1M             8
Motorola 68000, 68010     24             16M            16
Intel 80186, 80286        24             16M            16
Motorola 68020/30/40      32             4G             32
Intel 80386SX             24             16M            16
Intel 80386DX             32             4G             32
Intel 80486DX             32             4G             32
Intel 80486SX             32             4G             32
Intel 80486DX2            32             4G             32
Intel 80486DX4            32             4G             32
Intel Pentium 400         32             4G             32/64 PCI

Table 2 Common microprocessors with address and data bus sizes

Note: K = 1024 (2^10), M = 1048576 (2^20), G = 1073741824 (2^30). The 80486SX is identical to the 80486DX except that it has no floating point co-processor.

Table 2 shows address and data bus sizes for various microprocessors:

1. The early microprocessors (e.g. Intel 8080, Zilog Z80, and Motorola 6800 series) have a 16-bit address bus which can address a maximum memory size of 65536 bytes or 64 Kbytes, i.e. addresses 0 to 1111111111111111 in binary.

2. The Intel 8088 and 8086 (used in the original IBM PC and PC/XT) and the Motorola MC68008 have a 20-bit address bus which can address a maximum memory size of 1048576 bytes or 1 Mbyte.

3. The Intel 80186/286 and Motorola MC68000/10 have a 24-bit address bus which can address a maximum memory size of 16777216 bytes or 16 Mbytes.

4. The Intel 80386/486/Pentium and Motorola MC68020/30/40 have a 32-bit address bus which can address a maximum memory size of 4294967296 bytes or 4 Gbytes.

Table 2 shows the maximum amount of primary memory which can be addressed. In practice a computer system may be fitted with less, e.g. typically a modern Pentium system has 32, 64, 128 or 256 Mbytes. Although the primary memory is organized in bytes an instruction or data item may use several consecutive bytes of storage, e.g. using 2, 4 or 8 bytes to store 16-bit, 32-bit or 64-bit values respectively.

The size of the data bus determines the number of bits which can be transferred between system components in a single read or write operation. This has a major impact on overall system performance, i.e. a 32-bit value can be accessed with a single memory read operation on a 32-bit bus but requires two memory reads with a 16-bit bus. In practice the more powerful the processor the larger the data and address busses.
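
The number of read/write cycles is just the value size divided by the bus width, rounded up; a small illustrative calculation in C:

/* Bus cycles needed to transfer a value over data buses of various widths. */
#include <stdio.h>

static int bus_cycles(int value_bits, int bus_bits)
{
    return (value_bits + bus_bits - 1) / bus_bits;    /* ceiling division */
}

int main(void)
{
    printf("32-bit value on a 16-bit bus: %d cycles\n", bus_cycles(32, 16));
    printf("32-bit value on a 32-bit bus: %d cycles\n", bus_cycles(32, 32));
    printf("64-bit value on a 32-bit bus: %d cycles\n", bus_cycles(64, 32));
    return 0;
}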

The size of the address and data busses has a major impact on the overall cost of a system, i.e. the larger the bus the more complex the interface circuits and the more 'wires' interconnecting system components. Table 2 shows that there are versions of some processors with smaller data and address busses, e.g. the Intel 80386SX is (from a programmer's viewpoint) internally identical to the 80386DX but has a 24-bit address bus and a 16-bit external data bus (the internal data bus is 32-bits). These are used to build low cost systems which are able to run application programs written for the full processors (but with reduced performance).


Table 2a shows the Intel processors with address bus size, data bus sizes (internal and external), internal cache size, presence of an internal co-processor and internal clock speed.

IBM PC compatibles        address bus   maximum        internal data  external data  internal cache  internal       internal
processor model           size in bits  memory bytes   bus in bits    bus in bits    in bytes        co-processor   clock

Intel 8088 (IBM/PC)       20            1M             16             8              none            no             *1
Intel 8086 (IBM/PC XT)    20            1M             16             16             none            no             *1
Intel 80186, 80286        24            16M            16             16             none            no             *1
Intel 80386SX             24            16M            32             16             none            no             *1
Intel 80386DX             32            4G             32             32             none            no             *1
Intel 80486DX             32            4G             32             32             8K              yes            *1
Intel 80486SX             32            4G             32             32             8K              no             *1
Intel 80486DX2            32            4G             32             32             8K              yes            *2
Intel 80486DX4            32            4G             32             32             16K             yes            *2 or *3
Intel Pentium 400         32            4G             64             32/64 PCI      16K             yes            *4

Table 2a Intel processors

Notes:

Address bus size

determines the memory address space of a processor, e.g. 32 address lines can address a maximum of 4Gbyte of memory

Data bus size

determines how many memory read/write cycles are required to access instructions/data and has a major effect on input/output bandwidth (important in file servers and database servers)

Cache memory

a fast memory logically positioned between the processor and bus/main memory - can be on chip (as in 80486) and/or external

Floating point co-processor

speeds up real number calculations (typically a twenty times speed up over the normal CPU) - important in mathematical, scientific and engineering applications

Clock Speed

the clock times events within the computer - the higher the clock rate the faster the system goes (assuming memory, bus, etc. match the speed)

Internal clock speed

the 80486DX2, 80486DX4 and Pentium processors contain clock doublers/triplers/quadruplers, etc.; on-chip operations are performed at 2/3/4 times the external clock speed - external operations are unchanged


4 Processor Performance Enhancement Techniques

Modern processors, including microprocessors, use instruction pipelining and cache memory techniques first used in the large mainframe computers of the 1960's and 1970's (Foster 1976).  Also see Chronology of Events in the History of Microcomputers - http://www.islandnet.com/~kpolsson/comphist/ , CPU Information & System Performance Summary - http://bwrc.eecs.berkeley.edu/CIC/summary/ and CPU Information centre - http://bwrc.eecs.berkeley.edu/CIC/

4.1 Prefetch and Pipelining

A program consists of a sequence of instructions in main memory. Under the control of the Control Unit each instruction is processed in a cyclic sequence called the fetch/execute or instruction cycle:

Fetch Cycle

A machine code instruction is fetched from main memory and moved into the Instruction Register, where it is decoded.

Execute Cycle

The instruction is executed, e.g. data is transferred from main memory and processed by the ALU.

To speed up the overall operation of the CPU modern microprocessors employ instruction prefetch or pipelining which overlap the execution of one instruction with the fetch of the next or following instructions. For example, the MC68000 uses a two-word (each 16-bits) prefetch mechanism comprising the IR (Instruction Register) and a one word prefetch queue. When execution of an instruction begins, the machine code operation word and the word following are fetched into the instruction register and one word prefetch queue respectively. In the case of a multi-word instruction, as each additional word of the instruction is used, a fetch is made to replace it. Thus while execution of an instruction is in progress the next instruction is in the prefetch queue and is immediately available for decoding. Powerful processors make extensive use of pipelining techniques in which extended sequences of instructions are prefetched with the decoding, addressing calculation, operand fetch and execution of instructions being performed in parallel (Stallings 2000). In addition, modern processors cater for the pipelining problems associated with conditional branch instructions.  For more details see http://www.cs.herts.ac.uk/~comrrdp/pipeline/pipetop.html and http://www.cs.umass.edu/~weems/CmpSci535/535lecture8.html
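
A rough feel for the benefit comes from the standard pipeline timing argument: with s stages an unpipelined machine needs about n*s cycles for n instructions, while an ideal pipeline needs s + (n - 1). The C sketch below compares the two; it is an idealised illustration which ignores the branch and other hazards mentioned above.

/* Idealised pipeline timing: ignores branch and data hazards. */
#include <stdio.h>

int main(void)
{
    int stages = 4;        /* e.g. fetch, decode, operand fetch, execute */
    int n      = 1000;     /* instructions executed                      */

    long unpipelined = (long)n * stages;   /* one instruction at a time       */
    long pipelined   = stages + (n - 1);   /* one result per cycle once full  */

    printf("unpipelined: %ld cycles\n", unpipelined);
    printf("pipelined:   %ld cycles (speed up ~%.1fx)\n",
           pipelined, (double)unpipelined / pipelined);
    return 0;
}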

4.2 Cache memory (also see http://www.infc.ulst.ac.uk/~desi/b94mn/cache.htm)

There has always been a problem of matching processor and memory speed (Foster 1976, Stallings 2000). Increasing processor speed is relatively cheap in comparison to corresponding increases in the speed of the bus and main memory configuration (hence the use of WAIT states to match processors to slower and cheaper memory).

A cache memory makes use of the locality of reference phenomenon already discussed in the section on virtual memory, i.e. over short periods of time references of both instructions and data tend to cluster. The cache is a fast memory (matched to CPU speed), typically between 4K and 256Kbytes in size, which is logically positioned between the processor and bus/main memory. When the CPU requires a word (instruction or data) a check is made to see if it is in the cache and if so it is delivered to the CPU. If it is not in the cache a block of main memory is fetched into the cache and it is likely that future memory references will be to other words in the block (typically a hit ratio of 75% or better can be achieved). Clearly memory writes have to be catered for, as does the replacement of blocks when a new block is to be read in. Modern microprocessors have on-chip cache memories (a single unified cache on the Intel 80486, separate instruction and data caches on the Motorola MC68040) - additional external caches may also be used, see Fig 2. Cache memory is particularly important in RISC machines where the one instruction execution per cycle makes heavy demands on main memory.
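
The effect of the hit ratio on average access time, and the basic lookup, can be sketched as below. This is a direct-mapped cache with invented sizes and timings - real caches are often set-associative and must also handle writes and block replacement, as noted above.

/* Sketch of a direct-mapped cache lookup plus the average access time
   formula t_avg = h*t_cache + (1-h)*t_main.  Sizes and timings invented. */
#include <stdio.h>

#define LINE_SIZE 16u                        /* bytes per cache line */
#define NUM_LINES 1024u                      /* 16Kbyte cache        */

typedef struct {
    int      valid;
    unsigned tag;
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Returns 1 on a hit, 0 on a miss (the miss also loads the line). */
static int lookup(unsigned address)
{
    unsigned block = address / LINE_SIZE;
    unsigned index = block % NUM_LINES;      /* which cache line     */
    unsigned tag   = block / NUM_LINES;      /* identifies the block */

    if (cache[index].valid && cache[index].tag == tag)
        return 1;                            /* hit                  */

    cache[index].valid = 1;                  /* miss: fetch the block */
    cache[index].tag   = tag;                /* from main memory      */
    return 0;
}

int main(void)
{
    double hit_ratio = 0.75, t_cache = 10.0, t_main = 70.0;    /* ns, invented */

    printf("first access hit? %d, second access hit? %d\n",
           lookup(0x1234), lookup(0x1238));  /* same 16-byte line */
    printf("average access time = %.1f ns\n",
           hit_ratio * t_cache + (1.0 - hit_ratio) * t_main);
    return 0;
}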

The concept of a cache has been extended to disk I/O. When a program requests a block or blocks several more are read into the cache where they are immediately available for future disk access requests. Disk caches may take two forms:

Software disk cache

in which the operating system or disk driver maintains the cache in main memory, i.e. using the main CPU of the system to carry out the caching operations.

Hardware disk cache

in which the disk interface contains its own cache RAM memory (typically 4 to 16Mbytes) and control circuits, i.e. the disk cache is independent of the main CPU.

Hardware disk caches are more effective but require a more complex (and expensive) disk controller and tend to be used with fast disks in I/O bound applications, e.g. databases.

Fig 2 Showing CPU (with ALU, Control Unit and internal cache), external cache, RAM memory and busses


4.3 Example Processor Evolution: Intel and Motorola Microprocessors

The Motorola MC68000 family has evolved considerably since the introduction of the MC68000 in 1979 (the Intel 8086 family has evolved along similar lines - see Fig. 3):

MC68000 - 1979

NMOS technology, approximately 68000 transistors
16-bit data bus, 24-bit address bus (maximum 16 Mbyte memory)
2 word prefetch queue (including IR)
approximately 0.6 Mips at 8MHz

MC68008 - 1982

NMOS technology - from a programmer's viewpoint almost identical to the 68000
8-bit data bus, 20-bit address bus (maximum 1Mbyte memory)
approximately 0.5 Mips at 8MHz

MC68010 - 1982

as 68000 with the following enhancements:
three word prefetch queue (tightly looped software runs in 'loop mode')
memory management support (for virtual memory)
approximately 0.65 Mips at 8MHz

MC68020 - 1984

CMOS technology with 200000 transistors
true 32-bit processor with 32-bit data and address busses (4 Gbyte address space)
extra instructions and addressing modes
three clock bus cycles (68000 bus cycles take four clock cycles)
extended instruction pipeline
on-chip 256 byte instruction cache
co-processor interface, e.g. for MC68881 floating-point co-processor
approximately 2.2 Mips at 16MHz

MC68030 - 1987

300000 transistors
extended pipelining
256 byte on-chip instruction cache and 256 byte on-chip data cache
on-chip memory management unit
approximately 5.0 Mips at 16MHz

MC68040 - 1989

1200000 transistors
4Kbyte on-chip instruction cache and 4Kbyte on-chip data cache
on-chip memory management unit and floating point processor
pipelined integer and floating point execution units operating concurrently
approximately 22.0 Mips at 25MHz


 

Fig 3 Showing the relative performance of Intel processors - from http://bwrc.eecs.berkeley.edu/CIC/summary/icomp.gif


Fig 3a Showing the relative performance of Intel overdrive processors - from http://bwrc.eecs.berkeley.edu/CIC/summary/icomp-cmp.gif

Overdrive processors use newer technology (DX4 and Pentium) in chips which plug into earlier motherboards

For comparisons of the Intel Pentium III processors see http://www.intel.com/procs/perf/icomp/index.htm

4.4 CISC and RISC processors (Stallings 2000)

Over the past thirty years, as the size of silicon wafers increased and circuit elements shrank, the architecture of processors became more and more complex. In an attempt to close the semantic gap between high level language operations and processor instructions more and more powerful and complex instructions and addressing modes were implemented. As microprocessors evolved this continued until many of today's advanced microprocessors (e.g. Intel 80486, Motorola 68040) have hundreds of instructions and tens of addressing modes. This type of processor architecture is called a complex instruction set computer or CISC. There are a number of drawbacks with this approach:

1. The instruction set and addressing modes are so complex that it becomes very difficult to write compilers which can take advantage of particular very powerful instructions, i.e. optimize the generated code correctly.

 

2. The microprogram of the control units becomes very complex and difficult to debug.

 

3. Studies of typical programs have shown that the majority of computation uses only a small subset of the instruction set, i.e. a large percentage of the chip area allocated to the processor is used very little. Table 3 (Tanenbaum 1990) presents the results of studies of five programming languages (SAL is a Pascal like language and XPL a PL/1 like language) and presents the percentages of various statement types in a sample of programs. It can be seen that assignments, IFs and procedure CALLs account for typically 85% of program statements. Further analysis (Tanenbaum 1990) has shown that 80% of assignments are of the form variable:=value, 15% involve a single operator (variable:=a+b) and only 5% of expressions involve two or more operators.

Statement     SAL   XPL   Fortran   C    Pascal   Average

Assignment    47    55    51        38   45       47
IF            17    17    10        43   29       23
CALL          25    17    5         12   15       15
LOOP          6     5     9         3    5        6
GOTO          0     1     9         3    0        3
other         5     5     16        1    6        7

Table 3 Percentage of statement types in five programming languages (Tanenbaum 1990)

An alternative approach to processor architecture evolved, called the reduced instruction set computer or RISC. The number of instructions was reduced by an order of magnitude and the space created used for more processor registers (a CISC machine typically has 20 registers, a RISC machine 500) and large on-chip cache memories. All data manipulation is carried out on and using data stored in registers within the processor; only LOAD and STORE instructions move data between main memory and registers (RISC machines do not allow direct manipulation of data in main memory). There are a number of advantages to this approach:

1. Compiler writing becomes much easier with the limited instruction set.

2. All instructions are of the same length, simplifying pipelining. Instructions execute within one clock cycle - with modern RISC machines executing some instructions in parallel (this requires sophisticated pipelining techniques). CISC instructions can be of various lengths (e.g. 2 bytes to 10 bytes) taking varying times to execute and making pipelining complex.

3. The control unit is sufficiently simple that it can be 'hardwired'.

4. The circuit design, layout and modelling are simplified, reducing development time, e.g. Table 4 shows the design and layout effort involved in the development of some modern RISC and CISC microprocessors (Stallings 2000).

The disadvantages are:

1. Programs are typically 25% to 45% larger than on an equivalent CISC machine (not a major problem with cheap main memory and large caches);

2. Executing one instruction per clock cycle makes heavy demands on main memory therefore RISC machines tend to have larger cache memories than equivalent CISC machines.

Until the late 1980's there was no out and out winner with RISC and CISC machines of similar price giving similar overall performance. However, problems have arisen with the latest generations of CISC microprocessors which incorporate sophisticated on-chip instruction pipelines, memory management units, large instruction and data caches, floating point units, etc. As clock speeds were increased (to improve performance) severe problems occurred in maintaining reliable production runs with commercially available machines appearing up to a year after the announcement of the microprocessor concerned. An interesting pointer to the current trend towards RISC technology is that all the latest high performance workstations are RISC based (in some cases replacing CISC models), e.g. IBM 6000, DEC 5000, Hewlett Packard 9000/700.  

CPU             Transistors   Design            Layout
                              (person-months)   (person-months)

RISC I          44,000        15                12
RISC II         41,000        18                12
MC68000         68,000        100               70
Z8000           18,000        60                70
Intel APx-432   110,000       170               90

Table 4 Design and layout effort for some microprocessors (Stallings 2000)

4.5 Special Purpose Processors, Multi-processors, etc.

Adding extra processors can significantly enhance the overall performance of a system by allowing tasks to be performed by specialised hardware and/or in parallel with 'normal' processing.

4.5.1 Special Purpose Processors.

The use of specialised processors to perform specific functions was implemented in the large mainframe computer systems of the 1970's, e.g. the PPU's (peripheral processing units) of the CDC 6600 (Foster 1978). Today's high performance systems may contain a number of specialised processors:


Floating point co-processor

to carry out real number calculations.

Graphics processor

to control the graphics display. This can range from a fairly simple graphics controller chip which provides basic text, pixel and line drawing capabilities up to specialised processors which support advanced graphics standards such as X windows.

Input/Output control processors

which carry out complex I/O tasks without the intervention of the CPU, e.g. network, disk, intelligent terminal I/O, etc. For example, consider a sophisticated network where the network communications and protocols are handled by a dedicated processor (sometimes the network processor and associated circuits are more powerful and complex than the main CPU of the system).

In a 'simple' system all the above tasks would be carried out by sequences of instructions executed by the CPU. Implementing functions in specialised hardware has the following advantages which enhance overall system performance:

(a) the specialised hardware can execute functions much faster than the equivalent instruction sequence executed by the general purpose CPU; and  

(b) it is often possible for the CPU to do other processing while a specialist processor is carrying out a function (at the request of the CPU), e.g. overlapping a floating point calculation with the execution of further instructions by the CPU (assuming the further instructions are not dependent upon the result of the floating point calculation).

4.5.2 Multi-processors and Parallel Processors

The stored program concept, set out by John von Neumann in the 1940's and first realised in machines such as EDSAC in 1949 (Foster 1978, Tanenbaum 1990), has a single CPU sending sequential requests over a bus to memory for instructions and data. The vast majority of computer systems (CISC and RISC) built since that time are essentially developments of this basic von Neumann machine.

One of the major limitations when increasing processor clock rate is the speed, approximately 20cm/nsec, at which electrical signals travel around the system. Therefore to build a computer with 1nsec instruction timing, signals must travel less than 20cm to and from memory. Attempting to reduce signal path lengths by making systems very compact leads to cooling problems which require large mainframe and supercomputers to have complex cooling systems (often the downtime of such systems is caused not by failure of the computer but by a fault in the cooling system). In addition, many of the latest 32-bit microprocessors have experienced over-heating problems. It therefore becomes harder and harder to make single processor systems go faster and an alternative is to have a number of slower CPUs working together. In general modern computer systems can be categorised as follows:

SISD single-instruction single-data architecture

SIMD single-instruction multiple-data architecture


MIMD multiple-instruction multiple-data architecture

The von Neumann machine is an SISD architecture in which some parallel processing is possible using pipelining and co-processors.

4.5.2.1 Data parallel processing

In data parallel processing one operation acts in parallel on different data. The SIMD (single-instruction multiple-data) architecture is one in which a control unit issues the same instruction to a number of identical processing elements or PEs. Such an architecture is useful in specialised applications where a sequence of instructions is to be applied to a regular data structure. For example, image processing applications (from pattern recognition to flight simulators) require sequences of operations to be applied to all pixels (picture elements) of an image; this may be done pixel by pixel in a single processor system or in parallel in a SIMD system. Many complex application areas (aerodynamics, seismology, meteorology) require high precision floating point operations to be carried out on large arrays of data. 'Supercomputers' designed to tackle such applications are typically capable of hundreds of millions of floating point operations per second.
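
As a simple illustration, the C loop below applies the same brightness adjustment to every pixel of a (tiny, made-up) image. Written this way a serial CPU visits one pixel per iteration; on a SIMD machine the single 'add and clamp' operation would instead be broadcast once to many processing elements, each holding its own pixel.

/* Per-pixel brightness adjustment: the same operation applied to every
   element of a regular data structure - the classic data parallel case. */
#include <stdio.h>

#define WIDTH  8          /* invented image size */
#define HEIGHT 8

int main(void)
{
    unsigned char image[HEIGHT][WIDTH] = { { 0 } };

    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            image[y][x] = (image[y][x] > 245) ? 255 : image[y][x] + 10;

    printf("pixel (0,0) after adjustment = %d\n", image[0][0]);
    return 0;
}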

4.5.2.2 Control parallel processing

Data parallel processing is applicable to a limited range of applications where a sequence of instructions is applied to a data structure. General purpose computing, however, requires multiple instruction multiple data processing. Such an environment is called control parallel processing in which different instructions act on different data in parallel.

The MIMD (multiple-instruction multiple-data) architecture is one in which multiple processors autonomously execute different instructions on different data. For example:

Multi-processing

in which a set of processors (e.g. in a large mini or mainframe system) share common main memory and are under the integrated control of an operating system, e.g. the operating system would schedule different programs to execute on different processors.

Parallel processing

in which a set of processors co-operatively work on one task in parallel. The executable code for such a system can either be generated by:

(a) submitting 'normal' programs to a compiler which can recognize parallelism (if any) and generate the appropriate code for different processors;

(b) programmers working in a language which allows the specification of sequences of parallel operations (not easy - the majority of programmers have difficulty designing, implementing and debugging programs for a single processor computer).


5 Integrated circuits and performance enhancement

An integrated circuit chip is a small device a few centimetres square which contains a complex electronic circuit fabricated onto a wafer of semiconductor material. Over the years the techniques used to fabricate the wafers have improved, e.g. the maximum chip edge size increased from 2mm in 1960 to 13mm in 1990, see Fig. 4, and the minimum feature size decreased from 50 microns in 1960 to 0.8 microns in 1990 allowing more circuits per unit area, see Fig. 5. The result was that integrated circuits became larger and more complex, see Fig. 6, with the number of transistors per chip doubling every 18 months (Moore's law originally handed down in 1975 and still going strong) (Stallings 2000). Alongside the increase in complexity there has been a corresponding reduction in cost, see Fig. 7:

Over the past 30 years, the performance/dollar ratio of computers has increased by a factor of over one million (Gelsinger et al 1989).

For example, in 1970 the cost of memory (magnetic core) was between 50 pence and £1 per byte, e.g. 4K of 12-bit PDP8 memory was approximately £4000. By the mid 1970's 16K of 16-bit PDP11 memory cost £4000. Today IBM PC compatible memory is between £25 and £40 per Mbyte.

The generations of integrated circuit technology range from small scale integration (SSI), to medium scale integration (MSI), to large scale integration (LSI), very large scale integration (VLSI) and ultra large scale (ULSI) integration. These can be represented by ranges of complexity (numbers of components on the chip), see Table 5.

Until recently a sophisticated workstation would have contained a large number of complex integrated circuit chips, e.g. the microprocessor, floating point co-processor, memory management unit, instruction and data caches, graphics controller, etc. As chip complexity increased it became possible to build more and more powerful on-chip microprocessors with larger and larger address and data busses. The major problem, however, with increasing off-chip bus widths is that every extra bit requires a contact (pin or leg) on the chip edge to connect it to the outside world and an extra 'wire' and interface components on the external bus and associated circuits. Thus every extra bus line makes the overall system more complex and expensive, i.e. mini and mainframe computer systems (which have large data buses) can be an order of magnitude greater in cost than a personal workstation of equivalent CPU performance.

The ability to fabricate more components on a single chip (Fig. 6 and Fig 6a) has meant that a number of functions can be integrated onto a single integrated circuit, e.g. modern microprocessors contain the processor core, floating point co-processor, memory management unit and instruction and data caches on a single chip. The advantage of having the majority of the major components on-chip is that very wide internal busses can be used, decoupling the cycle timing and bandwidth of on-chip operations from off-chip considerations. Hence the processor can run at a very fast cycle time relative to the frequency of the external circuitry. On-chip clock multipliers enhance this effect.
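
The 'doubling every 18 months' figure compounds very quickly; the short C program below projects it forward from an illustrative starting point of roughly 29000 transistors (about an 8086-class chip) - the figures are for illustration only.

/* Illustrative only: transistor count doubling every 18 months (Moore's law). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double start = 29000.0;                      /* roughly an 8086-class chip */

    for (int years = 0; years <= 20; years += 5) {
        double count = start * pow(2.0, (years * 12.0) / 18.0);
        printf("after %2d years: ~%.0f transistors\n", years, count);
    }
    return 0;
}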


          complexity          typical circuit function

SSI       2-64                e.g. simple gates AND, OR, EXOR, NOT, etc.
MSI       64-2000             e.g. counters, registers, adders, etc.
LSI       2000-64000          e.g. ALUs, small microprocessors, I/O interfaces
VLSI      64000-2000000       e.g. microprocessors, DMA controllers, etc.
ULSI      2000000-64000000    e.g. parallel processors, 1 Mbyte memory chips

Table 5 Integrated circuit generations: complexity and typical circuit function

Fig. 4 Maximum chip edge size against time

Fig. 5 Minimum feature size in microns against time

 


Fig. 6 Number of components per chip against time    

Fig 6a CPU transistor count Intel 8086 family  


 

Fig. 7 Average main memory cost per byte

Fig. 8 Trends in CPU performance growth (Hennessy & Jouppi 1991) Note: no account is taken of other factors such as I/O bandwidth, memory capacity, etc.

 


6 System configurations

6.1 Personal computers, workstations, minis, distributed, etc.

In the late 1970s computer systems could be classified into microcomputers, minicomputers and mainframe computers:

A microcomputer:

a single user computer system (cost £2000 to £5000) based on an 8-bit microprocessor (Intel 8080, Zilog Z80, Motorola 6800). These were used for small industrial (e.g. small control systems), office (e.g. word-processing, spreadsheets) and program development (e.g. schools, colleges) applications.

A minicomputer:

a medium sized multi-user system (cost £20000 to £200000) used within a department or a laboratory. Typically it would support 4 to 16 concurrent users depending upon its size and area of application, e.g. CAD in a design office.

A mainframe computer:

a large multi-user computer system (cost £500000 upwards) used as the central computer service of a large organization, e.g. Gas Board customer accounts. Large organizations could have several mainframe and minicomputer systems, possibly on different sites, linked by a communications network.

As technology has advanced the classifications have become blurred and modern microcomputers are as powerful as the minicomputers of ten years ago or the mainframes of twenty years ago.

Fig. 8 shows the rate of CPU performance growth since the 1960's (Hennessy & Jouppi 1991) as measured by a general purpose benchmark such as SPEC (these trends still continue - see Fig. 3). Microprocessor based systems have been increasing in performance by 1.5 to 2.5 times per year during the past six to seven years whereas mini and mainframe improvement is about 25% per year (Hennessy & Jouppi 1991). It must be emphasized that Fig. 8 only compares CPU performance and no account is taken of other factors such as the larger I/O bandwidth and memory capacity of mini and mainframe systems and the special applications which require supercomputers.

Today system configurations may be summarized as PCs (personal computers), professional workstations, multi-user mini/mainframe computers and distributed environments.  

6.1.1 Personal computers

PC - Personal Computer:

a generic term for a small (relatively) personal microcomputer system (cost £500 to £5000) used for a wide range of relatively low-level computer applications (see Table 6 for a summary of the features of a typical PC). The most common PCs are the IBM PC and compatible machines (based on the Intel 8086/80286/80386/80486/Pentium family of microprocessors).

Bus size:

Until the late 1980's the major factor which limited the overall performance of IBM PC compatible computers was the widespread use of the 16 bit IBM PC/AT bus (the 16 bit refers to the data bus size) developed in the mid 1980s to support the 80286 based IBM PC/AT microcomputer. This bus system was widely accepted and became known as the ISA bus (Industry Standard Architecture). Unfortunately for the faster 80386/80486 computer systems the ISA bus was very slow, having a maximum I/O bandwidth of 8 Mbytes/sec. This caused a severe I/O bottleneck within 80486 systems when accessing disk controllers and video displays via the bus, see Fig 9.

Some IBM PC compatibles were available with the IBM Micro Channel bus or the EISA (Extended Industry Standard Architecture) bus, both of which are 32-bit bus systems having I/O bandwidths of 20 to 30 Mbytes/sec or greater. An EISA bus machine, however, could cost £500 to £1000 more than the equivalent ISA bus system, with corresponding increases in the cost of the I/O boards (typically two to three times the cost of an equivalent ISA bus card). The EISA bus maintains compatibility with ISA, enabling existing ISA cards to be used with it.

The problem with EISA was that it made the PC quite expensive, and this led to the development of local buses which were cheaper and offered similar or better performance. There were two major contenders:

1. VESA, a 32-bit local bus, which was the first to appear;

2. PCI, a 32/64-bit local bus, which was backed by Microsoft and Intel.

Because VESA was the first to appear it became popular in the early/mid nineties. Since that time PCI has taken over - mainly because it was supported by Microsoft and Intel and could be used to support the Pentium, which has a 64-bit data bus (Intel quote peak bandwidths of 132 Mbytes/sec). Early Pentium systems had a PCI local bus used for high-performance devices (video, disk, etc.) plus an ISA bus for slower devices (serial and parallel I/O, etc.), see Fig. 10. Many of today's Pentium systems do not have ISA bus slots, which can cause problems if one wishes to interface with older devices, e.g. specialist hardware boards.


Fig. 9 The ISA bus of an IBM PC compatible microcomputer

Fig. 10 An IBM PC compatible microcomputer with a PCI local bus


PCI bus

The original PCI bus was rated at 32 bits and 33 MHz, giving a maximum throughput of 132 Mbytes per second. Since then PCI-2 has appeared, rated at 32/64 bits and 66 MHz, giving a maximum throughput of 528 Mbytes per second. Unfortunately the PCI bus is now quite dated and is becoming a performance bottleneck in modern Pentium systems - see http://www.intel.com/network/performance_brief/pc_bus.htm and http://www.pcguide.com/ref/mbsys/buses/func.htm for a discussion of PC buses.
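
These peak figures follow directly from bus width times transfer rate. The Python sketch below is illustrative only: the ISA entry assumes a 16-bit bus at 8 MHz needing at least two bus cycles per transfer, which is why its effective figure matches the 8 Mbytes/sec quoted earlier.

    # Peak bus bandwidth = (data bus width in bytes) * (effective transfer rate).
    # Protocol overheads, wait states and arbitration are ignored.

    def peak_bandwidth_mbytes(width_bits, transfers_per_usec):
        """Theoretical peak transfer rate in Mbytes/sec."""
        return (width_bits / 8) * transfers_per_usec

    buses = [
        ("ISA (16-bit, ~4M transfers/sec)", 16, 4),   # ~8 Mbytes/sec
        ("PCI (32-bit, 33 MHz)",            32, 33),  # 132 Mbytes/sec
        ("PCI-2 (64-bit, 66 MHz)",          64, 66),  # 528 Mbytes/sec
    ]

    for name, width, rate in buses:
        print(f"{name:34s} {peak_bandwidth_mbytes(width, rate):6.0f} Mbytes/sec")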

In addition, many Pentium motherboards are also equipped with an AGP (Accelerated Graphics Port), which was developed to support high-performance graphics cards for 2D and 3D applications - see http://developer.intel.com/technology/agp/tutorial/, http://agpforum.org/ and http://www.pcguide.com/ref/mbsys/buses/types/agp.htm

Display

The main problem with running sophisticated graphics applications on a PC is that the screen quality, in terms of addressable pixels and physical size, can be deficient:

1. PC VGA graphics is only 640*480 pixels compared with a workstation 'norm' of greater than 1000*1000 pixels. The super VGA graphics (1024*768 pixels by 256 colours) of modern PCs is much better (see the sketch after this list for the memory and bandwidth implications).

2. Screen updating can be very slow (relative to a workstation) on a machine with an ISA bus (see discussion above on PC bus systems).

3. Cheaper PCs sometimes use an interlaced display to reduce overall system cost, i.e.:

Non-interlaced display: every line of pixels is refreshed 50 or 60 times per second. Interlaced display: alternate lines of pixels are refreshed 25 or 30 times per second, so horizontal lines of one-pixel thickness flicker.

4. The physical screen size of a PC is typically 14/15/17 inches against the workstation norm of 19/21 inches.
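
As a rough illustration of why resolution matters to both memory and bus load, the sketch below estimates frame-buffer size and raw refresh bandwidth for the display modes discussed above; the 16-colour VGA depth, the 60 Hz refresh rate and the 24-bit 1280*1024 workstation mode are assumptions made for the sake of the example.

    # Frame-buffer size and raw refresh bandwidth for uncompressed displays.
    # Illustrative only; real controllers read video memory locally, but the
    # numbers show why pushing pixels over an 8 Mbytes/sec ISA bus hurts.

    def frame_buffer_kbytes(width, height, bits_per_pixel):
        return width * height * bits_per_pixel / 8 / 1024

    def refresh_mbytes_per_sec(width, height, bits_per_pixel, refresh_hz):
        return width * height * bits_per_pixel / 8 * refresh_hz / 1e6

    modes = [
        ("VGA 640*480, 16 colours",          640,  480,  4),
        ("Super VGA 1024*768, 256 colours", 1024,  768,  8),
        ("Workstation 1280*1024, 24-bit",   1280, 1024, 24),   # assumed mode
    ]

    for name, w, h, bpp in modes:
        print(f"{name:34s} frame buffer {frame_buffer_kbytes(w, h, bpp):6.0f} Kbytes, "
              f"repaint at 60 Hz needs {refresh_mbytes_per_sec(w, h, bpp, 60):6.1f} Mbytes/sec")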

Operating system

The most common operating system on IBM PC compatibles is some variety of Windows (95/98/NT/2000). Although this is adequate for many application environments, UNIX is still preferred for high-performance, robust application areas.

6.1.2 Professional workstations

Professional workstation: a generic term applied to the (relatively) high-powered personal computing systems operating in a distributed environment, evolved by Apollo (Nelson & Leach 1984) and Sun (Pratt 1984) in the early 1980s. The main advantages of professional workstations over PCs are:

a. Computing power is an order of magnitude higher: the early machines were based on the Motorola MC68000 family of microprocessors; today the tendency is to use RISC-based architectures. Main memory and disk sizes are correspondingly larger.

b. Bus system (Stallings 2000): in the past professional workstations used 32-bit bus systems, e.g. VME, with an I/O bandwidth of 40 Mbytes/sec. Modern workstations have moved to 64-bit or wider buses, or to independent memory and I/O bus systems.

c. UNIX operating system: the de facto industry standard for medium-sized computer systems.

d. Integrated environment: the workstations are designed to operate in a sophisticated multiprogramming networked distributed environment. The operating system is integrated with the window manager, network file system, etc.

e. Multiprogramming/virtual memory operating system: the workstations are designed to run large highly interactive computational tasks requiring a sophisticated environment.

f. High quality display screen: a large high quality non-interlaced display with mouse input is used to interact with the window managed multiprogramming environment.

A modern high-performance PC, equipped with a high-performance graphics card and a high-quality display, can compete with low-end workstations (at similar cost). More specialised applications, such as real-time 3D graphics, still require professional workstations.

6.1.3 Multi-user minicomputer and mainframe computer systems

The terms mini and mainframe are becoming very blurred but in general refer to large multi-user configurations with good I/O bandwidth and main memory capacity, i.e. a number of users (100 to 100000) concurrently running relatively straightforward applications on a common computer system. High-powered multi-user systems typically have an I/O bandwidth and physical memory capacity at least an order of magnitude greater than PCs and workstations of similar CPU power. Such multi-user environments may be accessed via a network by X terminals, PCs or professional workstations. PCs or professional workstations can be used as multi-user machines so long as the amount of concurrent I/O does not reduce overall performance.

6.1.4 Distributed environment

A modern computer configuration tends to consist of a number of individual computer systems connected via a network to form a distributed environment:

1. User workstations: PCs and/or professional workstations running highly interactive software tools, e.g. word processing, spreadsheets, CAD design, CASE tools, etc.

2. Fileservers: powerful (relative to the user workstations) computer systems which hold central operating system files, user files, centralized databases, etc. A fileserver may be a high-powered PC, a specialised workstation (without a user) or a minicomputer.

3. Nodal processors: some commercial, industrial or scientific applications require a powerful centralized system to support specialised tasks beyond the capacity of the user workstations, e.g. heavy floating-point numeric processing, very large databases, etc. Depending upon the configuration a fileserver may provide this support; otherwise a dedicated minicomputer, mainframe or supercomputer would be used.

4. Bridges and gateways: in a distributed environment care must be taken not to overload the network and to provide adequate security for sensitive information. Splitting a large network up into a number of semi-independent networks linked by bridges and gateways can assist with these problems.

6.2 Performance factors in a distributed environment

Distributed environments can be very complex and important factors include:

1. Performance of the network or networks: this is dependent upon the physical network configuration and speed, the communications protocol used, the number of bridges and gateways, etc.

 

2. The number of user workstations and their distribution over the network(s), together with supporting fileservers and nodal processors.

 

3. The size of main memory and disks on the user workstations. Network traffic can be reduced if the operating system and commonly used software are held on local disks (this needs careful management of new software releases). A cheaper alternative used to be to have a small disk on the user workstation which held the operating system swap space, so at least the operating system did not have to page over the network (also see diskless nodes below).

 

4. The number of fileservers and their power in terms of processor performance, main memory size and disk I/O performance. The distribution of software packages and user files around the fileservers is critical:

 

(a) complex, intensive centralized tasks could well require a dedicated fileserver, e.g. an advanced database environment or the analysis of large engineering structures using finite-element mesh techniques;


(b) spreading the end-user files around the fileservers prevents overloading of particular fileservers (and if a fileserver breaks down some users can still do their work).

5. The number (if any) of diskless nodes. One of the general rules when purchasing disks is that the larger the disk, the lower the cost per byte. One way to reduce costs in a distributed system is to equip selected machines with large disks which then act as 'hosts' to a number of diskless nodes. On start-up a diskless node 'boots' the operating system over the network from the 'host' and carries out all disk access over the network. In practice the 'host' machines may be other user workstations or fileservers. An additional advantage was that management of the overall system was made easier, with fewer disks to maintain and update. Diskless systems work provided the following factors are taken into consideration (a rough load estimate is sketched after this list):

 

(a) The network does not become overloaded with traffic from too many diskless nodes;

(b) The ratio of diskless to disked host nodes does not become too high, i.e. placing excessive load on the hosts. In practice a ratio of three to one gives reasonable performance; however, systems with ratios of ten to one (or greater) have been implemented, with correspondingly poor performance.

(c) There is sufficient main memory in the diskless nodes such that excessive swapping/paging (over the network) does not occur in a multiprogramming/virtual memory environment. Sudden degradations in performance can often be observed when new software releases cause excessive paging as programs increase in size.

(d) The network speed is sufficiently high to cope with the overall demands of the workstations. Until the late 1980s this was not a problem, with typical network speeds of 10 Mbit/sec and typical professional workstations having a power of 1 to 5 Mips. However, modern machines make diskless nodes impractical without very fast networks.
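
The load estimate promised above is sketched below. Every figure (the per-node paging rate, the host disk rate) is an assumption chosen only to illustrate the method, set against the 10 Mbit/sec network mentioned in point (d).

    # Back-of-the-envelope load estimate for diskless nodes. The per-node
    # paging rate and host disk rate below are assumptions, not measurements.

    NETWORK_MBYTES_PER_SEC = 10 / 8        # 10 Mbit/sec Ethernet
    PAGING_KBYTES_PER_SEC_PER_NODE = 200   # assumed average paging + file traffic
    HOST_DISK_MBYTES_PER_SEC = 2.0         # assumed sustained host disk rate

    def host_disk_utilisation(diskless_per_host):
        demand = diskless_per_host * PAGING_KBYTES_PER_SEC_PER_NODE / 1024
        return demand / HOST_DISK_MBYTES_PER_SEC

    def network_utilisation(diskless_nodes):
        demand = diskless_nodes * PAGING_KBYTES_PER_SEC_PER_NODE / 1024
        return demand / NETWORK_MBYTES_PER_SEC

    for ratio in (3, 10):
        print(f"{ratio}:1 diskless-to-host ratio -> host disk ~{host_disk_utilisation(ratio):.0%} busy")
    print(f"20 diskless nodes on one segment -> network ~{network_utilisation(20):.0%} busy")

With these assumed figures a 3:1 ratio keeps the host disk comfortably loaded, a 10:1 ratio saturates it, and twenty busy diskless nodes swamp a 10 Mbit/sec segment, which matches the qualitative guidance above.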

Clearly great care is needed in configuring a distributed environment, since a slight error can give the impression of 'clockwork'-powered machines. Common problems (often due to lack of funds) are:

1. too few fileservers for the number of user workstations and/or poor distribution of fileservers across the network;

2. too little main memory on fileservers causing bottlenecks in the accessing of centralized file systems;

3. too high a ratio of diskless to disked nodes and/or too little main memory in the diskless nodes.

 


7  General requirements, disk backup, disk viruses, etc.

The size and performance of the system(s) required to support end-users tasks depends upon a range of factors:

1. the maximum number of concurrent users (at any time) and the complexity of the tasks performed determine processor power, main memory size, I/O bandwidth, disk speed, etc.;

2. the total number of registered users determines the number of workstations, fileservers and on-line storage; 

3. the size of the operating system, utilities, end-user packages and data sets determines the size of the main memory and on-line disk storage (a rough sizing sketch follows this list).
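
As a worked illustration of how these factors translate into a configuration, the sketch below turns assumed user numbers and package sizes into rough disk and memory estimates for a central multi-user host; every figure is a hypothetical placeholder, not a recommendation.

    # Rough sizing sketch driven by the three factors above.
    # All figures are assumed for illustration only.

    registered_users      = 200     # factor 2: total registered users
    concurrent_users      = 50      # factor 1: maximum simultaneous users
    os_and_utilities_gb   = 1.0     # factor 3: operating system, compilers, utilities
    packages_gb           = 3.0     # factor 3: shared end-user packages
    per_user_files_gb     = 0.2     # average home-directory allocation
    per_user_memory_mb    = 32      # working set of a typical interactive task
    base_memory_mb        = 128     # operating system, file cache, daemons

    disk_gb   = os_and_utilities_gb + packages_gb + registered_users * per_user_files_gb
    memory_mb = base_memory_mb + concurrent_users * per_user_memory_mb

    print(f"On-line disk needed        : ~{disk_gb:.0f} Gbytes")
    print(f"Main memory on central host: ~{memory_mb:.0f} Mbytes")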

In the case of a multi-user environment, where user files are stored on centralized fileserver(s), provision must be made for disk backup. Backups would be carried out on a regular basis onto magnetic tape (half-inch industry standard, tape streamer, DAT drive, etc.). On many systems (particularly PCs) there arises the problem of users moving information on and off the system via floppy disks and other media. Although provision needs to be made for the transfer of information, care needs to be taken. For example, if user machines are equipped with floppy disks the following problems may occur:

1. Users may take copies of programs and data either for their own use or to sell.

2. Users may copy unlicensed programs onto the system thus breaking copyright and making the employer liable to prosecution.

3. Users may consciously or unconsciously (while copying files) copy viruses onto the system.

All of the above actions can lead to serious problems. In particular, viruses create havoc by deleting and corrupting files and disks (educational environments where students move from PC laboratory to PC laboratory are particularly subject to this problem, with often half the machines on a campus having a virus at any moment).

The main way to avoid the above problems is to provide users with no direct facility for copying files to/from removable media, i.e. PCs are not fitted with floppy disk drives. All disks and tapes brought into an organization are then processed by the systems staff, who check for illicit copying of files and for viruses.

Other avenues for illicit copying are via connections to external networks or by attaching portable computers to local networks. Rigorous procedures for controlling access to networks (e.g. extensive password protection) and the movement of portable and semi-portable machines can reduce these problems.

 

8 Conclusions


This paper has reviewed a range of issues critical to system performance evaluation:

1. The effect of system and end-user software on the overall requirements of a computer system. 

2. Factors affecting overall system performance in terms of CPU power, memory size, data bus size, etc.

3. The techniques used to improve processor performance and how modern integrated circuits have enabled these to be implemented in low to medium cost systems. 

4. The range of system configurations (PCs, workstations, multi-user, distributed) with particular attention to factors which are critical in a distributed system.

   

9 References

Bramer, B, 1989, 'Selection of computer systems to meet end-user requirements', IEEE Computer Aided Engineering Journal, Vol. 6 No. 2, April, pp. 52-58.

Denning, P J, 1970, 'Virtual memory', ACM Computing Surveys, Vol. 2 No. 3, September.

Foster, C C, 1976,  'Computer Architecture', Van Nostrand Reinhold.

Gelsinger, P P, Gargini, P A, Parker, G H & Yu, A Y C, 1989, 'Microprocessors circa 2000', IEEE Spectrum, Vol. 26 No. 10, October, pp 43-47.

Hennessy, J L & Jouppi, N P, 1991, 'Computer technology and architecture: an evolving interaction', IEEE Computer, Vol. 24, No. 9, September, pp 18-28.

Nelson, D L & Leach, P J, 1984, 'The Architecture and Applications of the Apollo Domain', IEEE CG&A, April, pp 58-66.

Pratt, V R, 1984, 'Standards and Performance Issues in the Workstations Market', IEEE CG&A, April, pp 71-76.

Stallings, W, 2000, 'Computer organization and architecture', Fifth Edition, Prentice Hall,  ISBN 0-130085263-5.

Tanenbaum, A S, 1990, 'Structured Computer Organisation', Prentice-Hall.

 
